I want to securely convey a PDF document containing sensitive information from my system to another system. The latter is operated by a different person, whom I trust. However, this person does not have much knowledge of information security and cryptography and thus does not have very advanced security mechanisms in place, e.g. no full-disk encryption (FDE), and I cannot really ensure that the disks in that system will be securely wiped before being discarded either.
I don't want to resort to public-key cryptography. (It's complex, difficult to review, and often you have to trust certificate authorities that you cannot really verify either.) Shared secrets generated from key-grade material have already been established by secure means.
I plan to use an AES-256 encrypted container, (un)locked by a strong key file stored on external media (on the untrusted system, that is; on my system it's just stored in a secure container on an FDE-encrypted disk), to convey the message and ensure that it is not read in transit. However, what I worry about is that the document itself will leak into temp space after being extracted from the container to be read by the viewing application (e.g. Adobe Reader). For this reason, I want to use encryption at the document level as well, and I have derived a document password from the pre-established key-grade material too. (The material has enough entropy to derive multiple keys from it without them being related.)
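For reference, here is a minimal sketch of the kind of derivation I mean. It assumes HKDF (RFC 5869) with HMAC-SHA-256 as the derivation function, which is only one reasonable choice; the function name `hkdf_sha256`, the purpose labels and the placeholder material are made up for illustration.

```python
import hashlib
import hmac


def hkdf_sha256(key_material: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA-256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested output too long for HKDF-SHA-256")
    # Extract: condense the input material into a fixed-size pseudorandom key.
    prk = hmac.new(salt or b"\x00" * 32, key_material, hashlib.sha256).digest()
    # Expand: chain HMAC blocks, binding the output to the purpose label.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder only; real pre-shared material must never appear in source code.
shared_material = bytes(range(64))

# Hypothetical purpose labels; distinct labels give independent-looking keys.
container_key = hkdf_sha256(shared_material, b"container-key", 32)

# The standard security handler pads or truncates passwords to 32 bytes,
# so 16 derived bytes (32 hex characters) already fill that limit.
pdf_password = hkdf_sha256(shared_material, b"pdf-password", 16).hex()
```

The point of the distinct labels is that knowing the PDF password should not help anyone recover the container key, or the other way round.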
My PDF generating software outputs a %PDF-1.4 file that says /Encrypt 64 0 in the trailer. Now when I look up the 64 0 obj, I find a <</Filter/Standard/V 2/Length 128/R 3/O(...)/U(...)/P ....>> directive. I had a look at the PDF specification and found that the encryption scheme that's used here (Version 2, Revision 3) is based on RC4 and MD5, both of which have known security flaws. (I know about the issues with RC4, such as the statistical biases and linear combinations of certain key stream bytes yielding statistical information about the key.) How long do you think that will still be adequate to protect the information? (Bruce Schneier said there are probably already methods around that can "efficiently" break RC4, even though that looks like a pretty bold claim, considering that weaknesses have been found but so far no very practical attacks have been mounted, meaning attacks that do not require things like millions of messages encrypted with similar keys for related-key attacks.) Do you think my security policy is inadequate and I should convey the information by different means? If so, how? Like I said, the recipient is not a "crypto-nerd" and will blame me for any additional key exchange or software to install, so apart from security, an "easy" and "usable" solution is really core here.
Also, I've seen that a lot of structural information (PDF is basically a tree structure) is available unencrypted, even in an "encrypted" PDF document. Could that possibly leak important information? As far as I can see, information about the page structure, bounding boxes, checksums, font descriptors, transparency and position of certain objects, cross-reference tables, etc. is all plain-text, possibly along with a lot of metadata. However, every PDF analysis tool I've tried failed to process the file without the correct key and said it couldn't do anything with it since it was encrypted. Given that so much information is available in plain-text form, I'd think that a determined attacker could find out a lot about the document structure even without knowing the key. For example, the page dictionary is in plain-text, so an attacker could see how many pages the document has without the encryption key. Yet so far I haven't seen a PDF analysis tool report "this document has 50 pages" on an encrypted PDF document. They all failed when no correct password was supplied, even though this information is in principle available without the password and thus should be extracted and displayed by a "proper" analysis tool. Is there an easy way of seeing what information could be "extracted" from the file without supplying the correct encryption key?
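To illustrate what I mean by "available without the key", here is a rough sketch that scans the raw bytes for a few dictionary entries. It assumes a PDF 1.4-style file without object streams, where dictionary names and numbers are stored in the clear; the function name `plaintext_summary` and the sample bytes are made up, and a byte-level regex like this can produce false hits inside binary stream data or count `/Count` entries that belong to outlines rather than pages.

```python
import re

# "/Type /Page" but not "/Type /Pages" (the page tree node).
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")
COUNT_RE = re.compile(rb"/Count\s+(\d+)")
FONT_RE = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")


def plaintext_summary(pdf_bytes: bytes) -> dict:
    """Collect structural facts readable without any password."""
    return {
        "encrypted": b"/Encrypt" in pdf_bytes,
        "page_objects": len(PAGE_RE.findall(pdf_bytes)),
        "count_entries": [int(n) for n in COUNT_RE.findall(pdf_bytes)],
        "base_fonts": sorted({m.decode("latin-1")
                              for m in FONT_RE.findall(pdf_bytes)}),
    }


# Made-up fragment standing in for a real file read with open(path, "rb").
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj<</Type/Pages/Count 2/Kids[2 0 R 3 0 R]>>endobj\n"
    b"2 0 obj<</Type/Page/Parent 1 0 R>>endobj\n"
    b"3 0 obj<</Type /Page/Parent 1 0 R>>endobj\n"
    b"4 0 obj<</Type/Font/BaseFont/Helvetica>>endobj\n"
    b"trailer<</Root 5 0 R/Encrypt 6 0 R>>\n"
)
summary = plaintext_summary(sample)
print(summary)
```

Running something like this against the real file would at least give a lower bound on what a passive observer learns, though it is no substitute for a parser that walks the object tree properly.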
Another thing that bothers me is that the streams are all compressed (/Filter/FlateDecode). How can I actually be sure that they are all encrypted? It could be, for example, that only the document root node is encrypted (remember the document is a tree structure), which prevents the document from being "opened" but leaves all information basically accessible, because the "leaf nodes", which contain the actual information, are "plain" (or rather only compressed, but not encrypted). I know the PDF specification says that, in an encrypted PDF document, all streams must be encrypted. However, given that PDF is a very complex format, how can I actually trust software to follow that specification and encrypt all streams? After all, I cannot easily distinguish an encrypted stream from a compressed stream, since both are "random looking". All I could do is figure out the compression method, try to decompress the stream and see if it yields something useful, but that's a rather impractical solution. What is the easiest method to determine (in the sense of a "distinguisher") that each stream is in fact encrypted and not just compressed? Would a "partially encrypted" PDF (some streams "plain" or compressed only, some encrypted) somehow "fail to open" or otherwise "show" that it's not "completely encrypted", or would it just behave like any encrypted PDF in a viewer?
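The "try to decompress it" check could be automated along these lines. This is only a sketch under assumptions: it covers FlateDecode streams only, it locates streams with a naive regex instead of honouring /Length, and it relies on the fact that a zlib stream has a checked header and an Adler-32 trailer, so encrypted bytes should essentially never inflate cleanly. The helper names and the sample data are made up, and the XOR is merely a stand-in for real RC4 output.

```python
import re
import zlib

# Naive stream locator; a real tool would use each object's /Length entry.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_plain_flate(data: bytes) -> bool:
    """True if data starts with a complete, valid zlib stream."""
    inflater = zlib.decompressobj()
    try:
        inflater.decompress(data)
    except zlib.error:
        return False
    # eof is only set once the end-of-stream marker was reached.
    return inflater.eof


def unencrypted_streams(pdf_bytes: bytes) -> list:
    """Indices of streams that inflate as-is, i.e. were NOT encrypted."""
    return [i for i, m in enumerate(STREAM_RE.finditer(pdf_bytes))
            if looks_like_plain_flate(m.group(1))]


# Made-up demo: one compressed-only stream, one "encrypted" stand-in.
plain = zlib.compress(b"BT /F1 12 Tf (secret text) Tj ET")
fake_encrypted = bytes(b ^ 0xA5 for b in plain)
sample = (
    b"1 0 obj<</Filter/FlateDecode>>stream\n" + plain
    + b"\nendstream endobj\n"
    + b"2 0 obj<</Filter/FlateDecode>>stream\n" + fake_encrypted
    + b"\nendstream endobj\n"
)
print(unencrypted_streams(sample))
```

An empty result on the real file would be consistent with every Flate stream being encrypted, but it would say nothing about streams that use other filters or no filter at all.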
Are there other (potential) flaws and/or weaknesses in PDF security that I should be aware of? I know that the restrictions based on the "owner password" are easily bypassed and that you should not use a weak "owner password", since either the "user password" OR the "owner password" can be used, along with information from the file, to derive the symmetric encryption key that was used for the streams in the file. Anything else I should be aware of?