The Zombie Stack Exchanges That Just Won't Die
Encrypting content could help ease fears about unauthorized access (say to copyrighted materials or sensitive information) but it results in a dependency on encryption keys. I would be curious to know what people see as the pros and cons of encrypting content a stewardship organization is preserving. With the pros and cons in mind, when (if ever) do you think a responsible organization should be encrypting files they are preserving and when do you think they shouldn't?
Trevor Owens
Pros:
Cons:
Increased vulnerability to bit rot - Bit rot in an encrypted object would result in severe loss, since the object could only be decrypted up to the bit loss or it would be completely illegible. This is mitigated by a good fixity check and backup system.
Maintaining the keys - Using keys to access contents introduces more dependencies for continued access to the object. This is mitigated by a good fixity check and backup system and recording the relationship in metadata.
Encryptions getting cracked - AES is a very strong encryption standard; however, because it's a math problem, mathematicians are constantly trying to create faster-than-brute-force solutions. Public-key encryption might be stronger against brute force attacks, but quantum computers would render it trivial. Once any encryption is cracked, all restricted files would have to be re-encrypted. (Maybe with quantum encryption.)
Personally, encryption for restricted files is not appealing. It increases the repository's exposure to catastrophic losses and the demands on the repository's internal and external monitoring processes. I would prefer to restrict access with locked-down terminals in controlled locations, strong user authentication requirements for remote access, or other solutions.
An intermediary solution might be the iTunes solution. Songs on its central server are encrypted, but the key is stored in the file. When you buy a song, this key is encrypted with a random key unique to your account.
Only in cases where information must remain protected (e.g. NSA servers) does a repository of encrypted data warrant the preservation risks.
Pros:
A reiteration of the above, which I think is generally agreed upon: strong encryption reduces worries associated with unauthorized access to preservation copies of materials (such as copyrighted data). This may in turn enable relatively insecure (read: "cheap" or "cloud") infrastructure to be used for the preservation of highly sensitive materials.
Encryption doubles as an authenticity check, and in fact, some encryption methods involve the creation of a digital signature that can be used for provenance or bit rot detection.
Cons:
Encryption causes file size bloat to the tune of 20-30%.
For light archives, encryption imparts a performance penalty for systems that need to extract the content from the preservation archive for access purposes.
Another re-iteration: long-term secure preservation of the encryption keys themselves is typically raised as a legitimate concern. Fundamentally this problem is the result of two conflicting requirements: that the encryption keys be held by as few entities as possible to maintain their security, but also that they be easily acquired in disaster scenarios. I suggest that digital preservation repositories can mitigate (yet not fully eliminate) this concern by developing a management system for the encryption keys that leverages the technology frameworks (such as that for maintaining multiple copies with integrity checks) and policy frameworks (such as robust succession plans) that they ostensibly should have in place by virtue of being qualified digital preservation repositories. One can imagine an architecture, for example, where encryption keys are safely stored within the repository itself and encrypted using a Shamir's Secret Sharing scheme that would require the consensual participation of any seven of thirteen parties named in the succession plan in order to obtain them.
Not-cons: :)
As with almost every other design aspect of digital preservation repositories, the use of encryption presents both utility and risk that should be carefully considered.
I recently have been doing some experimentation with encryption. I think at this moment that encryption is a useful and easy way to add one extra layer of protection, in the (rare) cases where this is felt necessary above the established level of security.
All security measures can lose their effectiveness over time, so all measures must be actively managed.
There is indeed a 'problem' of the management of the decryption keys. This problem should not be exaggerated. I have been looking for best practices for managing decryption keys, but up until now found no good reference for this. Any references welcome!
I can offer these points for consideration: - the decryption keys should be stored in way that is technically and logically sufficiently separated from the storage of the corresponding files, so that an intruder who achieves access to the encrypted files is not likely to have access to the decryption keys. This is a very important point and should be considered carefully and repeatedly (external security audit?). - the decryption keys should never be transported together with the related encrypted files - storage of the decryption keys does not / should not have a higher level of security as the storage of the encrypted files. The reason for this is that the organization is probably not familiar with higher levels of security, and higher levels of security introduce risks (too few people informed; unfamiliar technical solutions; no security audits because it is so little data) - don't use generic decryption keys which apply to a lot of files
Whether these suggestion really apply depends very much on the scenario you have in mind. At this moment I am thinking of a small number of selected files in our archive which are encrypted.
A completely different scenario would be an extra copy of all the files in our archive stored somewhere in the cloud, with all the files having the same decryption key. In this scenario it is not the problem of the management of decryption keys, but the management of this single one decryption key. Probably known only within a small circle of technical supporters. A loss of the key is no problem, as long as it is discovered soon. In that case the level of added data security because of the extra copy of all the files is lowered until a new copy is produced. The same goes for the account information and the passwords you need to access the extra copy. A seed based on the file name and path can help here. If you don't trust the cloud environment all encrypting and decrypting should take place in the original data environment.