Digital Preservation: Lessons Learned?

I find working in digital preservation fascinating.

It’s not where I expected to end up. I started off interested in computing and science, and happened to find out about what was then a fairly young MPhys degree in Computational Physics offered by the University of York¹. I then did a Ph.D. in Computational Physics at Edinburgh University, working in statistical physics. After that, I spent my time oscillating between being a post-graduate researcher who used large-scale computational methods, and being a computational specialist who helped other scientists make use of those kinds of techniques.

I’d decided to move away from research and get a ‘normal’ industry programmer job, so when we moved to Leeds I applied for a few different positions. One of them turned out to be for the PLANETS Project, based at the British Library. I liked the place and the people, and the work sounded interesting, allowing me to expand my previous experience (not just in computation, but also the information theory that underlies statistical physics) to a new field. And Industry was spared my woolly ways.

I spent a happy few years working on the PLANETS Project and helping kick off the follow-on SCAPE Project. I saw ‘Significant Properties’ peak, only to be fatally wounded shortly afterwards, learned that DSHR is usually right, found that it’s fun to mash bits, and that keeping the bits safe ain’t as easy as I thought (even with these folks around).

But I also grew frustrated with working on digital preservation in the abstract. The research work was fun and challenging, but the gap between that and helping these old institutions navigate the digital turn seemed too vast. This same schism seemed to cause tension within the digital preservation community, for example during iPres conferences, where I’d hear completely opposite ideas as to what the conference was really for. One group, more academic in composition, was drawn to the ‘big picture’ of very long time-scales, worst-cases, and grand solutions. Another group featured a higher concentration of individuals caring for digital collections that needed better preservation now, and needed to share research results and best practices about what they should be doing.

I felt like I’d spent too much time on the ‘grand solutions’ side of things, and so when the role of technical lead for the UK Web Archive came up, I leapt at the chance. Being responsible for the actual digital preservation of a large, complex collection of national importance would surely help focus the mind? Indeed it has.

My career trajectory might be unusual, but because the field of digital preservation is all interface, there’s a wide range of people from a lot of different backgrounds working in this nexus between the cultural and the technical. This range is powerful, but also ripe for miscommunication, and I worry we fail to learn from our own history. Sometimes it feels like the same mistakes are being made over and over again, and I can’t tell if we are failing to pool our knowledge, or if each of us needs to fall many times before we can learn to walk, and then to run(.exe)².

I feel like I’ve learned a lot in my decade of digital preservation, and I want to know if what I’ve learned might help others. I’d also like to know whether or not my own theories and opinions are worth the bytes and/or neurons they are encoded in. To this end, I’m going to try to use this blog series as a way of communicating the things I think I’ve learned over the years, in the hope that others find it useful. Or at least interestingly wrong.

Sometime between 1997 and 2004 it had become ‘Physics with Computer Simulation’, but then by 2006 it had been merged with the theory course, becoming ‘Theoretical & Computational Physics’. This didn’t last long, and later that year it had been fully subsumed into the ‘Theoretical Physics’ course. These days, all physicists are at least somewhat computational, and almost all theorists use computational methods of some sort, even if they don’t all use simulation. ↩︎
Sorrynotsorry ↩︎