Lessons Learned on Digital Preservation

Language is critical, because the IT/DP boundary is tense.

Format obsolescence is not as urgent and terrible as advertised.

Bit preservation is not solved.

We don’t need registries for truth.

We don’t need registries for what to do, we need each other. It takes a village…

Prescriptive v Descriptive Linguistics. If we only accept valid PDFs, we are saying: we know best; we understand PDF and we know what’s worth preserving. In this analogy: we know what’s best; we understand the Queen’s English, and any documents will need to be translated before we archive them HERE. Speculation: do ‘easy’/‘preservable’ formats correlate with use by different social/economic subgroups?

Every format has two faces / parsing is hard

When you press save, what happens? Lots of decisions. It’s not just the document. What about the undo buffer? What about the window position? What about the language? These can be stored at installation, user, or file level; it depends.
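One way to see what actually gets written is to look inside a saved file. A minimal sketch, assuming unzip is available and a .docx from a recent word processor (the file name is hypothetical), since a .docx is just a zip archive:

# Listing the parts shows how much beyond 'the document' gets saved:
unzip -l report.docx
# word/document.xml is the text itself; word/settings.xml carries view, zoom
# and language defaults, and docProps/app.xml records the creating application.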

Format standardisation is precisely a designated community coming together to determine which properties of a Digital (Performance) Object they want to preserve over time.

But constant work is required to generate the illusion of a stable image. Fifty times a second, the ULA reads the screen memory and sends the result out to the television screen, and this process is interlaced with the data loading as the whole thing hums along at 3.5 MHz (about a thousand times slower than a modern machine).

Apache OODT is the closest thing we have to a reference implementation of OAIS.

A lightweight pre-PREMIS/METS would be very useful. Extend BagIt with orthogonal data in line-wise files.
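A minimal sketch of the idea, assuming sha512sum is available. Here bagit.txt and manifest-sha512.txt are standard BagIt; formats.tsv is a hypothetical name for the orthogonal, line-wise tag file, not a proposed standard:

mkdir -p mybag/data
cp report.pdf mybag/data/                 # hypothetical payload file
printf 'BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n' > mybag/bagit.txt
( cd mybag && sha512sum data/report.pdf > manifest-sha512.txt )
# One fact per line (path, PUID, MIME type), so the metadata can be
# grepped, sorted, diffed and appended to without any XML tooling:
printf 'data/report.pdf\tfmt/19\tapplication/pdf\n' >> mybag/formats.tsv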

Open source and digital preservation

Poor cohesion of reading lists

More automation possibilities, e.g. UI-scripting Acrobat Reader to test rendering, i.e. simulating user interactions in the application itself.
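For instance, a rough sketch assuming an X11 desktop with xdotool and ImageMagick installed, and Acrobat Reader available as acroread (the window title and file name are assumptions):

acroread test.pdf &
sleep 5                                      # wait for the viewer window to appear
WIN=$(xdotool search --name "Adobe Reader" | head -n 1)
xdotool windowactivate "$WIN"
xdotool key --window "$WIN" Page_Down        # page through, as a user would
import -window "$WIN" page2.png              # screenshot for later comparison

If the page renders, the screenshots can then be compared across viewers or versions.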

Validation? Nope. Validation needs to be both fast and thorough.
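For PDF, the two extremes might look like this (a sketch, assuming qpdf is installed; the file name is hypothetical):

# Fast: check the magic bytes only -- milliseconds, but shallow.
head -c 5 test.pdf | grep -q '%PDF-' && echo "looks like a PDF"
# Thorough: walk the whole cross-reference table and object tree.
qpdf --check test.pdf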

How to help practitioners help?! http://anjackson.net/2016/06/08/frontiers-in-format-identification/#comment-2723081343

Re-deriving significant properties.

Note that OAIS Ingest is an odd notion.

Normalisation as prejudice.

It’s also been interesting to compare the web archiving community with the broader digital preservation community. There are many familiar faces, due to the strong overlap between the fields, but there’s also a stronger sense of a unified vision, a preference for practical results, and a more constructive collaboration between researchers and content-holding organisations. On the flip side, there is something of a silo effect, where web archive material is often poorly integrated into the bigger picture, both in the abstract (e.g. the bigger picture of digital preservation) and the concrete (e.g. we’re only just learning how to integrate web archives with our other holdings).

How fast can we checksum, compared with how fast we can read data? A quick test on one of the crawler machines:

# dd if=/dev/zero bs=1M count=1000 | openssl dgst -sha512
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.94019 s, 266 MB/s
(stdin)= a7d483bb9af2ca4b064420d1911d9116b6b609ca312fd7ed919fc1b8be7d1eb57c46f2a6f13380b6dc38f024d17442b4c7b8ecb8c121dc88227d588fc2e04297
# hdparm -tT /dev/sda1
/dev/sda1:
 Timing cached reads:   17696 MB in  2.00 seconds = 8861.16 MB/sec
 Timing buffered disk reads: 332 MB in  3.01 seconds = 110.42 MB/sec
[root@crawler06 python-shepherd]# dd if=/dev/zero bs=1M count=1000 > /dev/null
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.0997883 s, 10.5 GB/s
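So SHA-512 runs at roughly 266 MB/s on this machine: far slower than memory (about 10.5 GB/s), but faster than buffered reads from the disk itself (about 110 MB/s). At scale, fixity checking is bounded by I/O rather than by the hash, but it is certainly not free.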

MP3

Impossible Standards

cf. the DP benchmarking paper.

Is it possible?

Even if it is possible, is it feasible?

 
