Lessons Learned

Language is critical, as the IT/DP boundary is tense

Format obsolescence is not as urgent and terrible as advertised

Bit preservation is not solved

We don’t need registries for truth

We don’t need registries for what to do, we need each other. It takes a village…

But constant work is required to generate the illusion of a stable image. Fifty times a second, the ULA reads the screen memory and sends the result out to the television screen, and this process is interlaced with the data loading as the whole thing hums along at 3.5MHz (about a thousand times slower than a modern machine).

OODT is the closest thing to a reference implementation of OAIS

A lightweight pre-PREMIS/METS would be very useful. Extend BagIt with orthogonal data in line-wise files
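One way to picture "orthogonal data in line-wise files" is to reuse the layout of a BagIt payload manifest (one `<value> <path>` line per file) for other per-file facts. The following is a minimal sketch under that assumption, not an established convention: the `formats.txt` file name, the `write_linewise` helper and the extension-based type guess are all hypothetical, and the sketch does not write `bagit.txt` or validate the bag.

```python
import hashlib
import mimetypes
from pathlib import Path


def write_linewise(bag_dir, filename, value_for):
    """Write one '<value>  <relative path>' line per file under data/.

    `filename` and `value_for` are illustrative: any per-file fact can be
    recorded this way, keyed by the same paths the manifest uses.
    """
    bag_dir = Path(bag_dir)
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            rel = path.relative_to(bag_dir).as_posix()
            lines.append(f"{value_for(path)}  {rel}\n")
    out = bag_dir / filename
    out.write_text("".join(lines), encoding="utf-8")
    return out


def sha256_of(path):
    # Reads the whole file into memory; stream in chunks for large payloads.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def guessed_type(path):
    # Extension-based guess only, standing in for a real identification tool.
    guess = mimetypes.guess_type(Path(path).name)[0]
    return guess or "application/octet-stream"
```

Calling `write_linewise(bag, "manifest-sha256.txt", sha256_of)` and `write_linewise(bag, "formats.txt", guessed_type)` gives two files that share the same path column, so each kind of metadata can be added, diffed or regenerated independently of the others.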

Open source and digital preservation

Poor cohesion of reading lists

More automation possibilities, e.g. UI-script Acrobat Reader to test, i.e. simulate user interactions in

Validation, nope

Validation, fast and thorough

How to help practitioners help?! http://anjackson.net/2016/06/08/frontiers-in-format-identification/#comment-2723081343

Re-deriving significant properties

Note OAIS Ingest as an odd notion

Normalisation as prejudice

It’s also been interesting to compare the web archiving community with the broader digital preservation community. There are many familiar faces due to the strong overlap between the fields, but there’s also a stronger sense of a unified vision, a preference for practical results, and a more constructive collaboration between researchers and content-holding organisations. On the flip side, there is something of a silo effect, where web archive material is often poorly integrated into the bigger picture, both in the abstract (e.g. the bigger picture of digital preservation) and the concrete (e.g. we’re only just learning how to integrate web archives with our other holdings).

# dd if=/dev/zero bs=1M count=1000 | openssl dgst -sha512
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.94019 s, 266 MB/s
(stdin)= a7d483bb9af2ca4b064420d1911d9116b6b609ca312fd7ed919fc1b8be7d1eb57c46f2a6f13380b6dc38f024d17442b4c7b8ecb8c121dc88227d588fc2e04297
# hdparm -tT /dev/sda1
 Timing cached reads:   17696 MB in  2.00 seconds = 8861.16 MB/sec
 Timing buffered disk reads: 332 MB in  3.01 seconds = 110.42 MB/sec
[root@crawler06 python-shepherd]# dd if=/dev/zero bs=1M count=1000 > /dev/null
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.0997883 s, 10.5 GB/s
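
In the transcript above, SHA-512 over 1000 MiB of zeros ran at 266 MB/s, while buffered disk reads on the same machine managed 110.42 MB/sec. A similar measurement can be sketched in Python for comparing algorithms without going through `openssl`. This is a rough sketch only: the `hash_throughput` helper and its default sizes are hypothetical, and it hashes zeros held in memory, so it measures CPU cost rather than I/O.

```python
import hashlib
import time


def hash_throughput(algorithm="sha512", total_mib=100, block_mib=1):
    """Return the MiB/s achieved hashing `total_mib` MiB of zero bytes."""
    block = bytes(block_mib * 1024 * 1024)  # one reusable block of zeros
    blocks = total_mib // block_mib
    digest = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(blocks):
        digest.update(block)
    elapsed = time.perf_counter() - start
    return (blocks * block_mib) / elapsed


if __name__ == "__main__":
    for name in ("sha256", "sha512"):
        print(f"{name}: {hash_throughput(name):.0f} MiB/s")
```

If the figure reported here is well above the disk's sustained read rate, as it was in the transcript, then reading the data is likely to be the bottleneck for fixity checking rather than the hash itself.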


Impossible Standards

cf. DP benchmarking paper

Is it possible? i.e.

Even if it is possible, is it feasible?


Fighting entropy since 1993

