My Format Identification Misunderstandings
As the ZIP scanning issue keeps getting updated, I realised I made some errors in my analysis of how DROID works.
Posted: 2023-03-22
Following on from the previous post, I was experimenting with Siegfried and found it to be even faster than I was expecting!
UPDATED: 2023-03-22
Posted: 2023-03-21
In the last few days, I’ve been going through the process of updating my Nanite wrapper for DROID, which I built to make it easier to re-use DROID’s identification engine in other contexts – especially in large-scale Hadoop jobs where we want to process every record in our WARCs.
UPDATED: 2023-03-22
Posted: 2023-03-21
A recent comment from the #DHNB2023 conference caught my eye…
If you just follow web archiving friendly standards, you’ll have a really boring website.
Which made me laugh. But it also made me think about how we archive websites now, and I realized this isn’t necessarily true anymore.
Posted: 2023-03-10
Usually we don’t let search engines index web archives. When sites are still live, we don’t want the archived version to interfere with the search results. And the ethical implications of making dead sites available via search engines are not clear. (In fact, unless I’m missing something, that seems like an area that needs more analysis and research.)
But in the case of the European Dialogue NGO, they have not only given us permission to make the site openly accessible, but also explicitly requested that we make it accessible to search engines.
Posted: 2023-03-09
This is a summary of what’s been going on since the update at the start of the autumn.
Posted: 2023-01-16
Digital Preservation Lessons Learned 7