The Web Archive and the Catalogue
Abstract
The British Library has a long tradition of preserving the heritage of the United Kingdom, and processes for handling and cataloguing print-based media are deeply ingrained in the organisation’s structure and thinking. However, as an increasing number of government and other publications move towards online-only publication, we are forced to revisit these processes and explore what needs to be changed in order to avoid the web archive becoming a massive, isolated silo, poorly integrated with other collection material. We have started this journey by looking at how we collect official documents, like government publications and e-journals. As we are already tasked with archiving UK web publications, the question is not so much ‘how do we collect these documents?’ but rather ‘how do we find the documents we’ve already collected?’ Our current methods for combining curatorial expertise with machine-generated metadata will be discussed, leading to an outline of the lessons we have learned. Finally, we will explore how the ability to compare the library’s print catalogue data with the web archive enables us to study the steps institutions and organisations have taken as they have moved online.
Posted: 2017-06-28
Data Mining
Web Archives
Digital Preservation
webarchive-discovery
Revitalising the UK Web Archive
Originally published on the UK Web Archive blog on the 8th of June 2017.
It’s been over a year since we made our historical search system available, and it’s proven itself to be stable and useful. Since then, we’ve been largely focussed on changes to our crawl system, but we’ve also been planning how to take what we learned in the Big UK Domain Data for the Arts and Humanities project and use it to re-develop the UK Web Archive.
Our current website has not changed much since 2013, and doesn’t describe who we are and what we do now that the UK Legal Deposit regulations are in place. It only describes the sites we have crawled by permission, and does not reflect the tens of thousands of sites and URLs that we have curated and categorised under Legal Deposit, nor the billions of web pages in the full collection. To try to address these issues, we’re currently developing a new website that will open up and refresh our archives.
One of the biggest challenges is the search index. The 3.5 billion resources we’ve indexed for SHINE represent less than a third of...
Posted: 2017-06-09
Data Mining
Web Archives
BUDDAH
webarchive-discovery
More than just a copy
Following my previous post, a tweet from Raffaele Messuti led me to this quote:
“Computers, by their nature, copy. Typing this line, the computer has copied the text multiple times in a variety of memory registers. I touch a button to type a letter, this releases a voltage that is then translated into digital value, which is then copied into a memory buffer and sent to another part of the computer, copied again into RAM and sent to the graphics card where it is copied again, and so on. The entire operation of a computer is built around copying data: copying is one of the most essential characteristics of computer science. One of the ontological facts of digital storage is that there is no difference between a computer program, a video, mp3-song, or an e-book. They are all composed of voltage represented by ones and zeros. Therefore they are all subject to the same electronic fact: they exist to be copied and can only ever exist as copies.” From Radical Tactics of the Offline Library via an annotation by @atomotic.
Copying is indeed fundamental to how computers function, and we need to understand...
Posted: 2017-04-30
Digital Preservation
Keeping Codes
Lessons Learned
Access starts with 'Loading...'
So what was going on in our little experiment in data destruction? Well, to understand what happens when we open up digital files, I want to take you back to my childhood, back when ‘Loading…’ really meant something…
Posted: 2017-04-19
Digital Preservation
Keeping Codes
Lessons Learned
Unsafe Device Removal: The Results
Following my proposed experiment in data destruction, a few kind readers tried it out and let me know what happened. I’ve summarised the results below, to try and see if there’s any common pattern.
Posted: 2017-04-14
Digital Preservation
Keeping Codes
Lessons Learned
Unsafe Device Removal
Let’s start with an experiment…
Posted: 2017-04-10
Digital Preservation
Keeping Codes
Lessons Learned