Experimenting with Hadoop
Thanks to our web archiving team (who lead the UK Web Archive project), I was given a day of training on using Hadoop today. I was already fairly familiar with the map-reduce and HDFS architecture, but I’d not had a chance to actually develop a map-reduce task or run one on a real cluster with some real data. Today’s training gave me that chance, and I’m really pleased with the results…
Posted: 2010-12-14
Data Mining
Web Archives
Digital Preservation
OPF Blog: Breaking Down The Format Registry
A new OPF blog entry: Breaking Down The Format Registry. Reproduced below the fold.
Posted: 2010-12-09
Digital Preservation
OPF Blog: In the room
A new OPF blog entry: In the room. Reproduced below…
Posted: 2010-11-17
Development
Digital Preservation
OPF Blog: Community and code
As well as blogging about digital preservation here, I’ve also got a blog on the Open Planets Foundation website where I’ll post about OPF issues. I’ve just posted my first blog entry there: Community and code. Reproduced below the fold.
Posted: 2010-08-20
Development
Digital Preservation
Convergent evolution of curation services
I don’t know if it was coincidence or design, but the latest issue of the International Journal of Digital Curation contains two very similar articles. One is from the Planets Project, is co-authored by me, and is called A Framework for Distributed Preservation Workflows. The other is called An Emergent Micro-Services Approach to Digital Curation Infrastructure, and is by Stephen Abrams et al. from CDL. These two papers describe two approaches to building preservation systems which, despite having been developed independently, appear to be converging towards a single, consistent design.
Posted: 2010-07-27
Development
Digital Preservation
Digital preservation news aggregation
Inspired by the excellent @dhnow social news service, I’ve set up a news aggregator for digital preservation. The @digipresnews account follows people involved in digital preservation, and I use the Twitter Tim.es service to extract the top links of interest. This also provides an RSS feed, which is piped back into the Twitter account via twitterfeed.
Of course, this type of aggregation is only as good as its sources, so if I’ve missed a digital preservation tweeter, or included someone you think I shouldn’t have done, contact me @here or here.
Anj
Posted: 2010-07-22
Digital Preservation