The Zombie Stack Exchanges That Just Won't Die
Which tools are you using for identification of epub or mobi formats?
For my private library I save every book I've bought (either epub or mobi) after removing the DRM. I would also like to keep detailed metadata about these files.
Fido, the tool I prefer, has some issues. Both jhove and fits recognize these files only as bitstreams. The only tool that seems to work is epubcheck, but it handles epub files only.
raffaele messuti
During last year's file format ID hack, the British Library team came up with some Apache Tika signatures for some eBook formats (you can find them in this magic file). Although they were written for Tika, these should be easy to port to Fido.
I tend to use the command line. All modern computers except for those running Windows have the file tool, which can be used like this:
$ file *.epub *.mobi
Natural Language Processing with Python - Steven Bird.epub: EPUB ebook data
pg8086.mobi: Mobipocket E-book "Down_and_Out-_Magic_Kingdom"
Here, I used file to identify the format of all files that have the .epub or .mobi file extension, but I could have used the asterisk alone to identify all non-hidden files in the current directory. So in this little experiment, file successfully identified the two e-book formats, and for the mobi(pocket) format, it was able to extract the title (or a short form of it).
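Where the file tool is not available (on Windows, for example), the same kind of signature check can be approximated in a few lines of Python. This is only a minimal sketch, not how file, Tika, or Fido actually implement it. The function name identify_ebook is made up for illustration, and the sketch relies on two assumptions: a Mobipocket file is a Palm Database whose type and creator fields at bytes 60 to 67 read "BOOKMOBI", and an EPUB is a ZIP container with an entry named "mimetype" containing "application/epub+zip".

```python
import sys
import zipfile
from pathlib import Path

# Content of the "mimetype" entry that an EPUB container is expected to carry.
EPUB_MIMETYPE = b"application/epub+zip"


def identify_ebook(path):
    """Return 'epub', 'mobi', or None, based on file content rather than extension.

    Hypothetical helper for illustration; it checks only two signatures.
    """
    path = Path(path)
    with path.open("rb") as fh:
        header = fh.read(68)

    # Palm Database header: type ("BOOK") and creator ("MOBI") at bytes 60-67.
    if header[60:68] == b"BOOKMOBI":
        return "mobi"

    # EPUB: a ZIP archive whose "mimetype" entry holds the EPUB media type.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            try:
                if zf.read("mimetype").strip() == EPUB_MIMETYPE:
                    return "epub"
            except KeyError:
                pass  # A ZIP file, but without a "mimetype" entry.

    return None


if __name__ == "__main__":
    # Print one "name: format" line per file given on the command line.
    for name in sys.argv[1:]:
        print(f"{name}: {identify_ebook(name) or 'unknown'}")
```

Saved under a name of your choice, the script takes file names as arguments, much like file does. Unlike file, it does not extract the title and it knows nothing about other formats, so treat it as a fallback rather than a replacement.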