The Zombie Stack Exchanges That Just Won't Die
Which tools are you using for identification of epub or mobi formats?
For my private library I save every book I've bought (either epub or mobi) after removing the DRM. I would also like to keep detailed metadata about these files.
Fido, the tool I prefer, has some issues. Both jhove and fits recognize these files only as bitstreams. The only tool that seems to work is epubcheck, and it handles only epub files.
raffaele messuti
During last year's file format ID hack, the British Library team came up with some Apache Tika signatures for some eBook formats (you can find them in this magic file). Although they were written for Tika, they should be easy to port to Fido.
I tend to use the command line. Most Unix-like systems (Linux, macOS,
the BSDs) ship with the file tool; Windows does not include it by
default. It can be used like this:
$ file *.epub *.mobi
Natural Language Processing with Python - Steven Bird.epub: EPUB ebook data
pg8086.mobi: Mobipocket E-book "Down_and_Out-_Magic_Kingdom"
Here, I used file to identify the format of all files that have the
.epub or .mobi file extension, but I could have used the asterisk
alone to identify all non-hidden files in the current directory. So in
this little experiment, file successfully identified the two e-book
formats, and for the mobi(pocket) format, it was able to extract the
title (or a short form of it).
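
If file is not available (on Windows, for instance), the same kind of check can be approximated in a few lines of Python. This is only a minimal sketch, not the signatures used by file, Tika or Fido: it assumes that a Mobipocket file is a Palm Database whose type and creator fields (bytes 60 to 67) read "BOOKMOBI", and that an EPUB is a ZIP archive containing a "mimetype" entry with the value application/epub+zip. The function name identify_ebook is made up for this example.

```python
import sys
import zipfile
from pathlib import Path

EPUB_MIMETYPE = b"application/epub+zip"


def identify_ebook(path):
    """Return 'epub', 'mobi' or 'unknown' from file contents, ignoring the extension.

    identify_ebook is a hypothetical helper for illustration only.
    """
    path = Path(path)
    with path.open("rb") as fh:
        header = fh.read(68)

    # Assumption: Mobipocket files are Palm Database files whose
    # type/creator fields at offset 60 spell "BOOKMOBI".
    if header[60:68] == b"BOOKMOBI":
        return "mobi"

    # Assumption: an EPUB is a ZIP container with a "mimetype" entry
    # holding the string application/epub+zip.
    if header[:4] == b"PK\x03\x04" and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            try:
                if zf.read("mimetype").strip() == EPUB_MIMETYPE:
                    return "epub"
            except KeyError:
                pass  # a plain ZIP file without a mimetype entry

    return "unknown"


if __name__ == "__main__":
    # Usage: python identify.py book1.epub book2.mobi ...
    for name in sys.argv[1:]:
        print(f"{name}: {identify_ebook(name)}")
```

Unlike file, this sketch does not extract the title, and it does not validate the book the way epubcheck does; it only answers the "which format is this?" question.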