Format Identification


What’s in the box?

Understanding your digital collections.

I think this is really an experiment?


Ideas

- ML and confidence.
- My prior work on identification based on punctuation plus co-location? Plus whole words or tokens.
- Note that char-type-coloc could be shared publicly, as no actual data remains.
- UI.
- A standard for format data, like a BagIt thing.
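To make the punctuation-plus-co-location idea a little more concrete, here is a minimal sketch in Python. It is one possible reading of the idea, not the prior work mentioned above: the function names (`char_class`, `reduce_text`, `colocation_profile`, `cosine_similarity`) and the choice of placeholder symbols are all assumptions made for illustration. Letters, digits and whitespace are collapsed into placeholders, punctuation is kept, and the profile is just a count of which symbols sit next to each other. The counts contain no words or numbers from the source, only punctuation and placeholders, which is the property that might make such profiles shareable.

```python
from collections import Counter
from math import sqrt

# Placeholder symbols (an assumption of this sketch): runs of letters,
# digits and whitespace are each collapsed to a single marker.
LETTER, DIGIT, SPACE = "A", "9", " "
PLACEHOLDERS = {LETTER, DIGIT, SPACE}


def char_class(ch: str) -> str:
    """Map a character to a placeholder, or keep it if it is punctuation/symbol."""
    if ch.isalpha():
        return LETTER
    if ch.isdigit():
        return DIGIT
    if ch.isspace():
        return SPACE
    return ch


def reduce_text(text: str) -> list[str]:
    """Replace content with placeholders, collapsing repeated placeholders."""
    out: list[str] = []
    for ch in text:
        c = char_class(ch)
        if c in PLACEHOLDERS and out and out[-1] == c:
            continue  # still inside the same run of letters/digits/whitespace
        out.append(c)
    return out


def colocation_profile(text: str) -> Counter:
    """Count adjacent pairs of symbols in the reduced text."""
    reduced = reduce_text(text)
    return Counter(zip(reduced, reduced[1:]))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two pair-count profiles (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


json_a = colocation_profile('{"name": "Ada", "age": 36}')
json_b = colocation_profile('{"city": "Oslo", "pop": 700000}')
csv_a = colocation_profile("name,age\nAda,36\n")

print("".join(reduce_text('{"name": "Ada", "age": 36}')))
print(cosine_similarity(json_a, json_b))
print(cosine_similarity(json_a, csv_a))
```

In this toy case the two JSON snippets reduce to the same skeleton, so their profiles match, while the CSV snippet shares no adjacent pairs with them. A similarity score like this could also feed the "confidence" idea, though real files would need far more careful handling (encodings, binary formats, sampling) than this sketch attempts.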