The Zombie Stack Exchanges That Just Won't Die
I'm running some experiments with BagIt and noticing a surprising number of hidden files being captured in my bags, such as .DS_Store, Thumbs.db, and *.shs files. Since these aren't needed to understand the content of the bag, I'd rather not keep them around. Worse, because some of them are generated automatically by the operating system, my bags can become invalid when a Thumbs.db or .DS_Store sneaks in after the bag has been created.
Is there a list of system files like Thumbs.db that I can use to recursively find and delete such files before bagging?
How do I ensure that my bags aren't invalidated as they sit on a portable hard drive? (My planned Linux-based NAS is still a few months out.)
Nick Krabbenhoeft
It's definitely not a complete solution, but one approach is to look at the example .gitignore files here: https://github.com/github/gitignore/tree/master/Global
You can skim through the ones relevant to your setup and compile your own list of common system files and extensions for each operating system. That should be a good starting point for filtering out the unwanted files.
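A minimal Python sketch of that filtering step, assuming you have already compiled a pattern list. The `JUNK_PATTERNS` entries below are only an illustrative starting point (the kind of names the macOS and Windows templates list, plus the `*.shs` files from the question), and `find_junk` and `remove_junk` are hypothetical helper names. Matching is case-insensitive so that `thumbs.db` and `Thumbs.db` are both caught, and nothing is deleted unless you pass `dry_run=False`.

```python
import fnmatch
import os
from pathlib import Path

# Illustrative starter list (an assumption, not an authoritative list).
# Extend or trim it from the .gitignore templates for your own systems.
JUNK_PATTERNS = [
    ".DS_Store",    # macOS Finder metadata
    "._*",          # macOS AppleDouble files on non-HFS/APFS volumes
    "Thumbs.db",    # Windows thumbnail cache
    "ehthumbs.db",  # Windows Media Center thumbnail cache
    "desktop.ini",  # Windows folder settings
    "*.shs",        # Windows shell scrap files, as mentioned in the question
    ".directory",   # KDE folder settings
]


def find_junk(root, patterns=JUNK_PATTERNS):
    """Yield every file under root whose name matches a pattern (case-insensitive)."""
    lowered = [p.lower() for p in patterns]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatchcase(name.lower(), pat) for pat in lowered):
                yield Path(dirpath) / name


def remove_junk(root, dry_run=True):
    """Return the matched files; delete them only when dry_run is False."""
    matched = []
    for path in find_junk(root):
        if not dry_run:
            path.unlink()
        matched.append(path)
    return matched
```

Call `remove_junk("/path/to/source")` first to review what would be deleted, then run it again with `dry_run=False`. Only files are matched; folders such as `.Trashes` or `$RECYCLE.BIN` would need separate handling.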
Once you have stripped these files out and made a clean bag, I recommend making the whole bag read-only. This will prevent the OS from dropping these hidden files back into it.
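One way to do that from Python, as a sketch assuming a POSIX system (macOS or Linux): strip the write bits from every file and directory in the bag. For ordinary (non-root) users, a directory without write permission also blocks new entries, such as a fresh .DS_Store, from being created inside it. The helper names are hypothetical. Two caveats: on Windows, `os.chmod` can only toggle a file's read-only flag, and a portable drive formatted as exFAT or FAT32 may not store POSIX permissions at all, so check that the drive's file system actually keeps them.

```python
import os
import stat
from pathlib import Path

# Write-permission bits for owner, group, and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def _paths_in(dirpath, filenames):
    """The files in dirpath followed by dirpath itself, skipping symlinks."""
    base = Path(dirpath)
    candidates = [base / name for name in filenames] + [base]
    return [p for p in candidates if not p.is_symlink()]


def make_read_only(bag_dir):
    """Hypothetical helper: remove write permission throughout bag_dir.

    Walks bottom-up, so each directory is locked after its contents.
    """
    for dirpath, _dirnames, filenames in os.walk(bag_dir, topdown=False):
        for p in _paths_in(dirpath, filenames):
            p.chmod(stat.S_IMODE(p.stat().st_mode) & ~WRITE_BITS)


def make_writable(bag_dir):
    """Hypothetical helper: restore owner write permission, e.g. to update the bag."""
    for dirpath, _dirnames, filenames in os.walk(bag_dir):
        for p in _paths_in(dirpath, filenames):
            p.chmod(stat.S_IMODE(p.stat().st_mode) | stat.S_IWUSR)
```

`make_read_only("/path/to/bag")` locks the bag after it has been created and validated; `make_writable` undoes it for the owner when the bag genuinely needs to change.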