Story of a Bad Deed

A tiny digital mystery
Andy Jackson

I love a digital preservation mystery, and this one started with a question from @joe on digipres.club:

A mystery file, starting with 0x0baddeed, eh? Fascinating. Those hex digits didn’t happen by accident. Using a four-byte hex pattern to signal a format is an extremely common design pattern, but no authority hands these patterns out: each format designer mints their own. There must be a story here…

The first step is to find other examples to work with. For exactly this reason, I deliberately built a special feature into our search indexes: the ability to search for files based on their first four bytes. Gratifyingly, someone else beat me to it:

Poking around in the underlying data it was clear that the 179 files that matched this query all appeared to be PowerPoint files based on the file extension, but neither DROID nor Apache Tika could say any more.
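The first-four-bytes lookup described above can be approximated on a local folder of files. This is only a minimal sketch of the idea, not how the search index itself is built; the function names `magic_key`, `first_four_bytes_hex` and `group_by_magic` are made up for illustration.

```python
from pathlib import Path


def magic_key(data):
    """Return the first four bytes of a byte string as lowercase hex."""
    return data[:4].hex()


def first_four_bytes_hex(path):
    """Read only the first four bytes of a file and return them as hex."""
    with open(path, "rb") as handle:
        return magic_key(handle.read(4))


def group_by_magic(directory):
    """Map each four-byte prefix to the names of the files that start with it."""
    groups = {}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            groups.setdefault(first_four_bytes_hex(path), []).append(path.name)
    return groups
```

Calling `group_by_magic` on a folder of downloaded samples and looking up the key `"0baddeed"` would list the candidate files, much like the index query does at scale.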

A search of PRONOM showed two separate records for Microsoft PowerPoint for Macintosh 4.0 and Microsoft Powerpoint Presentation 4.0, but no earlier versions. In this case, Wikipedia fared better, linking to this nice overview of the format compatibility between PowerPoint versions.

While the File Format Wiki did not have much detail for the early versions of PowerPoint, it did link to a source of sample files [1]. This proved to be very fortunate indeed…

I downloaded some of the old sample files from there, and compared them against the 0x0baddeed files. Here’s the start of one of the sample files:

$ hexdump -C nii.ppt | head
00000000  ed de ad 0b 03 00 00 00  45 17 00 00 3f 01 31 17  |........E...?.1.|
00000010  6f 20 0f 00 50 00 3e 01  28 17 00 00 28 00 00 00  |o ..P.>.(...(...|
00000020  79 00 00 00 5b 00 00 00  01 00 04 00 00 00 00 00  |y...[...........|
00000030  c0 16 00 00 00 00 00 00  00 00 00 00 10 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 80 00 00 80 00 00  |................|
00000050  00 80 80 00 80 00 00 00  80 00 80 00 80 80 00 00  |................|
00000060  c0 c0 c0 00 80 80 80 00  00 00 ff 00 00 ff 00 00  |................|
00000070  00 ff ff 00 ff 00 00 00  ff 00 ff 00 ff ff 00 00  |................|
00000080  ff ff ff 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

and here’s the start of one of the 0x0baddeed files…

$ hexdump -C BidStrat.ppt | head
00000000  0b ad de ed 00 00 00 03  00 00 00 1e 00 7b 00 0a  |.............{..|
00000010  00 00 be cd 00 50 00 7a  00 00 00 00 00 00 80 00  |.....P.z........|
00000020  00 18 00 00 03 f6 80 00  00 00 00 00 04 0e 80 00  |................|
00000030  03 c0 00 00 04 0e 80 00  01 a4 00 00 07 ce 80 00  |................|
00000040  0d 16 00 00 09 72 80 00  00 26 00 00 16 88 80 00  |.....r...&......|
00000050  00 00 00 00 16 ae 80 00  00 40 00 00 16 ae 80 00  |.........@......|
00000060  00 00 00 00 16 ee 80 00  00 60 00 00 16 ee 80 00  |.........`......|
00000070  00 26 00 00 17 4e 80 00  00 20 00 00 17 74 80 00  |.&...N... ...t..|
00000080  00 18 00 00 17 94 80 00  00 00 00 00 17 ac 80 00  |................|
00000090  03 40 00 00 17 ac 80 00  01 54 00 00 1a ec 80 00  |.@.......T......|

Do you see it? Look closer…

$ hexdump -C BidStrat.ppt | head -1
00000000  0b ad de ed 00 00 00 03  00 00 00 1e 00 7b 00 0a  |.............{..|
$ hexdump -C nii.ppt | head -1
00000000  ed de ad 0b 03 00 00 00  45 17 00 00 3f 01 31 17  |........E...?.1.|

Both the first and second four bytes match, but are reversed! Welcome to the confusing world of endianness (see also Apple’s docs on byte ordering) [2].

Most computers use a byte ordering called ‘little-endian’, but older Macs used an alternative ordering called ‘big-endian’. These are just two different conventions for storing multi-byte values, and I can’t look at 0x0baddeed and know which ordering it is. However, the discovery of ppt files starting with either 0x0baddeed or 0xeddead0b is consistent with the same type of data being stored in different byte orders.
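To make the byte-order point concrete, here is a small sketch that reads the first eight bytes of each header as two 32-bit unsigned integers, trying both orderings. The byte values are copied from the hexdumps above; the function name `read_header` is hypothetical, and what the second value actually means in the format is not something I know.

```python
import struct

# First eight bytes of each file, copied from the hexdumps above.
BIDSTRAT_HEADER = bytes.fromhex("0baddeed00000003")  # BidStrat.ppt
NII_HEADER = bytes.fromhex("eddead0b03000000")       # nii.ppt

MAGIC = 0x0BADDEED


def read_header(data):
    """Return (byte_order, magic, second_word), or None if neither order fits.

    The meaning of second_word is unknown here; it is only reported
    so the two files can be compared.
    """
    if len(data) < 8:
        return None
    for label, fmt in (("big-endian", ">II"), ("little-endian", "<II")):
        magic, second_word = struct.unpack(fmt, data[:8])
        if magic == MAGIC:
            return label, magic, second_word
    return None


print(read_header(BIDSTRAT_HEADER))  # ('big-endian', 195944173, 3)
print(read_header(NII_HEADER))       # ('little-endian', 195944173, 3)
```

Both headers decode to the same pair of numbers (195944173 is 0x0baddeed in decimal) once the right byte order is chosen, which is what you would expect if the same structure had been written out on two different platforms.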

Indeed, searching for the reversed pattern finds 430 files, and better still the online version of TRiD determines these to be early PowerPoint files.

TRiD Result

In fact, it looks like TRiD is also able to distinguish between versions 2.0 and 3.0, but only for the more common byte-ordering.

Sadly, there is no trivial ‘fix’ for this. You can’t just go through the whole file and flip the bytes, because only some chunks are stored like that. If you use the strings command to extract the text, it’s in the expected order, not half-flipped, because it’s stored as a byte stream rather than as 32-bit ‘words’. The only way to open these files will be to use PowerPoint 2.0 or 3.0 in an emulator, and although either should be able to open both, the two byte orderings are effectively distinct formats. I’m not able to test this, but maybe someone else can?
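A quick sketch shows why a blanket byte-flip does not work. The header bytes are taken from the hexdump of BidStrat.ppt above, but the text string is invented for illustration and the helper `swap_32bit_words` is hypothetical.

```python
def swap_32bit_words(data):
    """Reverse the bytes inside every 4-byte group of a byte string."""
    if len(data) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


header = bytes.fromhex("0baddeed00000003")  # from the BidStrat.ppt hexdump
text = b"Bid Strategy"                      # made-up example text, 12 bytes

# The numeric header fields convert cleanly to the other byte order...
print(swap_32bit_words(header).hex())  # eddead0b03000000

# ...but text stored as a plain byte stream is scrambled by the same operation.
print(swap_32bit_words(text))          # b' diBartSyget'
```

A real converter would need to know which regions of the file hold multi-byte numbers and which hold byte streams, and that means understanding the format itself.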

But why 0x0baddeed? Well, I started to speculate that this was a statement from a disgruntled developer. PowerPoint 2.0 was the first version of PowerPoint that ran on both Macs and PCs, and a joke that only reveals itself on one platform would be just the kind of thing I’d expect from a community that relishes Rick-rolls. But after I thought I had come up with this idea, I realised it feels more like a memory. Can anyone else remember a story like this? Please let me know!



  1. Thanks to Nick Krabbenhöft for pointing out that I’d mis-remembered where I’d got these samples from. This updated blog post should now be accurate!

  2. I remain frustrated that I still find endianness confusing. In the past, I managed to cobble together a port of a little-endian platform emulation that ran on my big-endian Mac, and despite the fact I got it working I never felt like I really understood it properly!