Unsafe Device Removal

Introduction

Let’s start with a little experiment called “Unsafe Device Removal”[10]…

Materials

For this experiment, you will need:

  1. A USB flash drive of little importance. One of those old sub-GB ones you got from that conference will do.

  2. A copy of a digital file of great importance. Any format will do, as long as it’s one you can open.

I’m going to use this drive:

Test Drive

…and this JPEG:

My father and my son, alike.

Method

  1. Copy the test file to the USB flash drive. Do not use your only copy of the precious file!

  2. Open up the test file from the USB drive, as you usually would (i.e. using the usual app for that format).

  3. Pull out the USB flash drive. Do Not Eject It Properly! Just yank it right out![1]

  4. Observe what happens.

My Results

In my experiment, the first thing that happened was…

Disk Not Ejected Properly

…but besides this admonishment, the image was still there…

But Still There

The bitstream was gone (optionally blended into oblivion – the Digital Object destroyed). But the image was still on the screen. I bet yours is still there too.

But right now, it’s at risk. All it takes is loss of power to this machine, and the file will blink out of existence.[2]

Can you press ‘Save as…’, and get a new bitstream back? It depends on the software.

When I tried this with Apple Preview, I couldn’t save the image, even though I could see it.

Apple Preview Says No

The only way to save it seemed to be as a desktop screenshot, which I would then need to crop to get back an acceptable image.

But re-running the same experiment with image editing software (specifically the GIMP), I could press ‘Save as…’ and a new bitstream was written. Not exactly the same as the original, but good enough.[3]
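If you want to check whether your own rescued file is the same bitstream as the original, comparing checksums is one way to do it. This is only a minimal sketch: the function names `sha256_of` and `same_bitstream` are mine, and it assumes you still have a safe copy of the original to compare against.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of the bytes stored at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bitstream(original, recovered):
    """True only if both files contain exactly the same bytes."""
    return sha256_of(original) == sha256_of(recovered)
```

A `False` result only tells you the bytes differ. It says nothing about whether the two images look the same, which is a separate question.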

Over to you

I’d be fascinated to know what happens on other platforms and with other software, so please get in touch if you’ve tried this. I’d also be curious to know how the choice of format affects the outcome. If anyone has any results to share, I’ll collect them together in a follow-up post.

Your Results

Following my proposed experiment in data destruction, a few kind readers tried it out and let me know what happened[4]. I’ve summarised the results below, to try and see if there’s any common pattern.

| Software | Format | Was recovery possible? |
| --- | --- | --- |
| Apple Preview | JPEG | No (rendered image still shown and could be captured via screenshot)[5] |
| GIMP | JPEG | Yes (with minor alterations to the data, likely within allowed limits for JPEG)[5] |
| ImageMagick display | JPEG | Yes (result not binary-identical)[6] |
| Ubuntu Image Viewer | JPEG | No[7] |
| Ubuntu Document Viewer | PDF | Yes[7] |
| PDF reader | PDF | A PDF opened from a browser stays visible in the PDF reader after the browser closes, but can’t be saved[8] |
| Word (Windows 95) | DOC (on a floppy!) | No (but re-inserting the floppy worked!)[9] |

As far as I can tell from this data, there isn’t much of a pattern here. Broadly, the observed behaviour seems to depend on the software rather than the format, and ‘viewer’ style applications appear less likely to allow re-saving than ‘editor’ apps (but the behaviour of the Ubuntu Document Viewer shows this is not a robust finding). All we can be sure of at this point is this: “It’s complicated”.

To find out what’s going on, we’ll need to look more closely at what happens when we open a file…

Conclusion

So what was going on in our little experiment in data destruction? Well, to understand what happens when we open up digital files, I want to take you back to my childhood, back when ‘Loading…’ really meant something…

I’d like you to watch the following video. Please enjoy the sweet ‘music’ of the bytes of the bitstream as they stream off the tape and into the memory of the machine.

And no skipping to the end! Sit through the whole damn thing, just like I had to, all those years ago!

I particularly like the bit from about 0:24s in, as the loading screen loads…

JETPAC: loading the loading screen

First, we can see a monochrome image being loaded, section-by-section, with individual pixels flowing in row-after-row. The ones and zeros you can see are the same ones you can hear, but they are being copied from the tape, unpacked by the CPU, and stored in a special part of the machine’s memory, called the screen memory.

This screen memory is special because another bit of hardware (called the ULA) can see what’s there, and uses it to compose the signal that gets sent to the television screen. As well as forming the binary pixels, it uses the last chunk of the screen memory to define what colours should be used, and combines these two sets of information to make the final image. You can see this as the final part of the screen-loading process happens, and the monochrome image suddenly fills with colour. You can even hear the difference between the pixel data and the colour information.
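To make the two parts of the screen memory a little more concrete, here is a rough sketch of how they are laid out. I am assuming the machine in the video is a ZX Spectrum (the ULA is its display chip), where 6144 bytes of pixel data are followed by 768 bytes of colour attributes. The function names are my own, and the offsets are relative to the start of the screen memory rather than absolute addresses.

```python
BITMAP_BYTES = 6144     # 256 x 192 pixels, one bit per pixel
ATTRIBUTE_BYTES = 768   # one colour byte per 8 x 8 pixel cell (32 x 24 cells)


def pixel_byte_offset(x, y):
    """Offset of the bitmap byte that holds pixel (x, y)."""
    third = (y & 0xC0) << 5         # which third of the screen: 0, 2048 or 4096
    line_in_cell = (y & 0x07) << 8  # pixel row inside its 8-row character cell
    cell_row = (y & 0x38) << 2      # character row inside the third
    return third | line_in_cell | cell_row | (x >> 3)


def attribute_offset(x, y):
    """Offset of the colour byte that covers pixel (x, y)."""
    return BITMAP_BYTES + (y >> 3) * 32 + (x >> 3)


def decode_attribute(value):
    """Split one attribute byte into its colour fields."""
    return {
        "flash": bool(value & 0x80),
        "bright": bool(value & 0x40),
        "paper": (value >> 3) & 0x07,  # background colour index, 0 to 7
        "ink": value & 0x07,           # foreground colour index, 0 to 7
    }
```

Because the tape fills this memory in address order, the pixel rows do not arrive top to bottom: the row stored straight after row 0 is row 8, not row 1. The colour bytes sit at the very end, which is why the colour only appears once all the pixels are in place.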

After that, the tape moves on and we have to wait even longer while the actual game loads.[31]

The point I want to emphasize is that this is just a slow-motion version of what still happens today. The notion of ‘screen memory’ has become more complex and layered, and it all happens much faster, but you’re still interacting with the computer’s memory, not the persistent bitstream.

Because working with memory is faster and simpler than working directly with storage devices, the kind of software that creates and edits files is much easier to write if you can load the whole file into memory to work on it there. The GIMP works like this, and that’s why I was able to re-save my test image out of it.
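As a toy illustration of that load-everything approach, here is a minimal sketch. `InMemoryEditor` is a hypothetical class of my own, not how the GIMP is actually written, and deleting the file stands in for yanking out the drive.

```python
import os
import tempfile


class InMemoryEditor:
    """Hypothetical editor that reads the whole file into memory on open."""

    def __init__(self, path):
        with open(path, "rb") as handle:
            self.data = handle.read()  # full copy; the file is not needed again

    def save_as(self, new_path):
        # Writes from memory, so it does not matter where the original went.
        with open(new_path, "wb") as handle:
            handle.write(self.data)


def demo():
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "precious.bin")
    with open(original, "wb") as handle:
        handle.write(b"precious bytes")

    editor = InMemoryEditor(original)
    os.remove(original)  # stands in for yanking out the drive

    rescued = os.path.join(workdir, "rescued.bin")
    editor.save_as(rescued)
    with open(rescued, "rb") as handle:
        return handle.read()
```

This toy keeps the raw bytes, so what it writes back is identical to what it read. A real image editor holds decoded pixels instead and has to re-encode them on save, which would explain why my rescued JPEG was not exactly the same as the original.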

However, Apple Preview works differently. Based on my results, it seems likely that Preview retains a reference to the original file, which it uses to generate an intermediate in-memory image for display purposes (e.g. a scaled-down version). The cached intermediate image can still be shown, even if future operations may fail because the software can no longer find the original file.
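Here is a sketch of the kind of design that would produce this behaviour. To be clear, this is my guess expressed as code, not Preview’s actual implementation: `LazyViewer` is hypothetical, and keeping the first few bytes is just a stand-in for rendering a scaled-down image.

```python
import os
import tempfile


class LazyViewer:
    """Hypothetical viewer: keeps a path plus a small cached preview."""

    def __init__(self, path, preview_bytes=16):
        self.path = path
        with open(path, "rb") as handle:
            # Stand-in for a scaled-down rendering: keep only a short prefix.
            self.preview = handle.read(preview_bytes)

    def show(self):
        # Served from the cache, so it keeps working without the file.
        return self.preview

    def save_as(self, new_path):
        # Goes back to the source, so it fails once the source has gone.
        with open(self.path, "rb") as source, open(new_path, "wb") as target:
            target.write(source.read())


def demo():
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "photo.bin")
    with open(original, "wb") as handle:
        handle.write(b"0123456789" * 10)

    viewer = LazyViewer(original)
    os.remove(original)  # stands in for yanking out the drive

    try:
        viewer.save_as(os.path.join(workdir, "copy.bin"))
        saved = True
    except FileNotFoundError:
        saved = False
    return viewer.show(), saved
```

In this sketch the viewer can still show its cached preview after the file disappears, but saving raises `FileNotFoundError`, because the full data was never held in memory.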

These results only make sense because the thing you are interacting with via the computer screen is not the original bitstream, but a version of that data that has been loaded into the computer’s memory. The relationship between these two representations depends on the software involved and can be quite complicated, and the two forms can be quite different.[32] My suspicion is that we need a better understanding of this relationship in order to work out what it is we are actually trying to preserve.