Friday, June 19, 2009

Data, Metadata, and Statistics

Digital objects exist differently from objects in the physical world. They're created, and then the information that describes them is added to them so they can be found.

This is "metadata" - kind of like the stuff that gets stuck to your shoe that you simply can't rub off.

Every digital object collects this as it moves, gets copied, is altered - even deleted. No fingerprints remain invisible. (Yes, even an object that isn't there still declares itself, if only by virtue of the fact that it is no longer present.) Lots of times metadata is intentionally added to an object. Titles, dates, to-do lists ("Delete after end of quarter," "Save for blog," "unused takes").

But just because this digital object has collected all this extra descriptive information doesn't mean it's the better for it. The object becomes larger as it travels, and it costs time and energy to preserve all this stuff on the object, not just the object itself.

And just because there's all this new information on it doesn't mean it's good info. Much of it may be wrong. Or incomplete. Or mean different things to different people, programs, or systems.

The signal-to-noise ratio begins to change. And just because it's all info about the object itself also doesn't mean that it's metadata, either. Maybe the info is part of the object's creation, but doesn't actually describe it. It might not be about the object, just riding along, attached accidentally or through someone's ulterior or altruistic motives.

Just because an object has collected information about itself doesn't mean it should all be preserved with the object. But figuring out what belongs, what might be needed in the future, and what's merely a parasitic piece of code costs resources.

Not all metadata is created equal. It has a lifecycle, and some becomes obsolete at a certain point in the various iterations of the object, as it moves from VHS to laserdisc to DVD to Blu-ray, for example. Just because you got it, just because it's right, doesn't mean anyone's gonna give a damn.

Metadata lives and dies, and people get paid a lot to create, preserve, and migrate it. But it's invisible and of unknown value. So we spend more time worrying about it than about what it describes. In this world of digital access, we shouldn't lose sight of the underlying artistic creation that makes it necessary in the first place. A page of poetry or cut of music, a clip of film that people fell in love with 100 years ago. And maybe 100 years from now.

In the future there will be no record players. You'll want to be able to find Miles Davis, won't you?

