The TV News Archive: A Crystal Ball for the Media

Earlier this year, the Internet Archive launched a timely and important service: TV News Search & Borrow. I’ve been meaning to write about this for some time.

If you are a scholar, media critic, educator, journalist, or activist, or if you have any interest at all in the media and how it works, this is a very big deal.

On paper, TV News Search & Borrow is a searchable archive of 350,000 TV news broadcasts. At this time it spans 20 broadcast stations. It’s all the news that’s been broadcast from 2009 until 24 hours ago.

But the TV Archive is more than that—it’s all the engineering and operations muscle of the world’s largest library on show. It’s a bold statement about how the news media should work. It’s a crystal ball for understanding the media.

We’ve had searchable indexes of newspapers, the law (LexisNexis), books (Open Library, Google Books), and of course the web itself (Google, Wayback Machine), but never of the most influential communications medium: television. YouTube is a useful trove of TV content, but it’s not designed as an archive, and as a commercial entity it’s subject to greater copyright pressures. (How often have you tried to watch a video that was removed by request of the copyright owner?)

Some use cases

The TV Archive began its public life as the Remembering 9/11 project, which presents all the news from the week of September 10th, 2001 on every major international channel.

Take time to peer into that looking glass, and try to interpret the three or four momentous days that set the course of history. How amazing is it, with ten years’ hindsight, to pull out memes that began emerging in those formative first days after the attack? To follow the threads of public opinion to where they originate? To really understand the zeitgeist? That’s something that we need more of.

The hundreds of billions of hours we spend watching television deeply shape our perception of our politics, of our social norms, of reality as we know it. The TV Archive provides a set of tools to research these phenomena. It’s a place to study systemic editorial biases, commercial pressures, silos and slants, and to interpret the news as a shaper of reality, not merely a reflection of it.

I’ve always loved things like Mosaic, the LinkTV show that synthesizes what various media outlets are saying in the Middle East on any given day; and Media Matters, the liberal watchdog blog that tries to catch conservative politicians contradicting themselves on video.

In its current form, the TV Archive gives these kinds of analytical tools to anyone with a web browser. Just pull up the same 30 minutes of news coverage as broadcast on Fox and MSNBC and compare. This is an easy way to take a sample slice from the media spectrum and explore how different networks are covering the same issue.

Web apps, mashups, and automation

The most awesome thing about the TV Archive is that it’s part of the web. Vanderbilt University and others have been archiving television for some time, but the Internet Archive is making television hyperlinkable—and by extension, a potential building block for future web applications.

Think about use cases that emerge when you’re able to combine television with data from open APIs across the web.

Let’s say we want users to better understand the kind of pundits that C-SPAN or Fox News are inviting on-air. We could parse captions from these networks and extract the pundits’ names, run searches on the names, and present interesting comparisons alongside the videos.
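Here’s a rough sketch of the name-extraction step, just to make the idea concrete. It assumes the captions are already available as mixed-case plain text (real closed captions are often all caps and would need a proper named-entity tool), and the sample transcript and the names in it are made up for illustration.

```python
import re
from collections import Counter

# Naive heuristic: two or more consecutive capitalized words look like a name.
# This is an assumption for illustration, not a real named-entity recognizer.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def count_candidate_names(caption_text: str) -> Counter:
    """Count how often each name-like phrase appears in a caption transcript."""
    return Counter(NAME_PATTERN.findall(caption_text))


# Hypothetical caption snippet; the people named here are invented.
sample = (
    "Joining us now is Jane Doe from the institute. "
    "Thanks for having me. Jane Doe, what do you make of this? "
    "We also have John Smith standing by."
)

counts = count_candidate_names(sample)
for name, mentions in counts.most_common():
    print(f"{name}: {mentions}")
```

The counts could then feed whatever search or comparison you want to show next to the video.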

Or say we’re trying to study politicians’ behavior on Sunday morning talk shows. We could reference Sunlight Labs’ Influence Explorer API to fetch contextual metadata about campaign contributions, and try to correlate the money they’ve raised with positions they’ve taken. How statistically similar are their talking points to other politicians with a similar donor composition?
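One crude way to put a number on “how similar are their talking points” is cosine similarity over word counts. The sketch below is only that: the function names are my own, it knows nothing about Influence Explorer or donor data, and a serious analysis would need better text processing than a bag of words.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count its words (a simple bag-of-words model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical usage, with transcripts pulled from captions:
# score = cosine_similarity(word_counts(transcript_a), word_counts(transcript_b))
```

Scores near 1 mean the two speakers lean on nearly the same vocabulary, and scores near 0 mean they share almost none.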

If we feel like going farther out and embracing computer vision, let’s capture every nth frame from a given political speech, run it through OpenCV, and determine how often certain politicians blink. Does this ever correlate with making false statements, as judged by PolitiFact?
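The frame-sampling part of that idea might look something like this. It’s a minimal sketch that assumes the speech has been saved as a local video file and that the opencv-python package is installed. The helper names are mine, and the blink detection itself is deliberately left out.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def every_nth(items: Iterable[T], n: int) -> Iterator[T]:
    """Yield items 0, n, 2n, ... from any iterable."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return islice(items, 0, None, n)


def read_frames(path: str):
    """Yield decoded frames from a video file, one at a time."""
    import cv2  # imported here so every_nth works without OpenCV installed

    capture = cv2.VideoCapture(path)
    if not capture.isOpened():
        raise IOError(f"could not open video: {path}")
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of file or decode failure
                break
            yield frame
    finally:
        capture.release()


# Hypothetical usage: look at every 30th frame of a saved speech.
# for frame in every_nth(read_frames("speech.mp4"), 30):
#     ...  # run a face or eye detector on `frame` here
```

Sampling this way still decodes every frame, which is slow but simple. Seeking straight to the frames you want would be faster if it turns out to matter.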

These are just a few ideas. Open web hackers will think of far more surprising and delightful experiences.

@bertez, for instance, built a sentiment tracker for political speeches in one day; why not automate this across the entire archived history of television?

A bigger surface area for the mind

Earlier this year, Kate Hudson mocked up a TV Archive interface concept for comparative media studies. In her example, she imagines a student visualizing and studying 50 years of McDonald’s commercials.

And at the Mozilla Ignite Hack Day at the Internet Archive in SF, she and a few others built out the video wall concept as a hacky experiment on top of the TV News Archive. Just search for any term and the app automatically creates a video wall, complete with an instant Chat Roulette-style webcam sharing function so you can compare clips with a friend.

I was pretty impressed by this demo. But I felt I’d seen it or experienced it somewhere, as in a dream. Then I remembered.

In Alan Moore’s Watchmen comic, the arch-villain Ozymandias is “the smartest man in the world.” What does the smartest man in the world do with his free time? He sits in his lair, in front of a massive grid of television screens. He pores over comparative media coverage from Los Angeles, New York, London, Milan, and Tokyo. This helps him understand human affairs in their entirety.

It’s true: with a little elbow grease, a few hours of JavaScript hacking, Popcorn, and the TV News Archive, you too can be like Ozymandias, the smartest person in the world.

The significance of the TV Archive

As much as television’s shaped our democracy, it’s probably winding down. In 50 years, television won’t make sense as a communications medium. Television’s sun is setting as internet protocol is swallowing everything on the horizon. Roger MacDonald at the Internet Archive put this in context for me, saying: “from our vantage point at the peak of this hill, we can see the beginning and the end of the communications medium known as television. So why not start making a copy?”

The fact that someone like Brewster Kahle can come along and embark on the ambitious project of “making a copy of television” is a strange and wonderful thing. To hear him say it, the storage and computational challenge isn’t that great. All of the television ever produced must be a few petabytes. For the Archive, this boils down to cold hard math: some data center operations, some ingenuity, some preservation strategy.

An endeavor like this calls for new units of measurement, like the “channel-year.” A channel-year is apparently about 10TB. Not bad: “That’s two or three disk drives that you can buy off the shelf,” says Brewster.
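To get a feel for the channel-year, here’s a back-of-the-envelope calculation. The 10TB figure and the 20 stations come from this post. The three years of coverage and the 4TB drive size are my own round-number assumptions, so treat the output as a ballpark, not a measurement.

```python
import math

TB_PER_CHANNEL_YEAR = 10  # Brewster's rough figure, quoted above
TB_PER_PB = 1000          # decimal units, as drive makers count them


def archive_size_tb(channels: int, years: float) -> float:
    """Estimated storage in terabytes for a given number of channel-years."""
    return channels * years * TB_PER_CHANNEL_YEAR


def drives_needed(size_tb: float, drive_tb: float) -> int:
    """Whole drives required to hold size_tb, ignoring redundancy and overhead."""
    return math.ceil(size_tb / drive_tb)


# 20 stations (from this post) over an assumed 3 years of coverage.
size_tb = archive_size_tb(channels=20, years=3)
print(f"{size_tb:.0f} TB, or {size_tb / TB_PER_PB:.1f} PB")

# One channel-year on assumed 4TB off-the-shelf drives.
print(drives_needed(TB_PER_CHANNEL_YEAR, drive_tb=4), "drives per channel-year")
```

Under those assumptions the current collection comes in under a petabyte, which squares with the claim that the whole history of television fits in a few.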

This is the kind of public interest innovation that reminds you what’s so wonderful about the web.
