Now’s the time for collaborative video on Wikipedia

This post outlines one step toward bringing collaborative video editing to Wikipedia. For years, the idea of a video-rich Wikipedia has animated free knowledge advocates, public media, and educators. But the technical complexity involved in enabling collaborative video editing has always put this promise just out of reach. Today, many of the key blockers to making video a first-class citizen of Wikimedia have been mitigated or removed. Now is the right time to bring Wikipedia to motion with a combination of clever technological implementation and peer production.

But first, some background. To understand why the time is now, it helps to look backward.

It’s 2009. The average U.S. broadband speed is 4 Mbps. YouTube has been around for about four years, but has not yet transformed the entire online landscape. HTML5 video is still under development and not yet deployed to users. Smartphones don’t really do video yet.

By this time it’s already known that the future of the web is video. Forward-thinking technologists are aware that video will soon eclipse all other Internet applications in terms of bandwidth and attention.

Against this backdrop, a handful of open knowledge hackers are thinking ahead to a rich media future for Wikipedia—the crown jewel of the free culture movement. Technology Review, June 2009:

“The organization behind Wikipedia is close to launching an editable online video encyclopedia to enhance the current textual one… Within two to three months, a person editing a Wikipedia article will find a new button labeled “Add Media.” Clicking it will bring up an interface allowing her to search for video–initially from three repositories containing copyright-free material–and drag chosen portions into the article, without having to install any video-editing software or do any conversions herself. The results will appear as a clickable video clip embedded within the article.”

My friend and colleague Peter Kaufman, a deep thinker on public media, captured the promise well, noting: “To have people be able to go in and annotate your video, edit your video, and improve upon it–in the same way people have been doing to your text posts–is pretty outstanding, and will create an audio-visual representation of our world that will rapidly become as definitive and collaborative as Wikipedia is in the textual world.”

By 2010, despite the promise—and significant excitement—this vision was no closer to reality. There were several reasons for this:

  • frameworks for client-side video playback were still very weak and poorly distributed (with the exception of Flash, which was deeply proprietary and therefore incompatible with the free software values of the Wikimedia Foundation)
  • the toolchain for open video codecs was immature: transcoding videos from their native format into Theora was highly failure-prone and expensive.
  • it was hard to grow a collection of properly licensed video assets. This was the very dawn of the camera phone and DIY videos were likely to be shaky and poorly lit. When it came to professionally produced content, rights-holders were even more afraid of open content licensing than they are now.
  • the norms and best practices around embedding video content were not well defined.
  • critically, all the mechanisms for actually preparing and editing video clips were offline. This meant that video assets that made it onto Wikipedia were inherently static, with very high friction to make even minor changes. That didn’t seem very “wiki.”

All of these challenges collided with each other and multiplied. Michael Dale, a media hacker with a joint appointment at Wikimedia and Kaltura, shipped an early media sequencer, but the immaturity of the broader ecosystem meant it stayed within the stables of Wikimedia Labs.

Where are we today, five years later?

  • HTML5 video support is fairly ubiquitous across desktop and mobile browsers.
  • WebM emerged as a viable open codec, with the backing of an industry-led consortium organized by Google.
  • Snapchat and Vine have thoroughly democratized video production and normalized the practices of uploading and sharing DIY video. Everyone carries a camera in their pocket.
  • inside of Wikipedia, some hard-won progress has been made on video best practices. Andrew Lih, Pharos of Wikimedia NYC, and others have been advancing the content-creation and community-norms side of things.
  • Mozilla spent several years building Popcorn Maker, the first viable cloud-based video editor. This is a big one and represents a significant investment in open source innovation (Brett Gaylor, Bobby Richter and I went deep on all aspects of this system with the smart hackers at the Seneca College CDOT program).

The technology underlying Popcorn Maker is the key to bringing video collaboration to Wikimedia in a meaningful way.

Why is Popcorn Maker perfect for enabling collaborative video on Wikipedia?

The technology stack, time, and skills required to prepare and upload video are already pretty heavy. Peer production of video is even more complicated, discouraging, and cumbersome. One reason is that once the video is ready, it’s “flattened” or “rendered.” To make any changes to the video requires that you have access to the editing environment, all the assets, and then to take the time to re-render, re-upload, and update the video. That’s almost impossible when coordinating volunteers who may be collaborating from different time zones and across the world. It’s so much friction that it bears almost no resemblance to editing a wiki. How can we make editing video as easy as editing hypertext?

Popcorn Maker takes a smart approach to this problem. It is a browser-based tool that enables users to stitch videos together from any addressable media out there on the web (an advantage over desktop-based video editors, which reference locally stored media).

Instead of arranging, rendering, re-encoding, and hosting derivatives of the original media on a local machine, we simply point to them where they live on the web. Assets are loaded over HTTP and are sequenced and manipulated through the Popcorn Maker web app. The user composes her work entirely in the browser, and when she’s done, her work is stored and represented as JSON (JavaScript Object Notation), which enables the work to be reconstructed, on demand, in any HTML5 client. The raw source files stay exactly where they are, addressable by HTTP.
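To make the idea concrete, here is a minimal sketch of what such a JSON description could look like and how a player might use it. The field names (src, inPoint, outPoint), the resolve function, and the example.org URLs are assumptions made up for illustration; Popcorn Maker's actual project format may well differ.

```typescript
// Hypothetical manifest shape; Popcorn Maker's real JSON schema may differ.
interface Clip {
  src: string;      // URL of the untouched source media
  inPoint: number;  // seconds into the source where the clip starts
  outPoint: number; // seconds into the source where the clip ends
}

interface Project {
  title: string;
  clips: Clip[]; // played back to back, in order
}

interface PlaybackPosition {
  src: string;
  sourceTime: number;
}

// Map a time on the project timeline to a source URL and an offset in that source.
function resolve(project: Project, timelineTime: number): PlaybackPosition | null {
  if (timelineTime < 0) return null;
  let clipStart = 0; // where the current clip begins on the project timeline
  for (const clip of project.clips) {
    const duration = clip.outPoint - clip.inPoint;
    if (timelineTime < clipStart + duration) {
      return { src: clip.src, sourceTime: clip.inPoint + (timelineTime - clipStart) };
    }
    clipStart += duration;
  }
  return null; // past the end of the project
}

// The whole "video" is just this small piece of JSON (placeholder URLs).
const project: Project = JSON.parse(`{
  "title": "Example sequence",
  "clips": [
    { "src": "https://example.org/media/a.webm", "inPoint": 10, "outPoint": 15 },
    { "src": "https://example.org/media/b.webm", "inPoint": 0, "outPoint": 8 }
  ]
}`);

console.log(resolve(project, 7));
```

Seven seconds into this project, the first clip (five seconds long) has finished, so the player would be two seconds into b.webm. Nothing was rendered or re-encoded to get there; changing the edit means changing a few numbers in the JSON.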

What does this mean for collaborative video editing? When an editor wants to make a change to a Popcorn project, all she has to do is click “remix.” The editor is dropped into the browser-based authoring environment, with all the assets loaded up over the web. When the editor is finished making changes, she clicks “publish” and all of the Popcorn players for this video project embedded across the web now serve this new version.

In other words, Popcorn is a non-destructive, non-linear editor for the entire web. It’s the way video editing should work in 2015: fully cloud-based, fast, and fluid, and much better suited to networked peer production.

This is a much more manageable and extensible approach than some of the alternatives (such as manually uploading and replacing videos, editing proxy files or edit decision lists). Because all you need is a web browser, it’s much more accessible. It doesn’t require fast hardware for encoding, fast internet for uploading media, or any extra software or plugins, just a reasonably modern web browser. And it’s been ready for more than two years.

From here it is very easy to see how Wikimedia editors around the world could be collaboratively creating and improving encyclopedic video content. Building on some of the existing source for Popcorn Maker, we could be piloting this approach in a matter of months, not years.

How would Popcorn work on Wikipedia and Wikimedia Commons?

It would work a lot like Michael Dale and company originally envisaged in 2009, actually:


Users would upload and identify media through the Wikimedia Commons. Popcorn would work as a client-based media sequencer, stitching those media into a runtime, along with some simple additions (like title cards).

The new runtime would be saved as JSON, with MediaWiki’s built-in revision system enabling collaboration and reversion when necessary. When another user wants to edit the video, she’d just click on that version and the Popcorn editor would be loaded up.
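The revision model described here can be sketched in a few lines. This is only an in-memory illustration of the behavior we would want; the ProjectHistory class and its method names are hypothetical, and in practice MediaWiki's own revision system would play this role.

```typescript
// Hypothetical in-memory model of a page history; MediaWiki would provide this itself.
interface Revision {
  id: number;
  author: string;
  comment: string;
  projectJson: string; // the serialized Popcorn project
}

class ProjectHistory {
  private revisions: Revision[] = [];

  // Append a new revision and return its id.
  save(author: string, comment: string, projectJson: string): number {
    const id = this.revisions.length + 1;
    this.revisions.push({ id, author, comment, projectJson });
    return id;
  }

  // The revision every embedded player would load.
  latest(): Revision | undefined {
    return this.revisions[this.revisions.length - 1];
  }

  // Reverting is non-destructive: it appends a copy of an older revision.
  revertTo(id: number, author: string): number {
    const target = this.revisions.find((r) => r.id === id);
    if (!target) throw new Error(`No revision with id ${id}`);
    return this.save(author, `Revert to revision ${id}`, target.projectJson);
  }

  get length(): number {
    return this.revisions.length;
  }
}

const history = new ProjectHistory();
const v1 = history.save("Alice", "Initial cut", JSON.stringify({ clips: ["a.webm"] }));
history.save("Bob", "Add second clip", JSON.stringify({ clips: ["a.webm", "b.webm"] }));
history.revertTo(v1, "Carol");
console.log(history.latest()?.projectJson);
```

After the revert, the latest revision carries Alice's original JSON again, while Bob's edit remains in the history. Because each revision is a small text document rather than a rendered file, keeping every version is cheap.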

For playback, we could either use Popcorn’s web-based player (in which case no rendering pipeline would be necessary, because all the assets can be loaded “just-in-time”), or we could use an ffmpeg pipeline to “flatten” these videos on demand. The latter would enable offline viewing of these videos.
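As a rough sketch of the “flattening” option, the function below turns a list of clips into ffmpeg arguments that trim each source and join the pieces with ffmpeg's concat filter. The Clip shape and the buildFfmpegArgs name are assumptions for illustration, not part of Popcorn. The sketch also assumes every source has both a video and an audio stream and that all sources share the same frame size, which the concat filter requires; a real pipeline would need to scale or pad first.

```typescript
// Hypothetical clip shape, assumed for this example only.
interface Clip {
  src: string;      // URL or path of the source media
  inPoint: number;  // seconds
  outPoint: number; // seconds
}

// Build an ffmpeg argument list that trims each clip and concatenates them.
function buildFfmpegArgs(clips: Clip[], output: string): string[] {
  if (clips.length === 0) throw new Error("Nothing to render");
  const args: string[] = [];
  let filterInputs = "";
  clips.forEach((clip, i) => {
    // -ss and -t placed before -i apply to that input: seek, then limit duration.
    args.push("-ss", String(clip.inPoint));
    args.push("-t", String(clip.outPoint - clip.inPoint));
    args.push("-i", clip.src);
    filterInputs += `[${i}:v][${i}:a]`;
  });
  // Join all trimmed inputs into one video stream [v] and one audio stream [a].
  const filter = `${filterInputs}concat=n=${clips.length}:v=1:a=1[v][a]`;
  args.push("-filter_complex", filter, "-map", "[v]", "-map", "[a]", output);
  return args;
}

const renderArgs = buildFfmpegArgs(
  [
    { src: "https://example.org/media/a.webm", inPoint: 10, outPoint: 15 },
    { src: "https://example.org/media/b.webm", inPoint: 0, outPoint: 8 },
  ],
  "out.webm",
);
console.log(["ffmpeg", ...renderArgs].join(" "));
```

The returned array could be handed to a process runner on a server. Keeping it as an array, rather than a single shell string, avoids quoting problems with URLs and filter expressions.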

Mozilla Popcorn editor

Mike Nolan, who is currently interning at Mozilla, has actually gotten started on this work. His Popcorn Editor repo strips Popcorn Maker clean and enables a white-label installation:

And he’s also building an ffmpeg backend, to enable “flattening” of these videos.

What’s next and how to help

Mike will be at Wikimania in Mexico City this week, joining the hackathon, meeting Wikimedia hackers, and moving closer to a functioning prototype at Wikimedia Labs. If you’re there, say hi. We’re looking for help on things like:

  • using the Wikimedia authentication system with Popcorn
  • storing Popcorn’s JSON as a MediaWiki edit
  • building a Wikimedia Commons search/viewer for Popcorn
  • standing up the necessary tooling for a Popcorn player embed in Wiki pages and/or ffmpeg rendering pipeline
  • UI customization
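As one small example of what the Commons search piece might involve, the sketch below builds a search request against the MediaWiki Action API on Wikimedia Commons, restricted to the File namespace (namespace 6). The commonsFileSearchUrl function name is made up for illustration, and narrowing results to video only is left out here; it is a starting point under those assumptions, not a finished integration.

```typescript
// Hypothetical helper: build a MediaWiki Action API search URL for Commons files.
function commonsFileSearchUrl(term: string, limit = 10): string {
  const params = new URLSearchParams({
    action: "query",
    list: "search",
    srsearch: term,
    srnamespace: "6", // the File: namespace
    srlimit: String(limit),
    format: "json",
    origin: "*", // allow anonymous cross-origin requests from a browser
  });
  return `https://commons.wikimedia.org/w/api.php?${params.toString()}`;
}

const searchUrl = commonsFileSearchUrl("solar eclipse", 5);
console.log(searchUrl);
```

A Popcorn media browser could fetch this URL, read the matching titles from the JSON response, and offer them to the editor as clips to drag into the timeline.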

At the Wikimedia conference, we’ll be joining the discussion in these sessions:

In the future, we also intend to host a small working group with folks from Internet Archive, Creative Commons, and Columbia University’s Center for New Media Teaching and Learning (they make the very complementary MediaThread). Watch this space.

No time like now

The world has changed a lot since 2010, when we led a campaign from the Open Video Alliance: the optimistically titled “Let’s Get Video on Wikipedia!” One product of that campaign was a white paper and tutorial for cultural institutions thinking of donating video to the Commons. Because of the lack of UI or true support, it was a hilariously complicated non-starter, which read to archivists like a bit of incomprehensible Greek, i.e.: “here’s a brief tutorial on how to get your video uploaded to Wikipedia, starting with converting your videos to Theora using Firefogg, a browser add-on that enables client-side transcoding, or Miro Converter, a stand-alone video converter application, after which you’ll configure your settings and enable an experimental mwEmbed extension, then upload to Wikimedia Commons…”

Even so, things are moving: a brief study in early 2013 with Ward Cunningham found that roughly 4,000 out of 4 million articles in English Wikipedia had associated videos. And Wikimedia has a much healthier outreach to the GLAM sector now.

With technology like Popcorn, we can drive down the complexity of collaboratively editing video, making it more fun, accessible, and participatory. We hope that can add even more momentum to the cause of making Wikipedia a more multimedia resource for its next few decades.
