July 31, 2007
A journalist's guide to crowdsourcing
OJR's editor answers basic questions about how news organizations can improve investigative reports by using the Internet to gather information from their readers.

By Robert Niles

Last week, I had the pleasure of conducting some training sessions for the staff at the Orlando Sentinel in Florida. I spent the morning and lunch sessions talking with Sentinel reporters and editors about blogging and discussion forums, and the final session of the day was on my favorite online journalism topic: crowdsourcing.

Few journalists, at the Sentinel or elsewhere, know much about this topic, save, perhaps, for the fact that it's become one of the industry's hotter buzzwords. But I believe that crowdsourcing might, in the end, have more of an effect on all forms of journalism than anything else that's come out of the online journalism revolution. That's why I decided to put together this introductory Q&A about crowdsourcing, for OJR readers.

What is crowdsourcing?

Crowdsourcing, in journalism, is the use of a large group of readers to report a news story. It differs from traditional reporting in that the information collected is gathered not manually, by a reporter or team of reporters, but through some automated agent, such as a website.
Stripped to its core, though, it's still just another way of reporting, one that will stand alongside the traditional "big three" of interviews, observation and examining documents.

The core concept is not new in journalism. At its heart, modern crowdsourcing is the descendant of hooking an answering machine to a telephone "tip line," where a news organization asks readers to phone in suggestions for stories, or of asking readers to send in photos of events in their community. Such methods require substantial manual labor to sift through submitted material, looking for information that can be used well in a story, which makes them only marginally more effective than traditional news reporting.

True crowdsourcing involves online applications that enable the collection, analysis and publication of reader-contributed incident reports, in real time.

What are some examples of crowdsourcing?

My favorite example comes not from a news organization, but from the U.S. Geological Survey. Its "Did You Feel It" feature builds detailed "shake maps" illustrating the intensity of earthquakes by zip code, through thousands of volunteer reports submitted online by readers.

A simpler example, but very popular this summer, is GasBuddy.com. The site won't win any awards for soothing graphic design, but it allows readers in more than 100 communities to share real-time reports on gas prices in their area.

I built my first crowdsourcing news feature in 2001, on my theme park website. "Accident Watch" built a reader-written database of injury
accidents at U.S. theme parks, in the absence of federal or significant state incident data. Readers submitted reports of injury accidents that they'd witnessed or read about, with reports from just one reader labeled "unverified." A second report of the same incident from another reader, or a link to an official police, court or park report or a news story, was required for a report to be labeled "verified."

How can I be sure this information isn't bogus?

In a true crowdsourced project, information is not verified manually by a reporter between submission and publication, which has inspired concern from many traditional reporters. But a well-designed crowdsourcing project, like a well-edited newsroom, can discourage bogus submissions while minimizing their influence if accepted. Here are my suggestions to avoid bogus data in a crowdsourced project:

Request that the reader submit personal identification along with the report. On "Accident Watch," readers must be registered with the site, which requires e-mail verification, in order to submit a report. The earthquake project requires a zip code and requests a reader's name, phone, e-mail and street address. Asking readers to identify themselves sends the message that you take this project seriously and that you wish them to do the same. An obviously bogus ID lets you flag bogus records for deletion with ease.

If your project publishes individual reports, provide other readers with an opportunity to dispute or verify each individual report. This empowers your readers to help clean your data for you.

Even if you are publishing data only in aggregate, be aggressive about encouraging readers who dispute that data to add their report to
the database, as more data should help move the mean toward the true value.

How is crowdsourcing different from polling?

Obviously, you do not have a controlled random sample of the population in a crowdsourced project, as you would with a carefully executed poll. But that does not prohibit you from collecting accurate and engaging data through crowdsourcing. You just need to be careful in identifying whether a specific project works better with polling or crowdsourcing.

Polling's great for constructing an accurate portrait of a community's demographics, attitudes and behavior. Crowdsourcing's great for incident reports, which might be incomplete if limited to a small random sample. Either the incident (the roller coaster crash, the bottles falling from your kitchen shelves, the three-buck gasoline) happened, or it didn't. But the more people you have "on the ground" as potential sources in your crowd, the more data points you can collect. If you poll only a few hundred people, you'll miss incidents.

Think of another great crowdsourcing project: missing/safe person lists following a disaster, such as Hurricane Katrina or 9/11. A random sample would get you only the family and friends of your sample, instead of the many thousands more who want and need information about their loved ones.

At the same time, be careful about drawing broad conclusions about community behavior based on your crowdsourced incident reports. Don't ask people about their income, education or even exercise habits, then
claim that your numbers represent the entire community you cover. Use a traditional random sample survey to reveal that kind of descriptive data, instead.

Do I have to learn programming to do this?

No, but if you want to attempt a true crowdsourcing project, someone in your newsroom will. Free online survey tools and mapping websites can help you collect and publish great reader-contributed data. But if you want custom information to move from survey form to published report in real time, you can't do that yet without a programmer on your team.

Ultimately, journalism is social science, and journalists who want to make the best use of crowdsourcing need to get familiar with the mathematics of social science. The interviewing and document searches of 20th-century investigative reporting will look incomplete as savvy journalists and newsrooms learn to harness the Internet's wide reach and interactivity to gather massive databases that only formal social science techniques can effectively manage and analyze.

Isn't this just citizen journalism on steroids?

Consider crowdsourcing a fork in the citizen journalism movement. Unlike more traditional notions of "citizen journalism," crowdsourcing does not ask readers to become anything more than what they've always been: eyewitnesses to their daily lives. They need not learn advanced reporting skills, journalism ethics or how to be a better writer. It doesn't ask readers to commit hours of their lives in work for a publisher with little or no financial compensation. Nor does it allow any one reader's work to stand on its own, without the context of many additional points of view.
For those reasons, I think, crowdsourcing ultimately will revolutionize journalism as "citizen journalism" efforts that rely on more traditional reporting methods fail.
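
For the programmer on your team, the "Accident Watch" labeling rule described earlier might be sketched as follows. This is a minimal illustration in Python, not the site's actual code; the Incident class, its field names and the reader IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """One incident, with every reader report and official link attached to it.

    Hypothetical structure for illustration only.
    """
    description: str
    reporters: set = field(default_factory=set)          # registered reader IDs
    official_links: list = field(default_factory=list)   # police, court, park or news URLs

    def add_report(self, reader_id, link=None):
        # A set ignores repeat submissions from the same reader.
        self.reporters.add(reader_id)
        if link:
            self.official_links.append(link)

    def status(self):
        # Verified: two different readers, or any link to an official
        # report or news story. Otherwise the report stays unverified.
        if len(self.reporters) >= 2 or self.official_links:
            return "verified"
        return "unverified"


incident = Incident("Rider injured on a roller coaster")
incident.add_report("reader_a")
print(incident.status())  # unverified
incident.add_report("reader_b")
print(incident.status())  # verified
```

Counting distinct reader IDs, rather than raw submissions, keeps one reader from verifying a report simply by submitting it twice.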