From Algorithms To Stories.

From Algorithms
to Stories
Jonathan Stray
Columbia / ProPublica / Overview
or...
three hard lessons
in building tools for
computational
journalism
Science Journalism

Journalism through Science
Computational Journalism

Stories will emerge from stacks of financial disclosure
forms, court records, legislative hearings, officials'
calendars or meeting notes, and regulators' email messages
that no one today has time or money to mine. With a suite
of reporting tools, a journalist will be able to scan,
transcribe, analyze, and visualize the patterns in these
documents.

- Cohen, Hamilton, Turner, 2011
Links links links!

bit.ly/OverviewHackers
Doc sets in journalism
and then...
nobody used it

three years later...
Finalist, 2014 Pulitzer Prize in Public Service
Winner, 2014 Pulitzer Prize in Public Service
Demo: Obama Form Letters
Algorithm agnostic via visualization plugin API

Ships with clustering, word clouds, advanced search...
Lesson 1
Workflow >> Algorithm
User testing!
Loaded confirmation link, which goes to /docsets. "Hmm. What do I do now?" Eventually
clicked import link. "I need more guidance what to do next." Import pane opened to DC
login. Looked like he was about to type in credentials. Then: "I can't really do any of these
now." Eventually saw "example document sets" and clicked.
Cloned caracas-cables example set. Waited. Understood when document set import
complete. Then hesitated. Didn't know where to click to open. Eventually clicked.
"In general, you could be way more communicative."
Moved mouse to document list immediately. "For some reason, this drew me." Clicked around
doc list. "What am I looking at?"
Moved to tree view. Clicked + without hesitation to open node. Saw document in viewer
change. "It's not clear what I'm looking at in the viewer." Eventually: "Which document is
showing when I click a node? Is it the first?"
A little later, more conversationally: "I don't know how useful the document list is." He said this
twice at different points. "Is this a comma separated list of documents? It just looks like one
block of text." Suggested a horizontal delimiter of some sort.
The hardest feature to implement

The most requested, the most used
Lesson 2
It's humans + machines
By Maria Kiselyova
(Reuters) - Russian mobile phone operator Vimpelcom has become the
latest company to come under scrutiny over its operations in Uzbekistan, an
authoritarian country where rival MTS had its assets confiscated.
U.S.-listed Vimpelcom, Uzbekistan's biggest mobile operator by subscribers,
said on Wednesday that it was being investigated by the U.S. Securities and
Exchange Commission (SEC) and Dutch authorities.

Demo: Uzbekistan's Telco Bribes
VIS: Visual Investigative Scenarios
Lesson 3
Real data is messy
What researchers choose

News articles
Academic literature
NLP test data sets
What journalists deal with
PDF dumps
Printed, scanned emails
Scraping thousands of pages from an antique site
CD full of Excel files
...
Standard Named Entity Recognition not working
Test of OpenCalais against 5 random articles from various sources
versus hand-tagged entities

Overall precision = 77%
Overall recall = 30%

...and this is on the cleanest possible data
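
For reference, the two figures above can be computed as in the minimal sketch below. It assumes entities are compared as exact string matches between the tool's output and the hand-tagged set; the slides do not say how matches were actually scored. The function name and the sample entities are illustrative only and are not the data behind the 77% / 30% result.

```python
def precision_recall(predicted, gold):
    """Score extracted entities against hand-tagged ones.

    Assumption: an entity counts as correct only on an exact string match.
    precision = correct predictions / all predictions
    recall    = correct predictions / all hand-tagged entities
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example, not the data from the test described above.
hand_tagged = {"Maria Kiselyova", "Reuters", "Vimpelcom",
               "Uzbekistan", "MTS", "SEC"}
tool_output = {"Reuters", "Uzbekistan", "Russian"}

p, r = precision_recall(tool_output, hand_tagged)
print(f"precision = {p:.0%}, recall = {r:.0%}")
```

High precision with low recall, as in the test above, means most of what the tool tags is right but most of the entities a reporter would tag by hand are missed.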
Meta-lesson
You don't know what
the user's problem is
Iterative design loop
A number of previous tools aim to help the user explore
a document collection (such as [6, 9, 10, 12]),
though few of these tools have been evaluated with
users from a specific target domain who bring their own
data, making us suspect that this imprecise term often
masks a lack of understanding of actual user tasks.
Six case studies, four of which were "search" tasks

(journalist needed to locate known or suspected evidence)
There are surprisingly few papers that comment on the
adoption of a visualization tool without the prompting
of designers: in a recent survey of eight hundred
visualization papers containing an evaluation
component, only five commented on adoption [31]
What are the metrics that count?
Evaluation Methods for Topic Models

Wallach et al., 2009
Better metrics?
How many stories got done?
o Are you solving a niche problem?
o Would resources have been better spent on reporting?
How long did it take to do the story?

o Is this faster than using text search?
o Is it even faster than just reading the documents?
o How much would it have cost to pay someone to do it?
What happened after the story was published?
Journalism as a cycle
Action
Data
Reporting
User
Distribution
Story
Use it!
overviewproject.org
Code it!
github.com/overview
Thank you!
Knight Foundation, Google Ideas, Open Syllabus Project
