
Frontiers of Computational Journalism
Columbia Journalism School Week 3: Social Filtering September 25, 2013

Week 5: Social Filtering


Finding sources on social media
Participatory Journalism
Information Distribution on Social Networks
Social Software

Classify Users
Classic machine learning problem. Classify each user as one of:
journalist/blogger
organization
ordinary individual
First, need to encode each user as a vector / select features...

Features for user classifier

# of followers / following
# of posts, favorites
percentage of posts that are RTs, @replies, links
presence/absence of named entities
topic distribution of tweets (IPTC top level topics)
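
As a rough illustration, the features above could be packed into a numeric vector as follows. This is only a sketch: the `UserStats` fields, the `TOPICS` placeholder list, and the `to_vector` helper are hypothetical names, not taken from the slides, and the three topics stand in for the real IPTC top-level topics.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder topic names; NOT the actual IPTC top-level topic list.
TOPICS = ["politics", "sport", "economy"]

@dataclass
class UserStats:
    """Hypothetical per-user counts gathered from a user's profile and tweets."""
    followers: int
    following: int
    posts: int
    favorites: int
    retweets: int               # posts that are RTs
    replies: int                # posts that are @replies
    with_links: int             # posts containing a link
    has_entities: bool          # any named entity found in the user's posts
    topic_counts: Dict[str, int]  # topic name -> number of tweets on that topic

def to_vector(u: UserStats) -> List[float]:
    """Encode one user as a flat list of floats."""
    n = max(u.posts, 1)  # avoid division by zero for users with no posts
    topic_total = sum(u.topic_counts.get(t, 0) for t in TOPICS) or 1
    counts = [float(u.followers), float(u.following),
              float(u.posts), float(u.favorites)]
    fractions = [u.retweets / n, u.replies / n, u.with_links / n]
    entities = [1.0 if u.has_entities else 0.0]
    topics = [u.topic_counts.get(t, 0) / topic_total for t in TOPICS]
    return counts + fractions + entities + topics
```

Raw counts such as followers span several orders of magnitude, so in practice they would probably need scaling (for example a log transform) before any distance-based classifier is applied.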

Digression: IPTC Media Topic Codes


International standard hierarchical taxonomy, part of the NewsML markup system. Defined by Reuters, AP, NYTimes...

K-nearest neighbor classifier

Take K closest training points (in high dimensional feature space), choose majority label.
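
A minimal sketch of this rule, assuming Euclidean distance and plain Python lists as feature vectors. The function name `knn_classify` and the default `k=3` are illustrative choices, not values from the slides.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (vector, label) pairs
    query: vector with the same number of dimensions
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Count the labels among those neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set with made-up two-dimensional vectors.
train = [
    ([0, 0], "individual"),
    ([0, 1], "individual"),
    ([1, 0], "individual"),
    ([10, 10], "organization"),
    ([10, 11], "organization"),
]
label = knn_classify(train, [0.5, 0.5], k=3)
```

Sorting the whole training set is fine for a few thousand users; a larger set would call for a spatial index or an approximate nearest-neighbor method.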

Creating the training data

1,850 random users
1,532 known organizations
1,490 known journalists and bloggers
Hired Mechanical Turk workers to apply labels. Each user was labeled by two workers and discarded if the workers disagreed.
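
The agreement rule can be sketched in a few lines. The `keep_agreed` helper and the dictionary layout are assumptions made for illustration.

```python
def keep_agreed(labels_by_user):
    """Keep only users whose two worker labels match.

    labels_by_user: dict mapping user id -> (label_a, label_b)
    Returns a dict mapping user id -> the agreed label.
    """
    return {user: a for user, (a, b) in labels_by_user.items() if a == b}

# Made-up example: the two workers disagree on u2, so it is discarded.
raw = {
    "u1": ("journalist/blogger", "journalist/blogger"),
    "u2": ("organization", "ordinary individual"),
    "u3": ("organization", "organization"),
}
clean = keep_agreed(raw)
```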

Classifier Accuracy

Eyewitness classifier
Goal is to find individual tweets that are eyewitness reports. Started with the LIWC (Linguistic Inquiry and Word Count) dictionary, which classifies English words along 70 different dimensions, including emotion, cognition, time, health...

Word Aspects

Used perception category words plus insight and certainty words

Eyewitness tweet classifier

It's an eyewitness tweet if it contains any of these special words! (or their stems)
High precision! Low recall.
89% of tweets classified as eyewitness actually were. But only 32% of eyewitness tweets were detected.
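
A sketch of a word-list classifier of this kind, together with the precision and recall calculation. The word list here is a made-up placeholder (the actual words are not given in the text), and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer.

```python
import re

# Hypothetical word list; the slides' actual list is not reproduced here.
EYEWITNESS_WORDS = {"see", "saw", "hear", "heard", "watch", "feel", "know"}

def crude_stem(token):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def is_eyewitness(text):
    """True if any token, or its crude stem, is in the word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(t in EYEWITNESS_WORDS or crude_stem(t) in EYEWITNESS_WORDS
               for t in tokens)

def precision_recall(predicted, actual):
    """predicted, actual: equal-length sequences of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged, truly eyewitness
    fp = sum(1 for p, a in pairs if p and not a)    # flagged, not eyewitness
    fn = sum(1 for p, a in pairs if a and not p)    # missed eyewitness tweets
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision is the share of flagged tweets that really are eyewitness reports; recall is the share of real eyewitness reports that get flagged. A short word list tends to give the pattern reported above: few false alarms, many misses.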

Other dimensions
Tweet contains URL to photo or video (used table of domain names, e.g. flickr.com = photo)
Posted from mobile device (from tweet metadata naming the posting app)
Geocode user's stated location (this is painful and unreliable)
Distribution of friends' locations. (Friend = mutual following)
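
The domain-table lookup might look like the sketch below. Only the flickr.com entry comes from the slide; the other entry and the `media_type` helper are illustrative assumptions.

```python
from urllib.parse import urlparse

# Domain -> media type. flickr.com is from the slide; youtube.com is an
# illustrative addition, not part of the original table.
MEDIA_DOMAINS = {
    "flickr.com": "photo",
    "youtube.com": "video",
}

def media_type(url):
    """Return 'photo', 'video', or None for a URL, based on its domain."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return MEDIA_DOMAINS.get(host)

def tweet_has_media(urls):
    """True if any URL in the tweet points at a known photo/video domain."""
    return any(media_type(u) is not None for u in urls)
```

Tweets usually carry shortened links, so a real pipeline would have to resolve redirects before this lookup; that step is left out here.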

Test user reac%ons


"This gives you context, you have the context for whether or not you think they're reputable or whether or not they're worth reaching out to."
"It's giving me a lot of context which is really useful when you're trying to verify if someone is reputable or not."
"I would tend to focus on the eyewitnesses and journalists/bloggers. Eventually I'd look at everyone else but I'd want to start my search with those two groups because they would normally provide me with the most information."

Test user reac%ons


Popular features:
Eyewitness filtering, user location, image/video filter

Unpopular features:
Entity extraction not helpful, no ability to filter by location and eyewitness status, focus on users instead of content

Week 5: Social Filtering


Finding sources on social media
Participatory Journalism
Information Distribution on Social Networks
Social Software

How Journalism Works (a different model than last week)

[Diagram labels: stories (x); stories not covered; filtering; User; who user chooses to follow = social filtering]

Week 5: Social Filtering


Finding sources on social media
Participatory Journalism
Information Distribution on Social Networks
Social Software

Twitter follower network


"We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks."
- Kwak et al., What is Twitter, a Social Network or a News Media?
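
To make "reciprocity" concrete, here is a small sketch that measures it on a list of follow edges. It uses one common definition (the fraction of directed edges whose reverse edge also exists), which may differ in detail from the measure used by Kwak et al.; the `reciprocity` function name is an assumption.

```python
def reciprocity(edges):
    """Fraction of directed follow edges that are reciprocated.

    edges: iterable of (follower, followed) pairs
    """
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    # An edge (a, b) is reciprocated when (b, a) is also present.
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)

# Toy graph: a and b follow each other; a follows c; d follows a.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]
r = reciprocity(edges)
```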

More followings than followers

Small avg distance between two nodes
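
Average distance can be computed on a small graph with breadth-first search from every node. This is only a sketch for toy-sized graphs; at Twitter scale the figure would have to be estimated from a sample of source nodes. The `average_distance` name and the adjacency-dict layout are assumptions.

```python
from collections import deque

def average_distance(adj):
    """Mean shortest-path length over all reachable ordered node pairs.

    adj: dict mapping node -> iterable of nodes it links to (directed).
    Only nodes that appear as keys are used as BFS starting points.
    """
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

# Toy chain a -> b -> c.
chain = {"a": ["b"], "b": ["c"], "c": []}
avg = average_distance(chain)
```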


(why? and what does this mean?)

It's a news network
Small number of high-degree hubs

It's a news network
Small number of high-degree hubs
Different network structure than e.g. Facebook. Different uses. Why?

Week 5: Social Filtering


Finding sources on social media
Participatory Journalism
Information Distribution on Social Networks
Social Software

Social Software
Basic assumption: structure of software influences how groups use it.
or: architecture influences behavior

Three ways to influence behavior


Norms: culture, habits, etiquette, the users' sense of what is right or appropriate
Laws: rules enforced by the administrator
Code: what it is actually possible to do

Design problem...
What do we want the users to accomplish together?
How do we encourage this?
We can write the code, but the culture is to some degree beyond our prediction or control.
