Hao Zhang
Dr. Piotr Jankowski Joey Lee
Dr. Eric Buhi Chris Allen Rich Zhang
Dr. Xuan Shi Stephanie
Dr. Jean Mark Gawron
Nowinski
Jared Jashinsky
Principal Investigator: Dr. Ming-Hsiang Tsou (Geography), mtsou@mail.sdsu.edu. Co-PIs: Dr. Dipak K. Gupta
(Political Science), Dr. Jean Mark Gawron (Linguistics), Dr. Brian Spitzberg (Communication), Dr. Li An (Geography).
Dr. Jay Lee (Kent State, Geography), Dr. Ruoming Jin (Kent State, Computer Science), Dr. Xinyue Ye (Kent State,
Geography), Dr. Heather Corliss (Public Health, SDSU), Dr. Xuan Shi (Geoscience, U of Arkansas).
San Diego State University, Kent State University, University of Arkansas, USA.
What is Human Dynamics?
Human Dynamics is a transdisciplinary research field focusing on the understanding of
dynamic patterns, relationships, narratives, changes, and transitions of human activities,
behaviors, and communications.
Smart Phones - 2007 (The Mobile Age)
The most important scientific instrument in the 21st Century.
(2014 Sales: 1.2 billion units)
Animated Image created by the HDMA Center (Hao Zhang).
Big Data is Human-Centered Data
The term, Big Data, refers to big ideas, big impacts, and big
changes for our society in addition to a big volume of datasets.
[Diagram: Big Data (information), Time, Place]
Tsou, M. H. and Leitner, M. (2013). Editorial: Visualization of Social Media: Seeing a Mirage or a Message? In Special Content Issue: "Mapping
Cyberspace and Social Media". Cartography and Geographic Information Science. 40(2), pp. 55-60. DOI: 10.1080/15230406.2013.776754
Data Integration / Data Fusion
Explore their spatiotemporal relationships in both network
space (cyberspace) and geographical space (real world).
Health or Disaster Data Layer
Image provided by Dr. Atsushi Nara (Associate Director of the HDMA Center).
Big Data Category (Tsou, 2015).
Social life data: social media services (Twitter, Flickr, Snapchat, YouTube,
Foursquare, etc.), online forums, online video games, and web blogs.
Health data: electronic medical records (EMR) from hospitals and health
centers, cancer registry data, disease outbreak tracking and epidemiology data.
Transportation and human traffic data: GPS tracks (from taxis, buses,
Uber, bike sharing programs, and mobile phones), traffic sensor data (from
subways, trolleys, buses, bike lanes, highways), and mobile phone data (from
data transmission records and cellular network data).
Geography (place and time) is the KEY for understanding Big Data!
Research Showcase #1:
[Workflow diagram: Geo-Targeting Data Collection (Twitter APIs; API = Application Programming Interface),
Filter, Machine Learning, Trend Analysis, Spatial Analysis, SMART Dashboard Visualization]
What can we get from Twitter data?
Where to find geospatial information?
Example: Use the Twitter Search API to search for the keywords "HIV test" or "HIV testing".
Only 1% - 7% of tweets have X, Y geo-coordinates (from GPS or geo-tagged devices).
But 50% - 60% of tweets have city-level locations provided by their user profiles.
80% of tweets have a time zone (limited spatial meaning).
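The three location sources listed above suggest a simple fallback order: exact coordinates first, then the profile location, then the time zone. A minimal sketch of that idea follows. The flat dictionary and its keys (`coordinates`, `user_location`, `time_zone`) are simplified, hypothetical field names for illustration, not the actual Twitter API payload.

```python
from typing import Optional, Tuple


def best_location(tweet: dict) -> Tuple[str, Optional[object]]:
    """Return (source, value) for the most precise location field present."""
    coords = tweet.get("coordinates")
    if coords:
        # Exact X, Y from GPS or a geo-tagged device (the rarest case).
        return ("coordinates", tuple(coords))
    profile = (tweet.get("user_location") or "").strip()
    if profile:
        # Free-text profile location; usually city level, needs geocoding.
        return ("profile", profile)
    time_zone = tweet.get("time_zone")
    if time_zone:
        # Time zone only: very coarse spatial meaning.
        return ("time_zone", time_zone)
    return ("none", None)


print(best_location({"user_location": " San Diego, CA "}))
```

A tweet that carries several fields is reported under the most precise one only, so counts per source do not overlap.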
Geocoding Engine for Social Media
The HDMA Center has built its own internal
GeoCoder Engine for user location profiles,
using GeoNames.org gazetteers (Creative
Commons data) plus user-defined rules.
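As a rough illustration of gazetteer matching plus user-defined rules, the sketch below normalizes a free-text profile location and looks it up in a tiny in-memory table. It is not the HDMA GeoCoder Engine. The two-entry `GAZETTEER`, the `ALIASES` rules, and the function name are hypothetical stand-ins, and the coordinates are approximate.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for a gazetteer: normalized name -> (lat, lon).
GAZETTEER = {
    "san diego": (32.7157, -117.1611),
    "new york": (40.7128, -74.0060),
}

# Hypothetical user-defined rules: informal names mapped to gazetteer names.
ALIASES = {"nyc": "new york", "sd": "san diego"}


def geocode_profile(text: str) -> Optional[Tuple[str, Tuple[float, float]]]:
    """Match a free-text profile location against the gazetteer."""
    # Lowercase and replace everything except letters, spaces, and commas.
    cleaned = re.sub(r"[^a-z\s,]", " ", text.lower())
    for part in cleaned.split(","):
        name = " ".join(part.split())  # collapse repeated whitespace
        name = ALIASES.get(name, name)  # apply user-defined rules
        if name in GAZETTEER:
            return (name, GAZETTEER[name])
    return None


print(geocode_profile("San Diego, CA"))
print(geocode_profile("NYC!!"))
```

Profile text that matches nothing returns `None`, which keeps unresolvable locations out of the city-level counts.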
Real-time social media analytics (Trend Analysis, Word Clouds, Top URL,
web pages, Top Hashtags/Mentions/Stories).
Collect Tweets from the Top 31 U.S. Cities (17-mile radius)
31 different cities across the United States (chosen based on their population sizes): Atlanta, Austin,
Baltimore, Boston, Chicago, Cleveland, Columbus, Dallas, Denver, Detroit, El Paso, Fort Worth,
Houston, Indianapolis, Jacksonville, Los Angeles, Memphis, Milwaukee, Nashville-Davidson, New
Orleans, New York, Oklahoma City, Philadelphia, Phoenix, Portland, San Antonio, San Diego, San
Francisco, San Jose, Seattle, and Washington, D.C.
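One way to apply a 17-mile radius to geo-tagged tweets is a great-circle distance test against the city center. The sketch below uses the haversine formula. The city-center coordinates and function names are illustrative assumptions, and the slides do not say how the radius was actually enforced.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_city(lat: float, lon: float, center: tuple, radius_miles: float = 17.0) -> bool:
    """True if the point lies within radius_miles of the city center."""
    return haversine_miles(lat, lon, center[0], center[1]) <= radius_miles


SAN_DIEGO = (32.7157, -117.1611)  # approximate center, for illustration only
print(within_city(32.8328, -117.2713, SAN_DIEGO))  # a point in La Jolla
print(within_city(34.0522, -118.2437, SAN_DIEGO))  # a point in Los Angeles
```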
[Figure: number of tweets, with labeled values 10,678; 5,398; 4,947; 4,944; 3,279. Machine learning
panels for 2013 and 2014, with reported R = 0.90559 and R = 0.5566.]
Comparison between the national ILI rate and the 32-city tweeting rate, with
prediction up to Week 15. Red: national ILI rate. Purple: tweet rate for 2015-2016.
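Comparisons like this one are often summarized with a correlation coefficient between the two weekly series. The slides do not state which coefficient the reported R values are, so the sketch below assumes Pearson's r. The sample numbers are made up purely to exercise the function and are not data from this study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up weekly values for illustration only.
ili_rate = [1.2, 1.5, 2.1, 3.0, 2.4]
tweet_rate = [10.0, 14.0, 19.0, 31.0, 22.0]
print(round(pearson_r(ili_rate, tweet_rate), 3))
```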
How to Build a Flu Prediction Model?
Daily Patterns or Weekly Patterns?
Time Scale: Daily, Weekly, Monthly.
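Switching between daily and weekly time scales amounts to re-binning the same tweet dates. A minimal sketch follows, assuming ISO weeks, which start on Monday. Surveillance reports may define weeks differently, so the week definition here is an assumption, as are the function names.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def daily_counts(days: Iterable[date]) -> Counter:
    """Number of tweets per calendar day."""
    return Counter(days)


def weekly_counts(days: Iterable[date]) -> Counter:
    """Number of tweets per (ISO year, ISO week)."""
    counts = Counter()
    for day in days:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return counts


tweet_days = [date(2015, 1, 4), date(2015, 1, 5), date(2015, 1, 6)]
print(weekly_counts(tweet_days))
```

Dividing each bin by the total number of collected tweets in the same bin would turn raw counts into a tweeting rate.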
Client/Server System Design Framework
Next Step: Open Source Initiative (GPL license, copyleft)
Research Showcase #2: (Geo-tagged Tweets)
Monitor disaster impacts, recovery activities, and victims' needs
Spatial Clustering (Wildfire Tweets)
Comparing spatial clusters of DUI records (red dots, left side) and
tweets containing the keyword "drunk" (right side).
GIS map with DUI records; GeoViewer (search for "drunk" over two months)
@ReadySanDiego
@10News @SanDiegoCounty
@KPBSnews
@UTsandiego
Identify the network influence of each individual (who are the opinion leaders?)
Predicting the spread (speed, scale, and range) of social media messages in
different social networks (following, retweet, and mention relationships).
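A very simple proxy for network influence is how often an account is retweeted or mentioned by others, i.e. its in-degree in the interaction network. The sketch below ranks accounts that way. It is one of many possible influence measures and is not necessarily the one used in this project. The edge format and the account names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Each edge is (source_user, target_user, kind), kind being "retweet" or "mention".
Edge = Tuple[str, str, str]


def influence_ranking(edges: Iterable[Edge]) -> List[Tuple[str, int]]:
    """Rank accounts by how many times others retweeted or mentioned them."""
    in_degree = Counter(target for _source, target, _kind in edges)
    return in_degree.most_common()


# Made-up interactions for illustration only.
sample_edges = [
    ("user_a", "news_account", "retweet"),
    ("user_b", "news_account", "mention"),
    ("user_c", "county_account", "retweet"),
]
print(influence_ranking(sample_edges))
```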
Hyperlocal Relationship
From Online Connections to Offline Locations
Funded by
NSF Cyber-Enabled Discovery and Innovation (CDI) program, Award #1028177 (2010-2015).
http://mappingideas.sdsu.edu/
NSF Interdisciplinary Behavioral and Social Science (IBSS) program, Award #1416509
(2014-2018): Spatiotemporal Modeling of Human Dynamics Across Social Media and
Social Networks. http://socialmedia.sdsu.edu/
Human Dynamics in the Mobile Age (HDMA)
Why Choose Twitter?
80% of academic researchers use Twitter APIs to get their social media data.
1. Free and open access to data through APIs (you can write a program on your desktop
to download tweets automatically). But the free APIs have a 1%
data limit.
2. Large user base (500+ million users) and very popular in the U.S., Europe, and
Japan, but not in China, Taiwan, or Korea (China has a similar platform called
Weibo).
3. Easy to program in Python or PHP (Tweepy, TwitterSearch, etc.). Many
API libraries are now available.
4. Historical data and 100% of the data can be purchased from Twitter (but it is very
expensive).
5. Rich metadata tags in each tweet (time stamp, user, follower, platform, time
zone, text, URL, retweet, language, devices).
Other possible social media APIs: Flickr, Instagram, Foursquare, Yelp, YouTube.
Why not Facebook? The Facebook Graph APIs are VERY LIMITED and PROTECTIVE, with no
public data feed. You need internal connections to Facebook staff to
conduct research.
Next Step: Syndromic Surveillance (Under Development)
(tracking multiple symptoms: fever, cold, cough, vomiting, etc.)
http://vision.sdsu.edu/hdma/smart/syndromic
Designed for early detection of unknown disease outbreaks, such as Swine Flu and SARS
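Tracking multiple symptoms can start with counting how many tweets mention each symptom keyword. The sketch below does whole-word, case-insensitive matching for the four symptoms named above. The matching rule and function name are assumptions for illustration, and a real system would need to handle word variants and non-illness uses of these words.

```python
import re
from collections import Counter
from typing import Iterable

SYMPTOMS = ("fever", "cold", "cough", "vomiting")

# One whole-word, case-insensitive pattern per symptom keyword.
PATTERNS = {s: re.compile(r"\b" + re.escape(s) + r"\b", re.IGNORECASE) for s in SYMPTOMS}


def symptom_counts(texts: Iterable[str]) -> Counter:
    """Count tweets mentioning each symptom (at most once per tweet per symptom)."""
    counts = Counter()
    for text in texts:
        for symptom, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[symptom] += 1
    return counts


# Made-up tweets for illustration only.
sample = ["Fever and cough all night", "This cold is awful", "fever fever fever"]
print(symptom_counts(sample))
```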
Other Examples
Building a transformative research agenda for Big Data Science
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141185
Chinese Twitter:
Sina Weibo
Mapping Dynamic Urban Land Use Patterns with Crowdsourced Geo-tagged Social
Media (Sina-Weibo) and Commercial Points of Interest (POI) Collections in Beijing,
China (submitted to the journal CaGIS; currently under review).
Residential Areas
+ College Dormitory
Building a Comprehensive Food Environment Database:
Yelp, Foursquare, Flickr, and Google Places.
Building a Dynamic Urban Population Model:
Building an hourly population density
model for urban areas (for
disaster response and
management)
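One possible starting point for an hourly density surface is to bin geo-tagged points by hour of day and by grid cell. The sketch below does only that. The cell size, input format, and function name are assumptions, and the output is a count of observed points per cell, not a population estimate.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

Point = Tuple[datetime, float, float]  # (timestamp, latitude, longitude)


def hourly_grid_counts(points: Iterable[Point], cell_deg: float = 0.01) -> Counter:
    """Count points per (hour of day, grid row, grid column)."""
    counts = Counter()
    for timestamp, lat, lon in points:
        row = math.floor(lat / cell_deg)  # grid row index from latitude
        col = math.floor(lon / cell_deg)  # grid column index from longitude
        counts[(timestamp.hour, row, col)] += 1
    return counts


# Made-up points for illustration only.
sample_points = [
    (datetime(2016, 1, 1, 8, 5), 32.7157, -117.1611),
    (datetime(2016, 1, 2, 8, 40), 32.7159, -117.1615),
    (datetime(2016, 1, 1, 20, 0), 32.7157, -117.1611),
]
print(hourly_grid_counts(sample_points))
```

Points from different days fall into the same hour-of-day bin, which gives an average daily profile per cell.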