Predicting Temporal Variance of Information Cascades in Online Social Networks

Department of Computer Science and Engineering
National Institute of Technology Karnataka, Surathkal
Technical Report for Major Project
Course: CO499
Authors: Avinash Das (10CO18), Sriniketh Vijayaraghavan (10CO86), Thejaswi M.I. (10CO95)
Supervisor: Dr. P. Santhi Thilagam, Dept. of CSE, NITK, Surathkal
January 31, 2014
Contents
1 Introduction
  1.1 Origin of the Proposal
  1.2 Definition of the Problem
  1.3 Objectives
2 Review of Status of Research and Development
  2.1 International Status
    2.1.1 Modeling Information Diffusion
    2.1.2 Information Diffusion Process
    2.1.3 Behavior of social networks in case of crisis situations
  2.2 Novelty and Importance of the proposed project in the context of current status
3 Target Beneficiaries of the Proposed Work
4 Work Plan
  4.1 Methodology
  4.2 Time Schedule of activities giving milestones
Chapter 1
Introduction
1.1 Origin of the Proposal
Online social networks allow hundreds of millions of Internet users worldwide to produce and consume content, and they provide access to information on an unprecedented scale. They play a major role in the diffusion of information by increasing the spread of novel information and diverse viewpoints. Events, issues and interests emerge and evolve very quickly in social networks, and capturing, understanding, visualizing and predicting them are becoming critical expectations of both end users and researchers. Understanding the dynamics of these networks may help in following events (e.g. analyzing revolutionary waves), solving issues (e.g. preventing terrorist attacks, anticipating natural hazards) and optimizing business performance (e.g. optimizing social marketing campaigns). Researchers have therefore developed, in recent years, a variety of techniques and models to capture information diffusion in online social networks, analyze it, extract knowledge from it and predict it. Modeling how information spreads is of great interest for stopping the spread of viruses, analyzing how misinformation spreads, and similar tasks. This is the motivation for our research. In our analysis, we hope to discover the intrinsic patterns that lead to large and periodic cascades in a social network.
1.2 Definition of the Problem
After an extensive literature survey, we have a clearer picture of the problem of modeling social cascades. Over the course of this major project, we plan to locate the seeds (triggers) that are initially active in the social network and then model the cascade that follows. By modeling and observing the cascade that propagates from a given choice of seeds, we obtain snapshots of the network that can be used to predict the occurrence of such cascades. Hence, our problem statement is:
Predict the temporal variance of social cascades and thus curb the spread of misinformation in online social networks by denying environments favorable to the spread of the cascade.
1.3 Objectives
While a substantial amount of research has been done on influence maximization, a problem that has not received much attention is limiting the influence of a misinformation campaign. One strategy for dealing with a misinformation campaign is to limit the number of users who are willing to accept and spread it. Depending on the context in which the influence limitation problem is posed, different objective functions need to be considered. The objective can be to save as many nodes as possible, to limit the lifespan of the bad information campaign, or to maximize the effect of the good campaign in the presence of the bad campaign. We intend to focus on minimizing the number of nodes that end up adopting the bad campaign once the information cascades from the two campaigns are over. We plan to understand the propagation of multiple campaigns on Twitter by modeling them, and to curb the spread of misinformation by understanding its patterns of propagation through the network. In the end, we hope to propose a way to curb the spread of misinformation.
Chapter 2
Review of Status of Research and Development

2.1 International Status
2.1.1 Modeling Information Diffusion
In [1], the authors study information propagation in online social networks with particular emphasis on tie strength. They consider the following three parameters for modeling: beta (the number of nodes that will republish information), weight (the weight of a tie, defined in terms of betweenness centrality and the strength of the tie) and alpha (with higher alpha, the model tends to select ties with higher weights to republish the information, while ties with lower weights are preferred when alpha is negative).
The Flickr social network is used to model social cascades in [2]. In this paper the authors investigate how information propagates through social links in online social networks. They develop a model that mimics the spread and dissemination of information in the Flickr social network and produces cascades very similar to those found in real social networks.
Another model for predicting information diffusion in online social networks is presented in [3]. The authors explore modeling information diffusion in the spatio-temporal dimensions and propose a partial differential equation (PDE), specifically a diffusive logistic (DL) equation, to model the temporal and spatial characteristics of information diffusion. The paper addresses the question: "For a given piece of information m initiated by a particular user, called the source s, what is the density of influenced users at distance x from the source after a time period t?" The Digg dataset is used to validate the proposed DL model, which successfully predicts the density of influenced users for a given distance and a given time for both distance metrics. The authors also present the temporal and spatial patterns of information diffusion in a real dataset from Digg.
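To make the diffusive logistic idea in [3] more concrete, the following is a minimal sketch that integrates the generic form dI/dt = d * d²I/dx² + r * I * (1 - I/K) with an explicit finite-difference scheme and zero-flux boundaries. The function name simulate_dl, the parameter values (d, r, K), the constant growth rate, the grid and the starting densities are all illustrative assumptions; none of them are values fitted to the Digg data.

```python
def simulate_dl(initial_density, d=0.01, r=0.5, K=25.0, dx=1.0, dt=0.1, steps=100):
    """Explicit Euler integration of a diffusive logistic equation on a 1-D grid.

    initial_density[i] is the density of influenced users at distance i + 1
    from the source. All parameter defaults are assumed, not fitted.
    """
    density = [float(v) for v in initial_density]
    n = len(density)
    for _ in range(steps):
        updated = []
        for i, value in enumerate(density):
            # Zero-flux boundaries: a missing neighbour mirrors the cell itself.
            left = density[i - 1] if i > 0 else value
            right = density[i + 1] if i < n - 1 else value
            laplacian = (left - 2.0 * value + right) / dx ** 2
            growth = r * value * (1.0 - value / K)  # logistic growth towards K
            updated.append(value + dt * (d * laplacian + growth))
        density = updated
    return density


# Assumed starting densities at distances 1..5 from the source.
start = [5.0, 2.0, 1.0, 0.5, 0.2]
later = simulate_dl(start)
```

The explicit scheme is only stable for small time steps (roughly d * dt / dx² below 0.5), which the assumed defaults satisfy.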
In [4], the authors propose a linear influence model. Rather than acquiring knowledge of the social network and then modeling the diffusion by predicting which node will influence which other nodes, they focus on modeling the global influence of a node on the rate of diffusion through the network. They model the number of newly infected nodes as a function of which other nodes were infected in the past. For each node, they estimate an influence function that quantifies how many subsequent infections can be attributed to the influence of that node over time. No explicit knowledge of the network is necessary. The model accurately captures not only the influence each node has on diffusion but also how the diffusion unfolds over time. The authors find that the Twitter users with the most followers are not the most influential in terms of information propagation. They also observe that influence functions exhibit different shapes depending on the node type (newspaper, news agency, blog) and the topic of the information. The model has several extensions, such as accounting for novelty and accounting for imitation.
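The core of the linear influence idea can be sketched in a few lines: the expected number of new infections at time t + 1 is the sum, over nodes already infected, of each node's influence function evaluated at the time elapsed since its infection. In [4] the influence functions are estimated from data; in this sketch they are hand-picked lists, and the function name predict_volume and the node names are hypothetical.

```python
def predict_volume(infection_times, influence, t):
    """Expected number of new infections at time t + 1.

    infection_times maps node -> time step at which it was infected.
    influence maps node -> list where entry l is the number of follow-up
    infections attributed to that node l steps after its own infection.
    """
    total = 0.0
    for node, infected_at in infection_times.items():
        lag = t - infected_at
        curve = influence.get(node, [])
        # Nodes infected in the future, or whose influence has run out, add nothing.
        if 0 <= lag < len(curve):
            total += curve[lag]
    return total


# Assumed influence curves: a news agency with strong early influence, a blog with less.
influence = {"news_agency": [4.0, 2.0, 1.0], "blog": [1.0, 0.5]}
infection_times = {"news_agency": 0, "blog": 1}
volume = predict_volume(infection_times, influence, t=1)  # 2.0 + 1.0
```

Note that no edge list appears anywhere in the computation, which mirrors the point that explicit knowledge of the network is not required.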
2.1.2 Information Diffusion Process
The process of information diffusion involves the spreading of information through the network, resulting in the formation of information cascades. In [5], the authors use stochastic cellular automata to generate data and analyze the results. The paper discusses the impact of weak and strong ties and the effect of advertising on the spread of information. The authors find that when personal networks are large, strong ties are better for spreading news, whereas when personal networks are small, weak ties are better. Advertising helps to start the growth cycle, but its effect wanes later on.
In [6], the authors compare the structure of information diffusion in weblogs (WB) and microblogs such as Twitter (TW). Twitter feeds maintain a higher posting frequency, and both TW and WB show a superlinear distribution of contribution across users. WB is more self-sustained, with a large number of links pointing to other blog sites. In contrast, most URLs in TW are outbound. The WB network is more coherent globally, while TW is more decentralized and connected locally. This suggests that larger-scale information diffusion on TW is limited.
In [7], the authors go a step further and try to predict information diffusion with only partial information about the network. The paper proposes a general model that directly predicts the final contamination values without going through the whole diffusion process at each step and at each node, as most models do. Instances of the general model include the linear model, the logistic model, the positive linear model and the graph-based positive linear model. This predictive model uses machine learning: its parameters are learned from training data. It aims to predict the final state without modeling the whole diffusion process over the network, and it does not make the closed-world assumption common to most information diffusion models. It can learn to predict the final contamination states from data generated by different models, and it outperforms those models as soon as the information about the network structure becomes unreliable.
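As an illustration of the kind of stochastic cellular automaton described for [5], the sketch below assumes that advertising and each informed contact act as independent chances of informing an individual, so the probability of becoming informed in one period is 1 - (1 - a)(1 - b_w)^j (1 - b_s)^m for j informed weak ties and m informed strong ties. The function names, the toy network and the parameter values are hypothetical.

```python
import random


def inform_probability(adv, p_weak, p_strong, weak_informed, strong_informed):
    """Chance that an uninformed individual becomes informed in one period."""
    miss = (1 - adv) * (1 - p_weak) ** weak_informed * (1 - p_strong) ** strong_informed
    return 1 - miss


def step(informed, strong_ties, weak_ties, adv, p_weak, p_strong, rng):
    """One synchronous update of the automaton; returns the new informed set."""
    newly = set()
    for person in strong_ties:
        if person in informed:
            continue
        strong = sum(1 for q in strong_ties[person] if q in informed)
        weak = sum(1 for q in weak_ties.get(person, ()) if q in informed)
        if rng.random() < inform_probability(adv, p_weak, p_strong, weak, strong):
            newly.add(person)
    return informed | newly


# Toy personal networks (assumed): strong ties inside a group, one weak tie across groups.
strong_ties = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
weak_ties = {"b": ["c"], "c": ["b"]}
informed = {"a"}
rng = random.Random(42)
for _ in range(10):
    informed = step(informed, strong_ties, weak_ties, 0.01, 0.05, 0.3, rng)
```

Varying the size of the strong-tie groups and the number of weak ties in such a simulation is one way to explore the trade-off between tie types that the paper reports.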
2.1.3 Behavior of social networks in case of crisis situations
[8] quantifies and measures social influence, using breaking news such as the death of Michael Jackson to observe the spread of information on Twitter. Social influence occurs when one's thoughts or actions are affected by other people. The metrics used for social influence on Twitter are follower influence, reply influence and retweet influence. The authors develop a ranking system for these influences and then measure how social influence changes over time. Both the number of messages and the number of users can be used to assess influence. They measure the qualities that affect social influence the most and order them, thus giving each a rank. A related research-in-progress paper discusses the spread of tweets by means of popular hashtags such as #washooting. It categorizes tweets into categories such as opinion-related, information-related and action-related, and analyzes the percentage of total tweets in each. It also measures the time elapsed since the event and the percentage of each category of tweets within each time frame. This makes it possible to determine the nature of the information spread and the sentiment of people both immediately after the event and some time later, and this statistical analysis can be used to predict the nature of future spreads. Over half the tweets (55%) were retweets, passed on word for word from one author to another. This enriches our knowledge of how microblogging is used during a violent crisis.
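The per-time-frame category analysis described above can be sketched as follows. The tweets, their category labels and the six-hour window are made-up illustrations rather than data from the cited study, and the function name category_shares is hypothetical.

```python
from collections import Counter, defaultdict


def category_shares(tweets, window_hours):
    """Percentage of tweets per category within each time window after the event.

    tweets is an iterable of (hours_after_event, category) pairs.
    Returns {window_index: {category: percentage}}.
    """
    counts = defaultdict(Counter)
    for hours_after_event, category in tweets:
        window = int(hours_after_event // window_hours)
        counts[window][category] += 1
    shares = {}
    for window, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[window] = {cat: 100.0 * n / total for cat, n in counter.items()}
    return shares


# Hand-labelled toy sample (assumed): hours since the event and tweet category.
sample = [
    (0.5, "information"), (1.0, "information"), (2.0, "opinion"),
    (3.5, "action"), (7.0, "opinion"), (8.0, "opinion"),
]
shares = category_shares(sample, window_hours=6)
```

In this toy sample the first window is dominated by information-related tweets and the second by opinion-related ones, which is the kind of shift over time that such an analysis is meant to expose.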
2.2 Novelty and Importance of the proposed project in the context of current status
According to [9], there exists an epidemic threshold below which no epidemic forms and above which it spreads to a significant fraction of the network. The authors find that two effects, previously studied in isolation, combine cooperatively to drastically limit the final size of cascades. First, most people are aware of a story because they have been exposed to it by many of their friends. Second, despite multiple opportunities for infection within a social group, people are less likely to become spreaders of information with multiple exposures.
In an older paper [10], the authors try to minimize the spread of contamination by blocking a limited set of links. They propose a method for efficiently finding an approximate solution via a greedy algorithm. This problem is related to the influence maximization problem for social networks. The spread of contamination was previously thought to be best curbed by removing nodes from a network, but the authors propose removing links between nodes instead, and they also show that blocking links between the nodes with the highest out-degree is not always effective. The authors validate this using large-scale blog and Wikipedia datasets.
In [11], the authors address the problem of influence limitation, where a bad campaign starts propagating from a certain node in the network, and use the notion of limiting campaigns to counteract the effect of misinformation. Two campaigns, a good one and a bad one, thus compete for adoption by the nodes. This problem is considered to be NP-hard, just like the influence maximization problem. The authors investigate efficient solutions to the question: "Given a social network where a bad information campaign is spreading, who are the k influential people to start a counter-campaign if our goal is to minimize the effect of the bad campaign?" This is the eventual influence limitation problem.
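A much simplified sketch of the influence limitation setting is given below: two campaigns spread as competing independent cascades, and a greedy heuristic picks the k seeds for the good campaign that minimize the average number of bad adopters over repeated simulations. This is an illustration under stated assumptions rather than the algorithm of [11]: every edge shares one activation probability p, the good campaign wins when both campaigns reach a node in the same step, and the names run_cascades and greedy_limiters are hypothetical.

```python
import random


def run_cascades(graph, bad_seeds, good_seeds, p, rng):
    """Simulate two competing independent cascades; return the set of bad adopters."""
    state = {n: "good" for n in good_seeds}
    for n in bad_seeds:
        state.setdefault(n, "bad")  # assumption: good wins if a node is seeded by both
    frontier = list(state)
    while frontier:
        attempts = {}
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in state and rng.random() < p:
                    # Assumption: the good campaign wins simultaneous arrivals.
                    if state[u] == "good" or v not in attempts:
                        attempts[v] = state[u]
        state.update(attempts)
        frontier = list(attempts)  # only newly activated nodes spread next
    return {n for n, campaign in state.items() if campaign == "bad"}


def greedy_limiters(graph, bad_seeds, k, p=0.5, runs=200, seed=0):
    """Greedily pick k good-campaign seeds that minimise the mean number of bad adopters."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    chosen = []
    for _ in range(k):
        best, best_score = None, None
        for cand in sorted(nodes - set(chosen) - set(bad_seeds)):
            rng = random.Random(seed)  # same random stream for every candidate
            score = sum(
                len(run_cascades(graph, bad_seeds, chosen + [cand], p, rng))
                for _ in range(runs)
            ) / runs
            if best_score is None or score < best_score:
                best, best_score = cand, score
        if best is None:  # no candidates left
            break
        chosen.append(best)
    return chosen


# Toy directed follower graph (assumed): "s" starts the bad campaign.
graph = {"s": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["e"], "h": ["a"]}
limiters = greedy_limiters(graph, bad_seeds=["s"], k=1)
```

Because each candidate is scored by Monte Carlo simulation, the cost grows with the number of nodes, k and runs, so this form is only practical for small graphs.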
Chapter 3
Target Beneficiaries of the Proposed Work

Our project aims to identify anomalies that occur in social media. We aim to detect the campaigns that are most likely to cause a disturbance and to start counter-campaigns in order to stem the flow of the chaos that would ensue. We do this either by creating our own campaign on the network or by identifying the best alternative campaign that already exists and promoting the spread of that campaign over all others. This work is of high relevance to internal security, national security and disaster management. By being able to predict future occurrences of such events, we will be able to take appropriate countermeasures. Political campaigns and popular ideas that spread among the people are crucial to the area of internal security. We therefore believe that security analysts and disaster management teams would find high value in projects such as this one. The project aims to give a glimpse of future occurrences of major casualty events and to suggest immediate countermeasures that stem the flow of such events and prevent them from propagating.
Chapter 4
Work Plan
4.1 Methodology
We have been collecting Twitter messages as they were tweeted, from December until mid-January. We plan to use part of this data as training data on which to run our algorithm. We will then classify each campaign as good or bad. The moment we find a good campaign, we deem all others bad; the idea is to be able to classify what is good. By running several iterations over the training data, we train a learning algorithm to classify good and bad campaigns, and we then apply it to the remaining test data to validate the algorithm. The whole process is represented by the flowchart in Figure 4.1.
Figure 4.1: Flowchart depicting the methodology
4.2 Time Schedule of activities giving milestones
Our timeline for the project is as follows:
09/2013: Arrived at the problem statement
10/2013: Completed a thorough literature survey
09/11/2013: Proposed a theoretical MCICM model for information diffusion
12/2013: Started collecting Twitter stream data
25/01/2014: Completed collection of Twitter stream data
02/2014: Run the learning algorithm on the training data (planned)
03/2014: Run simulations on the test data and validate results (planned)
04/2014: Publish findings (planned)
We have now completed our data collection and will start running the MCICM model on the training data. Once that is complete, we will test our algorithm on the test data to validate whether we are actually able to find the good campaign and neutralize the spread of the bad campaigns. This is our goal and plan of action.
Bibliography
[1] Information propagation in online social network: a tie-strength perspective, Jichang Zhao, Junjie Wu, Xu Feng, Hui Xiong, Ke Xu
[2] Modeling Social Cascades in the Flickr Social Network, Bai Yu, Hong Fei, 2009
[3] Diffusive Logistic Model Towards Predicting Information Diffusion in Online Social Networks, Feng Wang, Haiyan Wang, Kuai Xu, 2012
[4] Modeling Information Diffusion in Implicit Networks, Jaewon Yang, Jure Leskovec, 2010
[5] Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth, Jacob Goldenberg, Barak Libai, Eitan Muller, 2001
[6] Comparing Information Diffusion Structure in Weblogs and Microblogs, Jiang Yang, Scott Counts, 2009
[7] Predicting Information Diffusion on Social Networks with Partial Knowledge, Anis Najar, Ludovic Denoyer, Patrick Gallinari, 2012
[8] Measuring Message Propagation and Social Influence on Twitter.com, Shaozhi Ye, Felix Wu, 2010
[9] What Stops Social Epidemics?, Greg Ver Steeg, Rumi Ghosh, Kristina Lerman, 2011
[10] Minimizing the Spread of Contamination by Blocking Links in a Network, Masahiro Kimura, Kazumi Saito, Hiroshi Motoda, 2008
[11] Limiting the Spread of Misinformation in Social Networks, Ceren Budak, Divyakant Agrawal, Amr El Abbadi, 2011