
Information Processing and Management xxx (2014) xxxxxx

Contents lists available at ScienceDirect

Information Processing and Management


journal homepage: www.elsevier.com/locate/infoproman

Large-scale evaluation framework for local influence theories in Twitter
Magdalini Kardara a,*, George Papadakis b, Athanasios Papaoikonomou a, Konstantinos Tserpes a, Theodora Varvarigou a
a National Technical University of Athens, Greece
b IMIS, Athena Research Center, Greece

Article info

Article history:
Received 4 September 2013
Received in revised form 18 June 2014
Accepted 19 June 2014
Available online xxxx

Keywords:
Evaluation framework
Social influence
Topic communities

Abstract

Influence theories constitute formal models that identify those individuals that are able to affect and guide their peers through their activity. There is a large body of work on developing such theories, as they have important applications in viral marketing, recommendations, as well as information retrieval. Influence theories are typically evaluated through a manual process that cannot scale to data voluminous enough to draw safe, representative conclusions. To overcome this issue, we introduce in this paper a formalized framework for large-scale, automatic evaluation of topic-specific influence theories that are specialized in Twitter. Basically, it consists of five conjunctive conditions that are indicative of real influence exertion: the first three determine which influence theories are compatible with our framework, while the other two estimate their relative effectiveness. At the core of these two conditions lies a novel metric that assesses the aggregate sentiment of a group of users and allows for estimating how close the behavior of influencers is to that of the entire community. We put our framework into practice using a large-scale test-bed with real data from 75 Twitter communities. In order to select the theories that can be employed in our analysis, we introduce a generic, two-dimensional taxonomy that elucidates their functionality. With its help, we ended up with five established topic-specific theories that are applicable to our settings. The outcomes of our analysis reveal significant differences in their performance. To explain them, we introduce a novel methodology for delving into the internal dynamics of the groups of influencers they define. We use it to analyze the implications of the selected theories and, based on the resulting evidence, we propose a novel partition of influence theories in three major categories with divergent performance.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

In the context of a social network, influencers are prominent individuals with special characteristics that enable them to affect a disproportionately large number of their peers with their actions. Their special characteristics are related to their individual activity and social background as well as to their position in the network (i.e., their connections with the other members). These influencers typically play a crucial role in a variety of scientific and business domains (Bakshy, Hofman, Mason, & Watts, 2011). For example, marketing campaigns could gain value from this process, since individual customers are prone to imitate their highly influential peers with respect to product adoption (Keller, Fay, & Berry, 2007). Instead of overwhelming the entire customer base with massive, but blind advertisements, marketing campaigns could target a small number of influential people. This cost-effective alternative is called viral marketing and is capable of achieving levels of product diffusion similar to those of traditional approaches (Brown & Hayes, 2008).

* Corresponding author. Address: National Technical University of Athens, 9 Iroon Polytechniou Str., 15773 Zografou Campus, Athens, Greece. Tel.: +30 2107772568.
E-mail address: nkardara@mail.ntua.gr (M. Kardara).

http://dx.doi.org/10.1016/j.ipm.2014.06.002
0306-4573/© 2014 Elsevier Ltd. All rights reserved.

Please cite this article in press as: Kardara, M., et al. Large-scale evaluation framework for local influence theories in Twitter. Information Processing and Management (2014), http://dx.doi.org/10.1016/j.ipm.2014.06.002

To facilitate such applications, a lot of research has focused on the identification of influencers. This effort led to the development of influence theories,1 which constitute formal models that estimate for every member of a social network the influence she exerts on her peers according to one or more criteria. Some theories study influence on a global scale, considering the activity and the user base of the entire social network. Most commonly, though, an individual's influence is local: she may be considered expert in a specific domain, but her opinion usually holds little weight outside this particular area. Based on this principle, local influence theories aim at identifying influencers among the members of individual communities, which are usually formed around a particular topic. Typically, the local theories are more accurate and efficient than the global ones, as they exclusively consider the activity and the dynamics inside the individual communities.
A major issue in the study of local influence theories is their evaluation. Over recent years, On-line Social Networks (OSNs) have provided researchers with powerful tools for studying the dynamics of influence diffusion. They contain a vast amount of user-generated content as well as explicit connections among their members, thus allowing for the analysis of social influence on an unprecedented scale. Still, influence constitutes a subjective concept and, as such, it is very hard to measure and track. Most works actually lack a formal methodology for evaluating the results produced by their theories. Instead, they typically resort to selecting a small sample of the top ranked users in order to assess their authority in the real world (e.g., their fame or the quality of their content) (Cha, Haddadi, Benevenuto, & Gummadi, 2010; Weng, Lim, Jiang, & He, 2010). This manual procedure, however, cannot scale to large volumes of data and, thus, is incapable of yielding representative, reproducible and generalizable results.
In this work, we aim to overcome this shortcoming by establishing a principled framework that is capable of evaluating local influence theories for OSNs on a large scale. It receives as input the groups of influencers they define (called prominent groups in the following) along with the rest of the community and the corresponding activity. The goal of our framework is to estimate the relative accuracy of influence theories in predicting activity patterns that denote an imitation by the rest of the community. Internally, our framework encompasses five conditions that should be satisfied by a prominent group with real influence over the other members of the community. These conditions can be summarized as follows:

1. real influencers comprise a small subset of the community,
2. they are able to affect their fellow members with limited cost, i.e., by accounting for a limited portion of the community's overall activity,
3. their activity is highly correlated with that of the remaining community with respect to an objectively measured metric,
4. their activity that is relevant to this metric chronologically precedes that of their peer community members, and
5. the volume of their activity that is relevant to this metric corresponds to a mere fraction of the overall activity this metric takes into account.
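
As a rough illustration of the first two conditions, the sketch below computes the share of users and the share of activity that a prominent group accounts for. The input layout, the function names and the threshold values are assumptions made for this example only; the text above does not fix concrete thresholds.

```python
from collections import Counter


def group_shares(posts, prominent):
    """Return (user_share, activity_share) of a prominent group.

    posts: iterable of (user_id, message_id) pairs for one community
           (hypothetical layout, chosen for this sketch).
    prominent: set of user ids marked as influencers by some theory.
    """
    per_user = Counter(user for user, _ in posts)
    if not per_user:
        raise ValueError("community has no recorded activity")
    total_posts = sum(per_user.values())
    group_posts = sum(n for user, n in per_user.items() if user in prominent)
    # Only prominent users that actually appear in the community are counted.
    user_share = len(set(per_user) & set(prominent)) / len(per_user)
    activity_share = group_posts / total_posts
    return user_share, activity_share


def is_compatible(posts, prominent, max_user_share=0.1, max_activity_share=0.2):
    """Check conditions 1 and 2 against illustrative (assumed) thresholds."""
    user_share, activity_share = group_shares(posts, prominent)
    return user_share <= max_user_share and activity_share <= max_activity_share
```

Condition 5 could be checked in the same way by restricting `posts` to the messages that the chosen metric takes into account.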

Conditions 1, 2 and 5 actually correspond to the pre-processing requirements of our framework. Their goal is to ensure that a prominent group is compatible with it, accounting for a limited portion of the activity and the user base of the underlying community. These are fundamental prerequisites for drawing safe conclusions from the analysis performed by our framework. The remaining two conditions encapsulate the real functionality of our framework. At their core lies an objectively measurable metric that correlates the activity of a prominent group with the rest of the community. To elucidate its functionality, consider a metric that assesses the aggregate sentiment of a group of users; a high correlation between the prominent group and the rest of the community members indicates that the stance of the former coincides with the overall mood of the latter. Failure with respect to either of these conditions indicates a theory that is inadequate in identifying real influencers. In contrast, an influence theory is effective if the individuals it marks as influencers satisfy both conditions. The stronger these conditions hold for them, the more effective the theory is. For instance, between two theories with similar performance, the one that exhibits higher correlation with the rest of the community is preferred.
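
To make conditions 3 and 4 more concrete, the following sketch correlates two time series of aggregate sentiment, one for the prominent group and one for the rest of the community, at several time lags. It is only an illustration under stated assumptions: the per-period sentiment values are taken as given, Pearson correlation is used as a stand-in measure, and the function names are hypothetical. The metric actually used by the framework is defined later in the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def lagged_correlation(group, rest, lag):
    """Correlate group[t] with rest[t + lag]; lag > 0 means the group leads."""
    if lag < 0 or lag >= len(group):
        raise ValueError("lag must satisfy 0 <= lag < len(group)")
    if lag == 0:
        return pearson(group, rest)
    return pearson(group[:-lag], rest[lag:])


def best_lead(group, rest, max_lag):
    """Lag in [0, max_lag] at which the two series are most strongly correlated."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(group, rest, k))
```

Under these assumptions, a high correlation at a positive lag would be consistent with the prominent group's sentiment preceding that of the community, while a peak at lag 0 would show agreement without temporal precedence.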
Given that all five conditions rely on objectively measurable metrics, our framework allows for comparing the performance of local influence theories on a large scale, without manual intervention. To put it into practice, we form a large-scale benchmark dataset that consists of real-world data. We actually draw our data from Twitter,2 which was selected for several reasons (Bakshy et al., 2011; Cha et al., 2010; Weng et al., 2010): it is one of the most popular OSNs in the field, it conveys ad hoc, yet clear and manageable rules for social interaction among its members, it abounds in dynamic topic communities and finally, it provides easy access to large volumes of user-generated content. In total, our test-bed comprises 75 topic communities from Twitter with more than 600,000 highly active users, who have posted over 6 million messages during a time period of 7 months. Hence, it is suitable for performing large-scale qualitative and quantitative analyses with our framework.

1 It should be stressed at this point that the term "influence theory" is used loosely in this work, since it does not refer to actual theories that provide insights into the behavior of social network users, explaining why some of them imitate the behavior of others. Instead, the term refers to influence ranking methods, which associate every user with a score that is proportional to an estimation of the influence she exerts on her peer members. By convention, though, these methods are termed influence theories in the literature. This convention is followed in this work, as well.
2 https://twitter.com/.

Our experimental study also includes several local influence theories from the literature. To facilitate the understanding of their functionality, we introduce a two-dimensional taxonomy that classifies them with respect to their scope and the metric they use for assessing influence. The former criterion partitions influence theories into global, local (i.e., topic-specific) and hybrid ones, while the latter criterion distinguishes the evidence they consider into textual, graphical and hybrid information. We then map the main influence theories for Twitter to our taxonomy and explain which ones are applicable to our settings. Our analysis results in the selection of five established and representative influence theories that have been widely used in the literature.
The outcomes of our thorough evaluation indicated significant differences in the performance of these influence theories. To explain the resulting performance patterns, we introduce a novel methodology for delving into the internal functionality of every theory in order to examine the dynamics of its group of influencers. In essence, it comprises a series of statistical analyses that reveal three aspects of each prominent group:

1. the levels of homophily among its members,
2. the versatility of their activity, and
3. the affinity between them, in terms of the frequency of their pairwise interactions.

The outcomes of this methodology advocate a tripartite categorization of influence theories: (i) those that form groups of influencers with strong ties among them, (ii) those that select unrelated, but individually powerful influencers, and (iii) those that mark ordinary users as influencers, who lack any sense of team spirit and exhibit low levels of collaboration. In practice, the last category yields poor performance with respect to our evaluation framework, while the first one identifies highly effective influencers, who coordinate with each other in order to spread their impact to the entire community. Similar effectiveness is achieved by the influencers of the second category, despite the limited collaboration between them, because they benefit from their individual merits.
On the whole, the main contributions of our work are the following:

• We formalize the problem of evaluating the performance of local influence theories on a large scale. We actually reduce it to checking five objectively-measurable conditions that provide strong indications of real influence exertion in the context of any social network.
• We put our evaluation framework into practice, testing five established local influence theories over a large dataset that comprises 75 Twitter communities with more than 600,000 users and 6 million tweets.
• We analyze the performance of the selected influence theories through a novel methodology that provides insights into their functionality and the dynamics of the prominent groups they define.
• We further analyze the performance of the selected influence theories, by introducing a two-dimensional taxonomy that classifies influence theories according to their scope and the influence metric(s) they employ. We apply it to the main influence theories for Twitter, but it is general enough to accommodate theories for any other social network.

The rest of the paper is structured as follows: in Section 2, we present the most important works in the field and organize them according to our two-dimensional taxonomy. Section 3 formalizes the notions that lie at the core of Twitter and, based on them, introduces our evaluation framework. In Section 4, we analyze the performance of selected influence theories with respect to the five conditions of our framework and in Section 5, we introduce a novel methodology for analyzing the internal dynamics of prominent groups. Finally, Section 6 concludes the paper, providing directions for future research.

2. Related work

Influence diffusion in real-world social networks has been the subject of various studies over the past few decades; see Katz, Lazarsfeld, and Roper (2005) for more details. Recently, it raised new interest among researchers, largely due to the popularity of OSNs, such as Facebook3 and Twitter. The user activity recorded by these systems actually allows researchers to study real-world social influence on an unprecedented scale. To the best of our knowledge, though, no prior work proposed a formal methodology for comparing the effectiveness of influence theories on a large scale. The only work that is relevant to our evaluation framework is a model proposed in Gayo-Avello (2013) for evaluating influence theories with respect to their resilience to abusive users. According to this model, a successful theory should assign low ranking scores to spammers and marketers, while reserving high ranking positions for verified accounts. The author performed a comparative analysis of several global, graph-based influence theories (see Section 2.1 for this categorization), including PageRank, HITS, TwitterRank (Weng et al., 2010) along with some variations of them. The outcomes demonstrate that most algorithms achieve a similar performance in terms of resilience to abusive users, with the exception of TwitterRank, which scored much lower than the others.

3 http://www.facebook.com.


In the following, we first examine the main influence theories for Twitter, organizing them in a two-dimensional taxonomy. Then, we go more in depth on three additional topics that are closely related to their development and use.

2.1. Influence theories

As explained above, Twitter offers a convenient platform for studying influence diffusion on a large scale. Thus, many influence theories have been developed especially for its social network. In this section, we elaborate on the most important ones so as to select those that will be examined in the context of our evaluation framework. To facilitate their understanding, we introduce a two-dimensional taxonomy that categorizes them in terms of their scope and the metric they use for assessing influence.
The scope of an influence theory determines its application area, i.e., the part of the social network that provides the candidate influential users as well as the evidence that characterizes their activity. This dimension partitions influence theories into the following categories:

• Global influence theories take into account the activity of the entire social network with the aim of identifying the overall most influential users. For example, consider a theory that identifies as influencers the users with the highest number of followers across the entire Twitter network (i.e., global indegree).
• Local influence theories exclusively consider the activity of a specific topic community with the aim of detecting the most influential of its members. For instance, such a theory selects as influencers the users with the highest number of followers within the current community (i.e., local indegree).
• Glocal influence theories involve a hybrid functionality, deriving evidence from the entire social network so as to detect the influencers of a specific community. As an example, consider a theory that marks as influencers the members of a community that have the highest global indegree.
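
The three indegree examples above can be contrasted with a short sketch. It assumes the follower graph is available as a list of (follower, followee) pairs and that a community is a set of user ids; these data structures and the function names are illustrative and not taken from the paper.

```python
from collections import Counter


def indegree(follows, candidates=None, followers_from=None):
    """Count followers per user.

    follows: iterable of (follower, followee) pairs (assumed layout).
    candidates: if given, only these users can be ranked.
    followers_from: if given, only followers from this set are counted.
    """
    counts = Counter()
    for follower, followee in follows:
        if candidates is not None and followee not in candidates:
            continue
        if followers_from is not None and follower not in followers_from:
            continue
        counts[followee] += 1
    return counts


def top_k(counts, k):
    """Users with the highest counts; ties are broken by user id."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [user for user, _ in ranked[:k]]


def global_indegree(follows):
    # Every user is a candidate and every follower counts.
    return indegree(follows)


def local_indegree(follows, community):
    # Candidates and followers are both restricted to the community.
    return indegree(follows, candidates=community, followers_from=community)


def glocal_indegree(follows, community):
    # Candidates come from the community, followers from the whole network.
    return indegree(follows, candidates=community)
```

The same community can therefore yield three different top-ranked users, depending on which scope is applied.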

With respect to scope, our analysis focuses on local influence theories for Twitter. Such influence theories are more effective in their predictions and more efficient in their functionality, given that they exclusively consider evidence drawn from the activity and the topology of individual communities.
The metric of an influence theory determines the type of information it takes into account when assessing the influence exerted by a specific user. This dimension distinguishes influence theories into the following categories:

• Graphical influence theories derive influence from the position of a node (i.e., user) on the social graph. In this category fall graph criteria, such as indegree, node centrality and betweenness.
• Contentual influence theories assess the influence of a user judging exclusively from the textual content she produces. Given, though, that there is no objective measure for explicitly assessing the quality of user-generated content, implicit estimations are typically employed in practice. In the context of Twitter, influence criteria that belong to this category consider the number of retweets and the number of mentions pertaining to an individual user. Their implicit assumption is that the higher the quality of the messages posted by a user is, the more frequently she is retweeted or mentioned.
• Holistic influence theories rely on a hybrid functionality that considers both the content produced by users and their position on the social graph.

Note that theories belonging to the first two categories can be further refined according to their granularity into atomic and composite influence theories; the former take into account a single source of evidence (i.e., a single metric), while the latter consider a combination of multiple metrics. For example, the holistic influence theories are composite by definition. Our analysis focuses on atomic graphical and contentual metrics, provided that they draw evidence exclusively from the information contained in individual topic communities.
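
One way to picture the two dimensions, together with the atomic/composite refinement, is as a small data model. The sketch below is an illustration only: the class and field names are hypothetical, and the example placements are read from the prose descriptions in this section rather than from Fig. 1.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    GLOBAL = "global"
    LOCAL = "local"
    GLOCAL = "glocal"


class Metric(Enum):
    GRAPHICAL = "graphical"
    CONTENTUAL = "contentual"
    HOLISTIC = "holistic"


@dataclass(frozen=True)
class Theory:
    name: str
    scope: Scope
    metric: Metric
    composite: bool = False  # False means atomic (a single metric)

    def __post_init__(self):
        # Holistic theories combine content and graph evidence,
        # so they are composite by definition.
        if self.metric is Metric.HOLISTIC and not self.composite:
            raise ValueError("holistic theories must be composite")

    def fits_analysis(self):
        """True for local, atomic, graphical or contentual theories."""
        return (
            self.scope is Scope.LOCAL
            and self.metric is not Metric.HOLISTIC
            and not self.composite
        )


# Example placements, as described in the text (assumed, not copied from Fig. 1).
local_indegree = Theory("Indegree", Scope.LOCAL, Metric.GRAPHICAL)
retweets = Theory("Retweets", Scope.LOCAL, Metric.CONTENTUAL)
twitter_rank = Theory("TwitterRank", Scope.GLOCAL, Metric.HOLISTIC, composite=True)
```

With this model, `fits_analysis()` encodes the selection rule stated above: only local theories based on a single graphical or contentual metric are retained.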
The outline of our two-dimensional taxonomy is depicted in Fig. 1 along with the influence theories for Twitter that have been mapped to it.4 We observe that some types of influence theories have not been explored in the literature yet, at least not in the context of Twitter. For example, there is no glocal theory that relies on contentual evidence. In contrast, the majority of Twitter theories employ holistic metrics for assessing influence and are local with respect to scope. In the following, we delve into the internal functionality of each theory in order to decide which ones are suitable for our analysis.

Fig. 1. Two-dimensional taxonomy of the influence theories for Twitter.

4 It is worth stressing that our taxonomy can be generalized to accommodate any influence theory, regardless of the social network it is crafted for. For example, Choudhury et al. (2010) introduce a generic glocal influence theory based on holistic metrics, while Tang, Sun, Wang, and Yang (2009) and Bodendorf and Kaiser (2009) present generic local influence theories that involve composite graph metrics. However, an exhaustive categorization of all influence theories for OSNs lies outside the scope of this work.

In Cha et al. (2010), the authors examine a series of atomic influence theories that employ local and global evidence. Their work actually investigates the impact of ordinary users as opposed to that of prominent Twitter users, such as celebrities. The local influence theories they consider are the Indegree, the Mentions and the Retweets Influence Theories, which are analytically presented in Section 3.3.
In Bakshy et al. (2011), the authors consider influence as the user's ability to post URLs which diffuse through their followers into the entire social graph of Twitter. This is a global theory that employs a regression tree model for predicting a user's influence in terms of the average size of all the cascades she triggers. As features for this model, they use a combination of graphical and contentual metrics: the number of followers, the number of friends, the number of tweets, the date of joining as well as the number of retweets by the author's immediate neighbors (past local influence) and by any other Twitter user (past total influence). The outcomes of their experimental study suggest that the past local influence and the number of followers are the most reliable factors for determining influence. The authors also conclude that marketing campaigns should consider every individual as an influencer, targeting many ordinary users in order to achieve high levels of adoption. We examine this idea in the context of topic communities through the Random Influence Theory that is presented in Section 3.3. Note also that the influence criterion based on the size of cascades is similar to the Retweets Influence Theory (cf. Section 3.3), when restricted to the boundaries of a topic community.
In Petrovic, Osborne, and Lavrenko (2011), the authors use a variety of contentual and graphical metrics in order to determine the probability that a tweet will be retweeted. They found that the global social features of the author (especially her indegree) have higher predictive accuracy than features extracted from the tweet itself. However, most of these global metrics are incompatible with our topic-specific analysis. Only the indegree is employed by the Indegree Influence Theory (cf. Section 3.3), adapted, though, to the local settings of topic communities.
In Purohit et al. (2012), the authors examine local influence in the context of brand-pages, extracting features from user interactions and user profiles. The former include the number of retweets, the number of replies and the number of mentions, while the latter involve the number of followers and the number of tweets on the topic. All these metrics are also included in our analysis; for instance, the number of topic tweets lies at the core of the Tweets Influence Theory, and the number of replies is closely related to the Mentions Influence Theory (cf. Section 3.3). In addition, the authors apply established link analysis algorithms, such as HITS and PageRank, on the explicit social graph of Twitter as well as on the implicit network that is formed by retweets. Their findings suggest that these algorithms yield higher performance than user profile features, especially for the explicit network of Twitter. However, the global evidence they employ is incompatible with our topic-specific analysis. Most importantly, though, such graphical evidence is inappropriate in the context of Twitter communities: their members participate in them through the content they publish and, thus, they are not necessarily interlinked.
A graph-based approach is also taken by Li, Bhowmick, and Sun (2011). Their algorithm quantifies the influence of every community member by using the underlying signed graph structure, which conveys the positive and the negative relationships between all individuals. Given that Twitter is not an explicitly signed network, we cannot include this approach in our analysis.
In Liu et al. (2010), the authors introduced a probabilistic model for mining direct and indirect influence between the nodes of heterogeneous networks. Their model combines contentual and graphical metrics in order to predict the probability that a user will retweet a friend's post. It was experimentally evaluated over Twitter and two other social networks, with the outcomes verifying significant improvements in predictive accuracy. This method, though, operates on the level of individual links between users; thus, its goal differs from that of our analysis, which aims at evaluating the accuracy of local influence theories in detecting highly influential community members.
Finally, Weng et al. (2010) introduced a novel glocal influence theory that is based on a modified version of Google's PageRank, intuitively called TwitterRank. To measure the local influence of individuals, it estimates the topical similarity between users based on global information about all other topic communities they participate in. Apart from contentual evidence, it also considers the link structure of the entire social graph of Twitter. As explained above, though, such global and glocal evidence is incompatible with our analysis, which exclusively considers influence theories that are based on information contained within an individual topic community.


2.2. Predicting peer pressure

An important problem in social network analysis is the task of predicting the effect of social influence on the behavior of associated users. The goal is actually to forecast the actions of individual users with the help of an influence theory and the historical data about their own activity and that of their affiliated users.
In Goyal, Bonchi, and Lakshmanan (2010), the authors proposed various static and time-dependent models that calculate the probability that a user will follow her neighbors into joining a particular group as well as the time this action will take place (within tight margins). Their models take into account both the topology of the social network and the history of users' activity. They were tested on a real-world dataset from Flickr,5 yielding high accuracy.
In Tan et al. (2010), the authors proposed a Noise Tolerant Time-varying Factor Graph Model (NTT-FGM) for modeling and predicting social actions. For higher predictive accuracy, NTT-FGM considers user attributes in addition to the structure of the social network and the history of user actions. The authors also introduced the notion of latent state, which indicates how likely a user is to perform an action at a specific point in time. The model was evaluated over real-world datasets from three heterogeneous social networks and was found to consistently outperform the baseline methods.
In Cosley et al. (2010), the authors evaluate two definitions of influence with respect to the probability of behavior adoption as a result of peer pressure. The first definition is based on snapshot observations of the network at the time right before the individual decides whether to adopt a behavior, while the second definition relies on detailed temporal dynamics. The relationship between those two types of influence was examined using data extracted from Wikipedia, with the outcomes confirming the hypothesis of a direct link between them.

2.3. Applications of influence theories

The notion of social influence constitutes the cornerstone of many business applications. The goal is actually to employ influence theories in advertising and marketing campaigns so as to increase their efficiency, achieving similar levels of adoption at a lower cost. This problem has been formalized as the influence maximization problem, i.e., the task of identifying a small subset of users that could maximize the spread of influence over the social network (Chen, Wang, & Wang, 2010). In the context of viral marketing, this problem translates into maximizing the consumption of a product by targeting a small number of highly influential users and giving them incentives to adopt it. The small number of users keeps the budget low, increasing the cost-effectiveness of marketing campaigns, while their high influence capacity ensures large product diffusion through word-of-mouth advertising.
In Domingos and Richardson (2001), the authors viewed the market as a social network that can be modeled as a Markov random field. They also introduced the notion of a customer's network value, which expresses the expected profit from sales to other persons that have been influenced by a particular customer. In Kempe, Kleinberg, and Tardos (2003), the authors considered the problem of selecting the most influential nodes and studied it in the context of the most widely adopted models in social network analysis. They proved that, under these settings, it constitutes an NP-hard optimization problem and provided approximation guarantees for efficient algorithms. This work has been the basis of several more recent studies on the subject.
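
The efficient algorithms in question are greedy: seeds are added one at a time, each time picking the user with the largest marginal gain in spread. For monotone, submodular spread functions, this greedy strategy is known to reach at least a (1 - 1/e) fraction of the optimal spread. The sketch below illustrates the idea with a deliberately simplified spread function (a seed reaches itself and its direct followers), which is an assumption of this example; the diffusion models studied by Kempe et al. require simulating cascades instead. All names are hypothetical.

```python
def one_hop_spread(seeds, followers):
    """Number of users reached when each seed reaches itself and its followers.

    followers: dict mapping a user id to the set of users that follow it.
    This coverage function is a simplified stand-in for a diffusion model.
    """
    reached = set(seeds)
    for seed in seeds:
        reached |= followers.get(seed, set())
    return len(reached)


def greedy_seeds(followers, k, spread=one_hop_spread):
    """Pick up to k seeds, each time adding the user with the largest marginal gain."""
    users = set(followers)
    for fans in followers.values():
        users |= fans
    seeds = []
    for _ in range(min(k, len(users))):
        base = spread(seeds, followers)
        # Iterating in sorted order makes ties resolve to the smallest user id.
        best = max(
            sorted(users - set(seeds)),
            key=lambda user: spread(seeds + [user], followers) - base,
        )
        seeds.append(best)
    return seeds
```

Replacing `one_hop_spread` with a Monte Carlo estimate of cascade size would bring the sketch closer to the setting of the cited work, at the cost of a randomized and more expensive evaluation.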
On another line of research, Cha et al. examined influence dynamics in Twitter and discovered that the top influential users get disproportionately more references than the ordinary users (Cha et al., 2010). They conclude, therefore, that the spread of information can be maximized by targeting a small number of opinion leaders. Their findings are in accordance with the traditional view on influence diffusion (Katz et al., 2005), but contradict modern theories that emphasize the role of interpersonal relationships among ordinary users (Bakshy et al., 2011). The latter works actually claim that marketing campaigns should target a large number of plain users in order to be successful.
Also related to influence maximization is the diffusion of innovation. In Ma, Yang, Lyu, and King (2008), the authors propose a novel way of tackling this problem through the heat diffusion theory of physics. They develop three diffusion models along with specialized algorithms for selecting the best individuals to receive marketing samples. Their outcomes verify that their framework outperforms previous work and that it is particularly effective in capturing diffusion of negative information. In Luu, Lim, Hoang, and Chua (2012), the authors examine macro-level diffusion with regard to behavior adoption using topology characteristics, such as degree distribution, along with the word-of-mouth influence stemming from neighboring nodes.

2.4. Homophily vs. social influence

The above works primarily considered social influence as the sole drive that persuades individuals to change their attitude so as to conform to the activity of their neighbors. However, this is not the only source of social correlation. The same effect can actually stem from homophily, a phenomenon that has been analytically examined in McPherson, Smith-Lovin, and Cook (2001). This work advocates that social ties are highly likely to be formed between individuals sharing several characteristics of their personality, such as beliefs, interests and demographic background (e.g., age, race and location). It is crucial, therefore, to distinguish social influence from homophily and other unobserved confounding variables that can induce statistical correlation between friends in a social network. This task lies at the core of several works (Anagnostopoulos, Kumar, & Mahdian, 2008; Aral, Muchnik, & Sundararajan, 2009; Fond & Neville, 2010).
Other works have examined the dynamics of social influence with respect to homophily. In Holme and Newman (2006), the authors proposed a model for combining these two factors of social correlation and explained how their balance can be controlled through a single parameter. The authors studied how homophily-based and influence-based dynamics affect the clustering of the network, concluding that homophily results in a large number of small clusters, while social influence dynamics generate large and coherent clusters. In Scripps, Tan, and Esfahanian (2009), the authors examined how homophily and influence affect the modeling of dynamic networks and introduced formal definitions for both factors. They also developed metrics for assessing the alignment between links and attributes under different strategies for using the historical network data. Their outcomes demonstrate that the importance of individual attributes in forming links changes over time.

5
http://www.flickr.com.

Please cite this article in press as: Kardara, M., et al. Large-scale evaluation framework for local influence theories in Twitter. Information
Processing and Management (2014), http://dx.doi.org/10.1016/j.ipm.2014.06.002

3. Problem formulation

In this section, we formally define our evaluation framework along with the fundamental notions that are related to it. We begin with the basic parts of Twitter that lie at the core of local influence theories, continue with the formalization of the theories that participate in the evaluation, and conclude with the definition of the five conditions that comprise our framework.

3.1. Twitter overview

Twitter constitutes one of the most popular micro-blogging services, encompassing a user base of more than 500 million registered users that post more than 340 million short messages per day.⁶ It is also one of the most popular OSNs among researchers studying influence diffusion (Bakshy et al., 2011; Cha et al., 2010; Weng et al., 2010). Responsible for Twitter's popularity are the unique characteristics that lie at its core:

1. Users are only allowed to post short messages of up to 140 characters, which are called tweets. This urges content providers to put all their talent into creating original, self-contained and witty messages that require the minimum attention and time from their readers. Thus, tweets can be easily understood, memorized and shared with other people, strongly resembling marketing slogans.
2. Twitter accounts are publicly available by default, thus encouraging and supporting the interaction between users. Any user can freely follow (i.e., register to) the accounts of others in order to receive their latest tweets. By following a specific user u, the subscriber explicitly denotes that u is of particular interest to them, either due to a common background (e.g., a hobby) or because of content quality (e.g., news services).

Also crucial for the success of Twitter are the usage patterns that were established by its users:

1. Tweets can be easily categorized in topics that have been freely defined by other users. This is typically done by adding, usually at the end of a tweet, one or more hashtags. These are annotations that consist of the symbol #, followed by one or more words or alphanumerics concatenated together (e.g., #twitter). Hashtags can be used, therefore, to identify groups of people that are interested in the same topic.
2. Twitter can operate as a platform for discussion among its members. By adding the annotation @gpapadis in their tweet(s), a user u can directly address the user gpapadis, who can later respond back in the same way (i.e., @u). This annotation is called mention and allows for detecting users that are engaged in bilateral discussions.
3. Users can share with their followers tweets that they find quite appealing or interesting, but have been authored by other users. This is typically done by posting the original tweet along with the special annotation RT @gpapadis so as to give credit to its first author (gpapadis in our case). This practice is called retweeting and allows for tracking the diffusion of a particular tweet in order to estimate its influence. The larger the cascade it triggers (i.e., the more users retweet it), the more influential it is (Bakshy et al., 2011).
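As an illustration of these three usage patterns, the following minimal Python sketch extracts hashtags, mentions and retweet credits from the text of a tweet. The function name parse_annotations and the regular expressions are hypothetical choices made for this example; they assume that user names and hashtags consist of word characters only, and they treat a user credited through RT @user as retweeted rather than mentioned.

```python
import re

# Assumed patterns: hashtags and user names are runs of word characters.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
RETWEET_RE = re.compile(r"\bRT\s+@(\w+)")


def parse_annotations(text):
    """Return the hashtags, mentioned users and retweeted users of a tweet."""
    retweeted = set(RETWEET_RE.findall(text))
    # Hashtags are lower-cased so that #Twitter and #twitter denote one topic.
    hashtags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    # Simplification: a user credited via "RT @user" is not counted as mentioned.
    mentioned = set(MENTION_RE.findall(text)) - retweeted
    return {"hashtags": hashtags, "mentions": mentioned, "retweets": retweeted}
```

Under these assumptions, a tweet such as "RT @gpapadis great talk #twitter @alice" yields the hashtag twitter, a mention of alice and a retweet of gpapadis.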

These Twitter practices lie at the core of our analysis of the interaction patterns between members of specific communities. For brevity, we call annotated tweets the messages that contain a mention or a retweet. We also call polarized tweets those tweets that express either a positive or a negative sentiment, as denoted by the appropriate emoticon. A positive tweet contains any of the following smileys: ":)", ":-)", ": )", ":D" or "=)", while a negative tweet is marked with ":(", ":-(" or ": (" (Barbosa & Feng, 2010). Tweets that contain both a positive and a negative emoticon are ambiguous and, thus, we exclude them from our analysis; they are quite rare, though, comprising just 1423 tweets out of the 6 million messages of our benchmark communities (cf. Table 2).
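The emoticon-based labelling described above can be sketched in a few lines of Python. The function name polarity is hypothetical, and the emoticon lists assume that the spaced variants ": )" and ": (" belong to the positive and negative sets, respectively; plain substring matching is used for simplicity.

```python
# Emoticon lists as read from the text; the spaced variants are an assumption.
POSITIVE = (":)", ":-)", ": )", ":D", "=)")
NEGATIVE = (":(", ":-(", ": (")


def polarity(text):
    """Label a tweet as 'positive', 'negative', 'ambiguous' or None (not polarized)."""
    has_pos = any(emoticon in text for emoticon in POSITIVE)
    has_neg = any(emoticon in text for emoticon in NEGATIVE)
    if has_pos and has_neg:
        # Tweets with both kinds of emoticons are excluded from the analysis.
        return "ambiguous"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return None
```

Only the tweets labelled positive or negative would then contribute to the polarized content of a community.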

6
http://en.wikipedia.org/wiki/Twitter accessed on November 06, 2012.


Table 1
Summary of the main notations employed in the definitions of Sections 3.3 and 3.4.

Symbol                  Description
⟨x, y⟩                  A directed edge denoting that user x follows user y
⟨m, u, t⟩               A triple denoting that message m was posted by user u at time t
GT, Gh, Gp              The social graph of Twitter, topic community h, prominent group p
Gp@k                    The prominent group comprising the top-k most influential users
VT, Vh, Vp              The users/nodes of Twitter, topic community h, prominent group p
|VT|, |Vh|, |Vp|        The size (number of users) of Twitter, topic community h, prominent group p
ET, Eh, Ep              The social connections (edges) in Twitter, topic community h, prominent group p
N(h), M(Gp), M(u)       The social metadata (tweets) of topic community h, prominent group p, user u
ASM(Gl)                 The value of the activity summary metric for (sub-)community Gl
PR(Gl)                  The value of the polarity ratio for (sub-)community Gl
M(Gl, ASM, ti, tj)      The social metadata of Gl posted during the time period [ti, tj] that are relevant to ASM
Neg(Gl), Pos(Gl)        The negative/positive social metadata produced in (sub-)community Gl
CAC(Gl, ASM, t1, t2)    The value of the cumulative activity curve w.r.t. ASM for Gl in the interval [t1, t2]
Frnd                    The random influence theory
Find                    The indegree influence theory
Fmnt                    The mentions influence theory
Frtw                    The retweets influence theory
Ftwt                    The tweets influence theory

Table 2
Technical characteristics of the 75 topic communities comprising the benchmark dataset of our experimental study.

                      Min.      Mean      Median    Max.        Total
Users                 502       8851      5901      88,156      663,855
Users in the graph    30.89%    48.67%    48.94%    69.98%
Tweets                7771      80,148    39,603    1,191,345   6,011,101
Tweets per user       3.26      8.35      6.60      30.33
Days active           92        195       203       207
Negative tweets       38        668       273       12,902      50,080
Positive tweets       28        917       669       8501        68,751
Internal mentions     548       8419      3438      172,432     631,415
Internal retweets     570       18,998    5661      471,588     1,424,835

It is worth stressing at this point that more advanced methods have been proposed in the literature for automatically detecting the sentiment of a tweet in a more comprehensive way; the interested reader can refer to Tsytsarau and Palpanas (2012) for more details. However, applying a more elaborate classification scheme for sentiment analysis lies outside the scope of this work. The reason is that such techniques are typically language- and application-specific, as they have to overcome the inherent challenges of the user-generated content posted in Twitter (Giannakopoulos et al., 2012):

1. High levels of noise in the form of spelling mistakes and missing or incorrect information.
2. Sparsity, as the size limitations minimize the information conveyed by individual messages.
3. Multilinguality, since a single message may contain words in multiple languages. Even when a single language is used, there are no metadata to indicate it or to facilitate its automatic identification.
4. Evolving, non-standard vocabulary, in the form of slang words and dialects that are commonly used in the casual communication between OSN users.

In the absence of any generic approach to sentiment classification in Twitter, we employ emoticons for identifying the sentiment of individual tweets. This approach overcomes the above challenges and is generic enough to apply to any topic community. It is also reliable enough for illustrating the functionality of our evaluation framework, as it is typically employed in the literature for building the ground-truth of polarized content (Go, Bhayani, & Huang, 2010).

3.2. Basic definitions

In order to facilitate the understanding of the following formulations as well as of the analyses presented in the subsequent sections, we first summarize the main notation they use in Table 1.
The members of Twitter and the connections between them are typically modeled through a graph. Each user is represented by a node, and every relationship between two users is denoted by an edge connecting the corresponding nodes. In fact, every edge ⟨x, y⟩ is directed, pointing from follower x (i.e., subscriber) to followee y (i.e., content provider). This abstraction effectively captures the topology of the underlying social network, but leaves out its activity. To model the latter, we additionally employ the notion of social metadata, i.e., the collection of tuples that record all content produced by Twitter users over time. An individual tuple is represented as ⟨m, u, t⟩, where m is a message posted on-line by user u at time t. Putting these two abstractions together, we formally define the social graph of Twitter as follows:

Definition 3.1. The social graph of Twitter is a graph GT = {VT, ET, M}, where VT is the set of vertices representing its users, ET is the set of directed edges denoting the social connections between them, and M is a function that associates every user u ∈ VT with the messages she has posted: M(u) = {⟨m, u, t⟩}.

M(u) encapsulates the entire writing activity of user u, which is called user metadata. It should be distinguished from the metadata of topic h, which is defined by a function N(h) and comprises all messages that contain a specific hashtag h along with their authors and the time they were posted. More formally:
$$N(h) = \{\langle m, u, t \rangle : hashtag(m, h) = \mathrm{true}\},$$
where hashtag(m, h) is a boolean function that returns true if the message m contains the hashtag h and false otherwise.
In general, the set of all tweets sharing a specific hashtag forms a topic community in Twitter that comprises their authors, as well. This approach has some limitations, mainly the fact that it fails to capture topic-related activity that is not denoted with a specific hashtag. Nevertheless, the practice of using hashtags to define a topic community is advocated by several works in the field: in Ernesto and et al. (2012), the authors use hashtags as surrogates for topics, in Pal and Counts (2011) the usage of relative hashtags serves as a quantifier of the user's relevance to a specific topic, while in Bruns and Burgess (2011), the authors examine the dynamics of politics-related topic communities as defined by hashtags. We follow the same convention and define a topic community as the metadata of topic h together with the social connections between its authors. More formally:

Definition 3.2. Given the social graph of Twitter GT along with a label h, the topic community corresponding to h is a sub-graph Gh = {Vh, Eh, N}, where N(h) is the set of topic metadata, Vh is the set of all users in VT that have posted at least one message with the label h (i.e., Vh = {u ∈ VT : M(u) ∩ N(h) ≠ ∅}) and Eh comprises all the edges of ET with both adjacent vertices contained in Vh (i.e., Eh = {⟨u1, u2⟩ ∈ ET : u1 ∈ Vh ∧ u2 ∈ Vh} ⊆ ET). The number of users participating in the community is referred to as the size of the community and is denoted by |Vh|.
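To make Definition 3.2 concrete, the following Python sketch derives a topic community from a follower graph and a collection of social metadata. It is a minimal illustration under stated assumptions: edges are (follower, followee) pairs, metadata are (message, user, time) tuples, hashtags are matched case-insensitively with a simple regular expression, and the function name topic_community is hypothetical.

```python
import re


def topic_community(edges, metadata, hashtag):
    """Return (V_h, E_h, N_h) for the given hashtag.

    edges:    iterable of (follower, followee) pairs, i.e., the set E_T
    metadata: iterable of (message, user, time) tuples
    hashtag:  topic label h, given without the leading '#'
    """
    label = hashtag.lower()
    # N(h): all tuples whose message contains the hashtag h.
    n_h = [(m, u, t) for (m, u, t) in metadata
           if label in re.findall(r"#(\w+)", m.lower())]
    # V_h: all users with at least one message in N(h).
    v_h = {u for (_, u, _) in n_h}
    # E_h: all edges of E_T with both endpoints in V_h.
    e_h = {(x, y) for (x, y) in edges if x in v_h and y in v_h}
    return v_h, e_h, n_h
```

Note that a follower who never used the hashtag is left out of Vh, so the edges pointing from or to that user do not appear in Eh.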
The members of a topic community typically differ in the degree of influence they exert over their peers. Some users are rather passive, while others excel in some aspect of the community, affecting the behavior of other members and setting relevant trends. We call influencers or prominent users those members that have established a prominent position inside a community. Collectively, the influential users of a specific community comprise a sub-community that is called prominent group. Formally, this group is defined as follows:

Definition 3.3. Given a topic community Gh, its prominent group is a sub-graph Gp = {Vp, Ep, M}, where Vp is the set of prominent users contained in Vh, Ep is the set of the edges between them in Eh (i.e., Ep = {⟨u1, u2⟩ ∈ Eh : u1 ∈ Vp ∧ u2 ∈ Vp}) and M is the function that returns the metadata of a prominent user (i.e., the same function as in Definition 3.1). The size |Vp| of the prominent group is called prominent size.

Note that for simplicity, the overall metadata of a prominent group Gp are denoted by M(Gp); that is, $M(G_p) = \bigcup_{u \in V_p} M(u)$. Note also that we generalize this notation to any (sub-)community Gl, i.e., to any sub-graph of Twitter's social graph that encompasses a specific set of users along with their social connections and the messages they have authored. In the following, we use the term (sub-)community in order to collectively refer to an entire topic community and to its prominent groups.

3.3. Local influence theories

A local influence theory is a formal model that aims at detecting the influencers inside a topic community. To this end, it associates every user with an influence ratio, i.e., an ordinal value that denotes their individual degree of influence. As an example, consider an influence ratio that is equal to the local indegree of a user, i.e., the number of followers she has within the community. After mapping users to a scale of influence ratio, an influence theory sorts them in descending order, and those placed at the top k ranking positions form the community's k-prominent group, symbolized by Gp@k.
More formally, a local influence theory is defined as follows:

Definition 3.4. Given a topic community Gh, a local influence theory is a mapping of its members Vh to a totally ordered set Oh, which assigns an influence ratio to each member u ∈ Vh through a function F: Vh × Gh → ℝ and ranks them in descending order according to the following relation: F(ui, Gh) ≤ F(uj, Gh) ⇔ o(ui) ≥ o(uj), where o(ul) denotes the ranking position of the user ul in Oh. The lower the ranking position o(ul) assigned to user ul, the higher is her influence, with o(ul) = 1 corresponding to the most influential user.
Two points are worth clarifying with respect to this definition:

• The influence ratio assigned to a particular user may depend not only on her own activity (i.e., user metadata), but also on information drawn from the entire community (e.g., her local indegree).
• Any evidence that stems from outside the topic community is ignored. For example, any external user uj ∉ Gh that follows a specific community member ui ∈ Gh is not considered in the local indegree of ui, until uj joins Gh.

Based on Definition 3.4, we now formalize the local influence theories that were selected in Section 3.1 for our large-scale evaluation:

1. The Random Influence Theory (Frnd) argues that every member of a topic community is an influencer. All users share, therefore, the same level of influence and are assigned the same influence ratio:
$$F_{rnd}(u, G_h) = c, \quad \forall u \in V_h.$$
2. The Indegree Influence Theory (Find) is based on the rationale that the larger the audience a person has, the more people are likely to adopt her opinions and, thus, the larger is her influence. Therefore, the corresponding influence ratio is analogous to the popularity of a user in terms of the number of her followers inside the topic community (i.e., the local indegree of her node). More formally:
$$F_{ind}(u, G_h) = |\{\langle x, u \rangle \in E_h\}|.$$
3. The Mentions Influence Theory (Fmnt) associates the influence of a user with her ability to get involved in discussions with other members of the community. The more sociable a user is, the more people get in contact with her and the higher is her influence. To quantify this notion, the influence ratio of a specific user is set equal to the number of internal mentions that pertain to her:
$$F_{mnt}(u_i, G_h) = |\{\langle m, u_j, t \rangle \in N(h) \wedge e(m, u_i) = \mathrm{true} \wedge u_i \neq u_j\}|,$$
where e(m, ul) is a boolean function that returns true if tweet m mentions user ul and false otherwise.
4. The Retweets Influence Theory (Frtw) assumes that the influence of a user depends on the value of the content she posts on-line. The more interesting her tweets are, the more people are likely to read them and the higher gets her influence. This subjective measure can be practically inferred from her internal retweet frequency, i.e., how many times her posts have been reproduced by her peer members in the topic community. More formally:
$$F_{rtw}(u_i, G_h) = |\{\langle m, u_j, t \rangle \in N(h) \wedge r(m, u_i) = \mathrm{true} \wedge u_i \neq u_j\}|,$$
where r(m, ul) is a boolean function that returns true if a tweet m is a retweet of a message originally posted by ul and false otherwise.
5. The Tweets Influence Theory (Ftwt) regards the volume of content that is produced by a particular user as strong evidence for her level of influence. The more prolific a user is, the higher is the likelihood that a community member will read her posts and the more influential she is. In this context, the influence ratio of a user u is equal to the number of tweets on the topic she has posted on-line:
$$F_{twt}(u, G_h) = |M(u) \cap N(h)|.$$
Having formalized the influence theories of our analysis, we are now able to build a principled framework for evaluating their predictive accuracy.
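The four data-driven influence ratios can be computed in a single pass over the topic metadata, as the following Python sketch illustrates. All names (influence_ratios, prominent_group) are hypothetical, mentions and retweets are detected with simple regular expressions that stand in for the boolean functions e and r, and ties in the ranking are broken alphabetically, which is an assumption of this example. Frnd is left out, since a constant ratio leaves the ranking entirely to tie-breaking.

```python
import re
from collections import Counter

RETWEET_RE = re.compile(r"\bRT\s+@(\w+)")  # stands in for r(m, u)
MENTION_RE = re.compile(r"@(\w+)")         # stands in for e(m, u)


def influence_ratios(v_h, e_h, n_h):
    """Compute the indegree, mentions, retweets and tweets ratios for all users in V_h."""
    ind = Counter(followee for (_, followee) in e_h)
    mnt, rtw, twt = Counter(), Counter(), Counter()
    for message, author, _ in n_h:
        twt[author] += 1
        retweeted = set(RETWEET_RE.findall(message))
        # Simplification: an "RT @user" credit is not counted as a mention.
        mentioned = set(MENTION_RE.findall(message)) - retweeted
        for user in retweeted:
            if user != author:  # u_i != u_j
                rtw[user] += 1
        for user in mentioned:
            if user != author:  # u_i != u_j
                mnt[user] += 1
    counters = {"ind": ind, "mnt": mnt, "rtw": rtw, "twt": twt}
    # Restrict to community members; users with no evidence get a ratio of 0.
    return {name: {u: c[u] for u in v_h} for name, c in counters.items()}


def prominent_group(scores, k):
    """Return the top-k users in descending order of influence ratio."""
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:k]
```

For instance, prominent_group(influence_ratios(v_h, e_h, n_h)["rtw"], 10) would correspond to the user set of Gp@10 under the retweets theory.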

3.4. Evaluation framework

The behavior of prominent users is rather crucial for a topic community, as their actions and opinions are imitated and propagated by a considerable part of their peer community members, even those out of their direct reach. This phenomenon is so intense that some aspects of a community's overall activity can be considered as an extrapolation of the corresponding behavior of influencers. To quantitatively measure its extent, we need to compare the aggregate activity of prominent users with that of the entire community on the basis of an objectively measurable metric. Such a measure takes as input (part of) a social graph and maps the corresponding activity to a numerical value. We call it activity summary metric and formally define it as follows:

Definition 3.5. Given a (sub-)community Gl, an Activity Summary Metric (ASM) is a function that maps Gl to the space of the real numbers, based on a particular subset of the information it conveys.
This is a general definition that can accommodate various metrics. As an example, consider a function that estimates the frequency of occurrence of specific topic-related terms in the tweets of the community members. Given, though, that such ASMs are both language- and topic-specific, the noisy, multi-lingual content of Twitter renders them incompatible with our evaluation framework, which makes no assumption about the quality and the characteristics of the topic communities it uses as benchmark data. Given also that the goal of this analysis is to illustrate the functionality of our framework, we exclusively consider as ASM a measure that relies on polarized content, as it is defined in Section 3.1. In essence, it quantifies the aggregate opinion expressed by the tweets of a community or a part of its users. We call it Polarity Ratio (PR) and define it as follows:


Definition 3.6. Given a (sub-)community Gl, its polarity ratio PR(Gl) is defined as follows:
$$PR(G_l) = \begin{cases} \dfrac{|Pos(G_l)| + 1}{|Neg(G_l)| + 1} - 1, & \text{if } |Neg(G_l)| \le |Pos(G_l)| \\[2ex] -\dfrac{|Neg(G_l)| + 1}{|Pos(G_l)| + 1} + 1, & \text{if } |Pos(G_l)| < |Neg(G_l)| \end{cases}$$
where Neg(Gl) ⊆ M(Gl) and Pos(Gl) ⊆ M(Gl) stand for the subsets of social metadata of Gl that correspond to negative and positive tweets, respectively, as defined in Section 3.1 on the basis of emoticons (i.e., in a language-independent, noise-tolerant way). |Neg(Gl)| and |Pos(Gl)| represent their cardinality, with |Neg(Gl)| + |Pos(Gl)| ≤ |M(Gl)|.
PR(Gl) takes values in the interval (−|M(Gl)|, +|M(Gl)|), with the positive ones suggesting the prevalence of positive tweets and vice versa. In more detail, a positive value n suggests that the positive tweets outnumber the negative ones by a factor of (approximately) n + 1; the opposite applies to negative values. Neutral sentiments correspond to a balance between negative and positive tweets (i.e., |Neg(Gl)| ≈ |Pos(Gl)|), thus yielding values very close to 0.
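The polarity ratio is straightforward to compute from the counts of positive and negative tweets. The Python sketch below assumes the piecewise form with add-one smoothing, i.e., (|Pos| + 1)/(|Neg| + 1) − 1 when positive tweets prevail or tie, and the mirrored negative value otherwise; the function name polarity_ratio is hypothetical.

```python
def polarity_ratio(n_pos, n_neg):
    """Polarity ratio of a (sub-)community with n_pos positive and n_neg negative tweets."""
    if n_neg <= n_pos:
        # Positive tweets prevail (or tie): the result is non-negative.
        return (n_pos + 1) / (n_neg + 1) - 1
    # Negative tweets prevail: the result is negative.
    return -(n_neg + 1) / (n_pos + 1) + 1
```

For example, 9 positive and 4 negative tweets give a ratio of 1.0, i.e., the positive tweets outnumber the negative ones by a factor of roughly 2, while swapping the two counts gives -1.0. Applied to the totals of Table 2 (68,751 positive and 50,080 negative tweets), the same function yields roughly 0.37.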
Based on the notion of ASM and its instantiation through PR, we now introduce the formal definition of our large-scale evaluation framework for influence theories. In more detail, we deem that the following conditions provide strong indications for true influence exertion when they are satisfied in conjunction:

(i) Correlation condition. The output of an ASM allows for estimating the proximity of prominent users' behavior with that of the entire community. A high correlation between ASM(Gp) and ASM(Gh) means that the behavior of the prominent group is representative of the entire topic community. In fact, the closer their values are over time, the higher is the impact of the prominent users on the community. We formally denote this condition as follows:
$$ASM(G_p) \approx ASM(G_h).$$
As an example, consider the polarity ratio: real influencers are expected to determine the overall opinion of a community in an unequivocal way and, thus, PR(Gp) and PR(Gh) should share the same sign and similar magnitudes.
(ii) Size condition. The relative size of the prominent group (i.e., |Vp|) plays a crucial role in the correlation between ASM(Gp) and ASM(Gh): the more users we consider as influencers, the closer is the behavior of the prominent group to the aggregate behavior of the community. The perfect match (i.e., ASM(Gp) = ASM(Gh)) actually corresponds to the extreme case, where Gp comprises the entire community Gh. In practice, though, the smaller the prominent group is, the more cost-effective is the corresponding application (e.g., viral marketing). Thus, it is fundamental to consider prominent groups with a size that is merely a fraction of the entire community. This condition is denoted as follows:
$$|V_p| \ll |V_h|.$$
(iii) Volume conditions. The volume of the content produced by a prominent group is a crucial factor in determining its actual influence over the entire community. Imagine a community where the polarized tweets considered by PR are monopolized by a handful of users, while the rest of its members are rather parsimonious, contributing less than 10% of the polarized content; although there can be no real influence relationship, the polarity ratio inevitably yields a high correlation between PR(Gp) and PR(Gh). It is evident, therefore, that a prominent group cannot comprise hyperactive users that overwhelm their peers with their content. Instead, eligible prominent users should account for a mere fraction of the community's content, both overall and with respect to the selected ASM. In more detail, our framework requires that real prominent users satisfy the following two conditions with respect to the volume of their content:
• The overall volume condition demands that eligible prominent users contribute a small portion of the entire community metadata, i.e., |M(Gp, t1, t2)| ≪ |M(Gh, t1, t2)|, where M(Gl, ti, tj) denotes the set of metadata that were posted by the users of Gl in the time interval [ti, tj] and |M(Gl, ti, tj)| stands for its cardinality.
• The metric volume condition requires that eligible prominent users produce tweets pertaining to the selected ASM, but that these amount to a small percentage of those community tweets that participate in the estimation of ASM, i.e., 0 < |M(Gp, ASM, t1, t2)| ≪ |M(Gh, ASM, t1, t2)|, where M(Gl, ASM, ti, tj) denotes the set of metadata considered by ASM that were posted by the users of Gl in the time interval [ti, tj] and |M(Gl, ASM, ti, tj)| stands for its cardinality. In the case of PR, the metric volume condition sets an upper bound on the portion of polarized tweets that are posted by prominent users.
(iv) Rate condition. Another important aspect of real influence exertion is the temporal relationship between ASM(Gp) and ASM(Gh), since the content accounting for ASM(Gp) has to chronologically precede the content that pertains to ASM(Gh). To understand this condition, imagine a small group of users that simply follows the overall sentiment of the community and is conservative with publishing polarized tweets; this group satisfies all the above conditions and, thus, it can be mistaken for a prominent group, even though it exerts no real influence over the rest of the community.

Given an ASM, we measure the relative rate of content production between two groups during a specific time interval [t1, t2] through the cumulative activity curve, a metric that is defined as follows:


Definition 3.7. Given an ASM along with a (sub-)community Gl that was active during the time interval [t1, t2], the Cumulative Activity Curve (CAC) quantifies the production rate of the content in Gl that is relevant to ASM and is defined as follows:
$$CAC(G_l, ASM, t_1, t_2) = \int_{t_1}^{t_2} \frac{|M(G_l, ASM, t_1, t)|}{|M(G_l, ASM, t_1, t_2)|} \, dt.$$
The cumulative count |M(Gl, ASM, t1, t)| increases monotonically with t, which makes it suitable for capturing the early adopter effect. CAC actually takes values in the interval [0, 1], with higher values indicating faster convergence to the final size of the produced content. In our settings that rely on PR, CAC expresses how fast the polarized content of a community Gh (or its prominent group Gp) was produced during the time interval [t1, t2].
To understand the rationale behind this measure, consider the 2-dimensional space of Fig. 2: the horizontal axis (x) corresponds to the time period between t1 and t2, and the vertical one (y) to the portion of polarized tweets. CAC essentially estimates the area under the curve that is formed by mapping to this space the value of the ratio |M(Gl, ASM, t1, t)| / |M(Gl, ASM, t1, t2)| between the times t1 and t2. The faster the curve converges to the line y = 1, the higher is the value of CAC and the higher is the rate of production.
The exemplary data that are presented in Fig. 2 actually illustrate the three general layouts of CAC; G1 represents a community that has produced most of its polarized content by the middle of the time period, taking a high CAC value: CAC(G1, PR, t1, t2) = 0.70. In contrast, G3 indicates a user group that posted the vast majority of its polarized tweets at the end of the examined time interval, taking a low CAC value: CAC(G3, PR, t1, t2) = 0.37. In the middle of these two extremes lies G2, whose polarized content is evenly distributed over the examined period, leading to a balanced value for CAC: CAC(G2, PR, t1, t2) = 0.50.
On the whole, the rate condition for real influence exertion is formulated as:
$$CAC(G_p, ASM, t_1, t_2) \gg CAC(G_h, ASM, t_1, t_2).$$
Note that CAC is by definition a normalized metric. Hence, it is suitable for comparing the activity of two user groups (e.g., Gh and Gp) that were active during the same time period on an equal basis: the group with earlier content production takes a higher value for CAC, regardless of the relative size of their relevant metadata.
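Since the cumulative count of relevant tweets is a step function of time, the area under the curve can be computed exactly from the posting timestamps. The following Python sketch is one possible discretization, not necessarily the procedure used in our experiments: it assumes numeric timestamps and divides the integral by the interval length t2 − t1, an explicit normalization introduced here so that the result falls in [0, 1]. The function name cac is hypothetical.

```python
def cac(timestamps, t1, t2):
    """Cumulative activity curve for the relevant tweets posted within [t1, t2].

    timestamps: posting times of the tweets that are relevant to the ASM
    """
    relevant = [s for s in timestamps if t1 <= s <= t2]
    if not relevant or t2 <= t1:
        raise ValueError("CAC needs a non-empty interval with relevant content")
    # A tweet posted at time s raises the cumulative fraction by 1/n for the
    # remaining (t2 - s) time units, so it adds (t2 - s)/n to the integral.
    area = sum(t2 - s for s in relevant) / len(relevant)
    # Assumed normalization by the interval length, giving a value in [0, 1].
    return area / (t2 - t1)
```

Under this sketch, content that is evenly spread over the interval gives a value of 0.5, content concentrated at the beginning approaches 1, and content concentrated at the end approaches 0, which mirrors the three layouts of Fig. 2.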
Plugging the above five conditions together, we can now formalize the task of evaluating the performance of an influence theory as follows:

Problem 1. Given an ASM along with a topic community Gh that was active during the time interval [t1, t2], an effective local influence theory identifies as prominent group a subset Gp ⊂ Gh that satisfies all of the following conditions:
(1) $|V_p| \ll |V_h|$,
(2) $|M(G_p, t_1, t_2)| \ll |M(G_h, t_1, t_2)|$,
(3) $0 < |M(G_p, ASM, t_1, t_2)| \ll |M(G_h, ASM, t_1, t_2)|$,
(4) $ASM(G_p) \approx ASM(G_h)$, and
(5) $CAC(G_p, ASM, t_1, t_2) \gg CAC(G_h, ASM, t_1, t_2)$.
The stronger these conditions hold, the more effective is the influence theory.
Fig. 2. Illustration of CAC with exemplary data from three communities G1, G2 and G3. (Plot not reproduced: the horizontal axis spans the time period between t1 and t2, and the vertical axis shows the portion of polarized tweets, ranging from 0.00 to 1.00.)

Note that the above definition is generic enough to accommodate a wide variety of activity summary metrics. For the aforementioned reasons, we exclusively consider the polarity ratio in the following. Note also that the last two conditions should be applied only in case the other requirements hold for the prominent group(s) at hand. In fact, the first three conditions filter out prominent groups that are incompatible with our evaluation framework; we cannot draw safe conclusions about prominent users that comprise a large part of a community's user base or influencers that are too prolific, dominating the overall or the metric-specific content of the community. The same applies to prominent groups that have not posted any relevant content for the selected ASM. In Section 4, we adhere to this execution order when applying our framework to the above influence theories.
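The execution order described above can be sketched as a two-stage check in Python. The class GroupStats, the functions is_compatible and effectiveness, and in particular the threshold max_share = 0.1 that stands in for the relation "much smaller than" are hypothetical choices for this illustration only; Problem 1 itself does not fix a threshold.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    users: int          # |V|
    tweets: int         # |M(G, t1, t2)|
    metric_tweets: int  # |M(G, ASM, t1, t2)|
    asm: float          # ASM(G), e.g., the polarity ratio
    cac: float          # CAC(G, ASM, t1, t2)


def is_compatible(p, h, max_share=0.1):
    """Conditions (1)-(3): size, overall volume and metric volume.

    max_share is an assumed stand-in for the "much smaller than" relation.
    """
    return (p.users <= max_share * h.users
            and p.tweets <= max_share * h.tweets
            and 0 < p.metric_tweets <= max_share * h.metric_tweets)


def effectiveness(p, h, max_share=0.1):
    """Conditions (4)-(5), evaluated only for compatible prominent groups."""
    if not is_compatible(p, h, max_share):
        return None
    return {
        "asm_gap": abs(p.asm - h.asm),  # smaller is better (correlation condition)
        "cac_lead": p.cac - h.cac,      # larger is better (rate condition)
    }
```

A prominent group that fails any of the first three conditions is reported as None rather than scored, so that theories are compared only on groups for which safe conclusions can be drawn.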

4. Evaluation of local influence theories

In this section, we put our evaluation framework into practice, comparing the influence theories of the previous section
on the basis of Problem 1. First, we document the process we followed in the creation of our large-scale, real-world benchmark
dataset. Then, we examine every condition individually, in the order it should be applied. The only exception is the
Size Condition: in its place, we consider the fixed set of prominent sizes k ∈ {10, 20, 50, 100}. We selected this particular set
of k values for a number of reasons: first, it allows for comparing the influence theories on an equal basis under several
different settings. Second, every value ki poses a suitable size for many applications and satisfies the condition ki ≪ |Vh| for
every benchmark community Gh, as explained in Section 4.1. Third, sizes of finer granularity (e.g., k ∈ {10, 20, 30, 40, ...})
result in negligible variations, while sizes of coarser granularity (e.g., k ∈ {10, 100, 1000, ...}) miss a lot of valuable information.
In contrast, the selected range results in a balanced performance variation that allows for approximating the optimal
prominent size, i.e., the one that yields the best performance for Problem 1 across most of the benchmark
communities.
Note that in the following, we refer to the combination of an influence theory with a specific prominent size as influence
settings.

4.1. Twitter dataset

In our experimental study, we considered the Twitter dataset that was employed in Yang and Leskovec (2011). It contains
more than 475 million tweets that were posted by 17 million distinct users in a time period of 7 months, from the beginning
of June 2009 until the end of December 2009. In total, it has recorded around 20–30% of the entire Twitter activity
during that period, thus constituting a representative sample of Twitter activity that suffices for drawing safe conclusions
about the influence patterns among its members. It lacks, however, any information about the social connections among
its users.
To cover this gap, we additionally consider the graph formed by the directed links between its users. We actually use the
snapshot of Twitter's graph that was employed in Kwak, Lee, Park, and Moon (2010), which dates from August 2009 and
chronologically coincides with the crawling period of the content we are considering. As a result, we were able to join
the two datasets through the usernames (i.e., Twitter accounts) they have in common; almost half of each community's
members were included in this snapshot, on average. This graph, however, lacks the temporal aspect of user connections,
i.e., the exact time they were formed as well as their evolution throughout the time period we are examining content-wise.
Consequently, the conclusions we draw with respect to the link structure of Twitter are an approximation of the actual
phenomena.
As explained in Section 3.2, topic communities are defined through hashtags. However, the use of a hashtag does not necessarily
entail any interaction with other members of the topic community; the author may not even see other tweets
with the same hashtag. To ensure a minimum level of social interaction within a topic community, we consider as eligible for
our study those communities that contain:

• more than 500 internal mentions (i.e., mentions of a community member posted by a fellow member), and
• at least 500 internal retweets (i.e., re-posts by community members of messages originally authored by a fellow member).

We additionally impose the following constraints on every community, in order to ensure that the communities we consider
are large enough to draw safe conclusions⁷:

• it contains more than 5000 distinct tweets,
• it comprises more than 500 users, and
• it entails more than 500 polarized tweets.

Our content dataset contains more than 49 million tweets marked with at least one hashtag, which correspond
to around 3 million distinct topics in total. Among them, 728 communities satisfy all the aforementioned constraints.
To select the most vibrant ones, we ranked them in descending order of the average number of tweets posted by
each member. The higher this number is, the more active the individual members are expected to be. We then

7
Note that all thresholds were set to a rounded value that approximates the mean one across all topic communities.


selected the top 100 ranking hashtags and carefully examined their content to ensure that they define real topic
communities.
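The selection procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `CommunityStats` record, its field names and the helper functions are hypothetical and do not stem from the original study; only the thresholds and the ranking criterion (average number of tweets per member) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class CommunityStats:
    """Hypothetical per-hashtag summary; field names are assumptions."""
    hashtag: str
    internal_mentions: int
    internal_retweets: int
    distinct_tweets: int
    users: int
    polarized_tweets: int


def is_eligible(c: CommunityStats) -> bool:
    """Apply the interaction and size thresholds stated in the text."""
    return (
        c.internal_mentions > 500      # more than 500 internal mentions
        and c.internal_retweets >= 500  # at least 500 internal retweets
        and c.distinct_tweets > 5000    # more than 5000 distinct tweets
        and c.users > 500               # more than 500 users
        and c.polarized_tweets > 500    # more than 500 polarized tweets
    )


def most_vibrant(communities, top_n=100):
    """Rank eligible communities by average tweets per member, descending."""
    eligible = [c for c in communities if is_eligible(c)]
    eligible.sort(key=lambda c: c.distinct_tweets / c.users, reverse=True)
    return eligible[:top_n]
```

The manual inspection of the top-ranked hashtags (removal of memes, generic and ambiguous hashtags) is a qualitative step and is deliberately not covered by this sketch.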
In this effort, we relied on previous works that evaluate hashtags with regard to their ability to define actual topics. In
Zubiaga, Spina, Fresno, and Martínez (2011), the authors actually distinguish hashtags into those defining real-world events
and the "memes", the latter being conversational topics or widespread annotations that are not related to actual events. As
examples, consider the hashtags #ff, which serves for suggesting users to follow, and #job, which annotates tweets advertising
a job opportunity. A similar approach is followed by Cui et al. (2012). Based on this categorization, we eliminated
memes from our analysis, as they are unlikely to form real topic communities and, thus, are inappropriate for our evaluation.
We also eliminated hashtags that are too generic to define a topic community, such as #web, as well as ambiguous hashtags
that refer to two or more topics. An example of the latter is #gr88, which stands for "green revolution" (the Persian calendar
year being 1388), but is also used as a reference to an online casino (gr88.com). The remaining 75 topic communities, after
the elimination, formed our benchmark data.⁸
Their technical characteristics are presented in Table 2. On average, every test-bed community was active for over
6 months and comprised almost 9000 distinct members. Approximately half of these users are also included in the dataset
considered in Kwak et al. (2010) which, as explained above, served as the basis for graph related information in our analysis.
Each community member individually posted more than 8 tweets on average. Content-wise, every community corresponds
to 80,000 tweets on average, 1/10 of which pertain to discussions between community members. The internal retweets cover
1/4 of the overall content, while 650 tweets are marked as negative and another 900 as positive. Note, though, that the higher
frequency of positive tweets does not imply that the balance is in their favor across all topics. In fact, the negative tweets
outnumber the positive ones in 1/3 of all communities.

4.2. Volume conditions

This section investigates the performance of the selected influence theories with respect to the two volume conditions.
We start with the metric volume condition, which measures the portion of polarized content per topic community that
stems from the prominent users defined by each influence setting.⁹ In fact, our goal is to distinguish prominent groups into
three major categories:

• Inactive prominent groups post neither positive nor negative tweets. Thus, they do not satisfy the first part of the metric
volume condition (i.e., 0 < |M(Gp, ASM, t1, t2)|) and cannot exert any real influence.
• Hyperactive prominent groups monopolize the polarized content of a community, producing a disproportionately high volume
of positive and negative tweets. The metric volume condition does not hold for them, since their influence comes at
an excessively high cost. In this study, we instantiated the second part of the metric volume condition as:

$4 \cdot |M(G_p, ASM, t_1, t_2)| \le |M(G_h, ASM, t_1, t_2)|.$

This means that we consider a prominent group as hyperactive if it produces more than 25% of the overall polarized content.
In such cases, the remaining polarized content may be insufficient for drawing safe conclusions about the mood of the
rest of the community members.
• Valid prominent groups are neither inactive nor hyperactive, thus satisfying the metric volume condition.

The outcomes of this analysis across all influence settings are presented in Table 3. Note that in order to draw safe conclusions
for Frnd, we applied it 1000 times for each prominent size k, employing in each iteration a random sample of k users
from every community.
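The three categories of the metric volume condition can be sketched as a small classification routine. The function name and its parameters are assumptions made for illustration; the only elements taken from the text are the requirement of a non-zero polarized volume and the factor 4, which corresponds to the 25% limit.

```python
def classify_prominent_group(polarized_by_group: int,
                             polarized_by_community: int,
                             factor: int = 4) -> str:
    """Classify a prominent group with respect to the metric volume condition.

    polarized_by_group:     number of polarized tweets posted by the prominent group
    polarized_by_community: number of polarized tweets posted by the whole community
    factor:                 4 corresponds to the 25% limit used in this study
    """
    if polarized_by_group < 0 or polarized_by_community < polarized_by_group:
        raise ValueError("group volume must lie between 0 and the community volume")
    if polarized_by_group == 0:
        return "inactive"        # violates 0 < |M(Gp, ASM, t1, t2)|
    if factor * polarized_by_group > polarized_by_community:
        return "hyperactive"     # produces more than 25% of the polarized content
    return "valid"
```

For example, a group that authored 250 of the 1000 polarized tweets of its community sits exactly at the 25% limit and is still considered valid under this reading, whereas 251 tweets would make it hyperactive.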
We observe that the inactive prominent groups are rather scarce, as they merely appear in a dozen of topic communities
for k = 10. As k increases, though, so does the contribution of influencers to the polarized content. As a result, inactive
prominent groups are reduced to just 5 for k = 20 and completely disappear for larger prominent sizes. In the same vein,
the increase in k increases the portion of hyperactive groups. In the case of Fmnt and Frtw, they remain a minority for prominent
sizes up to 50, but become the majority for k = 100. The relation between the mean and the median values actually
suggests that the distributions of these theories have a positive skew (i.e., towards smaller portions of polarized content).
For k = 100, though, the mean and the median values slightly exceed our limit of 25%, thus dividing evenly the individual
topic communities into valid and hyperactive ones. In contrast, the majority of the prominent groups defined by Ftwt are

8
Note that we cannot release publicly the actual content of the communities forming our benchmark dataset, due to restrictions imposed by Twitter. A
complete list of the topics it includes along with the detailed outcomes of the statistical analyses that are presented in the following can be found at:
http://l3s.de/~papadakis/EvaluationFrameworkStatisticalAnalysis.xlsx.
9
A similar analysis was conducted in Kardara et al. (2012) as well, with the difference that for every influence setting, it considered the aggregate
polarized content produced by the prominent groups across all topic communities. The problem with this approach is that it hides a lot of interesting
information, such as the inactive and the hyperactive prominent groups. Note also that there are many typos in Table 5 of Kardara et al. (2012), because the
numbers presented in the columns from 2 ("Positive Tweets") to 6 ("Retweets") have not been multiplied by 100, although they are presented as percentages.
Disregarding these typos, the two analyses lead to similar conclusions.


Table 3
Portion of the polarized content (%) produced by the prominent groups across all influence settings.

Group      Theory   Min.    Median   Max.    Mean    SD      Inactive   Hyperactive
Gp @10     Fmnt     0.04    7.33     61.95   10.54   11.92   0          9
           Frtw     0.00    6.70     62.30   10.54   11.86   2          9
           Ftwt     0.66    18.61    82.11   22.70   18.00   0          25
           Find     0.00    0.00     0.27    0.02    0.05    15         0
           Frnd     0.01    0.14     3.84    0.33    0.54    0          0
Gp @20     Fmnt     0.06    11.18    63.54   14.99   14.37   0          14
           Frtw     0.06    12.44    63.54   15.11   13.01   0          12
           Ftwt     0.85    26.95    82.40   30.95   19.49   0          41
           Find     0.00    0.01     0.43    0.04    0.08    5          0
           Frnd     0.01    0.36     3.93    0.58    0.64    0          0
Gp @50     Fmnt     0.34    19.26    71.89   22.04   16.86   0          25
           Frtw     0.23    19.60    65.31   22.35   15.25   0          28
           Ftwt     4.05    40.68    83.60   43.36   20.06   0          60
           Find     0.00    0.04     0.86    0.09    0.15    0          0
           Frnd     0.03    0.96     10.37   1.61    1.80    0          0
Gp @100    Fmnt     0.40    26.82    78.49   29.50   19.40   0          39
           Frtw     1.95    25.43    66.55   27.47   15.92   0          41
           Ftwt     15.59   52.16    89.98   52.77   19.73   0          68
           Find     0.00    0.07     1.15    0.15    0.21    0          0
           Frnd     0.12    2.00     19.08   3.01    3.01    0          0

hyperactive already for k = 20; for larger prominent sizes, the portion of valid prominent groups is reduced to less than 1/5.
This should be expected, since its prominent users are quite prolific by definition, contributing large volumes of polarized
content as well.
Consistent exceptions to these patterns are both Find and Frnd: none of the prominent groups they define across all influence
settings is hyperactive. Hence, they satisfy the metric volume condition in all cases except for the few inactive groups
that are defined by Find for small prominent sizes.
We continue our analysis with the overall volume condition, which estimates the portion of all tweets that is authored by
the prominent users. In this case, our goal is to distinguish the prominent groups into hyperactive and valid ones.¹⁰ As hyperactive,
we consider prominent users that have posted more than 25% of the overall content of the community, while the remaining
prominent groups are considered valid. Apparently, among the two categories, only the valid groups of influencers are
compatible with our evaluation framework.
The outcomes of this analysis are presented in Table 4. Note that the performance of Frnd was estimated using the same
1000 random samples of k users from each topic community as in Table 3. We observe that all theories exhibit similar behavior
to that under the metric volume condition: all prominent groups defined by Find and Frnd are valid across all influence settings,
while Ftwt is dominated by hyperactive groups already for k = 20. For Fmnt and Frtw, the portion of hyperactive groups
increases with larger prominent sizes, but is restricted to less than 1/3 for k up to 50. For k = 100, though, their prominent
groups are evenly partitioned between valid and hyperactive ones.
On the whole, we can conclude that Ftwt is incompatible with our evaluation framework; it defines hyperactive groups of
influencers even for small prominent sizes, violating both volume conditions. For this reason, we exclude it from the analyses
of the correlation and the rate condition. In contrast, Frnd and Find consistently satisfy both volume conditions across all influence
settings. Fmnt and Frtw exhibit similar behavior, defining valid prominent groups with respect to both volume conditions
across most topic communities and influence settings. To draw safe conclusions about their performance, we exclude from
the remaining analyses all those communities that correspond to hyperactive groups for either influence theory and for any
prominent size. Out of the 75 topic communities considered in our study, 30 consistently satisfy both volume
conditions for the selected influence theories. We exclusively consider these communities in the following. Note, though,
that Find defines inactive prominent groups for some of these communities: there are 10 such communities for k = 10, while
for k = 20 they are reduced to 4. In the following analyses, these communities are not taken into account when estimating the
performance of Find.

4.3. Correlation condition

The goal of this section is to investigate to which extent the aggregate sentiment of prominent groups matches that of
their peer community members. As explained above, we quantify this relation through the polarity ratio of the corresponding
user groups. In fact, our goal is to estimate the correlation between PR(Gh) and PR(Gp) across all benchmark communities
for all influence settings.

10
Note that there is no point in defining inactive prominent groups with respect to the overall volume condition. By definition, every user that participates in
a community has posted at least one relevant tweet and, thus, every prominent group has produced at least as many tweets as its prominent size regardless of
the influence theory that defines it.

Table 4
Portion of the overall content (%) produced by the prominent groups across all influence settings.

Group      Theory   Min.    Median   Max.    Mean    SD      Hyperactive
Gp @10     Fmnt     0       6.41     45.68   9.38    9.26    5
           Frtw     0.63    6.85     50.01   9.84    9.58    5
           Ftwt     4.16    20.25    55.54   22.96   11.90   25
           Find     0.00    0.01     0.35    0.02    0.05    0
           Frnd     0.01    0.15     2.18    0.28    0.32    0
Gp @20     Fmnt     0.89    10.32    56.89   12.88   11.18   7
           Frtw     2.65    10.91    55.14   13.76   10.74   8
           Ftwt     6.88    27.84    65.57   30.50   13.62   42
           Find     0.00    0.01     0.43    0.04    0.08    0
           Frnd     0.02    0.35     3.91    0.56    0.60    0
Gp @50     Fmnt     2.41    16.91    70.00   19.59   13.98   17
           Frtw     4.66    19.39    63.38   20.85   12.48   21
           Ftwt     12.93   40.07    81.71   42.18   15.35   65
           Find     0.00    0.04     0.86    0.09    0.15    0
           Frnd     0.05    0.86     10.28   1.42    1.53    0
Gp @100    Fmnt     4.80    22.95    79.45   26.46   16.18   34
           Frtw     6.45    25.68    72.14   26.40   13.09   40
           Ftwt     19.65   49.21    90.96   51.56   15.61   74
           Find     0.00    0.07     0.98    0.12    0.14    0
           Frnd     0.13    1.73     19.37   2.89    2.96    0
To this end, we first defined two variables for every influence theory Fi and prominent size kl: Xi,l denotes the polarity ratio
of the corresponding prominent groups over the 30 valid topic communities, while Yi,l represents the polarity ratio of the
remaining community members. We then estimated the Pearson correlation coefficient¹¹ $\rho_{X_{i,l},Y_{i,l}}$ for every pair of the
variables Xi,l and Yi,l; the higher the value of $\rho_{X_{i,l},Y_{i,l}}$ is, the more strongly the correlation condition holds for the
corresponding influence settings.
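As an illustration of this step, the following sketch computes the Pearson correlation coefficient between two equally long lists of polarity ratios, one entry per topic community. The function name and the input layout are assumptions; the computation itself is the textbook definition of the coefficient, not code from the original study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of numbers.

    xs: polarity ratios of the prominent groups, one per community (X)
    ys: polarity ratios of the remaining members, one per community (Y)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)
```

Under this sketch, one coefficient would be computed per influence setting, with communities that yield inactive prominent groups left out of both lists.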
The results of our analysis are presented in Table 5. A detailed breakdown of PR(Gh) and PR(Gp) per topic community for
the influence settings we are considering is given in Appendix A. Note that for Frnd and for each prominent size k, we considered
the same 1000 random samples of k users from every community that were used in the previous section.
We observe that there is a positive correlation for all influence settings, which increases with the increase
of k: the larger the prominent groups are, the larger is the portion of polarized tweets they produce and, thus, the closer is
their aggregate sentiment to that of the entire community (see next section for more details). We can deduce, therefore, that
small prominent groups are less accurate in predicting the overall mood of their peers.
Comparing the individual influence theories across the four prominent sizes, we observe that the correlation coefficient
for Frnd is lower than that of the other influence theories by 1/3 across most prominent sizes; there is a single exception for k = 10,
where Fmnt exhibits an even lower correlation. Frtw and Find take similar values in most cases, followed closely by Fmnt.
The overall highest value corresponds to Frtw and is very close to 1, denoting an almost linear relationship between the
aggregate sentiment of the prominent users and the rest of the community.
In the following section, we examine whether these correlations emanate from prominent users' tweets that chronologically
precede those of their peer members.

4.4. Rate condition

In this section, we evaluate the rate condition using the 30 topic communities that are valid for the selected four influence
theories across all prominent sizes. Remember that this condition expresses the relative rate of polarized activity between
the prominent groups and the rest of the communities and is defined as follows: CAC(Gp, ASM, t1, t2) ≫ CAC(Gh, ASM, t1, t2),
where CAC(Gp, ASM, t1, t2) corresponds to the CAC value of the prominent group and CAC(Gh, ASM, t1, t2) to the CAC value
of the remaining community. To quantify this generic relation, we define a new measure, called CAC deviation (ΔCAC), which
expresses the difference between these two measures:

$\Delta CAC = CAC(G_p, ASM, t_1, t_2) - CAC(G_h, ASM, t_1, t_2).$

11
The Pearson correlation coefficient ρX,Y is an established metric that expresses the linear dependency between two variables X and Y. It takes values in the
interval [-1, 1], with higher absolute values corresponding to a stronger correlation between X and Y. In fact, a value of |ρX,Y| = 1 indicates a completely linear
relationship of the form X = a · Y + b, where a, b ∈ R and a > 0 if ρX,Y = 1, or a < 0 if ρX,Y = -1.


Table 5
Pearson correlation between the aggregate sentiment of the prominent groups and the remaining community members.

Theory   Gp @10        Gp @20        Gp @50        Gp @100
Fmnt     0.09          0.40          0.55          0.66
Frtw     0.24          0.30          0.64          0.79
Find     0.26          0.49          0.63          0.66
Frnd     0.19 ± 0.14   0.26 ± 0.15   0.35 ± 0.16   0.42 ± 0.13

Table 6
ΔCAC over the selected 30 topic communities.

Group      Theory   Min.     Median   Max.     Mean     SD      Pos.Com.
Gp @10     Fmnt     -0.079   0.017    0.078    0.012    0.029   23
           Frtw     -0.079   0.013    0.043    0.001    0.024   21
           Find     -0.042   0.021    0.085    0.017    0.028   16
           Frnd     -0.529   -0.395   -0.076   -0.372   0.123   0
Gp @20     Fmnt     -0.079   0.015    0.051    0.011    0.025   24
           Frtw     -0.079   0.010    0.040    0.004    0.027   20
           Find     -0.042   0.018    0.058    0.017    0.028   20
           Frnd     -0.441   -0.243   -0.012   -0.225   0.120   0
Gp @50     Fmnt     -0.034   0.010    0.038    0.009    0.018   22
           Frtw     -0.041   0.010    0.029    0.004    0.020   22
           Find     -0.072   0.017    0.074    0.015    0.028   22
           Frnd     -0.258   -0.051   0.002    -0.068   0.069   3
Gp @100    Fmnt     -0.037   0.008    0.031    0.007    0.017   23
           Frtw     -0.047   0.002    0.027    0.001    0.017   19
           Find     -0.020   0.016    0.057    0.016    0.020   23
           Frnd     -0.085   -0.003   0.007    -0.014   0.024   7

Apparently, the rate condition holds only for a positive CAC deviation (ΔCAC > 0), with higher values indicating a stronger
"early adopter effect" for the influencers.
The distribution of ΔCAC across all influence settings is summarized in Table 6. Note that the rightmost column, named
"Pos.Com.", expresses the number of communities where ΔCAC takes positive values.
For Frnd, we used the usual 1000 random samples per topic community and prominent size. We observe that its median
and its mean ΔCAC take negative values in all cases. The same applies to its maximum values for k = 10 and k = 20. Inevitably,
Pos.Com. is zero for these prominent sizes, indicating that the polarized activity of the influencers defined by Frnd chronologically
follows that of the common members across all topic communities. Hence, the former exert no real influence over the
latter. Pos.Com. actually demonstrates that Frnd defines real prominent groups only for large prominent sizes and for a handful
of communities. Still, these cases correspond to very low ΔCAC, thus suggesting that their chronological advance is rather
limited.
Regarding the remaining influence theories, Pos.Com. indicates that they consistently define real prominent groups for at
least 2/3 of the considered communities. The only exception is Find for k = 10, which defines real influencers for just 16 topic
communities; this is caused, however, by the inactive prominent groups that are present in 10 of the selected communities.
We also observe that the maximum value of Fmnt, Frtw and Find decreases with the increase of prominent size, while the minimum
one increases. This means that the larger their prominent groups are, the smaller is the variance of ΔCAC across the 30
topic communities and the more consistent their behavior becomes. Also remarkable is the monotonic decrease in the value of
their median and mean ΔCAC with the increase of prominent size. This pattern suggests that the larger the prominent groups
defined by Fmnt, Frtw and Find are, the less agile they are in forecasting the overall mood of their community. Finally, it is worth
noting that Find consistently outperforms Fmnt and Frtw across all prominent sizes, achieving higher values for all statistical
measures.
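The statistics discussed in this section can be reproduced from per-community CAC values with a short routine such as the following. It is a minimal sketch under stated assumptions: the CAC values themselves are taken as given inputs (their computation is defined earlier in the paper and is not repeated here), and the function names and the dictionary layout of the result are hypothetical.

```python
from statistics import mean, median


def cac_deviation(cac_prominent: float, cac_rest: float) -> float:
    """CAC deviation: CAC of the prominent group minus CAC of the remaining community."""
    return cac_prominent - cac_rest


def summarize_deviations(deviations):
    """Summary statistics of the CAC deviation over a set of communities."""
    values = list(deviations)
    if not values:
        raise ValueError("at least one community is required")
    return {
        "min": min(values),
        "median": median(values),
        "max": max(values),
        "mean": mean(values),
        # number of communities where the rate condition holds (deviation > 0)
        "pos_com": sum(1 for v in values if v > 0),
    }
```

A community with a deviation of exactly zero is not counted in `pos_com`, in line with the requirement of a strictly positive CAC deviation.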

4.5. Discussion

The above analysis provides valuable insights into the performance of the selected local influence theories.
First of all, it reveals that Frnd fails to capture real influence exertion. The main reason is that it is oblivious to the activity
of individual users, defining influencers without taking any qualitative information into account. Its behavior with respect to
the rate condition advocates that the positive correlations reported in Table 5 cannot be attributed to the "early adopter"
effect. Instead, they merely demonstrate that the prominent users defined by Frnd follow the polarized activity of their peers.
Second, our analysis verifies that Fmnt, Frtw and Find exhibit real influence exertion under most settings. In fact, the conditions
of our framework reveal the following patterns for these theories: the larger the prominent size is, the more content

their influencers produce and the closer their polarized content is to the aggregate sentiment of the remaining community.
This comes, however, at the cost of delayed polarized activity, as the corresponding prominent users are less timely in
predicting the mood of the community.
Third, our analysis highlights the relative performance of these three influence theories. The influencers of Frtw excel in
approximating the community's aggregate sentiment, while those of Find yield the timeliest predictions. In both cases, Fmnt
follows closely. It is worth noting that Find manages to achieve such a competitive performance despite the lower (by
two orders of magnitude) volume of polarized and general content its influencers produce in comparison with those of Fmnt
and Frtw. Apparently, this performance should be attributed to the large audience of its influencers; even though they post a
negligible number of polarized tweets, they are able to reach a large part of the community.
Finally, our analysis demonstrates that Ftwt is incompatible with our evaluation framework; for most topic communities,
the overwhelming activity of its influencers violates both volume conditions even for small prominent sizes.
In the following section, we investigate the internal functionality of all five influence theories in an effort to explain the
aforementioned performance patterns.

5. Internal analysis of local influence theories

This section introduces a methodology for the internal analysis of local influence theories, i.e., for examining the dynamics
that occur inside the prominent groups they define. This methodology considers three aspects of the relations among prominent
users:

(i) their homophily, as inferred from the reciprocal links among them,
(ii) their affinity, in terms of the frequency of their communication, and
(iii) the versatility of their activity, as indicated by the overlap patterns among the prominent groups of different theories.

The ultimate goal of this methodology is twofold: (i) to explain the relative performance of influence theories, as documented
in the previous section, and (ii) to approximate the optimal prominent size, which achieves high performance across
most influence theories.

5.1. Homophily

Twitter users are free to post messages on any topic at any time, thus forming topic communities on-the-fly. As a result,
community members are not necessarily linked with explicit edges on the social graph. In fact, the links in Eh are likely to
represent relationships formed outside the community Gh, a situation that indicates people that have more things in common
than just the interest in topic h. In the same vein, reciprocal links¹² signal users that share even higher levels of homophily,
i.e., similar personal backgrounds in socially significant ways (Weng et al., 2010).
The goal of this section is to examine the levels of homophily that are shared by the prominent users of each influence
theory. We can infer this information from the topology of Ep: densely linked sub-graphs indicate strong similarities among
users, and vice versa. To this end, we estimated for all influence settings the reciprocity of the corresponding prominent
groups by calculating the portion of pairs of prominent users that are reciprocally connected. Formally, the reciprocity of
a prominent group Gp is defined as follows:

$\mathrm{reciprocity}(G_p) = \frac{|\{\langle x, y\rangle \in E_p : \langle y, x\rangle \in E_p\}| / 2}{|G_p| (|G_p| - 1) / 2} \cdot 100\% = \frac{|\{\langle x, y\rangle \in E_p : \langle y, x\rangle \in E_p\}|}{|G_p| (|G_p| - 1)} \cdot 100\%,$

where the numerator estimates the number of existing reciprocal links, while the denominator amounts to the total number
of possible reciprocal links inside Gp (i.e., all distinct pair-wise combinations of prominent users). Reciprocity takes values in
the interval [0, 100], with higher values corresponding to higher portions of reciprocal links. The results of our analysis are
summarized in Table 7.
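The reciprocity measure can be sketched as follows. The representation of the graph as a collection of directed (follower, followee) pairs and the function name are assumptions made for the example; the ratio itself follows the simplified right-hand side of the definition above.

```python
def reciprocity(members, edges):
    """Reciprocity (in %) of a prominent group.

    members: iterable with the identifiers of the prominent users
    edges:   iterable of directed (x, y) pairs meaning "x follows y";
             links that involve users outside the group are ignored
    """
    group = set(members)
    n = len(group)
    if n < 2:
        return 0.0  # no pair of users exists, so no reciprocal link is possible
    internal = {(x, y) for x, y in edges if x in group and y in group and x != y}
    # Every reciprocal link is counted twice here, once per direction,
    # which matches the denominator n * (n - 1) of ordered pairs.
    reciprocal_ordered = sum(1 for (x, y) in internal if (y, x) in internal)
    return 100.0 * reciprocal_ordered / (n * (n - 1))
```

For instance, in a group of three users where only two of them follow each other, one out of three possible reciprocal links exists, which yields a reciprocity of roughly 33.3%.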
Note that for Frnd and for every prominent size, we repeated our measurements over the same 1000 random samples that
were used in Section 4. We observe that its behavior remains relatively stable across all values of k: the median reciprocity
lies close to 5% and the mean one close to 6% in all cases. Therefore, the prominent users defined by Frnd share negligible
levels of homophily.
Regarding Find, we notice that its prominent groups exhibit substantially lower levels of reciprocity than Frnd. This clearly
indicates that the most popular users inside a topic community typically share little common background.
In contrast, the reciprocity values of Fmnt, Frtw, and Ftwt are much higher than that of Frnd. The maximum values actually
correspond to Fmnt and Frtw, with Ftwt following them closely. In all cases, reciprocity reaches its peak for k = 10 and
decreases with the increase of the prominent size. This means that the quadratic increase of its denominator
(i.e., pairs of prominent users) is larger than the increase of its numerator (i.e., reciprocally connected prominent users). For
k = 100, the reciprocity of these three theories actually gets rather close to that of Frnd, especially with respect to the median values.

12
In Twitter, two users are reciprocally connected to each other, if they follow one another.


Table 7
Reciprocity (%) of prominent groups across all influence settings.

Group      Theory   Min.    Median   Max.    Mean    SD
Gp @10     Fmnt     0.00    19.44    75.56   23.40   18.17
           Frtw     0.00    22.22    71.11   24.29   18.44
           Ftwt     0.00    8.33     80.56   14.77   18.01
           Find     0.00    0.00     17.86   0.44    2.32
           Frnd     0.00    5.00     32.14   6.33    5.75
Gp @20     Fmnt     0.65    14.29    66.84   19.49   15.33
           Frtw     0.00    15.03    63.68   19.11   15.74
           Ftwt     0.00    8.09     64.74   12.70   13.93
           Find     0.00    0.00     8.09    0.39    1.25
           Frnd     0.18    5.04     34.25   6.58    5.83
Gp @50     Fmnt     0.87    9.41     50.32   13.03   10.69
           Frtw     0.44    10.18    41.75   12.78   10.32
           Ftwt     0.00    6.78     57.82   9.29    10.11
           Find     0.00    0.13     8.42    0.61    1.31
           Frnd     0.65    5.08     29.28   6.43    5.26
Gp @100    Fmnt     0.56    7.13     39.76   9.43    8.31
           Frtw     0.42    5.93     33.82   8.79    8.10
           Ftwt     0.51    4.76     44.17   6.62    7.10
           Find     0.00    0.18     9.28    0.60    1.28
           Frnd     0.49    5.26     31.32   6.33    5.33

This implies that members of large prominent groups share the same average levels of homophily as any random pair of
users. We can, thus, consider the prominent size of k = 50 as the critical limit, above which the notion of prominent groups
is degenerated.
On the whole, we can conclude that the prominent groups of Fmnt, Frtw, and Ftwt exhibit high levels of homophily for sizes
up to 50, forming prominent groups with a densely connected graph structure. This applies neither to Frnd nor to Find
regardless of the prominent size.

5.2. Affinity

This section seeks to measure the affinity among prominent users, i.e., the extent to which they know each other and
socialize together. As a means of interaction, we consider the reference from one user to another. In the context of Twitter,
this is exclusively done through the annotated tweets, which come in the form of mentions and retweets. We can reasonably
assume that the more a Twitter user refers to another one, the closer is the affinity between them. Therefore, our goal is to
examine whether prominent users are more likely to refer to their fellow influencers than to other members of the same
community. To quantify this idea, we employ the following two measures:

Definition 6.1. Given a topic community Gh and its prominent group Gp, the internal mention probability (Pim) expresses
how likely it is for a prominent user in Gp to mention a fellow member of the prominent group. It is defined as follows:
$$P_{im}(G_p, G_h) = \frac{\mathit{mentions}(G_p, G_p)}{\mathit{mentions}(G_p, G_h)} \cdot 100\%,$$
where mentions(x, y) is a function that returns the number of mentions from the users in group x to the users in group y.

Definition 6.2. Given a topic community Gh and its prominent group Gp, the internal retweet probability (Pir) expresses
how likely it is for a prominent user in Gp to retweet a message that was originally posted by another prominent user in
Gp. It is defined as follows:
$$P_{ir}(G_p, G_h) = \frac{\mathit{retweets}(G_p, G_p)}{\mathit{retweets}(G_p, G_h)} \cdot 100\%,$$
where retweets(x, y) is a function that returns the number of tweets that were originally authored by a user in group y and
were reposted by a user in group x.
Both metrics are defined in the interval [0, 100], with higher values corresponding to more frequent interactions among
influencers. Therefore, the higher these probabilities are, the stronger the affinity between the prominent users is.
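As an illustration of Definitions 6.1 and 6.2, the following minimal Python sketch computes either probability from a list of (source, target) reference pairs. The data layout, the function names and the convention of returning 0 when the prominent users make no references at all are assumptions made for this example; they are not taken from the original implementation.

```python
from typing import Iterable, Set, Tuple

# Each interaction is a (source_user, target_user) pair: the source mentions
# (or retweets a message authored by) the target. This layout is an assumption.
Interaction = Tuple[str, str]


def count_references(interactions: Iterable[Interaction],
                     sources: Set[str], targets: Set[str]) -> int:
    """Number of references from users in `sources` to users in `targets`."""
    return sum(1 for src, dst in interactions if src in sources and dst in targets)


def internal_probability(interactions: Iterable[Interaction],
                         prominent: Set[str], community: Set[str]) -> float:
    """Percentage of the prominent group's references that stay inside it.

    Works for P_im (mention pairs) and P_ir (retweet pairs) alike.
    Returns 0.0 when the prominent users make no references at all
    (a convention chosen here; the paper does not specify this case).
    """
    pairs = list(interactions)
    total = count_references(pairs, prominent, community)
    if total == 0:
        return 0.0
    internal = count_references(pairs, prominent, prominent)
    return 100.0 * internal / total


# Toy community of five users, two of which are prominent (hypothetical data).
community = {"a", "b", "c", "d", "e"}
prominent = {"a", "b"}
mentions = [("a", "b"), ("a", "c"), ("b", "a"), ("b", "d"), ("c", "a")]
print(internal_probability(mentions, prominent, community))  # 50.0
```

In the toy data, the prominent users issue four mentions, two of which target a fellow prominent user, which gives a probability of 50%.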
We calculated these probabilities over all topic communities for the usual influence settings. For Frnd, we considered the
usual 1000 groups of k random users per topic community. Note that, by definition, there is a strong bias between Pim and
Fmnt as well as between Pir and Frtw: influencers with high mention (retweet) rate within the community are also expected to
be frequently mentioned (retweeted) by their peer prominent users, thus leading to a tautological observation. Nevertheless,
we include these combinations in our analysis in order to provide an indication of the maximum possible probabilities.

Table 8
Internal mention probability (Pim, left) and internal retweet probability (Pir, right) across all topic communities for the
usual influence settings. The tautological combinations are Fmnt under Pim and Frtw under Pir.

                   Pim                                         Pir
Group    Theory     Min.    Med.    Max.    Mean      SD        Min.    Med.    Max.    Mean      SD
Gp@10    Fmnt       0.00   32.97   88.00   35.63   21.75        0.00   24.36   72.49   29.11   18.62
         Frtw       0.00   24.29   87.37   26.39   19.10        0.00   31.56   73.18   34.08   20.47
         Ftwt       0.00    8.05   91.13   15.03   18.43        0.00   12.08   80.39   16.81   16.38
         Find       0.00    6.45  100.00   15.56   21.17        0.00   11.11  100.00   18.69   19.90
         Frnd       0.00    0.00    2.05    0.18    0.37        0.00    0.02    1.20    0.16    0.28
Gp@20    Fmnt       0.00   49.17   92.37   48.72   21.63        0.00   41.76   87.49   40.79   21.46
         Frtw       0.00   36.09   92.51   35.74   20.11        0.00   48.51   85.46   48.66   19.77
         Ftwt       0.00   17.24   91.44   21.09   19.75        0.00   24.05   70.67   27.12   18.23
         Find       0.00   15.09   90.91   21.57   21.47        0.00   22.03   76.47   25.45   18.75
         Frnd       0.00    0.19    2.51    0.41    0.52        0.00    0.27    1.99    0.40    0.44
Gp@50    Fmnt      22.14   66.43   97.76   64.94   18.69       15.30   58.48   91.59   56.04   19.52
         Frtw       0.00   53.38   94.98   50.61   21.23       25.32   68.66   96.27   66.65   17.11
         Ftwt       0.00   28.62   95.24   33.49   23.61        0.00   39.29   87.01   40.44   22.07
         Find       0.61   29.17   87.06   33.38   17.70        0.00   38.69   85.71   38.26   20.76
         Frnd       0.00    0.78    9.40    1.24    1.50        0.00    0.91    8.40    1.30    1.39
Gp@100   Fmnt      32.98   77.11   99.49   75.26   16.36       22.05   68.18   96.67   65.71   17.75
         Frtw       0.16   66.29   98.52   61.42   20.93       47.85   77.51  100.00   77.96   14.61
         Ftwt       0.17   38.45   98.25   42.26   24.36        3.23   52.70   93.93   51.22   21.71
         Find      10.74   41.18   92.58   42.31   18.05       11.46   47.27   92.05   48.65   19.13
         Frnd       0.11    1.57   18.36    2.34    2.66        0.12    1.39   17.49    2.42    2.66
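The random baseline Frnd, i.e., averaging over 1000 groups of k random users per community, could be approximated along the following lines. This is only a sketch: it assumes that the reference pairs have already been restricted to members of the topic community, and the helper names, the fixed seed and the ring-shaped toy data are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Set, Tuple


def internal_share(pairs: Sequence[Tuple[str, str]], group: Set[str]) -> float:
    """Percentage of the group's outgoing references that target the group."""
    outgoing = [dst for src, dst in pairs if src in group]
    if not outgoing:
        return 0.0  # convention assumed here for groups with no references
    return 100.0 * sum(dst in group for dst in outgoing) / len(outgoing)


def random_baseline(pairs: Sequence[Tuple[str, str]], members: List[str],
                    k: int, trials: int = 1000, seed: int = 0) -> float:
    """Mean internal share over `trials` random groups of k community members."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    return mean(internal_share(pairs, set(rng.sample(members, k)))
                for _ in range(trials))


# Toy community: 200 users, each mentioning the next one in a ring.
members = [f"u{i}" for i in range(200)]
pairs = [(members[i], members[(i + 1) % 200]) for i in range(200)]
baseline = random_baseline(pairs, members, k=10)
print(f"random baseline for k=10: {baseline:.2f}%")
```

For a sparse interaction graph such as this one, the chance that a randomly drawn group contains both ends of a reference is small, so the baseline stays close to 0, in line with the low Frnd values reported for small k.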
The outcomes of our evaluation are presented in Table 8. Looking at the results, we observe that Frnd takes very low values
for both probabilities across all prominent sizes: its median value is equal or very close to 0, while its mean one increases in
proportion with k, but remains lower than 3% in all cases. This behavior demonstrates the insignificant levels of interaction
between randomly selected prominent users, as they are more likely to refer to community members outside their
prominent group.
On the other extreme lie Fmnt and Frtw, which take by far the highest values for both probabilities across all prominent
sizes. In all cases, their median value is almost identical to the mean one, thus indicating normal distributions over the 75
communities. For Pim, Frtw takes values 10% to 20% lower than the maximum ones (i.e., those of the tautology of Fmnt). A
similar pattern is exhibited for Pir, where Fmnt lies within 10% of the tautology of Frtw. In every case, the larger the prominent size
is, the higher the corresponding probabilities are. In fact, for k = 10 and k = 20, their median and average probabilities take
values lower than 50%, thus indicating moderate levels of affinity between prominent users. Higher prominent sizes, though,
yield higher median and average probabilities, which exceed 50% to a considerable extent. This implies a rather strong
affinity, as prominent users are more likely to refer to a fellow influencer than to any other member of their community.
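The summary rows of Table 8 aggregate one probability per topic community. A minimal sketch of this aggregation, together with the coarse comparison of median and mean used in the discussion above, is given below. It assumes the population standard deviation, which the text does not confirm; the tolerance of 2 percentage points and the sample values are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, Iterable


def summarize(values: Iterable[float]) -> Dict[str, float]:
    """Summary statistics matching the table columns (Min., Med., Max., Mean, SD)."""
    data = list(values)
    if not data:
        raise ValueError("at least one community value is required")
    return {
        "min": min(data),
        "median": median(data),
        "max": max(data),
        "mean": mean(data),
        # Population standard deviation; whether the paper uses the
        # population or the sample formula is not stated, so this is a guess.
        "sd": pstdev(data),
    }


def median_close_to_mean(values: Iterable[float], tolerance: float = 2.0) -> bool:
    """True if median and mean differ by at most `tolerance` percentage points."""
    stats = summarize(values)
    return abs(stats["median"] - stats["mean"]) <= tolerance


# Hypothetical Pim values (in %) for five communities.
pim_values = [20.0, 30.0, 35.0, 40.0, 50.0]
print(summarize(pim_values))
print(median_close_to_mean(pim_values))
```

Agreement between median and mean is only a symptom of a symmetric distribution; a formal normality test would be needed for a stronger statement.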
In the middle of these two extremes lie Ftwt and Find, which exhibit similar behavior for both probabilities regardless of
the prominent size. For k = 10, they are closer to Frnd, as their median and mean values are lower than half the difference
between Frnd and Fmnt/Frtw. For larger k, though, the affinity patterns among their prominent users are extensively reinforced,
bridging the gap with Fmnt and Frtw to some extent. Nevertheless, their values for both Pim and Pir rarely exceed 50%, thus
indicating that the influencers they identify are more likely to interact with the rest of the community than with each other.
We can deduce, therefore, that on average, these two theories exhibit moderate affinity patterns for large prominent sizes.
To understand the variations of Pim and Pir across the individual communities, we examined the effect of community size
on them. In general, we expect that the more members a community encompasses, the lower the probability is for a
prominent user to interact with another influencer. To examine this assumption, we estimated the Pearson correlation between
the community size and the probabilities Pim and Pir across all communities and prominent sizes. The results demonstrate a
clearly negative correlation: the coefficient fluctuates between -0.20 and -0.45, with the lower values corresponding to
larger prominent sizes. This pattern applies to all influence theories for both Pim and Pir and is particularly intense for Find and
Frnd.
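For reference, the Pearson correlation between community size and an internal probability can be computed as in the following sketch. The implementation is the textbook formula written out in plain Python; the community sizes and Pim values in the example are hypothetical and serve only to show a negative coefficient.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: community sizes and the Pim of their prominent groups.
community_sizes = [1000, 5000, 9000, 20000]
pim_values = [60.0, 45.0, 40.0, 20.0]
r = pearson(community_sizes, pim_values)
print(f"r = {r:.2f}")  # negative: larger communities, lower Pim
```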
On the whole, we can conclude that the influencers of Fmnt and Frtw form tightly coupled sub-groups with frequent
references to each other, especially for prominent sizes larger than k = 20. In contrast, the prominent users of Frnd have
practically no ties between them, unless large prominent sizes are considered. Finally, Ftwt and Find define prominent groups with
moderate affinity that increases in proportion with k.


5.3. Versatile activity

In this section, we investigate whether different influence theories consider the same users as influencers in the context
of the individual topic communities. Given that every theory captures a different aspect of a community, the prominent users
shared by two of them correspond to highly influential members with versatile activity, thus affecting their peers in multiple
ways. To investigate this aspect, we consider cross-theory overlaps, i.e., the portion of common users among the prominent
groups of the same community and prominent size that are defined by different influence theories.
To measure the cross-theory overlaps, we employ the Jaccard similarity coefficient. Given two prominent groups of equal
size, G1@k and G2@k, their Jaccard similarity, J(G1@k, G2@k), is defined as the size of the intersection of their prominent users
divided by the size of their union:
$$J(G_1@k, G_2@k) = \frac{|V_1@k \cap V_2@k|}{|V_1@k \cup V_2@k|} \cdot 100\%.$$
In essence, Jaccard similarity expresses the portion of users that are common among the input groups. It takes values in the
interval [0, 100], with higher values denoting higher overlap.
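The definition above translates directly into code. The sketch below follows the formula as stated (intersection over union, expressed as a percentage); the function name and the treatment of two empty groups are assumptions made for this example.

```python
from typing import Iterable


def jaccard_percent(group_a: Iterable[str], group_b: Iterable[str]) -> float:
    """Jaccard similarity of two user groups, as a percentage in [0, 100]."""
    a, b = set(group_a), set(group_b)
    union = a | b
    if not union:
        return 0.0  # two empty groups: treated as no overlap (assumption)
    return 100.0 * len(a & b) / len(union)


# Two hypothetical prominent groups of size 4 that share two users.
g1 = ["u1", "u2", "u3", "u4"]
g2 = ["u3", "u4", "u5", "u6"]
print(round(jaccard_percent(g1, g2), 2))  # 2 shared / 6 distinct -> 33.33
```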
Note that we had to exclude Frnd from our analysis, since it does not provide any useful insights; its overlap with any other
theory is random by definition. For the remaining influence theories, we considered the 6 possible pair-wise combinations,
computing the overlap between their prominent groups for the usual four prominent sizes across all communities. In total,
we have 24 different combinations of prominent sizes and pairs of influence theories. The outcomes of our analysis are
presented in Table 9.
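The 24 combinations (6 pairs of theories times 4 prominent sizes) can be enumerated for a single community as sketched below. The sketch assumes that each theory yields a ranked list of users from which the top k are taken; the rankings used in the example are synthetic and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

SIZES = (10, 20, 50, 100)  # the four prominent sizes used in the analysis


def cross_theory_overlaps(rankings: Dict[str, List[str]],
                          sizes: Sequence[int] = SIZES
                          ) -> Dict[Tuple[str, str, int], float]:
    """Jaccard overlap (in %) of the top-k users for every pair of theories.

    `rankings` maps a theory name to its users, ordered from most to least
    influential (an assumed input format).
    """
    overlaps = {}
    for k in sizes:
        for t1, t2 in combinations(sorted(rankings), 2):
            top1, top2 = set(rankings[t1][:k]), set(rankings[t2][:k])
            union = top1 | top2
            overlaps[(t1, t2, k)] = (100.0 * len(top1 & top2) / len(union)
                                     if union else 0.0)
    return overlaps


# Synthetic rankings over 200 users for the four theories (Frnd is excluded).
users = [f"u{i}" for i in range(200)]
rankings = {
    "Find": users,
    "Fmnt": users[5:] + users[:5],      # shifted by five positions
    "Frtw": users[::-1],                # reversed order
    "Ftwt": users[100:] + users[:100],  # shifted by one hundred positions
}
overlaps = cross_theory_overlaps(rankings)
print(len(overlaps))  # 6 pairs x 4 sizes = 24
```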
We start with the pair of theories that corresponds to the highest similarity, namely Fmnt and Frtw. Their minimum overlap
exceeds 0 in all cases, which means that they share at least one influencer for every individual topic regardless of the
prominent size. In most of the cases, their median overlap is equal to the average of the minimum and the maximum ones, a
strong indication of a normal distribution across the individual communities. This evidence is also supported by the fact that
the median value is almost identical to the mean one across all k. In absolute numbers, both metrics exceed 25% in all
cases, thus suggesting that on average, 1/4 of the prominent users immersing themselves in discussions with other
community members generate content of high value, which is propagated to a wide audience, and vice versa. This is consistent
with the intuitive perception that both criteria are indicative of a user's ability to trigger a direct reaction from fellow users.
On the other hand, the lowest similarity consistently corresponds to Find and Ftwt. For k lower than 50, their median
Jaccard similarity is 0, thus suggesting that there is a cross-theory overlap solely for a handful of communities. Even for these
communities, the overlap of the prominent groups remains rather low, taking a maximum value of 20% and a mean one of
less than 3%. For larger prominent sizes, most metrics increase proportionally, with the mean one reaching 6% for k = 100.
Therefore, even for the largest prominent groups, only 1/20 of the most popular users are among the most prolific of the
community members. This means that popular users do not need to overwhelm the community with large volumes of
content in order to ensure their high levels of influence.

Table 9
Jaccard similarities between the prominent groups that are defined by different influence theories over the same topic
community for the usual prominent sizes.

         Theory1       Find    Find    Find    Fmnt    Fmnt    Frtw
         Theory2       Fmnt    Frtw    Ftwt    Frtw    Ftwt    Ftwt
Gp@10    Minimum       0.00    0.00    0.00    5.00    0.00    0.00
         Median       10.00   10.00    0.00   25.00   10.00   15.00
         Maximum      35.00   40.00   20.00   45.00   35.00   35.00
         Mean         11.93   11.87    2.07   28.13   12.20   13.47
         SD            9.11    9.61    3.78    9.29    8.27    8.14
Gp@20    Minimum       0.00    0.00    0.00   10.00    0.00    0.00
         Median       12.50   12.50    0.00   30.00   12.50   15.00
         Maximum      35.00   30.00   17.50   45.00   32.50   30.00
         Mean         12.57   12.77    2.83   28.47   12.77   14.33
         SD            7.84    7.62    3.71    8.14    8.03    6.89
Gp@50    Minimum       2.00    4.00    0.00   13.00    2.00    4.00
         Median       13.00   14.00    4.00   30.00   15.00   16.00
         Maximum      28.00   27.00   17.00   43.00   32.00   28.00
         Mean         14.33   14.52    4.41   28.76   14.91   16.49
         SD            6.16    6.03    3.91    7.01    6.97    5.68
Gp@100   Minimum       5.00    6.00    0.50   15.00    4.00    5.00
         Median       15.00   16.00    5.00   30.00   16.00   18.00
         Maximum      27.50   29.50   19.00   42.50   33.00   29.00
         Mean         15.31   16.17    5.73   28.75   16.87   18.22
         SD            5.29    5.72    4.26    6.54    6.80    5.14


Fig. 3. Dynamics inside the prominent groups of Fmnt and Frtw.

In the middle of these two extremes lie the overlaps of Fmnt and Frtw with Ftwt and Find. All four combinations exhibit highly
similar behavior regardless of the prominent size. They all indicate a normal distribution across the 75 communities, as the
mean overlap is very close to the median one as well as to the average of the minimum and the maximum value. In absolute
numbers, the mean overlap fluctuates between 12% and 18%, with the higher values corresponding to higher prominent
sizes. This implies that on average, around 1/6 of the k most prolific or popular users are heavily cited or discussed, with
their portion increasing slightly with the increase of k. These overlap patterns, which remain relatively stable across all
prominent sizes, are also consistent with earlier evidence that was derived after examining the relations between Find, Fmnt
and Ftwt out of context (i.e., over the entire social graph of Twitter and over the entire activity of its users) (Cha et al., 2010).

5.4. Discussion

The outcomes of our internal analysis suggest that influence theories can be distinguished into three main categories.
The first one includes those influence theories that define prominent groups with strong ties among their members. Their
prominent users are acquainted with each other, forming tightly connected groups with an organized, uniform behavior that
exerts significant influence over the rest of the community. These internal dynamics facilitate the collaboration between
influencers, enabling them to maximize their influence and to achieve high performance with respect to our evaluation
framework. Fig. 3 visualizes these dynamics. Among the theories examined in Section 4, Fmnt and Frtw fall under this category.
Their prominent users share high levels of homophily (especially for prominent sizes k ≤ 50) and high levels of affinity
(especially for prominent sizes k ≥ 50). Part of them also exhibits a highly versatile activity, contributing heavily to the content
and the discussions of the topic community as well as to the promotion of personal opinions. These patterns are intensified
for prominent groups with 50 members, thus providing a reliable approximation of the optimal prominent size.
The second category comprises influence theories that identify as prominent users individually powerful members who
fail to form a team with their fellow influencers. The reason is that they share low levels of homophily, they interact with
each other to a limited extent and their activity within the community is typically unilateral, exhibiting low versatility. These
internal dynamics are depicted in Fig. 4, demonstrating that the influencers defined by these theories rarely operate in
collaboration. To this category belong Find and Ftwt. The latter is incompatible with our evaluation framework, but the former
achieved high effectiveness. Its influencers are rather parsimonious with the content they contribute to the community, yet
they are able to predict accurately and on time certain aspects of the community's activity.

Fig. 4. Dynamics inside the prominent groups of Find and Ftwt.

Finally, the third category entails those influence theories that define prominent users who form a team in name only.
They do not associate with each other, as they are highly likely to be unknown to each other or to have different social
backgrounds (i.e., absence of homophily). They also appear to be rather indifferent to the activity of the community, as they
rarely excel in any of its aspects. It is no wonder, therefore, that they fail to synchronize and collectively influence their peers,
thus yielding rather low effectiveness with respect to our evaluation framework regardless of the prominent size. In our
study, this category is exemplified through Frnd. The dynamics of its prominent groups are illustrated in Fig. 5.

Fig. 5. Dynamics inside the prominent groups of Frnd.
Another point worth considering is whether our findings for Twitter (i.e., influential individuals affecting the sentiment of
a topic community) translate to actual influence in the real world. Previous studies (O'Connor et al., 2010; Abbasi, Chai,
Liu, & Sagoo, 2012) suggest that the sentiment expressed in social media is strongly correlated with the opinions of people
in the real world to such an extent that sentiment detection in social media could act as a surrogate for polls. From that point
of view, it seems that influential social media users can have an effect on the opinions of the general population.
To clarify the comparison between the real and the online world and to further elucidate the results of our analysis, we examine
the prominent users defined by each influence theory in terms of their real-world status and identities where applicable.
Two topic communities were selected for this purpose (footnote 13): #iranelection, which refers to the Iran presidential election held on 12
June 2009, and #noh8, which pertains to the NOH8 Campaign, promoting marriage, gender and human equality. Their
prominent groups for k = 20 are listed in Tables 10 and 11, respectively.
Starting with #iranelection, we observe that the prominent group of Frtw consists almost entirely of journalists, bloggers
and activists, the majority of whom are Iranian. The prominent group defined by Find contains an equally high number of news
agencies and journalists/political bloggers. The influencers of Fmnt are similar to those of Frtw with the addition of a few
random users and seemingly unrelated celebrities. The prominent group of Ftwt seems to be the least relevant, merely containing
a handful of accounts that are associated with Iran and the Green movement. An interesting observation is that two accounts
are selected as influencers by all theories, except for Ftwt: oxfordgirl and persiankiwi. Both of them correspond to ordinary
users from Iran, who were previously unknown, but gained reputation as reliable sources on the topic through their
dedication and active participation in the community. It is also worth noting that a large part of this community's influencers
(including oxfordgirl and persiankiwi) also appear in the prominent groups of relevant topics, such as #neda.
For #noh8, we observe that Find, Fmnt and Frtw contain an equal mix of celebrities and random users along with a few
accounts that belong to human rights organizations fighting against discrimination. Under closer examination, though, most
of the random users appear to be gay activists or people with strong participation in gender equality issues. Again, the Ftwt
theory produces the least relevant results, with more than two thirds of the accounts being disabled or suspended at the
moment, a strong indication that they exhibited spamming or marketing behavior. It is also worth mentioning that there
is no overlap between the prominent groups of the two topic communities across all influence theories.
The above findings are consistent with the results of the previous analyses, which demonstrate that Find and Frtw are the
most reliable methods for detecting influencers, while Ftwt does not convey real influence (Purohit et al., 2012). We can
deduce, therefore, that Find and Frtw incorporate different, but equally important types of evidence, with the former
considering users' popularity as a reliable estimation of their influence and the latter relying on their ability to produce interesting
content. Finally, it is important to stress that influence appears to be context-specific, since there is negligible or no overlap
between the prominent groups of different communities, unless the corresponding topics are semantically related.

13 The top 20 influencers of each theory for all topic communities can be found at: http://l3s.de/~papadakis/EvaluationFrameworkStatisticalAnalysis.xlsx.


Table 10
The Gp@20 defined by each influence theory for the topic community #iranelection.

Find Fmnt Frtw Ftwt


cnnbrk t oxfordgirl dominiquerdr
persiankiwi iranbaan t tehranweekly
mashable oxfordgirl lissnup perry1949
google onlymehdi persianbanoo lissnup
mousavi1388 tehranweekly manic77 khoshkeledoc
nprpolitics lissnup iranbaan edwand
neilhimself dadashiii onlymehdi mwolda
timoreilly k k t4tx
oxfordgirl manic77 iran88 fardinzamani
anamariecox cnn parsa4 persia_news
nansen persiankiwi madyar artemis_ia
zaibatsu gr gr dadashiii
nprnews persianbanoo tehranweekly iranlaya
iamdiddy dominiquerdr persiankiwi atlsafa
brooksbayne madyar dadashiii uncoolbobby
leolaporte iran88 fahimn persia_max_news
dcagle Lotfan lotfan joannemichele
huffingtonpost fahimn sheydaj sannri
cnn austinheap sheydajahanbin iranwwp
nytimeskristof patrickaltoft dominiquerdr persia_news_2

Table 11
The Gp@20 defined by each influence theory for the topic community #noh8.

Find Fmnt Frtw Ftwt


drdrew noh8campaign bouska greenbean55
bouska bouska biolawyer noh8campaign
emmyrossum ste_vee kylechristian angelbenton
hollymadison123 greenbean55 ryanleejohnson johnprather
queerunity lukasrossi greenbean55 geisha_boy
lovebscott hellomattwalker kristicansler khajiitchick
torianddean drdrew noh8campaign matthewlush
bluecrystalsky scoutmasterson scoutmasterson alchey
gaycivilrights looneypyrodude jeffrago kylechristian
egheitasean rosemcgowan bear54 unlivedlife
lukasrossi jeffparshley matthewlush andrewxanarchy
onemorelesbian chimasimone willfalls crystallewis60
ontd_uffy dawnrichard mabergel sthprkfnt
yezbok madalyngrimm ladygagast aristokatcy
rufuscoolkitty matthewlush rakefet27 kausinkonfusion
artemisrex shannamoakler egheitasean ste_vee
jaysays chris_gorham tiffanyrinehart jamesrestein
tiffanyrinehart hollymadison123 ashe prettyynikki
aramina kimzolciak ladyspeaker scoutmasterson
stephjonesmusic kylechristian Etejeday Hasthepotential

6. Conclusion

In this paper, we proposed an evaluation framework for assessing the relative effectiveness of topic-specific influence
theories. Its goal is to examine whether these theories identify as influencers small groups of users who are able to affect certain,
objectively measurable aspects of a community with minimal effort. Two assumptions lie at the core of our framework:

i. Each topic community is defined by the same hashtag throughout its lifetime and, thus, it involves members that are
not necessarily connected with explicit links on the social graph, while excluding those using another hashtag for the
same topic.
ii. Each community involves a substantial portion of polarized messages.

In this context, our framework defines five necessary and sufficient conditions that an effective influence theory should
satisfy across a series of topic communities. The first three encapsulate prerequisites that filter out theories yielding
oversized or overwhelming influencers. For the other two conditions, we introduced evaluation metrics that estimate the
semantic and chronological relation between the average sentiment of influencers and the remaining community. Both are based
on emoticons for identifying the expression of sentiment, an approach that was preferred over more advanced sentiment
analysis techniques, which are also compatible with our framework.
To demonstrate the use of our framework, we evaluated the relative effectiveness of five established topic-specific
influence theories for Twitter, namely Ftwt, Frnd, Fmnt, Frtw and Find. We applied them on a large-scale, real-world dataset that
comprises 75 topic communities defined by hashtags, with each one involving 80,000 tweets and almost 9000 users, on average.
The outcomes of our analysis indicate that Ftwt is incompatible with our framework, while Frnd defines groups that exert no
real influence over their peers. In contrast, Fmnt, Frtw and Find exhibit high effectiveness across most communities. To shed light on
our experimental results, we introduced a novel methodology for examining the internal dynamics of the prominent groups
of each theory. The outcomes of this analysis suggest that influence theories can be classified in three main categories with
respect to the interactions among their prominent users. The first one involves theories that define coherent prominent
groups, whose members operate as a team, sharing high levels of homophily and affinity. The other two categories select
as influencers unrelated individuals that do not operate in coordination; they differ, though, in the quality of their prominent
users, i.e., whether they are ordinary community members, or they excel in some aspect of the community's activity.
In the future, we intend to explore ways of generalizing our conclusions on Twitter to other OSNs. In addition, we plan to
investigate possible methods for combining Fmnt, Frtw and Find into a single, comprehensive theory that outperforms
the individual theories comprising it in effectiveness.

Appendix A

Polarity ratio breakdown per valid topic community for the prominent users of Fmnt, Frtw and Find and for the rest of the
community.

k = 10 k = 20
PR(Gp@10) PR(GP) PR(Gp@20) PR(GP)
Fmnt Frtw Find Fmnt Frtw Find Fmnt Frtw Find Fmnt Frtw Find
#bsb 3.50 9.33 5.50 7.55 7.33 7.48 13.00 10.33 13.50 7.25 7.29 7.23
#bsbthisisusoct6th 4.50 8.00 13.00 6.55 6.40 6.38 7.00 11.25 14.00 6.42 6.23 6.28
#business 1.00 1.00 1.00 16.37 16.37 16.37 1.00 1.00 1.00 16.37 16.37 16.37
#design 116.50 57.75 0.33 0.37 0.36 0.08 19.42 22.50 0.33 0.37 0.35 0.08
#etsy 4.00 11.50 1.00 13.90 13.85 13.93 9.00 24.50 7.00 13.85 13.56 13.87
#f1 0.40 0.60 0.22 0.14 0.14 0.14 0.83 0.29 0.50 0.13 0.14 0.14
#fashion 1.00 0.00 0.00 3.59 3.60 3.60 1.00 3.00 1.00 3.59 3.57 3.59
#gop 0.75 8.00 0.00 0.07 0.05 0.08 0.80 17.00 0.33 0.07 0.19 0.09
#green 0.50 0.67 0.00 0.28 0.28 0.28 0.33 0.11 0.00 0.29 0.29 0.28
#h1n1 17.00 153.00 0.00 4.13 3.42 4.30 35.50 76.50 0.00 3.55 3.44 4.30
#hcr 3.88 4.22 0.00 0.24 0.23 0.28 1.48 4.14 0.00 0.25 0.21 0.28
#health 2.00 1.33 0.00 0.85 0.91 0.87 2.50 0.25 1.00 0.85 0.91 0.86
#healthcare 2.00 0.43 0.00 0.56 0.52 0.49 1.22 0.33 7.00 0.56 0.53 0.47
#iamthemob 3.00 2.00 1.00 0.60 0.63 0.70 2.94 1.15 4.00 0.57 0.64 0.65
#iranelection 0.46 0.27 0.40 0.52 0.52 0.52 1.07 1.03 0.40 0.49 0.49 0.52
#nascar 0.21 0.25 1.33 1.60 1.60 1.50 1.07 0.71 1.56 1.54 1.59 1.50
#neda 3.25 1.78 1.00 2.47 2.50 2.50 3.33 2.45 1.33 2.46 2.49 2.50
#n 1.00 186.00 0.00 3.65 3.26 3.65 3.00 187.00 0.00 3.64 3.26 3.65
#noh8 7.25 11.67 10.00 4.38 4.39 4.52 7.38 13.00 12.00 4.37 4.35 4.50
#p2 0.22 4.36 3.00 0.03 0.02 0.02 0.31 1.02 7.00 0.05 0.05 0.01
#photography 2.50 2.32 0.00 2.65 2.67 2.64 2.52 1.27 3.00 2.65 2.88 2.63
#piraten 18.92 14.73 19.00 3.01 3.03 3.31 13.72 15.59 0.10 3.00 2.97 3.46
#redsox 0.00 1.00 0.50 2.09 2.19 1.97 0.84 0.12 1.14 2.09 2.20 2.14
#seattle 0.50 2.00 1.50 1.86 1.84 1.82 0.67 2.67 3.00 1.88 1.92 1.87
#sgp 2.00 0.73 21.00 0.36 0.39 0.34 0.71 0.35 0.83 0.38 0.47 0.38
#swineflu 1.00 1.00 0.00 8.85 8.85 8.78 0.00 7.50 2.00 8.85 8.74 8.92
#tcot 3.70 2.82 0.00 0.36 0.37 0.39 3.29 2.37 2.63 0.36 0.36 0.38
#teaparty 6.78 6.78 3.00 0.71 0.71 0.94 5.07 5.20 1.40 0.69 0.66 0.90
#tlot 1.07 0.44 7.33 0.08 0.10 0.07 1.28 2.05 3.17 0.07 0.05 0.08
#travel 0.58 0.30 0.00 0.16 0.16 0.15 0.23 0.46 0.13 0.16 0.17 0.15
Pearson correlation 0.09 0.25 0.29 0.40 0.30 0.50


k = 50 k = 100
PR(Gp@50) PR(GP) PR(Gp@100) PR(GP)
Fmnt Frtw Find Fmnt Frtw Find Fmnt Frtw Find Fmnt Frtw Find
#bsb 17.25 15.00 15.00 6.79 7.06 7.06 11.43 5.80 11.40 6.95 7.71 7.10
#bsbthisisusoct6th 6.07 10.44 7.60 6.53 6.01 6.39 6.53 10.46 7.29 6.44 5.78 6.39
#business 0.00 1.00 2.00 16.91 16.53 16.36 0.25 16.00 4.00 17.11 16.21 16.34
#design 10.95 9.08 0.29 0.37 0.37 0.08 6.76 5.92 0.07 0.38 0.37 0.08
#etsy 20.00 35.00 19.00 13.59 13.09 13.68 33.25 37.00 32.50 12.92 12.48 13.38
#f1 1.00 0.69 0.18 0.11 0.12 0.14 1.11 0.74 1.10 0.09 0.11 0.11
#fashion 2.50 0.50 1.00 3.58 3.75 3.59 2.75 1.25 0.00 3.59 3.75 3.70
#gop 0.21 2.25 0.13 0.07 0.13 0.08 0.45 1.82 0.50 0.13 0.12 0.10
#green 0.40 0.21 1.00 0.27 0.28 0.27 0.30 0.73 0.00 0.28 0.26 0.29
#h1n1 32.00 21.57 1.00 3.46 3.56 4.33 15.64 13.44 1.00 3.29 3.23 4.50
#hcr 1.18 1.83 0.50 0.23 0.20 0.28 0.86 1.20 2.00 0.22 0.20 0.27
#health 0.60 0.19 4.00 0.87 0.95 0.85 0.09 0.11 2.50 0.90 0.96 0.85
#healthcare 0.55 0.26 5.50 0.59 0.55 0.46 0.26 0.10 2.00 0.59 0.59 0.47
#iamthemob 2.52 0.91 3.67 0.49 0.66 0.64 1.59 0.64 3.67 0.50 0.72 0.62
#iranelection 0.54 0.51 0.73 0.52 0.52 0.52 0.39 0.37 0.48 0.55 0.56 0.52
#nascar 1.61 1.02 1.00 1.48 1.58 1.55 1.67 1.55 1.20 1.46 1.48 1.54
#neda 1.87 4.34 2.67 2.53 2.33 2.49 3.23 3.41 1.00 2.39 2.35 2.55
#n 0.13 30.50 2.00 3.71 3.30 3.64 0.08 27.00 2.00 3.73 3.29 3.65
#noh8 6.70 16.25 15.00 4.38 4.18 4.39 7.23 9.90 18.00 4.25 4.10 4.34
#p2 0.02 0.79 0.72 0.03 0.06 0.03 0.02 0.13 0.92 0.03 0.00 0.01
#photography 2.74 1.47 1.14 2.62 2.87 2.67 2.40 0.60 1.22 2.67 3.60 2.67
#piraten 12.56 1.04 3.39 2.88 4.56 3.34 3.85 1.14 3.93 3.23 5.16 3.31
#redsox 0.74 0.54 0.48 2.31 2.22 2.18 0.99 0.77 0.31 2.36 2.41 2.17
#seattle 2.14 2.67 6.00 2.11 2.11 1.98 2.63 3.67 5.00 2.25 2.24 2.05
#sgp 0.78 0.44 0.88 0.37 0.67 0.37 2.14 0.02 1.90 0.20 0.57 0.30
#swineflu 9.25 13.00 2.00 8.69 8.47 8.92 8.64 22.75 4.00 8.73 7.91 9.06
#tcot 1.48 1.92 0.02 0.35 0.34 0.40 1.52 1.13 0.69 0.31 0.33 0.38
#teaparty 4.44 2.57 1.25 0.66 0.71 0.90 3.11 1.30 4.80 0.56 0.81 0.68
#tlot 2.29 0.65 3.54 0.00 0.05 0.04 1.37 0.09 2.50 0.00 0.11 0.03
#travel 0.36 0.46 0.88 0.17 0.18 0.18 0.74 0.23 0.62 0.21 0.19 0.18
Pearson correlation 0.55 0.64 0.63 0.66 0.79 0.66

References

Abbasi, M., Chai, S., Liu, H., & Sagoo, K. (2012). Real-world behavior analysis through a social media lens. Maryland: Springer.
Anagnostopoulos, A., Kumar, R., & Mahdian, M. (2008). Influence and correlation in social networks. Las Vegas: ACM (pp. 7-15).
Aral, S., Muchnik, L., & Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences.
Bakshy, E., Hofman, J., Mason, W., & Watts, D. (2011). Everyone's an influencer: Quantifying influence on Twitter. Hong Kong: ACM (pp. 65-74).
Barbosa, L., & Feng, J. (2010). Robust sentiment detection on Twitter from biased and noisy data (pp. 36-44). Beijing, China: Association for Computational Linguistics.
Bodendorf, F., & Kaiser, C. (2009). Detecting opinion leaders and trends in online social networks. Hong Kong, China: ACM (pp. 65-68).
Brown, D., & Hayes, N. (2008). Influencer marketing: Who really influences your customers? Elsevier/Butterworth-Heinemann.
Bruns, A., & Burgess, J. (2011). The use of Twitter hashtags in the formation of ad hoc publics. In European consortium for political research conference. Reykjavik, Iceland.
Cha, M., Haddadi, H., Benevenuto, F., & Gummadi, K. (2010). Measuring user influence in Twitter: The million follower fallacy. Washington: The AAAI Press.
Chen, W., Wang, C., & Wang, Y. (2010). Scalable influence maximization for prevalent viral marketing in large-scale social networks. Washington: ACM (pp. 1029-1038).
Choudhury, M. D., et al. (2010). Birds of a Feather: Does user homophily impact information diffusion in social media? CoRR Vol. abs/1006.1702.
Cosley, D., et al. (2010). Sequential influence models in social networks. Washington: The AAAI Press.
Cui, A., et al. (2012). Discover breaking events with popular hashtags in Twitter. Maui, Hawaii: ACM.
Domingos, P., & Richardson, M. (2001). Mining the network value of customers. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '01). San Francisco, California: ACM (pp. 57-66).
Ernesto, D. -A., et al. (2012). What is happening right now ... that interests me? Online topic discovery and recommendation in Twitter. New York: ACM.
Fond, T. L., & Neville, J. (2010). Randomization tests for distinguishing social influence and homophily effects. Proceedings of the 19th international conference on World wide web (WWW '10). Raleigh, North Carolina, USA: ACM (pp. 601-610).
Gayo-Avello, D. (2013). Nepotistic relationships in Twitter and their impact on rank prestige algorithms. Information Processing & Management, 1250-1280.
Giannakopoulos, G., et al. (2012). Representation models for text classification: A comparative analysis over three web document types. Craiova, Romania: ACM.
Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. Processing.
Goyal, A., Bonchi, F., & Lakshmanan, L. (2010). Learning influence probabilities in social networks. New York, USA: ACM (pp. 241-250).
Please cite this article in press as: Kardara, M., et al. Large-scale evaluation framework for local inuence theories in Twitter. Information
Processing and Management (2014), http://dx.doi.org/10.1016/j.ipm.2014.06.002
M. Kardara et al. / Information Processing and Management xxx (2014) xxxxxx 27

Holme, P., & Newman, M. E. J. (2006). Nonequilibrium phase transition in the coevolution of networks and opinions. Physical Review E.
Kardara, M. et al (2012). Inuence patterns in topic communities of social media. Craiova, Romania: ACM (pp. 10).
Katz, E., Lazarsfeld, P., & Roper, E. (2005). Personal inuence: The part played by people in the ow of mass communications. Transaction Publishers.
Keller, E., Fay, B., & Berry, J. (2007). Leading the conversation: inuencers impact on word of mouth and the brand conversation. Keller Fay Group (pp. 173186).
Kempe, D., Kleinberg, J., & Tardos, E. (2003). Maximizing the spread of inuence through a social network. Washington, DC, USA: ACM (pp. 137146).
Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a social network or a news media? Raleigh, North Carolina, USA: ACM (pp. 591600).
Li, H., Bhowmick, S., & Sun, A. (2011). CASINO: Towards conformity-aware social inuence analysis in online social networks. Glasgow, UK: ACM
(pp. 10071012).
Liu, L. et al (2010). Mining topic-level inuence in heterogeneous networks. Toronto, Canada: ACM (pp. 199208).
Luu, M. D., Lim, E.-P., Hoang, T.-A., & Chua, F. C. T. (2012). Modeling diffusion in social networks using network properties. Dublin, Ireland: The AAAI Press.
Ma, H., Yang, H., Lyu, M., & King, I. (2008). Mining social networks using heat diffusion processes for marketing candidates selection. Napa Valley, California: ACM
(pp. 233242).
McPherson, M., Smith-Lovin, L., & Cook, J. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415444.
OConnor, B., Balasubramanyan, R., Routledge, B. R., & Smith, N. A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. In
Proceedings of the international AAAI conference on weblogs and social media, Washington, DC.
Pal, A., & Counts, S. (2011). Identifying topical authorities in microblogs. New York: ACM.
Petrovic, S., Osborne, M., & Lavrenko, V. (2011). RT to win! Predicting message propagation in Twitter. Barcelona, Spain: The AAAI Press.
Purohit, H. et al (2012). Finding inuential authors in brand-page communities. Dublin, Ireland: The AAAI Press.
Scripps, J., Tan, P.-N., & Esfahanian, A.-H. (2009). Measuring the effects of preprocessing decisions and network forces in dynamic network analysis. Paris, France:
ACM (pp. 747756).
Tan, C. et al (2010). Social action tracking via noise tolerant time-varying factor graphs. Washington: ACM (pp. 10491058).
Tang, J., Sun, J., Wang, C., & Yang, Z. (2009). Social inuence analysis in large-scale networks. Paris, France: ACM (pp. 807816).
Tsytsarau, M., & Palpanas, T. (2012). Survey on mining subjective data on the web. Data Mining and Knowledge Discovery, 24(3), 478514.
Weng, J., Lim, E.-P., Jiang, J., & He, Q. (2010). TwitterRank: Finding topic-sensitive inuential Twitterers. New York: ACM (pp. 261270).
Yang, J., & Leskovec, J. (2011). Patterns of temporal variation in online media. Hong Kong: ACM (pp. 177186).
Zubiaga, A., Spina, D., Fresno, V., & Martnez, R. (2011). Classifying trending topics: A typology of conversation triggers on Twitter. Glasgow: ACM.
