
User Behavior Model & Recommendation On Basis Of Social Networks
Hossain MD. Shakawat
Department of Computer Science & Engineering
ID 11-18494-1
American International University-Bangladesh
Najeeb, Ahmad Taher
Department of Computer Science & Engineering
ID 11-18198-1
American International University-Bangladesh
Alam Shah
Department of Computer Science & Engineering
ID 10-17685-3
American International University-Bangladesh
September 8, 2014
Table of Contents:
Abstract
1. Introduction
2. Previous Works
   2.1 Location Based Social Network
   2.2 Collaborative Recommendation Based Social Network
   2.3 Sentiment Intensity Analysis of Informal Texts
   2.4 Big 5 Modelling
3. Objective
4. Research Question
5. Proposed Research Methodology
   5.1 Data Collection
   5.2 Data Analysis
   5.3 Results
   5.4 Recommendation Analysis
6. Conclusion
Abstract
At present, social networks play an important role in expressing a user's
sentiment and his/her interest in a particular field. Extracting one's
public data (what he/she shares with friends and relatives, and his/her
reactions to others' thoughts) means extracting one's behavior. If, by
defining some fixed hypotheses, we enable a machine to understand human
sentiment and interest, the machine can make recommendations to a user
about his/her personal interests on the basis of his/her sentiment. Our
main approach is to make suggestions to a user regarding his/her specific
interests, as anticipated from an analysis of his/her public data. This
can be extended to business analysis, suggesting different companies'
products or services depending on the consumer's personal choice. This
automation would also help to choose the correct candidate for any
questionnaire, and help anyone to learn about himself/herself and how
his/her behavior may influence others. It makes it possible to easily
select a person for leadership, to identify people who seem eager and
people who are likely to oppose, and to find a dependable person.
Acknowledgements:
Special thanks to our honorable teacher and supervisor, Md. Saddam Hossain,
Faculty, Department of CS, American International University-Bangladesh.
1 Introduction
With millions of users, social networking services like Facebook and Twitter
have become some of the most popular internet applications. These
applications are a source of knowledge and information. The rich knowledge
that has accumulated in these social sites enables a variety of
recommendation systems for new friends and media [1]. To use this
opportunity, it is possible to create an automated system that categorizes
users according to the Big Five personality factors. To categorize users in
such a system, it is necessary to collect users' data without interfering
with their daily activities. The system will then help users to learn about
themselves and about others. For example, suppose an employee needs a
vacation and the boss is listed as a friend on an OSN. The employee can
then time the request according to the boss's disposition as estimated by
the system (neuroticism indicates a higher chance of disagreement, while
agreeableness indicates a higher chance of agreement). Online Social
Networks (OSN) deal with big data; by analyzing such data, the system will
be able to predict the suitable person for leadership and the people who
may oppose. These opportunities and challenges have been tackled by many
new approaches to recommendation systems, using different data sources and
methodologies to generate different kinds of recommendations. In this
article we provide a description of such a system. Consumer interests have
always had a great influence on business policy. Offering the right
products or services to the right customers is the main theme of every
successful business policy. Many business organizations can benefit from
using the data collected from OSN, and at present the popularity of social
networks is rising very rapidly. From the sociologist's point of view, OSN
can be characterized as collective goods produced through computer-mediated
collective action [2]. Users spend a huge amount of time on OSN in their
daily lives and share a lot of information about themselves and their
friends and family. This is a great opportunity to learn about people's
sentiment and interest. Understanding the behavior of users from OSN has
become a crucial factor for advertising policies and better site design. In
particular, given the success of item recommendation systems in commercial
websites, such as Amazon.com and Netflix, it is considered worthwhile to
revisit the recommendation problem through the novel perspective of social
networking. In general, recommendation systems aim to provide personalized
recommendations of items to users based on their previous behavior as
well as on other information gathered from item descriptions and user
profiles.
Our experiment is based on Twitter and Facebook, the most popular OSN
websites, which also offer a large space for advertisement. These websites
have a huge number of users, and users feel comfortable on these sites
because of user-friendly features such as microblogging, status updates,
photo and video sharing, commenting on posts, joining and creating groups,
liking pages, creating events, playing games and so on. We aim to analyze
user sentiment through past activity on the OSN and map it onto the Big
Five factors, to find a user's field or fields of interest, and to make
recommendations to him or her by providing informative services.
2 Previous work
OSN is the practice of expanding the number of one's business and social
contacts by making connections through individuals [3]. In this era of the
internet, OSN is extremely popular among people [4]. Two thirds of the
world population spent 10
2.1 Location Based Social Network:
A social network is a social structure made up of individuals connected
by one or more specific types of interdependency, such as friendship,
common interests, and shared knowledge. Generally, a social networking
service builds on and reflects the real-life social networks among
people through online platforms such as a website, providing ways for
users to share ideas, activities, events, and interests over the Internet.
The increasing availability of location-acquisition technology (for
example GPS and Wi-Fi) empowers people to add a location dimension
to existing online social networks in a variety of ways. For example,
users can upload location-tagged photos to a social networking service
such as Flickr [12], comment on an event at the exact place where the
event is happening (for instance, in Twitter [13]), share their present
location on a website (such as Foursquare [14]) for organizing a group
activity in the real world, or record travel routes with GPS trajectories
to share travel experiences in an online community. Here, a location
can be represented in absolute (latitude-longitude coordinates), relative
(100 meters north of the Space Needle), and symbolic (home, office, or
shopping mall) form. Also, the location embedded into a social network
can be a stand-alone instant location of an individual, like in a bar at
9pm, or a location history accumulated over a certain period, such as
a GPS trajectory: a cinema → a restaurant → a park → a bar.
The dimension of location brings social networks back to reality, bridging
the gap between the physical world and online social networking
services. For example, a user with a mobile phone can leave his/her
comments about a restaurant on an online social site (after
finishing dinner) so that the people in his/her social structure can
refer to these comments when they later visit the restaurant. In this
example, users create their own location-related stories in the physical
world and browse other people's information as well. An online social
site becomes a platform for facilitating the sharing of people's
experiences. Furthermore, people in an existing social network can expand
their social structure with the new interdependency derived from their
locations. As location is one of the most important components of
user context, extensive knowledge about an individual's interests and
behavior can be learned from his/her locations. For instance, people who
enjoy the same restaurant can connect with each other. Individuals
constantly hiking the same mountain can be put in contact with each
other to share their travel experiences. Sometimes, two individuals who
do not share the same absolute location can still be linked as long as
their locations are indicative of a similar interest, such as beaches or
lakes.
These kinds of location-embedded and location-driven social structures
are known as location-based social networks, formally defined as follows:
A location-based social network (LBSN) does not only mean adding a
location to an existing social network so that people in the social
structure can share location-embedded information, but also consists of the
new social structure made up of individuals connected by the
interdependency derived from their locations in the physical world as well
as their location-tagged media content, such as photos, video, and texts.
Here, the physical location consists of the instant location of an
individual at a given timestamp and the location history that an individual
has accumulated in a certain period. Further, the interdependency includes
not only that two persons co-occur in the same physical location
or share similar location histories but also the knowledge, e.g., common
interests, behavior, and activities, inferred from an individual's location
(history) and location-tagged data.
In a location-based social network, people can not only track and
share the location-related information of an individual via either mobile
devices or desktop computers, but also leverage collaborative social
knowledge learned from user-generated and location-related content,
such as GPS trajectories and geo-tagged photos. One example
is determining this summer's most popular restaurant by mining people's
geo-tagged comments. Another example could be identifying the
most popular travel routes in a city based on a large number of users'
geo-tagged photos. Consequently, LBSNs enable many novel applications
that change the way we live, such as physical location (or activity)
recommendation systems [15, 16] and travel planning, while
offering many new research opportunities for social network analysis
(like user modeling in the physical world and connection strength
analysis) [17, 18], spatio-temporal data mining [19], ubiquitous computing
[20], and spatio-temporal databases [19, 21]. Existing applications
providing location-based social networking services can be broadly
categorized into three groups: geo-tagged-media-based, point-location-driven
and trajectory-centric.
Geo-tagged-media-based. Quite a few geo-tagging services enable
users to add a location label to media content such as text, pho-
tos, and videos generated in the physical world. The tagging can
occur instantly when the medium is generated, or after a user has
returned home. In this way, people can browse their content at
the exact location where it was created (on a digital map or in the
physical world using a mobile phone). Users can also comment on
the media and expand their social structures using the interdependency
derived from the geo-tagged content (for example, by favoring
the same photo taken at a location). Representative websites
of such location-based social networking services include Flickr,
Panoramio, and Geo-twitter. Though a location dimension has
been added to these social networks, the focus of such services
is still on the media content. That is, location is used only as
a feature to organize and enrich media content while the major
interdependency between users is based on the media itself.
Point-location-driven. Applications like Foursquare and Google
Latitude encourage people to share their current locations, such
as a restaurant or a museum. In Foursquare, points and badges
are awarded for checking in at venues. The individual with the
largest number of check-ins at a venue is crowned Mayor. With the
real-time location of users, an individual can discover friends (from
his/her social network) around his/her physical location so as to enable
certain social activities in the physical world, e.g., inviting people
to have dinner or go shopping. Meanwhile, users can add tips to
venues that other users can read, which serve as suggestions for
things to do, see, or eat at the location. With this kind of service,
a venue (point location) is the main element determining the
interdependency connecting users, while user-generated content
such as tips and badges feature a point location.
Trajectory-centric. In a trajectory-centric social networking service,
such as Bikely, SportsDo, and Microsoft GeoLife, users pay
attention to both point locations (passed by a trajectory) and the
detailed route connecting these point locations. These services do
not only tell users basic information, such as distance, duration,
and velocity, about a particular trajectory, but also show a user's
experiences represented by tags, tips, and photos for the trajectory.
In short, these services provide how and what information
in addition to where and when. In this way, other people can refer
to a user's travel/sports experience by browsing or replaying
the trajectory on a digital map, and follow the trajectory in the
real world with a GPS phone.
Table 1 provides a brief comparison among these three services. The
major differences between the point-location-driven and the
trajectory-centric LBSN lie in two aspects. One is that a trajectory offers
richer information than a point location, such as how to reach a location,
the length of time that a user stayed in a location, the travel time
between two locations, and the physical/traffic conditions of
a route. As a result, we are more likely to accurately understand an
individual's behavior and interests in a trajectory-centric LBSN. The
other is that in a point-location-driven LBSN users usually share their
real-time location, while a trajectory-centric LBSN more likely delivers
historical locations, as users typically prefer to upload a trajectory
after a trip has finished (though it can be operated in a continuously
uploading manner). This property could compromise some scenarios based
on the real-time location of a user; however, it reduces to some extent
the privacy issues in a location-based social network. In other words,
when people see a user's trajectory, the user is no longer there.

Table 1: Comparison of different location-based social networking services

LBSN Services            Focus           Real-time        Information
Geo-tagged-media-based   Media           Normal           Poor
Point-location-driven    Point location  Instant          Normal
Trajectory-centric       Trajectory      Relatively slow  Rich

In fact, the location data generated in the first two LBSN services
can be converted into the form of a trajectory, which might be used by
the third category of LBSN service. For example, if we sequentially
connect the point locations of the geo-tagged photos taken by a user
over several days, a sparse trajectory can be formed. Likewise, the
check-in records of an individual ordered by time can be regarded as a
low-sampling-rate trajectory. However, due to the sparseness (i.e., the
distance and time interval between two consecutive points in a trajectory
could be very large), the uncertainty in a single trajectory
from the first two services is increased. To put these trajectories
into trajectory-centric LBSN services, we need to use them in a
collective and collaborative way.
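
The conversion described above can be illustrated with a minimal sketch.
It assumes a hypothetical CheckIn record (venue, latitude, longitude,
time) that is not defined in the text; ordering the records by time gives
a low-sampling-rate trajectory, and the distance and time gaps between
consecutive points indicate how sparse it is. The haversine distance is
one common choice for the gap, used here only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class CheckIn:
    """Hypothetical check-in record; field names are assumptions."""
    venue: str
    lat: float
    lon: float
    time: datetime


def haversine_km(a: CheckIn, b: CheckIn) -> float:
    """Great-circle distance between two check-ins in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def to_trajectory(checkins):
    """Order check-ins by time to obtain a sparse trajectory."""
    return sorted(checkins, key=lambda c: c.time)


def gaps(trajectory):
    """(distance in km, interval in hours) between consecutive points."""
    return [
        (haversine_km(a, b), (b.time - a.time).total_seconds() / 3600.0)
        for a, b in zip(trajectory, trajectory[1:])
    ]
```

Large values returned by gaps() correspond to the uncertainty mentioned
above: the fewer and farther apart the points, the less a single
trajectory says about the route actually taken.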
The following sections will pay closer attention to trajectory data,
which is the most complex data structure to be found in the three
LBSN services and provides the richest information. If it is handled
well, other data sources become easier to deal with. Moreover, as
mentioned above, location data can be converted into a trajectory on many
occasions. Consequently, some methodologies designed for trajectory
data can be employed by the first two LBSN services.
2.2 Collaborative Recommendation Based Social Network:
With the recent advances in technology, there is an emerging presence
of social media and social networking systems. In the case of
multimedia-enriched social network systems, such as last.fm, the
collective goods are musical tracks and the collective action is the process
of crafting individual profiles of musical preference and linking them
either explicitly, via bonds of friendship, or implicitly, through
collaborative annotation [22].
This collective action leads to the creation of an implicit social
networking structure, which we aim to further explore. In particular, given
the success of item recommendation systems in commercial websites,
such as Amazon.com and Netflix, it is considered worthwhile to revisit
the recommendation problem through the novel perspective of social
networking. In general, recommendation systems aim to provide
personalized recommendations of items to users based on their previous
behavior as well as on other information gathered from item descriptions
and user profiles.
However, no emphasis has been placed yet on personalization based
explicitly on social networks. The reason is that, despite the
increasing interest in the exploration of social networks, there does not
exist a concrete dataset that includes both explicit bonds of friendship
among users and free-form collaborative annotation of items. This is
because most social media systems do not allow free access to
all user profiles or lists of friends.
Given the incentives of the widespread adoption of social networks
and the lack of previous studies that directly address the problem
of efficiently integrating the added-value knowledge provided by
those networks in the field of collaborative recommendation, we
propose a new methodology that tackles the aforementioned issues. Within
this context we make the following contributions:
- We introduce a dataset based on data from the last.fm social
  network that describes a social graph among users, tracks and tags,
  effectively including bonds of friendship and collaborative
  annotation.
- We evaluate a Random Walk with Restarts (RWR) model on this
  dataset and show that the incorporation of friendship and social
  tagging can improve the performance of an item recommendation
  system.
- We show that the RWR method outperforms the standard
  Collaborative Filtering (CF) method, which we also evaluate against
  the same dataset.
- We show that our method using the RWR method requires no
  training and successfully manages to capture
We may distinguish two broad categories of collaborative recommendation
systems, namely content-based and collaborative filtering. A
content-based system selects items based on the correlation between
the content of the items (e.g. keywords describing the items, such as
album genre, artists, etc., for music tracks) and the user's preferences
[23]. However, it is limited to dictionary-bound relations between the
keywords used by users and the descriptions of items and therefore does
not explore implicit associations between users.
Collaborative filtering systems are divided into two categories, i.e.
memory-based and model-based. In memory-based systems [24] we calculate
the similarity between all users, based on their ratings of items, using
some heuristic measure such as the cosine similarity or the Pearson
correlation score. Then we predict a missing rating by aggregating the
ratings of the k nearest neighbors of the user we want to recommend
to. The problem with memory-based systems is that we have to decide
on a rather arbitrary basis over parameters such as the number
of neighbors. What is more, in the case of social networks there is no
straightforward way to introduce similarities between users based on
friendships and social tagging, other than some ad hoc interpolation
of similarity weights from those different sources.
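
As an illustration of the memory-based procedure just described, the
following is a minimal sketch, not the method evaluated in the cited work.
It assumes ratings are stored as a hypothetical nested dictionary
{user: {item: rating}}, uses cosine similarity as the heuristic measure,
and predicts a missing rating as the similarity-weighted average over the
k most similar users who rated the item. The default k=2 is an arbitrary
choice, which is exactly the parameter problem noted above.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item, k=2):
    """Predict ratings[user][item] from the k nearest neighbours.

    Returns None when no neighbour with non-zero similarity rated the item.
    """
    neighbours = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    top = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in top) / total
```

Swapping cosine() for a Pearson correlation would change only the
similarity function; the neighbour selection and aggregation stay the same.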
The model-based filtering systems assume that users form clusters
based on their similar behavior in rating items. A model is
learned based on patterns recognized in the rating behaviors of users
using clustering, Bayesian networks and other machine learning
techniques [25, 26]. The problem with model-based methods is that it is
necessary to fine-tune several parameters of the model, and that
the models produced might not generalize well to radically
different contexts. What is more, as in the case of memory-based
systems, extra effort and training is needed in order to introduce
knowledge from social networks.
Many recent research publications revolve around the area
of social media. In particular, several studies focus on dataset
collection and analysis from social networks. Das et al. [27] propose
sample-based algorithms that capture information in the neighborhood of a
user in dynamic social networks utilizing random walks. Halpin et
al. [8] study the distribution of tags in the social bookmarking site
del.icio.us and propose a generative model of collaborative tagging in
order to evaluate the dynamics that lie beneath the act of collaborative
recommendation. Their findings show that the collected dataset
follows a power-law distribution. Even though both studies examine
social networks that are based on social tagging, they do not explore the
dynamics of friendships among users. Taking into account the power
of free-form tagging of items by users other than their authors/owners,
researchers also focus on tag recommendation. Subramanya and Liu
[28] propose a system that automatically recommends tags for blogs,
using similarity ranking in a manner similar to collaborative filtering
techniques. Strohmaier [29] studies a novel idea in tag recommendation,
which bridges the gap between the keywords issued by a user in
a query and the tags actually used by a social system. He argues that
the tags used by a user when performing a query exhibit his or her
intent, whereas the annotations of items describe content semantics.
As a result, he proposes a new form of purpose tags, which extract the
intent of the user and facilitate goal-oriented search in a social network.
Both studies underline the importance and discriminative power of
social tagging, which is also validated by our work.
Several studies exist in the field of applying Random Walks on bipartite
graphs. Craswell and Szummer [30] study a clickthrough data graph in
order to perform item recommendation. Nevertheless, no social content
is available between users. Yildirim and Krishnamoorthy [31] propose
a novel recommendation algorithm which performs Random Walks on
a graph that denotes similarity measures between items. They evaluate
their system using data from MovieLens. Although the
Random Walk model performs well in the context of recommendation,
their use of an item-item similarity matrix raises some issues as to the
ability of the system to extend when other similarities are introduced
based on social tagging. Recent work has also been done in the field of
applying Random Walks over a social graph instead of bipartite graphs,
similar to what we propose in this paper. Clements et al. [32] propose
a single-term query system performing Random Walks on graphs
including users, items and tags. They use data from LibraryThing, an
online book catalogue where users rate and tag books they have read.
Due to the lack of ground truth, they assume that the tags assigned to an
item by each user are the same as those the user would use as query terms
to retrieve the annotated item. We argue that this assumption is rather
strong and that a user experiment would be more appropriate in order
to properly establish the ground truth.
Hotho et al. evaluate a variation of adapted PageRank on a dataset
from del.icio.us, exploring folksonomies of bookmarks based also on
collaborative annotation [33]. However, since they evaluate their
proposed algorithm empirically, any attempt to compare against their
results becomes cumbersome. Although both studies are close to our approach,
we use a different model, namely RWR, in which we explicitly include
friendships in our dataset and perform collaborative recommendations
instead of queries on the graph.
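
To make the RWR idea concrete, here is a minimal sketch on a toy social
graph of users, tracks and tags. It is an illustration under stated
assumptions, not the evaluated implementation: the graph, the node naming
scheme ("u:", "t:", "g:" prefixes), the restart probability of 0.15 and
the fixed iteration count are all hypothetical. At every step the walker
follows a random edge with probability 1 - restart and jumps back to the
starting user with probability restart; the resulting visit probabilities
rank the tracks the user is not yet linked to.

```python
def rwr(graph, start, restart=0.15, iters=100):
    """Random Walk with Restarts by power iteration.

    graph: undirected adjacency lists {node: [neighbours]}; every node is
    assumed to have at least one neighbour so that probability mass is kept.
    restart=0.15 is an illustrative value, not taken from the text.
    """
    p = {n: 0.0 for n in graph}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for node, nbrs in graph.items():
            share = (1.0 - restart) * p[node] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        nxt[start] += restart  # restart mass returns to the start node
        p = nxt
    return p


def recommend(graph, user, prefix="t:"):
    """Rank track nodes not already linked to the user by RWR score."""
    scores = rwr(graph, user)
    candidates = [n for n in graph if n.startswith(prefix) and n not in graph[user]]
    return sorted(candidates, key=scores.get, reverse=True)


# Toy graph: friendship (user-user), listening (user-track), tagging (track-tag).
graph = {
    "u:alice": ["u:bob", "t:song1"],
    "u:bob": ["u:alice", "t:song1", "t:song2"],
    "t:song1": ["u:alice", "u:bob", "g:rock"],
    "t:song2": ["u:bob", "g:rock"],
    "g:rock": ["t:song1", "t:song2"],
}
```

In this toy graph, song2 is reachable from alice both through her friend
bob and through the shared tag, which is how friendship and tagging
information enter the ranking without a separate training phase.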
2.3 Sentiment Intensity Analysis of Informal Texts:
The proliferation of social networks such as blogs, forums and other
online means of expression and communication has resulted in a
landscape where people are able to freely discuss online through a variety
of means and applications [34].
Probably one of the most novel and interesting ways of communicating
in cyberspace is through 3D virtual environments. In such environments,
people, represented by their avatars, socialize and interact with
each other and with virtual humans operated by machines, i.e., computer
systems. Examples of such virtual environments are flourishing
and include Second Life, World of Warcraft [35], There [36], IMVU [37],
Moove [38], Activeworlds [39], Bluemars [40], Club Cooee [41], etc.
Despite the fact that the graphics of those environments remain
relatively poor, futuristic movies such as Avatar [42] provide an example of
sophisticated landscapes and renderings that will be attainable by such
environments in the foreseeable future. However, regardless of how
attractive and realistic such artificial 3D worlds become, they will always
remain heavily dependent on the quality of human communication that
takes place within them. As shown in [43, 37], communication in
environments that are not limited to one, textual modality consists
not just of semantic data transfer, but also of dense non-verbal
communication where sentiment plays an important role. Moreover, without
emotion no consistent and coherent (virtual) body language is possible.
Such primordial movements include facial expressions, eye looks,
arm-language coordination, etc.
Sentiment detection from textual utterances can play an important role
in the development of realistic and interactive dialog systems. Such
systems serve various educational, business or entertainment-oriented
functions and also include systems that are deployed in 3D virtual
environments. With the aid of dialog coherence modules, conversational
systems aim at a realistic interaction flow at the emotional level, e.g.,
Affect Listeners [44], and can greatly benefit from the correct
identification of the emotional state of their participants. Taking into
consideration that the majority of input to practical conversational systems
consists of short, informal, textual exchanges, it is essential that the
sentiment analysis component integrated in the dialog system is able
to cope with this type of informal, often incomplete or ill-formed
communication.
Sentiment analysis, the process of automatically detecting if a text
segment contains emotional or opinionated content and extracting its
polarity or valence, is a field of research that has received significant
attention in recent years, both in academia and in industry. The
aforementioned increase of user-generated content on the web has resulted
in a wealth of information that is potentially of vital importance to
institutions and companies, providing them with data to research their
consumers, manage their reputations and identify new opportunities.
As a result, most of the research in the field has been limited to product
reviews, where the aim is to predict whether the reviewer recommends
a product or not, based on the textual content of the review.
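
The two steps named in the definition above (detecting whether a text
carries sentiment, then extracting its polarity) can be sketched with a
toy word list. The lexicon below and its weights are invented purely for
illustration and are not taken from any published resource; a real system
would use a curated dictionary and handle negation, spelling variants and
emoticons.

```python
import re

# Hypothetical mini-lexicon: word -> emotional weight (assumed values).
LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "awful": -2, "hate": -2,
}


def polarity(text):
    """Return 'positive', 'negative' or 'neutral' for a short text.

    The score is the sum of lexicon weights of the tokens; a score of
    zero is read here as no detectable sentiment.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because it needs no labelled training data, a scorer of this kind can be
applied directly to short informal messages, at the cost of missing any
word that is not in the list.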
The focus of this paper is different. Instead of focusing our attention
on product reviews, we explore the more ubiquitous field of informal,
social interactions in cyberspace. The unprecedented popularity of social
platforms such as Facebook, Twitter and MySpace, as well as 3D virtual
worlds, has resulted in an unparalleled increase of textual exchanges that
remains relatively unexplored, especially in terms of its emotional
content.
Specifically, we aim to answer the following question: can lexicon-based
approaches perform more effectively than machine-learning approaches
in this domain? This question is particularly important, because
previous research in sentiment analysis using product reviews has shown
that machine-learning approaches typically outperform lexicon-based
ones, but no exploration of whether the same holds for informal,
social interactions has been carried out in the past. The differences between
the two domains are numerous. Firstly, reviews tend to be longer and
more verbose than typical social interactions, which may only be a few
words long and often contain significant spelling errors [45]. Secondly,
no clear gold standard exists in the domain of informal communications
with which to train a machine-learning classifier, in contrast
to the thumbs up or thumbs down feature of reviews. Lastly, social
exchanges on the web tend to be much more diverse in terms of their
topics, with issues ranging from politics and recent news to religion,
while in contrast product reviews by definition have a specific subject,
i.e. the product under discussion. The study of emotional and social
interactions in virtual worlds implies the study of virtual human (VH)
behaviors. Two types of VH exist: avatars (i.e. the projection of a real
human in the 3D environment) and agents (i.e. the projection of an
autonomous machine simulating a human in the virtual world). These
VH types result in three possible types of communication: avatar to
avatar, agent to agent and avatar to agent. Each one of those has the
following interesting aspects respectively:
- A non-verbal body language based on VH emotional states and mind
  profile;
- A potential visualization of the interaction from a third VH that
  should be represented by an avatar;
- A non-verbal communication for the human representation and an
  action of the agent strongly influenced by interpreted emotions from
  the avatar.
It seems only logical that artificial intelligence and
conversation systems would strongly benefit from these aspects in order
to make the communication more realistic. The structure of this
paper is as follows. The next section provides a brief overview of
relevant work in sentiment analysis. Section 3 presents the lexicon-based
classifier and section 4 presents the two machine-learning
classifiers that will be used in this study. Section 5 describes the
data sets that were used and explains the experimental setup, while
section 6 presents and analyzes the results.
Finally, we conclude and present some potential future directions of
research. Sentiment analysis, also known as opinion mining, has attracted
considerable interest recently. Most research has focused on analyzing
the content of either movie or general product reviews (e.g. [46]).
Attempts to expand the application of sentiment analysis to other domains,
such as debates [47], news and blogs [48] are also prominent. The
seminal book of Pang and Lee [49] presents a thorough analysis of the
work in the field. In this section we will focus on the more prominent
work which is relevant to our approach. Pang et al. [46] were amongst
the first to explore the sentiment analysis of reviews, focusing on
machine-learning approaches. These approaches generally function as
follows: initially, a general inductive process learns the characteristics
of a class during a training phase, by observing the properties of a
number of pre-classified documents (i.e. a reference corpus), and then
applies the acquired knowledge to determine the best category for new,
unseen documents during testing. Pang et al. [46] experimented with
three different algorithms: Support Vector Machines (SVMs), Naive
Bayes and Maximum Entropy classifiers, using a variety of features,
such as unigrams and bigrams, part-of-speech tags, binary and term
frequency feature weights and others. Their best accuracy
on a dataset consisting of movie reviews was attained using an SVM
classifier with binary features, although all three classifiers gave very
comparable performance. Other approaches (e.g. [50, 51]) have focused
on extending the feature set with semantically or linguistically driven
features in order to improve classification accuracy. Dictionary/lexicon-based
sentiment analysis is typically based on lists of words with some
sort of pre-determined emotional weight. Examples of such dictionaries
include the General Inquirer (GI) dictionary [52] and the Linguistic
Inquiry and Word Count (LIWC) software [53], which are also used in
the present study. Both lexicons are built with the aid of experts that
classify certain tokens in terms of their affective content (e.g. positive
or negative). The Affective Norms for English Words (ANEW) lexicon
[39] contains ratings of terms on a nine-point scale in regard to three
individual dimensions: valence, arousal and dominance. The ratings
were produced manually by psychology class students. Ways to pro-
duce such emotional dictionaries in an automatic or semi-automatic
fashion have also been introduced in research [40]. Emotional dictio-
naries have mostly been utilized in psychology or sociology oriented
research [54].
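As a minimal sketch of the dictionary/lexicon-based idea described
above, the following Python fragment sums pre-determined emotional
weights over the words of a text. The word list and its weights are
hypothetical and chosen only for illustration; they are not taken from
GI, LIWC or ANEW, and the thresholding rule is an assumption rather
than the method of any cited work.

```python
import re

# Hypothetical mini-lexicon: word -> emotional weight (negative < 0 < positive).
# Real resources such as GI, LIWC or ANEW are far larger; these entries are
# illustrative assumptions only.
TOY_LEXICON = {
    "good": 2.0,
    "great": 3.0,
    "happy": 2.0,
    "bad": -2.0,
    "awful": -3.0,
    "sad": -2.0,
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the weights of all tokens that appear in the lexicon."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


def classify(text, lexicon=TOY_LEXICON):
    """Label a text by the sign of its summed lexicon score."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("A great movie, but a sad ending"))
```

Words missing from the lexicon contribute nothing to the score, which
is why coverage of informal vocabulary matters for short messages.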
The idea of emotional conversationalists is relatively old. The first
attempts to create such a system can be traced back to Parry [55], a
chatterbot intended for studying the nature of paranoia and able to
express fears, anxieties or beliefs. More recent work includes research
on the development of synthetic characters and chatterbots with
personalities [35] and studies on emotional responses and their
influence on the creation of believable agents or interactive virtual
personalities [36]. In [56] the authors focused on the role of emotions
in gaining rapport in spoken dialog systems by rendering responses that
contain suitable emotion, both lexically and auditorily. Studies on the
role of facial expressions in building rapport in virtual human-user
interactions were conducted in [57]. A chatterbot system that generates
emotional responses by selecting and displaying expressive images of
the character emulated by the chatterbot was presented in [58].
Emotional communication for virtual worlds has been a challenging
research field for almost two decades. One of the pioneering papers was
proposed by Cassel et al. [42]. In the proposed system, conversations
between multiple human-like agents were automatically generated and
animated with appropriate and synchronized speech, intonation, facial
expressions, and hand gestures. Subsequent work proposed numerous ways
to design personality and emotion models for virtual humans. More
recently, a specific personality and emotional states were predicted
from hierarchical fuzzy rules to facilitate personality and emotion
control, and in 2009, Pelachaud et al. [32] developed a model of
behavior expressivity using a set of six parameters that act as
modulation of behavior animation. Finally, this year, [60] introduced a
graphical representation of human emotion extracted from text
sentences. The main contributions of that approach included an original
pipeline that extracts, processes, and renders the emotion of a 3D VH.
Additionally, the paper presented methods to optimize the computational
pipeline so that real-time virtual reality rendering can be achieved on
common PCs. Lastly, it was demonstrated how the Poisson distribution
can be utilized to transfer database-extracted lexical and language
parameters into coherent intensities of valence and arousal (i.e. the
parameters of Russell's circumplex model of emotion).
2.4 Big 5 modeling:
At present, many researchers believe that there are five core
personality traits, and the evidence for this theory has been growing
over the past 50 years [6]. From the point of view of a sociologist,
social media can be characterized as collective goods produced through
computer-mediated collective action [7]. People in each category have
different attitudes toward sites, different tastes in products and
different skills for accomplishing work. The five factors are
Extraversion, Agreeableness, Conscientiousness, Neuroticism and
Openness [8]. People in different categories have different ways of
expressing their thoughts [6], and OSN users differ in how strongly
they express their thoughts or behavior [5]. OSN users can be
categorized according to the Big Five factors [9]. The behavior of OSN
users varies from location to location [10], but people from the same
or nearby locations show similar behavior [11]. Behavior also varies
between people of different ages.
The personality traits used in the 5 factor model are Extraversion,
Agreeableness, Conscientiousness, Neuroticism and Openness to
experience [61]. It is important to ignore the positive or negative
associations that these words have in everyday language. For example,
Agreeableness is obviously advantageous for achieving and maintaining
popularity. Agreeable people are better liked than disagreeable people.
On the other hand, agreeableness is not useful in situations that
require tough or totally objective decisions. Disagreeable people can
make excellent scientists, critics, or soldiers. Remember, none of the
five traits is in itself positive or negative; they are simply
characteristics that individuals exhibit to a greater or lesser extent.
Each of these 5 personality traits describes, relative to other people,
the frequency or intensity of a person's feelings, thoughts, or
behaviors. Everyone possesses all 5 of these traits to a greater or
lesser degree. For example, two individuals could be described as
agreeable (agreeable people value getting along with others). But there
could be significant variation in the degree to which they are both
agreeable. In other words, all 5 personality traits exist on a
continuum (see diagram) rather than as attributes that a person does or
does not have.
Extraversion
Extraversion is marked by pronounced engagement with the external
world. Extraverts enjoy being with people, are full of energy,
and often experience positive emotions. They tend to be
enthusiastic, action-oriented individuals who are likely to say
"Yes!" or "Let's go!" to opportunities for excitement. In groups
they like to talk, assert themselves, and draw attention to
themselves. Introverts lack the exuberance, energy, and activity
levels of extraverts. They tend to be quiet, low-key, deliberate,
and disengaged from the social world. Their lack of social
involvement should not be interpreted as shyness or depression; the
introvert simply needs less stimulation than an extravert and
prefers to be alone. The independence and reserve of the introvert
is sometimes mistaken as unfriendliness or arrogance. In reality,
an introvert who scores high on the agreeableness dimension will
not seek others out but will be quite pleasant when approached.
Agreeableness
Agreeableness reflects individual differences in concern with
cooperation and social harmony. Agreeable individuals value getting
along with others. They are therefore considerate, friendly,
generous, helpful, and willing to compromise their interests with
others. Agreeable people also have an optimistic view of human
nature. They believe people are basically honest, decent, and
trustworthy. Disagreeable individuals place self-interest above
getting along with others. They are generally unconcerned with
others' well-being, and therefore are unlikely to extend themselves
for other people. Sometimes their skepticism about others' motives
causes them to be suspicious, unfriendly, and uncooperative.
Agreeableness is obviously advantageous for attaining and
maintaining popularity. Agreeable people are better liked than
disagreeable people. On the other hand, agreeableness is not useful
in situations that require tough or absolutely objective decisions.
Disagreeable people can make excellent scientists, critics, or
soldiers.
Conscientiousness
Conscientiousness concerns the way in which we control, regulate,
and direct our impulses. Impulses are not inherently bad;
occasionally time constraints require a snap decision, and acting
on our first impulse can be an effective response. Also, in times
of play rather than work, acting spontaneously and impulsively can
be fun. Impulsive individuals can be seen by others as colorful,
fun-to-be-with, and zany.
Nonetheless, acting on impulse can lead to trouble in a number of
ways. Some impulses are antisocial. Uncontrolled antisocial acts
not only harm other members of society, but also can result in
retribution toward the perpetrator of such impulsive acts. Another
problem with impulsive acts is that they often produce immediate
rewards but undesirable, long-term consequences. Examples include
excessive socializing that leads to being fired from one's job,
hurling an insult that causes the breakup of an important
relationship, or using pleasure-inducing drugs that eventually
destroy one's health.
Impulsive behavior, even when not seriously destructive, diminishes
a person's effectiveness in significant ways. Acting impulsively
disallows contemplating alternative courses of action, some of
which would have been wiser than the impulsive choice. Impulsivity
also sidetracks people during projects that require organized
sequences of steps or stages. Accomplishments of an impulsive
person are therefore small, scattered, and inconsistent.
A hallmark of intelligence, what potentially separates human beings
from earlier life forms, is the ability to think about future
consequences before acting on an impulse. Intelligent activity
involves contemplation of long-range goals, organizing and planning
routes to these goals, and persisting toward one's goals in the
face of short-lived impulses to the contrary. The idea that
intelligence involves impulse control is nicely captured by the
term prudence, an alternative label for the Conscientiousness
domain. Prudent means both wise and cautious. Persons who score
high on the Conscientiousness scale are, in fact, perceived by
others as intelligent.
The benefits of high conscientiousness are obvious. Conscientious
individuals avoid trouble and achieve high levels of success
through purposeful planning and persistence. They are also
positively regarded by others as intelligent and reliable. On the
negative side, they can be compulsive perfectionists and
workaholics. Furthermore, extremely conscientious individuals might
be regarded as stuffy and boring. Unconscientious people may be
criticized for their unreliability, lack of ambition, and failure
to stay within the lines, but they will experience many short-lived
pleasures and they will never be called stuffy.
Neuroticism
Freud originally used the term neurosis to describe a condition
marked by mental distress, emotional suffering, and an inability
to cope effectively with the normal demands of life. He suggested
that everyone shows some signs of neurosis, but that we differ in
our degree of suffering and our specific symptoms of distress.
Today neuroticism refers to the tendency to experience negative
feelings. Those who score high on Neuroticism may experience
primarily one specific negative feeling such as anxiety, anger, or
depression, but are likely to experience several of these emotions.
People high in neuroticism are emotionally reactive. They respond
emotionally to events that would not affect most people, and their
reactions tend to be more intense than normal. They are more likely
to interpret ordinary situations as threatening, and minor
frustrations as hopelessly difficult. Their negative emotional
reactions tend to persist for unusually long periods of time, which
means they are often in a bad mood. These problems in emotional
regulation can diminish a neurotic's ability to think clearly, make
decisions, and cope effectively with stress.
At the other end of the scale, individuals who score low in
neuroticism are less easily upset and are less emotionally
reactive. They tend to be calm, emotionally stable, and free from
persistent negative feelings. Freedom from negative feelings does
not mean that low scorers experience a lot of positive feelings;
frequency of positive emotions is a component of the Extraversion
domain.
Openness to Experience
Openness to Experience describes a dimension of cognitive style
that distinguishes imaginative, creative people from down-to-earth,
conventional people. Open people are intellectually curious,
appreciative of art, and sensitive to beauty. They tend to be,
compared to closed people, more aware of their feelings. They tend
to think and act in individualistic and nonconforming ways.
Intellectuals typically score high on Openness to Experience;
consequently, this factor has also been called Culture or
Intellect. Nonetheless, Intellect is probably best regarded as one
aspect of openness to experience. Scores on Openness to Experience
are only modestly related to years of education and scores on
standard intelligence tests.
Another characteristic of the open cognitive style is a facility
for thinking in symbols and abstractions far removed from concrete
experience. Depending on the individual's specific intellectual
abilities, this symbolic cognition may take the form of
mathematical, logical, or geometric thinking, artistic and
metaphorical use of language, music composition or performance, or
one of the many visual or performing arts. People with low scores
on openness to experience tend to have narrow, common interests.
They prefer the plain, straightforward, and obvious over the
complex, ambiguous, and subtle. They may regard the arts and
sciences with suspicion, regarding these endeavors as abstruse or
of no practical use. Closed people prefer familiarity over novelty;
they are conservative and resistant to change. Openness is often
presented as healthier or more mature by psychologists, who are
often themselves open to experience. However, open and closed
styles of thinking are useful in different environments. The
intellectual style of the open person may serve a professor well,
but research has shown that closed thinking is related to superior
job performance in police work, sales, and a number of service
occupations.
Subordinate Personality Traits or Facets
Each of the Big 5 personality traits is made up of 6 facets or
sub-traits. These can be assessed independently of the trait that they
belong to.
Extraversion Facets:
Friendliness. Friendly people genuinely like other people and openly
demonstrate positive feelings toward others. They make friends
quickly and it is easy for them to form close, intimate
relationships. Low scorers on Friendliness are not necessarily cold
and hostile, but they do not reach out to others and are perceived
as distant and reserved.
Gregariousness. Gregarious people find the company of others
pleasantly stimulating and rewarding. They enjoy the excitement
of crowds. Low scorers tend to feel overwhelmed by, and therefore
actively avoid, large crowds. They do not necessarily dislike being
with people sometimes, but their need for privacy and time to
themselves is much greater than for individuals who score high
on this scale.
Assertiveness. High scorers on Assertiveness like to speak out,
take charge, and direct the activities of others. They tend to be
leaders in groups. Low scorers tend not to talk much and let others
control the activities of groups.
Activity Level. Active individuals lead fast-paced, busy lives.
They move about quickly, energetically, and vigorously, and they
are involved in many activities. People who score low on this scale
follow a slower and more leisurely, relaxed pace.
Excitement-Seeking. High scorers on this scale are easily bored
without high levels of stimulation. They love bright lights and
hustle and bustle. They are likely to take risks and seek thrills.
Low scorers are overwhelmed by noise and commotion and are
averse to thrill-seeking.
Cheerfulness. This scale measures positive mood and feelings, not
negative emotions (which are a part of the Neuroticism domain).
Persons who score high on this scale typically experience a range
of positive feelings, including happiness, enthusiasm, optimism,
and joy. Low scorers are not as prone to such energetic, high
spirits.
Agreeableness Facets:
Trust. A person with high trust assumes that most people are
fair, honest, and have good intentions. Persons low in trust may
see others as selfish, devious, and potentially dangerous.
Morality. High scorers on this scale see no need for pretence or
manipulation when dealing with others and are therefore candid,
frank, and sincere. Low scorers believe that a certain amount
of deception in social relationships is necessary. People find it
relatively easy to relate to the straightforward high scorers on
this scale. They generally find it more difficult to relate to the
low scorers on this scale. It should be made clear that low scorers
are not unprincipled or immoral; they are simply more guarded and
less willing to openly reveal the whole truth.
Altruism. Altruistic people find helping other people genuinely
rewarding. Consequently, they are generally willing to assist those
who are in need. Altruistic people find that doing things for
others is a form of self-fulfillment rather than self-sacrifice.
Low scorers on this scale do not particularly like helping those in
need. Requests for help feel like an imposition rather than an
opportunity for self-fulfillment.
Cooperation. Individuals who score high on this scale dislike
confrontations. They are perfectly willing to compromise or to
deny their own needs in order to get along with others. Those who
score low on this scale are more likely to intimidate others to
get their way.
Modesty. High scorers on this scale do not like to claim that they
are better than other people. In some cases this attitude may
derive from low self-confidence or self-esteem. Nonetheless, some
people with high self-esteem find immodesty unseemly. Those who
are willing to describe themselves as superior tend to be seen
as disagreeably arrogant by other people.
Sympathy. People who score high on this scale are tender-hearted
and compassionate. They feel the pain of others vicariously and
are easily moved to pity. Low scorers are not affected strongly
by human suffering. They pride themselves on making objective
judgments based on reason. They are more concerned with truth
and impartial justice than with mercy.
Conscientiousness Facets:
Self-Efficacy. Self-Efficacy describes confidence in one's ability
to accomplish things. High scorers believe they have the
intelligence (common sense), drive, and self-control necessary for
achieving success. Low scorers do not feel effective, and may have
a sense that they are not in control of their lives.
Orderliness. Persons with high scores on orderliness are
well-organized. They like to live according to routines and
schedules. They keep lists and make plans. Low scorers tend to be
disorganized and scattered.
Dutifulness. This scale reflects the strength of a person's sense
of duty and obligation. Those who score high on this scale have
a strong sense of moral obligation. Low scorers find contracts,
rules, and regulations overly confining. They are likely to be seen
as unreliable or even irresponsible.
Achievement-Striving. Individuals who score high on this scale
strive hard to achieve excellence. Their drive to be recognized
as successful keeps them on track toward their lofty goals. They
often have a strong sense of direction in life, but extremely high
scorers may be too single-minded and obsessed with their work.
Low scorers are content to get by with a minimal amount of work,
and might be seen by others as lazy.
Self-Discipline. Self-discipline, what many people call will-power,
refers to the ability to persist at difficult or unpleasant tasks
until they are completed. People who possess high self-discipline
are able to overcome reluctance to begin tasks and stay on track
despite distractions. Those with low self-discipline procrastinate
and show poor follow-through, often failing to complete tasks, even
tasks they want very much to complete.
Cautiousness. Cautiousness describes the disposition to think
through possibilities before acting. High scorers on the
Cautiousness scale take their time when making decisions. Low
scorers often say or do the first thing that comes to mind without
deliberating alternatives and the probable consequences of those
alternatives.
Neuroticism Facets:
Anxiety. The fight-or-flight system of the brain of anxious
individuals is too easily and too often engaged. Therefore, people
who are high in anxiety often feel like something dangerous is
about to happen. They may be afraid of specific situations or be
just generally fearful. They feel tense, jittery, and nervous.
Anger. Persons who score high in Anger feel enraged when things
do not go their way. They are sensitive about being treated
fairly and feel resentful and bitter when they feel they are being
cheated. This scale measures the tendency to feel angry; whether
or not the person expresses annoyance and hostility depends on
the individual's level on Agreeableness. Low scorers do not get
angry often or easily.
Depression. This scale measures the tendency to feel sad, dejected,
and discouraged. High scorers lack energy and have difficulty
initiating activities. Low scorers tend to be free from these
depressive feelings.
Self-Consciousness. Self-conscious individuals are sensitive about
what others think of them. Their concern about rejection and
ridicule causes them to feel shy and uncomfortable around others.
They are easily embarrassed and often feel ashamed. Their fears
that others will criticize or make fun of them are exaggerated
and unrealistic, but their awkwardness and discomfort may make
these fears a self-fulfilling prophecy. Low scorers, in contrast,
do not suffer from the mistaken impression that everyone is
watching and judging them. They do not feel nervous in social
situations.
Immoderation. Immoderate individuals feel strong cravings and
urges that they have difficulty resisting. They tend to be
oriented toward short-term pleasures and rewards rather than
long-term consequences. Low scorers do not experience strong,
irresistible cravings and consequently do not find themselves
tempted to overindulge.
Vulnerability. High scorers on Vulnerability experience panic,
confusion, and helplessness when under pressure or stress. Low
scorers feel more poised, confident, and clear-thinking when
stressed.
Openness Facets:
Imagination. To imaginative individuals, the real world is often
too plain and ordinary. High scorers on this scale use fantasy as a
way of creating a richer, more interesting world. Low scorers on
this scale are more oriented to facts than fantasy.
Artistic Interests. High scorers on this scale love beauty, both in
art and in nature. They become easily involved and absorbed in
artistic and natural events. They are not necessarily artistically
trained or talented, although many will be. The defining features
of this scale are interest in, and appreciation of, natural and
artificial beauty. Low scorers lack aesthetic sensitivity and
interest in the arts.
Emotionality. Persons high on Emotionality have good access to
and awareness of their own feelings. Low scorers are less aware
of their feelings and tend not to express their emotions openly.
Adventurousness. High scorers on adventurousness are eager to
try new activities, travel to foreign lands, and experience
different things. They find familiarity and routine boring, and
will take a new route home just because it is different. Low
scorers tend to feel uncomfortable with change and prefer familiar
routines.
Intellect. Intellect and artistic interests are the two most
important, central aspects of openness to experience. High scorers
on Intellect love to play with ideas. They are open-minded to new
and unusual ideas, and like to debate intellectual issues. They
enjoy riddles, puzzles, and brain teasers. Low scorers on Intellect
prefer dealing with people or things rather than ideas. They
regard intellectual exercises as a waste of time. Intellect should
not be equated with intelligence. Intellect is an intellectual
style, not an intellectual ability, although high scorers on
Intellect score slightly higher than low-Intellect individuals on
standardized intelligence tests.
Liberalism. Psychological liberalism refers to a readiness to
challenge authority, convention, and traditional values. In its
most extreme form, psychological liberalism can even represent
outright hostility toward rules, sympathy for law-breakers, and
love of ambiguity, chaos, and disorder. Psychological conservatives
prefer the security and stability brought by conformity to
tradition. Psychological liberalism and conservatism are not
identical to political affiliation, but certainly incline
individuals toward certain political parties.
It is possible, although unusual, to score high in one or more facets
of a personality trait and low in other facets of the same trait. For
example, you could score highly in Imagination, Artistic Interests,
Emotionality and Adventurousness, but score low in Intellect and
Liberalism.
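The trait and facet hierarchy described in this section can be sketched
as a small data structure. The facet names below are the ones listed
above; the 0-100 scoring scale, the threshold of 50 and the use of a
plain average to obtain a trait score are assumptions made only for
this illustration and are not prescribed by the model.

```python
# The five traits and their six facets, as named in this section.
BIG5_FACETS = {
    "Extraversion": ["Friendliness", "Gregariousness", "Assertiveness",
                     "Activity Level", "Excitement-Seeking", "Cheerfulness"],
    "Agreeableness": ["Trust", "Morality", "Altruism",
                      "Cooperation", "Modesty", "Sympathy"],
    "Conscientiousness": ["Self-Efficacy", "Orderliness", "Dutifulness",
                          "Achievement-Striving", "Self-Discipline",
                          "Cautiousness"],
    "Neuroticism": ["Anxiety", "Anger", "Depression",
                    "Self-Consciousness", "Immoderation", "Vulnerability"],
    "Openness": ["Imagination", "Artistic Interests", "Emotionality",
                 "Adventurousness", "Intellect", "Liberalism"],
}


def trait_score(trait, facet_scores):
    """Average the six facet scores of a trait (averaging is an assumption)."""
    facets = BIG5_FACETS[trait]
    return sum(facet_scores[facet] for facet in facets) / len(facets)


def split_facets(trait, facet_scores, threshold=50):
    """Return (high, low) facet lists relative to a hypothetical threshold."""
    facets = BIG5_FACETS[trait]
    high = [f for f in facets if facet_scores[f] >= threshold]
    low = [f for f in facets if facet_scores[f] < threshold]
    return high, low


if __name__ == "__main__":
    # Hypothetical scores matching the example in the text: high on four
    # Openness facets, low on Intellect and Liberalism.
    scores = {"Imagination": 80, "Artistic Interests": 75, "Emotionality": 70,
              "Adventurousness": 85, "Intellect": 30, "Liberalism": 20}
    print(trait_score("Openness", scores))
    print(split_facets("Openness", scores))
```

Keeping facet scores separate from the trait score preserves the case
where a person is high on some facets of a trait and low on others.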
3 Objective
The main objective of this paper is to build a virtual behavior model of
a user by analyzing his/her OSN presence, and to make recommendations on
the basis of that behavior model. To reach this goal, we consider the
following sub-objectives:
1. Analyze user behavior in the OSN over the last few days.
2. Categorize his/her presence according to the Big 5.
3. Use the percentage of presence in each Big 5 factor to elaborate the
user's behavior pattern.
4. Recommend services/products to the user on the basis of his/her
behavior model.
4 Research Questions
The main research question of this paper is therefore "How can OSN users
be categorized according to the Big 5 factors from their behavior in the
OSN?", and the sub research questions are:
1. How does an OSN represent a user?
2. How can we analyze user behavior?
3. How can user behavior be categorized according to the Big 5 factors?
5 Proposed Research Methodology
In this paper our aim is to relate a text corpus from a social network
to the psychological theory of personality. We will also try to
implement a recommendation system based on behavior analysis.
Correlational and exploratory methodologies are therefore used in this
paper, where our concept is that the behavior indicator is Big 5
modeling and the variables are Extraversion, Neuroticism,
Agreeableness, Openness and Conscientiousness.
5.1 Data Collection:
In this research, big data is collected in order to categorize users'
behavior. The data is collected from an OSN (Twitter), where it is
stored as a result of user activity such as the user's own posts, posts
by friends, categories of liked pages, etc. The collected data is
public data, for which there is no barrier to use. For each user, the
previous 20 days of data are collected at a time. Data is collected
directly from the OSN by the system, with full user authorization.
After collection, the data is stored securely in the system database.
Twitter, a social network site, can be used for sentiment analysis as
it has a very large number of short messages created by its users [62].
We therefore used Twitter to collect user data. Using the Twitter REST
API 1.1, we collected public tweets and retweets. Our Twitter app
requires users to authorize the app to extract data from their
profiles. The app will not collect data if users do not allow it to
run. We made sure that all data we extract from Twitter is public
data. By calling the GET statuses/user_timeline and GET
statuses/retweets_of_me methods we can collect the user's tweets and
retweets. The app can also collect public data from profiles that the
user is currently following by using the GET friends/ids method. The
data we collected is in JSON format and the app can write the data to
text files. As separate files are easier to use, we keep each user's
data in its own file, named by the user's unique identifier (user ID
or username).
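The per-user file layout described above could be sketched as follows.
This is only an illustration, not the app's actual code: the function
names, the output directory and the one-tweet-per-line layout are
assumptions, and the input is assumed to be a JSON array of tweet
objects carrying a "text" field and a "user" object with an "id_str"
field, as returned by the Twitter REST API 1.1 timeline methods. No
network call is made here.

```python
import json
from pathlib import Path


def split_by_user(raw_json):
    """Group a JSON array of tweet objects by the author's id_str."""
    grouped = {}
    for tweet in json.loads(raw_json):
        grouped.setdefault(tweet["user"]["id_str"], []).append(tweet)
    return grouped


def save_user_tweets(user_id, tweets, out_dir):
    """Write one user's tweet texts to <out_dir>/<user_id>.txt, one per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{user_id}.txt"
    with path.open("w", encoding="utf-8") as handle:
        for tweet in tweets:
            # Collapse internal whitespace so each tweet occupies one line.
            handle.write(" ".join(tweet["text"].split()) + "\n")
    return path


if __name__ == "__main__":
    # Hypothetical sample standing in for an API response.
    sample = json.dumps([
        {"text": "Great match today!", "user": {"id_str": "101"}},
        {"text": "Feeling tired", "user": {"id_str": "102"}},
    ])
    for user_id, tweets in split_by_user(sample).items():
        print(save_user_tweets(user_id, tweets, "tweets_by_user"))
```

One plain-text file per user is convenient because a text analysis
tool can then be pointed at each file independently.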
5.2 Data Analysis:
The text file containing the past data of a single user is analyzed
with LIWC (Linguistic Inquiry and Word Count), a text analysis software
program designed by James W. Pennebaker, Roger J. Booth and Martha E.
Francis. Each text file analyzed by LIWC2007 can be treated as a whole
or broken into segments. LIWC counts the words according to its
dictionary. After finishing this process it saves the results to a
specified file, in which each result is written below its
corresponding category. These categories indicate different aspects of
the Big 5 factors, and the modelling is implemented on the basis of
these results. The data table given below shows which category lies in
which factor.
Table 2: LIWC categories assigned to each Big 5 factor
Extraversion | Openness | Neuroticism | Conscientiousness | Agreeableness
Social process | Leisure | Swear words | Relativity | Positive Emotion
Family | Insight | Negation | Motion | Feel
Friends | Body | Negative emo | Space | Discrepancy (should, would)
Humans | Ingestion | Anxiety | Time | Tentative (maybe)
Affective Anger Religion Hear
Biological process Sadness Death
Sexua Sexual Money
Achievement Certainty
See
LIWC splits the collected text into sentences and words. Each word is
then marked, according to its meaning and use, against the Big 5
categories, and the marks are expressed as percentages. The
percentages are summed per factor, and the factor with the highest
total is taken as the user's behavior.
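The marking step can be sketched as a lookup from LIWC category to
Big 5 factor followed by a per-factor sum. The mapping below uses only
the first four rows of Table 2; the remaining table entries are left
out of this sketch. The function names, the lower-cased category keys
and the sample percentages are hypothetical and serve only as an
illustration.

```python
FACTORS = ["Extraversion", "Openness", "Neuroticism",
           "Conscientiousness", "Agreeableness"]

# Category -> factor, taken from the first four rows of Table 2 only.
CATEGORY_TO_FACTOR = {
    "social process": "Extraversion", "family": "Extraversion",
    "friends": "Extraversion", "humans": "Extraversion",
    "leisure": "Openness", "insight": "Openness",
    "body": "Openness", "ingestion": "Openness",
    "swear words": "Neuroticism", "negation": "Neuroticism",
    "negative emo": "Neuroticism", "anxiety": "Neuroticism",
    "relativity": "Conscientiousness", "motion": "Conscientiousness",
    "space": "Conscientiousness", "time": "Conscientiousness",
    "positive emotion": "Agreeableness", "feel": "Agreeableness",
    "discrepancy": "Agreeableness", "tentative": "Agreeableness",
}


def factor_totals(category_percentages):
    """Sum category percentages per Big 5 factor; unmapped categories are ignored."""
    totals = {factor: 0.0 for factor in FACTORS}
    for category, percentage in category_percentages.items():
        factor = CATEGORY_TO_FACTOR.get(category.lower())
        if factor is not None:
            totals[factor] += percentage
    return totals


def dominant_factor(totals):
    """Return the factor with the highest summed percentage."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical LIWC-style output for one user's text file.
    sample = {"Friends": 4.0, "Family": 2.0, "Anxiety": 1.5,
              "Time": 3.0, "Feel": 1.0, "Articles": 6.0}
    totals = factor_totals(sample)
    print(totals, dominant_factor(totals))
```

Categories that belong to no factor, such as the hypothetical
"Articles" entry, simply do not contribute to any total.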
5.3 Results:
LIWC reports the words counted in each category as a percentage of the
total words in the text file:
result = (TC * 100) / WC
where WC = total words in the text file and TC = total words in the
category.
The inverse is used to recover the exact number of words in a
category:
TC = (result * WC) / 100
The values of the categories that lie in the same Big 5 factor are
then summed using a linear formula with unit weights:
f(X) = X1 + X2 + X3 + ... + Xi
The value of each factor is then converted to a percentage of the
total over all five factors, using the percentage formula:
part / whole = % / 100
These results are used to draw the pie chart in Excel.
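The arithmetic of this subsection can be checked with a short sketch.
Since LIWC reports result = TC * 100 / WC, solving for TC gives
TC = result * WC / 100. The function names and sample numbers below
are hypothetical, and taking the "whole" in the percentage formula to
be the sum over all five factors is an assumption made so that the
shares add up to 100 in a pie chart.

```python
def category_percentage(tc, wc):
    """LIWC-style result: words in the category as a percentage of all words."""
    return tc * 100 / wc


def category_count(result, wc):
    """Invert the percentage to recover the number of words in the category."""
    return result * wc / 100


def factor_shares(factor_sums):
    """Convert per-factor sums into percentages of their overall total."""
    whole = sum(factor_sums.values())
    if whole == 0:
        # No marked words at all: every share is zero.
        return {factor: 0.0 for factor in factor_sums}
    return {factor: part * 100 / whole for factor, part in factor_sums.items()}


if __name__ == "__main__":
    # Hypothetical example: 12 category words in a 300-word file.
    result = category_percentage(12, 300)
    print(result, category_count(result, 300))
    # Hypothetical per-factor sums f(X) = X1 + X2 + ... + Xi.
    sums = {"Extraversion": 6.0, "Openness": 0.5, "Neuroticism": 1.5,
            "Conscientiousness": 3.0, "Agreeableness": 1.0}
    print(factor_shares(sums))
```

The resulting shares are the values that would be plotted as slices
of the pie chart.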
Example:
5.4 Recommendation Analysis:
Depending on the behavior analysis, some brands of products are
suggested or recommended to users. The dominant share of a user's
behavior influences which product brands he or she likes. The tables
below give examples of brands or categories of products and services
in which the majority of people with a given behavior are interested.
Tables 3, 4 and 5 show some example recommendations.
Table 3: Data tables
Big 5 Factor Categories/Brands of
Game Movie
Extraversion Strategy(Age of Empire, Commandos) Political, Fantasy, Family
Openness Racing(NFS) Comedy, Sports, Drama
Neurotic Shooting(COD, CS) Crime scene, Action, Horror
Conscious Sudoku, Chess Political, Historical, Conspiracy
Agreeable Sports Romantic, Drama
Table 4: Data tables (categories/brands of music and food)
Big 5 Factor   Music                            Food
Extraversion   Rock                             Bread, Meat
Openness       Classical, Vocal, Country wood   Multicultural Food, Pizza
Neurotic       Pop, Heavy Metal                 Fast Food
Conscious      New Releases, Historic           Salad, Vegetable
Agreeable      Romantics, Country               Bread, Cheese
Table 5: Data tables (categories/brands of beverages and play)
Big 5 Factor   Beverage                   Play
Extraversion   Coffee, Tea                Football, Athletics
Openness       Milkshake, Green Tea       Cricket, Swimming
Neurotic       Soft Drinks                Boxing, Rugby, Martial arts
Conscious      Green Tea, Black Coffee    Athletics, Martial arts
Agreeable      Coffee, Tea, Soft Drinks   Gymnastics
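The lookup from behavior to recommendation can be sketched as follows. This is an illustrative sketch only: it encodes just the Game and Movie categories of Table 3 (without the brand names in parentheses), uses the factor labels as they appear in the tables, and assumes that the factor with the largest percentage drives the recommendation. The names TABLE_3 and recommend are hypothetical.

```python
# Game and movie categories per Big 5 factor, copied from Table 3.
TABLE_3 = {
    "Extraversion": {"game": ["Strategy"], "movie": ["Political", "Fantasy", "Family"]},
    "Openness": {"game": ["Racing"], "movie": ["Comedy", "Sports", "Drama"]},
    "Neurotic": {"game": ["Shooting"], "movie": ["Crime scene", "Action", "Horror"]},
    "Conscious": {"game": ["Sudoku", "Chess"], "movie": ["Political", "Historical", "Conspiracy"]},
    "Agreeable": {"game": ["Sports"], "movie": ["Romantic", "Drama"]},
}


def recommend(factor_shares, product):
    """Return the Table 3 entries of `product` for the dominant factor."""
    if product not in ("game", "movie"):
        raise ValueError("product must be 'game' or 'movie'")
    # The factor holding the major percentage decides the recommendation.
    dominant = max(factor_shares, key=factor_shares.get)
    return TABLE_3[dominant][product]


# Hypothetical Big 5 percentages for one user.
user_shares = {
    "Extraversion": 30.0,
    "Openness": 20.0,
    "Neurotic": 10.0,
    "Conscious": 25.0,
    "Agreeable": 15.0,
}
print(recommend(user_shares, "movie"))
print(recommend(user_shares, "game"))
```

A natural extension would be to rank all five factors and blend the entries of the top two, but the sketch keeps to the single dominant factor described in the text.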
6 Conclusion
We show that personality can be recognised by computers through language
cues. To date, there has been little work on the automatic recognition of
user personality, and our research is the first to examine the recognition
of personality in dialogue together with recommendation based on sentiment
analysis results. What clearly emerges is that extraversion is the easiest
trait to model in general, followed by emotional stability and
conscientiousness. We can also see that feature selection is very
important, as some of the best models contain only a small subset of the
full feature set. Prosodic features are important for modelling observed
extraversion, emotional stability and openness to experience.
LIWC features are beneficial for all traits. We also analysed the influence
of the most relevant individual features in specific models, for all
recognition tasks. We also used the Stanford NLP (natural language
processing) application to analyse and split the texts. However, because
LIWC generated more accurate results than Stanford NLP, we later used only
LIWC.
At present our system can use only text information. In future we will
enable it to mine data from shared links and videos. Our system cannot yet
identify quotes (which users use to express others' speech). There is
considerable scope for analysing more categories of sentiment words and
signs. The accuracy of the brand recommendation system depends on the
Big 5 factor percentages, so the depth of measurement and the scale of
marking will be made more efficient.
References
1. Bao, J., Zheng, Y., Mokbel, M. 2012. Recommendations in Location-
based Social Networks. ACM TIST. V, N, Article A(January YYYY),
30 pages. DOI = 10.1145/0000000.0000000
http://doi.acm.org/10.1145/0000000.0000000M. Smith, V. Barash, L.
Getoor, and H. W. Lauw. Leveraging social context for searching social
media. In SSM 08: Proceeding of the 2008 ACM workshop on Search
in social media, pages 91-94, New York, NY, USA, 2008. ACM.
2. A. M. Ferman, J. H. Errico, P. van Beek, and M. I.Sezan. Content-
based ltering and personalization using structured metadata. In JCDL
02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital
libraries, pages 393(393, New York, NY, USA, 2002.ACM).
3. Nielsen Online Report. Social networks & blogs now 4th most popular
online activity, 2009.
36
4. F Benevento and Tiago Rodrigues and Meeyounga Cha* and Virgilio
Almedia. Characterizing User Behavior in Online Social Networks.
2009.
5. Kendra Cherry. The Big Five Personality Dimensions. http://psychology.about.com/od/personalitydevelopment/a/bigve.htm
. [19-june-14].
6. Jie Bao and Yu Zheng and David Wikle and Mohamed F. Mokbel. A
Survey On Recommendation in Location-Based Social Networks. ACM
TIST. V,N, Article A. January 2012.
7. Ward, James C., and Amy L. Ostrom. The internet as information
mineeld: an analysis of the source and content of brand information
yielded by net searches. Journal of Business research 56.11 (2003):
907-914.
8. Shuotian BAi, Tingshao Zhu and Li Cheng. Big-Five Parsonality
Prediction based on User Behaviors at Social Network Sites. arXiv:
1204.4809v1[cs.CY] 21 apr 2012.
9. Mia O. Hoogenboom, John D. Armstrong, Ton G.G. Grootuis and Neil
B. Metcalfe. The growth benits of aggressive behaviour vary with indi-
vidual metabolism and resource predictability. http://beheco.oxfordjournals.org/content/early/2012/09/25/beheco.ars161.full
. Behaviour Ecology(2012) dol:10.1093/beheco/ars161. 28-september-
2012.
10. M.Smith, V. Barash, L.Getoor and H. W. Lauw. Leveraging social
context for searching social media. In SSM 08: proceeding of the 2008
ACM workshop on search in social media, pages 91-94, New York, NY,
USA, 2008. ACM.
11. Katherine R. Luking, Joan Luby and Deanna M. Barch. Developmental
Cognitive Neuroscience. Volume 9. July 2014. Pages 82-92. Download:
http://www.sciencedirect.com/science/article/pii/S1878929314000073/
pdt?md5=8162e9d9e0d9730b51269c0619cc205c&pid=1-s2.0-S1878929314000073-
main.pdf
12. . Flickr.http://www.ickr.com
13. Twitter.http://twitter.com
37
14. Foursquare.https://foursquare.com
15. Cao, X., Cong, G., Jensen, C.S.: Mining signicant semantic locations
from gps data. Proc. VLDB Endow.3, 10091020 (2010)
16. Zheng, Y., Zhang, L., Xie, X., Ma, W.Y.: Mining interesting locations
and travel sequences from gps trajectories. In: Proceedings of the 18th
international conference on World wide web, WWW 09, pp. 791 800.
ACM, New York, NY, USA (2009)
17. Li, Q., Zheng, Y., Xie, X., Chen, Y., Liu, W., Ma, W.Y.: Mining
user similarity based on location history. In: Proceedings of the 16th
ACM SIGSPATIAL international conference on Advances in geographic
information systems, GIS 08, pp. 34:134:10. ACM, New York, NY,
USA (2008)
18. Xiao, X., Zheng, Y., Luo, Q., Xie, X.: Finding similar users using
category-based location history. In: Proceedings of the 18th SIGSPA-
TIAL International Conference on Advances in Geographic Information
Systems, GIS 10, pp. 442445. ACM, New York, NY, USA (2010)
19. Liu, W., Zheng, Y., Chawla, S., Yuan, J., Xie, X.: Discovering spatio-
temporal causal interactions in trac data streams. In: The 17th ACM
SIGKDD international conference on Knowledge Discovery and Data
mining, KDD 11. ACM, New York, NY, USA (2011)
20. Zheng, Y., Li, Q., Chen, Y., Xie, X., Ma, W.Y.: Understanding mo-
bility based on gps data. In: Proceedings of the 10th international
conference on Ubiquitous computing, UbiComp 08, pp. 312321. ACM,
New York, NY, USA (2008)
21. Wang, L., Zheng, Y., Xie, X., Ma, W.Y.: A exible spatio-temporal
indexing scheme for largescale gps track retrieval. In: Proceedings of
the The Ninth International Conference on Mobile Data Management,
pp. 18. IEEE Computer Society, Washington, DC, USA (2008)
22. Ioannis Konstas, Vassilios Stathopoulos, Joemon M Jose: On Social
Networks and Collaborative Recommendation.
23. A. M. Ferman, J. H. Errico, P. van Beek, and M. I. Sezan. Content-
based ltering and personalization using structured metadata. In JCDL
38
02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital
libraries, pages 393{393, New York, NY, USA, 2002. ACM.
24. J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algo-
rithmic framework for performing collaborative ltering. In SIGIR 99:
Proceedings of the 22nd annual international ACM SIGIR conference
on Research and development in information retrieval, pages 230{237,
New York, NY, USA, 1999. ACM.
25. G. Adomavicius and A. Tuzhilin. Toward the next generation of rec-
ommender systems: A survey of the state-of-the-art and possible ex-
tensions. Knowledge and Data Engineering, IEEE Transactions on,
17(6):734{749, 2005.
26. H. Yildirim and M. S. Krishnamoorthy. A random walk method for
alleviating the sparsity problem in collaborative ltering. In RecSys 08:
Proceedings of the 2008 ACM conference on Recommender systems,
pages 131{138, New York, NY, USA, 2008. ACM.
27. G. Das, N. Koudas, M. Papagelis, and S. Puttaswamy. Ecient sam-
pling of information in social networks. In I. Soboro, E. Agichtein, and
R. Kumar, editors, SSM, pages 67{74. ACM, 2008.
28. S. B. Subramanya and H. Liu. Socialtagger -collaborative tagging for
blogs in the long tail. In SSM 08: Proceeding of the 2008 ACM work-
shop on Search in social media, pages 19{26, New York, NY, USA,
2008. ACM.
29. M. Strohmaier. Purpose tagging: capturing user intent to assist goal-
oriented social search. In SSM 08: Proceeding of the 2008 ACM work-
shop on Search in social media, pages 35{42, New York, NY, USA,
2008. ACM.
30. N. Craswell and M. Szummer. Random walks on the click graph. In
SIGIR 07: Proceedings of the 30th annual international ACM SIGIR
conference on Research and development in information retrieval, pages
239{246, New York, NY, USA, 2007. ACM
31. H. Yildirim and M. S. Krishnamoorthy. A random walk method for
alleviating the sparsity problem in collaborative ltering. In RecSys 08:
39
Proceedings of the 2008 ACM conference on Recommender systems,
pages 131{138, New York, NY, USA, 2008. ACM.
32. M. Clements, A. P. de Vries, and M. J. T. Reinders. Optimizing single
term queries using a personalized markov random walk over the social
graph. In Workshop on Exploiting Semantic Annotations in Informa-
tion Retrieval (ESAIR), March 2008.
33. A. Hotho, R. Jaschke, C. Schmitz, and G. Stumme. Information Re-
trieval in Folksonomies: Search and Ranking. 2006.
34. Georgios Paltogloua, Stephane Gobronb, Marcin Skowronc, Mike Thel-
walla, and Daniel Thalmannb. Sentiment analysis of informal textual
communication in cyberspace.
35. Barthelemy, F., D.B.G.S., Magnant, X.: Believable synthetic charac-
ters in a virtual emarket. In: In Proceedings of the IASTED Articial
Intelligence and Applications (2004)
36. Bates, J.: The role of emotion in believable agents. Communications
of the ACM 37(7), 122{125 (1994)
37. Becheiraz, P., Thalmann, D.: A model of nonverbal communication
and interpersonal relationship between virtual actors. In: CA 96. p.
58. IEEE Computer Society, Washington, DC, USA (1996)
38. Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boom-
boxes and blenders: Domain adaptation for sentiment classication. In:
45th ACL. pp. 440{ 447. Association for Computational Linguistics,
Prague, Czech Republic (June 2007)
39. Bradley, M., Lang, P.: Aective norms for english words (anew): Stim-
uli, instruction manual and aective ratings. Tech. rep., Gainesville,
FL. The Center for Research in Psychophysiology, University of Florida
(1999)
40. Brooke, J., Toloski, M., Taboada, M.: Cross-linguistic sentiment anal-
ysis: From english to spanish. In: ICRA-NLP (2009)
41. Cassell, J.: Embodied conversational agents. MIT Press, Cambridge,
MA, USA (2000)
40
42. Cassell, J., Pelachaud, C., Badler, N., Steedman, M., Achorn, B.,
Becket, T., Douville, B., Prevost, S., Stone, M.: Animated conver-
sation: rule-based generation of facial expression, gesture & spoken
intonation for multiple conversational agents. In: SIGGRAPH 94. pp.
413{420. ACM, New York, NY, USA (1994)
43. Kappas, A., Hess, U., Scherer, K.R.: Voice and emotion. In: Fun-
damentals of nonverbal behavior. p. 200238. Cambridge University
Press, Cambridge and New York (1991)
44. Skowron, M.: Aect listeners: Acquisition of aective states by means
of conversational systems. In: COST 2102 Training School. pp. 169{181
(2009)
45. Thelwall, M., Wilkinson, D.: Public dialogs in social network sites:
What is their purpose? J. Am. Soc. Inf. Sci. Technol. 61(2), 392{404
(2010)
46. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? sentiment classi-
cation using machine learning techniques. In: EMNLP 2002 (2002)
47. Thomas, M., Pang, B., Lee, L.: Get out the vote: Determining sup-
port or opposition from congressional oor-debate transcripts. CoRR
abs/cs/0607062 (2006)
48. Ounis, I., Macdonald, C., Soboro, I.: Overview of the trec-2008 blog
trac. In: The TREC 2008 Proceedings. NIST (2008)
49. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. Now Pub-
lishers Inc. (2008)
50. Mullen, T., Collier, N.: Sentiment analysis using support vector ma-
chines with diverse information sources. In: Lin, D., Wu, D. (eds.)
Proceedings of EMNLP 2004. pp. 412{418. Association for Computa-
tional Linguistics, Barcelona, Spain (July 2004)
51. Whitelaw, C., Garg, N., Argamon, S.: Using appraisal groups for sen-
timent analysis. In: CIKM 05. pp. 625{631. ACM, New York, NY,
USA (2005)
41
52. Wilson, T., Wiebe, J., Homann, P.: Recognizing contextual polarity
in phraselevel sentiment analysis. In: HLT/EMNLP 2005. Vancouver,
CA (2005)
53. Pennebaker J., F.M., R., B.: Linguistic Inquiry and Word Count:
LIWC. Erlbaum Publishers, 2 edn. (2001)
54. Slatcher, R., Chung, C., Pennebaker, J., Stone, L.: Winning words:
Individual dierences in linguistic style among U.S. presidential and
vice presidential candidates. Journal of Research in Personality 41(1),
63{75 (2007)
55. Colby, K.: Articial paranoia. Articial Intelligence 2(1), 1{25 (1971)
56. Acosta, J.: Using Emotion to Gain Rapport in a Spoken Dialog System.
Ph.D. thesis, University of Texas at El Paso (2009)
57. Gratch, J., W.N.G.J.F.E., Duy, R.:
58. Turney, P.D., Littman, M.L.: Unsupervised learning of semantic ori-
entation from a hundred-billion-word corpus. CoRR cs.LG/0212012
(2002)
59. Pelachaud, C.: Studies on gesture expressivity for a virtual agent.
Speech Commun. 51(7), 630{639 (2009)
60. Gobron, S., Ahn, J., Paltoglou, G., Thelwall, M., Thalmann, D.: From
sentence to emotion: a real-time three-dimensional graphics metaphor
of emotions extracted from text 26(6-8), 505{519 (June 2010)
61. Shuotian BAi, Tingshao Zhu and Li Cheng. Big-Five Parsonality
Prediction based on User Behaviors at Social Network Sites. arXiv:
1204.4809v1[cs.CY] 21 apr 2012.
62. Pak, Alexander, and Patrick Paroubek. Twitter as a Corpus for Sen-
timent Analysis and Opinion Mining. LREC. 2010. Page 1326
42
