Yelp Database
Dionysios Nikolopoulos
Introduction
Users
Businesses
Reviews
Check-in
Tips
Tasks
The aforementioned networks contain users that are not members of the
strong groups. As a result, both networks are further subsetted to
contain only users that belong to the strong group.
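The subsetting step can be sketched as keeping only the friendships whose two endpoints both belong to the strong group. This is a minimal illustration; the function name `exclusive_subnetwork`, the edge-list representation and the toy user ids are assumptions, not the code used for the analysis.

```python
def exclusive_subnetwork(edges, strong_users):
    """Keep only friendships where both users are in the strong group."""
    strong = set(strong_users)
    return [(u, v) for u, v in edges if u in strong and v in strong]


# Hypothetical toy friendship network (user ids are made up).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
strong_users = {"a", "b", "c"}

exclusive_edges = exclusive_subnetwork(edges, strong_users)
print(exclusive_edges)
```

Every edge touching user "d" is dropped here, because "d" is not in the strong group.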
Histograms Complete User Network
Degree distribution and review count seem to follow a power law when
plotted on a log scale
Average stars are generally high, and round numbers are strongly
pronounced
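A rough way to check the power-law impression is to fit a straight line to the histogram on a log-log scale: a power law appears as an approximately linear trend with a negative slope. The sketch below uses a plain least-squares fit on a synthetic degree sequence; the function name `loglog_slope` and the data are assumptions, and such a fit is only a visual-style check, not a rigorous test.

```python
import math
from collections import Counter


def loglog_slope(values):
    """Least-squares slope of log(frequency) against log(value)."""
    counts = Counter(v for v in values if v > 0)
    xs = [math.log(v) for v in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic degree sequence: degree k occurs about 1000 / k^2 times.
degrees = [k for k in range(1, 11) for _ in range(round(1000 / k ** 2))]
slope = loglog_slope(degrees)
print(f"estimated log-log slope: {slope:.2f}")
```

For this synthetic sequence the slope comes out close to -2, the exponent used to generate it.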
Histograms 1000 strongest users
There are fewer observations, and they seem to follow a different
distribution than in the Complete network
Histograms 1000 strongest users - Exclusive
Even fewer observations, but they seem to follow the same pattern
Histograms 10000 strongest users
Average stars and Review count seem to obey a log-normal distribution
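One simple check of a log-normal shape is to compare the skewness of the raw values with the skewness of their logarithms: if the data are log-normal, the logarithms should look roughly symmetric. The sketch below uses synthetic values as a stand-in for review counts; the `skewness` helper and the distribution parameters are assumptions chosen only for illustration.

```python
import math
import random


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


random.seed(0)
# Synthetic stand-in for review counts; the parameters are arbitrary.
review_counts = [random.lognormvariate(4.0, 0.8) for _ in range(5000)]

raw_skew = skewness(review_counts)
log_skew = skewness([math.log(v) for v in review_counts])
print(f"skewness of raw values: {raw_skew:.2f}")
print(f"skewness of log values: {log_skew:.2f}")
```

A strongly right-skewed raw distribution together with near-zero skewness after the log transform is consistent with a log-normal shape, although it does not prove it.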
Histograms 10000 strongest users - Exclusive
Behavior is similar to the previous group, but with different averages
Scatter plots Average stars VS Review count
All networks seem to follow a wide normal distribution
Logarithmic scaling did not reveal any further characteristics
Scatter plots Average stars VS User degree
Behavior is similar to the previous scatter plot
Scatter plots Review count VS User degree
Logarithmic plots are used as they seem to reveal more about the
corresponding behaviors
Boxplots Reviews and degree comparison
Reviews: The 1000 strongest network shows a large deviation, and the
average is slightly larger for the exclusive one.
The 10000 strongest network behaves similarly, while its average number
of reviews seems lower than in the 1000 network. The deviation is
smaller, partly due to the larger number of observations.
The Complete network has a very low average, as most users rarely post a
review.
Degree: The 1000 exclusive network has a lower deviation and a lower
average degree than the 1000 network.
The 10000 networks behave the same way, but the difference in deviation
is significantly smaller.
The Complete network again has the smallest average, while its deviation
remains at the same level.
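The quantities compared in the boxplots (average, deviation, median and interquartile range) can be computed per network as in the sketch below. The `summarize` helper and all numbers are hypothetical placeholders, not values from the Yelp data.

```python
import statistics


def summarize(values):
    """Return the statistics a boxplot comparison relies on."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": median,
        "iqr": q3 - q1,
    }


# Hypothetical review counts per network (made-up numbers, not Yelp data).
networks = {
    "complete": [1, 1, 2, 2, 3, 5, 8],
    "10000 strongest": [40, 55, 60, 80, 95, 120, 150],
    "1000 strongest": [150, 200, 320, 400, 650, 900, 1400],
}
for name, counts in networks.items():
    stats = summarize(counts)
    print(name, {k: round(v, 1) for k, v in stats.items()})
```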
Communities in 1000 strongest users -
Exclusive
A built-in greedy algorithm is used to reveal possible communities in the
1000 strongest users exclusive network.
Even the greedy algorithm is very time-consuming when applied to the
other, larger networks.
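The slides do not name the library or the exact algorithm. Assuming it is a greedy modularity maximization, the idea can be sketched as follows: start with every node in its own community and repeatedly merge the pair of connected communities that gives the largest modularity gain, stopping when no merge improves modularity. The function name `greedy_modularity` and the toy graph are assumptions; this naive version is written for readability, not speed.

```python
def greedy_modularity(edges):
    """Naive greedy agglomeration: merge the pair of communities with the
    largest modularity gain until no merge improves modularity."""
    m = len(edges)
    nodes = {n for edge in edges for n in edge}
    communities = {n: {n} for n in nodes}      # label -> member nodes
    degree = {n: 0 for n in nodes}             # label -> total degree
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    while len(communities) > 1:
        member_of = {n: c for c, members in communities.items() for n in members}
        between = {}                           # (c1, c2) -> number of edges
        for u, v in edges:
            cu, cv = member_of[u], member_of[v]
            if cu != cv:
                key = (min(cu, cv), max(cu, cv))
                between[key] = between.get(key, 0) + 1
        best_gain, best_pair = 0.0, None
        for (c1, c2), e in between.items():
            # Modularity change when merging communities c1 and c2.
            gain = e / m - degree[c1] * degree[c2] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (c1, c2)
        if best_pair is None:
            break                              # no merge improves modularity
        c1, c2 = best_pair
        communities[c1] |= communities.pop(c2)
        degree[c1] += degree.pop(c2)
    return sorted(sorted(c) for c in communities.values())


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = greedy_modularity(toy_edges)
print(groups)
```

On the toy graph the two triangles end up as separate groups. Because the sketch recounts the edges between communities in every round, its cost grows quickly with network size, which fits the remark about running time on the larger networks.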
The groups largely overlap, and each contains around 300 nodes
Degree: Group 3 has a larger average and deviation, and its outliers lie
much higher than those of the other two groups. The behavior is similar
to that of the layered network.
Communities in 1000 strongest users
Exclusive Scatter plots
Scatter plots do not reveal any significant trend or a large difference
between the groups
Sparse network based on 1000
strongest - Exclusive
An attempt is made to study and visualize a sparse network based on the
1000 strongest users - Exclusive network.
As the visualization shows, the community algorithm forms two groups, and
Group 1 consists of stronger users on average.
Review count follows the opposite trend.
Implementation of the algorithm
proposed in the paper
Implementation of the algorithm in
1000 strongest - exclusive
Scatter Plots are available but did not reveal any particular trend
Conclusion
In Task 1, a high degree means a high number of friends. This analysis
was done in the course assignment.
What does that reveal about the users' behaviors and characteristics?
An effort was made to focus on the social behavior of the users of the
Yelp application.
Average stars and review count for strong users follow an approximately
log-normal distribution
The degree distribution follows a power law, similar to the full network
Exclusive networks also have a high average degree in the friendship network
- Strong users as defined in Task 2 have a high number of friends on average
Review count seems to have a slight direct relation with degree for the
strong users
Strong users do not differ as much in degree as they do in number of
reviews
Community detection tries to discover distinct parts of the network
Degree and review count do not follow the same trend: Group 1 has a
higher average degree but a lower average review count