
Using Check-ins from Geo-Social Data to
Determine Safe Locations during Natural Calamities

Major Project-I Proposal Report


Submitted in partial fulfillment of the requirements for the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

by
ARUNRAJ GANAPATHY (15CO110)
MOHAMMED AMEEN (15CO131)
SATISH AVADHOOT MHETRE (15CO242)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA,
SURATHKAL, MANGALORE - 575025
August, 2018
DECLARATION

We hereby declare that the Major Project-I Proposal Report entitled Using
Check-ins from Geo-Social Data to Determine Safe Locations during Natural
Calamities, which is being submitted to the National Institute of Technology
Karnataka, Surathkal in partial fulfilment of the requirements for the award of the
Degree of BACHELOR OF TECHNOLOGY in Computer Science and Engineering,
is a bona fide report of the work carried out by us. The material contained
in this report has not been submitted to any University or Institution for the award
of any degree.

Arunraj Ganapathy (15CO110)


Department of Computer Science and Engineering

Mohammed Ameen (15CO131)


Department of Computer Science and Engineering

Satish Avadhoot Mhetre (15CO242)


Department of Computer Science and Engineering

Place: NITK, Surathkal.


Date: 17/08/2018
CERTIFICATE

This is to certify that the Major Project-I Proposal Report entitled Using
Check-ins from Geo-Social Data to Determine Safe Locations during Natural
Calamities, submitted by ARUNRAJ GANAPATHY (Register Number:
15CO110), MOHAMMED AMEEN (Register Number: 15CO131) and SATISH
AVADHOOT MHETRE (Register Number: 15CO242) as the record of the work
carried out by them, is accepted as the Major Project-I Proposal Report submission
in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology.

Dr M Venkatesan
Guide

Chairman - DUGC
Acknowledgment

We would like to thank Dr. M Venkatesan for giving us the opportunity to work with
him on the major project. This is a great chance for learning and professional
development for us. His guidance, from building our preliminary knowledge of the
field to helping us select the proposal, was valuable.

We would also like to extend our gratitude to one another for each one’s valuable
inputs.

Place: Surathkal
Date: 10/08/2018

Arunraj Ganapathy
Satish Avadhoot Mhetre
Mohammed Ameen
Abstract

Spatial clustering deals with the unsupervised grouping of places or locations into
clusters and finds important applications in urban planning and marketing. However,
current spatial clustering models disregard information about the people who are
related to the clustered places and the times at which they visited them.

In our project, we will develop an algorithm that clusters places based not only on
their locations but also on their semantics. Our model considers spatio-temporal
information and the social relationships between users who visit the clustered places.

Specifically, two places are considered similar if they are spatially close and visited
by people of similar communities.

It is also worth noting that, with drastic improvements in the availability of
location-tracking technologies, it has become easy to track the locations and movements
of users through user check-ins. These check-ins provide insights into the community
structure of people and the areas or locations where they are based.

With this information we can determine if a location is safe or not during natural
calamities, notify the people in the location and take necessary evacuation actions.

Keywords: Clustering, unsupervised learning, spatio-temporal information, check-ins

Contents

1 Introduction
    1.1 Issues and Challenges
    1.2 Outline of the report

2 Literature Survey
    2.1 Problem Statement
    2.2 Objectives

Bibliography

Chapter 1

Introduction

Social networks have gained popularity recently with the advent of sites such as
Twitter, Instagram and Facebook. Every day millions of people participate actively on
these platforms, and the numbers are growing rapidly. These networks are a rich
source of data as users populate their sites with personal information.

Spatial clustering is an unsupervised technique that groups places into clusters so
that places within a cluster are similar to each other and dissimilar to places outside it.
Traditionally, the distance function between two places measures only the spatial distance
between them and ignores any semantic or social similarity. Thus, in most applications
this might lead to spurious clusters.

If a set of check-ins is clustered using traditional spatial-clustering algorithms, there
is a possibility that two check-ins from that set are treated as outliers, even if they
are socially related to the others.

In our approach, we investigate an extension of traditional density-based clustering
for spatial locations that considers their relationship to the social network of people
who visit them and the times at which they were visited. Specifically, we consider the
places of a Geo-Social Network application, which allows users to capture their geographic
locations and share them in the social network through an operation called check-in.

1.1 Issues and Challenges
During the course of our project development we expect to face the following challenges:

1. Limited availability of check-in data sets for social networks.

2. The number of samples to be processed is very high, so algorithms have to be
very conscious of scaling issues. Like many interesting problems, clustering in
general is NP-hard.

3. High dimensionality is another problem. The number of features is very high
and may even exceed the number of samples.

4. Most features are zero for most samples, i.e. the object-feature matrix is sparse.
This property strongly affects the measurements of similarity and the compu-
tational complexity.

5. Outliers may have significant importance. Finding these outliers is highly non-
trivial, and removing them is not necessarily desirable.

1.2 Outline of the report


This report contains two chapters, namely Introduction and Literature Survey.

The Introduction gives insight into what a Geo-Social Network is and how the data
it generates can be harnessed for our project. It also presents the basic idea behind
the project and the need for it, and lists the issues and challenges.

The Literature Survey gives insight into existing, already implemented methods that
are similar to our use case. It is a stepping stone that will help us during our
development process.

Chapter 2

Literature Survey

Various models have been proposed for clustering geo-social network data. The most
prominent ones have three components. The social network is an undirected graph
$G = (U, E)$, where $U$ is the set of users and each edge $(u_i, u_j) \in E$ indicates
that the users $u_i, u_j \in U$ are friends. The set $P$ contains all places visited by
users, in the form of <latitude, longitude> GPS points. The set $CK$ contains the
check-ins; a check-in in $CK$ is a triplet $\langle u_i, p_k, t_r \rangle$ indicating
that user $u_i$ visited place $p_k$ at a certain time $t_r$.
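
The three components above can be sketched in a few lines of Python. This is only an illustrative data layout, not the representation used by any of the surveyed models: the `CheckIn` class, the `visitors_by_place` helper, the integer timestamp encoding and all sample values are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckIn:
    """One triplet <u_i, p_k, t_r> from the check-in set CK."""
    user: str     # u_i
    place: tuple  # p_k as (latitude, longitude)
    time: int     # t_r, assumed here to be a Unix timestamp


def visitors_by_place(checkins):
    """Map each place to the set of users who checked in there."""
    visitors = defaultdict(set)
    for c in checkins:
        visitors[c.place].add(c.user)
    return dict(visitors)


# Undirected friendship edges E, stored as frozensets so that
# the pair (a, b) is the same edge as (b, a).
friends = {frozenset(("alice", "bob"))}

# Hypothetical sample check-ins.
checkins = [
    CheckIn("alice", (13.01, 74.79), 1534500000),
    CheckIn("bob", (13.01, 74.79), 1534503600),
    CheckIn("carol", (12.91, 74.85), 1534507200),
]
visitors = visitors_by_place(checkins)
```

Grouping check-ins by place yields, for every place, the set of users who visited it, which is the input that a social distance between two places needs.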

DCPGS Model

In the Density-based Clustering Places in Geo-Social Networks (DCPGS) model,
for each place $p_i$ in the GeoSN, DCPGS finds the geo-social neighborhood $N(p_i)$
of $p_i$, which includes all places $p_j$ such that $D_{gs}(p_i, p_j) \le \epsilon$,
$D_s(p_i, p_j) \le \tau$ and $E(p_i, p_j) \le maxD$.

For two places $p_i$ and $p_j$, $E(p_i, p_j)$ is the Euclidean distance and
$D_{gs}(p_i, p_j) = f(D_s(p_i, p_j), E(p_i, p_j))$ is the geo-social distance, defined
as a function of the social distance $D_s(p_i, p_j)$ and $E(p_i, p_j)$. Parameter
$\epsilon$ is the geo-social distance threshold, while $\tau$ and $maxD$ are two sanity
constraints for the social and the spatial distances between places, respectively.
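
The neighborhood test can be illustrated with a short Python sketch. It is a naive linear scan under stated assumptions, not the DCPGS implementation: the function name, the toy places, the made-up social distances and the placeholder `d_gs` combination (with weight 0.9) are all hypothetical and exist only to exercise the three conditions.

```python
import math


def geo_social_neighborhood(p_i, places, d_gs, d_s, euclid, eps, tau, max_d):
    """Return every p_j that satisfies all three neighborhood conditions.

    p_i itself is included whenever its self-distances are zero.
    """
    return [
        p_j for p_j in places
        if d_gs(p_i, p_j) <= eps
        and d_s(p_i, p_j) <= tau
        and euclid(p_i, p_j) <= max_d
    ]


# Toy data (hypothetical): three places on a plane.
places = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
# Made-up social distances between pairs of places.
social = {
    frozenset({places[0], places[1]}): 0.2,
    frozenset({places[0], places[2]}): 0.1,
    frozenset({places[1], places[2]}): 0.9,
}
MAX_D = 5.0


def d_s(a, b):
    return 0.0 if a == b else social[frozenset({a, b})]


def d_gs(a, b):
    # Placeholder combination for this demo only (assumed weights).
    return 0.9 * d_s(a, b) + 0.1 * (math.dist(a, b) / MAX_D)


neighborhood = geo_social_neighborhood(
    places[0], places, d_gs, d_s, math.dist, eps=0.4, tau=0.5, max_d=MAX_D
)
```

In this toy setup the third place passes the geo-social and social thresholds but lies 10 units away, so only the `max_d` check keeps it out of the neighborhood of the first place.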

Since the geo-social distance $D_{gs}(p_i, p_j)$ is a function of a spatial and a social
distance, $\tau$ and $maxD$ constrain these individual distances to avoid the following
two cases that negatively affect the quality of geo-social clusters:

1. The geo-social distance between two places $p_i$ and $p_j$ could be less than
$\epsilon$ if they are extremely close to each other in space, but have no social
connection at all. This may lead to putting places close to each other spatially,
but having no social relationship, into the same cluster.

2. The geo-social distance between two places $p_i$ and $p_j$ could be less than
$\epsilon$ if they have a very small social distance, but are extremely far from each
other spatially. This may lead to putting places with close social distances, but
large spatial distances, into the same cluster.

Constraints τ and maxD are defined for quality control and can be set by experts
or according to the analyst’s experience.

The social distance $D_s(p_i, p_j)$ takes as inputs the sets of users $U_{p_i}$ and
$U_{p_j}$ who have visited $p_i$ and $p_j$, respectively, and returns a value between
0 and 1. The Euclidean distance $E(p_i, p_j)$ is likewise normalized by converting it
into a spatial distance

$$D_p(p_i, p_j) = \frac{E(p_i, p_j)}{maxD}$$

so that any place $p_j$ in the geo-social neighborhood of $p_i$ has a spatial distance
no larger than 1.
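
A small Python sketch of the two normalized distances follows. The spatial part implements the division by maxD described above. The social part is only a stand-in: this report does not give the exact formula for the social distance, so a plain Jaccard distance over the visitor sets is assumed here, which ignores friendship edges entirely. Both function names are hypothetical.

```python
import math


def spatial_distance(p_i, p_j, max_d):
    """D_p: Euclidean distance between two places divided by maxD."""
    return math.dist(p_i, p_j) / max_d


def jaccard_social_distance(users_i, users_j):
    """Assumed stand-in for D_s: 1 - |U_pi & U_pj| / |U_pi | U_pj|.

    Returns a value in [0, 1]; friendship edges are not used.
    """
    union = users_i | users_j
    if not union:
        return 1.0  # no visitors at all: treat the places as unrelated
    return 1.0 - len(users_i & users_j) / len(union)


# Two places 5 units apart with maxD = 10 have D_p = 0.5.
d_p = spatial_distance((0.0, 0.0), (3.0, 4.0), max_d=10.0)
# One shared visitor out of three distinct visitors.
d_s = jaccard_social_distance({"a", "b"}, {"b", "c"})
```

Because both values lie on a common scale for places inside the neighborhood, they can be combined without one distance dominating purely through its units.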

Finally, $D_{gs}(p_i, p_j)$ is defined as a weighted sum of $D_s(p_i, p_j)$ and
$D_p(p_i, p_j)$:

$$D_{gs}(p_i, p_j) = \omega \cdot D_s(p_i, p_j) + (1 - \omega) \cdot D_p(p_i, p_j), \quad \omega \in [0, 1]$$
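
The weighted sum and the two problem cases listed earlier can be checked with a few lines of Python. The function name and all numeric values (the weights, the distances and a threshold of 0.4) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def geo_social_distance(d_s, d_p, omega):
    """Weighted sum of social distance d_s and spatial distance d_p."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * d_s + (1.0 - omega) * d_p


EPS = 0.4  # assumed geo-social threshold for this example

# Case 1: spatially almost identical, but no social connection (D_s = 1).
close_but_unrelated = geo_social_distance(d_s=1.0, d_p=0.02, omega=0.3)

# Case 2: socially very close, but farther apart than maxD (D_p > 1).
far_but_related = geo_social_distance(d_s=0.05, d_p=1.5, omega=0.8)

# Both pairs fall below EPS on the combined distance alone.
both_pass = close_but_unrelated <= EPS and far_but_related <= EPS
```

With these numbers the combined distances are roughly 0.314 and 0.34, so both pairs would be accepted by the threshold on the geo-social distance alone; the separate limits on the social and spatial distances are what reject them.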

2.1 Problem Statement
To determine the safe locations during natural calamities by using spatio-temporal
clustering of Geo-Social Network Data.

2.2 Objectives
In order to achieve the task of identifying safe locations from social media check-ins,
the following objectives have to be met:

1. Gathering check-ins from Social Network Data for clustering.

2. Fine tuning the data to suit our requirements.

3. Developing a heuristic algorithm to cluster check-in locations based on
temporal-geo-social distance.

4. Identifying the clusters which are safe based on the check-ins provided by the
users.

Bibliography

[1] Wu, D., Shi, J. and Mamoulis, N., 2018. Density-Based Place Clustering Using
Geo-Social Network Data. IEEE Transactions on Knowledge and Data Engineering,
30(5), pp.838-851.

[2] Srivastava, S., Pande, S. and Ranu, S., 2015, November. Geo-social clustering
of places from check-in data. In Data Mining (ICDM), 2015 IEEE International
Conference on (pp. 985-990). IEEE.

[3] Mishra, N., Schreiber, R., Stanton, I. and Tarjan, R.E., 2007, December. Clustering
social networks. In International Workshop on Algorithms and Models for the
Web-Graph (pp. 56-67). Springer, Berlin, Heidelberg.

[4] Wu, D., Mamoulis, N. and Shi, J., 2015. Clustering in geo-social networks. Bulletin
of the IEEE Computer Society Technical Committee on Data Engineering.
