Bachelor Thesis
(i) the thesis comprises only my original work toward the Bachelor's degree,
(ii) due acknowledgement has been made in the text to all other material used.
Ahmed Soliman
XX July, 20XX
Acknowledgments
Text
Abstract
Contents
Acknowledgments V
1 Introduction 1
1.1 Section Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Another Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Background 3
3 Location Detection Approaches 5
4 Conclusion 7
5 Future Work 9
Appendix 10
A Lists 11
List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
References 13
Chapter 1
Introduction
Chapter 2
Background
Chapter 3
Location Detection Approaches
In this chapter we introduce and describe several approaches for location detection on social media.
First, we tokenize all tweets in our training dataset and filter them: we remove URLs, mentions, and hashtags, and then discard every word identified as a stop word. The stop words are given by the list provided by the NLTK stopwords corpus. Once the stop words are removed, we perform lemmatization, in which the inflected forms of a word are reduced to a common base form, using Stanford CoreNLP.

Once the tokens have been extracted, we apply a simple heuristic algorithm called CALGARI [1]. The algorithm is based on the intuition that a model will perform better if it is trained on terms that are more likely to be used by users from a particular region than by users from the general population. The algorithm therefore assigns each term a score that indicates how strongly the term is associated with a particular location in our dataset. We explain how this score is calculated below.
Let $s(T)$ be a function that takes a term $T$ and calculates its score, $F(T)$ be the frequency of the term $T$ in our dataset, $f(T, c)$ be a function that counts how many times the term $T$ is used with class $c$, $N$ be the total number of different terms in our dataset, and $C$ be the set of classes (locations) in our dataset. For each term we need to evaluate this equation:

$$s(T) = \frac{\max_{c \in C} P(T \mid c)}{P(T)}$$

The term $P(T) = \frac{F(T)}{N}$, so we only need to know how to evaluate the numerator:

$$P(T \mid c_i) = \frac{f(T, c_i)}{\sum_j f(t_j, c_i)}$$
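As an illustration, the score computation above can be sketched in Python. This is a minimal sketch, assuming the per-class counts $f(T, c)$ have already been collected into a dictionary; it follows the chapter's definition of $N$ as the number of different terms, which rescales all scores by the same constant and therefore does not affect the ranking.

```python
from collections import Counter

def calgari_scores(term_class_counts):
    """Compute a CALGARI-style score for every term.

    term_class_counts maps (term, cls) -> f(T, c), the number of times
    the term occurs with that class (location).
    """
    class_totals = Counter()  # sum_j f(t_j, c) for each class c
    term_totals = Counter()   # F(T), the total frequency of each term
    for (term, cls), n in term_class_counts.items():
        class_totals[cls] += n
        term_totals[term] += n
    # N as defined in the text: the number of different terms in the dataset.
    n_terms = len(term_totals)
    scores = {}
    for term in term_totals:
        p_t = term_totals[term] / n_terms        # P(T) = F(T) / N
        p_t_given_c = max(                       # max over c of P(T | c)
            term_class_counts.get((term, cls), 0) / class_totals[cls]
            for cls in class_totals
        )
        scores[term] = p_t_given_c / p_t         # s(T)
    return scores
```

For example, `sorted(scores, key=scores.get, reverse=True)` then orders the terms from most to least region-specific.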
Now, after calculating a score for each term, the algorithm sorts the terms in non-increasing order of score and chooses the best 10,000 terms as features for our model.
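For concreteness, the preprocessing pipeline described at the start of this chapter can be sketched as follows. This is a minimal, dependency-free sketch: it substitutes a small inline stop-word list for the full NLTK stopwords corpus and omits the Stanford CoreNLP lemmatization step.

```python
import re

# Small inline stop-word list; the pipeline in the text uses the
# full NLTK stopwords corpus instead.
STOP_WORDS = {"the", "a", "an", "in", "on", "at", "is", "are", "to", "and", "of"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def preprocess(tweet):
    """Tokenize a tweet, dropping URLs, mentions, hashtags and stop words.

    Lemmatization (Stanford CoreNLP in the text) is omitted to keep the
    sketch self-contained.
    """
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

The resulting token lists, tagged with their users' locations, provide the counts $f(T, c)$ that the CALGARI score is computed from.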
Chapter 4
Conclusion
Chapter 5
Future Work
Text
Appendix
Appendix A
Lists
List of Figures
Bibliography
[1] W.G. Campbell. Form and style in thesis writing. Houghton Mifflin, 1954.
[2] S. Wenkang. An analysis of the current state of English majors' BA thesis writing [J]. Foreign Language World, 3, 2004.