Certificate of Approval
It is certified that the work presented in this report was performed by Waleed
Ahmed, Saif Ali Khan, Haris Jamil and Mansoor Naseer under the
supervision of Dr. Masroor Hussain and Dr. Fawad Hussain. The work is
adequate and lies within the scope of the BS degree in Computer Engineering
at Ghulam Ishaq Khan Institute of Engineering Sciences and Technology.
---------------------
(Advisor)
-------------------
(Dean)
-------------------
(Dean)
ABSTRACT
Fake news is a growing problem in the modern world: it aims to sway the opinion of the vast number of people who use social media on a day-to-day basis. This project aims to address the problem of fake news on the internet. The project is a web-based application that determines whether a news article is fake or credible, using machine learning models trained on a large dataset. The web application takes a URL as input from the user, extracts the relevant text from the page with a web crawler, and then derives feature vectors from the text using NLP. Machine learning models applied to the feature vectors then classify the news source as fake or credible.
ACKNOWLEDGEMENTS
We would like to express our profound gratitude to our supervisor, Dr. Masroor Hussain, for sharing his perspective and experience on this subject. We put great effort into this project, but its success would have been nearly impossible without his guidance and motivation. He was always available to help and kept a close watch on our progress.
TABLE OF CONTENTS
CHAPTER I
1. Introduction
1.1 Purpose
CHAPTER II
3. Design
CHAPTER IV
CHAPTER V
CHAPTER VI
6.1 Conclusion
GLOSSARY
REFERENCES
APPENDIX
LIST OF FIGURES
CHAPTER I
CHAPTER II
Figure 2.3-1 Accuracy Comparison with Research Papers
CHAPTER III
Figure 3.1-1 Layered Architecture
Figure 3.7-1 Use Case Diagram 1
Figure 3.7-2 Use Case Diagram 2
Figure 3.8-1 Component Diagram
Figure 3.8-2 ER Diagram
Figure 3.8-3 Data Flow Diagram
CHAPTER IV
Figure 4.1-1 Gantt Chart
Figure 4.1-2 Workflow Diagram
Figure 4.2-1 Accuracy vs Number of Features
Figure 4.2-2 Accuracy vs SVM Kernel
Figure 4.2-3 Accuracy vs Depth of Random Forest and Decision Tree
Figure 4.2-4 Accuracy vs Train/Test Split
Figure 4.2-5 Feature Reduction (Graph)
CHAPTER V
CHAPTER VI
GLOSSARY
REFERENCES
APPENDIX
LIST OF TABLES
CHAPTER I
1. Introduction
1.1 Purpose
Analyzing and detecting fake news on the internet is one of the hardest problems to solve. Fake news has recently become an important topic among the general public and researchers because of online media outlets such as social media feeds, blogs, and online newspapers. According to a BBC survey, 79 percent of people are worried about what is fake and what is real online. The survey of more than 16,000 adults was conducted by GlobeScan, whose chairman Doug Miller said: “These poll findings suggest that the era of ‘fake news’ may be as significant in reducing the credibility of on-line information as Edward Snowden’s 2013 National Security Agency (NSA) surveillance revelations were in reducing people’s comfort in expressing their opinions online”. As another example, Apple’s stock took a temporary 10-point hit after a false report surfaced on CNN’s iReport that Steve Jobs had suffered a heart attack.
In light of such incidents, we see that fake news can have drastic effects, even on a country’s economy. To limit those effects, fake news must be identified; the purpose of our project is to detect it automatically.
1.2 Product Scope
The scope of our product is to detect fake news in online articles using machine learning. Our fake news detector uses purely linguistic features to detect fake news in content. By combining different machine learning models, we aim to detect fake news with better accuracy. Our project is particularly relevant to social media platforms such as Facebook and Twitter, because a major share of the world's population has access to them. Fake news influences the decision making of these people, which can lead to serious mistakes.
Name Description
NLP Natural Language Processing
URL Uniform Resource Locator
ML Machine Learning
CHAPTER II
2. Literature Review
1 Chang, Juju; Lefferman, Jake; Pedersen, Claire; Martz, Geoff (November 29, 2016). "When Fake News Stories Make Real News Headlines". Nightline. ABC News.
stance). Their graph-theoretic approach also keeps track of provenance in argumentation schemes. The results of the Naive Bayes approach are given below.
2.2 Approach
In this project we classify fake news based purely on linguistic features. There has been prior work that simply performs fact checking to classify fake news, and some sources are known to spread fake news; lists of reliable and unreliable sources are maintained at OpenSources and FakeNewsWatch.
The most difficult task was to collect labeled data of classified news. Fortunately, we were able to download labeled data from DataCamp, but the insufficiency of data has remained the main issue for our project.
2.3 Previous work
We have followed multiple research papers from different universities as references. These papers worked only on the title of the news article, with a maximum of 12 features. We use 38 features, extracted from both the title and the text. These papers helped us find appropriate features; for example, we got the idea of using a text difficulty index (Gunning Fog) from the University of Michigan paper (Verónica Pérez-Rosas et al., 2017). The studies were done at universities such as Stanford and Michigan. We tried to gather the best points from each paper and apply them to our project, which is why we achieve a considerably higher accuracy than the research papers we have been following.
Figure 2.3-1 Accuracy Comparison with Research Papers (bar chart comparing the accuracy of our project with Brian Edmonds, Xiaojing Ji, Shiyi Li, Xingyu Liu (2017); Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, Rada Mihalcea (2017); and Victoria L. Rubin, Niall J. Conroy, Yimin Chen (2015))
We split our dataset of 6,300 news articles into two sets: an 80% training set and a 20% test set. We trained on the training set and then evaluated the trained system on the test set for accuracy (how accurate the predictions are). Currently we achieve a maximum accuracy of 85.7%. We also tested the system by feeding it URLs of fake and authentic news, and it produced satisfactory results.
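The split-and-evaluate procedure above can be sketched with scikit-learn, which the project uses for training (Section 4.3). The feature matrix, labels, and model settings here are placeholders for the project's actual 38-feature vectors, not its exact code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def evaluate(features, labels):
    """80/20 split as in the report: train on 80% of articles, test on 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    # Accuracy is the fraction of test articles classified correctly
    return accuracy_score(y_test, model.predict(X_test))
```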
https://www.datacamp.com/community/podcast/data-science-fake-
news
https://github.com/docketrun/Detecting-Fake-News-with-Scikit-Learn
http://dailyheadlines.net
https://www.snopes.com/
https://tribune.com.pk/fake-news/
https://www.scoopwhoop.com
http://abcnews.go.com/alerts/fact-or-fake
CHAPTER III
3. Design
3.1 Overview
The system relies on machine learning algorithms that have already been trained. Multiple machine learning algorithms were trained on a dataset of both fake and authentic news. A summary of the overall procedure follows.
3.2 Product Functions
- A URL of a news article is entered.
- NLP is performed on the text extracted from the URL, and relevant features are extracted.
- The news article is classified as fake or authentic from the extracted features.
- Classified news is stored in the database to maintain a list of URLs with the predicted output (fake/authentic), and each user can view that list.
- Users can vote on the maintained list if a specific news item is not classified correctly.
Administrator: maintains the overall aspects of the web application and is responsible for giving users appropriate roles and authority.
User: the main actor, who uses the web application to analyze URLs.
3.4 Constraints
i. Our software can never fully assure the authenticity of a result; for this, we need user feedback.
ii. Our software will only be available in English, and the news articles provided to it must also be in English.
iii. We do not have access to a large amount of data for training the machine learning models.
iv. The software will not work without an internet connection.
v. Our software does not perform well when the article's body is plain, short, and emotionless.
NLTK: a Python library for natural language processing (NLP). We use NLTK for feature extraction from the news article.
Angular: Angular 4 is used to implement the web-based interface and client side of the application.
Scrapy: a Python library for scraping websites. We use Scrapy to fetch the text of the news article's header from the URL provided by the user.
3.5.2.1 Functional Requirements with Traceability information
3.5.2.1.1 Take a news article URL from the user
Description: Take a news article URL from the user, which is to be analyzed and classified. It must be a valid news URL.
Rationale: The system must take a valid URL from the user to extract text from.
3.5.2.1.2 Extract the title and article using Scrapy
Description: Extract the relevant text from the provided URL using Scrapy.
Rationale: The system has to extract only the title and body of the article, which are then fed to the classification system for feature extraction and classification.
3.5.2.1.3 NLP is applied to the text extracted by the crawler
Description: The text extracted by the web crawler is used for feature extraction using NLP.
Rationale: We have to extract features so they can be used by the machine learning algorithms for classification.
3.5.2.1.4 Apply machine learning algorithms to the data
Description: Apply machine learning algorithms to the feature vectors to classify news as fake or credible.
3.5.2.1.5 Store the results in the database
Rationale: If another user enters the same URL, the system does not have to process it again and can simply return the stored result.
3.5.2.1.6 User can sign up using email and password
Description: The user can sign up using an email address and log in.
3.5.2.1.7 User can view the results of news stored in the database
Description: The user can view all recently processed and classified news articles and vote on the accuracy of the classification.
Rationale: This helps the developers improve the system and obtain feedback on the accuracy of the classification system.
3.5.2.1.8 User feedback for the classified news
Description: After a predefined time limit and number of votes, the system verifies the classification.
Rationale: Verification of the classification is very important for gaining users' trust and for improving the system.
3.5.2.1.9 Using the stored classified data in the database for training
3.6 Performance Requirements
ID Performance Requirement
3.7.1 Use Case Diagram 1
3.7.2 Use Case Diagram 2
The use case related to user feedback is shown in Figure 3.7-2. For a user to give feedback on the accuracy of a classification, the user must sign up. The system displays all recently processed and classified URLs to the user. If the user is logged in, he can vote on any classification result. After some time (about one week), the system checks the votes for the classification and, based on them, verifies whether the classification was correct. If the classification is verified, the system adds the features of the correct classification to the training set.
3.7.3 Use Case Diagram 3
Figure 3.7-3 shows the use case for basic use of the software. The user enters a news URL. The system validates the URL, extracts the relevant text using a web crawler, and then classifies the news article as fake or credible using the machine learning algorithms. After the result is computed, the user can view it.
3.8 UML Diagrams
Following are the Unified Modelling Language (UML) diagrams, which provide a standard way to visualize the design of our system.
3.8.1 Component Diagram
Figure 3.8-1 shows an overall view of the system, with all of its components and the information that flows between them. The user interface is the view through which the user interacts with the system; in our case it is a web application. The user inputs a news URL, which is passed to the web crawler. The crawler fetches the URL, extracts the relevant text, and passes it to the classification system. The classification system extracts the required features from the text, applies the machine learning algorithms to the feature vector, and stores the results in the database.
3.8.2 Database ER Diagram
Figure 3.8-2 shows the entity-relationship diagram of our system. There is a many-to-many voting relationship between User and the list of classified news, but it is not necessary for every user to vote on a classified news item, or vice versa. Classified News and Domain are related by a many-to-one relationship with total participation.
3.8.3 Data Flow Diagram
Figure 3.8-3 shows the flow of data. First, the user submits the URL; an error is displayed if the entered text is not in URL format. Otherwise the URL is looked up in the database's already-classified list. If the URL is found, the previous result is simply displayed; otherwise the crawler fetches the website and scrapes the relevant text. NLP is applied to the text, and the resulting features are processed by the ML algorithms. Each algorithm gives a result; all results are sent to a voting algorithm, and the final result is displayed and stored in the database.
CHAPTER IV
4. Proposed Solution
The solution to the problem defined in the earlier sections was to design and implement a web-based application that takes a news URL as input and returns a verdict on its authenticity with high accuracy. Achieving high accuracy was difficult because of our limited dataset; even so, we achieve 85.7% test accuracy, which is higher than the research papers we have been following. To tackle the data limitation, we implemented a mechanism in which processed URLs are stored in the database and then fed back to the training algorithms, so our system keeps getting smarter over time.
4.1 Methodology
Developing an automatic fake news detector was a challenging problem. To ensure that we accomplished the task efficiently, without major problems that would have forced redesigns and re-engineering of the software architecture in a time- and cost-constrained project environment, we started by developing an SRS (Software Requirements Specification) and a detailed design of the system. A Gantt chart and work breakdown structure were created in that phase to monitor the project and track when each phase should start and end.
Figure 4.1-1 Gantt Chart
The workflow is as follows:
1. A labeled dataset of about 6,500 news articles, containing the text and title of each item, is gathered.
2. When a URL is entered, the text and title of the news are scraped from that URL using the web crawler.
3. The same NLP is applied to the extracted text and title, and the resulting 38 features are fed to the machine learning algorithms.
4. When a user enters a URL and checks the authenticity of the news, the result is stored in the database.
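A few of the surface features used in the pipeline can be computed with plain Python; readability indices such as Gunning Fog come from the Textstat package named in Section 4.3. This is an illustrative sketch with assumed names, not the project's actual extraction code.

```python
import string

def extract_features(text):
    """Compute a handful of the surface features listed in Table 4.1-1.
    Readability scores (Gunning Fog, ARI, etc.) would come from textstat."""
    return {
        "word_count": len(text.split()),
        "character_count": len(text),
        "punctuation_count": sum(c in string.punctuation for c in text),
        "uppercase_count": sum(1 for c in text if c.isupper()),
    }
```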
Figure 4.1-2 Workflow Diagram
The following table lists the features selected for the text, with the weight/importance of each feature as calculated by the machine learning algorithm. The same features were selected for the title but are not shown in the table.
Table 4.1-1 Features with importance
Feature Importance
Word Count 0.03223736
Character Count 0.11497973
Punctuation Count 0.0979961
Uppercase Count 0.07135418
Gunning Fog 0.0166595
Automated Readability Index 0.03313012
Linsear Write Formula 0.01666274
Difficult Words 0.0262762
Dale-Chall Readability Score 0.01767803
Punctuation Count / Character Count 0.21654589
Count of numbers 0.01909209
Count of brackets 0.00145834
Count of Asterisk (offensive words) 0.01956875
The table above shows which features are most important for news classification by assigning each a weight or score. For example, the ratio of punctuation count to character count has the highest score (0.2165), meaning this feature carries 21.65% of the total importance and contributes most to classifying the news. Bracket count has the least importance, meaning it helps least in classifying a news article as fake or authentic.
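Weights like those in Table 4.1-1 can be read off a trained Random Forest through scikit-learn's feature_importances_ attribute, which gives one score per feature summing to 1. The helper below is a sketch with illustrative names, not the project's exact code.

```python
from sklearn.ensemble import RandomForestClassifier

def rank_features(X, y, names):
    """Train a forest and return (name, importance) pairs, highest first."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    return sorted(zip(names, model.feature_importances_),
                  key=lambda pair: pair[1], reverse=True)
```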
4.1.2 Normalization
We used min-max normalization, rescaling the feature values to [0, 1]. There was a clear increase in accuracy after applying this normalization method.
For example, if the punctuation count ranges over [10, 200], the normalized value x' is calculated by subtracting 10 from each article's punctuation count and dividing by 190.
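The rescaling rule above, x' = (x - min) / (max - min), can be sketched as:

```python
def min_max(values):
    """Rescale a feature column to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Punctuation counts spanning [10, 200], as in the example above:
print(min_max([10, 105, 200]))  # [0.0, 0.5, 1.0]
```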
4.2 Training
After cleaning and normalizing the data, we moved on to training. We tried multiple algorithms and techniques and selected the two (Random Forest and SVM) that gave the highest accuracy. Training took most of the project's development time, because there were endless combinations to try in order to achieve the highest accuracy with a limited dataset: we varied the normalization method, the training algorithm, the number of iterations, the SVM kernel, and the number of features.
4.2.1 Number of Features
Following is the graph of accuracy vs. number of features.
Figure 4.2-1 Accuracy vs Number of Features (Random Forest, Decision Tree, and SVM with a linear kernel, evaluated with 6, 13, 19, 25, and 28 features)
Note that 19 features are used for the title and 19 for the text separately; in total, 38 features are used.
4.2.2 SVM Kernels
The following graph shows the difference in accuracy across SVM kernels.
Figure 4.2-2 Accuracy vs SVM Kernel (default RBF kernel vs linear kernel)
The graph shows that the linear kernel gives the highest accuracy (85.7%). That is because most textual data is linearly separable, and a linear kernel works well when the data is linearly separable or has a high number of features: mapping the data to a higher-dimensional space does not really improve performance (L. Arras, F. Horn et al., 2017).
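The comparison can be reproduced with scikit-learn's SVC, whose default kernel is RBF (the "Default" bar in the figure); kernel="linear" gives the linear variant. The data handling below is a sketch, not the project's exact setup.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def compare_kernels(X, y, kernels=("rbf", "linear")):
    """Train an SVC per kernel on the same split; return test accuracies."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0)
    return {k: accuracy_score(y_te, SVC(kernel=k).fit(X_tr, y_tr).predict(X_te))
            for k in kernels}
```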
4.2.3 Random Forest and Decision Tree
Following is the graph of accuracy vs. the maximum depth of the Random Forest and Decision Tree.
Figure 4.2-3 Accuracy vs Maximum Depth of Random Forest and Decision Tree (depths 5, 8, 10, and 14)
4.2.4 Train/Test Split
We currently split the data 80/20, with 80% as the training set and 20% as the test set. The following graph shows accuracy for the machine learning models under different splits.
Figure 4.2-4 Accuracy vs Train/Test Split (SVM with linear kernel, Random Forest, and Decision Tree at 90/10, 80/20, 70/30, and 60/40 splits)
The graph shows that the highest accuracy is achieved when the dataset is split 80/20, with 20% as the test set. The phenomena of over- and underfitting can also be observed in this graph.
4.2.5 Feature Reduction
We have used PCA and LDA for feature reduction.
The following graph shows accuracy with PCA, with LDA, and without feature reduction, versus the number of features.
Figure 4.2-5 Feature Reduction (accuracy with no feature reduction, PCA, and LDA, for 10, 15, 20, and 25 features)
The graph above is given below in tabulated form. It can clearly be seen that accuracy with PCA is consistently higher than that of a Random Forest trained without feature reduction.
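One way to wire feature reduction into training, assuming scikit-learn as elsewhere in the project: scale to [0, 1] (Section 4.1.2), project onto n_components principal components with PCA, then train the Random Forest on the reduced vectors. This is a sketch, not the project's exact pipeline.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

def reduced_model(n_components):
    """Random Forest trained on n_components PCA features."""
    return make_pipeline(MinMaxScaler(),
                         PCA(n_components=n_components),
                         RandomForestClassifier(random_state=0))
```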
4.2.6 Summary of Training
As depicted in the previous graphs, we experimented with the data, features, and machine learning algorithms to achieve the desired accuracy. We also implemented a neural network, but it gave very low accuracy (53%) due to the insufficient dataset size, so we decided not to include it in this work; we will add it in the future once we have a sufficiently large dataset. We expect that with a large collection of news articles, deep learning will greatly increase the accuracy of our system.
Decision Tree:
Random Forest:

SVM:
Over-All:
Here we can see that SVM gives the highest accuracy among the machine learning algorithms, for the reason described previously: SVM performs well on textual data because such data is almost always linearly separable, and SVM is a good choice for linearly separable data.
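The report combines Random Forest and a linear SVM; one plausible realization is scikit-learn's VotingClassifier. Whether the project used hard or soft voting is not stated, so the hard-voting choice below (ties go to the class that sorts first) is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC

def combined_classifier():
    """Hard-voting ensemble of Random Forest and linear-kernel SVM."""
    return VotingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("svm", SVC(kernel="linear"))],
        voting="hard")
```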
4.3 Server-side Implementation
The main part of our server is the machine learning pipeline. The classification and web backend have been implemented in Python: Django is used for the backend, and the scikit-learn library is used for training. We started the project with a Decision Tree with 19 features and got 53% accuracy after splitting the dataset 80/20 into training and test sets. After going through the research papers and adopting the strong points of each, we reached 85.7% accuracy. We combined Random Forest and SVM (linear kernel) to obtain the highest accuracy. We wanted to use deep learning and hoped for much higher accuracy with it, but this failed due to the small size of our dataset. For the NLP part, we used NLTK and Textstat (Python APIs) for complex feature extraction such as adverb count or text difficulty.
One of our main hurdles was scraping the HTML page properly. Online news articles are not written in a standard form; for example, news on Facebook uses a different HTML format than news on bbc.com. We could not handle this generality ourselves, so we used the Python library Newspaper3k, which is made specifically for scraping news articles.
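Extracting just the title and body with Newspaper3k looks roughly like this; Article.download and Article.parse are the library's real entry points, while the wrapper function and URL handling are a sketch.

```python
def extract_article(url):
    """Return (title, body) for a news page using Newspaper3k."""
    from newspaper import Article  # local import; pip install newspaper3k
    article = Article(url)
    article.download()  # fetch the raw HTML over the network
    article.parse()     # strip page boilerplate, keep title and body text
    return article.title, article.text
```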
again and can simply return the result from the database. A voting table is also maintained, which keeps a record of the votes given to each URL.
4.5 Schedule
The four core modules of the system were divided among the group members to be designed, developed, and deployed in isolation, then integrated with the system to achieve the overall functionality.
1.4.0 Determining Non-Functional Requirements 31 – 34
1.5.0 Identifying Security Measures 35 – 37
1.6.0 Communication and User Interface Requirements 38 – 42
1.6.1 Determining System Dependencies 43
1.7.0 Constraints 44 – 46
1.8.0 Other Interfaces 49
1.8.1 Criticality of Application 64
1.8.2 Logical and Database Requirements 65 – 70
1.9.0 Functional Hierarchy 72
Serial # Activity Day(s)
2.0.0 Designing System Architecture
2.1.0 Identifying Sub-Systems 73
2.2.0 How Sub-Systems Would Interact 74 – 75
2.3.0 Knowledge of Server, Memory, Processing Capabilities 76 – 79
2.4.0 User-Server Communication 80
2.5.0 Dependencies of Sub-Systems 81 – 82
2.6.0 Limitation of User Hardware 83
Serial # Activity Day(s)
3.0.0 Prototype Development
3.1.0 Developing User Application 84
3.1.1 Designing Interactive User Interfaces
3.2.0 Determine Functional Requirements of System 85 – 87
3.3.0 General Prototype 88 – 90
3.4.0 Deploy over Servers 92
3.5.0 Testing 93
3.5.1 Debug 94 – 100
3.5.2 Initial Launch 101
3.6.0 Improving and Finalizing User Interface 102 – 107
3.7.0 Testing 110
3.7.1 Debug 111 – 120
3.7.2 Final Launch 121
Serial # Activity Day(s)
4.0.0 Beta Launch
4.1.0 Deploying on Server 130
4.2.0 Testing 131 – 132
4.3.0 Debug 133 – 134
4.6 Technological Aspects
Programming Languages: Python, SQL, JavaScript
IDE: PyCharm
Version Control: Git/GitHub
Database: SQLite
Networking Protocols: HTTP/HTTPS
CHAPTER V
5.2 Extract Relevant Text from URL (FR-02)
This was a very challenging problem in our project. To classify a news article as fake or credible, we only need the relevant text from the page source, on which our system applies natural language processing to build feature vectors. This was particularly hard because we had to build a generic scraper that works for every news website. We used the newspaper3k API to solve this problem, which made it easy to extract only the article's title and text (body).
5.5 Store Classification Result in Database (FR-05)
We store the result of every URL processed by our system in our database, alongside its title and text. This requirement improved the performance of our system by eliminating redundancy: if two users enter the same URL, the system processes it only once and stores the classification result in the database for subsequent queries.
5.8 Verifying Results (FR-08)
One month after processing a URL, our system automatically checks the rating given to it by users. If the rating is above 50%, the system retains the classification result; if it is below 50%, the classification result is flipped, since a poor rating indicates an incorrect classification by the system.
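The rule above amounts to flipping the stored label when fewer than half of the user votes agree with it; the function and argument names below are illustrative.

```python
def verify(label, agree_votes, disagree_votes):
    """Keep the label if a majority of votes agree; otherwise flip it."""
    total = agree_votes + disagree_votes
    if total == 0:
        return label  # no feedback yet: keep the classification
    if agree_votes / total > 0.5:
        return label  # rating above 50%: classification retained
    return "credible" if label == "fake" else "fake"  # poor rating: flip
```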
5.10 Non-Functional Requirements Achieved
Performance Requirements
- The system should respond to a user query and return a result in less than 5 seconds.
- Web crawling should be completed quickly.
- Feature extraction must be done in milliseconds.
- The time taken by the ML algorithms should be in milliseconds.
- The system should be able to handle multiple simultaneous requests.
Security Requirements
- The user should be able to log in securely.
- User passwords should be encrypted; they are stored in the database in encrypted form.
- User passwords should be long and contain special characters.
Table 5.10-3 Usability Requirements
- The system should be user friendly and easy to use.
- The system should not need an extra instruction manual.
- The user should be able to learn the system in less than 5 minutes.
CHAPTER VI
6.1 Conclusion
Fake news confuses people about whom to trust; some even say that Donald Trump became president partly because of fake tweets. To tackle this problem, we work purely on linguistic features. Our scraper extracts the title and text of the news article; using natural language processing (NLP), we extract 38 features and apply a Support Vector Machine (SVM) and Random Forest to detect whether the news is authentic or fake.
This web application addresses a very important problem on social media platforms such as Facebook and Twitter, to which almost everyone has easy access. News on social media has a large impact on people's thinking, and our web application gives people an easy way to determine the credibility of any news article. An accuracy of 85.7% shows that our application can be very useful in the practical world. Since there is still a chance that the application predicts wrongly, a user feedback mechanism has been added so that users can vote on whether a news item was classified correctly. After a month or two, the user votes are checked manually and, if the prediction was wrong, the stored result is corrected. These verified news articles can then be used to retrain the machine learning models and increase the efficiency and accuracy of the application. With time and user feedback, we can keep improving our software in terms of accuracy and user experience.
6.2 Future Work
We have combined two machine learning algorithms (SVM and Random Forest) in such a way that the strengths of both are used to predict the credibility of a news article. Our main focus is to improve the software as much as we can. The larger the dataset the machine learning models are trained on, the better they tend to work, so we will use a large-scale dataset to train the models in the future.
GLOSSARY
Name Description
NLP Natural Language Processing
SVM Support Vector Machine
URL Uniform Resource Locator
ML Machine Learning
API Application Programming Interface
SRS Software Requirements Specification
HTTPS Hypertext Transfer Protocol Secure
HTML Hypertext Markup Language
REFERENCES
Alice Toniolo, Federico Cerutti, Nir Oren, Tj Norman, and Katia Sycara.
Making Informed Decisions with Provenance and Argumentation Schemes. In
11th International Workshop on Argumentation in Multi-Agent Systems,
pages 1–20. Aamas2014.Lip6.Fr, 2014.
Conroy, Niall J., Victoria L. Rubin, and Yimin Chen. "Automatic deception
detection: Methods for finding fake news." Proceedings of the Association for
Information Science and Technology 52.1 (2015): 1-4.
Wang, William Yang. "'Liar, Liar Pants on Fire': A New Benchmark Dataset for Fake News Detection." arXiv preprint arXiv:1705.00648 (2017).
Kolari, Pranam, Tim Finin, and Anupam Joshi. "SVMs for the Blogosphere:
Blog Identification and Splog Detection." AAAI spring symposium:
Computational approaches to analyzing weblogs. 2006.
Jin, Zhiwei, et al. "News credibility evaluation on microblog with a
hierarchical propagation model." Data Mining (ICDM), 2014 IEEE
International Conference on. IEEE, 2014.
Rubin, Victoria, et al. "Fake news or truth? using satirical cues to detect
potentially misleading news." Proceedings of the Second Workshop on
Computational Approaches to Deception Detection. 2016.
Brian Edmonds, Xiaojing Ji, Shiyi Li, Xingyu Liu. Fake News Detection Final
Report.
Davis, Wynne. "Fake Or Real? How To Self-Check The News And Get The Facts." NPR. <http://www.npr.org/sections/alltechconsidered/2016/12/05/503581220/fake-or-real-how-to-self-check-the-news-and-get-the-facts>.
Samir Bajaj. “The Pope Has a New Baby!” Fake News Detection Using Deep
Learning. Stanford University, 2017.
APPENDIX
Appendix - A