
Prediction of Stock Price Based on Financial Data and Tweets

Divakar, Pradeep Kumar and Chandrakala S.

Department of Computer Science,
Rajalakshmi Engineering College,
{divakar.v,pradeepkumar.s}.2011.cse@rajalakshmi.edu.in
chandrakala.s@rajalakshmi.edu.in

Abstract-Predictive models exploit patterns found in historical and transactional
data to identify risks and opportunities. In our work we use recent trends and
financial data as resources. Our proposed system combines linear regression on
historical prices with sentiment analysis of tweets to improve stock price
prediction. Using both present-moment changes and historical data gives a more
realistic prediction method, although there are opportunities to expand this
research further using additional techniques and parameter tuning.

Keywords-stock market; regression; machine learning; sentiment analysis

I. INTRODUCTION
Predictive analytics encompasses a variety of statistical techniques from
modeling, machine learning, and data mining that analyze current and historical facts to
make predictions about future, or otherwise unknown, events. Trading prices fluctuate
every second, so financial institutions need more advanced real-time stock price
prediction. Existing price prediction projects use either historical data or present data
alone, so their predictions are not very realistic.
We explore two methods of prediction, using tweets and financial data as
resources. Tweets, collected from social media such as Twitter, are a collective form of
individual opinions and emotions and have a profound, though perhaps subtle,
relationship with social events. By analyzing relevant tweets with suitable machine
learning algorithms, one can gauge the public's sentiment and attitude towards the stock
of interest, which can indicate the next move of its price. The other resource we use is
historical stock prices, which are used to predict the direction of future stock prices.
Past stock data is collected and used to train a model with a machine learning tool.
Based on the trained model, we predict the stock value for new raw data.

II. MOTIVATION

Stock market price prediction is a problem that has the potential to be worth billions of
dollars and is actively researched by the largest financial corporations in the world. It is a
significant problem because it has no clear solution, although attempts can be made at
approximation using many different machine learning techniques. This project considers
both present-moment changes and historical data when predicting the price of a stock,
whereas most methods consider only one type of data.

III. RELATED WORK

A variety of methods have been used to predict stock prices using machine
learning. Some of the more interesting areas of research include using a type of
reinforcement learning called Q-learning [4] and using US export/import growth,
consumer earnings, and other industry data to build a decision tree that determines
whether a stock's price will rise or fall [1].

The Q-learning approach has been shown to be effective, but it is unclear how
computationally intensive the algorithm is due to the large number of state alphas that
must be generated. The decision tree approach may be particularly useful when analyzing
a specific industry's growth. There has also been research on how top-performing
stocks are defined and selected [5] and analysis of what can go wrong when modeling the
stock market with machine learning [3].

Most existing work uses either historical financial data alone or present data
alone to predict stock prices. Our proposed system tries to address this limitation by
using both historical financial data and present data for the prediction.

Fig. 1. Architecture diagram for stock price prediction

IV. PROPOSED WORK

A. Data Representation

The dataset for the regression method is downloaded from ichart.finance.yahoo.com as a
CSV (comma-separated values) file containing, for each stock, data on the volume,
shares outstanding, closing price, and other features. The Python package NumPy is used
to scale the raw data to values between -1 and 1 for training. In the regression method,
dates are also converted into integer values using NumPy, which makes the stock price
prediction easier.
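
The scaling and date conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses only the Python standard library rather than NumPy so that it runs without dependencies, and the function names (`date_to_int`, `scale_to_unit_range`), the min-max scaling rule, and the sample rows are assumptions made for the example.

```python
from datetime import date


def date_to_int(iso_day):
    """Map an ISO date string (YYYY-MM-DD) to an integer day count."""
    return date.fromisoformat(iso_day).toordinal()


def scale_to_unit_range(values):
    """Linearly rescale values so the minimum maps to -1 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to the midpoint.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Hypothetical (date, closing price) rows; illustrative values, not real quotes.
rows = [
    ("2005-01-03", 10.0),
    ("2005-01-04", 12.0),
    ("2005-01-05", 11.0),
    ("2005-01-06", 14.0),
]

days = [date_to_int(day) for day, _ in rows]
closes = [price for _, price in rows]

scaled_days = scale_to_unit_range(days)
scaled_closes = scale_to_unit_range(closes)
```

Min-max scaling is only one way to reach the -1 to 1 range; the paper does not state which rule was applied.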

In sentiment analysis, tweets are used as the dataset. The tweets are fetched from
Twitter based on a keyword using the Twitter4J package in Java.

B. Prediction through Linear Regression

The regression is performed with the scikit-learn machine learning library,
which is the core of the price prediction functionality. The raw data and the CSV data
are used as input to the regression algorithm, which returns the prediction results. In
particular, every training dataset must be normalized to a normal (Gaussian) or
normal-looking distribution between -1 and 1 before the input matrix is fit to the chosen
regression model. Once the model is fit, the predicted stock price values are obtained.
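
To make the fitting step concrete, the sketch below writes out ordinary least squares for a single feature in plain Python. It is a stand-in for illustration only: the paper performs this step with scikit-learn (where the `LinearRegression` estimator would play this role), and the function names and the sample data here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at a new input value."""
    return slope * x + intercept


# Hypothetical training data: integer day index -> closing price.
day_index = [0, 1, 2, 3, 4]
close = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(day_index, close)
next_close = predict(slope, intercept, 5)  # one day past the training range
```

Because the model is a straight line, predictions far beyond the training range simply extend the fitted trend.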

C. Prediction through Sentiment analysis

The tweets are analyzed using the sentiment analyzer in the NLP package for Java
developed by Stanford University. Before use, the sentiment analyzer can be trained
with a dataset available on the Internet. It can also be used without training, but the
prediction accuracy will be lower. First, a connection is established between the
program and Twitter. Using the Twitter4J package, we search Twitter for tweets matching
the keyword with the search() method. We store the results in an ArrayList, since it
grows dynamically. The tweets in the ArrayList are passed to the sentiment analyzer one
by one. Each tweet receives a score; the scores are summed and the number of tweets is
counted. Finally, the total score is divided by the number of tweets, and based on this
average score we predict whether the stock price will increase or decrease.
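
The averaging step can be sketched as below. The paper's pipeline is in Java; this Python sketch only illustrates the arithmetic. It assumes each tweet has already been scored on a 0 to 4 scale with 2 as neutral (the class scale used by the Stanford sentiment model), and the decision threshold, the function names, and the sample scores are assumptions, since the paper does not state the cut-off it uses.

```python
NEUTRAL = 2.0  # assumed midpoint of a 0..4 sentiment scale


def average_score(scores):
    """Sum the per-tweet scores and divide by the number of tweets."""
    total = 0
    count = 0
    for score in scores:
        total += score
        count += 1
    if count == 0:
        raise ValueError("no tweets were scored")
    return total / count


def predict_direction(scores, neutral=NEUTRAL):
    """Map the average sentiment to a predicted price direction."""
    avg = average_score(scores)
    if avg > neutral:
        return "increase"
    if avg < neutral:
        return "decrease"
    return "no clear signal"


# Hypothetical sentiment scores for five tweets about one keyword.
tweet_scores = [3, 4, 2, 1, 3]
direction = predict_direction(tweet_scores)
```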

V. RESULTS
We predicted the stock price of Samsung using both the linear regression
method and the tweet analysis method. We used data from 01-01-2005 to the present date
for training.

A. Linear Regression

Linear regression was less sensitive to normalization techniques. Good results
appeared in the prediction even when a small training set was used
without normalization, whereas this caused the polynomial regression models to
overflow [2]. Linear regression also provided plausible results after normalization with no
parameter tuning required, owing to its simple model, although the accuracy was lower for
distant prediction dates. It gives close predictions for a small prediction window.

B. Sentiment Analysis

The accuracy of the tweet analysis improved considerably after training the NLP
package with a dataset available on the Internet. Before training, the analysis had an
accuracy of 60%; after training, the accuracy improved by 10 percentage points, to 70%.

VI. CONCLUSION AND FUTURE WORK

Several issues, such as tweet analysis accuracy, training dataset size, and
normalization, are addressed in this paper. This method is also more realistic since it
combines both present-moment changes and historical data to predict the stock price.

There are some areas for improvement in this method. One is identifying an
effective keyword for fetching the tweets. Another is that the regression algorithm
predicts stock prices well only for a short horizon of future dates. A stochastic
regression algorithm can predict the price more closely than linear regression, but it
needs a very large training set.

REFERENCES

[1] C. Tsai and S. Wang. 2009. "Stock Price Forecasting by Hybrid Machine Learning
Techniques." Proceedings of the International MultiConference of Engineers and
Computer Scientists, 20-26, 2009.

[2] Lucas Nunno. 2012. "Stock Market Price Prediction Using Linear and Polynomial
Regression Models." University of New Mexico.

[3] Hurwitz, E., and T. Marwala. 2009. "Common Mistakes when Applying
Computational Intelligence and Machine Learning to Stock Market Modelling."
University of Johannesburg Press.

[4] Lee, Jae Won, Jonghun Park, Jangmin O, and Jongwoo Lee. 2007. "A
Multiagent Approach to Q-Learning for Daily Stock Trading." IEEE
Transactions on Systems, Man, and Cybernetics, 864-877.

[5] Yan, Robert, and Charles Ling. 2007. "Machine Learning for Stock Selection."
Industrial and Government Track Short Paper Collection, 1038-1042.
