
Abstract

In this project, a typical problem of empirical asset pricing, risk premium measurement, is addressed via machine learning. The project consists of three major parts: data collection and processing, model construction, and, most importantly, an empirical study based on China stock market data. For the data collection part, the definitions and calculation methods of different characteristics are identified by reading the relevant papers; data related to those characteristics are found in the China Stock Market and Accounting Research (CSMAR) database, and SAS is used to process those data. In the second part of the project, methods from the machine learning repertoire including principal components regression, extreme gradient boosted regression trees, random forests and neural networks are selected. In addition, an ordinary least squares regression model is established as a reference to highlight the distinctive features of the aforementioned models. Lastly, in the empirical study, the models are implemented on the previously processed data; the out-of-sample prediction $R^2_{oos}$ is used to evaluate the performance of the different models in measuring the risk premium, and a comparative analysis of the models is presented in a latter section of that part.


Contents

1 Introduction
  1.1 Why Apply Machine Learning to Asset Pricing?
  1.2 Why China Stock Market?
2 Literature Review
3 Methodology
  3.1 Ordinary Least Squares
  3.2 Penalized Linear Regression
  3.3 Principal Components Regression
  3.4 Boosted Trees (XGBoost)
  3.5 Neural Networks
  3.6 Model Performance Evaluation
4 Empirical Study in China Stock Market
  4.1 Data Set
    4.1.1 Important Features Explanation
    4.1.2 Data Collection
    4.1.3 Data Process
    4.1.4 Data Description
  4.2 Expected Outcomes
  4.3 Model Performance
  4.4 Machine Learning Portfolio
5 Difficulties and Future Work
6 Conclusion
7 References
8 Appendix


1 Introduction
1.1 Why Apply Machine Learning to Asset Pricing?
There are three main reasons to apply machine learning to asset pricing. The first is that machine learning can deal with a large number of features, especially when the number of features approaches the number of observations or the features are highly correlated. Secondly, because of its diversity, machine learning can solve complex problems whose functional forms are ambiguous. Moreover, by adding regularization or penalization parameters, machine learning methods can mitigate over-fitting.

1.2 Why China Stock Market?


The main reason is that although the China stock market has been around for 28 years, only a limited number of papers study asset pricing on the basis of the China stock market. At a minimum, my project reveals whether the factors used to measure the risk premium in the US stock market are also reasonable and suitable for the China stock market. The results should be useful for further asset pricing research on the China stock market.

2 Literature Review
The history of asset pricing can be traced back three centuries; however, it is after the publication of Modern Portfolio Theory (MPT), proposed by Markowitz (1952), that numerous asset pricing models emerged (Dimson and Mussavian, 1999). The Capital Asset Pricing Model (CAPM), built on the basis of MPT, began to treat asset prices as market equilibrium outcomes instead of exogenously given values (Krause, 2001). The CAPM of Sharpe (1964), a modified form of the single-index model, predicts a linear relation between the expected return and the beta coefficient. Cox and Ross (1976) extend the CAPM to the multi-factor Arbitrage Pricing Theory (APT). However, APT still follows the linear relation assumption. Approximately 20 years later, a non-linear APT was proposed by Bansal and Viswanathan (1993). Researchers realized that the relation between factors and excess stock returns may be more complex and ambiguous than a purely linear one.
With the rapid development of computer technology, machine learning methods began to be utilized in finance. Chiu et al. (1994) help auditors perform risk assessment tasks systematically and consistently by applying an Artificial Neural Network and an Expert System. In the same year, Hutchinson et al. (1994) predict the prices of derivatives with a nonparametric method, learning networks. Sun et al. (2006) evaluate the credit risks of commercial banks with a Support Vector Machine (SVM). Regression trees are constructed to predict default risks on customers' credit cards by Khandani et al. (2010); Butaru et al. (2016) later study the same topic and improve the regression tree method. Multi-layer neural networks have become popular in recent years: Takeuchi and Lee (2013) improve stock momentum trading strategies by applying deep learning, and Heaton et al. (2017) design multi-layer neural networks to select portfolios automatically.
When Fama and French (1993) was published, approximately 40 factors had been discovered in total (Harvey et al., 2016). However, from then until 2012, the total number of discovered factors reached 240, and Green et al. (2013) count 330 stock-level predictive factors in published papers and working papers. It is highly likely that non-trivial correlation exists among these factors. Since the relationship between predictive factors and stock returns is ambiguous - it is difficult to decide whether it is linear or non-linear - and the number of predictors has become large, machine learning methods frequently appear in the literature on measuring stock returns. A comparative study of neural networks (NN) and traditional statistical regressions for stock performance was done by Refenes et al. (1994), though with a very limited number of factors. In the last five years, neural networks have been widely used to forecast stock returns: Lasfer et al. (2013) use Design of Experiments (DOE) with Artificial Neural Networks to forecast financial series and identify the significant factors for the networks, and Arnerić et al. (2014) apply Jordan Recurrent Neural Networks (JNN) to predict the conditional variance of stock returns. Besides neural networks, Kelly et al. (2017) evaluate the factor pricing model via dimension reduction, and Freyberger et al. (2017) establish a non-linear function of the risk premium via shrinkage and selection methods.
The advent of various machine learning methods provides diverse approaches to analyzing problems. However, it raises the question of which machine learning method is the best, or at least a relatively better, approach to a specific problem. Krauss et al. (2017) compare the effectiveness of boosted trees, random forests, deep neural networks and a combination of these models for statistical arbitrage on the S&P 500, and conclude that the combination algorithm performs best among all algorithms (Krauss et al., 2017). Gu et al. (2018) explore Principal Component Analysis, Partial Least Squares, boosted trees, random forests, and neural networks on the US stock market and conclude that trees and neural nets produce better results. For the China stock market, although the literature applying machine learning methods to asset pricing is limited, there are several excellent papers. For example, Zhang et al. (2018) explore four machine learning methods including SVM, NN, a Naive Bayesian Classifier, and random forests to forecast the future direction of the market based on Shanghai Stock Exchange 50 index stocks, and their results demonstrate that the ANN outperforms the other three models. Because researchers study different problems and compare different sets of models, there is no consistent answer to the aforementioned question. The distinctive focus of my project is to first identify reasonable predictors for the China stock market based on a large dataset, then apply a group of machine learning methods, which are discussed in the Methodology section, and finally compare model performance in measuring excess stock returns.

3 Methodology
This section explains the machine learning methods implemented in the study. Each subsection describes one method in two parts: the general form of the model and its objective function. In this proposal, formulas are displayed with brief explanations; further details of each model will be given in the final report.

For ease of description, some general forms and the corresponding notation are presented here. The excess return of an asset is described by an additive model:

$$r_{i,t+1} = f_t(r_{i,t+1}) + \epsilon_{i,t+1} \qquad (3.1)$$

where

$$f_t(r_{i,t+1}) = g^*(z_{i,t}) \qquad (3.2)$$

and $\epsilon_{i,t+1}$ is an error term. Time is indexed as $t = 1, 2, \ldots, T$ and stocks are indexed as $i = 1, 2, \ldots, N$. The objective is to identify an appropriate specification of $f_t(r_{i,t+1})$, a function of predictive signals, that maximizes the out-of-sample $R^2$ for the realized return $r_{i,t+1}$. The predictive signals are collected in the $M$-dimensional vector $z_{i,t} = [x_1, x_2, x_3, \ldots, x_M]$, where $x_j$ represents the $j$th predictive factor. $g^*(z_{i,t})$ is a flexible function of those predictive factors; it also encodes the restriction that the information used to make predictions can come neither from history prior to time $t$ nor from firms other than $i$.

Additionally, feature scaling is important before implementing the models. For each predictor in the vector $z_{i,t}$, a normalized value replaces the original value $x$. For firm $i$ at time $t$, where $t$ may be a month, quarter or year index according to the frequency of $x$, the normalization can be expressed as $\frac{x - \mu}{std}$, where $\mu$ is the mean of $x$ over a specific time interval and $std$ is the standard deviation of $x$ over the same interval.
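As a minimal sketch of this scaling step (assuming the data sit in a pandas DataFrame with a time column named "month"; the names are illustrative rather than those of the actual SAS/Python pipeline):

```python
import pandas as pd

def standardize(panel: pd.DataFrame, time_col: str = "month") -> pd.DataFrame:
    """Cross-sectionally standardize every predictor: within each period,
    replace x with (x - mu) / std, as described above."""
    features = panel.columns.drop(time_col)
    return panel.groupby(time_col)[features].transform(
        lambda x: (x - x.mean()) / x.std()
    )
```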


3.1 Ordinary Least Squares


The Ordinary Least Squares (OLS) regression model is a simple linear model widely used for linear relation problems. Although OLS may fail to fit the data when the number of predictors is large, I construct this model as a baseline for the other, more complex models, which may have high dimensions.
General Form of Model. In the simple linear model, $g^*(\cdot)$ can be described as a linear function of the predictor vector and the parameter vector (sometimes called the weight vector) $\theta$:

$$g(z_{i,t};\theta) = z_{i,t}'\theta \qquad (3.3)$$

This function cannot capture interactions among predictors or non-linear effects.
Objective Function. OLS applies the principle of standard least squares (usually denoted $l_2$) by minimizing the sum of the squared differences between the observed values and the fitted values. The objective function, also called the cost function, is:

$$J(\theta) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(r_{i,t+1} - g(z_{i,t};\theta)\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{M}\theta_j^2 \qquad (3.4)$$

The parameter vector is obtained by minimizing the cost function. Generally, an $l_2$ penalty, the second term of the above equation, is added.
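Since equation (3.4) is least squares with an $l_2$ penalty, it can be fitted as a ridge regression. A minimal sketch with scikit-learn on synthetic stand-in data; the penalty weight is a placeholder, not a tuned value:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-ins for the stacked predictors z_{i,t} and returns r_{i,t+1}.
rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 5))
r = Z @ rng.standard_normal(5) + 0.1 * rng.standard_normal(1000)

# Ridge minimizes ||r - Z theta||^2 + alpha * ||theta||^2, i.e. equation (3.4)
# up to scaling of the two terms; alpha = 1.0 is an arbitrary placeholder.
ols_l2 = Ridge(alpha=1.0).fit(Z, r)
theta_hat = ols_l2.coef_
```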

3.2 Penalized Linear Regression


If too many features are included in a model, the learned hypothesis function may fit the training set well but fail to generalize to new examples. This problem is known as over-fitting. Traditionally, there are two ways to avoid over-fitting: one is to reduce the number of features manually through model selection; the other is regularization, which is discussed in the following. Regularization keeps the feature count unchanged but reduces the magnitude of the parameters by adding a penalty to the cost function. This is the crucial feature of the penalized linear regression model.
General Form of Model. It is the same as the general form of OLS, displayed in equation (3.3).
Objective Function. In general, the cost function with regularization takes the form:

$$J(\theta;\cdot) = J(\theta) + \phi(\theta;\cdot) \qquad (3.5)$$

There are numerous forms for the regularization term $\phi(\theta;\cdot)$. In this paper, I select the Elastic Net penalty:

$$\phi(\theta;\lambda,\rho) = \lambda\left(\rho\,\frac{1}{2}\sum_{j=1}^{M}\theta_j^2 + (1-\rho)\sum_{j=1}^{M}|\theta_j|\right) \qquad (3.6)$$

where $\lambda$ and $\rho$ are non-negative regularization parameters which can be optimized using the validation sample. There are two special cases of equation (3.6). When $\rho = 0$, $\phi(\theta;\cdot)$ is the least absolute shrinkage and selection operator (LASSO); when $\rho = 1$, $\phi(\theta;\cdot)$ corresponds to ridge regression. LASSO uses the $l_1$-norm regularization and ridge uses the $l_2$-norm regularization, while the Elastic Net combines the two. Both the $l_1$ penalty and the $l_2$ penalty reduce the risk of over-fitting, but the $l_1$ penalty brings an additional benefit: it yields a "sparse" solution more easily than the $l_2$ penalty. $l_1$ performs better at feature selection while $l_2$ is better at limiting feature magnitudes; hence, the Elastic Net combines these advantages.
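A minimal fitting sketch with scikit-learn's ElasticNet. Note that its l1_ratio parameter weights the $l_1$ term, so it plays the role of $(1-\rho)$ in equation (3.6); both values below are placeholders to be tuned on the validation sample:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 20))
r = Z[:, 0] - 0.5 * Z[:, 1] + 0.1 * rng.standard_normal(1000)

# alpha is the overall penalty weight (lambda in equation (3.6));
# l1_ratio weights the l1 term, i.e. it corresponds to (1 - rho).
enet = ElasticNet(alpha=0.01, l1_ratio=0.2).fit(Z, r)
```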


3.3 Principal Components Regression


Principal Components Regression (PCR) is a dimension reduction technique. The method condenses a matrix of possibly correlated predictive features from $M$ dimensions into a much smaller $K$-dimensional matrix of uncorrelated features. The value of $K$ can be determined by an iterative process based on the training sample, and with the singular value decomposition (SVD) the components can be computed efficiently.
General Form of Model. Since this method deals with high-dimensional problems, the first step is to vectorize the additive model in equation (3.1). The vectorized form is:

$$R = (ZU_K)\theta + \varepsilon \qquad (3.7)$$

where $R$ is an $NT \times 1$ vector of returns; $Z$ is an $NT \times M$ matrix whose rows are the $M$-dimensional predictor vectors $z_{i,t}$; $U_K$ is an $M \times K$ matrix (also called the $U_{reduce}$ matrix in some literature) with columns $u_1, u_2, u_3, \ldots, u_K$, each $u_j$ being a vector of linear combination weights used to calculate the $j$th approximate component; $\theta$ is defined in the same way as in equation (3.3) but with dimension $K \times 1$; and $\varepsilon$ is an $NT \times 1$ vector of the errors $\epsilon_{i,t+1}$.
Objective Function. To implement PCR, first compute the "covariance matrix" and then its "eigenvectors"; the eigenvectors correspond to the columns of $U_K$. PCR obtains $U_K$ recursively; $u_j$ solves

$$u_j = \arg\max_u \mathrm{Var}(Zu) \qquad (3.8)$$

$$\text{subject to } u'u = 1, \quad u'Z'Z[u_1, u_2, \ldots, u_{j-1}] = 0$$

From equation (3.8) it is clear that the objective function does not involve the target values. This is a disadvantage of PCR: the method does not take the observed returns into account and focuses only on reducing the dimension of the feature set.
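A minimal PCR sketch with scikit-learn, where PCA forms the $K$ uncorrelated components $ZU_K$ of equation (3.7) and a regression then runs on them; $K = 10$ and the data are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 64))
r = Z[:, 0] - 0.5 * Z[:, 1] + 0.1 * rng.standard_normal(1000)

# PCA computes U_K from the covariance structure of Z; the regression is
# then fitted on the reduced K-dimensional components.
pcr = make_pipeline(PCA(n_components=10), LinearRegression()).fit(Z, r)
r_hat = pcr.predict(Z)
```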

3.4 Boosted Trees (XGBoost)


Compared to the linear regression models mentioned above, boosted trees can handle interactions among different features: boosted tree algorithms detect interactions between features automatically. Additionally, unlike linear regression models, trees are nonparametric, i.e. trees make no assumptions about the data distribution, and the complexity of the trees is governed by pre-specified tuning parameters. Trees can also naturally handle numerical and categorical data together in one model, so it is not necessary to separate indicators from numerical data when using a tree model.
Before introducing boosted trees, it is necessary to define decision trees. Decision tree algorithms fall into two classes: "classification trees" and "regression trees". Regression trees are frequently used for continuously varying values such as stock returns. A decision tree divides the feature space with hyperplanes: each split divides the current space in two, so that each leaf node becomes a disjoint region of the space. The splitting process is called "branching". When making decisions, the tree grows step by step by checking the values of different features, finally making the sample fall into one of the $K$ regions (assuming there are $K$ leaf nodes), as shown in a latter example. The regression tree uses a heuristic approach. Suppose there are $n$ features and feature $i$ takes values $s_i$ ($i \in (1, n)$); the algorithm traverses all the features and tests all the values of each feature. A split is made at the value $s_j$ of feature $i$ that minimizes the loss function, so that value becomes a decision point. The process is described by:

$$\min_{i,\,s_j}\left[\min_{c_1} Loss(y_i, g_1(\cdot)) + \min_{c_2} Loss(y_i, g_2(\cdot))\right]$$

where $g_1(\cdot)$ and $g_2(\cdot)$ represent the "gain" scores of the right subtree and the left subtree, respectively. To be specific, the "gain" scores are approximations of the unknown function $g^*(z_{i,t})$, obtained by calculating the average value of the target within each partition.
Here is an example to demonstrate the above definitions. I choose two features from the data set, "Beta" and "Size", to illustrate them.


Figure 1: Example of Regression Tree

Figure 1 exhibits how a tree performs regression using the two aforementioned features. First, observations are split on the beta value. If the value is larger than 1, the observation is assigned to category 3 and its path ends there. If the value is less than 1, the observation moves to the next node: if its size value is less than 0.5, it is assigned to category 1, otherwise to category 2. Categories 1, 2 and 3 are the terminal nodes of the tree. After all observations are assigned to categories, a score can be obtained for each observation, and the next step is forecasting: the prediction for the observations in each category is the average score among the observations in that category. The following function forecasts using this "average score".
Model. In general form, the function describes a tree with $K$ leaves (terminal nodes) and depth $L$:

$$g(z_{i,t};\theta,K,L) = \sum_{k=1}^{K}\theta_k \, \mathbf{1}\{z_{i,t} \in C_k(L)\} \qquad (3.9)$$

where the $\theta$'s are estimated from the training sample and defined as the average scores of the different categories; $\mathbf{1}\{z_{i,t} \in C_k(L)\}$ is the indicator function, equal to 1 when $z_{i,t} \in C_k(L)$ and 0 otherwise; and $C_k(L)$ is one of the $K$ categories.
In the example, the model becomes

$$g(z_{i,t};\theta,3,2) = \theta_1 \mathbf{1}\{beta_{i,t}<1\}\mathbf{1}\{size_{i,t}<0.5\} + \theta_2 \mathbf{1}\{beta_{i,t}<1\}\mathbf{1}\{size_{i,t}>0.5\} + \theta_3 \mathbf{1}\{beta_{i,t}>1\}$$

Objective Function and Computational Algorithm. The objective function of extreme gradient boosted trees (XGBoost) is:

$$J(\theta) = \sum_{i=1}^{N}\sum_{t=1}^{T} l\left(r_{i,t},\, g(z_{i,t};\theta,K,L)\right) + \Omega(\theta) \qquad (3.10)$$

The first part of equation (3.10) is the loss function and the second part is a regularization term that penalizes the complexity of the trees. The regularization can be written (with an $l_2$ penalty) as:

$$\Omega(\theta) = \gamma K + \frac{\lambda}{2}\sum_{j=1}^{K}\theta_j^2$$

The goal of this model is to find appropriate "scores" for all leaf nodes that minimize the objective function.
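A minimal fitting sketch with the xgboost package on synthetic stand-in data; the hyperparameter values are placeholders, not the tuned values of the study:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 64))
r = np.tanh(Z[:, 0] * Z[:, 1]) + 0.1 * rng.standard_normal(1000)

# max_depth bounds the tree depth L; reg_lambda is the l2 weight in Omega
# and gamma the per-leaf penalty of the regularization term above.
booster = xgb.XGBRegressor(n_estimators=200, max_depth=3,
                           learning_rate=0.1, reg_lambda=1.0, gamma=0.0)
booster.fit(Z, r)
```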

3.5 Neural Networks


Another non-linear model is the neural network (NN), which is preferred for complex statistical problems. Because of their complexity and flexibility, neural networks have become the least explicable and most heavily parameterized machine learning models. Instead of feeding the input features directly into the final output layer, neural networks use one or more hidden layers to compute complicated features that feed into that layer.
Model. In this project, the focus is on traditional "feed-forward" networks. A neural network generally consists of three parts: an input layer, hidden layers (zero, one, two, three, ...), and an output layer. To explain the details of how a neural network works, I use a simple example; its network is displayed in the following figure:

Figure 2: Example of Neural Network

The input layer of this neural network has 4 input units: the 3 features contained in the data set ($z_1$, $z_2$, $z_3$) and 1 bias unit, $z_0$. Each arrow is assigned a weight parameter $\theta$, and $\theta^{(\cdot)}$ in the graph is a four-dimensional parameter vector. The output layer aggregates the weighted features and makes a forecast $\sum_{k=0}^{3} z_k\theta_k$; that is to say, the simplest neural network is just the linear regression model. In the hidden layer, $a$ denotes an activation. Taking the outputs of the input layer, the nodes in the hidden layer apply their own activation functions and pass the results to the output layer. The formula for the activation of neuron $j$ in layer 1 is as follows (note that $a_0$ is also a bias unit):

$$a_j^{(1)} = f\left(\theta_{j,0}^{(0)} + \sum_{k=1}^{3} z_k\theta_{j,k}^{(0)}\right)$$

Therefore, the ultimate output forecast of this neural network is:

$$g(z;\theta) = \theta_0^{(1)} + \sum_{j=1}^{3} a_j^{(1)}\theta_j^{(1)}$$

In this example, there are $(3+1)\times 4 + 4$ parameters in total to be estimated; a more complicated neural network will have correspondingly more parameters.
Generally, the number of nodes in each hidden layer, if a neural network has hidden layers, should lie between the total number of inputs and 1. Researchers typically follow the geometric pyramid rule established by Masters (1993) to decide the numbers of nodes in the hidden layers. The geometric pyramid rule (Masters, 1993) states:
• For neural networks with one hidden layer (NN1):

$$NHN = \sqrt{n \times m} \qquad (3.11)$$

where $n$ is the number of input features and $m$ is the number of output nodes.

• For neural networks with two hidden layers (NN2):

$$NHN_1 = m \times r^2, \qquad NHN_2 = m \times r \qquad (3.12)$$

where $n$ and $m$ have the same definitions as for NN1 and $r = \sqrt[3]{n/m}$.

• For neural networks with three hidden layers (NN3):

$$NHN_1 = m \times r^3, \qquad NHN_2 = m \times r^2, \qquad NHN_3 = m \times r \qquad (3.13)$$

where $r = \sqrt[4]{n/m}$.

There are numerous potential activation functions to choose from, such as sigmoid, ReLU, tanh and softmax. I apply the same activation function at all nodes and choose a frequently used one, ReLU. Compared to the sigmoid activation function, three reasons explain why researchers have recently preferred ReLU. First, ReLU is computationally simple. Second, it does not saturate for positive inputs, which mitigates the vanishing-gradient problem. Lastly, in practice ReLU tends to show better convergence performance than sigmoid (Krizhevsky et al., 2012). The ReLU function is defined as:

$$ReLU(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{otherwise} \end{cases}$$

Let $J^{(l)}$ be the number of neurons in each layer $l = 1, 2, \ldots, L$, not including the input layer. The output of neuron $j$ in layer $l$ is defined as

$$a_j^{(l)} = ReLU\left(a^{(l-1)\prime}\,\theta_j^{(l-1)}\right) \qquad (3.14)$$

Therefore, the final output of the neural network is:

$$g(z;\theta) = a^{(L-1)\prime}\,\theta^{(L-1)} \qquad (3.15)$$

Objective Function. A common solver for training on a large data set is "adam", a stochastic gradient-based optimizer proposed by Kingma and Ba (2014). I estimate the neural network weight parameters by minimizing the $l_2$ objective function of the forecast errors. Hornik et al. (1989) show that, under reasonable regularity conditions, minimizing the $l_2$ error produces a consistent and asymptotically normal prediction.
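A minimal sketch of such a network with scikit-learn's MLPRegressor, combining the ReLU activation, the "adam" solver, and the pyramid-rule layer sizes; the data are synthetic stand-ins:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 64))
r = np.maximum(Z[:, 0], 0.0) - 0.5 * Z[:, 1] + 0.1 * rng.standard_normal(1000)

# hidden_layer_sizes follows the geometric pyramid rule with n = 64, m = 1:
# one hidden layer -> (8,), two -> (16, 4), three -> (23, 8, 3).
nn2 = MLPRegressor(hidden_layer_sizes=(16, 4), activation="relu",
                   solver="adam", max_iter=500, random_state=0)
nn2.fit(Z, r)
```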

3.6 Model Performance Evaluation


The out-of-sample $R^2$ is used to evaluate the performance of each predictive model in measuring excess stock returns. The formula is:

$$R^2_{oos} = 1 - \frac{\sum_{(i,t)\in test}\left(r_{i,t+1} - \hat{r}_{i,t+1}\right)^2}{\sum_{(i,t)\in test} r_{i,t+1}^2} \qquad (3.16)$$

To be more specific, $(i,t) \in test$ means that the performance evaluation is conducted only on the testing sample, not on the validation or training samples.
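Equation (3.16) translates directly into code; a minimal sketch, assuming NumPy arrays of realized and predicted test-sample returns:

```python
import numpy as np

def r2_oos(r_true: np.ndarray, r_pred: np.ndarray) -> float:
    """Equation (3.16): the benchmark forecast is zero rather than the
    historical mean, so the denominator is the raw sum of squared returns."""
    return 1.0 - np.sum((r_true - r_pred) ** 2) / np.sum(r_true ** 2)
```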

4 Empirical Study in China Stock Market


4.1 Data Set
4.1.1 Important Features Explanation

According to my literature review, certain influential stock-level predictors (features) have been studied by former researchers, and those features can be divided into four categories. Selected features are explained below:

• Price trend features:

– Short-term reversal (mom1m): uses the average stock return of the last month to represent the stock return for this month.
– Stock momentum (mom12m): a rolling measurement; uses the average stock return of the last 12 months to represent the stock return for this month.
– Momentum change (chmom): also called acceleration, this focuses on the change in 6-month momentum. The acceleration of the $i$th firm at time $t$ is computed as:

$$A_{i,t} = \prod_{s=t-6}^{t-1}(1 + R_{i,s}) - \prod_{s=t-12}^{t-7}(1 + R_{i,s})$$

where $t$ is the month index and $R_{i,s}$ is the buy-and-hold return for month $s$, calculated from the closing price on the last trading day of month $s$ relative to the closing price on the first trading day of month $s$. The equation can be read as the buy-and-hold return over the last 6 months less the buy-and-hold return over the interval $(t-12, t-7)$.

The above definitions follow the theoretical papers listed in the Appendix. In practice, however, log-returns are frequently used instead of simple returns; the relationship between them is $r_{i,t} = \ln(1 + R_{i,t})$. Hence, the momentum change (chmom) formula simplifies to $A_{i,t} = \sum_{s=t-6}^{t-1} r_{i,s} - \sum_{s=t-12}^{t-7} r_{i,s}$ (a computational sketch of this formula follows this feature list).

• Liquidity features:

– RMB trading volume liquidity (RMBvol): the monthly trading volume liquidity is

$$\sum_{d=1}^{20} \frac{H_d + L_d}{2} \times V_d$$

where $H_d$ and $L_d$ are the highest and lowest prices on the $d$th trading day of the month, respectively, and $V_d$ is the total trading volume on that day.

– Current ratio (currat): the current ratio is an important index measuring the financial security of a company and its short-term solvency. To be specific, it is calculated from current assets and current liabilities:

$$\text{Current ratio} = \frac{\text{Current assets}}{\text{Current liabilities}}$$

Current assets include cash, accounts receivable, tradable securities, and inventories; current liabilities include accounts payable, notes payable, tax payable and other expenses payable.
– Share turnover (turn): measures how easily investors can trade shares of a particular stock in a given period. It is defined as the total number of shares traded in a month divided by the number of shares outstanding over the period.

• Risk measurements:

– Idiosyncratic volatility (idiovol): the residual variance obtained by regressing daily stock returns on the market index return; more specifically, it can be calculated as the residual stock volatility from the Capital Asset Pricing Model.
– Market beta (beta): a measure of the systematic risk of a security relative to the entire market.

• Fundamental signals:

– Asset growth (agr): the percentage change in total assets between two consecutive fiscal years, that is,

$$AssetGrowth_t = (Asset_t - Asset_{t-1})/Asset_{t-1}$$

where $t$ is the year index.
– Earnings-to-price (ep): the ratio of a firm's annual earnings to the market value of its common stock (typically reported on December 31st).
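As referenced above, a minimal sketch of the simplified chmom formula, assuming a pandas Series of one stock's monthly log-returns indexed by month (the function name is illustrative):

```python
import pandas as pd

def chmom(log_returns: pd.Series) -> pd.Series:
    """Momentum change A_{i,t} from monthly log-returns: the sum over
    months t-6..t-1 minus the sum over months t-12..t-7."""
    six_month = log_returns.rolling(6).sum()
    return six_month.shift(1) - six_month.shift(7)
```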

4.1.2 Data Collection

Because the empirical study is based on China stock market data, all characteristic data are collected from the China Stock Market and Accounting Research (CSMAR) database, a powerful database offering data on the China stock market and the financial statements of China's listed companies.
As mentioned in Section 2, all characteristics I choose for this empirical study are contained in the studies by Green et al. (2013) and Gu et al. (2018). Originally, 94 characteristics were studied by Gu et al. (2018) based on the US stock market. For each characteristic, I first read its corresponding paper containing its definition and calculation method. Following the definition, I, together with two other students, found data for the exact or a similar characteristic in CSMAR. In the process of collecting the data, we discovered that for some items, such as Research and Development (R&D) and Weighted Average Cost of Capital (WACC), no data are provided for the China stock market, and for certain items, Dividend Initiation for instance, the majority of records are missing. Hence, we deleted the characteristics related to those items from the 94. Finally, the data set I use in my study has 64 characteristics, which are listed in the tables in the Appendix.

4.1.3 Data Process

For a massive data set, online SAS, software specializing in data management and predictive analysis, was finally chosen. An additional advantage of online SAS is its interface with the CSMAR database provided by Wharton Research Data Services (WRDS), which makes it easy to process data directly from the online database.
After computing all features in SAS, the tables containing the final feature values are exported as csv files (.csv) and then imported into PyCharm (a Python IDE), where all models mentioned in Section 3 (Methodology) are implemented.

4.1.4 Data Description

Since the China stock market emerged in 1990, the data set used in the empirical study has a time span from 1990 to 2017. However, before 1998, the data set has over 30 percent missing values. To reduce the inaccuracy caused by this high percentage of missing values, I conservatively reconstruct the data set to a time span from 1998 to 2017. It contains 3783 stocks in the China A-share market. Since the features contain industry-adjusted characteristics, I must note that only 2926 stocks have an industry code; hence, the industry-adjusted features are based on these 2926 stocks. There are 64 features in total, 19 of which are monthly features while the remaining ones are yearly features. I summarize the statistical characteristics of the 64 features, including the number of non-missing values, minimum, 1st percentile, 5th percentile, 95th percentile, 99th percentile and maximum. The summary table is in the Appendix.


4.2 Expected Outcomes


The expected outcomes of this empirical study in the China stock market are:

• Obtain the reasonable and suitable characteristics for the China stock market.
• Identify the best-performing models.
• Detect significant characteristics that all models agree influence asset pricing.
• Discover the important role of machine learning models in the field of risk premium measurement.

4.3 Model Performance


For machine learning algorithms, deciding the ratio between the training and testing data sets is important. Gu et al. (2018) use a 50:50 split between training and testing data, and I follow similar guidance. For a time series data set, randomly separating the training and testing data is not advisable. Therefore, I divide the 20 years of data into a 12-year training sample, with the remaining 8 years serving as the out-of-sample testing data set.
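A minimal sketch of this chronological split, assuming the processed panel is a pandas DataFrame with a "year" column (an illustrative name):

```python
import pandas as pd

def time_split(panel: pd.DataFrame, first_test_year: int = 2010):
    """Chronological split with no shuffling: 1998-2009 trains the models,
    2010-2017 is held out as the testing sample."""
    train = panel[panel["year"] < first_test_year]
    test = panel[panel["year"] >= first_test_year]
    return train, test
```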

Table 1: Out-of-sample Stock-level Return Prediction Performance of All Models (Percentage $R^2_{oos}$)

Model          Specification              $R^2_{oos}$
OLS            indmom, logmmt, mmt12      0.4388
OLS            logmmt, indmom, retvol     0.4163
OLS            logmmt, mmt12, retvol      0.1576
Enet           l1_ratio=0.2               0.2321
XGB            max_depth=4                0.7321
XGB            max_depth=3                1.1436
NN(8)          1 hidden layer             0.4565
NN(16, 4)      2 hidden layers            0.5880
NN(23, 8, 3)   3 hidden layers            0.7592

Table 1 compares the different models in terms of their out-of-sample $R^2_{oos}$. The Ordinary Least Squares model, as mentioned in the Methodology (Section 3), is the baseline. I first compute the covariance between each feature and the stock return on the training data set, then rank the absolute covariances from high to low and select the top 5 features. Roodman (2007) notes that too many instruments cause bias in an Ordinary Least Squares model, so I choose 3 different features out of the top 5 to build each OLS model; the total number of combinations is 10. I then train all ten OLS models, evaluate them on the testing data set, and rank the corresponding $R^2_{oos}$ values; Table 1 presents the three OLS models with the highest values. In addition, for the OLS models, the $l_2$ penalty is chosen for regularization. "Enet" in Table 1 is an abbreviation of Elastic Net, the penalized linear regression model. Since Enet combines the $l_1$ and $l_2$ penalties, the parameter reported for Enet in Table 1 is the weight I choose for the combination. "XGB" in Table 1 represents extreme gradient boosted trees. Its parameter, max_depth, can be regarded as the complexity of the trees: increasing max_depth makes the model more complex, and meanwhile the possibility of overfitting also increases. Table 1 compares two extreme gradient boosted tree models, and the $R^2_{oos}$ of the more complicated XGBoost model is clearly lower than that of the other. This comparison demonstrates that increasing the complexity of a model may sacrifice its accuracy, and that the inaccuracy is possibly triggered by overfitting. "NN" in Table 1 stands for neural networks, and the numbers in parentheses are the numbers of nodes in the hidden layers. As mentioned in the Methodology, I set the number of nodes in each hidden layer using Masters' (1993) geometric pyramid rule. The input size $n$ for the neural networks equals 64 and the output size $m$ equals 1; following formula (3.11), the one-hidden-layer network has 8 nodes. $r$ equals 4 for the two-hidden-layer network and 2.83 for the three-hidden-layer network; following formulas (3.12) and (3.13), the node counts of the other networks are determined similarly. NN(16, 4) has two hidden layers, with 16 nodes in the first and 4 nodes in the second; NN(23, 8, 3) has three hidden layers with 23, 8 and 3 nodes, respectively. The comparison among the three neural network models shows that an appropriate increase in model complexity can enhance prediction performance.
A conclusion can be drawn from Table 1: among the out-of-sample stock-level return prediction performances of the 9 models, the extreme gradient boosted trees algorithm outperforms the other algorithms, and the neural networks outperform the ordinary least squares models.


Additionally, different models have different requirements for the data. Ordinary least squares models are highly sensitive to features' magnitudes, and these models cannot handle missing values (NaN in the data set), so I use the normalized, standardized data set and replace all missing values with zero. On the other hand, the XGBoost algorithm can deal with extremely large or small values, including missing ones; due to this nature, I use the original data set for the trees (without normalization or replacement). Therefore, in terms of data requirements, the XGBoost algorithm dominates the ordinary least squares models.

Figure 3: Result for Principal Component Analysis

Figure 3 shows the explained variance ratio of each principal component. In the graph, apart from the first two components, the remaining components show no significant differences in explained variance ratio; that is to say, the effect of reducing dimensions is not significant. Hence, I drop principal components regression for my data set.
In general, the dominant model for the data set is the tree model. Its prediction performance is shown in the following table:

Table 2: $R^2_{oos}$ of Tree Model by Test Sample Starting Year

Year          2010      2011      2012      2013     2014     2015     2016     2017
$R^2_{oos}$   -0.0453   -0.1042   -0.0116   0.0108   0.0547   0.0328   0.0264   -0.0445

After the data processing, the first item in Section 4.2 (Expected Outcomes) has been achieved. The $R^2_{oos}$ values of the different models shown in Table 1 and Table 2 achieve the second expected outcome. The remaining discussion of model performance focuses on the third item mentioned in Section 4.2: detecting significant characteristics that all models agree on.


Figure 4: Feature Importance By Model

These 6 graphs show the top 20 feature importances in the ordinary least squares model, the elastic net model, the XGBoost tree model and the three neural networks. At this stage, the focus is within each model, not between models; hence, the feature importances shown in the graphs are relative rather than absolute values. In other words, the most important feature in each model always has an importance score of 1, and the remaining scores are computed relative to that largest score.
In the ordinary least squares model, I use the absolute covariance between each feature and the return as a proxy for feature importance. The first four features are related to momentum, which is built from past returns. In this case, the OLS model indicates that past returns have a significant influence on the future returns of the same stocks.
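A minimal sketch of this OLS importance score, assuming a DataFrame of standardized features and an aligned Series of returns; the rescaling to a maximum of 1 matches the relative scale of Figure 4:

```python
import numpy as np
import pandas as pd

def ols_importance(Z: pd.DataFrame, r: pd.Series) -> pd.Series:
    """Absolute covariance of each feature with the return, rescaled so
    the top feature scores 1, as in the relative scale of Figure 4."""
    raw = Z.apply(lambda col: abs(np.cov(col, r)[0, 1]))
    return (raw / raw.max()).sort_values(ascending=False)
```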
In the graph of the Enet model, the importance scores differ markedly across features. The highest score belongs to the difference between the percentage change in sales and the percentage change in accounts receivable, a result distinct from the other five models.
In the graph of the tree model, the top 20 features all have importance scores exceeding 0.3. Unlike the linear regression models, tree models consider all features carefully, and trees show no obvious bias towards certain features. Among the top 10 features, besides those related to returns, most of the rest are related to volatility, which differs from the linear regression models.


The neural network with 1 hidden layer shows a pattern similar to the tree model for the top 8 features, but the rest deviate from each other. Comparing the 3 neural networks, it is clear that with increasing complexity the importance scores of the same variables become quite different. Some features, such as dividend yield (dy) and gross profitability (gma), appear for the first time in the more complex neural networks. This phenomenon illustrates exactly the difficulty with neural networks: they cannot explain why certain features receive high weights compared to others.
Return on assets and return on equity appear in the linear regression models, while they are not significantly important in the non-linear models. Additionally, the non-linear models (trees and neural networks) pay attention to industry-adjusted features, such as industry-adjusted size (idtsize) and industry sales concentration (herf), while the linear regression models ignore the importance of these features.


Figure 5: Feature Importance (64 features)


Figure 5 displays the feature importance scores of all 64 features together. The label of the first column is "Useful data": I count the number of non-missing values for each feature and divide the count by the total number of observations in the data set (43665) to obtain the scores for this column. After computing the scores, I rank the 64 features from high to low. The scores of the 64 features within each model are represented by the colour of the cells in Figure 5: the darker the cell, the higher the score for that feature. Via this graph, it is easy to see how important a feature is within each model and whether that importance is related to the number of non-missing values in the whole data set. Figure 5 also markedly exhibits the significant characteristics that all models agree on, visible from the colour of an entire row. In the graph, the entire rows for return-related characteristics are coloured dark blue: 6-month momentum (mmt6), 12-month momentum (mmt12), 36-month momentum (mmt36), return volatility (retvol) and industry momentum (indmom). Besides, the six models are unanimous on the importance of liquidity characteristics: volatility of RMB trading volume liquidity (std_rmbvol), volatility of turnover liquidity (std_turnover), cash flow volatility (stdcf), and share turnover (turnover_m). Market-related features, such as beta, the book-to-market ratio (BM), change in shares outstanding (chcsho) and size, are also essential. Certain accounting characteristics, such as cash productivity (cashpr), corporate investment (cinvest) and change in tax expense, have a higher influence on future stock returns compared to the other accounting characteristics.

4.4 Machine Learning Portfolio


I evaluate the out-of-sample performance of the tree model using portfolios. The first step is to create the portfolios. At the end of each month, I compute one-month-ahead stock-return predictions from the tree model on the test sample, then rank the predictions from high to low. Based on these predictions, I divide all stocks into 10 deciles. Decile 1 contains the stocks whose predicted returns are in the lowest 10 percent of all ranked predictions; the other deciles are formed similarly. I reconstruct each portfolio every month with equal weights, because the objective function minimizes the equal-weighted average of squared prediction errors. Assume that I have enough money to invest in all stocks in a portfolio. The table below exhibits the average monthly real returns and the average monthly predicted returns in the test sample from 2010 to 2017:

Table 3: Performance of Portfolio

Decile      2010              2011              2012              2013
            Pred     Real     Pred     Real     Pred     Real     Pred     Real
1           -0.0141  0.0017   -0.0197  -0.0276  -0.0228  0.0054   -0.0134  -0.0074
2           0.0037   0.0056   -0.0051  -0.0287  -0.0074  0.0080   0.0031   0.0038
3           0.0133   0.0066   0.0022   -0.0274  0.0003   0.0106   0.0122   0.0084
4           0.0209   0.0085   0.0084   -0.0258  0.0064   0.0121   0.0196   0.0140
5           0.0277   0.0081   0.0141   -0.0279  0.0120   0.0149   0.0260   0.0193
6           0.0341   0.0107   0.0197   -0.0250  0.0175   0.0144   0.0324   0.0243
7           0.0408   0.0147   0.0259   -0.0225  0.0230   0.0164   0.0391   0.0257
8           0.0486   0.0158   0.0332   -0.0204  0.0297   0.0187   0.0469   0.0295
9           0.0585   0.0204   0.0429   -0.0216  0.0384   0.0178   0.0567   0.0349
10          0.0776   0.0311   0.0635   -0.0128  0.0581   0.0209   0.0773   0.0434

Decile      2014              2015              2016              2017
            Pred     Real     Pred     Real     Pred     Real     Pred     Real
1           -0.0016  0.0408   -0.0138  0.0033   -0.0168  0.0093   -0.0179  -0.0165
2           0.0142   0.0430   0.0045   0.0247   -0.0006  0.0159   -0.0052  -0.0154
3           0.0225   0.0458   0.0138   0.0255   0.0073   0.0161   0.0016   -0.0125
4           0.0290   0.0441   0.0212   0.0305   0.0134   0.0199   0.0070   -0.0135
5           0.0350   0.0392   0.0278   0.0368   0.0187   0.0268   0.0120   -0.0127
6           0.0409   0.0428   0.0342   0.0344   0.0240   0.0270   0.0168   -0.0113
7           0.0471   0.0467   0.0408   0.0428   0.0295   0.0248   0.0219   -0.0137
8           0.0542   0.0454   0.0487   0.0452   0.0357   0.0211   0.0280   -0.0107
9           0.0634   0.0455   0.0589   0.0449   0.0437   0.0217   0.0370   -0.0129
10          0.0843   0.0565   0.0830   0.1169   0.0650   0.0550   0.0635   0.0143


From Table 3, on the basis of the prediction-sorted portfolios, the signs of decile 1 and decile 10 are always opposite. Hence, based on the prediction sorting, I should long decile 10 and short decile 1 to obtain a well-performing machine learning long-short portfolio. However, unlike the predictions, the signs of the real returns of decile 1 and decile 10 are not always opposite. The testing data set has a time span of 8 years and each year has 12 months; that is to say, the testing sample contains 96 periods in total. I multiply the sign of decile 1 by the sign of decile 10 in each month; the results show that in only 16 months do they have opposite signs. Therefore, based on my data set, simply going long decile 10 may perform better. To decide between the long-short portfolio and the decile 10 portfolio, I calculate the cumulative returns of the two portfolios, shown in the graph below:

Figure 6: Long-short Portfolio vs. Decile 10

Note on cumulative returns: I invest money in a portfolio for one month. After one month, I receive returns of $r_1$ and deposit them in my bank account. After another month, the money in the account becomes $r_1(1 + r_f) + r_2$, where $r_f$ is the monthly risk-free rate, equal to 0.208%. The cumulative returns shown in the graph represent the total earnings in the bank account.
The graph shows that before October 2014 the long-short portfolio outperforms the decile 10 portfolio, while after that the decile 10 portfolio outperforms the long-short portfolio. Either portfolio earns positive returns. This demonstrates that the machine learning method I use for the data set successfully predicts returns.
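A minimal sketch of one month's long-short construction described above, assuming aligned pandas Series of predicted and realized returns (illustrative names):

```python
import pandas as pd

def decile_long_short(pred: pd.Series, real: pd.Series) -> float:
    """One month's long-short return: long the highest predicted decile,
    short the lowest, equal-weighted within each decile."""
    decile = pd.qcut(pred, 10, labels=False) + 1   # 1 = lowest predictions
    return real[decile == 10].mean() - real[decile == 1].mean()
```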

5 Difficulties and Future Work


This project mainly consists of two parts: data collection and model implementation. For the first part, although I spent approximately 10 months on it, I cannot guarantee that the data I use are one hundred percent correct. While calculating the features, I detected that the original database (CSMAR) contains certain abnormal values. For example, "liability" in the balance sheet should normally be non-negative, yet CSMAR reports negative liabilities. Another example concerns "leverage". Under normal circumstances, listed companies control their leverage ratio and would not accept a high ratio for a long time, although a leverage ratio above 100 is not impossible, for instance when the entire industry a company belongs to is relatively highly levered. However, for certain companies the calculated leverage exceeds ten thousand, which is extremely unusual. This suggests that the database may contain errors in the debt or equity values. Therefore, for scholars who are interested in studying the China stock market in the future, data cleaning will remain a critical problem.
The second part is to implement the machine learning algorithms and make predictions. Using Python to write machine learning pipelines is efficient because there are a number of ready-made packages, such as scikit-learn and xgboost. However, the difficult part is deciding the values of the hyperparameters. One solution is grid search, which requires high-performance hardware; another is to use a validation set. If I continue to study the same topic in the future, I will attempt to use a validation sample.


6 Conclusion
All expected outcomes mentioned in Section 4.2 have been achieved. Using SAS, 64 features are computed and summarized. After constructing the models in Python, the empirical study of the China stock market proceeds. The out-of-sample $R^2_{oos}$ is used to evaluate model performance; by comparing the $R^2_{oos}$ values, I detect a dominant model, the extreme gradient boosted tree model. Combining the feature importance scores of the different models, I recognize the essential features related to future stock returns; the majority of these important features are related to past returns and average industry performance. After identifying the dominant model for stock return prediction, I create machine learning portfolios to test whether the model successfully predicts returns. Based on the rank of the model's predicted returns, I create 10 portfolios. By comparing the real test-sample returns of the decile 10 portfolio (containing the stocks with the highest 10% of predicted returns) and the decile 1 portfolio (containing the stocks with the lowest 10%), I display the performance of two portfolio choices (a long-short portfolio and a decile 10 portfolio). Either portfolio earns positive returns during the test sample's time span from 2010 to 2017; therefore, the machine learning model successfully predicts stock returns. Finally, the findings of this project help demonstrate that machine learning algorithms are growing and will have a promising influence on the future financial market in China.


7 References
Arnerić, J., Poklepović, T., Aljinović, Z., 2014. Garch based artificial neural networks in forecasting conditional
variance of stock returns. Croatian Operational Research Review 5, 329–343.
Bansal, R., Viswanathan, S., 1993. No arbitrage and arbitrage pricing: A new approach. The Journal of Finance
48, 1231–1262.
Butaru, F., Chen, Q., Clark, B., Das, S., Lo, A.W., Siddique, A., 2016. Risk and risk management in the credit
card industry. Journal of Banking & Finance 72, 218–239.
Chiu, C.T., et al., 1994. An intelligent forecasting support system in auditing: expert system and neural network
approach, in: System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International Conference on,
IEEE. pp. 272–280.
Cox, J.C., Ross, S.A., 1976. The valuation of options for alternative stochastic processes. Journal of financial
economics 3, 145–166.
Dimson, E., Mussavian, M., 1999. Three centuries of asset pricing. Journal of Banking & Finance 23, 1745–1769.
Fama, E.F., French, K.R., 1993. Common risk factors in the returns on stocks and bonds. Journal of financial
economics 33, 3–56.
Freyberger, J., Neuhierl, A., Weber, M., 2017. Dissecting characteristics nonparametrically. Technical Report.
National Bureau of Economic Research.
Green, J., Hand, J.R., Zhang, X.F., 2013. The supraview of return predictive signals. Review of Accounting Studies
18, 692–730.
Gu, S., Kelly, B.T., Xiu, D., 2018. Empirical asset pricing via machine learning.
Harvey, C.R., Liu, Y., Zhu, H., 2016. …and the cross-section of expected returns. The Review of Financial Studies 29, 5–68.
Heaton, J., Polson, N., Witte, J.H., 2017. Deep learning for finance: deep portfolios. Applied Stochastic Models in
Business and Industry 33, 3–12.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural
networks 2, 359–366.
Hutchinson, J.M., Lo, A.W., Poggio, T., 1994. A nonparametric approach to pricing and hedging derivative securities
via learning networks. The Journal of Finance 49, 851–889.
Kelly, B.T., Pruitt, S., Su, Y., 2017. Some characteristics are risk exposures, and the rest are irrelevant.
Khandani, A.E., Kim, A.J., Lo, A.W., 2010. Consumer credit-risk models via machine-learning algorithms. Journal
of Banking & Finance 34, 2767–2787.
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 .
Krause, A., 2001. An overview of asset pricing models. University of Bath, UK.
Krauss, C., Do, X.A., Huck, N., 2017. Deep neural networks, gradient-boosted trees, random forests: Statistical
arbitrage on the s&p 500. European Journal of Operational Research 259, 689–702.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks,
in: Advances in neural information processing systems, pp. 1097–1105.
Lasfer, A., El-Baz, H., Zualkernan, I., 2013. Neural network design parameters for forecasting financial time series,
in: Modeling, Simulation and Applied Optimization (ICMSAO), 2013 5th International Conference on, IEEE.
pp. 1–4.
Markowitz, H., 1952. Portfolio selection. The journal of finance 7, 77–91.
Masters, T., 1993. Practical neural network recipes in C++. Morgan Kaufmann.


Refenes, A.N., Zapranis, A., Francis, G., 1994. Stock performance modeling using neural networks: a comparative
study with regression models. Neural networks 7, 375–388.
Roodman, D., 2007. A short note on the theme of too many instruments. Center for Global Development Working
Paper 125.
Sharpe, W.F., 1964. Capital asset prices: A theory of market equilibrium under conditions of risk. The journal of
finance 19, 425–442.
Sun, W., Yang, C.G., Qi, J.X., 2006. Credit risk assessment in commercial banks based on support vector machines,
in: Machine Learning and Cybernetics, 2006 International Conference on, IEEE. pp. 2430–2433.
Takeuchi, L., Lee, Y.Y.A., 2013. Applying deep learning to enhance momentum trading strategies in stocks, in:
Technical Report. Stanford University.
Zhang, C., Ji, Z., Zhang, J., Wang, Y., Zhao, X., Yang, Y., 2018. Predicting chinese stock market price trend
using machine learning approach, in: Proceedings of the 2nd International Conference on Computer Science and
Application Engineering, ACM. p. 83.


8 Appendix

Data Summary

Table 4: Details of Selected Variables
No. Acronym Firm Characteristic Paper’s author(s) Year, Journal Frequency
1 acc Working capital accruals Sloan 1996, TAR Annual
2 age Number of years since firm listed Jiang, Lee, Zhang 2005, RAS Monthly
3 agr Asset growth Cooper, Gulen, Schill 2008, JF Annual
4 beta Beta Fama, MacBeth 1973, JPE Monthly
5 beta2 Beta squared Fama, MacBeth 1973, JPE Monthly
6 bm Book-to-market Rosenberg, Reid, Lanstein 1985, JPM Annual
7 bm_ia Industry-adjusted book-to-market Asness, Porter, Stevens 2000, WP Annual
8 cash Cash holdings Palazzo 2012, JFE Annual
9 cashdebt Cash flow to debt Ou, Penman 1989, JAE Annual
10 cashpr Cash productivity Chandrashekar, Rao 2009, WP Annual
11 cfp Cash flow to price ratio Desai, Rajgopal, Venkatachalam 2004, TAR Annual

12 cfp_ia Industry-adjusted cash flow to price ratio Asness, Porter, Stevens 2000, WP Annual
13 chato Change in asset turnover Soliman 2008, TAR Annual
14 chcsho Change in shares outstanding Pontiff, Woodgate 2008, JF Annual
15 chemp Change in employees Asness, Porter, Stevens 2000, WP Annual
16 chinv Change in inventory Thomas, Zhang 2002, RAS Annual
17 chmmt Change in 6-month momentum Gettleman, Marks 2006, WP Monthly
18 chtx Change in tax expense Thomas, Zhang 2011, TAR Annual
19 cinvest Corporate investment Titman, Wei, Xie 2004, JFQA Annual
20 cnvdebt Convertible debt indicator Valta 2016, JFQA Annual
21 currat Current ratio Ou, Penman 1989, JAE Annual
22 divom Dividend omission Michaely, Thaler, Womack 1995, JF Annual
23 dy Dividend to price (dividend yield) Litzenberger, Ramaswamy 1982, JF Annual
24 egr Growth in common shareholder equity Richardson, Sloan, Soliman, Tuna 2005, JAE Annual
25 ep Earnings to price Basu 1977, JF Annual
Table 5: Details of Selected Variables (Continued)
No. Acronym Firm Characteristic Paper’s author(s) Year, Journal Frequency
26 evol Earnings volatility Francis, LaFond, Olsson, Schipper 2004, TAR Annual
27 gma Gross profitability Novy-Marx 2013, JFE Annual
28 grCAPX Growth in capital expenditure Anderson, Garcia-Feijoo 2006, JF Annual
29 grltnoa Growth in long term net operating assets Fairfield, Whisenant, Yohn 2003, TAR Annual
30 herf Industry sales concentration Hou, Robinson 2006, JF Annual
31 idiovol Idiosyncratic return volatility Ali, Hwang, Trombley 2003, JFE Monthly
32 idtsize Industry-adjusted size Asness, Porter, Stevens 2000, WP Monthly
33 illiqAmihud_m Illiquidity Amihud 2002, JFM Monthly
34 indmom Industry momentum Moskowitz, Grinblatt 1999, JF Monthly
35 lev Leverage Bhandari 1988, JF Annual
36 lgr Growth in long-term debt Richardson, Sloan, Soliman, Tuna 2005, JAE Annual

37 maxret Maximum daily return Bali, Cakici, Whitelaw 2011, JFE Monthly
38 mmt12 12-month momentum Jegadeesh 1990, JF Monthly
39 mom1m 1-month momentum (logmmt) Jegadeesh, Titman 1993, JF Monthly
40 mmt36 36-month momentum Jegadeesh, Titman 1993, JF Monthly
41 mmt6 6-month momentum Jegadeesh, Titman 1993, JF Monthly
42 oprprof Operating profitability Fama, French 2015, JFE Annual
43 orgcap Organizational capital Eisfeldt, Papanikolaou 2013, JF Annual
44 pchcurrat % change in current ratio Ou, Penman 1989, JAE Annual
45 pchdepr % change in depreciation Holthausen, Larcker 1992, JAE Annual
46 pchsale_pchinvt % change in sales-% change in inventory Abarbanell, Bushee 1998, TAR Annual
47 pchsale_pchrect % change in sales-% change in accounts receivable Abarbanell, Bushee 1998, TAR Annual
48 pchsaleinv % change sales-to-inventory Ou, Penman 1989, JAE Annual
49 pctacc Percent accruals Hafzalla, Lundholm, Winkle 2011, TAR Annual
50 ps Financial statements score(F_score) Piotroski 2000, JAR Annual
Table 6: Details of Selected Variables (Continued)
No. Acronym Firm Characteristic Paper’s author(s) Year, Journal Frequency
51 retvol Return volatility Ang, Hodrick, Xing, Zhang 2006, JF Monthly
52 rmbvol RMB trading volume Chordia, Subrahmanyam, Anshuman 2001, JFE Monthly
53 roa Return on assets Balakrishnan, Bartov, Faurel 2010, JAE Annual
54 roe Return on equity Hou, Xue, Zhang 2015, RFS Annual
55 roic Return on invested capital Brown, Rowe 2007, WP Annual
56 salecash Sales to cash Ou, Penman 1989, JAE Annual

57 salerec Sales to receivable Ou, Penman 1989, JAE Annual
58 salep Sales to price Barbee, Mukherji, Raines 1996, FAJ Annual
59 sgr Sales growth Lakonishok, Shleifer, Vishny 1994, JF Annual
60 size Size Banz 1981, JFE Monthly
61 std_RMBvol Volatility of liquidity (RMB trading volume) Chordia, Subrahmanyam, Anshuman 2001, JFE Monthly
62 std_turnover Volatility of liquidity (share turnover) Chordia, Subrahmanyam, Anshuman 2001, JFE Monthly
63 stdcf Cash flow volatility Huang 2009, JEF Annual
64 turnover_m Share turnover Datar, Naik, Radcliffe 1998, JFM Monthly
