(18 intermediate revisions by 2 users not shown)
Line 32: Line 32:
 
===2.2 Principle of BP neural network===
 
===2.2 Principle of BP neural network===
  
A typical BP neural network includes an input layer, one or more hidden layers, and an output layer. Its network structure is shown in Figure 1. The algorithm learning process of BP neural network is mainly composed of input forward propagation and error back propagation. In the forward propagation process, input samples are input from the input layer, processed by the hidden layer units, and the actual output value of each unit is calculated according to the weight and threshold. If the actual output value and the expected value reach a predetermined error range at this time, the learning process ends successfully. The back-propagation method is to adjust the weight through the network error in the back, and modify the weight matrix according to the actual output and the expected output to reduce the error of the neural network structure [14,15].
+
A typical BP neural network includes an input layer, one or more hidden layers, and an output layer. Its network structure is shown in [[#img-1|Figure 1]]. The algorithm learning process of BP neural network is mainly composed of input forward propagation and error back propagation. In the forward propagation process, input samples are input from the input layer, processed by the hidden layer units, and the actual output value of each unit is calculated according to the weight and threshold. If the actual output value and the expected value reach a predetermined error range at this time, the learning process ends successfully. The back-propagation method is to adjust the weight through the network error in the back, and modify the weight matrix according to the actual output and the expected output to reduce the error of the neural network structure [14,15].
  
 +
<div id='img-1'></div>
 
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;"
 
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;"
 
|-
 
|-
Line 64: Line 65:
 
{| style="text-align: center; margin:auto;"  
 
{| style="text-align: center; margin:auto;"  
 
|-
 
|-
| <math display="inline">h_j=f(V_j^TX^T)\mbox{ },\mbox{ }j=1,2,\cdots ,m.</math>
+
| <math display="inline">h_j=f\left(V_j^TX^T \right)\mbox{ },\mbox{ }j=1,2,\cdots ,m.</math>
 
|}
 
|}
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (2)
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (2)
Line 74: Line 75:
 
{| style="text-align: center; margin:auto;"  
 
{| style="text-align: center; margin:auto;"  
 
|-
 
|-
| <math display="inline">y_k=f(W_k^TH^T)\mbox{ },\mbox{ }k=1,2,\cdots ,l.</math>
+
| <math display="inline">y_k=f\left(W_k^TH^T\right)\mbox{ },\mbox{ }k=1,2,\cdots ,l.</math>
 
|}
 
|}
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (3)
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (3)
Line 87: Line 88:
 
{| style="text-align: center; margin:auto;"  
 
{| style="text-align: center; margin:auto;"  
 
|-
 
|-
| <math display="inline">e=(1/2)\sum_{k=1}^l{\left(d_k-y_k\right)}^2</math>
+
| <math display="inline">e=(1/2)\displaystyle\sum_{k=1}^l{\left(d_k-y_k\right)}^2</math>
 
|}
 
|}
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (4)
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (4)
Line 100: Line 101:
 
{| style="text-align: center; margin:auto;"  
 
{| style="text-align: center; margin:auto;"  
 
|-
 
|-
| <math display="inline">{\delta }_j^h=({\sum }_{k=1}^l{\delta }_k^yW_{jk})\cdot h_j\cdot (1-</math><math>h_j)\mbox{ },\mbox{ }\mbox{ }j=1,2,\cdots ,m.</math>
+
| <math display="inline">{\delta }_j^h=\left({\displaystyle\sum }_{k=1}^l{\delta }_k^yW_{jk}\right)\cdot h_j\cdot (1-</math><math>h_j)\mbox{ },\mbox{ }\mbox{ }j=1,2,\cdots ,m.</math>
 
|}
 
|}
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (5)
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (5)
Line 146: Line 147:
 
{| style="text-align: center; margin:auto;"  
 
{| style="text-align: center; margin:auto;"  
 
|-
 
|-
| <math>E=(1/2){\sum }_{p=1}^p\sum_{k=1}^l{\left(d_k-y_k\right)}^2</math>
+
| <math>E=(1/2){\displaystyle\sum }_{p=1}^p\displaystyle\sum_{k=1}^l{\left(d_k-y_k\right)}^2</math>
 
|}
 
|}
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (9)
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (9)
Line 170: Line 171:
 
</div>
 
</div>
  
{| style="margin: 1em auto 0.1em auto;border-collapse: collapse;font-size:85%;width:85%;"  
+
{| class="wikitable" style="margin: 1em auto 0.1em auto;border-collapse: collapse;font-size:85%;width:auto:85%;"  
 
|-
 
|-
style="border-top: 1pt solid black;"|65
+
|  65
style="border-top: 1pt solid black;"|1
+
|  1
style="border-top: 1pt solid black;"|1
+
|  1
style="border-top: 1pt solid black;"|1
+
|  1
style="border-top: 1pt solid black;"|1
+
|  1
style="border-top: 1pt solid black;"|0.217464
+
|  0.217464
style="border-top: 1pt solid black;"|0.689387
+
|  0.689387
style="border-top: 1pt solid black;"|0.615622
+
|  0.615622
style="border-top: 1pt solid black;"|0.933314
+
|  0.933314
style="border-top: 1pt solid black;"|1.076462
+
|  1.076462
 
|-
 
|-
 
| 802
 
| 802
Line 271: Line 272:
 
| 0.778539
 
| 0.778539
 
|-
 
|-
style="border-bottom: 1pt solid black;"|521
+
|  521
style="border-bottom: 1pt solid black;"|0.710836
+
|  0.710836
style="border-bottom: 1pt solid black;"|1
+
|  1
style="border-bottom: 1pt solid black;"|0.302097
+
|  0.302097
style="border-bottom: 1pt solid black;"|0.244671
+
|  0.244671
|  style="border-bottom: 1pt solid black;"|0.396943
+
| 0.396943
style="border-bottom: 1pt solid black;"|0.315258
+
|  0.315258
style="border-bottom: 1pt solid black;"|0.393372
+
|  0.393372
style="border-bottom: 1pt solid black;"|0.913728
+
|  0.913728
style="border-bottom: 1pt solid black;"|0.75364
+
|  0.75364
 
|}
 
|}
  
  
In this experiment, we also use historical data to evaluate the model, and the verification method is full set verification. Figure 2 shows the accuracy and error rate of the model classification. Obviously, the accuracy is significantly higher than the error rate. In finance, it is not easy to achieve 72% accuracy. So, as long as the number of transactions is enough, the probability of profit is very considerable.
+
In this experiment, we also use historical data to evaluate the model, and the verification method is full set verification. [[#img-2|Figure 2]] shows the accuracy and error rate of the model classification. Obviously, the accuracy is significantly higher than the error rate. In finance, it is not easy to achieve 72% accuracy. So, as long as the number of transactions is enough, the probability of profit is very considerable.
  
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: 85%;"
+
<div id='img-2'></div>
 +
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;"
 
|-
 
|-
|style="padding:10px;"|  [[Image:Draft_Wang_861907375-image29.png|334px]]
+
| style="padding:10px;"|  [[Image:Draft_Wang_861907375-image29.png|center|334px]]
| style="padding:10px;"|  [[Image:Draft_Wang_861907375-image30.png|358px]]
+
| style="padding:10px;"|  [[Image:Wang_2020a_3767_Figura2b.png|center|358px]]
 
|- style="text-align: center; font-size: 75%;"
 
|- style="text-align: center; font-size: 75%;"
 
| colspan="2" style="padding-bottom:10px;"| '''Figure 2'''. Evaluation results of the model
 
| colspan="2" style="padding-bottom:10px;"| '''Figure 2'''. Evaluation results of the model
Line 318: Line 320:
 
{| style="text-align: center; margin:auto;"  
 
{| style="text-align: center; margin:auto;"  
 
|-
 
|-
| <math display="inline">u_p=\sum_{i=1}^nu_iw_i=U^TW</math>
+
| <math display="inline">u_p=\displaystyle\sum_{i=1}^nu_iw_i=U^TW</math>
 
|}
 
|}
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (11)
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (11)
Line 348: Line 350:
 
{| style="text-align: center; margin:auto;"  
 
{| style="text-align: center; margin:auto;"  
 
|-
 
|-
| <math display="inline">{\sigma }_p^2=D(r)=\sum_{i=1}^n\sum_{j=1}^n\left(w_iw_j{\sigma }_{ij}\right)=</math><math>W^TEW</math>
+
| <math display="inline">{\sigma }_p^2=D(r)=\displaystyle\sum_{i=1}^n\displaystyle\sum_{j=1}^n\left(w_iw_j{\sigma }_{ij}\right)=</math><math>W^TEW</math>
 
|}
 
|}
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (13)
 
| style="width: 5px;text-align: right;white-space: nowrap;" | (13)
Line 364: Line 366:
 
min={\sigma }_p^2=W^TEW\\
 
min={\sigma }_p^2=W^TEW\\
 
u_p=U^TW\\
 
u_p=U^TW\\
\sum_{i=1}^nw_i=1
+
\displaystyle\sum_{i=1}^nw_i=1
 
\end{array}\right.</math>
 
\end{array}\right.</math>
 
|}
 
|}
Line 408: Line 410:
 
Constructing the Lagrange multiplier function  <math display="inline">L=W^TEW+\lambda^T (AW-B)</math>, where <math display="inline">\lambda =[\lambda_1,\lambda_2]^T</math>.
 
Constructing the Lagrange multiplier function  <math display="inline">L=W^TEW+\lambda^T (AW-B)</math>, where <math display="inline">\lambda =[\lambda_1,\lambda_2]^T</math>.
  
Let  <math display="inline">\frac{\partial L}{\partial \lambda }=0, \frac{\partial L}{\partial W}=0,</math> that is
+
Let  <math display="inline">\displaystyle\frac{\partial L}{\partial \lambda }=0, \displaystyle\frac{\partial L}{\partial W}=0,</math> that is
  
 
{| class="formulaSCP" style="width: 100%; text-align: center;"  
 
{| class="formulaSCP" style="width: 100%; text-align: center;"  
Line 438: Line 440:
 
==4. Simulation results and analysis==
 
==4. Simulation results and analysis==
  
The proposed portfolio theoretical model is verified and simulated by MATLAB software. Now we are ready to invest in 8 stocks, just select the top 8 from the stock ranking table 1 given in the previous section, which are recorded as P<sub>1</sub>, P<sub>2</sub>,..., P<sub>8</sub> respectively. The simulation results are shown in Figure 3 and Figure 4.
+
The proposed portfolio theoretical model is verified and simulated by MATLAB software. Now we are ready to invest in 8 stocks, just select the top 8 from the stock ranking table 1 given in the previous section, which are recorded as <math>P_1 , P_2,\cdots, P_8</math> respectively. The simulation results are shown in Figures [[#img-3|3]] and [[#img-4|4]].
  
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;">
+
<div id='img-3'></div>
[[Image:Draft_Wang_861907375-image58.png|372px]] </div>
+
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;"
 +
|-
 +
|style="padding:10px;"| [[Image:Wang_2020a_2053_Figura3.png|372px]]
 +
|- style="text-align: center; font-size: 75%;"
 +
| colspan="1" style="padding-bottom:10px;"| '''Figure 3.''' Effective frontier curve
 +
|}
  
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;">
+
<div id='img-4'></div>
<span style="text-align: center; font-size: 75%;">'''Figure 3.''' Effective frontier curve</span></div>
+
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;"
 
+
|-
Here, we need to focus on Figure 3. With this chart, we can easily see the distribution curve of risk and return. This will provide us with a basis for deciding which set of portfolios to choose. When we choose a point on the curve, we get a set of investment weights. If you are an investor who seeks high returns without fear of high risks, you can choose the top set of portfolios. Of course, most people will choose a relatively compromise solution, that is, the benefits are greater, but the risks can be tolerated.
+
|style="padding:10px;"| [[Image:Wang_2020a_2972_Figura4.png|372px]]
 +
|- style="text-align: center; font-size: 75%;"
 +
| colspan="1" style="padding-bottom:10px;"| '''Figure 4.''' Distribution of investment weight
 +
|}
  
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;">
 
[[Image:Draft_Wang_861907375-image59.png|372px]] </div>
 
  
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;">
+
Here, we need to focus on [[#img-3|Figure 3]]. With this chart, we can easily see the distribution curve of risk and return. This will provide us with a basis for deciding which set of portfolios to choose. When we choose a point on the curve, we get a set of investment weights. If you are an investor who seeks high returns without fear of high risks, you can choose the top set of portfolios. Of course, most people will choose a relatively compromise solution, that is, the benefits are greater, but the risks can be tolerated.
<span style="text-align: center; font-size: 75%;">'''Figure 4.''' Distribution of investment weight</span></div>
+
  
Figure 4 is an investment weight allocation chart for different risk appetites. When we choose an abscissa, it corresponds to a portfolio. Of course, we can also directly calculate the specific weight distribution data from the model. But in the form of a graph, it is more intuitive to see the difference in portfolio schemes under different risk preferences. The specific manifestation is that the investment ratio of each stock is different. When you choose a preference, you can directly get the specific investment allocation plan.
+
[[#img-4|Figure 4]] is an investment weight allocation chart for different risk appetites. When we choose an abscissa, it corresponds to a portfolio. Of course, we can also directly calculate the specific weight distribution data from the model. But in the form of a graph, it is more intuitive to see the difference in portfolio schemes under different risk preferences. The specific manifestation is that the investment ratio of each stock is different. When you choose a preference, you can directly get the specific investment allocation plan.
  
 
==5. Conclusion==
 
==5. Conclusion==
Line 466: Line 473:
 
==References==
 
==References==
  
[1] Wenjing Ouyang, Samuel H. Szewczyk. Stock price informativeness on the sensitivity of strategic M&A investment to Q[J]. Review of Quantitative Finance & Accounting, 2018, 50(3):745-774.
+
<div class="auto" style="text-align: left;width: auto; margin-left: auto; margin-right: auto;font-size: 85%;">
  
[2] Chava, S., Wang, R., & Zou, H. Covenants, Creditors’ Simultaneous Equity Holdings, and Firm Investment Policies. Journal of Financial and Quantitative Analysis, 2019,54(2), 481-512.
+
[1] Ouyangn W., Szewczyk S.H. Stock price informativeness on the sensitivity of strategic M&A investment to Q. Review of Quantitative Finance & Accounting, 50(3):745-774, 2018.
  
[3] Han-ding, ZHANG, Yin-xian. Investment risk evaluation of existing building energy-saving renovation project for ESCO[J]. Ecological Economy, 2018(3):180-189.
+
[2] Chava S., Wang R., Zou H. Covenants, creditors’ simultaneous equity holdings, and firm investment policies. Journal of Financial and Quantitative Analysis, 54(2):481-512, 2019.
  
[4] Huiqi Gan. Does CEO managerial ability matter? Evidence from corporate investment efficiency[J]. Review of Quantitative Finance & Accounting, 2019, 52(4):1085-1118.
+
[3] Guo H., Zhang Y., Wu S., Shang L. Investment risk evaluation of existing building energy-saving renovation project for ESCO. Ecological Economy, 27(3):180-189, 2018.
  
[5] Ferrando, Annalisa, Preuss, Carsten. What finance for what investment? Survey-based evidence for European companies[J]. Eib Working Papers, 2018(5):1-39.
+
[4] Huiqi Gan. Does CEO managerial ability matter? Evidence from corporate investment efficiency. Review of Quantitative Finance & Accounting, 52(4):1085-1118, 2019.
  
[6] Muhittin A. Serdar, Mustafa Serteser, Yasemin Ucal, etc. An Assessment of HbA1c in Diabetes Mellitus and Pre-diabetes Diagnosis: a Multi-centered Data Mining Study[J]. Applied Biochemistry and Biotechnology, 2019(Suppl1):1-13.
+
[5] Ferrando A., Preuss C. What finance for what investment? Survey-based evidence for European companies. Econ. Polit., 35:1015–1053, 2018.
  
[7] Sorensen E H. Miller K L, Ooi C K. The decision tree approach to stock selection-An evolving tree model performs the best[J]. Journal of Portfolio Management. 2000,27(1):42-52.
+
[6] Serdar M.A., Serteser M., Ucal Y., etc. An assessment of HbA1c in diabetes mellitus and pre-diabetes diagnosis: a multi-centered data mining study. Applied Biochemistry and Biotechnology, 190(Suppl1):1-13, 2019.
  
[8] Piotroski, Joseph D . Value Investing: The Use of Historical Financial Statement Information to Separate Winners from Losers[J]. Journal of Accounting Research, 2001, 38(2):43-51.
+
[7] Sorensen E.H. Miller K.L., Ooi C.K. The decision tree approach to stock selection-An evolving tree model performs the best. Journal of Portfolio Management, 27(1):42-52, 2000.
  
[9] Fama E F , French K R . A Five-factor Asset Pricing Model[J]. Journal of Financial Economics, 2015,116(1):1-22.
+
[8] Piotroski J.D. Value investing: The use of historical financial statement information to separate winners from losers. Journal of Accounting Research, 38(2):43-51, 2001.
  
[10] Jigar Patel,Sahil Shah,Priyank Thakkar,K Kotecha. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning technique[J]. Expert Systems with Applications.2015,42(1):259-268.
+
[9] Fama E.F., French K.R. A five-factor asset pricing model. Journal of Financial Economics, 116(1):1-22, 2015.
  
[11] Pernilla Svefors, Oleg Sysoev, Eva-Charlotte Ekstrom,etc. Relative importance of prenatal and postnatal determinants of stunting: data mining approaches to the MINIMat cohort, Bangladesh[J]. BMJ Open, 2019, 9(8):e025154.
+
[10] Patel J., Shah S., Thakkar P., Kotecha K. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning technique. Expert Systems with Applications, 42(1):259-268, 2015.
  
[12] Alireza Arabameri, Biswajeet Pradhan, Khalil Rezaei. Spatial prediction of gully erosion using ALOS PALSAR data and ensemble bivariate and data mining models[J]. Geosciences Journal, 2019, 1:1-18.
+
[11] Svefors P., Sysoev O., Ekstrom E.C., et al. Relative importance of prenatal and postnatal determinants of stunting: data mining approaches to the MINIMat cohort Bangladesh. BMJ Open, 9(8):e025154, 2019.
  
[13] Yali Dong, Huimin Wang. Robust Output Feedback Stabilization for Uncertain Discrete-Time Stochastic Neural Networks with Time-Varying Delay[J]. Neural Processing Letters, 2019:1-21.
+
[12] Arabameri A., Pradhan B., Rezaei K. Spatial prediction of gully erosion using ALOS PALSAR data and ensemble bivariate and data mining models. Geosciences Journal, 23:669–686, 2019.
  
[14] Meng-Xiao Li, Su-Qin Yu, Wei Zhang. Segmentation of retinal fluid based on deep learning: application of three-dimensional fully convolutional neural networks in optical coherence tomography images[J]. International Journal of Ophthalmology, 2019, 12(6):1012-1020.
+
[13] Dong Y., Wang H. Robust output feedback stabilization for uncertain discrete-time stochastic neural networks with time-varying delay. Neural Processing Letters, 51:83–103, 2020.
  
[15] Marwin H. S. Segler, Mike Preuss, Mark P. Waller. Planning chemical syntheses with deep neural networks and symbolic AI[J]. Nature, 2018, 555(7698):604-610.
+
[14] Li M.X., Yu S.Q., Zhang W. Segmentation of retinal fluid based on deep learning: application of three-dimensional fully convolutional neural networks in optical coherence tomography images. International Journal of Ophthalmology, 12(6):1012-1020, 2019.
 +
 
 +
[15] Segler M.H.S., Preuss M., Waller M.P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555(7698):604-610, 2018.
 +
 
 +
</div>

Latest revision as of 15:56, 18 February 2021

Abstract

Quantitative investment is the process of building mathematical models with statistics, information technology, and mathematics to quantify risks and returns and to implement traditional investment concepts. In the past, limited computing tools kept quantitative investment from gaining much recognition. With advances in computer science and quantitative analysis theory, traditional fundamental analysis and investment models built on sampling statistics no longer meet the requirements of investors. Therefore, quantitative investment strategies based on data mining technology are receiving more and more attention. In this paper, we use MATLAB software to capture big data from financial and economic websites, train neural network models to predict the trend of stock changes, and finally establish a suitable quantitative stock selection model. The simulation results show that the ideal goals in the stock market can be achieved only by using quantitative stock selection strategies to curb risks and by selecting a suitable investment portfolio.

Keywords: Quantitative investment, data mining, neural network, portfolio

1. Introduction

In recent years, with the continuous development of the stock market, more and more attention has been paid to quantitative investment technology [1-3], and quantitative investment systems are gradually maturing. As stock market rules continue to improve, the number of listed stocks and the volume of associated data keep increasing. This large body of complex stock data contains useful information that cannot be found through conventional methods. However, the data mining technology developed in recent years can help us mine information from this vast amount of stock data [4-6]. By analyzing these data, we can get the information we want. In terms of factor stock selection, some researchers have successfully proposed quantitative stock selection models based on multiple factors [7,8]. These systems use quantitative methods to analyze transaction data and financial indicators of listed companies, and combine them with statistical testing methods to help investors find the most valuable investment portfolio. Although some of these methods are convenient and easy to operate, they ignore the correlation and overlap between factors [9]. The shortest-distance hierarchical clustering method can reduce the massive stock price series, which both simplifies the workload and is more intelligent. However, the shortest-distance method tends to keep adding samples to existing classes, so it is an extreme method. Patel et al. compared four prediction models, namely artificial neural network (ANN), support vector machine (SVM), random forest and Naive Bayes, and then identified the optimal prediction model [10].

2. Basic theory and method

2.1 Data mining

Data mining is the process of extracting hidden, previously unknown, and useful information and knowledge from a large amount of incomplete, noisy, fuzzy and random practical application data [11,12]. The core of data mining is to use algorithms to train on the processed input and output data and obtain a model. The model is then verified, so that it describes the relationship between the input and output data to a certain extent. Finally, the model is applied to newly input data to obtain a new output which can be used for interpretation and application [13]. The content of data mining mainly includes association, regression, classification, clustering, prediction and diagnosis.

2.2 Principle of BP neural network

A typical BP neural network includes an input layer, one or more hidden layers, and an output layer. Its network structure is shown in Figure 1. The learning process of a BP neural network consists mainly of forward propagation of the input and back propagation of the error. In forward propagation, input samples enter at the input layer and are processed by the hidden layer units, and the actual output value of each unit is calculated from the weights and thresholds. If the difference between the actual output and the expected output falls within a predetermined error range, the learning process ends successfully. Otherwise, back propagation passes the network error backwards and modifies the weight matrices according to the actual output and the expected output in order to reduce the error of the network [14,15].

Figure 1. Structure of BP neural network model


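First, we define the following variables: the input layer vector <math display="inline">X</math>; the hidden layer output vector <math display="inline">H</math> with components <math display="inline">h_j</math>, <math display="inline">j=1,2,\cdots ,m</math>; the output layer output vector with components <math display="inline">y_k</math>, <math display="inline">k=1,2,\cdots ,l</math>; the expected output vector with components <math display="inline">d_k</math>; the weight matrix <math display="inline">V</math> from the input layer to the hidden layer; and the weight matrix <math display="inline">W</math> from the hidden layer to the output layer. The specific implementation steps of the BP neural network are as follows: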

Step 1. The weight matrices <math display="inline">V</math> and <math display="inline">W</math> of the network are initialized according to the range of the activation function. We set the maximum number of training iterations and the learning accuracy value, and choose the activation function:

<math display="inline">f(x)=\displaystyle\frac{1}{1+e^{-x}}</math> (1)


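Step 2. Data preprocessing. We select the sample data as input and compute the hidden layer output <math display="inline">h_j</math> and the output layer output <math display="inline">y_k</math>:

<math display="inline">h_j=f\left(V_j^TX^T\right),\quad j=1,2,\cdots ,m.</math> (2)
<math display="inline">y_k=f\left(W_k^TH^T\right),\quad k=1,2,\cdots ,l.</math> (3)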


Step 3. Calculating the error using the actual output value and the expected output value of the network:

<math display="inline">e=(1/2)\displaystyle\sum_{k=1}^l{\left(d_k-y_k\right)}^2</math> (4)


Step 4. Calculating the partial derivative of the error function with respect to every neuron of the hidden layer and the output layer:

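<math display="inline">\delta_j^h=\left(\displaystyle\sum_{k=1}^l\delta_k^yW_{jk}\right)\cdot h_j\cdot (1-h_j),\quad j=1,2,\cdots ,m.</math> (5)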
(6)


Step 5. Using the error signal to adjust the connection weights of each layer, where <math display="inline">W</math> holds the weights from the hidden layer to the output layer and <math display="inline">V</math> holds the weights from the input layer to the hidden layer:

(7)
(8)


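Step 6. Calculating the global error over all <math display="inline">P</math> training samples:

<math>E=(1/2)\displaystyle\sum_{p=1}^{P}\displaystyle\sum_{k=1}^l{\left(d_k-y_k\right)}^2</math> (9)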


Step 7. The global error is compared with the precision value. If the global error is less than the given precision value, or the number of training iterations exceeds the maximum, the algorithm ends; otherwise, learning continues.
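The seven steps above can be sketched in a few lines of code. The following is a minimal illustration in Python rather than the MATLAB implementation used in this paper. It assumes a single hidden layer, a sigmoid activation (consistent with the factor <math display="inline">h_j(1-h_j)</math> in Eq. (5)), per-sample weight updates, and a threshold stored as an extra weight. The names `train_bp`, `predict`, `lr`, `tol` and the OR-gate samples are hypothetical and chosen only for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(V, W, x):
    """Step 2: hidden outputs h_j and network outputs y_k for one sample."""
    xb = list(x) + [1.0]                      # trailing 1.0 feeds the threshold weight
    h = [sigmoid(sum(v * xi for v, xi in zip(Vj, xb))) for Vj in V]
    hb = h + [1.0]
    y = [sigmoid(sum(w * hi for w, hi in zip(Wk, hb))) for Wk in W]
    return xb, h, hb, y

def train_bp(samples, n_hidden=3, lr=0.5, max_epochs=2000, tol=1e-2, seed=0):
    """Train a one-hidden-layer BP network; returns (V, W, global_error, epochs)."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # Step 1: small random initial weights (assumed range [-1, 1])
    V = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    E, epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        E = 0.0
        for x, d in samples:
            xb, h, hb, y = forward(V, W, x)
            # Step 3: squared error of this sample, accumulated into the global error (Step 6)
            E += 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))
            # Step 4: error signals of the output layer and the hidden layer
            delta_y = [(dk - yk) * yk * (1 - yk) for dk, yk in zip(d, y)]
            delta_h = [sum(delta_y[k] * W[k][j] for k in range(n_out)) * h[j] * (1 - h[j])
                       for j in range(n_hidden)]
            # Step 5: adjust hidden-to-output weights W, then input-to-hidden weights V
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    W[k][j] += lr * delta_y[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    V[j][i] += lr * delta_h[j] * xb[i]
        # Step 7: stop once the global error is below the precision value
        if E < tol:
            break
    return V, W, E, epoch

def predict(V, W, x):
    return forward(V, W, x)[3]

# Hypothetical toy data: the logical OR of two inputs
or_samples = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
              ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
V, W, final_error, epochs_used = train_bp(or_samples)
```

Because the weights change after every sample, the accumulated error is only an approximation of the global error in Step 6; a batch variant would recompute it after each pass.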

2.3 Simulation experiments to predict stocks

Data is the foundation of data mining. Many financial websites have rich and reliable transaction data, such as Yahoo, Sina and Tencent. Yahoo has an interface with MATLAB, so we use MATLAB to obtain these transaction data from Yahoo. The important function “fetch” in MATLAB is used as follows:

Data = fetch(Connect, 'security', 'FromDate', 'ToDate')

Here, 'Connect' is the connection to the data source, such as Yahoo; 'security' specifies which stock to retrieve; 'FromDate' is the start of the requested date range; and 'ToDate' is its end. In this paper, we use this method to obtain data for stocks 1 to 1000 of the Shenzhen Stock Exchange and save them in Excel. After the data is standardized, training samples and prediction samples are obtained. We then use the neural network model described in Section 2.2 to train on the samples and make predictions.
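The standardization and sample split mentioned above might look as follows. This is a sketch in Python, not the paper's MATLAB code. It assumes min-max scaling to the interval [0, 1] and a chronological 80/20 split; the paper does not state which standardization or split ratio was used, and the function names and prices are hypothetical.

```python
def min_max_scale(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def split_samples(rows, train_ratio=0.8):
    """Keep the first part for training and the remainder for prediction."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

# Hypothetical closing prices of one stock
closing_prices = [10.0, 12.0, 11.0, 15.0, 14.0]
scaled = min_max_scale(closing_prices)
train_part, predict_part = split_samples(scaled)
```

Scaling each indicator separately keeps inputs with large raw magnitudes, such as trading volume, from dominating the weighted sums in the network.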

The model produces a ranked table of all stocks, as shown in Table 1. The ranking is based on the predicted value in the last column, which can be understood as the probability of future growth of the stock. In actual trading, we can therefore buy the top-ranked stocks and sell the bottom-ranked ones. This provides the basis for buying and selling in quantitative stock selection.

Table 1. Model prediction results (first 10 lines)
65 1 1 1 1 0.217464 0.689387 0.615622 0.933314 1.076462
802 0.649562 0.714952 0.590378 0.669138 0.533305 0.493489 0.119175 0.450005 0.995385
985 0.489474 0.388007 0.219643 0.032438 0.289402 0.922103 0.458649 0.370715 0.985637
582 0.350914 0.507703 0.590378 0.669138 0.58377 0.410922 0.118595 0.226798 0.940392
66 0.846695 0.593295 0.590378 0.669138 0.551252 0.670699 0.293865 0.605941 0.885136
751 1 1 0.87818 0.881371 0.332703 0.595813 0.626997 0.948292 0.88133
707 0 0.650724 0.302097 0.244671 0.699561 0.544556 0.236814 0.403214 0.830667
819 1 0.888569 0.87818 0.881371 0.613117 0.822664 0.666953 0.978776 0.826818
522 0.343439 0.942634 0.417467 0.456905 0.029334 0.000374 0.035146 0.607885 0.778539
521 0.710836 1 0.302097 0.244671 0.396943 0.315258 0.393372 0.913728 0.75364
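The selection rule can be illustrated with the values of Table 1. The sketch below is in Python rather than the paper's MATLAB. It assumes that the first column of each row is the stock number and that the last column is the predicted value used for ranking; the helper `top_k` is hypothetical.

```python
# Stock number -> predicted value (first and last columns of Table 1)
predicted = {
    65: 1.076462, 802: 0.995385, 985: 0.985637, 582: 0.940392, 66: 0.885136,
    751: 0.88133, 707: 0.830667, 819: 0.826818, 522: 0.778539, 521: 0.75364,
}

def top_k(scores, k):
    """Return the k stock numbers with the highest predicted value."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

selected = top_k(predicted, 8)
```

With these ten rows, the eight selected stocks are 65, 802, 985, 582, 66, 751, 707 and 819.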


In this experiment, we also use historical data to evaluate the model, and the verification method is full set verification. Figure 2 shows the accuracy and error rate of the model classification. The accuracy is significantly higher than the error rate. In finance, an accuracy of 72% is not easy to achieve. With such an accuracy, the probability of an overall profit becomes considerable once the number of transactions is large enough.
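The link between classification accuracy and profit can be made concrete with a small calculation. The Python sketch below is illustrative only: the up/down labels are hypothetical, and the expected profit assumes that every correct call earns a fixed gain and every wrong call costs a fixed loss, which the paper does not state.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the actual label."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

def expected_profit_per_trade(acc, gain, loss):
    """Average profit per trade when a correct call earns `gain` and a wrong call costs `loss`."""
    return acc * gain - (1.0 - acc) * loss

# Hypothetical labels: 1 = price rises, 0 = price falls
predicted_labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
actual_labels = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
acc = accuracy(predicted_labels, actual_labels)
error_rate = 1.0 - acc

# With the 72% accuracy reported above and equal unit gain and loss (an assumption)
edge = expected_profit_per_trade(0.72, gain=1.0, loss=1.0)
```

Under these assumptions the expected profit per trade is positive, so the average result over many trades approaches that positive value; if losses are much larger than gains, a 72% accuracy alone does not guarantee a profit.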

Figure 2. Evaluation results of the model

3. Portfolio model

In this section, we build a portfolio model to determine the best weight for each stock investment. Suppose we want to invest in 8 stocks; we simply select the top 8 from the stock ranking in Table 1 given in the previous section.

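Assume that the investor chooses <math display="inline">n</math> securities to invest in, and that the proportions of the securities in the total investment are <math display="inline">w_1,w_2,\cdots ,w_n</math>, represented by the vector <math display="inline">W=(w_1,w_2,\cdots ,w_n)^T</math>. The yields are <math display="inline">r_1,r_2,\cdots ,r_n</math>, likewise collected in a vector. The expected rates of return are <math display="inline">u_1,u_2,\cdots ,u_n</math>, represented by the vector <math display="inline">U=(u_1,u_2,\cdots ,u_n)^T</math>. Then the yield of the securities investment portfolio is the weighted average of the yields of the individual securities: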

(10)


Expected rate of yield is the weighted average of the expected rate of yield of various securities, namely:

<math display="inline">u_p=\displaystyle\sum_{i=1}^nu_iw_i=U^TW</math> (11)


We use the covariance <math display="inline">\sigma_{ij}</math> to indicate the degree of correlation between the investment yields of the i-th and the j-th security. Let <math display="inline">E</math> be the covariance matrix of the yields, that is

(12)


Then, the risk of the portfolio is

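<math display="inline">\sigma_p^2=D(r)=\displaystyle\sum_{i=1}^n\displaystyle\sum_{j=1}^n\left(w_iw_j\sigma_{ij}\right)=W^TEW</math> (13)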
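As a small numerical check of Eqs. (11) and (13), the following Python sketch evaluates the expected return <math display="inline">U^TW</math> and the risk <math display="inline">W^TEW</math> for two securities. The return vector, covariance matrix and weights are hypothetical values chosen for the example.

```python
def portfolio_return(U, W):
    """Expected portfolio return u_p = U^T W, Eq. (11)."""
    return sum(u * w for u, w in zip(U, W))

def portfolio_variance(E, W):
    """Portfolio risk sigma_p^2 = W^T E W, Eq. (13)."""
    n = len(W)
    return sum(W[i] * W[j] * E[i][j] for i in range(n) for j in range(n))

# Hypothetical inputs for two securities
U = [0.10, 0.20]                      # expected rates of return
E = [[0.04, 0.01],
     [0.01, 0.09]]                    # covariance matrix of the yields
W = [0.5, 0.5]                        # investment proportions, summing to 1

u_p = portfolio_return(U, W)
sigma_p2 = portfolio_variance(E, W)
```

Here the portfolio variance (about 0.0375) is lower than the variance of either security held alone, which is the diversification effect that the model in the next step exploits.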


In order to minimize the investment risk as much as possible, we establish the following model:

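<math>\left\{\begin{array}{l}
\min {\sigma }_p^2=W^TEW\\
u_p=U^TW\\
\displaystyle\sum_{i=1}^nw_i=1
\end{array}\right.</math> (14)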


Assuming the covariance matrix is a positive definite matrix, let

(15)


Then, the portfolio model can be transformed into

(16)


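Constructing the Lagrange multiplier function <math display="inline">L=W^TEW+\lambda^T (AW-B)</math>, where <math display="inline">\lambda =[\lambda_1,\lambda_2]^T</math>.

Let <math display="inline">\displaystyle\frac{\partial L}{\partial \lambda }=0, \displaystyle\frac{\partial L}{\partial W}=0,</math> that is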

(17)


Therefore, the resulting <math display="inline">W</math> is the optimal portfolio weight for a given expected rate of return. Under this weight, the risk of the portfolio is minimized, which is

(18)
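The Lagrange conditions can also be solved numerically. The Python sketch below is not the paper's MATLAB code. It assumes that <math display="inline">A</math> stacks the row <math display="inline">U^T</math> on a row of ones and that <math display="inline">B=(u_p,1)^T</math>, so that <math display="inline">AW=B</math> encodes the two constraints of model (14). Setting <math display="inline">\partial L/\partial W=2EW+A^T\lambda =0</math> together with <math display="inline">AW=B</math> then gives <math display="inline">W=E^{-1}A^T(AE^{-1}A^T)^{-1}B</math> and a minimum risk of <math display="inline">B^T(AE^{-1}A^T)^{-1}B</math>. The function names and the three-security data are hypothetical.

```python
def solve(M, b):
    """Solve the linear system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def min_variance_weights(E, U, target):
    """Weights minimizing W^T E W subject to U^T W = target and sum(W) = 1."""
    ones = [1.0] * len(U)
    x1 = solve(E, U)                  # E^{-1} U
    x2 = solve(E, ones)               # E^{-1} 1
    # 2x2 matrix A E^{-1} A^T; it is singular if all expected returns are equal
    M = [[dot(U, x1), dot(U, x2)],
         [dot(ones, x1), dot(ones, x2)]]
    g = solve(M, [target, 1.0])       # (A E^{-1} A^T)^{-1} B
    W = [g[0] * a + g[1] * b for a, b in zip(x1, x2)]
    risk = target * g[0] + g[1]       # B^T (A E^{-1} A^T)^{-1} B
    return W, risk

# Hypothetical data for three securities
E = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
U = [0.05, 0.10, 0.15]
W_opt, risk_opt = min_variance_weights(E, U, target=0.10)
```

Nothing in this formulation forbids negative weights, so short positions can appear for aggressive return targets.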

4. Simulation results and analysis

The proposed portfolio model is verified and simulated with MATLAB software. We invest in 8 stocks, selected as the top 8 from the stock ranking in Table 1 given in the previous section, and denote them <math>P_1 , P_2,\cdots, P_8</math>. The simulation results are shown in Figures 3 and 4.

Figure 3. Effective frontier curve
Figure 4. Distribution of investment weight


Here, we focus on Figure 3. The chart shows the curve relating risk and return, which provides a basis for deciding which portfolio to choose. Each point on the curve corresponds to one set of investment weights. An investor who seeks high returns and does not fear high risk can choose the portfolios at the top of the curve. Most investors, however, will choose a compromise: a portfolio with a relatively high return whose risk remains tolerable.
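A curve of this kind can be traced by sweeping the target return and recording the minimum risk for each value. The Python sketch below is a simplified illustration and not the simulation behind Figure 3: it assumes uncorrelated securities, so the covariance matrix is diagonal and easy to invert, and it uses hypothetical variances and expected returns. With <math display="inline">a=U^TE^{-1}U</math>, <math display="inline">b=U^TE^{-1}\mathbf{1}</math> and <math display="inline">c=\mathbf{1}^TE^{-1}\mathbf{1}</math>, the minimum variance for a target return <math display="inline">\mu</math> is <math display="inline">(c\mu ^2-2b\mu +a)/(ac-b^2)</math>.

```python
def frontier_coefficients(variances, U):
    """Scalars a, b, c of the frontier for a diagonal covariance matrix."""
    a = sum(u * u / v for u, v in zip(U, variances))
    b = sum(u / v for u, v in zip(U, variances))
    c = sum(1.0 / v for v in variances)
    return a, b, c

def frontier_variance(mu, a, b, c):
    """Minimum portfolio variance that achieves the expected return mu."""
    return (c * mu * mu - 2.0 * b * mu + a) / (a * c - b * b)

# Hypothetical variances and expected returns of three uncorrelated securities
variances = [0.04, 0.09, 0.16]
U = [0.05, 0.10, 0.15]
a, b, c = frontier_coefficients(variances, U)

# Sweep target returns from 5% to 15% and collect (risk, return) points
targets = [0.05 + 0.01 * i for i in range(11)]
curve = [(frontier_variance(mu, a, b, c) ** 0.5, mu) for mu in targets]

mu_min = b / c            # return of the global minimum-variance portfolio
var_min = 1.0 / c         # its variance
```

Points with a return above `mu_min` form the upper branch of the curve; moving up along it trades higher risk for higher expected return, which is the choice described above.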

Figure 4 is an investment weight allocation chart for different risk appetites. Each abscissa corresponds to one portfolio. The specific weight distribution can also be calculated directly from the model, but the graph makes the difference between portfolio schemes under different risk preferences easier to see: the investment ratio of each stock changes with the preference. Once a preference is chosen, the specific investment allocation plan can be read off directly.

5. Conclusion

In the field of quantitative investment, investors are paying increasing attention to quantitative stock selection strategies based on data mining technology. For investors, the key is to design good indicators and improve the accuracy of the model, thereby improving the profitability of the model and maximizing the potential of the data and the model. Based on the observation and analysis of the Beidou navigation sector, the stocks with the most investment value in the sector were selected. Selecting better stocks, using quantitative timing strategies to suppress risks, and then choosing a suitable investment portfolio make it possible to pursue the ideal goal of high returns and low risks in the stock market.

Acknowledgement

This work has been partially supported by the Key projects of natural science research of the higher education institutions of Anhui (grant no. KJ2016A530).

References

[1] Ouyang W., Szewczyk S.H. Stock price informativeness on the sensitivity of strategic M&A investment to Q. Review of Quantitative Finance & Accounting, 50(3):745-774, 2018.

[2] Chava S., Wang R., Zou H. Covenants, creditors’ simultaneous equity holdings, and firm investment policies. Journal of Financial and Quantitative Analysis, 54(2):481-512, 2019.

[3] Guo H., Zhang Y., Wu S., Shang L. Investment risk evaluation of existing building energy-saving renovation project for ESCO. Ecological Economy, 27(3):180-189, 2018.

[4] Gan H. Does CEO managerial ability matter? Evidence from corporate investment efficiency. Review of Quantitative Finance & Accounting, 52(4):1085-1118, 2019.

[5] Ferrando A., Preuss C. What finance for what investment? Survey-based evidence for European companies. Econ. Polit., 35:1015–1053, 2018.

[6] Serdar M.A., Serteser M., Ucal Y., et al. An assessment of HbA1c in diabetes mellitus and pre-diabetes diagnosis: a multi-centered data mining study. Applied Biochemistry and Biotechnology, 190(Suppl1):1-13, 2019.

[7] Sorensen E.H., Miller K.L., Ooi C.K. The decision tree approach to stock selection: An evolving tree model performs the best. Journal of Portfolio Management, 27(1):42-52, 2000.

[8] Piotroski J.D. Value investing: The use of historical financial statement information to separate winners from losers. Journal of Accounting Research, 38(2):43-51, 2001.

[9] Fama E.F., French K.R. A five-factor asset pricing model. Journal of Financial Economics, 116(1):1-22, 2015.

[10] Patel J., Shah S., Thakkar P., Kotecha K. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning technique. Expert Systems with Applications, 42(1):259-268, 2015.

[11] Svefors P., Sysoev O., Ekstrom E.C., et al. Relative importance of prenatal and postnatal determinants of stunting: data mining approaches to the MINIMat cohort, Bangladesh. BMJ Open, 9(8):e025154, 2019.

[12] Arabameri A., Pradhan B., Rezaei K. Spatial prediction of gully erosion using ALOS PALSAR data and ensemble bivariate and data mining models. Geosciences Journal, 23:669–686, 2019.

[13] Dong Y., Wang H. Robust output feedback stabilization for uncertain discrete-time stochastic neural networks with time-varying delay. Neural Processing Letters, 51:83–103, 2020.

[14] Li M.X., Yu S.Q., Zhang W. Segmentation of retinal fluid based on deep learning: application of three-dimensional fully convolutional neural networks in optical coherence tomography images. International Journal of Ophthalmology, 12(6):1012-1020, 2019.

[15] Segler M.H.S., Preuss M., Waller M.P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555(7698):604-610, 2018.


Document information

Published on 30/03/20
Accepted on 25/03/20
Submitted on 18/02/20

Volume 36, Issue 1, 2020
DOI: 10.23967/j.rimni.2020.03.006
Licence: CC BY-NC-SA license
