Figura 2. IBM daily closing prices and returns. Source: Economatica.
Step 1: Parameter values. The optimal values of the parameters depend on the application and are not easy to determine a priori. In this case, they are chosen in line with similar studies: the learning rate η = 0.3 and the momentum term 0.2 (Pérez-Rodríguez et al. 2005; Theofilatos et al. 2010).
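As an illustration only, the following minimal sketch shows how these values could be set in an off-the-shelf backpropagation implementation; scikit-learn's MLPRegressor, the 6-8-1 architecture of Table 2, and the simulated return series are assumptions of the example, not the software or data used by the author.

# Minimal sketch: a feedforward network trained by backpropagation with the
# hyperparameters of Step 1 (learning rate 0.3, momentum 0.2).
# scikit-learn's MLPRegressor is used only for illustration.
from sklearn.neural_network import MLPRegressor
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.01, 500)           # placeholder for the IBM return series

LAGS = 6                                     # 6 input nodes (see Table 2, 6-8-1)
X = np.column_stack([returns[i:len(returns) - LAGS + i] for i in range(LAGS)])
y = returns[LAGS:]                           # next-step return to be predicted

net = MLPRegressor(hidden_layer_sizes=(8,),  # 8 hidden nodes, single output
                   activation="logistic",
                   solver="sgd",
                   learning_rate_init=0.3,   # η = 0.3
                   momentum=0.2,             # momentum term = 0.2
                   max_iter=2000)
net.fit(X, y)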
Step 2: Size of the training set. According to the desired accuracy on the test set, it is suggested that

P ≥ (|w| / (1 − a)) · log(n / (1 − a)),

where P denotes the size of the training set, |w| denotes the number of weights to be trained, a stands for the expected accuracy on the test set (in our case, the value is given by 0.95), and n is the number of nodes (Mehrotra et al. 2000).
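For concreteness, a small sketch of this bound applied to the 6-8-1 network of Table 2; the natural logarithm and the inclusion of bias terms in |w| are assumptions of the sketch, since the paper does not spell out these details.

import math

def min_training_size(num_weights, num_nodes, accuracy=0.95):
    # Bound from Step 2: P >= |w|/(1 - a) * log(n/(1 - a)).
    # Natural log and bias terms counted in |w| are assumptions of this sketch.
    eps = 1.0 - accuracy
    return num_weights / eps * math.log(num_nodes / eps)

# Example: the 6-8-1 architecture (6 inputs, 8 hidden, 1 output).
w = 6 * 8 + 8 * 1 + 8 + 1      # 65 weights, counting biases (assumption)
n = 6 + 8 + 1                  # 15 nodes
print(round(min_training_size(w, n)))   # about 7,400 training samples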
Step 3: We now analyze the error across the different network architectures. The empirical evidence suggests that as the size of the network increases, the training performance improves; however, if the one-lag and multilag performances deteriorate, the network is oversized. The process may stop (e.g. once an acceptable error is reached) or continue until the "best" network is found. If the latter procedure is chosen, we can compare the training error with the forecast error (one-lag testing + multilag testing) and choose the network just before the forecast error starts to grow apart from the training error, as sketched below.
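A minimal sketch of this selection rule follows, assuming the results are available as (architecture, training error, forecast error) triples ordered by network size; the criterion used to decide that the gap "suddenly widens" (more than doubling) is an assumption of the sketch, not a rule stated by the author.

def select_architecture(results, factor=2.0):
    # results: list of (name, training_error, forecast_error) ordered by network size.
    # Keep enlarging the network while the forecast error tracks the training error,
    # and retain the last architecture before their gap suddenly widens.
    chosen = results[0]
    prev_gap = chosen[2] - chosen[1]
    for name, train_err, fcst_err in results[1:]:
        gap = fcst_err - train_err
        if gap > factor * prev_gap:   # forecast error pulls away from the training error
            break                     # keep the previously chosen (smaller) network
        chosen, prev_gap = (name, train_err, fcst_err), gap
    return chosen[0]

# Fed with the rows of Table 2, this rule returns "6-8-1", the architecture
# selected in the text.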
Table 2 presents the mean squared errors of the different network architectures. The first error column contains the training error, which decreases as the size of the network increases. The last column contains the forecast error, which is always larger than the training error; at some point, however, the forecast error begins to move further away from the training error, as shown in Figure 3. Beyond this point the training error keeps decreasing because the network is larger than required, so the network memorizes information and gradually loses the ability to respond to new information (forecast error). According to the results obtained, the "best" network architecture is 6-8-1 (the series is roughly explained by the first 6 lags, with 8 hidden nodes), with a training error of 0.000486 and a forecast error of 0.000507, while the remaining architectures are discarded (a lot of time and information would otherwise be wasted).
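As a quick check of the table (an observation from the reported figures, not a definition given by the author), the forecast error column appears to be the simple average of the one-lag and multilag testing errors:

# The "1+2" forecast error in Table 2 appears to be the mean of the one-lag (1)
# and multilag (2) testing errors; e.g. for the selected 6-8-1 network:
one_lag, multilag = 0.000692, 0.000322
forecast = (one_lag + multilag) / 2
print(round(forecast, 6))   # 0.000507, as reported in Table 2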
Tabla 2. Mean Squared Errors. Source: author elaboration.

Network architecture^a   Training   One-lag Testing (1)   Multilag Testing (2)   Forecast error (1+2)
3-2-1                    0.006402   0.007236              0.005764               0.006500
3-3-1                    0.004078   0.004721              0.003574               0.004148
3-4-1                    0.002949   0.003478              0.002522               0.003000
3-5-1                    0.002322   0.002794              0.001945               0.002370
3-6-1                    0.001907   0.002342              0.001567               0.001955
3-7-1                    0.001619   0.002010              0.001306               0.001658
4-6-1                    0.001457   0.001838              0.001161               0.001500
4-7-1                    0.001243   0.001591              0.000971               0.001281
5-6-1                    0.001182   0.001520              0.000917               0.001219
5-7-1                    0.000735   0.000996              0.000529               0.000763
6-8-1                    0.000486   0.000692              0.000322               0.000507
7-9-1                    0.000336   0.000505              0.000321               0.000413
7-10-1                   0.000268   0.000414              0.000319               0.000367
8-10-1                   0.000254   0.000397              0.000308               0.000353
10-9-1                   0.000250   0.000391              0.000276               0.000334
10-10-1                  0.000189   0.000307              0.000264               0.000286
10-11-1                  0.000165   0.000274              0.000235               0.000255

^a Input-hidden-output nodes