Figura 2. IBM daily closing prices and returns. Source: Economatica.
Step 1: Parameter values. The optimal values of the parameters depend on the application and are not easy to determine a priori. In this case, they are chosen in line with similar studies: the learning rate η = 0.3 and the momentum term 0.2 (Pérez-Rodríguez et al. 2005; Theofilatos et al. 2010).
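As an illustration only, the following minimal sketch shows how these values could be set in an off-the-shelf backpropagation implementation; scikit-learn's MLPRegressor, the 6-8-1 architecture of Table 2, and the simulated return series are assumptions of the example, not the software or data used by the author.

# Minimal sketch: a feedforward network trained by backpropagation with the
# hyperparameters of Step 1 (learning rate 0.3, momentum 0.2).
# scikit-learn's MLPRegressor is used only for illustration.
from sklearn.neural_network import MLPRegressor
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.01, 500)           # placeholder for the IBM return series

LAGS = 6                                     # 6 input nodes (see Table 2, 6-8-1)
X = np.column_stack([returns[i:len(returns) - LAGS + i] for i in range(LAGS)])
y = returns[LAGS:]                           # next-step return to be predicted

net = MLPRegressor(hidden_layer_sizes=(8,),  # 8 hidden nodes, single output
                   activation="logistic",
                   solver="sgd",
                   learning_rate_init=0.3,   # η = 0.3
                   momentum=0.2,             # momentum term = 0.2
                   max_iter=2000)
net.fit(X, y)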
Step 2: Size of the training set. According to the desired accuracy on the test set, it is suggested that

P ≥ (|w| / (1 − a)) · log(n / (1 − a)),

where P denotes the size of the training set, |w| denotes the number of weights to be trained, a stands for the expected accuracy on the test set (in our case, the value is given by 0.95), and n is the number of nodes (Mehrotra et al. 2000).
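For concreteness, a small sketch of this bound applied to the 6-8-1 network of Table 2; the natural logarithm and the inclusion of bias terms in |w| are assumptions of the sketch, since the paper does not spell out these details.

import math

def min_training_size(num_weights, num_nodes, accuracy=0.95):
    # Bound from Step 2: P >= |w|/(1 - a) * log(n/(1 - a)).
    # Natural log and bias terms counted in |w| are assumptions of this sketch.
    eps = 1.0 - accuracy
    return num_weights / eps * math.log(num_nodes / eps)

# Example: the 6-8-1 architecture (6 inputs, 8 hidden, 1 output).
w = 6 * 8 + 8 * 1 + 8 + 1      # 65 weights, counting biases (assumption)
n = 6 + 8 + 1                  # 15 nodes
print(round(min_training_size(w, n)))   # about 7,400 training samples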
Step 3: We now analyze the error across the different network architectures. The empirical evidence suggests that as the size of the network increases, the training performance improves; however, if the one-lag and multilag performances deteriorate, the network is oversized. The process may stop (e.g. once an acceptable error is reached) or continue until the "best" network is found. If the latter procedure is chosen, we can compare the training error with the forecast error (one-lag testing + multilag testing) and choose the network just before the forecast error starts to grow apart from the training error, as sketched below.
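A minimal sketch of this selection rule follows, assuming the results are available as (architecture, training error, forecast error) triples ordered by network size; the criterion used to decide that the gap "suddenly widens" (more than doubling) is an assumption of the sketch, not a rule stated by the author.

def select_architecture(results, factor=2.0):
    # results: list of (name, training_error, forecast_error) ordered by network size.
    # Keep enlarging the network while the forecast error tracks the training error,
    # and retain the last architecture before their gap suddenly widens.
    chosen = results[0]
    prev_gap = chosen[2] - chosen[1]
    for name, train_err, fcst_err in results[1:]:
        gap = fcst_err - train_err
        if gap > factor * prev_gap:   # forecast error pulls away from the training error
            break                     # keep the previously chosen (smaller) network
        chosen, prev_gap = (name, train_err, fcst_err), gap
    return chosen[0]

# Fed with the rows of Table 2, this rule returns "6-8-1", the architecture
# selected in the text.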
Table 2 presents the mean squared errors of the different network architectures. The first error column contains the training error, which decreases as the size of the network increases. The last column contains the forecast error, which is always larger than the training error; at some point, however, the forecast error begins to move further away from the training error, as shown in Figure 3. Beyond this point the training error keeps decreasing because the network is larger than required, so the network memorizes information and gradually loses the ability to respond to new information (forecast error). According to the results obtained, the "best" network architecture is 6-8-1 (the series is roughly explained by the first 6 lags, with 8 hidden nodes), with a training error of 0.000486 and a forecast error of 0.000507, while the remaining architectures are discarded (a lot of time and information would otherwise be wasted).
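As a quick check of the table (an observation from the reported figures, not a definition given by the author), the forecast error column appears to be the simple average of the one-lag and multilag testing errors:

# The "1+2" forecast error in Table 2 appears to be the mean of the one-lag (1)
# and multilag (2) testing errors; e.g. for the selected 6-8-1 network:
one_lag, multilag = 0.000692, 0.000322
forecast = (one_lag + multilag) / 2
print(round(forecast, 6))   # 0.000507, as reported in Table 2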
Tabla 2. Mean Squared Errors. Source: author elaboration.

Network architecture^a   Training   One-lag Testing (1)   Multilag Testing (2)   Forecast error (1+2)
3-2-1                    0.006402   0.007236              0.005764               0.006500
3-3-1                    0.004078   0.004721              0.003574               0.004148
3-4-1                    0.002949   0.003478              0.002522               0.003000
3-5-1                    0.002322   0.002794              0.001945               0.002370
3-6-1                    0.001907   0.002342              0.001567               0.001955
3-7-1                    0.001619   0.002010              0.001306               0.001658
4-6-1                    0.001457   0.001838              0.001161               0.001500
4-7-1                    0.001243   0.001591              0.000971               0.001281
5-6-1                    0.001182   0.001520              0.000917               0.001219
5-7-1                    0.000735   0.000996              0.000529               0.000763
6-8-1                    0.000486   0.000692              0.000322               0.000507
7-9-1                    0.000336   0.000505              0.000321               0.000413
7-10-1                   0.000268   0.000414              0.000319               0.000367
8-10-1                   0.000254   0.000397              0.000308               0.000353
10-9-1                   0.000250   0.000391              0.000276               0.000334
10-10-1                  0.000189   0.000307              0.000264               0.000286
10-11-1                  0.000165   0.000274              0.000235               0.000255

^a Input-hidden-output nodes