A Review of Artificial Neural Networks: How Well Do They Perform in Forecasting Time Series?
              
            
            
            
            
formances of ANN in terms of their results in applications, taking the MLP as a benchmark. In section 6, we present an application. Finally, in section 7, we conclude.
            
            
              
                2 The Multilayer Perceptron
              
            
            
The neuron (or node) is the basic unit of a neural network. In the case of the MLP, the network includes an input layer (which does no processing), one output layer and at least one hidden layer. Each layer consists of a set of nodes; a hidden layer receives its inputs from the units in the previous layer and sends its outputs to the next layer. The input and output layers thus define the flow of information during the training phase, in which the learning algorithm is applied. The MLP generally learns by means of a backpropagation algorithm, which is basically a gradient technique. Variants of the algorithm have also been implemented to address the problem of slow convergence (for example, the addition of a momentum term; see Haykin, 1994). Once the training process has been carried out, the network weights are frozen and can be used to compute output values for new input samples. In what follows, we provide a brief explanation of the backpropagation algorithm.
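To make this architecture concrete, the sketch below is a minimal illustration, not taken from the paper: the layer sizes, the random weights and the logistic activation are all assumptions. It shows a one-hidden-layer MLP whose frozen weights are reused to compute the output for a new input sample.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation used in the hidden and output layers."""
    return 1.0 / (1.0 + np.exp(-z))

# Assumed layer sizes: 3 inputs, 5 hidden nodes, 1 output.
# In practice these weights would come from training; random values
# stand in for the "frozen" weights here.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)   # input -> hidden
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)   # hidden -> output

def forward(x):
    """Forward pass: the input layer does no processing; each later
    layer applies a weighted sum followed by the activation."""
    h = sigmoid(W1 @ x + b1)     # hidden layer
    return sigmoid(W2 @ h + b2)  # output layer

# With the weights frozen, the network maps any new input sample to an output.
y_new = forward(np.array([0.2, -1.0, 0.5]))
```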
            
            
Network learning is a process in which the weights, $w$, are adapted through continuous interaction with the environment, in such a way that at iteration $k$
$$w_{nj}(k+1) = w_{nj}(k) + \Delta w_{nj}(k),$$
where $w(k)$ is the previous value of the weight vector and $w(k+1)$ is the updated value. The learning algorithm is a set of rules that solves the learning problem by determining the adjustments $\Delta w_{nj}(k)$.
            
            
One of the most important algorithms is that of error correction. Consider the $n$-th neuron at iteration $k$. Let $y_n$ be the response of this neuron, let $x(k)$ be the vector of environment stimuli, and let $\{x(k), d_n(k)\}$ be the training pair. Define the following error signal:
$$e_n(k) = d_n(k) - y_n(k).$$
            
            
The objective is to minimize a cost function (criterion) that takes this error into account. Once the criterion has been selected, the problem of error-correction learning becomes one of optimization. Consider a function $\epsilon(w)$ that is continuously differentiable in the weight vector; the function $\epsilon(w)$ maps the elements of $w$ to real numbers. We need to find an optimal solution $w^{*}$ that satisfies the condition
$$\epsilon(w^{*}) \leq \epsilon(w).$$
            
            
It is therefore necessary to solve an unconstrained optimization problem: minimize the cost function $\epsilon(w)$ with respect to the weight vector $w$. The necessary condition for optimality is given by
$$\nabla \epsilon(w^{*}) = 0,$$
            
            
where $\nabla$ is the gradient operator. An important class of unconstrained optimization algorithms is based on the idea of iterative descent (the gradient descent method and Newton's method). Starting from an initial condition $w(0)$, such an algorithm generates a sequence $w(1), w(2), \ldots$ such that the cost function $\epsilon(w)$ decreases at every iteration. It is desirable that the algorithm eventually converge to the optimal solution, in such a way that
$$\epsilon(w(k+1)) < \epsilon(w(k)).$$
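This descent condition follows from a first-order Taylor expansion, a standard argument worth making explicit here: for a small adjustment $\Delta w(k)$,
$$\epsilon(w(k+1)) \approx \epsilon(w(k)) + \nabla\epsilon(w(k))^{T}\, \Delta w(k),$$
so choosing $\Delta w(k)$ proportional to $-\nabla\epsilon(w(k))$, with a sufficiently small step, guarantees a decrease whenever the gradient is nonzero.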
            
            
In the gradient descent method, the successive adjustments are applied to the weight vector in the direction of steepest descent. For convenience, we will use the following notation: $g = \nabla \epsilon(w)$.
            
            
The gradient descent algorithm can be written formally as
$$w(k+1) = w(k) - \eta\, g(k),$$
            
            
where $\eta$ is a positive constant called the learning rate, and $g(k)$ is the gradient vector evaluated at $w(k)$. Therefore, the correction applied to the weight vector can be written as
$$\Delta w(k) = w(k+1) - w(k) = -\eta\, g(k).$$
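As a minimal sketch of this update rule, and not an implementation from the paper, the loop below applies $w(k+1) = w(k) - \eta\, g(k)$ to an illustrative least-squares cost; the data, the learning rate and the number of iterations are arbitrary assumptions.

```python
import numpy as np

# Illustrative cost eps(w) = 0.5 * ||A w - b||^2 built from made-up data.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

def gradient(w):
    """g = grad eps(w) = A^T (A w - b)."""
    return A.T @ (A @ w - b)

eta = 0.05            # learning rate (assumed value)
w = np.zeros(3)       # initial condition w(0)
for k in range(200):
    g = gradient(w)   # gradient evaluated at w(k)
    w = w - eta * g   # w(k+1) = w(k) - eta * g(k)
```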
            
            
This method converges slowly to an optimal solution $w^{*}$, and the learning rate has a strong influence on this convergence behavior. When $\eta$ is small, the trajectory of $w(k)$ over the weight space $W$ is smooth. When $\eta$ is large, the trajectory of $w(k)$ over $W$ is oscillatory, and when $\eta$ exceeds a certain critical value, the trajectory of $w(k)$ over $W$ becomes unstable.
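A simple illustration, under the assumption of a one-dimensional quadratic cost not discussed in the paper, makes these three regimes explicit: for $\epsilon(w) = \frac{1}{2}\lambda w^{2}$ with $\lambda > 0$, the gradient is $g(k) = \lambda w(k)$ and the update becomes $w(k+1) = (1 - \eta\lambda)\, w(k)$. The iterates decay smoothly when $0 < \eta\lambda < 1$, oscillate in sign while still converging when $1 < \eta\lambda < 2$, and diverge when $\eta\lambda > 2$; the critical value of the learning rate is therefore $\eta = 2/\lambda$.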
            
            
Thus, the backpropagation algorithm is a technique for implementing the gradient descent method in the weight space of a multilayer network. The basic idea is to efficiently compute the partial derivatives of the approximating function realized by the neural network with respect to all the elements of the adjustable parameter vector $w$, for a given value of the input vector $x$.
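A minimal sketch of such a gradient computation is shown below; it is not the paper's implementation, and the one-hidden-layer architecture, the logistic activations and the squared-error cost $\epsilon = \frac{1}{2}\|d - y\|^{2}$ are all assumptions. It returns the partial derivatives of the cost with respect to every adjustable parameter for a single training pair, ready to be plugged into the update $w(k+1) = w(k) - \eta\, g(k)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(x, d, W1, b1, W2, b2):
    """Partial derivatives of eps = 0.5 * ||d - y||^2 with respect to
    all weights and biases, for one training pair {x, d}."""
    # Forward pass through the hidden and output layers.
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    e = d - y                                   # error signal e = d - y
    # Backward pass: propagate local sensitivities layer by layer.
    delta2 = -e * y * (1.0 - y)                 # d eps / d(output pre-activation)
    dW2, db2 = np.outer(delta2, h), delta2
    delta1 = (W2.T @ delta2) * h * (1.0 - h)    # d eps / d(hidden pre-activation)
    dW1, db1 = np.outer(delta1, x), delta1
    return dW1, db1, dW2, db2
```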
            
            
              
                3 Types of ANNs
              
            
            
The specialized literature identifies several groups of networks used as approximators and/or classifiers. This section provides a classification in terms of the general characteristics of the ANNs.

1. In the first group we find the Feedforward Networks (FFNs), such as the MLP. Their main feature is that their connections run only forward, so no connections are established between nodes in the same layer or with nodes in previous layers. The networks that share this feature are: the Radial Basis Function network (RBF) (Bildirici et al. 2010; Dhamija and Bhalla, 2011; Cheng, 1996); the Generalized Regression Neural Network (GRNN) (Enke and Thawornwong, 2005; Mostafa, 2010); the Group Method of Data Handling Network (GMDHN) (Pham and Lui, 1995); the Probabilistic Neural Network (PNN) (Enke and Thawornwong, 2005; Thawornwong and Enke, 2004); the Dynamic Neural Network (DNN) (Guresen, Kayakutlu and Daim, 2010); and the Cerebellar Model Articulation Controller (CMAC) (Chen, 1996).
            
            