performances of ANNs in terms of their results in applications, taking the MLP as a benchmark. In Section 6, we present an application. Finally, in Section 7, we conclude.
2 The Multilayer Perceptron
The neuron (or node) is the basic unit of a neural network. In the case of the MLP, the network includes an input layer (which does no processing), one output layer, and at least one hidden layer. Each layer consists of a set of nodes; a hidden layer receives its inputs from the units in the previous layer and sends its outputs to the next layer. The input and output layers mark the flow of information during the training phase, in which the learning algorithm is applied. The MLP generally learns by means of a backpropagation algorithm, which is basically a gradient technique. Variants of the algorithm have also been implemented to deal with the problem of slow convergence (for example, the momentum term; see Haykin, 1994). Once the training process is carried out, the network weights are frozen and can be used to compute output values for new input samples. In what follows, we provide a brief explanation of the backpropagation algorithm.
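To make this architecture concrete, the following is a minimal sketch of a forward pass through a single-hidden-layer MLP. It is an illustration only: the sigmoid activation, the layer sizes, and the function names (sigmoid, mlp_forward) are our own assumptions, not a specification taken from the papers reviewed here.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, a common choice for MLP nodes
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_h, b_h, W_o, b_o):
    """Forward pass: input layer (no processing) -> hidden layer -> output layer."""
    hidden = sigmoid(W_h @ x + b_h)       # hidden-layer responses
    output = sigmoid(W_o @ hidden + b_o)  # network output
    return hidden, output

# Illustrative dimensions: 3 inputs, 4 hidden nodes, 1 output
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(4, 3)), np.zeros(4)
W_o, b_o = rng.normal(size=(1, 4)), np.zeros(1)
_, y = mlp_forward(rng.normal(size=3), W_h, b_h, W_o, b_o)
```

During training, it is precisely these weight matrices that the backpropagation algorithm described below adjusts; once training is finished they are frozen.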
Network learning is a process in which the weights, $w_{nj}(k)$, are adapted by a continuous interaction with the environment, in such a way that
$$ w_{nj}(k+1) = w_{nj}(k) + \Delta w_{nj}(k), $$
where $w(k)$ is the previous value of the weight vector and $w(k+1)$ is the updated value. The learning algorithm is a set of rules to solve the learning problem and determine the values $w_{nj}(k)$.
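For instance, with illustrative numbers, a weight $w_{nj}(k) = 0.30$ and a correction $\Delta w_{nj}(k) = -0.05$ give the updated weight $w_{nj}(k+1) = 0.30 - 0.05 = 0.25$.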
One of the most important algorithms is that of error correction. Consider the $n$-th neuron at iteration $k$. Let $y_n(k)$ be the response of this neuron, let $x(k)$ be the vector of environment stimuli, and let $\{x(k), d_n(k)\}$ be the training pair. Define the following error signal:
$$ e_n(k) = d_n(k) - y_n(k). $$
The objective is to minimize a cost function (criterion) that takes this error into account. After selecting the criterion, the problem of error-correction learning becomes one of optimization. Consider a function $\epsilon(w)$ that is continuously differentiable with respect to the weight vector; $\epsilon(w)$ maps the elements of $w$ to real numbers. We need to find an optimal solution $w^*$ that satisfies the condition
$$ \epsilon(w^*) \leq \epsilon(w). $$
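For concreteness, one standard criterion in error-correction learning (a common choice; the text does not commit to a particular one) is the instantaneous sum of squared errors,
$$ \epsilon(w) = \frac{1}{2} \sum_{n} e_n^2(k) = \frac{1}{2} \sum_{n} \bigl( d_n(k) - y_n(k) \bigr)^2, $$
whose minimization drives the network responses toward the desired responses.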
It is then necessary to solve an unconstrained optimization problem: minimize the cost function $\epsilon(w)$ with respect to the weight vector $w$. The necessary condition for optimality is given by
$$ \nabla \epsilon(w^*) = 0, $$
where $\nabla$ is the gradient operator. An important class of unconstrained optimization algorithms is based on the idea of iterative descent (the gradient descent method and Newton's method). Starting from an initial condition $w(0)$, such an algorithm generates a sequence $w(1), w(2), \ldots$ such that the cost function $\epsilon(w)$ decreases at every iteration. It is desirable that the algorithm eventually converge to the optimal solution in such a way that
$$ \epsilon(w(k+1)) < \epsilon(w(k)). $$
In the gradient descent method, the successive adjustments to the weight vector are made in the direction of steepest descent, that is, opposite to the gradient. For convenience, we use the notation
$$ g = \nabla \epsilon(w). $$
The gradient descent algorithm can then be written formally as
$$ w(k+1) = w(k) - \eta \, g(k), $$
where $\eta$ is a positive constant called the learning rate, and $g(k)$ is the gradient vector evaluated at $w(k)$. Therefore, the correction applied to the weight vector can be written as
$$ \Delta w(k) = w(k+1) - w(k) = -\eta \, g(k). $$
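As an illustration of this update rule, the sketch below applies $w(k+1) = w(k) - \eta g(k)$ to a simple quadratic cost of our own choosing (the paper does not specify a particular cost); the names gradient and gradient_descent are ours.

```python
import numpy as np

def gradient(w, A, b):
    # Gradient of the illustrative quadratic cost eps(w) = 0.5 * w'Aw - b'w
    return A @ w - b

def gradient_descent(w0, A, b, eta=0.1, n_iter=100):
    """Iterate w(k+1) = w(k) - eta * g(k), starting from w0."""
    w = w0.copy()
    for _ in range(n_iter):
        g = gradient(w, A, b)   # g(k), the gradient evaluated at w(k)
        w = w - eta * g         # correction: delta w(k) = -eta * g(k)
    return w

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
w_hat = gradient_descent(np.zeros(2), A, b)
# For this cost the optimum satisfies A w* = b, so w_hat approaches np.linalg.solve(A, b)
```

Raising eta in this sketch reproduces the behavior described next: small values give a smooth trajectory, large values an oscillatory one, and values above a critical threshold diverge.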
This method converges slowly to an optimal solution $w^*$, and the learning rate has a large impact on its convergence behavior: when $\eta$ is small, the trajectory of $w(k)$ over the weight space $W$ is smooth; when $\eta$ is large, the trajectory is oscillatory; and when $\eta$ exceeds a certain critical value, the trajectory becomes unstable. The backpropagation algorithm is a technique for implementing the gradient descent method in the weight space of a multilayer network. The basic idea is to efficiently compute the partial derivatives of an approximating function realized by the neural network with respect to all the elements of the adjustable parameter vector $w$, for a given value of the input vector $x$.
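The sketch below illustrates this idea for the single-hidden-layer network used earlier: one backpropagation step computes the partial derivatives of an (assumed) instantaneous squared-error cost via the chain rule and then applies the gradient-descent correction. The cost, the sigmoid activations, the XOR-like data, and the name backprop_step are illustrative assumptions, not the application presented later in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, d, W_h, W_o, eta=0.5):
    """One weight update via backpropagation for a single-hidden-layer MLP.

    x: input vector, d: desired response, eta: learning rate.
    Assumed cost: 0.5 * ||d - y||^2 (instantaneous squared error).
    """
    # Forward pass
    h = sigmoid(W_h @ x)              # hidden responses
    y = sigmoid(W_o @ h)              # network output
    e = d - y                         # error signal e(k) = d(k) - y(k)

    # Backward pass: partial derivatives of the cost w.r.t. each weight matrix
    delta_o = -e * y * (1.0 - y)                  # output-layer local gradient
    grad_W_o = np.outer(delta_o, h)
    delta_h = (W_o.T @ delta_o) * h * (1.0 - h)   # propagated back to the hidden layer
    grad_W_h = np.outer(delta_h, x)

    # Gradient-descent corrections: delta w = -eta * gradient
    return W_h - eta * grad_W_h, W_o - eta * grad_W_o

# Tiny illustrative training loop
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
D = np.array([[0.], [1.], [1.], [0.]])
rng = np.random.default_rng(1)
W_h, W_o = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
for _ in range(5000):
    for x, d in zip(X, D):
        W_h, W_o = backprop_step(x, d, W_h, W_o)
```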
3 Types of ANNs
The specialized literature identifies several groups of networks used as approximators and/or classifiers. This section provides a classification in terms of the general characteristics of ANNs.
1. In the first group, we find the Feedforward Networks (FFNs), such as the MLP. Their main feature is that the connections run forward only, so there are no connections between nodes in the same layer or back to previous nodes. The networks that share this feature are: the Radial Basis Function network (RBF) (Bildirici et al., 2010; Dhamija and Bhalla, 2011; Cheng, 1996); the Generalized Regression Neural Network (GRNN) (Enke and Thawornwong, 2005; Mostafa, 2010); the Group Method of Data Handling Network (GMDHN) (Pham and Lui, 1995); the Probabilistic Neural Network (PNN) (Enke and Thawornwong, 2005; Thawornwong and Enke, 2004); the Dynamic Neural Network (DNN) (Guresen, Kayakutlu and Daim, 2010); and the Cerebellar Model Articulation Controller (CMAC) (Chen, 1996).