c) The sample is typically divided into two stages (training and testing). Furthermore, it is necessary to establish the number of nodes in the hidden layer before starting the training stage. An alternative that overcomes these problems is the GRNN, because it does not require estimating the number of nodes in the hidden layer, and all the available information can be used for network training, so no early-stopping technique is required (Enke and Thawornwong, 2005; Leung et al., 2000; Mostafa, 2010); see the first note after this list. Likewise, neither the DNN nor the PRNN needs iterative training or early-stopping techniques (Enke and Thawornwong, 2005; Thawornwong and Enke, 2004).
d) In the MLP, the processing nodes are located in the hidden and output layers and share the same type of processor (used as a classifier, the processing nodes are non-linear, whereas as an approximator the output node is linear). In the RBF, by contrast, the nodes in the hidden layer have properties that serve different learning purposes, which can provide a more accurate forecast (Hutchinson, Lo and Poggio, 1994); in the SVM, the choice of kernel function is the critical decision for prediction efficiency (Kara et al., 2011). Both points are illustrated after this list.
e) During training, all the weights are modified, and learning is therefore slow. In contrast, the RPNN has only a single set of trainable weights, which facilitates the learning process (Ghazali et al., 2007, 2009, 2011); its building block is sketched after this list. Other possibilities are the FLN (Hussain et al., 2008) or the Partially Connected Network (PCN), which selects the connections between nodes randomly (Chang et al., 2012).
f) In the MLP, it is necessary to define the range of the initialization weights (usually very small); however, there is no consensus for specific applications, so the range is usually chosen at the designer's discretion or by reference to similar applications. An alternative to this limitation is to apply MDNs that share the same network architecture but use different ranges for the initialization weights (Adeodato et al., 2011; Zhang and Berardi, 2001); a code sketch follows this list.
g) The MLP employs an algorithm that is basically a gradient technique. This implies that the problem is non-convex and its solution is a local minimum. The SVM uses structural risk minimization theory, so the problem is a convex optimization problem, which means that the optimal solution is global (Ince and Trafalis, 2006); the standard formulation is reproduced after this list.
h) The MLP is characterized by learning interference: adjusting the weights for one training vector also changes the response at inputs distant from any training vector. A solution to this problem is to use the CMACN, which can achieve one-step learning where the MLP cannot (Lu and Wu, 2011); see the last note after this list.
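Note on item c). The GRNN needs no architecture search because it is, in essence, a kernel regression; in Specht's standard formulation its forecast is

$$\hat{y}(\mathbf{x}) \;=\; \frac{\sum_{i=1}^{n} y_i \,\exp\!\left(-\|\mathbf{x}-\mathbf{x}_i\|^{2}/2\sigma^{2}\right)}{\sum_{i=1}^{n} \exp\!\left(-\|\mathbf{x}-\mathbf{x}_i\|^{2}/2\sigma^{2}\right)},$$

where every training pair $(\mathbf{x}_i, y_i)$ becomes a pattern-layer node and the smoothing parameter $\sigma$ is the only quantity to tune; since nothing is fitted iteratively, no early stopping can arise.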
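Note on item d). The difference in node properties can be seen in the generic RBF output

$$\hat{y}(\mathbf{x}) \;=\; \sum_{j=1}^{m} w_j\,\phi\!\left(\|\mathbf{x}-\mathbf{c}_j\|\right), \qquad \phi(r)=\exp\!\left(-r^{2}/2\sigma_j^{2}\right),$$

where each hidden node responds only near its own center $\mathbf{c}_j$, while the output weights $w_j$ enter linearly and can be fitted by least squares. In the SVM, the analogous design decision is the kernel $K(\mathbf{x}_i,\mathbf{x}_j)$ (e.g. Gaussian or polynomial), which fixes the feature space in which the prediction is made.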
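Note on item e). As a sketch (our notation, not that of Ghazali et al.), the building block of the RPNN is the Pi-Sigma unit

$$y \;=\; \sigma\!\left(\prod_{j=1}^{k}\left(\mathbf{w}_j^{\top}\mathbf{x}+b_j\right)\right),$$

in which only the summing-layer weights $\mathbf{w}_j$ are adjustable while the product node carries fixed unit weights, which is why a single set of weights has to be trained.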
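Note on item f). The following is a minimal numpy sketch of the ensemble idea behind the MDNs of Zhang and Berardi (2001): identical MLPs that differ only in the range of their uniform weight initialization, whose forecasts are then averaged. The architecture, learning rate, candidate ranges and all function names are illustrative assumptions, not the implementation of the cited authors.

import numpy as np

def train_mlp(X, y, hidden=5, init_range=0.5, epochs=2000, lr=0.01, seed=0):
    # One-hidden-layer MLP fitted by plain gradient descent;
    # init_range is the half-width of the uniform initializer,
    # i.e., the design choice that the ensemble varies.
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-init_range, init_range, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.uniform(-init_range, init_range, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # non-linear hidden layer
        out = H @ W2 + b2                 # linear output node (approximator)
        err = out - y[:, None]            # gradient of 0.5 * squared error
        dH = (err @ W2.T) * (1.0 - H**2)  # error backpropagated through tanh
        W2 -= lr * H.T @ err / len(X)
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * X.T @ dH / len(X)
        b1 -= lr * dH.mean(axis=0)
    return lambda X_new: (np.tanh(X_new @ W1 + b1) @ W2 + b2).ravel()

def mdn_forecast(X, y, X_new, ranges=(0.1, 0.5, 1.0)):
    # Same architecture, different initialization ranges;
    # the combined forecast is the average of the members.
    members = [train_mlp(X, y, init_range=r, seed=i)
               for i, r in enumerate(ranges)]
    return np.mean([m(X_new) for m in members], axis=0)

For a one-step-ahead exercise, X would be built from lagged values of the series and y from the next observation; averaging members trained from different starting ranges damps the sensitivity of the forecast to any single initialization.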
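Note on item g). The ε-insensitive support vector regression solves

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*}}\ \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\left(\xi_i+\xi_i^{*}\right)$$

subject to $y_i-\mathbf{w}^{\top}\phi(\mathbf{x}_i)-b \le \varepsilon+\xi_i$, $\mathbf{w}^{\top}\phi(\mathbf{x}_i)+b-y_i \le \varepsilon+\xi_i^{*}$ and $\xi_i,\xi_i^{*}\ge 0$: a quadratic objective under linear constraints, hence a convex problem with a global optimum, in contrast to the non-convex error surface minimized by backpropagation.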
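Note on item h). In the standard Albus formulation (our summary), the CMAC response is a sum over only the cells activated by the input, $y(\mathbf{x})=\sum_{j\in A(\mathbf{x})} w_j$, and the LMS update modifies only those active weights; training vectors whose activation sets do not overlap therefore cannot interfere with one another, which is what permits one-step learning.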
5 Applications and Performance of ANN
In what follows, Table 1 summarizes the information according to the previous classification. On the basis of the results obtained in the thirty reviewed papers, we observe that more than 40% of the analyzed studies support the idea that the MLP is the best network, or at least that it performs as well as the proposed networks. Investigations that favored other models (e.g. econometric models) (Kodogiannis and Lolis, 2002; Yumlu and Gurense, 2005) were excluded; among them we analyzed only the performance of the different proposed networks, and in none of those cases does the MLP stand out.
The main idea of this review is to point out the advantages and limitations of the MLP with respect to other available networks by comparing not only the learning process but also the architecture design. The issue of the type of connections between the nodes (as RCNs suggest) may not be so decisive in several applications. The main drawback associated with RCNs is that they need more time to learn than standard networks, because their outputs pass through the network more than once (depending on the type of RCN) before the final output is produced. Another issue is whether or not to apply an optimization technique in a network, as in the SVM, instead of a gradient technique.
The types of networks that have shown superiority over the MLP are the RBF, GRNN, MDNs and DRPNN. Both the RBF and the GRNN are FFNs. In these types of networks, training may rely on global or local basis functions. The MLP applies a global basis function (usually sigmoidal), which has non-negligible values throughout the whole measurement space, so many iterations are required to find a combination with an acceptable error in all parts of the measurement space for which training data are available. On the other hand, the GRNN and the RBF are based on localized basis functions, which provide the important advantage of near-instant learning. The GRNN is based on the estimation of probability density functions, and the RBF is based on iterative function approximation. Although both the PRNN and the GRNN are based on the estimation of probability density functions, the reason that the GRNN and the RBF outperform the PRNN could be their use of the regression method (Chen, 1996).
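To make the contrast concrete: a sigmoidal unit such as $\sigma(z)=1/(1+e^{-z})$ is non-negligible over an unbounded region of the input space, so every weight update has global side effects, whereas a Gaussian basis function $\phi(\mathbf{x})=\exp\!\left(-\|\mathbf{x}-\mathbf{c}\|^{2}/2\sigma^{2}\right)$ vanishes away from its center $\mathbf{c}$, so each training point mainly adjusts the few units centered near it.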
The fact that the MDNs perform better than the MLP is because they treat the initialization weights more carefully, but when MDNs mix different architectures sharing the same weight range, the result is poor. In the case of the DRPNN, compared with the MLP, it has only a single layer of learnable weights, which reduces the network complexity. PLNs are therefore appropriate when the number of inputs to the model and the training set become extremely large, since the training procedure of ordinary networks such as the MLP then becomes very slow. The fact that some dynamic versions succeed (although they imply more connections) is because their architecture is very simple. Another (isolated) case is the CMAC, which performs better than the MLP or the RBF, since the MLP cannot elude the problem of slow learning.