c) The sample is typically divided into two stages (training and testing). Furthermore, it is necessary to establish the number of nodes in the hidden layer before starting the training stage. An alternative that overcomes these problems is the GRNN, because it does not require estimating the number of nodes in the hidden layer, and all the available information can be used for network training, so no early-stopping technique is required (Enke and Thawornwong, 2005; Leung et al., 2000; Mostafa, 2010); see the first note after this list. Likewise, neither the DNN nor the PRNN needs iterative training or early-stopping techniques (Enke and Thawornwong, 2005; Thawornwong and Enke, 2004).
d) In the MLP, the processing nodes are located in the hidden and output layers and share the same type of processor (used as a classifier, the processing nodes are non-linear, whereas as an approximator the output node is linear). In the RBF, by contrast, the nodes in the hidden layer have properties that serve different learning purposes, which can provide a more accurate forecast (Hutchinson, Lo and Poggio, 1994); in the SVM, the choice of kernel function is the critical decision for prediction efficiency (Kara et al., 2011). Both points are illustrated after this list.
e) During training, all the weights are modified, and learning is therefore slow. In contrast, the RPNN has only a single set of trainable weights, which facilitates the learning process (Ghazali et al., 2007, 2009, 2011); its building block is sketched after this list. Other possibilities are the FLN (Hussain et al., 2008) or the Partially Connected Network (PCN), which selects the connections between nodes randomly (Chang et al., 2012).
f) In the MLP, it is necessary to define the range of the initialization weights (usually very small); however, there is no consensus for specific applications, so the range is usually chosen at the designer's discretion or by reference to similar applications. An alternative to this limitation is to apply MDNs that share the same network architecture but use different ranges for the initialization weights (Adeodato et al., 2011; Zhang and Berardi, 2001); a code sketch follows this list.
g) The MLP employs an algorithm that is basically a gradient technique. This implies that the problem is non-convex and its solution is a local minimum. The SVM uses structural risk minimization theory, so the problem is a convex optimization problem, which means that the optimal solution is global (Ince and Trafalis, 2006); the standard formulation is reproduced after this list.
h) The MLP is characterized by learning interference: adjusting the weights for one training vector also changes the response at inputs distant from any training vector. A solution to this problem is to use the CMACN, which can achieve one-step learning where the MLP cannot (Lu and Wu, 2011); see the last note after this list.
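Note on item c). The GRNN needs no architecture search because it is, in essence, a kernel regression; in Specht's standard formulation its forecast is

$$\hat{y}(\mathbf{x}) \;=\; \frac{\sum_{i=1}^{n} y_i \,\exp\!\left(-\|\mathbf{x}-\mathbf{x}_i\|^{2}/2\sigma^{2}\right)}{\sum_{i=1}^{n} \exp\!\left(-\|\mathbf{x}-\mathbf{x}_i\|^{2}/2\sigma^{2}\right)},$$

where every training pair $(\mathbf{x}_i, y_i)$ becomes a pattern-layer node and the smoothing parameter $\sigma$ is the only quantity to tune; since nothing is fitted iteratively, no early stopping can arise.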
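Note on item d). The difference in node properties can be seen in the generic RBF output

$$\hat{y}(\mathbf{x}) \;=\; \sum_{j=1}^{m} w_j\,\phi\!\left(\|\mathbf{x}-\mathbf{c}_j\|\right), \qquad \phi(r)=\exp\!\left(-r^{2}/2\sigma_j^{2}\right),$$

where each hidden node responds only near its own center $\mathbf{c}_j$, while the output weights $w_j$ enter linearly and can be fitted by least squares. In the SVM, the analogous design decision is the kernel $K(\mathbf{x}_i,\mathbf{x}_j)$ (e.g. Gaussian or polynomial), which fixes the feature space in which the prediction is made.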
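Note on item e). As a sketch (our notation, not that of Ghazali et al.), the building block of the RPNN is the Pi-Sigma unit

$$y \;=\; \sigma\!\left(\prod_{j=1}^{k}\left(\mathbf{w}_j^{\top}\mathbf{x}+b_j\right)\right),$$

in which only the summing-layer weights $\mathbf{w}_j$ are adjustable while the product node carries fixed unit weights, which is why a single set of weights has to be trained.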
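Note on item f). The following is a minimal numpy sketch of the ensemble idea behind the MDNs of Zhang and Berardi (2001): identical MLPs that differ only in the range of their uniform weight initialization, whose forecasts are then averaged. The architecture, learning rate, candidate ranges and all function names are illustrative assumptions, not the implementation of the cited authors.

import numpy as np

def train_mlp(X, y, hidden=5, init_range=0.5, epochs=2000, lr=0.01, seed=0):
    # One-hidden-layer MLP fitted by plain gradient descent;
    # init_range is the half-width of the uniform initializer,
    # i.e., the design choice that the ensemble varies.
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-init_range, init_range, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.uniform(-init_range, init_range, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # non-linear hidden layer
        out = H @ W2 + b2                 # linear output node (approximator)
        err = out - y[:, None]            # gradient of 0.5 * squared error
        dH = (err @ W2.T) * (1.0 - H**2)  # error backpropagated through tanh
        W2 -= lr * H.T @ err / len(X)
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * X.T @ dH / len(X)
        b1 -= lr * dH.mean(axis=0)
    return lambda X_new: (np.tanh(X_new @ W1 + b1) @ W2 + b2).ravel()

def mdn_forecast(X, y, X_new, ranges=(0.1, 0.5, 1.0)):
    # Same architecture, different initialization ranges;
    # the combined forecast is the average of the members.
    members = [train_mlp(X, y, init_range=r, seed=i)
               for i, r in enumerate(ranges)]
    return np.mean([m(X_new) for m in members], axis=0)

For a one-step-ahead exercise, X would be built from lagged values of the series and y from the next observation; averaging members trained from different starting ranges damps the sensitivity of the forecast to any single initialization.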
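Note on item g). The ε-insensitive support vector regression solves

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*}}\ \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\left(\xi_i+\xi_i^{*}\right)$$

subject to $y_i-\mathbf{w}^{\top}\phi(\mathbf{x}_i)-b \le \varepsilon+\xi_i$, $\mathbf{w}^{\top}\phi(\mathbf{x}_i)+b-y_i \le \varepsilon+\xi_i^{*}$ and $\xi_i,\xi_i^{*}\ge 0$: a quadratic objective under linear constraints, hence a convex problem with a global optimum, in contrast to the non-convex error surface minimized by backpropagation.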
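Note on item h). In the standard Albus formulation (our summary), the CMAC response is a sum over only the cells activated by the input, $y(\mathbf{x})=\sum_{j\in A(\mathbf{x})} w_j$, and the LMS update modifies only those active weights; training vectors whose activation sets do not overlap therefore cannot interfere with one another, which is what permits one-step learning.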
5 Applications and Performance of ANN
In what follows, Table 1 summarizes the information according to the previous classification. On the basis of the results obtained in the thirty reviewed papers, we observe that more than 40% of the analyzed studies support the idea that the MLP is the best network, or at least that it performs as well as the proposed networks. Investigations that favored other models (e.g. econometric models) (Kodogiannis and Lolis, 2002; Yumlu and Gurense, 2005) were excluded; among them we analyzed only the performance of the different proposed networks, and in none of those cases does the MLP stand out.
The main idea of this review is to point out the advantages and limitations of the MLP with respect to other available networks by comparing not only the learning process but also the architecture design. The issue of the type of connections between the nodes (as RCNs suggest) may not be so decisive in several applications. The main drawback associated with RCNs is that they need more time to learn than standard networks, because their outputs pass through the network more than once (depending on the type of RCN) before the final output is produced. Another issue is whether or not to apply an optimization technique in a network, as in the SVM, instead of a gradient technique.
The types of networks that have shown superiority over the MLP are the RBF, GRNN, MDNs and DRPNN. Both the RBF and the GRNN are FFNs. In these types of networks, training may rely on global or local basis functions. The MLP applies a global basis function (usually sigmoidal), which has non-negligible values throughout the whole measurement space, so many iterations are required to find a combination with an acceptable error in all parts of the measurement space for which training data are available. On the other hand, the GRNN and the RBF are based on localized basis functions, which provide the important advantage of near-instant learning. The GRNN is based on the estimation of probability density functions, and the RBF is based on iterative function approximation. Although both the PRNN and the GRNN are based on the estimation of probability density functions, the reason that the GRNN and the RBF outperform the PRNN could be their use of the regression method (Chen, 1996).
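To make the contrast concrete: a sigmoidal unit such as $\sigma(z)=1/(1+e^{-z})$ is non-negligible over an unbounded region of the input space, so every weight update has global side effects, whereas a Gaussian basis function $\phi(\mathbf{x})=\exp\!\left(-\|\mathbf{x}-\mathbf{c}\|^{2}/2\sigma^{2}\right)$ vanishes away from its center $\mathbf{c}$, so each training point mainly adjusts the few units centered near it.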
The fact that the MDNs perform better than the MLP is because they treat the initialization weights more carefully, but when MDNs mix different architectures sharing the same weight range, the result is poor. In the case of the DRPNN, compared with the MLP, it has only a single layer of learnable weights, which reduces the network complexity. PLNs are therefore appropriate when the number of inputs to the model and the training set become extremely large, since the training procedure of ordinary networks such as the MLP then becomes very slow. The fact that some dynamic versions succeed (although they imply more connections) is because their architecture is very simple. Another (isolated) case is the CMAC, which performs better than the MLP or the RBF, since the MLP cannot elude the problem of slow learning.