Artificial Neural Networks (ANNs)


Introduction

This chapter deals with a new, enhanced computer technology, a branch of artificial intelligence called Artificial Neural Networks (ANNs). Different terms and definitions are given first. Basic concepts of ANNs are then explained, i.e. neural cells or neurons, the perceptron, and the Hopfield net. Next, various aspects of ANN modelling are described, i.e. the modelling processes, selecting and representing the variables, hidden layers and nodes, weights and biases, summation and transformation functions, and learning rate and momentum. The training process is also presented, i.e. definitions, methods, the back-propagation training algorithm and Generalized Delta Rule (GDR), and updating the network. The samples used for network modelling are discussed, i.e. methods of sampling, the number of samples, and how they are fed into the network during training and testing. After that, the testing process is explained, followed by the outputs obtained from ANNs. Then, the advantages of ANNs and the cautions in using them are addressed, and research, developments and applications of ANNs are reviewed. Finally, this chapter summarizes the possibility of using ANNs as a new approach and tool for pre-design estimating of construction costs and duration.

Definitions

Artificial neural networks (ANNs) go by several other names: 1) connectionist models; 2) parallel distributed processing models; 3) neuromorphic systems; and 4) neural computing. They can be defined by any one, or a combination, of the following definitions, which also show their properties.

1) ANN models are composed of many non-linear computational elements, operating in parallel and arranged in patterns reminiscent of biological neural nets (Lippmann, 1987).

2) ANNs are parallel, distributed information processing structures consisting of processing elements which can possess a local memory and can carry out localized information processing operations. They are interconnected via unidirectional signal channels called connections. Each processing element has a single output connection that branches or fans out into as many collateral connections as desired (Nielsen, 1989).

3) ANNs are types of information processing system whose architectures are inspired by the structure of human biological neural systems (Caudill and Butler, 1990).

4) Neural networks concentrate on machine learning which is based on the concept of self-adjustment of internal control parameters. The artificial neural network environment consists of five primary components: learning domain, neural nets, learning strategies, learning process, and analysis process (Adeli, 1992).

5) An artificial neural net is a kind of machine learning. It is a computational procedure composed of simple elementary functions such as summation and multiplication (Arciszewski and Ziarko, 1992).

6) ANNs are an information processing technology inspired by studies of the brain and nervous system. They are composed of a collection of neurons (or nodes, processing elements, or units) which are grouped in layers. They accept several inputs, perform a series of operations on them, and produce one or more outputs. They are similar to a subroutine that works best in classifying, modelling and forecasting (Klimasauskas, 1993).

7) ANNs are collections of simple computational elements called neurons that are interconnected (Berry and Trigueiros, 1993).

8) ANNs are models that emulate a biological neural network. They are composed of artificial neurons, which are the processing elements (PEs). They are information processing technologies inspired by studies of the brain and nervous system. They are implemented as software simulations of massively parallel processes involving processing elements interconnected in a network architecture (Medsker et al., 1993).

9) ANNs are compositions of neurons or processing elements, and connections, that are organized in layers (Salchenberger et al., 1993).

10) ANNs are connectionist systems that have an ability to learn and generalize from examples, and to provide meaningful solutions to problems even when input data contain errors or are incomplete. They can adapt solutions over time to compensate for changing circumstances. They process information rapidly and also transfer readily between computing systems (Flood and Kartam, 1994).

11) ANNs are computational devices constructed from a large number of parallel processing devices. Individually, the neurons perform trivial functions, but collectively they are capable of solving very complicated problems. In other words, they are capable of learning from examples and can infer solutions to problems beyond those to which they are exposed during training. They can provide meaningful answers even when the data to be processed include errors or are incomplete. They can process information extremely rapidly (Gagarin et al., 1994).

12) ANNs are AI software technology that represents objects or pieces of information as nodes and expresses relationships between them as links to provide a powerful and flexible way of representing knowledge (Paulson, 1995).

13) ANNs are computational models composed of many non-linear processing elements arranged in patterns similar to biological neuron networks. Typically, they have an activation value associated with each node and a weight value associated with each connection. An activation function governs the firing of nodes and the propagation of data through network connections in massive parallelism. The networks can also be trained with examples through connection weight adjustments (Tan et al., 1996).

This research uses the name "neural networks" for the models, and the term "node" rather than neuron or neural cell. However, the neural networks and the related terms conform to all the definitions given above.

Basic Concepts of ANNs

1. Neural cells or neurons

Lippmann (1987), Chester (1993) and Kireetoh (1995) described the behavior of living neural cells. A neuron receives multiple inputs from other neurons via branching input (afferent) paths called "dendrites". The combined stimuli from these input signals activate a region called the "axon hillock", where an outgoing (efferent) tendril called the "axon" connects to the cell body. The axon then transmits the neuron's output to still other neurons through their dendrites. In some cases, the output that the neuron transmits along its axon goes directly to muscle or gland cells in order to activate or inhibit the functions those cells perform. The gap between the output axon of one neuron and the input dendrites of another is the location of a synapse. Information transfer across a synapse is controlled by bio-chemical agents, a process that is modeled in electronic neurons by changing the synaptic weight. It is estimated that the brain contains in the order of 10^11 neurons, and 10^14 to 10^16 synaptic interconnections among them.

ANN technology is a branch of artificial intelligence (AI) that attempts to achieve human brain-like capability (Lippmann, 1987; Caudill and Butler, 1990; Klimasauskas, 1993; Medsker et al., 1993). The various kinds of ANN structure are based on the biological nervous system and can exhibit a surprising number of the brain's characteristics, e.g. they learn from experience and generalize from previous examples to new problems by inferring solutions to problems beyond those to which they are exposed during training. They can provide meaningful answers even when the data to be processed include errors or are incomplete (Karunasekera, 1992; Hawley et al., 1993; Medsker et al., 1993; Chao and Skibniewski, 1994; Flood and Kartam, 1994a and 1994b; Gagarin et al., 1994). They can process information extremely rapidly when applied to solve real world problems. ANNs have been mobilized for building neuro-computing architectures in physical hardware that can think and act intelligently like human beings. ANNs can be built either as hardware, by developing a neuro-computer, or as software, by writing programs in neuro-software languages (Forsyth, 1992; Medsker et al., 1993; Adeli, 1996).

2. Perceptron

A simple form of ANN, called the perceptron, was introduced by Rosenblatt in 1957. The perceptron is a simple network with only an input and an output layer, i.e. a neural network which has no hidden layer. Another definition of the perceptron is a way to group together experiences that are similar and to differentiate them from dissimilar experiences (Smith, 1993). The perceptron is also described as the theory of statistical separability, which concerns mathematical analysis of the behavior of a class of network models (Rosenblatt, 1950s).

On the other hand, a multi-layered perceptron is a feed-forward net with one or more layers of nodes between the input and output nodes. These additional layers contain hidden nodes that are not directly connected to both the input and output nodes. The multi-layer perceptron overcomes many limitations of the single-layer perceptron (Lippmann, 1987). Its capabilities stem from the non-linear relationships among the nodes.
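The limitation of a network without a hidden layer can be illustrated with a minimal sketch. The step threshold, the error-correction update, and the names `train_perceptron` and `predict` are assumptions chosen for illustration; they are not taken from the cited authors. The single-layer network learns the linearly separable AND function but cannot reproduce XOR, which is the kind of problem that hidden nodes address.

```python
def step(x):
    """Hard threshold: the node fires (1) when its net input is non-negative."""
    return 1 if x >= 0 else 0

def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, epochs=20, rate=1.0):
    """Error-correction training of a network with no hidden layer (assumed rule)."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Move each weight in the direction that reduces the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_w, and_b = train_perceptron(AND)
xor_w, xor_b = train_perceptron(XOR)
and_ok = all(predict(and_w, and_b, x) == t for x, t in AND)
xor_ok = all(predict(xor_w, xor_b, x) == t for x, t in XOR)
print("AND learned:", and_ok, "| XOR learned:", xor_ok)
```

No choice of weights for a single threshold node separates the XOR cases, so `xor_ok` stays false however long the training runs.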

3. Hopfield net

The Hopfield net is a kind of network used with binary inputs. It is most appropriate when exact binary representations are possible, as with black-and-white images, yes-and-no answers or on-and-off switches. In 1982, Hopfield designed a neural network that revived the technology, bringing it out of the neural dark age of the 1970s (Chester, 1993). He devised an array of neurons that were fully interconnected, with each neuron feeding its output to all the others. The concept is that all the neurons transmit signals back and forth to each other in a closed feedback loop until their states become stable. This concept does not use the feed-forward mechanism of the perceptron, in which the synaptic input weights of nodes are adjusted in order to tune the outputs of those nodes. Instead, a Hopfield net makes feedback the central feature of the network.
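The feedback loop can be sketched in a few lines. This is only an illustration under stated assumptions: states are coded as +1 and -1 rather than 1 and 0, the weights are set with a Hebbian outer-product rule (a common choice that the text above does not specify), and the names `train_hopfield` and `recall` are hypothetical.

```python
def train_hopfield(patterns):
    """Build a symmetric weight matrix with no self-connections (assumed Hebbian rule)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights

def recall(weights, state, max_sweeps=10):
    """Feed every neuron's output back to all others until no state changes."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:  # stable state reached
            break
    return state

stored = [1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])
noisy = [1, -1, -1, -1, 1, -1]  # third element flipped
recalled = recall(weights, noisy)
print(recalled == stored)
```

Starting from the corrupted pattern, the states settle back to the stored pattern, which then remains unchanged under further updates.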

ANNs Modeling

1. ANNs modelling processes

Neural networks concentrate on machine learning which is based on the concept of self-adjustment of internal control parameters. The artificial neural network environment consists of five primary components: learning domain, neural nets, learning strategies, learning process, and analysis process (Adeli, 1992). Accordingly, the neural-network-based modelling process involves five main aspects: 1) data acquisition, analysis and problem representation; 2) architecture determination; 3) learning process determination; 4) training of the network; and 5) testing of the trained network for generalization evaluation (Wu and Lim, 1993). Elazouni et al. (1997) classified ANN modelling into three main phases: 1) design; 2) implementation; and 3) recall or use for problem solving. The design phase consists of two tasks: problem analysis and problem structuring. The implementation phase consists of three main aspects: 1) acquiring the knowledge (including data collection); 2) selecting the network configuration; and 3) training and testing the network.

2. Selecting the variables

An ANN model consists of independent variables (or inputs) and dependent variables (or outputs). Selecting the variables to be used in the model involves two considerations (Smith, 1993). First, the information might be transformed to make it more useful to the network. Second, selection among the transformed variables is based on predictiveness and covariance. Basically, a selected independent variable is predictive if the dependent variables are correlated with it. By contrast, if two independent variables are correlated with each other, the correlated inputs make the model more sensitive to the statistical peculiarities of particular samples. They accentuate the overfitting problem and limit generalization. For these reasons, the model should include only the independent variables that are predictive of the dependent variables but do not covary with one another, regardless of what modeling technique is used. To minimize the number of training samples and the training time, only the major factors which have a strong influence on the specific problem should be considered in setting the input nodes (Wu and Lim, 1993).
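The two selection criteria can be sketched as a simple screening routine. Everything specific here is an assumption made for illustration: the Pearson correlation as the measure of predictiveness and covariance, the thresholds 0.5 and 0.9, the function `select_inputs`, and the small made-up data set of candidate variables.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def select_inputs(candidates, target, min_predictive=0.5, max_covariance=0.9):
    """Keep inputs that correlate with the target but not with each other."""
    ranked = sorted(candidates,
                    key=lambda name: abs(pearson(candidates[name], target)),
                    reverse=True)
    selected = []
    for name in ranked:
        if abs(pearson(candidates[name], target)) < min_predictive:
            continue  # not predictive of the dependent variable
        if any(abs(pearson(candidates[name], candidates[kept])) > max_covariance
               for kept in selected):
            continue  # covaries with an input that is already kept
        selected.append(name)
    return selected

# Hypothetical data: cost is the dependent variable.
cost = [1.0, 1.4, 2.1, 2.4, 3.1]
candidates = {
    "floor_area_m2": [100, 150, 200, 250, 300],
    "floor_area_sqft": [1076.4, 1614.6, 2152.8, 2691.0, 3229.2],  # same information
    "site_code": [3, 1, 2, 1, 3],  # barely related to cost
}
selected = select_inputs(candidates, cost)
print(selected)
```

Only one of the two floor-area variables survives, because they carry the same information, and the weakly related variable is dropped.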

Yeh et al. (1993) outlined four criteria for selecting attributes and training examples. First, the attributes should be available, i.e. clearly observable without sophisticated experience, high cost, or a long time period. Second, unnecessary or insufficient attributes, which reduce the classification reliability, must be avoided. Third, a good training set should contain common, unusual and rare cases, and such a training set cannot be obtained by random sampling from the problem domain. Fourth, the more training examples there are, the better the learning results will be.

3. Representing the variables

Smith (1993) explained that the way independent variables are represented by the input nodes of the network has a major impact on the training of the network and on the performance of the resulting model. The ability of the network refers mainly to its effectiveness in generalizing. The amount of computation and the time required for learning are both greatly influenced by the form of representation used. There are two types of independent and dependent variables: 1) quantitative; and 2) class variables. A quantitative or continuous valued variable can be any number. It need not fall within the bounds of the applied sigmoid function. It is also possible to scale or normalize the quantitative variables to some standard range such as 0 to 1 or -1 to 1, or to leave them unscaled (Smith, 1993; Yeh et al., 1993; Elazouni et al., 1997). Elazouni et al. (1997) opined that networks usually provide improved performance when the data are normalized. It is necessary to avoid excessive generalization, in which the network learns about examples at one extreme but applies what it has learned to examples at the other extreme. One solution is to cut the variable up and represent it with several nodes. By doing this, the network can only generalize to examples that are reasonably close by. The lessons learned from each example during training are localized or limited. The two nodes between which the input value is located are then partially turned on. This attractive method is called "interpolation representation" (Hutchison, 1989; Smith, 1993). It is appropriate since no information about the precise value of the variable is lost. It also permits generalization because nearby values have similar representations.

On the other hand, the class variables are discrete, logical or symbolic states. The class variables use binary representation. One binary output can represent black-and-white images, yes-and-no answers or on-and-off switches (Smith, 1993; Kireetoh, 1995). Pezeshk et al. (1996) used a single binary output in a different way, e.g. zero and one to represent clay and sand.
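The scaling and the interpolation representation described above can be sketched as follows. The function names, the evenly spaced node centres and the linear sharing of activation between the two neighbouring nodes are assumptions for illustration; the cited authors may define the partial activation differently.

```python
def normalize(value, low, high):
    """Scale a quantitative variable from [low, high] to the range 0 to 1."""
    return (value - low) / (high - low)

def interpolation_encode(value, centers):
    """Represent one value with several nodes; the two nearest nodes are partly on."""
    activations = [0.0] * len(centers)
    if value <= centers[0]:
        activations[0] = 1.0
    elif value >= centers[-1]:
        activations[-1] = 1.0
    else:
        for k in range(len(centers) - 1):
            left, right = centers[k], centers[k + 1]
            if left <= value <= right:
                share = (value - left) / (right - left)
                activations[k] = 1.0 - share   # closer node gets the larger share
                activations[k + 1] = share
                break
    return activations

centers = [0.0, 10.0, 20.0, 30.0]   # hypothetical node positions
encoded = interpolation_encode(12.5, centers)
print(encoded)
print(normalize(15.0, 10.0, 20.0))
```

Because the two activations together locate the value exactly between its neighbouring nodes, no precision is lost, while values that are close together receive similar codes.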

For multiple outputs, the value 1 indicates that the object or event belongs to the class represented by that node, while the value 0 indicates that it does not. The number of nodes need not be equal to the number of classes (Smith, 1993). There may be one node fewer than there are classes. All the classes but one are represented by turning on the appropriate node, and the remaining class is represented by not turning on any node. This can reduce computational time. On the other hand, Chau et al. (1997) assigned the attributes of qualitative class dependent variables (outputs) in a different way, e.g. the values -2 to 2 were used to identify bad, slightly bad, average, slightly good, and good, respectively.
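A minimal sketch of the representation with one node fewer than there are classes is given below. The helper names `encode_class` and `decode_class` are hypothetical, and the third class "gravel" is added only to make the example non-trivial.

```python
def encode_class(label, classes):
    """Use len(classes) - 1 nodes; the last class is coded by leaving all nodes off."""
    nodes = [0] * (len(classes) - 1)
    index = classes.index(label)
    if index < len(nodes):
        nodes[index] = 1
    return nodes

def decode_class(nodes, classes):
    """Map a node pattern back to its class label."""
    if 1 in nodes:
        return classes[nodes.index(1)]
    return classes[-1]

SOILS = ["clay", "sand", "gravel"]  # "gravel" is a made-up third class
for soil in SOILS:
    print(soil, encode_class(soil, SOILS))
```

With two classes this reduces to the single binary node mentioned earlier, e.g. one node that is on for clay and off for sand.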

It is possible to mix both quantitative and class variables among the inputs to a single network (Smith, 1993). Such a mixture, however, does raise an issue for the algorithmic implementation of the mathematics. An alternative is to binarize the quantitative variable, i.e. to represent it by using binary input nodes. There are two major problems with this approach: discrimination and generalization. This representation makes it impossible for the network to discriminate between examples whose values are within the sub-range of the same node. No binary representation of a quantity can completely resolve the problems of discrimination and generalization. However, these problems can be reduced by using "ensemble coding", whereby several nodes are turned on. Some represent broad ranges of values and help the network generalize, while the others represent narrow ranges of values and help it discriminate.

4. Hidden layers and hidden nodes

In a multi-layered perceptron, a hidden layer is a third layer of processing elements or units in between the input and output layers that increases computational power. In principle, there can be more than one hidden layer. The network can approximate a target function of any complexity if it has enough hidden nodes. The hidden layers of nodes make the multi-layered perceptron attractive as a statistical modeling tool (Lippmann, 1987; Karunasekera, 1992; Hawley et al., 1993; Khoshgoftaar and Lanning, 1995). The output of a hidden node can be considered as a new variable, i.e. an input to the nodes on the next layer or the nodes on the output layer (or dependent variable). These outputs contain interesting information about the relationship being modelled. The new variables fired from the nodes on the hidden layer, together with the net topology, are known as internal representations, and can make the modelling process self-explanatory. Consequently, the neural network approach is attractive as a form of machine learning (Berry and Trigueiros, 1993).

Too few hidden nodes (or too small a network) for a given problem will cause back-propagation not to converge to a solution (Karunasekera, 1992). However, many hidden nodes cause a much longer learning period. At some point, increasing the number of hidden nodes does not greatly increase the ability of the neural network to classify (William, 1993). On the other hand, too many units on a layer can make a network over-specific, particularly in the extreme case where the number of units on the first processing layer is equal to the number of examples in the training set (Rumelhart, 1988). Too many hidden nodes can overfit, such that the network models the accidental structure of the noise in the sample as well as the inherent structure of the target function (Smith, 1993). Therefore, a minimum-sized network which uses as few hidden units as possible is important for efficient classification and good generalization (Khan et al., 1993). Berke and Hajela (1991) suggested that the number of hidden nodes should be between the average and the sum of the nodes on the input and output layers. Rogers and Ramarsh (1992) suggested that a good initial guess for the number of hidden nodes is the sum of the nodes on the input and output layers. Soemardi (1996) suggested that the number of hidden nodes should be 75% of the number of input nodes. Thus, experience suggests that the number of hidden nodes has a maximum limit of the sum of the input and output nodes, while the minimum could be either 75% of the input nodes or the average of the input and output nodes.
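The rules of thumb quoted above are easy to tabulate for a given network. The function name and the example of 8 input nodes and 2 output nodes are assumptions; the sketch only restates the cited guidelines and does not replace trial training runs.

```python
def hidden_node_guidelines(n_inputs, n_outputs):
    """Hidden-node counts suggested by the rules of thumb cited in the text."""
    return {
        "average_of_inputs_and_outputs": (n_inputs + n_outputs) / 2,
        "sum_of_inputs_and_outputs": n_inputs + n_outputs,
        "75_percent_of_inputs": 0.75 * n_inputs,
    }

guide = hidden_node_guidelines(n_inputs=8, n_outputs=2)
lower = min(guide["average_of_inputs_and_outputs"], guide["75_percent_of_inputs"])
upper = guide["sum_of_inputs_and_outputs"]
print(f"start with between {lower:g} and {upper:g} hidden nodes")
```

The resulting range is only a starting point for experiments with the actual data.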

5. Weights and biases

Weights express the strength of the input connections as real numbers. The processing nodes receive inputs through links, and each link has a weight attached to it. The weighted inputs are summed into a value that updates the processing node and determines whether its output excitation is on or off. The weights are the relative strength (mathematical value) of the initial entering data or of the various connections that transfer data from layer to layer; in other words, they are the relative importance of each input to a processing element (Medsker et al., 1993). In practice, the weights are initialized and assigned to the network before training starts. The weight initialization technique is also important for controlling the convergence of training and the training time. For each network, the number of unknowns is equal to the number of weights plus the number of biases. For a given network, the number of weights is the sum, over every pair of connected layers, of the product of their numbers of nodes, and the number of biases is the total number of nodes that carry a bias.
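The count of unknowns can be sketched for a fully connected feed-forward network. Two assumptions are made here and are not stated in the text: every node is linked to every node of the next layer only, and only hidden and output nodes carry a bias. If input nodes were also given a bias, `layer_sizes[0]` would have to be added to the bias count.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected feed-forward network."""
    # One weight per link between each pair of adjacent layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # Assumption: one bias per hidden node and per output node.
    biases = sum(layer_sizes[1:])
    return weights, biases

weights, biases = count_parameters([8, 6, 2])  # hypothetical 8-6-2 network
print("weights:", weights, "biases:", biases, "unknowns:", weights + biases)
```

For the 8-6-2 example this gives 8 x 6 + 6 x 2 = 60 weights and 6 + 2 = 8 biases, i.e. 68 unknowns to be determined during training.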

6. Summation and transformation function

The summation function finds the weighted sum of all the input elements (or nodes) to each processing element (or node). It simply multiplies the input values by the weights and totals them up. The transformation function (or transfer function or local memory) is the relationship between the internal activation level (N) of the neuron and its output; it is also called the activation function. The transformation function is a kind of sigmoid function. A function f(N) is a sigmoid function if it has two characteristics: 1) it is bounded; and 2) its value always increases as N increases (Smith, 1993). A number of different functions have these characteristics and thus qualify as sigmoid functions. Any of them may be used in the neural network. Usually, a micro-computer with a single processor, called a Von Neumann computer, is used to train and test networks. In fact, the biological neural system architecture is completely different from the Von Neumann architecture. This difference significantly affects the type of functions each computational model can best perform.
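The two functions can be sketched for a single node. The logistic function is used here as one example of a sigmoid; it is an assumption, since the text allows any bounded, increasing function (the hyperbolic tangent would qualify as well). The bias argument and the function names are likewise illustrative.

```python
import math

def weighted_sum(inputs, weights, bias=0.0):
    """Summation function: multiply inputs by weights and total them up."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def logistic(n):
    """One possible sigmoid: bounded between 0 and 1 and increasing in n."""
    return 1.0 / (1.0 + math.exp(-n))

def node_output(inputs, weights, bias=0.0):
    """Transformation function applied to the internal activation level N."""
    return logistic(weighted_sum(inputs, weights, bias))

activation = weighted_sum([1.0, 2.0], [0.5, -0.25])
print(activation, node_output([1.0, 2.0], [0.5, -0.25]))
```

In this example the weighted inputs cancel, so the activation level is zero and the node output sits at the midpoint of the logistic range.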

7. Learning rate and momentum

Back propagation is a time-consuming algorithm when either the size of the net or the number of training patterns is large (Khan et al., 1993). Back propagation has some limitations. There is no guarantee that the network can be trained in a finite amount of time. It employs gradient descent, i.e. it follows the slope of the error surface downward and constantly adjusts the weights towards a minimum. Therefore, it is in danger of getting trapped in a local minimum before achieving the global minimum. It is important to select the correct learning rate and momentum term when using back propagation. Unfortunately, there is little guidance other than experience, which is based on trial-and-error (Anderson et al., 1993; Khan et al., 1993).

The learning rate is the constant of proportionality that governs the rate at which the weights may be changed. A high learning rate corresponds to rapid learning, which may push the training towards a local minimum or cause oscillation. Conversely, when a small learning rate is applied, the time to reach a global minimum is considerably increased (Khan et al., 1993). The learning rates for different layers of the same network can be different.

The remedy for the problem of choosing the learning rate is to apply a momentum factor, which is multiplied by the previous weight change so that, while the learning rate is controlled, the changes are still rapid (Khan et al., 1993). The role of the momentum term is to smooth out the weight changes, which helps to protect network learning from oscillation (Anderson et al., 1993). A rule of thumb is that the learning rate for the last hidden layer should be twice that of the output layer. If there are no connections that jump layers, the learning rate for each prior hidden layer should be twice that of the hidden layer that follows it (Klimasauskas, 1993).
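The effect of the momentum factor can be illustrated on a deliberately simple error surface. The quadratic error E(w) = (w - 3)^2 with a single weight, the function name `descend` and the particular rates are assumptions for illustration; a real network has many weights and a far more irregular surface.

```python
def descend(rate, momentum, steps, start=0.0):
    """Gradient descent on E(w) = (w - 3)^2 with a momentum term."""
    w = start
    previous_change = 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)  # slope of the error surface at w
        # New change = learning-rate step plus a fraction of the previous change.
        change = -rate * gradient + momentum * previous_change
        w += change
        previous_change = change
    return w

plain = descend(rate=0.01, momentum=0.0, steps=50)
with_momentum = descend(rate=0.01, momentum=0.9, steps=50)
print("error without momentum:", abs(plain - 3.0))
print("error with momentum:   ", abs(with_momentum - 3.0))
```

With the same small learning rate and the same number of steps, the run with momentum ends closer to the minimum at w = 3, which is the behaviour the momentum factor is meant to provide.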

Training

1. Definitions

The term "training" or "learning" can be one of, or a combination of the following definitions:

1) Training means a process whereby error is used to modify the weights so that the network gives a more correct answer the next time (Klimasauskas, 1993).

2) Learning is a mechanical process which may take the form of decision trees, called explanation trees. It is used for providing decision rules (Adeli and Yeh, 1990).

3) Learning is the process whereby the ANN learns from its mistakes. It usually involves three tasks: 1) computing outputs; 2) comparing outputs with desired outputs; and 3) adjusting the weights and repeating the process (Medsker et al., 1993).

In this research, the two terms "training" and "learning" are used interchangeably. Training (or learning) is the process in which the weights and biases are initialized randomly and then adjusted. It also covers splitting the samples prior to feeding them to the networks, the algorithm used for minimizing the system error, and the criteria for stopping training.
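The three tasks of learning (compute outputs, compare them with the desired outputs, adjust the weights and repeat) can be sketched for a single node. This is a simplified illustration, not the back-propagation procedure for a full network: the logistic node, the delta-rule update, the OR data set, the stopping tolerance and the name `train_node` are all assumptions.

```python
import math
import random

def logistic(n):
    return 1.0 / (1.0 + math.exp(-n))

def predict(weights, bias, inputs):
    return logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_node(samples, rate=0.5, max_epochs=2000, tolerance=0.01, seed=0):
    """Train one logistic node; return weights, bias and the error per epoch."""
    rng = random.Random(seed)
    # Weights and bias start from small random values.
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = rng.uniform(-0.5, 0.5)
    history = []
    for _ in range(max_epochs):
        total_error = 0.0
        for inputs, target in samples:
            output = predict(weights, bias, inputs)     # task 1: compute output
            error = target - output                     # task 2: compare
            total_error += error ** 2
            delta = error * output * (1.0 - output)     # task 3: adjust weights
            weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
            bias += rate * delta
        history.append(total_error)
        if total_error < tolerance:                     # stopping criterion
            break
    return weights, bias, history

OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias, history = train_node(OR)
classes = [1 if predict(weights, bias, x) >= 0.5 else 0 for x, _ in OR]
print("first epoch error:", history[0], "last epoch error:", history[-1])
print("predicted classes:", classes)
```

Training stops either when the summed squared error falls below the tolerance or when the maximum number of epochs is reached, whichever comes first.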



Last Updated on Wednesday, 14 December 2011 17:35