
ARTIFICIAL NEURAL NETWORKS: REVIVED

ABSTRACT

There are two major approaches in the field of artificial intelligence (AI) for realizing human intelligence in machines. One is symbolic AI; the other is based on low-level, microscopic biological models, examples of which are Artificial Neural Networks (ANNs) and genetic algorithms. [3] This paper deals with the basic concepts of ANNs, the different network topologies, and one of the standard training algorithms. It aims to provide sufficient background on ANNs and the ways in which they simulate the human brain.

INTRODUCTION

Neural networks make use of some of the organizational principles that characterize the human brain. They do so through highly interconnected processing elements (nodes) that usually operate in parallel and are configured in regular architectures. This collective behavior of neural networks demonstrates basic capabilities of the human brain, such as the ability to learn, recall, and generalize from training patterns or data. The three basic entities used to specify a neural network model are:
1. Models of the processing elements
2. Models of interconnections and structures (network topology)
3. Learning rules (the ways information is stored in the network)

Characteristics of ANNs: [4]
- Massive parallelism
- Distributed representation and computation
- Learning ability
- Generalization ability
- Adaptivity
- Inherent contextual information processing
- Fault tolerance
- Low energy consumption

Applications of ANNs: ANNs can be used to solve a variety of problems, such as pattern recognition, prediction, associative memory, optimization, and control. Conventional approaches can solve these problems, but they are not flexible enough to perform well outside their domain. ANNs provide exciting alternatives, and many applications could benefit from using them. [4]

Advantages of ANNs [2]
- Inherently massively parallel
- May be fault tolerant due to parallelism
- May be designed to be adaptive
- Little need for extensive characterization of the problem (other than through the training set)

Disadvantages of ANNs [2]
- No clear rules or design guidelines for an arbitrary application
- No general way to assess the internal operation of the network
- Training may be difficult or impossible
- Difficult to predict future network performance (generalization)

BIOLOGICAL NEURAL NETWORK

Neuron: The neuron is the basic information-processing element of the nervous system. It is a special biological cell that processes information. [4] It is composed of a cell body, or soma, and two types of out-reaching, tree-like branches: the axon and the dendrites.

Fig. 1: Structure of a biological neuron

The cell body has a nucleus that contains information about hereditary traits, and plasma that holds the molecular equipment for producing material needed by the neuron. A neuron receives signals (impulses) from other neurons through its dendrites (receivers) and transmits signals generated by its cell body along the axon (transmitter), which eventually branches into strands and substrands. At the terminals of these strands are synapses. A synapse is an elementary structural and functional unit between two neurons (an axon strand of one neuron and a dendrite of another). Neurons are connected to and communicate with each other using very short trains of pulses, typically milliseconds in duration.

DEFINITION OF ARTIFICIAL NEURAL NETWORKS

An artificial neuron simulates a biological neuron in a simplified but analogous manner. Definition: a structure (network) composed of a number of interconnected units (artificial neurons). Each unit has an input/output characteristic and implements a local computation or function. The output of any unit is determined by its I/O characteristic, its interconnections to other units, and (possibly) external inputs. Although hand-crafting of the network is possible, the network usually develops an overall functionality through one or more forms of training. [2] ANNs do not constitute a single network but a diverse family of networks. The overall functionality achieved is determined by the network topology, the individual neuron characteristics, and the learning or training strategy and training data.

McCULLOCH AND PITTS NEURON

Fig. 2: The McCulloch-Pitts neuron

The figure above shows a simple mathematical model of the biological neuron, proposed by McCulloch and Pitts in 1943 and usually called an M-P neuron. In this model, the Nth processing element computes a weighted sum of its inputs and outputs yN = 1 (firing) or 0 (not firing), depending on whether this weighted input sum is above or below a certain threshold θN. [3] The weight wjN represents the strength of the synapse (called the connection or link) connecting neuron j (source) to neuron N (destination). A positive weight corresponds to an excitatory synapse, and a negative weight corresponds to an inhibitory synapse. If wjN = 0, there is no connection between the two neurons.
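As a concrete illustration, the following is a minimal Python sketch of an M-P neuron; the weights and threshold shown are illustrative choices, not values from the text:

```python
# A minimal sketch of an M-P neuron: output is 1 (firing) if the
# weighted input sum reaches the threshold, 0 (not firing) otherwise.

def mp_neuron(inputs, weights, threshold):
    """Compute the M-P neuron output for binary inputs."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Example: two excitatory synapses (positive weights) realizing AND.
print(mp_neuron([1, 1], [1, 1], threshold=2))  # 1 (firing)
print(mp_neuron([1, 0], [1, 1], threshold=2))  # 0 (not firing)
```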

NEURAL NETWORK ARCHITECTURES

A single node is insufficient for practical problems, and networks with a large number of nodes are frequently used. [1] The way nodes are connected determines how computations proceed and constitutes an important early design decision by a neural network developer. These architectures are inspired by the different types of interconnections between biological neurons in the human body. The principal architectures are:
A. Fully connected networks
B. Layered networks
C. Acyclic networks
D. Feedforward networks
E. Modular networks

A. Fully Connected Networks
This is the most general architecture, wherein every node is connected to every other node, and these connections may be excitatory (positive weights), inhibitory (negative weights), or irrelevant (almost zero weights). Every other architecture can be considered its subset, obtained by setting some of its weights to zero. In practice, the situation wherein every node must be connected to every other node rarely arises, and hence this architecture is seldom used.

B. Layered Networks
These are networks wherein the nodes are partitioned into subsets called layers, with connections allowed from layer i to layer j only if i <= j, never in the reverse direction. We adopt the convention that each node of layer 0, the input layer, simply receives a single input and distributes it to other nodes.

Fig. 3: A layered network

No computations occur at the nodes in layer 0 (the input layer), and there are no intra-layer connections among the nodes of this layer. Connections with arbitrary weights may, however, exist from any node in layer i to any node in layer j for j >= i; in particular, intra-layer connections may exist in the higher layers.

C. Acyclic Networks
This is a subclass of the layered architecture in which there are no connections between nodes of the same layer: a connection may exist between any node in layer i and any node in layer j for i < j, but not for i = j. Computation is thus simpler than in a general layered architecture.

Fig. 4: An acyclic network

D. Feedforward Networks
This is a subset of acyclic networks in which a node in layer i may connect only to nodes in layer i + 1.
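The three connectivity constraints can be summarized compactly. The following sketch expresses them as hypothetical Python predicates over the source layer i and destination layer j of a candidate connection:

```python
# Hypothetical predicates expressing which connections each
# architecture permits between a source layer i and a destination
# layer j (layers numbered from 0, the input layer).

def allowed_layered(i, j):
    # Layered: connections may run within a layer or forward (j >= i).
    return j >= i

def allowed_acyclic(i, j):
    # Acyclic: strictly forward only, no intra-layer connections.
    return j > i

def allowed_feedforward(i, j):
    # Feedforward: only to the immediately following layer.
    return j == i + 1
```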

These networks are described by a sequence of numbers indicating the number of nodes in each layer. For instance, the network shown in the figure below is a 3-4-2 feedforward network: it contains three nodes in the input layer (layer 0), four nodes in the hidden layer (layer 1), and two nodes in the output layer (layer 2).
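A minimal sketch of a single forward pass through such a 3-4-2 network follows; the random placeholder weights and the step activation are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

# One forward pass through a 3-4-2 feedforward network.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # weights: layer 0 -> layer 1 (hidden)
W2 = rng.normal(size=(2, 4))   # weights: layer 1 -> layer 2 (output)

def step(z):
    # Step activation: 1 where the weighted sum is non-negative.
    return (z >= 0).astype(int)

x = np.array([1, 0, 1])        # three input nodes (layer 0)
hidden = step(W1 @ x)          # four hidden nodes (layer 1)
output = step(W2 @ hidden)     # two output nodes (layer 2)
print(output)
```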

Fig. 5: A 3-4-2 feedforward network

Such networks, generally with no more than four layers, are the most common neural networks in use.

E. Modular Networks
In this architecture, the network consists of several modules with sparse interconnections between them. A task is decomposed into smaller subtasks, each solved by a module, and the results of the modules are then combined in a logical manner. Modules can be organized in several different ways:
- Hierarchical organization
- Successive refinement
- Input modularity

PERCEPTRONS

Rosenblatt (1958) defines a perceptron to be a machine that learns, using examples, to assign input vectors (samples) to different classes, using a linear function of the inputs. [1] In its simplest form, a perceptron has a single output whose value determines to which of two classes each input pattern belongs. Such a perceptron can be represented by a single node that applies a step function to the net weighted sum of its inputs. The input pattern is considered to belong to one class or the other depending on whether the node output is zero or one.

LINEAR SEPARABILITY

For two-dimensional input, if there exists a line (whose equation is w0 + w1 x1 + w2 x2 = 0) that separates all samples of one class from those of the other class, then an appropriate perceptron (with weights w0, w1, w2 for the connections from inputs 1, x1, x2, respectively) can be derived from the equation of the separating line. Such classification problems are said to be linearly separable, that is, separable by a linear combination of the inputs. Conversely, if the samples are not linearly separable, that is, no straight line can possibly separate the samples belonging to the two classes, then there is no simple perceptron that achieves the classification task. This is the fundamental limitation of a simple perceptron.

PERCEPTRON TRAINING ALGORITHM

Algorithm Perceptron [1]
  Start with a randomly chosen weight vector w0;
  Let k = 1;
  while there exist input vectors that are misclassified by wk-1, do
    Let ij be a misclassified input vector;
    Let xk = class(ij) . ij, implying that wk-1 . xk < 0;
    Update the weight vector to wk = wk-1 + η xk;
    Increment k;
  end while;

A learning procedure of this kind, called the perceptron training algorithm, can be used to obtain mechanically the weights of a perceptron that separates two classes, whenever such a separation is possible. The perceptron developed in this manner can then classify new samples, based on whether the node output is 0 or 1 for the new input vector. We denote vectors by rows of numbers: if w = (w1, w2, ..., wn) and x = (x1, x2, ..., xn) are two vectors, then their dot (scalar) product w . x is defined as w1x1 + w2x2 + ... + wnxn, and the Euclidean length ||v|| of a vector v is (v . v)^(1/2). The perceptron training algorithm assumes no prior knowledge about the specific classification problem being solved, i.e., the initial weights are random. Input samples are repeatedly presented and the performance of the perceptron is observed. If the performance on a given input sample is satisfactory, i.e., the current network output is the same as the desired output for that sample, the weights are not changed at that step. But if the network output differs from the desired output, the weights must be changed in such a way as to reduce the system error.
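The following is a minimal Python sketch of this training procedure under the conventions above: each sample carries a leading 1 so that w0 acts as the bias weight, classes are +1/-1, and eta (η) is the learning rate. The function name and the cap on iterations are illustrative assumptions.

```python
import numpy as np

# A sketch of the perceptron training algorithm described above.
# Each row of X is a sample with a leading 1 (for the bias weight
# w0); classes holds the +1/-1 class of each sample.

def train_perceptron(X, classes, eta=1.0, max_steps=1000):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])          # randomly chosen w0
    for _ in range(max_steps):
        # x_k = class(i_j) * i_j; a sample is misclassified when
        # w_{k-1} . x_k <= 0 (i.e., wrong side of, or on, the boundary).
        mis = [c * xi for xi, c in zip(X, classes) if w @ (c * xi) <= 0]
        if not mis:
            return w                         # all samples classified
        w = w + eta * mis[0]                 # w_k = w_{k-1} + eta x_k
    return w                                 # may not have converged
```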

  INPUT           TARGET
  X1      X2      class(ij)
   1       1         1
  -1       1        -1
   1      -1        -1
  -1      -1        -1

Step 1: Choose the initial weight w0 = -1; set the learning rate η = 1 and k = 1.
Step 2: Take the input pair ij = (1, 1), for which class(ij) = 1; the current weight is wk-1 = -1.
Step 3: Calculate xk = class(ij) . ij = 1 x 1 = 1.
Step 4: Calculate wk-1 . xk = -1 x 1 = -1.
Step 5: Since wk-1 . xk < 0, the sample is misclassified, so do Steps 6 and 7 (otherwise, stop).
Step 6: Update the weight vector: wk = wk-1 + η xk = -1 + (1 x 1) = 0.
Step 7: Increment k and repeat from Step 2 with the next misclassified sample.
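Running the earlier training sketch on this bipolar AND data illustrates the case study end to end (train_perceptron is the hypothetical helper defined above):

```python
import numpy as np

# Bipolar AND training data from the case study; the leading 1 in
# each row feeds the bias weight w0.
X = np.array([[1,  1,  1],
              [1, -1,  1],
              [1,  1, -1],
              [1, -1, -1]])
classes = np.array([1, -1, -1, -1])    # bipolar AND targets

w = train_perceptron(X, classes, eta=1.0)
print(w)                               # a separating weight vector
print(np.sign(X @ w))                  # matches the targets above
```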

TERMINATION CRITERIA

For many neural network learning algorithms, one simple termination criterion is: halt when the goal is achieved. For perceptrons and other classifiers, the goal is the correct classification of all samples, assuming these are linearly separable, so the perceptron training algorithm can be allowed to run until all samples are correctly classified. Termination is assured if η is sufficiently small and the samples are linearly separable. This termination criterion allows the procedure to run indefinitely if the samples are not linearly separable or if the choice of η is inappropriate. To detect such a case, we can examine the amount of progress achieved in the recent past: if the number of misclassifications has not changed over a large number of steps, the samples may not be linearly separable. If, however, the problem lies with the choice of η, then experimenting with different values of η may yield an improvement. [1]

CHOICE OF LEARNING RATE

The examination of extreme cases can help derive a good choice of η. If η is too large (e.g., η = 1000000), then the components of Δw = η x can have very large magnitudes, assuming the components of x are not infinitely small. [1] Consequently, each weight update swings the perceptron output completely in one direction, so that the perceptron now considers all samples to be in the same class as the most recent one. This effect is reversed when a sample of the other class is presented: the system then considers all samples to be in that class. The system thus oscillates between extremes. If, on the other hand, η is infinitely small, the change in the weights at each step is also infinitely small, assuming the components of x are not very large in magnitude, and training progresses extremely slowly.
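A tiny numeric illustration of these extremes; the weight vector and sample below are arbitrary made-up values:

```python
import numpy as np

# With a small eta the update nudges w; with a huge eta the update
# Delta_w = eta * x dwarfs the old weights, so the new w points
# almost entirely along the most recent sample.

w = np.array([0.5, -0.3, 0.2])         # current weights (made up)
x = np.array([1.0, 1.0, -1.0])         # x_k = class * sample (made up)

for eta in (0.1, 1e6):
    print(eta, w + eta * x)            # gentle nudge vs. total overwrite
```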

CONCLUSION

In this paper we have shed light on some fundamental concepts of artificial neural networks and shown how the perceptron training algorithm enables a system to classify raw, unclassified input in a manner loosely analogous to the human brain.