
Trends in Neural Network Research and an Application to Computer Vision

ENNO LITTMANN

Department of Information Science, Bielefeld University, P.O.B. 100 131, D-33615 Bielefeld, FR Germany

ABSTRACT In recent years, artificial neural networks (ANNs) have begun to turn from a "black box" technique into a scientifically well-founded research field. While originally inspired by models of biological neurons, current ANNs are also part of non-linear statistics. This paper summarizes current trends in neural network research. Typical representatives of the main neural network types are presented, and their advantages and problems are discussed. Major fields of application for neural networks are robotics and computer vision. A hybrid system is presented that combines classical and neural techniques to extract hand orientation from real images.

1 Introduction
In recent years, neural networks have become established as a major tool for tackling problems from such diverse fields as pattern recognition, medical diagnosis, composition of music, task scheduling, speech recognition and generation, process control, game playing, and even forecasting of bond rates. But although it sometimes seems that neural networks are advertised as a "universal algorithmic cure" able to solve almost any problem at hand, serious efforts are being made to improve the understanding of the basic strengths and weaknesses of the neural network approach. A summary of these current trends in neural network research is given in section 2. Most successful current applications use neural networks to implement some kind of signal transformation that cannot be computed satisfactorily by any other means, be it for lack of knowledge about the underlying process, or because a more traditional approach would turn out to be more complicated or expensive. For such tasks, neural networks offer a highly attractive approach, and since this type of task occurs in a wide variety of fields, neural networks have proven useful in very diverse domains. Typical representatives of the main neural network types are presented in section 3, and their advantages and problems are discussed.

Major fields of application for neural networks are robotics and computer vision. Where perception or movement control is concerned, we have nowhere come even remotely close to the information processing capabilities of even simple animals, such as bees or birds. Therefore, for understanding the working principles of neural networks, robotics and computer vision seem particularly adequate and promising areas, both for studying the kind of information processing tasks that are well solved by biological networks, and for exploring ways to replicate these capabilities with artificial neural networks. One of the key problems is the robust recognition, grasping, and manipulation of objects in an arbitrary environment. Part of this task is the visual control of hand movements. In section 4 a hybrid system is presented that combines classical methods and neural networks to extract the hand orientation from real images.

2 Current Trends
Recent years were dominated mainly by contributions showing what problems could be tackled by neural networks in principle. The claim of biological plausibility was gradually abandoned in favour of networks that solved actual problems. Thus, the close relationship of most NN algorithms with classical methods known from statistics is being rediscovered and inspires the analysis of the properties of ANNs as well as improvements in their performance. Currently, we can distinguish five main streams of research:

- faithful modeling of biological neural systems on various levels,
- technical NNs based on and improved by statistical and probabilistic methods,
- optimization of the NN design,
- new learning methods for real-world (real-time) applications,
- solution-oriented hybrid systems.
2.1 Biological neural systems

In former years, the biological plausibility of a NN model was a widely used criterion for judging the "neural" character of an approach. Today, this argument has lost some of its importance, since the opinion is now widespread that if nature needed some architecture or algorithm, it will have found ways to realize it in neural structures. The new challenge is the faithful modeling of biological data. Such models could prove hypotheses about the biological system by synthesis and serve as tools to optimally design and guide further biological experiments. Examples are models explaining synchronization phenomena, networks of spiking neurons, or even models of the behavior of a synapse itself. All these models are limited to one or a few levels within the whole scope of possible description levels, which reaches from membrane processes to high-level vision or language understanding.

2.2 Technical neural networks

Even today, NN research still has a touch of alchemy. We do not know what type (and structure, size, ...) of network should be used for what type of data; we do not even know whether there are rules for this at all. Well-known methods are applied in the context of neural networks to analyze the properties of given data with interpretable results, thus providing a firm basis for further research. Investigations on the use of mutual information to analyze the importance of the input data for the output [16] will help to select optimized feature vectors. Also, probabilistic methods are used to optimize learning procedures wherever possible. The implementation of an on-line version of the EM algorithm for the training of expert networks by Jordan and Jacobs [6] is an example of this. Well-known numerical techniques can be enhanced with adaptive methods to yield a network architecture with great transparency.

2.3 Optimization of the NN design

A high-tech neural network based on the heuristics of its designer - this is no longer a satisfying situation. It should be noted that in most current applications the neural component consists of a single or a small number of "neural modules", usually of moderate size and complexity, and it is amazing that such small systems can already perform the tasks demonstrated so far. This may give us some notion of what performance may be expected if we manage to move artificial neural systems closer to the level of complexity that we observe even in simple biological neural systems. Three basic approaches deal with this problem. Genetic and evolutionary algorithms can be used not only to train networks but also to select and evolve network structures from network populations (for an introduction, cf. [3]). Incremental approaches, like the cascade correlation approach by Fahlman [1], the cascade architecture by Littmann and Ritter [10], or the growing cell structures (GCS) by Fritzke [2], start with very small networks that grow until a certain criterion is met. The opposite approach is represented by pruning methods like the optimal brain surgeon [4], where "unnecessary" weights or nodes are culled from an initially oversized network.

2.4 New learning methods

The overwhelming majority of publications so far is concerned with supervised learning. Real-world applications, however, require on-line learning from few examples, processing of time-varying signals, or predictive control. A special issue of IEEE Transactions on Neural Networks on recurrent networks is about to appear. Autonomous systems need unsupervised learning approaches like reinforcement learning (Q-learning, Dyna-Q). A further key issue are strategies to learn actively, to explore the environment, instead of passively transforming examples into weights. The vast amount of information we receive if we exploit visual input leads to the demand for methods that guide some focus of attention in order to reduce the information to a reasonable amount.

2.5 Solution-oriented hybrid systems

Although many projects aim at applications, the number of neural networks actually being used in the field is rather small. Often, the NN approach does not lead to a substantial performance improvement, so there is no need to abandon established, well-understood techniques. So far, we cannot predict reliably what performance can be expected and what effort must be made to solve a given task with a NN approach. Thus, if we aim at applications for practical use, we must accept that ANNs are employed for only part of a complex task. Since there is a lack of methods to train very large systems, problems must be decomposed into parts, as can be done, e.g., by semantic networks. Wherever explicit knowledge or exact methods are available, they definitely should be used to reduce the complexity of the remaining problems. Whenever the decomposition is too complex or no exact knowledge about the underlying processes is available, ANNs should be taken into account. This requires a close cooperation of experts from various fields.

3 Backpropagation vs. Self-Organizing Maps


The two most important network types in supervised learning are multi-layer perceptrons (MLPs) trained by error backpropagation (or variations thereof), and self-organizing maps with the extensions to learning vector quantization (LVQ) or local linear maps (LLMs).

3.1 Multi-layer perceptrons

Simple perceptrons have been proved to be restricted to the solution of linearly separable problems. These limitations do not apply to feed-forward networks with "hidden" layers between the input and output layer and nonlinear units at least in the hidden layer. With the reinvention of a suitable learning rule by Rumelhart et al. [15], the training of such networks became tractable. The majority of current research projects involves MLPs or related approaches. The layers consist of units that calculate a weighted sum of their inputs, followed, in the case of nonlinear units, by some nonlinear operation. For the adaptation of the weights, a cost function is defined. Using gradient descent, the resulting error can be backpropagated through the network and the weights updated according to the delta rule (for a full account, cf. [5]).

3.2 Local linear maps

LLM networks have been introduced by Ritter [12, 14]. They are related to self-organizing maps [7] and the GRBF approach. The architecture is computationally efficient and learns fast. Basically, it approximates a non-linear transformation by a set of locally valid linear mappings (Fig. 2), where mapping r (r = 1, ..., N, with N the number of "units" of the network) is given by

    y_r = w_r^out + A_r (x - w_r^in).    (1)

Here, x denotes the input vector, w_r^in ∈ R^L and w_r^out ∈ R^M (the "input" and "output" weight vectors) are elements of the input and output space (of dimensionality L and M, respectively), and A_r is an M × L matrix. The output of the network is a weighted superposition of the outputs y_r of the individual maps. In the simplest case, only the contribution of the map s with the minimal distance in the input space is used ("winner-take-all" network). This leads to a tessellation of the input space.
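The winner-take-all evaluation of equation (1) can be sketched in a few lines. This is a minimal illustration, not the original implementation; the array names and shapes (`w_in`, `w_out`, `A`) are assumptions made here for clarity.

```python
import numpy as np

def llm_forward(x, w_in, w_out, A):
    """Winner-take-all LLM output for input x.

    w_in:  (N, L) input reference vectors
    w_out: (N, M) output weight vectors
    A:     (N, M, L) locally valid linear maps
    """
    s = np.argmin(np.linalg.norm(w_in - x, axis=1))  # winner unit s
    return w_out[s] + A[s] @ (x - w_in[s])           # equation (1)
```

The argmin over the input reference vectors realizes the tessellation of the input space: each input is handled by exactly one locally linear map.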
Figure 1: Multi-layer perceptron (bias, input vector, hidden layer, output units)

Figure 2: LLM Network (input space, units, matrix, output space)

To perform the required transformation between input and output space, the necessary values of the network parameters are learned during a training phase. For this purpose, correct input-output pairs (x^(α), y^(α)), α = 1, 2, ..., T, of a set of T training samples are presented repeatedly and in random order. The input and output weight vectors as well as the matrix coefficients are adapted according to the following simple error-correction rules:

    Δw_s^in  = ε1 (x - w_s^in),
    Δw_s^out = ε2 (y - y^net) - A_s Δw_s^in,
    ΔA_s     = ε3 (d_s)^(-1) (y - y^net)(x - w_s^in)^T,

where s denotes the winner unit, y^net the current network output, and d_s the squared distance between x and w_s^in. During this adaptation process, each training step parameter ε_i decays exponentially from a large initial value to a small final value.

3.3 Comparison

The main features of the MLP are the fast recall due to its inherent parallelism and the forming of internal representations in the hidden layer(s). The problems of local minima and slow learning progress have given rise to a variety of partially useful enhancements, like the inclusion of momentum terms or higher-order learning rules such as quickprop. The main problem, however, the optimal structure and learning rate for a given task, is far from being solved. LLMs are very fast to train and quite robust against getting stuck in local minima. This is due to the unsupervised forming of the input quantization and the guaranteed convergence of the linear mappings performed by the matrices connected to the output vectors. The configuration of the input reference vectors reflects the topology of the input space. The main disadvantage is the long search time needed to find the "winner node" or, if more nodes contribute to the output, the calculation of many outputs instead of one.
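The LLM adaptation rules above can be turned into a small training-step sketch. As simplifications not taken from the paper, the learning rates are kept constant rather than decayed, the squared distance d_s is floored to avoid division by zero, and all names are illustrative.

```python
import numpy as np

def llm_train_step(x, y, w_in, w_out, A, eps1=0.05, eps2=0.2, eps3=0.2):
    """One error-correction step on a training pair (x, y); updates arrays in place."""
    s = int(np.argmin(np.linalg.norm(w_in - x, axis=1)))   # winner unit
    y_net = w_out[s] + A[s] @ (x - w_in[s])                # current output, eq. (1)
    dw_in = eps1 * (x - w_in[s])                           # input reference update
    w_in[s] += dw_in
    w_out[s] += eps2 * (y - y_net) - A[s] @ dw_in          # output weight update
    diff = x - w_in[s]
    d_s = max(float(diff @ diff), 1e-9)                    # squared distance, floored
    A[s] += eps3 / d_s * np.outer(y - y_net, diff)         # local linear map update
```

Normalizing the matrix update by d_s makes the correction in output space independent of how far the sample lies from the reference vector, which is one reason the scheme converges quickly.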

4 A hybrid system for hand posture recognition


One of the main fields of application for ANNs is signal processing and interpretation. The recognition and description of objects is one of the main problems in computer vision. Furthermore, this task is one of the key capabilities required for the successful operation of both biological organisms and artificial robots. A key problem in our robot lab is the visual recognition of a hand and the characterization of its orientation. This poses a challenging vision problem whose solution is of great practical interest, for instance to facilitate the control of multi-fingered anthropomorphic manipulators. In the following, a hybrid system for hand posture recognition is presented that includes not only neural networks but also classical tools from image processing, both approaches organized together in a semantic network. The sequence demonstrates how the method, originally limited to computer-rendered or at least presegmented images, is integrated into a system that works with real images.

Figure 3: Possible hand orientations with extreme values for roll and yaw; pitch was fixed at 0°. (roll, yaw) take the values (a) (35°, -55°), (b) (-35°, -55°), (c) (-35°, 0°), (d) (35°, 0°).

4.1 Hand posture recognition

In order to extract the hand orientation from images containing only a hand against a black background, the required mapping from a suitable feature space into the orientation space is learned by an LLM network. The images show hands within a certain range of permitted orientations (Fig. 3). First, the image is preprocessed. After thresholding, a Laplace filter operates on the image and negative filter values are clipped. The resulting image mainly contains the edge information of the original image and forms the basis for a subsequent operation with Gabor filters. These filters are orientation sensitive and well suited to exploit the edge information. We use a 3 × 3 "filter grid" centered at the area of interest, with four different orientations at each grid location. The resulting 36-dimensional feature vector forms the input of the LLM network (see also [11]). At present, this network calculates two of the three relevant angles, roll and yaw, whereas the third (pitch) is constant through all images. While this method yields good results for images with "no" background, it does not work with arbitrary backgrounds.
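The preprocessing chain above (threshold, Laplace filter, clipping, Gabor responses on a 3 × 3 grid with four orientations) can be sketched as follows. The Gabor parameters, the grid placement, and the kernel size are assumptions for illustration; the paper does not specify them.

```python
import numpy as np

def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0):
    """Odd-symmetric Gabor kernel at orientation theta (radians)."""
    r = np.arange(size) - size // 2
    yy, xx = np.meshgrid(r, r, indexing="ij")
    u = xx * np.cos(theta) + yy * np.sin(theta)
    v = -xx * np.sin(theta) + yy * np.cos(theta)
    return np.exp(-(u**2 + v**2) / (2 * sigma**2)) * np.sin(2 * np.pi * u / wavelength)

def hand_features(img, threshold=0.5, ksize=9):
    """36-dim feature vector: 3x3 grid locations x 4 orientations, as in the text."""
    img = np.where(img > threshold, img, 0.0)            # thresholding
    lap = np.zeros_like(img)                             # 4-neighbour Laplace filter
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] - 4 * img[1:-1, 1:-1])
    lap = np.clip(lap, 0.0, None)                        # clip negative filter values
    h, w = lap.shape
    gy = np.linspace(h // 4, 3 * h // 4, 3).astype(int)  # 3x3 grid over the area
    gx = np.linspace(w // 4, 3 * w // 4, 3).astype(int)  # of interest (assumed here)
    feats = []
    for cy in gy:
        for cx in gx:
            patch = lap[cy - ksize // 2: cy + ksize // 2 + 1,
                        cx - ksize // 2: cx + ksize // 2 + 1]
            for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
                feats.append(float(abs(np.sum(patch * gabor_kernel(ksize, theta)))))
    return np.asarray(feats)                             # 9 locations x 4 orientations
```

The resulting 36 values would serve as the input vector of the LLM network described above.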

Figure 4: Complex scenes with a robot hand

Figure 5: One-step skin-color activity map for the image of Fig. 4

Figure 6: Segmented image with instances of the hand-arm complex (each instance is marked by one grey value)

Figure 7: Optimal hand instance

4.2 Hybrid image segmentation

Thus, a segmentation of the hand must precede the orientation recognition. This first part is realized by an LLM network, too. A network is trained to map the local color information in the surround of each pixel onto a real-valued "skin-color activity" value (see Fig. 5 for an example). The training is based on samples from subregions of two of the training images. The 9-dimensional input for each pixel contains the RGB values of the pixel itself as well as the average RGB values within a 3 × 3 and a 5 × 5 neighborhood, thus introducing a moderate amount of spatial context information into the network. In the hybrid approach, the network provides a "skin-color activity map" that forms the basis for the detection of the hand-arm complex with classical and knowledge-based methods. These include first a median filtering to eliminate small regions of activity. Based on the histogram of the filtered skin-color activity map, a threshold is calculated. For the resulting binary image, a region labeling is performed (see Fig. 6). Each labeled region is transformed into a normalized orientation by a Karhunen-Loève transformation. Finally, the region with the smallest distance to a subspace spanned by "eigenhands", derived from a training set of hands, is selected. This method is adopted from the eigenface approach (cf. [17]). Based on general knowledge about the shape and proportions of hands and arms, the hand is extracted from the complex. The resulting region is remapped and represents the area of interest for evaluating the orientation of the hand (see Fig. 7). From this image the hand orientation can be extracted according to the approach described above. With this hybrid system, we can extract the hand region from all 200 images of our database.
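The "eigenhand" selection step, adopted from the eigenface approach [17], amounts to projecting each normalized candidate region onto a PCA subspace and measuring the residual. The following is a generic sketch of that idea on flattened regions; the function names and the number of components are assumptions, not the paper's implementation.

```python
import numpy as np

def eigenhands(train_patches, k=4):
    """PCA basis ('eigenhands') from flattened, normalized training regions."""
    X = np.asarray(train_patches, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centred data yields the principal components as rows of Vt
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def subspace_distance(patch, mean, basis):
    """Residual norm after projecting onto the eigenhand subspace."""
    d = np.asarray(patch, dtype=float) - mean
    return float(np.linalg.norm(d - basis.T @ (basis @ d)))

def best_region(regions, mean, basis):
    """Index of the candidate region closest to the subspace."""
    return int(np.argmin([subspace_distance(r, mean, basis) for r in regions]))
```

A region that resembles the training hands lies close to the subspace and is selected; background regions that merely have skin-like color produce a large residual.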

Figure 8: Complex scene in low resolution

Figure 9: Segmented low resolution image

Figure 10: Candidate regions in low resolution

Figure 11: Segmented high resolution image

4.3 Neural image segmentation

In real images, there are many single pixels with RGB values identical to those of hand pixels. This leads to a noisy segmentation result. Therefore, we chose a two-step process to accelerate and improve the neural segmentation. First, the resolution is decreased by a factor of 10 (Fig. 8). The segmentation of this blurred image with a special network is very fast (Fig. 9). Now only regions with at least four connected pixels are selected as candidate regions. Using a threshold with low sensitivity, we achieve high specificity and obtain almost exclusively hand pixels (albeit not all of them) (Fig. 10). In the second step, we use the candidate regions and their surroundings as masks for the segmentation of the original image (Fig. 11). This yields a significant reduction in the number of pixels to be filtered at full resolution. After this step, the approach yields exactly one (and the correct) hand region for 190 of 200 images. The further steps of the hybrid approach have not been replaced yet, but investigations are under way.

4.4 Lighting calibration

Additionally, the result of the segmentation by the LLM network, as shown in Fig. 5, suffers from the variations in the images introduced by changing lighting conditions, camera quality, and calibration. One possibility to achieve more robustness against such effects is a soft-calibration of the image. The idea is similar to the white calibration of standard cameras. We need a reference table, including white and perhaps colored spots, to be seen in at least one image of each series with constant conditions. This calibration information could then be used to transform the actual RGB values of the image pixels into the virtual RGB values they would have had if the image had been taken under the conditions the neural networks had been trained for. This task is not particularly difficult if many reference points are available. The aim, however, is to find ways to calibrate the images without reference points known in advance. Rather, the calibration should be based on extreme values of color, hue, saturation, or intensity in the images and use these points to parametrize the mapping for the soft-calibration. An architecture that seems particularly suited for this task is the parametrized self-organizing map (PSOM) proposed by Ritter [13].
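The easy case of soft-calibration with known reference points can be sketched as a least-squares fit. This assumes an affine RGB model, which the paper does not commit to; function names are illustrative.

```python
import numpy as np

def fit_calibration(observed, reference):
    """Least-squares affine map from observed RGB to 'virtual' reference RGB.

    observed, reference: (K, 3) arrays of matching colour patches.
    Returns a (3, 4) matrix T with virtual = T @ [r, g, b, 1].
    """
    obs = np.hstack([observed, np.ones((len(observed), 1))])  # homogeneous coords
    X, *_ = np.linalg.lstsq(obs, reference, rcond=None)
    return X.T

def apply_calibration(img, T):
    """Map every pixel of an (H, W, 3) image through the fitted transform."""
    h, w, _ = img.shape
    px = np.hstack([img.reshape(-1, 3), np.ones((h * w, 1))])
    return (px @ T.T).reshape(h, w, 3)
```

With at least four reference patches in general position the affine transform is determined; the harder problem discussed in the text is obtaining such correspondences without a reference table.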

5 Conclusion
The sacrifice of the biological plausibility claim has opened neural network research to a variety of new methods. In particular, well-understood statistical tools have found their way into the design and investigation of neural algorithms and help to improve our understanding of the working principles. Furthermore, the still existing skepticism towards the capabilities of ANNs could be reduced, while the knowledge of neural network experts about similarities to classical approaches has increased. In our application we do not claim any biological plausibility. We have decomposed the problem of the detection of complex objects in real images into subtasks like the recognition of hand orientation, image segmentation, or color calibration. Each subtask employs classical techniques as well as neural networks wherever this appears convenient. With this system, we can actually solve the task under some constraints, but have lost the ease with which any of us can find and track hands in an almost arbitrary environment. The implementation of systems including neural networks to solve actual problems is necessary to justify the research in this field and to investigate the computational power of the algorithms. Nevertheless, the biological model should never be forgotten. Lighting invariance or segmentation are symbolic names for properties that probably do not exist in nature in this distinctness. We will have to learn how to compose and how to train large systems that solve a variety of problems simultaneously. Thus, we might learn to exploit those synergetic effects that our own capabilities are based on.

6 Acknowledgements
This work was supported by the German Ministry of Research and Technology (BMFT), Grant No. ITN9104AO. Any responsibility for the contents of this publication is with the author.

7 References
1. Fahlman, S.E., Lebiere, C. (1989), "The Cascade-Correlation Learning Architecture", in Advances in Neural Information Processing Systems 2, ed. D.S. Touretzky, pp. 524-532, Morgan Kaufmann Publishers, San Mateo, CA.
2. Fritzke, B. (1993), "Growing Cell Structures - A Self-Organizing Network for Unsupervised and Supervised Learning", TR-93-026, ICSI, Berkeley, CA.
3. Goldberg, D.E. (1989), Genetic Algorithms, Addison-Wesley, Redwood City, CA.
4. Hassibi, B., Stork, D.G. (1993), "Second Order Derivatives for Network Pruning: Optimal Brain Surgeon", in Advances in Neural Information Processing Systems 5, eds. C.L. Giles, S.J. Hanson, J.D. Cowan, pp. 188-195, Morgan Kaufmann Publishers, San Mateo, CA.
5. Hertz, J.A., Krogh, A.S., Palmer, R.G. (1991), Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, CA.
6. Jordan, M.I., Jacobs, R.A. (1993), "Hierarchical Mixtures of Experts and the EM Algorithm", TR-9301, MIT Computational Cognitive Science, Cambridge, MA.
7. Kohonen, T. (1984), Self-Organization and Associative Memory, Springer Series in Information Sciences 8, Springer, Heidelberg.
8. Kohonen, T. (1990), "The Self-Organizing Map", Proc. IEEE 78, pp. 1464-1480.
9. Kummert, F., Littmann, E., Meyering, A., Posch, S., Ritter, H., Sagerer, G. (1993), "A Hybrid Approach to Signal Interpretation Using Neural and Semantic Networks", in Proc. 15. DAGM-Symposium 1993, Lübeck, eds. S.J. Pöppl, H. Handels, pp. 245-252, Springer, Heidelberg.
10. Littmann, E., Ritter, H. (1993), "Generalization Abilities of Cascade Network Architectures", in Advances in Neural Information Processing Systems 5, eds. C.L. Giles, S.J. Hanson, J.D. Cowan, pp. 188-195, Morgan Kaufmann Publishers, San Mateo, CA.
11. Meyering, A., Ritter, H. (1992), "Learning 3D-Hand Postures from Perspective Pixel Images", in Artificial Neural Networks 2, eds. I. Aleksander, J. Taylor, pp. 821-824, Elsevier Science Publishers (North-Holland), Amsterdam.
12. Ritter, H. (1991), "Learning with the Self-Organizing Map", in Artificial Neural Networks 1, eds. T. Kohonen, K. Mäkisara, O. Simula, J. Kangas, pp. 357-364, Elsevier Science Publishers (North-Holland), Amsterdam.
13. Ritter, H. (1993), "Parametrized Self-Organizing Maps", in Artificial Neural Networks 3, eds. S. Gielen, B. Kappen, pp. 568-575, Springer, Heidelberg.
14. Ritter, H., Martinetz, T., Schulten, K. (1992), Neural Computation and Self-Organizing Maps, Addison-Wesley, Reading, MA.
15. Rumelhart, D.E., Hinton, G.E., Williams, R.J. (1986), "Learning Representations by Back-Propagating Errors", Nature 323, pp. 533-536.
16. Steil, J. (1993), "Markoffsche Felder und wechselseitige Information in der Bildverarbeitung" (Markov random fields and mutual information in image processing), Diploma thesis, Bielefeld University, FR Germany.
17. Turk, M., Pentland, A. (1991), "Eigenfaces for Recognition", J. Cognitive Neuroscience 3(1), pp. 71-86.
