
2005 IEEE International Workshop on Robots and Human Interactive Communication

Vision-based Mobile Robot Learning and Navigation


Arati Gopalakrishnan, Sheldon Greene
Electrical and Computer Engineering, Tennessee State University, Nashville, TN 37209, USA
agopalakrishnan@tnstate.edu
Ali Sekmen
Computer Science, Tennessee State University, Nashville, TN 37209, USA
asekmen@tnstate.edu

Abstract: This research develops a vision-based learning mechanism for semi-autonomous mobile robot navigation. Laser-based localization, vision-based object detection and recognition, and route-based navigation techniques for a mobile robot have been integrated. Initially, the robot localizes itself in an indoor environment with its laser range finder. A user can then teleoperate the robot and point out objects of interest via a graphical user interface. In addition, the robot can automatically detect potential objects of interest. The objects are recognized automatically by the object recognition system using Neural Networks. If the robot cannot recognize an object, it asks the user to identify it. The user can ask the robot to navigate back autonomously to any object recognized or identified before. The human and robot can interact vocally via an integrated speech recognition and synthesis software component. The completed system has been successfully tested on a Pioneer 3-AT mobile robot.

Index Terms: Human-robot interaction, object recognition, mobile robot navigation

I. INTRODUCTION

Mobile robots have been widely used in application areas such as space missions [1,2], military missions [3,4], personal assistance to humans [5,27], toxic cleansing, entertainment [6,7], and tour guiding [8,28]. Many of these applications require human-robot interaction (HRI) mechanisms and mobile robot navigation techniques [9,10]. HRI has been studied extensively by many research groups and can be categorized as active or passive. In active HRI, a user interacts with a mobile robot via artificial communication tools such as a joystick or a Graphical User Interface (GUI). In passive interaction, the robot enables the user to use more natural means such as speech or gestures; in other words, the human user behaves naturally, as if interacting with another person [11]. Vocal communication for HRI based on speech recognition and synthesis technologies is becoming common [12].

Self-localization is the process by which a robot estimates its own initial pose with respect to a global coordinate system. Landmark-based localization techniques are common in robotics. Active landmarks are beacons that transmit signals to be sensed by the robot; passive landmarks, such as a fire extinguisher or a door opening, are detected by the robot without any transmitted signals. Laser, sonar, and image data may be used separately or fused for this purpose. Probabilistic techniques including Markov and Monte Carlo localization [13,14], occupancy grids [15], Bayesian (belief) networks [16,29], and Kalman filtering [17,18] are also widely used for localization and position tracking. Odometry data are easily accessible for keeping track of a robot's position, but they are not very reliable due to slippage and drift and need to be corrected. Calibration techniques such as UMBmark are used to correct systematic odometry errors caused by physical problems such as unequal wheel diameters or misaligned wheels [19].

Object recognition and localization capabilities are essential for a robot to perform its required tasks. A variety of sensors, including laser, sonar, and vision, are employed, and feature extraction techniques including Neural Networks (NNs) are used for this purpose [20]. Detecting and responding to a stimulus that does not fit the expected perception is called novelty detection and has been used for inspection robots [21,22]. In typical novelty detection, a robot travels in an environment and collects sensory data to train a NN. Self-Organizing Maps (SOMs) can be used for landmark categorization [22].

In this research, a mobile robot builds a representation of its environment during its first navigation, with or without human assistance. The user can interact with the robot via a GUI that provides sensory information such as sonar, laser, and vision data and allows the user to send actuating commands. There are three main modes. In the manual mode, the user determines the objects of interest and they are recognized automatically; if the robot fails to recognize an object, it asks the user for help. The user can interact with the robot vocally via the robot's speech recognition capability.
In the autonomous mode, the robot automatically detects landmarks that have salient features. It then records images of each landmark from different perspectives to train its NN for object recognition. In the semi-autonomous mode, the robot identifies potential objects of interest that are distinguishable in the environment and asks whether the user is interested. After recognizing objects and calculating their locations, the robot may be asked to navigate to any of them. A roadmap (route) over the recognized landmarks is generated for path planning, and fuzzy-logic based navigation behaviors are used to follow it. The initial self-localization is achieved with the laser range finder by detecting a specific combination of doorways. In this project, the systematic odometry errors are also calibrated for more accurate navigation.

This paper is organized as follows: Section II describes the development platform. Section III presents the HRI system. Self-localization and navigation are described in Section IV. The NN-based object recognition system is explained in Section V. The results are presented in Section VI, and conclusions are given in Section VII.

II. DEVELOPMENT PLATFORM

A Pioneer 3-AT (P-3AT) mobile robot produced by ActivMedia is used as the development platform (Fig 1). The robot has 16 sonar sensors, a laser range finder, a pan-tilt-zoom camera, bumpers, and optical encoders. The P-3AT's wheels can reach speeds of 1.6 meters per second and carry a payload of up to 23 kg. Eight front and eight rear sonar transducers are placed on a ring to provide 360-degree coverage. A SICK Laser Measurement System (LMS) 200 laser scanner, which scans its surroundings two-dimensionally, is mounted on the P-3AT. A very small PC (Espresso PC) produced by Cappuccino PC is attached to the top of the robot [23]. This computer weighs around one pound and has all the capabilities of a regular PC. An optional laptop computer can also be used on the robot.

Microsoft Component Object Model (COM) technology [24] is used as the framework for developing programming-language-independent software components for low-level hardware control and mobile robot navigation. An ActiveX DLL was developed for low-level communication with the robot and for the Pan-Tilt-Zoom (PTZ) control of the vision system. Camera image transfer is achieved by ezVideo, an ActiveX control developed by Ray Mercer. Another ActiveX control was used for the laser.
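Because the low-level control is exposed through COM, any COM-aware language can drive the robot through the same components used by the GUI. As a rough illustration only, the Python sketch below drives a hypothetical COM server through the pywin32 package; the ProgID "TSU.PioneerControl" and the method names are invented for this sketch and are not the actual interface of the ActiveX DLL described above.

    # Scripting a (hypothetical) robot-control COM component from Python.
    # "TSU.PioneerControl" and its methods are invented for illustration.
    import win32com.client

    robot = win32com.client.Dispatch("TSU.PioneerControl")  # instantiate the COM object
    robot.Connect("COM1")           # open the serial link to the P-3AT
    robot.EnableMotors(True)        # arm the drive motors
    robot.SetVelocity(300, 0)       # translational mm/s, rotational deg/s
    print(robot.SonarRange(0))      # read the front sonar transducer (mm)
    robot.Disconnect()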

Fig 1. Pioneer with a laptop computer attached.

Fig 2. The graphical user interface for HRI.

III. HUMAN-ROBOT INTERACTION

Both active and passive HRI techniques have been developed in this research. A user can interact with the robot via a GUI as shown in Fig 2. The user can access sonar and laser range data and live camera images and can manually control the robot and the camera head. When an object of interest appears in the live images, the user clicks on it in the video screen. The object is then centered and zoomed for a better view, and the automatic object recognition system is activated to recognize it. If the robot cannot recognize the object, it directly asks the user for help; speech synthesis (text-to-speech) is used for this purpose. The robot then uses its speech recognition capability to understand what the user says. In a typical conversation, the robot says "I am sorry, I cannot recognize the object. Could you tell me what it is?" and the user replies "It is a fire extinguisher." If the robot can recognize the object, it informs the user vocally by saying, for example, "The object has been recognized. It is a trashcan."
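The recognize-or-ask exchange can be summarized as a simple control flow. The sketch below is only an outline of that interaction logic, not the deployed code: speak(), listen(), and recognize_object() are stand-ins for the text-to-speech engine, the speech-recognition engine, and the NN classifier, stubbed here so the flow is runnable on its own, and the confidence threshold is illustrative.

    def speak(text):                 # stand-in for the text-to-speech engine
        print("ROBOT:", text)

    def listen():                    # stand-in for the speech-recognition engine
        return input("USER: ")

    def recognize_object(image):     # stand-in for the NN classifier
        return None, 0.0             # pretend recognition failed

    def handle_object(image, known_objects):
        label, confidence = recognize_object(image)
        if label is not None and confidence > 0.8:           # threshold is illustrative
            speak("The object has been recognized. It is a %s." % label)
            return label
        speak("I am sorry, I cannot recognize the object. Could you tell me what it is?")
        reply = listen().lower()
        for obj in known_objects:                             # keyword spotting in the reply
            if obj in reply:
                return obj
        return None

    print(handle_object(None, ["trashcan", "fire extinguisher", "ball", "cone"]))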


The robot has continuous and library-based speech recognition options. The continuous speech recognition is not as accurate as the library-based one. In this research, an XML-based speech recognition library has been developed; it covers a wide range of objects and commands. A short list is shown in Table 1.
Table 1. Sample speech recognition library.

Command             | Intention
------------------- | --------------------------------------------------------------
Connect to pioneer  | Connect to the robot
Enable motors       | Enable the motors
Localize            | Enable the laser and self-localize
What do you see?    | Take a snapshot and recognize the object
Where is it?        | Calculate the object's distance with respect to the robot's coordinate frame
Go to trashcan      | Determine a roadmap and go to the trashcan
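For illustration, a minimal sketch of how such a library can map recognized utterances to robot intentions, mirroring the phrase-to-intention pairs of Table 1; the simple substring matching shown here is an assumption for this sketch, not the SDK's own grammar mechanism.

    # Keyword-based mapping of recognized utterances to intentions (sketch).
    COMMANDS = {
        "connect to pioneer": "connect_robot",
        "enable motors":      "enable_motors",
        "localize":           "laser_self_localize",
        "what do you see":    "snap_and_recognize",
        "where is it":        "report_object_location",
        "go to":              "plan_route_and_go",      # target object extracted separately
    }

    def interpret(utterance):
        text = utterance.lower()
        for phrase, intention in COMMANDS.items():
            if phrase in text:
                return intention
        return "unknown"

    print(interpret("Could you please go to the trashcan?"))   # -> plan_route_and_go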

The robot may ask the user to help it recognize an object of interest. It then waits to hear a sentence that includes any of the objects from the library. For example, if the user says "the object is a trashcan," "the object I see is a trashcan," or "as far as I see, it is a trashcan," the robot considers the object to be a trashcan. In addition, the user may ask the robot to navigate to an object recognized or identified before; for example, the user may say "please go to the fire extinguisher" or "could you go to the trashcan that I specified before." The robot looks for keywords such as "go to," "trashcan," or "fire extinguisher." In the semi-autonomous mode, the robot detects a distinguishable landmark, zooms in on it, and vocally asks whether the user is interested.

Microsoft Speech SDK 5.1 [25] is integrated into the HRI software developed for this research. The SDK provides speech recognition and text-to-speech engines and an associated Application Programming Interface (API) that can be integrated into the robot software system.

IV. SELF-LOCALIZATION AND NAVIGATION

The robot is initially located somewhere in a hallway and does not know its initial position. It starts following a wall using a fuzzy-logic based wall-following behavior that relies on the sonar and laser sensors. The robot uses a template matching algorithm to detect a specific combination of doorway openings. It then aligns itself with the door and sets that position as its starting location (Fig 3); the thick lines in the figure are the laser sensor readings. After localizing itself, the robot keeps track of its position using odometry. To make navigation more accurate and reliable, the robot's systematic odometry errors were calibrated with UMBmark [19] and their effects were counteracted in software in real time. As the objects of interest (or landmarks) are identified, automatically or with the user's help, a topological representation of the landmarks in the environment is generated. This representation is later used to navigate back to the proximity of the landmarks.
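One plausible way to look for a doorway opening in a laser scan is to search for a contiguous run of range readings that jump well beyond the neighbouring wall distance and whose angular extent corresponds to roughly a door-sized gap. The sketch below only illustrates that idea; the thresholds and the actual doorway template used on the robot are assumptions, not values from this work.

    import math

    def find_door_openings(ranges, angle_step_deg=0.5,
                           wall_jump_mm=800, min_width_mm=700, max_width_mm=1200):
        # Return (start_index, end_index) pairs of candidate doorway openings.
        openings, start = [], None
        for i in range(1, len(ranges)):
            jumped = ranges[i] - ranges[i - 1] > wall_jump_mm     # range suddenly deeper
            dropped = ranges[i - 1] - ranges[i] > wall_jump_mm    # range back to the wall
            if jumped and start is None:
                start = i
            elif dropped and start is not None:
                # approximate the gap width from the angular extent at the wall distance
                width = ranges[start - 1] * math.radians((i - start) * angle_step_deg)
                if min_width_mm <= width <= max_width_mm:
                    openings.append((start, i))
                start = None
        return openings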

Fig 3. Doorway opening detection with the laser range finder scan.

V. OBJECT RECOGNITION

Objects are detected in the images taken by the Canon VC-C4 camera mounted on the robot. A NN-based object recognition system is initially trained with a group of known object types such as ball, cylinder, trashcan, fire extinguisher, cube, and cone (Fig 4). In this technique, a library of images of the objects of interest is created and used for training the NN. There are two approaches to training NNs: supervised and unsupervised [26]. In supervised training, the NN is provided with the desired output (the real object) along with the inputs; in unsupervised training, the NN continuously adjusts itself to the new input. The object recognition system in this research uses supervised NN training, in which both inputs (actual data) and outputs (desired data) are provided.
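A minimal sketch of supervised training for such a classifier is given below, assuming each image has already been reduced to a fixed-length feature vector. The network size, learning rate, and number of epochs are placeholders, not the configuration used on the robot.

    import numpy as np

    def train_mlp(X, y, n_classes, hidden=32, lr=0.1, epochs=500, seed=0):
        # Tiny one-hidden-layer network trained with gradient descent (sketch).
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
        W2 = rng.normal(0, 0.1, (hidden, n_classes)); b2 = np.zeros(n_classes)
        Y = np.eye(n_classes)[y]                       # one-hot desired outputs
        for _ in range(epochs):
            H = np.tanh(X @ W1 + b1)                   # hidden layer activations
            logits = H @ W2 + b2
            P = np.exp(logits - logits.max(axis=1, keepdims=True))
            P /= P.sum(axis=1, keepdims=True)          # softmax class probabilities
            dlogits = (P - Y) / n                      # cross-entropy gradient
            dW2 = H.T @ dlogits; db2 = dlogits.sum(axis=0)
            dH = dlogits @ W2.T * (1 - H ** 2)         # backpropagate through tanh
            dW1 = X.T @ dH; db1 = dH.sum(axis=0)
            W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
        return W1, b1, W2, b2

    def predict(X, W1, b1, W2, b2):
        return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)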

Fig 4. Sample objects of interest (or landmarks).

In this research, three different modes are employed.

Manual Mode: The user simply points at an object of interest in the GUI (live images). The object is centered and zoomed, and a still image is taken.


The image is then processed by the NN. If the NN cannot recognize the object, the robot asks the user to identify it. If the object was not previously in the NN library, several images from different perspectives are taken to expand the library.

Semi-Autonomous Mode: Some potential objects of interest are first detected automatically by the robot. Color-based salient features are extracted to identify these potential objects of interest, and the user is then asked whether he or she is interested in them. If so, several images from different perspectives are taken and the NN library is expanded. In this mode, the user can still interact with the robot as in the manual mode.
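In both modes the camera head must first center the selected object before a still image is taken. A minimal sketch of one way to compute the required pan and tilt corrections from the pixel offset is shown below; the image size and field-of-view values are illustrative assumptions, not the calibration used with the VC-C4 in this work.

    def centering_correction(px, py, img_w=320, img_h=240,
                             hfov_deg=47.5, vfov_deg=36.0):
        # Pan/tilt correction (degrees) that would bring pixel (px, py) to the image centre.
        pan  = (px - img_w / 2.0) / img_w * hfov_deg    # positive = pan right
        tilt = (img_h / 2.0 - py) / img_h * vfov_deg    # positive = tilt up
        return pan, tilt

    # e.g. an object clicked at (250, 80) in a 320x240 frame
    print(centering_correction(250, 80))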

Autonomous Mode: All of the objects of interest are detected automatically by the robot without the user's intervention. In most novelty detection systems, the robot needs to travel within the environment and collect streams of data from various sensors; it then compares its current perception to its prior knowledge of the environment. In this research, the robot extracts salient features in the environment without prior knowledge. For example, the hallway walls are predominantly yellowish, the floor is greyish, and the ceiling is white. The system can distinguish conspicuous objects in the environment, as shown in Fig 5. The image is filtered according to the histograms of the hue and saturation values. An object clustering technique is employed for detecting multiple potential objects of interest, as shown in Fig 6.

After an object is recognized by the object recognition system or identified by the user, its location is estimated. A set of camera calibration operations for a known group of objects was performed prior to the HRI. Therefore, based on the size of the object in an image and the pan-tilt positions of the camera head, the position of the object with respect to the robot can be estimated. For unknown objects, the laser sensor data were also used under certain assumptions. If the robot cannot estimate the position of an unknown object, it asks the user for help.

VI. RESULTS

A software system was developed in Visual Basic to interact with the robot via a GUI and speech technologies. The experimental setup includes a pathway with a few objects of interest (or landmarks) from the image library, a door opening, and a P-3AT mobile robot. The user can remotely log into the robot's computer to control the robot. The robot initially looks for the doorways in the hallway (Fig 7). When the robot finds the door opening, it sets that position as its starting location (Fig 7, Position-1). After localizing itself, the robot continues to explore the environment looking for familiar landmarks. The user can watch the robot's environment through the live images transferred to the GUI. The user can click on an object of interest in the image, and the camera centers and zooms on the object to take a picture. The picture is then used by the object recognition system to recognize and localize the object. Fig 8 illustrates the robot as it moves in the hallway. As it encounters landmarks, they are automatically recognized, and their locations with respect to the robot's coordinates are determined and stored in a database as shown in Table 2. The user may ask the robot at any time to navigate back to any of the recognized objects. In the experiment, the user vocally says "could you please go back to the fire extinguisher" and the robot travels back to the fire extinguisher as shown in Figs 9 and 10.

Fig 5. Detection of a single object of interest: (a) original scene, (b) hue values, (c) segmentation based on hue values, (d) scene after eroding the segmented image.
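A rough sketch of the hue-based filtering illustrated in Figs 5 and 6, using OpenCV (4.x signatures) as a stand-in for the original image-processing code; the hue/saturation band, erosion kernel, and minimum area are illustrative values chosen to separate an object from yellowish walls and a greyish floor, not the thresholds used in this work.

    import cv2
    import numpy as np

    def detect_salient_regions(bgr_image, hue_lo=0, hue_hi=14, min_area=400):
        # Segment conspicuously coloured regions by hue, then clean up by erosion.
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, (hue_lo, 80, 60), (hue_hi, 255, 255))    # hue/sat/val band
        mask = cv2.erode(mask, np.ones((5, 5), np.uint8), iterations=1)  # remove speckle
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]
        return boxes   # (x, y, w, h) for each candidate object of interest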

Fig 6. Detection of multiple objects of interest: (a) original scene, (b) hue values, (c) segmentation based on hue values, (d) scene after morphological operations.
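The size-based position estimate described above can be illustrated with the pinhole model: if an object's true height and the camera's focal length in pixels are known from the prior calibration, its range follows from the apparent pixel height, and the pan angle of the camera head gives its bearing in the robot frame. The values in this sketch are placeholders, not the actual calibration constants.

    import math

    def object_position(pixel_height, true_height_mm, focal_px, pan_deg):
        # Estimate (x, y) of an object in the robot frame from its apparent size.
        distance = true_height_mm * focal_px / pixel_height   # pinhole-model range, mm
        x = distance * math.cos(math.radians(pan_deg))        # forward component
        y = distance * math.sin(math.radians(pan_deg))        # lateral component
        return x, y

    # e.g. a 600 mm tall trashcan appearing 120 px tall with the camera panned 15 degrees
    print(object_position(120, 600, 400, 15))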


The robot first generates a route and then employs its fuzzy-logic based behaviors, including follow-wall, follow-center, and move-to-point. When the robot is in the proximity of a landmark, it corrects its position estimate.
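As a rough illustration of the fuzzy-logic style of these behaviors (the membership functions, rule outputs, and sign convention below are invented for the sketch, not the robot's actual rule base), a follow-wall controller can blend "too close," "ok," and "too far" judgements of the side range reading into a single steering command:

    def tri(x, a, b, c):
        # Triangular membership function.
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def follow_wall_steering(side_range_mm, desired_mm=500):
        # Fuzzy blend of steer-away / straight / steer-toward (deg/s), wall assumed on the right.
        too_close = tri(side_range_mm, 0, desired_mm - 300, desired_mm)
        ok        = tri(side_range_mm, desired_mm - 200, desired_mm, desired_mm + 200)
        too_far   = tri(side_range_mm, desired_mm, desired_mm + 300, desired_mm + 2000)
        w = too_close + ok + too_far
        if w == 0.0:
            return 0.0
        # Rule outputs: steer away (+20), keep straight (0), steer toward the wall (-20)
        return (too_close * 20.0 + ok * 0.0 + too_far * -20.0) / w   # weighted average

    print(follow_wall_steering(350))   # closer than desired -> positive (steer away)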

Fig 7. Path followed by the robot before it localizes itself; "1" marks the position where localization was achieved. {a, b, c, d} represent the objects of interest (landmarks): a - fire extinguisher, b - trashcan, c - ball, d - cone.

Fig 8. Robot finds the fire extinguisher, marked as landmark 2; similarly, 3 - trashcan, 4 - cone, 5 - ball.

Fig 9. Robot navigates back to the fire extinguisher.

Table 2. Objects (or landmarks) recognized and corresponding locations.

Object             | X (mm) | Y (mm) | TH (rad)
------------------ | ------ | ------ | --------
Fire extinguisher  |  1056  |    9   |   0.18
Trashcan           |  2234  |   10   |   0.23
Cone               |  4678  |   17   |   0.29
Ball               |  5100  |   18   |   0.32

Fig 10. Robot at its destination point.

VII. CONCLUSIONS

This paper describes a mobile robot navigation and learning mechanism with interactive human-robot communication. The robot travels in a partially known environment and learns about landmarks with or without interacting with its user. A library of known objects of interest is initially created and then expanded during the learning phase. In this system, the user can interact vocally with the mobile robot as well as through a GUI. Self-localization, odometry correction, fuzzy-logic based navigation behaviors, speech recognition and text-to-speech, and NN-based object recognition and localization systems were integrated and successfully tested.

REFERENCES
[1] S. Laubach and J. Burdick, "RoverBug: an autonomous path-planner for planetary microrovers," 6th International Symposium on Experimental Robotics, Sydney, Australia, March 1999.
[2] M. Snorrasson, J. Norris, and P. Backes, "Vision based obstacle detection and path planning for planetary rovers," Proceedings of SPIE, vol. 3693, presented at the 13th annual AeroSense, Orlando, FL, April 1999.
[3] K. Albekord, A. Watkins, G. Wiens, and N. Fitz-Coy, "Multiple-agent surveillance mission with non-stationary obstacles," Proceedings of the 2004 Florida Conference on Recent Advances in Robotics, Orlando, Florida, May 2004.



[4] M. Marzouqi and R. Jarvis, "Developing robots for covert missions," Monash University, Australia.
[5] K. Kosuge, M. Sato, and N. Kazamura, "Mobile robot helper," Proceedings of the International Conference on Robotics and Automation, pp. 583-588, 2000.
[6] C. Breazeal (Ferrell) and J. Velasquez, "Toward teaching a robot infant using emotive communication acts," Proceedings of the 1998 Simulation of Adaptive Behavior Workshop on Socially Situated Intelligence, Zurich, Switzerland, pp. 25-40, 1998.
[7] P. Stone and M. Veloso, "Towards collaborative and adversarial learning: a case study in robotic soccer," Carnegie Mellon University, Pittsburgh.
[8] D. Schulz, W. Burgard, D. Fox, S. Thrun, and A.B. Cremers, "Web interfaces for mobile robots in public places," IEEE Robotics and Automation Magazine, vol. 7, no. 1, pp. 48-56, March 2000.
[9] A. Stentz and M. Hebert, "A complete navigation system for goal acquisition in unknown environments," Carnegie Mellon University.
[10] H. Jacob and S. Feder, "Adaptive mobile robot navigation and mapping," The International Journal of Robotics Research, vol. 18, no. 7, pp. 650-668, 1999.
[11] A. Sekmen, "Human-robot interaction methodology," Ph.D. Thesis, Vanderbilt University, 2000.
[12] A. Sekmen, A.B. Koku, and S. Sabatto, "Multi-robot cooperation based on vocal communication," Proceedings of the IASTED International Conference on Robotics and Applications, pp. 169-173, Tampa, FL, USA.
[13] D. Fox, W. Burgard, and S. Thrun, "Markov localization for mobile robots in dynamic environments," Journal of Artificial Intelligence Research, vol. 11, pp. 391-427, 1999.
[14] K. Konolige and K. Chou, "Markov localization using correlation," Proceedings of the 16th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 1999.
[15] S. Thrun, "Learning occupancy grids with forward sensor models," Carnegie Mellon University, Pittsburgh.
[16] W. Lam and F. Bacchus, "Learning Bayesian belief networks: an approach based on the MDL principle," Computational Intelligence, vol. 10, no. 3, pp. 269-293, 1999.
[17] E. K. and M. Buehler, "Three-state extended Kalman filter for mobile robot localization," McGill University, Canada.
[18] M. Bertozzi, A. Broggi, A. Fascioli, A. Tibaldi, R. Chapuis, and F. Chausse, "Pedestrian localization and tracking system with Kalman filtering," 2004 IEEE Intelligent Vehicles Symposium, University of Parma, 2004.
[19] J. Borenstein and L. Feng, "Correction of systematic odometry errors in mobile robots," Proceedings of the 1995 International Conference on Intelligent Robots and Systems, Pittsburgh, Pennsylvania, pp. 569-574, August 1995.
[20] B. Ayrulu and B. Barshan, "Neural networks for improved target differentiation and localization with sonar," Neural Networks, vol. 14, no. 3, pp. 355-373, April 2001.
[21] S. Marsland, U. Nehmzow, and J. Shapiro, "On-line novelty detection for autonomous mobile robots," Journal of Robotics and Autonomous Systems, vol. 51, pp. 191-206, 2005.
[22] J. Fleisher and S. Marsland, "Learning to autonomously select landmarks for navigation and communication," 7th International Conference on Simulation of Adaptive Behaviour, Edinburgh, 2002.
[23] Cappuccino PC, http://www.cappuccinopc.com/espressopc.asp
[24] D. Rogerson, Inside COM, Microsoft Press, 1997.
[25] Microsoft Speech SDK 5.1, http://www.microsoft.com/speech/download/sdk51/
[26] Training with neural networks: supervised and unsupervised, http://www.dacs.dtic.mil/techs/neural/neural3.html
[27] M. J. Johnson, E. Guglielmelli, G. A. Di Lauro, C. Laschi, A. Pisetta, G. Giachetti, C. Suppo, Y. Perella, M. C. Carrozza, and P. Dario, "The robotic appliance: the next generation personal assistant?" The 11th International Conference on Advanced Robotics, July 2003.

[28] P. Kittler, "Tour Guide Robot," Final Year Thesis, FEIT, ANU, Canberra, Australia, October 1998.
[29] F.V. Jensen, Bayesian Networks and Decision Graphs, Springer-Verlag, 2001.

