
LEARNING FROM EXPERIENCE

Machine learning is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as sensor data or databases. A learner can take advantage of examples to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviours given all possible inputs is too large to be covered by the set of observed examples or training data. Hence the learner must generalize from the given examples so as to be able to produce useful output in new cases.

Learning serves to improve behaviour and performance:
- learn to perform new tasks (more)
- increase ability on existing tasks (better)
- increase speed on existing tasks (faster)

and to produce and increase knowledge:
- formulate explicit concept descriptions
- formulate explicit rules
- discover regularities in data
- discover the way the world behaves

Overall, machine learning promotes the autonomy of agents. A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. For example, a computer program that learns to play checkers might improve its performance, as measured by its ability to win games of checkers, through experience obtained by playing games against itself.

2.1 Requirements for Learning by Experience

To obtain a solution with the learning-by-experience model in a single agent or robot, we need to define: a set of predefined states or situations in which the agent can act, a set of possible actions that the agent can perform, and a way of modifying the parameters of action selection as actions prove good or bad in certain situations. The same set is necessary, and sufficient, when using multi-agent or multi-robot systems. It is important to note that although agents need a predefined set of states or situations, abstraction is applied so that any real situation in the environment is mapped onto one of the predefined states. Generally, all actions are tried in all situations during the training process in order to verify whether each action is worth using.
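As a rough illustration of this state abstraction, the sketch below maps continuous sensor readings onto a small set of predefined states; the state names, thresholds and actions are hypothetical and not taken from any particular system.

```python
# A minimal sketch of state abstraction: continuous readings from the
# environment are mapped onto a small, predefined set of states, so the
# agent only has to learn action choices for a finite state set.

def abstract_state(distance_to_enemy: float, has_cover: bool) -> str:
    """Map a raw situation onto one of a few predefined states (illustrative)."""
    if distance_to_enemy < 5.0:
        return "enemy_close_covered" if has_cover else "enemy_close_exposed"
    elif distance_to_enemy < 20.0:
        return "enemy_near"
    return "enemy_far"

# Every real situation, however varied, falls into one of these states.
STATES = ["enemy_close_covered", "enemy_close_exposed", "enemy_near", "enemy_far"]
ACTIONS = ["hide", "advance", "shoot", "retreat"]
```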

Reinforcement Learning - the learning machine performs some action on the environment and gets a feedback response from it. The learning system grades its action as good (rewarding) or bad (punishable) based on the environmental response and adjusts its parameters accordingly. Generally, parameter adjustment continues until an equilibrium state is reached, after which there are no further changes in the parameters. Self-organizing neural learning may be categorized under this type of learning.

Goal: maximize cumulative discounted rewards and learn the optimal state-action value function Q*(s, a). Strategies are learned through interaction with the environment:
- try out actions to get an estimate of Q
- explore new actions, exploit good actions
- improve the currently learned policies
- use one of various learning algorithms, e.g. Q-Learning or SARSA
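A minimal tabular Q-learning sketch along these lines is shown below; the environment interface (reset() and step() returning the next state, reward and a done flag) is an assumed, illustrative API rather than a specific library.

```python
import random
from collections import defaultdict

# Tabular Q-learning with epsilon-greedy exploration.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate

def q_learning(env, actions, episodes=1000):
    Q = defaultdict(float)              # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Explore new actions with probability epsilon, otherwise exploit.
            if random.random() < EPSILON:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Q-learning update: move Q(s, a) towards reward + gamma * max_a' Q(s', a').
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state
    return Q
```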

2.2 Learning Element: A Paintball Game

A learning agent is composed of a few fundamental parts: a learning element, a performance element, a curiosity element or 'problem generator', and a performance analyser or 'critic'. The learning element is the part of the agent which modifies the agent's behaviour and creates improvements.


The performance element is responsible for choosing external actions based on the percepts it has received, where percepts are the information known by the agent about its environment. To illustrate this, consider that one of our agents is in the woods playing paintball. He is aware of an opposing paintballer nearby. This is the percept that the agent responds to, by selecting an action: moving behind a tree. This choice of action is made by the performance element. The performance analyser judges the performance of the agent against some suitable performance measure, which in this case could be how close the agent is to being hit by the enemy or how many enemies have been hit. The performance must be judged on the same percepts as those received by the performance element, that is, the state of affairs 'known' to the agent. Once the analysis of performance has been made, the agent must decide whether a better performance could be achieved in the future under the same circumstances. This decision is then passed to the learning element, which decides on the appropriate alteration to future behaviour and modifies the performance element accordingly.

To make sure that the agent advances in its learning and does not merely confine itself to previously observed behaviour, the curiosity element, which has knowledge of the desirable behaviour of the agent (for example, it knows that being shot is not desirable and that finding the opponent's flag is), steps in. To achieve optimal performance, this element poses new challenges to the agent in an attempt to prevent bad habits from developing. To understand the benefits of this, consider a paintballer who is hiding behind a tree. From his past experience, he knows that he is safe to stay where he is, and this would result in an adequate performance. However, the curiosity element kicks in and suggests that he break from his cover and head to a nearby tree which is closer to the enemy flag. This may result in the agent being shot at, but could also achieve a more desirable goal. It is then up to the performance analyser and the learning element to consider whether there is a benefit to this change in strategy.
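The division of labour between these four components can be sketched roughly as follows; the class and method names are illustrative assumptions that mirror the paintball example rather than a standard agent API.

```python
# Illustrative sketch of the four components of a learning agent.

class PerformanceElement:
    """Chooses external actions from percepts using the current policy."""
    def __init__(self):
        self.policy = {}                        # percept -> action

    def choose_action(self, percept):
        return self.policy.get(percept, "move_behind_tree")


class Critic:
    """Performance analyser: scores behaviour against a performance measure."""
    def evaluate(self, outcome):
        # e.g. enemies hit minus the number of times the agent itself was hit
        return outcome["enemies_hit"] - outcome["times_hit"]


class ProblemGenerator:
    """Curiosity element: occasionally proposes exploratory behaviour."""
    def suggest(self, percept):
        return "advance_to_flag" if percept == "enemy_nearby" else None


class LearningElement:
    """Modifies the performance element in response to the critic's score."""
    def update(self, performance_element, percept, action, score):
        if score > 0:                           # keep behaviour that worked
            performance_element.policy[percept] = action
```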


2.3 Problems with Learning

Despite the obvious potential that learning has to offer the gaming world, it must be used carefully to avoid certain pitfalls. Here are but a few of the problems commonly encountered when constructing a Learning AI:

Mimicking Stupidity - When an AI is taught by copying a human player's strategy, the computer can end up being taught badly. This is most likely when the player is unfamiliar with a game. In this situation, a reset function may be required to bring the AI player back to its initial state, or else a minimum level must be imposed on the computer player to prevent its performance dropping below a predetermined standard.
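The two safeguards mentioned above could look roughly like the following sketch; the threshold, attribute names and reset logic are hypothetical illustrations rather than any specific engine's API.

```python
# Hypothetical safeguard against a badly taught agent: reset it, or
# enforce a minimum performance floor.

MIN_WIN_RATE = 0.3                          # assumed minimum acceptable performance

def check_learning_quality(agent, recent_win_rate, baseline_policy):
    """Reset or constrain the agent if its learned play falls below the floor."""
    if recent_win_rate < MIN_WIN_RATE:
        # Option 1: discard what was learned from the bad teacher.
        agent.policy = dict(agent.initial_policy)
        # Option 2 (alternative): fall back to a hand-authored baseline.
        # agent.policy = dict(baseline_policy)
```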

Overfitting - This can occur if an AI agent is taught a certain section of a game and then expected to display intelligent behaviour based on that experience. An agent which has learnt from its experience over one level will encounter problems when attempting a new level, as it may not have learnt the correct 'lessons' from its performance. If it has found that, when opening doors, it has been able to escape the line of fire by diving behind a wall to its left, it will assume that this is a generally applicable tactic. This could lead to amusing behavioural defects if not monitored in the correct way.

Local Optimality - When choosing a parameter on which the agent is to base its learning, be sure to choose one which has no dependency on earlier actions. As an example, take a snowboarding game. The agent learns, through the use of an optimization algorithm, the best course to take down the slope, using its rotation as a parameter. This may mean that a non-optimal solution is reached, in which no small change can improve performance. Think about the data being stored: a sequence of rotations, clockwise and anticlockwise. An alteration to a rotation in the first half of the run might lead to a better time over the course in the long run, but in the short run it could cause a horrific crash further down the slope, as the rest of the rotations are now slightly off course.
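The trap can be illustrated with a simple hill-climbing sketch: because only changes that immediately improve the run time are accepted, an early rotation change that would pay off later is always rejected, and the search can stall at a local optimum. The course representation and timing function here are assumptions for illustration.

```python
import random

def hill_climb(rotations, run_time, iterations=1000):
    """Greedy local search over a sequence of rotation angles (degrees)."""
    best = list(rotations)
    best_time = run_time(best)
    for _ in range(iterations):
        candidate = list(best)
        i = random.randrange(len(candidate))
        candidate[i] += random.choice([-5, 5])   # tweak one rotation slightly
        t = run_time(candidate)
        if t < best_time:                        # only accept immediate improvements
            best, best_time = candidate, t
    return best, best_time                       # may be a local, not global, optimum
```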

Set Behaviour - The question here is: once an agent has a record of its past behaviour and the resulting performance analysis, does it stick to the behaviour which has been successful in the past, or does it try new methods in an attempt to improve? This problem must be addressed, or else an agent may either try to evaluate every possible behaviour or stick to one without ever finding the optimal solution.
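A common compromise is to explore heavily at first and gradually favour behaviour that has proven successful, for example by decaying an exploration rate over time; the schedule and function names in the sketch below are illustrative assumptions rather than a prescribed method.

```python
import random

def select_action(q_values, actions, episode, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Epsilon-greedy choice with an exploration rate that shrinks over episodes."""
    epsilon = max(eps_end, eps_start * (decay ** episode))
    if random.random() < epsilon:
        return random.choice(actions)                          # try something new
    return max(actions, key=lambda a: q_values.get(a, 0.0))    # stick with what has worked
```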

2.4 Application in the Real World

RC Copter iZep - The helicopter apprenticeship at Stanford University seems to contain some of the same elements of unrealism as magic: helicopters learn to fly and execute complex airborne tricks without a human pilot in the cockpit. But there is science behind the magic. Computer scientists at Stanford University have developed an AI system that learns to perform in-flight manoeuvres by watching human-operated helicopters do the same. Instead of having computer programmers, laymen when it comes to piloting aircraft, labour away translating exact flying instructions into code, the helicopters learn from human experts, in this case radio-control pilot Garett Oku. The scientists had Oku perform a complete air show multiple times over while they recorded every movement of the helicopter. They then translated those movements into an algorithm that they could "teach" to the robotic helicopter. Numbers, unlike people, don't vary, meaning the robotic helicopter could perform the manoeuvres more reliably than Oku.

The recap of a recent test run in Palo Alto sounds like an intricate ballet: the robotic helicopter performed a five-minute air show, replete with travelling flips, rolls, loops with pirouettes, stall-turns with pirouettes, a knife-edge, an Immelmann, a slapper, an inverted tail slide, and something called a "hurricane." The helicopter was even able to execute the "tic toc," a move in which it hovers side-to-side like a pendulum while its nose points directly upward. Not only is this more than any other robotic helicopter has been able to do previously, it also exceeds the capabilities of a full-scale, human-piloted helicopter.

Though the robot's tricks may seem effortless, there is a lot of work being done behind the scenes and a lot of risk. A helicopter is an inherently unstable machine: left without instructions for even a moment, it will tip and crash. This makes the computer code extremely complicated to write. Preliminary attempts worked fine for simple moves but couldn't handle the hurricane. Having the robot copy the exact movements of the joystick when operated by Oku didn't work either, since that method couldn't account for unpredictable variables like wind gusts. Eventually, teasing the perfect algorithm from an expert radio-control pilot was the solution.

The robotic helicopter, painted in Stanford colours, is a juiced-up version of a standard RC copter, carrying accelerometers, gyroscopes and magnetometers. With these, as well as instruments on the ground, the position, direction, orientation, velocity, acceleration and spin of the helicopter are continuously monitored throughout flight. New flight directions are sent from a lightning-fast ground-based computer to the helicopter via radio at a rate of 20 times per second. Future applications will likely place these auto-copters straight into the danger zone, searching for landmines in war zones or informing California firefighters of susceptible wildfire spots in real time, so they don't have to rely on information that can be several hours old, as they often do now.
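The ground-based control loop described above might be sketched roughly as follows; the sensor and radio interfaces, the proportional gain and the trajectory representation are hypothetical simplifications of the actual Stanford system.

```python
import time

# Sketch of a 20 Hz control loop: compare the helicopter's monitored state
# against a reference trajectory learned from expert demonstrations and
# radio corrections back to the aircraft.

CONTROL_RATE_HZ = 20
GAIN = 0.5                                   # assumed proportional correction gain

def control_loop(sensors, radio, reference_trajectory):
    period = 1.0 / CONTROL_RATE_HZ
    for target_state in reference_trajectory:        # one target per control tick
        measured = sensors.read()                    # position, orientation, velocity, ...
        # Simple proportional correction towards the learned trajectory;
        # the real system uses a far more sophisticated learned controller.
        command = {k: GAIN * (target_state[k] - measured[k]) for k in target_state}
        radio.send(command)
        time.sleep(period)
```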

