
Reinforcement Learning

The study of thinking.


[Figure: cognitive processes on a continuum from Low Level to Higher Level:
Sensation and Perception; Memory (Encoding, Retrieval);
Thinking/Cognition (1. Problem-Solving, 2. Reasoning)]

Thinking is a higher-level cognitive process that requires all
sorts of cognitive operations (e.g. attention, perception,
memory, language) and is often a conscious, controlled
process.

Should we wait until we understand the lower-level
processes first? Research in higher-level cognition might
inform research at lower-level cognition and vice versa.
The study of thinking

Modern view:

• Thinking is an internal cognitive process

• The exact nature of these processes cannot be observed
directly from behavior

• However, most cognitive theories lead to testable
predictions. Behavioral experiments can test these
predictions. Cognitive processes are inferred indirectly
from behavior.
Well-defined & Ill-defined Problems

Well-defined problems have completely specified initial
conditions, goals, and operators → works well with computer
simulation

Ill-defined problems have some aspects which are not
completely specified → sometimes require insight to see the
problem in a new way

1. Writing a good paper = ?
2. Solving an algebra problem = ?
3. Conducting a statistical significance test = ?
4. Designing a good experiment = ?
5. Choosing a president = ?
6. Reducing drunk driving = ?
7. Being a nice person = ?
Well-defined problem solving
- given state
- goal state
- obstacles
- operators

[Figures: example problems shown in their INITIAL STATE and GOAL STATE]

Play the game: http://www.mazeworks.com/hanoi/
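
A well-defined problem in this sense can be written down directly as a state space. The sketch below is one possible encoding of the Tower of Hanoi, not taken from the linked game: the tuple representation of the pegs, the names `successors` and `solve_hanoi`, and the choice of breadth-first search are all assumptions made for illustration.

```python
from collections import deque


def successors(state):
    """Operators: move the top disk of one peg onto another peg.

    A state is a tuple of three pegs; each peg is a tuple of disk sizes
    listed from bottom to top. A move is legal only onto an empty peg
    or onto a larger disk.
    """
    for src in range(3):
        if not state[src]:
            continue
        disk = state[src][-1]
        for dst in range(3):
            if dst != src and (not state[dst] or state[dst][-1] > disk):
                pegs = list(state)
                pegs[src] = state[src][:-1]
                pegs[dst] = state[dst] + (disk,)
                yield (src, dst), tuple(pegs)


def solve_hanoi(n_disks=3):
    """Breadth-first search from the given state to the goal state."""
    tower = tuple(range(n_disks, 0, -1))
    start = (tower, (), ())  # given state: all disks on the first peg
    goal = ((), (), tower)   # goal state: all disks on the last peg
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            moves = []
            while parent[state] is not None:
                state, move = parent[state]
                moves.append(move)
            return moves[::-1]
        for move, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, move)
                frontier.append(nxt)
    return None


print(len(solve_hanoi(3)))  # 7 moves, the known minimum 2**3 - 1
```

Because initial conditions, goal, and operators are fully specified, an exhaustive search like this is guaranteed to find a shortest solution, which is what makes such problems convenient for computer simulation.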


Problem solving strategies

How to solve the maze?
- trial and error
- forward
- backward
- means-end analysis
• Most problem solving situations involve a combination
of planning (means-end analysis), trial and error,
reinforcement learning and perhaps ... insight

• Reinforcement learning → grew out of behaviorism
• Insight → Gestaltist view
• Planning → grew out of AI and cognitive psychology
Learning by Reinforcement
Associationist theories of thinking -> thinking as response learning

[Figure: a stimulus S linked to competing responses R1, R2, R3]
Three elements of associationist theory:
1) stimulus: a problem solving situation
2) response: a particular problem solving behavior
3) associations: strength between stimulus and response
Thorndike’s work on cats in a puzzle box

• Cats initially solved the puzzle box problem by trial and
error: they tried various responses until one accidentally
worked
• After being placed in the box many times, a cat learned the
successful response and pulled the string almost
immediately
Habit Family Hierarchy

Try the most dominant response first, then the
second strongest, etc.
1) Law of exercise: practice tends to strengthen the S-R link

2) Law of effect: responses that solve a problem
increase in strength. Responses that do not help
solve the problem lose strength
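
The habit family hierarchy and the law of effect can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Thorndike's own model: the starting strengths, the additive `gain` for a successful response, and the multiplicative `decay` for an unsuccessful one are hypothetical values chosen only to make the effect visible.

```python
def run_trials(strengths, correct, n_trials, gain=1.0, decay=0.8):
    """Simulate repeated trials on one problem.

    On each trial, responses are tried from strongest to weakest
    (the habit family hierarchy). A failed response loses strength and
    the response that solves the problem gains strength (law of effect).
    Returns the number of attempts per trial and the final strengths.
    """
    strengths = dict(strengths)  # work on a copy
    attempts_per_trial = []
    for _ in range(n_trials):
        order = sorted(strengths, key=strengths.get, reverse=True)
        attempts = 0
        for response in order:
            attempts += 1
            if response == correct:
                strengths[response] += gain
                break
            strengths[response] *= decay
        attempts_per_trial.append(attempts)
    return attempts_per_trial, strengths


# Hypothetical hierarchy: R1 starts out dominant, but only R3 works.
attempts, final = run_trials({"R1": 3.0, "R2": 2.0, "R3": 1.0}, "R3", 4)
print(attempts)  # [3, 2, 1, 1]
```

The falling number of attempts mirrors the puzzle-box result: the successful response climbs the hierarchy until it is tried first.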

[Figure: stimulus S with responses R1, R2, R3 differing in associative strength]
• What about response chains?

• E.g.:

[Figure: a response chain leading from start to goal]

• How can the path from initial state to goal state be
strengthened? How to avoid dead-ends?

• How can we reward a successful action that only much
later in time leads to success? → problem of delayed
reinforcement

• Modern reinforcement learning involves passing
strengths of successful responses back through a chain.
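
One common way to pass strength back through a chain is a temporal-difference update, sketched below for a straight chain of states that ends in a rewarded goal. The slides do not name a specific algorithm, so the update rule, the discount `gamma`, and the learning rate `alpha` (set to 1.0 to keep the numbers easy to follow) are assumptions.

```python
def run_episodes(n_episodes, n_states=5, gamma=0.9, alpha=1.0):
    """Walk a chain 0 -> 1 -> ... -> goal repeatedly.

    Only the final step into the goal is rewarded. After each step the
    value of the state just left is moved toward
    reward + gamma * value(next state), so strength leaks backward
    by one link per episode.
    """
    values = [0.0] * n_states
    goal = n_states - 1
    for _ in range(n_episodes):
        for state in range(goal):
            nxt = state + 1
            reward = 1.0 if nxt == goal else 0.0
            target = reward + gamma * values[nxt]
            values[state] += alpha * (target - values[state])
    return values


for episodes in (1, 2, 4):
    print(episodes, run_episodes(episodes))
```

After one episode only the state next to the goal has any strength; after four episodes the first state in the chain has acquired a (discounted) value as well, which is how an early action can be credited for a success that arrives much later.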
Maze example

• Reinforcement learning example for mazes


Reinforcement Learning

• Behavior follows simple associations in response chains.
No planning, no mental maps, no “insight”
• Learning from very simple feedback: failure or success
• Associative strengths between response chains are
learned. Passing strength back in time

[Figure: maze with start and goal]
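
A maze learner of this kind can be sketched with tabular Q-learning. This is an illustrative sketch, not the algorithm used in any particular demo: the maze layout, the reward of 1 on reaching the goal, `gamma`, `epsilon`, and the learning rate of 1 (reasonable here only because the maze is deterministic) are all assumptions.

```python
import random
from collections import defaultdict

# Hypothetical maze: S = start, G = goal, # = wall, . = open cell.
MAZE = ["S.#.",
        ".##.",
        "....",
        ".#.G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    for r, row in enumerate(MAZE):
        if ch in row:
            return (r, row.index(ch))


START, GOAL = find("S"), find("G")


def step(state, action):
    """Move one cell; bumping into a wall or the edge leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        return (r, c)
    return state


def greedy_action(q, state, rng):
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=300, gamma=0.9, epsilon=0.1, max_steps=10_000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # associative strength of each (state, action)
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if state == GOAL:
                break
            explore = rng.random() < epsilon
            action = rng.choice(ACTIONS) if explore else greedy_action(q, state, rng)
            nxt = step(state, action)
            reward = 1.0 if nxt == GOAL else 0.0  # feedback: success or nothing
            future = 0.0 if nxt == GOAL else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] = reward + gamma * future  # pass strength back
            state = nxt
    return q


def greedy_path(q, max_len=50):
    rng = random.Random(0)
    path = [START]
    while path[-1] != GOAL and len(path) < max_len:
        path.append(step(path[-1], greedy_action(q, path[-1], rng)))
    return path


q = train()
print(greedy_path(q))
```

The agent never plans and holds no map: it only stores a strength per (cell, move) pair, yet following the strongest association from the start ends at the goal.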
Demos

Reinforcement learning in mazes:
http://www.ise.pw.edu.pl/~cichosz/rl-java/

Reinforcement learning in robot-arm control:
http://www.fe.dis.titech.ac.jp/~gen/robot/robodemo.html

Robot learning task of pole-balancing and devilsticking:
http://www-clmc.usc.edu/movies/learning.html
Some Amazing Anagrams

Original                    Becomes...
--------------------------  --------------------------
Dormitory                   Dirty Room
Desperation                 A Rope Ends It
The Morse Code              Here Come Dots
Slot Machines               Cash Lost in 'em
Animosity                   Is No Amity
Snooze Alarms               Alas! No More Z's
Alec Guinness               Genuine Class
Semolina                    Is No Meal
The Public Art Galleries    Large Picture Halls, I Bet
A Decimal Point             I'm a Dot in Place
The Earthquakes             That Queer Shake
Eleven plus two             Twelve plus one
Contradiction               Accord not in it

Original: To be or not to be: that is the question, whether tis nobler in the mind to suffer the slings and arrows of outrageous fortune.
Becomes: In one of the Bard's best-thought-of tragedies, our insistent hero, Hamlet, queries on two fronts about how life turns rotten.

Original: "That's one small step for a man, one giant leap for mankind." -- Neil A. Armstrong
Becomes: A thin man ran; makes a large stride; left planet, pins flag on moon! On to Mars!

Stimulus (a new letter combination) → Response

S: gorwn → R1: grown
         → R2: wrong
         → R3: wrgno
         → R4: …
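
In associationist terms, the candidate responses to an anagram are the possible rearrangements of its letters. The sketch below enumerates them for the stimulus above; the two-word `LEXICON` is a hypothetical stand-in for a solver's vocabulary, and the function names are illustrative.

```python
from itertools import permutations

# Hypothetical vocabulary; a real solver would know many more words.
LEXICON = {"grown", "wrong"}


def candidate_responses(letters):
    """All distinct rearrangements of the stimulus letters."""
    return sorted({"".join(p) for p in permutations(letters)})


def solutions(letters, lexicon=LEXICON):
    """Candidate responses that are actual words in the lexicon."""
    return [w for w in candidate_responses(letters) if w in lexicon]


candidates = candidate_responses("gorwn")
print(len(candidates))        # 120 = 5! possible responses
print("wrgno" in candidates)  # True: a possible response, but not a word
print(solutions("gorwn"))     # ['grown', 'wrong']
```

A pure trial-and-error solver would have to work through this response set; which responses come to mind first is what the response hierarchy is meant to capture.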

Anagram solving time depends on:
- familiarity of goal word
- letter transition probability of goal word
- letter transition probability of presented word
- number of moves
Class Experiment

• Replicate effect of familiarity


Ready...?
• nrdki
» (drink 7.0)
• aewtr
» (water 3.0)
• cahtb
» (batch 16.0)
• milbc
» (climb 7.5)
• kcler
» (clerk 17.5)
• rtypa
» (party 14.0)
• huocg
» (cough 23.5)
• rmcap
» (cramp 12.0)
Mean solution times:
High familiarity = 7.9 sec
Low familiarity = 17.3 sec
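
The two reported means can be reproduced from the per-word values in the list. The slides do not say which words count as high or low familiarity; the grouping below is an assumption, chosen because it is the split that yields the reported means, and the values in parentheses are assumed to be solution times in seconds.

```python
from statistics import mean

# Values in parentheses from the slide, read as solution times in seconds.
solution_times = {
    "drink": 7.0, "water": 3.0, "batch": 16.0, "climb": 7.5,
    "clerk": 17.5, "party": 14.0, "cough": 23.5, "cramp": 12.0,
}

# Assumed grouping (not stated on the slide).
high_familiarity = ["drink", "water", "climb", "party"]
low_familiarity = ["batch", "clerk", "cough", "cramp"]

mean_high = mean([solution_times[w] for w in high_familiarity])
mean_low = mean([solution_times[w] for w in low_familiarity])

print(mean_high)  # 7.875, reported on the slide as 7.9 sec
print(mean_low)   # 17.25, reported on the slide as 17.3 sec
```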
• Can all thinking be described by trial and error /
stimulus-response?

• What about insight? → Gestaltist view

• What about planning? → AI view


The Handcuffs Puzzle

The Set-Up: For this puzzle you need two people, some rope and some
empty space to do the puzzle in. Each person will need a piece of rope
with a loop tied in both ends, so it can be worn as handcuffs. The rope
should be reasonably long, so that the person wearing it can easily step
over it if they want.
Each person puts on a complete set of handcuffs. Before putting them
on, they loop their handcuffs around each other so they are tied
together. They then have to get themselves apart while following these
rules:

• The handcuffs cannot be removed.
• Do not break, cut, saw through, bite through or in any other way
damage the rope. Damaging each other is probably a bad idea too.

content copied from: http://ccins.camosun.bc.ca/~jbritton/jbhandcuff.htm
