

Introduction

Question: A Multi-Modal Recurrent Neural Network based approach to working memory could show that we can dissolve the frame problem by eliminating search. Or: Multi-Modal Recurrent Neural Networks are a promising approach to answering the question of how we can abstract the mechanisms needed for a Global Workspace to emerge in an artificial agent.
Structure: Explain the frame problem and why attempts to solve it with GOFAI/Bayesian approaches lead to infinite regress. Show that human agents don't suffer from the frame problem (Heideggerian concept of thrownness, Wheeler's situated agents). Propose how MMRNNs could answer the question.

The frame problem

Good old-fashioned A.I. (GOFAI) is the term used to describe the logician's approach to building artificial agents that can reason intelligently. The main idea was that knowledge about a problem, or even about the real world, can be sufficiently represented as a set of logical axioms, and that agents can use these axioms to reason about the state of affairs and to plan actions in order to reach a goal. Initial successes, like the General Problem Solver of Newell and Simon, sparked a lot of interest in the field, and its creators proudly proclaimed that there now exist machines in the world that can think and act like a human being. Newell and Simon also proposed the Symbol System Hypothesis: the idea that the human brain is essentially a symbol processor, i.e. that it manipulates symbols such as logical propositions. They went even further and claimed that intelligence is in fact a search problem, which implies that we just need the right set of fast search algorithms to be able to create an intelligent system. This optimism led to the prediction that intelligence would be solved within thirty years.
However, this turned out not to be true. While the logician's approach has been quite successful in several areas of higher-level intelligence, for instance chess, theorem proving, and mathematics, it struggled heavily with lower-level intelligence such as common-sense reasoning and even everyday activities like fixing a broken lamp in the kitchen. One of the reasons was, and still is, the frame problem. In order to understand the frame problem, we first must take a closer look at the notion of intelligence. Wheeler [1] writes that any system worth the epithet "intelligent" must be able to retrieve from memory just those items or bodies of stored information that are most relevant to its present context, and then decide how to use, update or weight that information in contextually appropriate ways. In some sense this sounds like the search problem of Newell and Simon. First of all, we have the notion of memory, which can be thought of as the knowledge base of the agent. Then, as Wheeler writes, the system must be able to retrieve only relevant content from that knowledge base. So it must access the knowledge base and decide on a case-by-case basis which knowledge is relevant and which is not. Using the retrieved set of knowledge, it must decide which action to take, for instance updating the knowledge base or initiating a motor action. Given a finite, or even infinite, set of actions, this is again a search problem. The system must access the part of the knowledge base which stores potential actions and then, on a case-by-case basis, select the one which it assumes to provide the best utility. One important notion in Wheeler's definition of intelligence is that the agent must do so in contextually appropriate ways. And this is precisely where the frame problem delivers its most demolishing blows to GOFAI. It essentially means that the agent must consider the situation it is in. It obviously doesn't make sense to search through all actions related to cooking when planning the next move in a game of poker. So the programmer must provide a frame of relevance which, according to Dreyfus, confronts the A.I. programmer with three essential problems: (1) which facts are possibly relevant, (2) which facts are actually relevant, and (3) which of those are essential and which are irrelevant. The first problem means that the programmer must include all facts that may affect the outcome of an action. The second problem means that, given the set of possibly relevant facts, a subset must be identified that actually affects the outcome of an action. And the third narrows the set down even more by separating it into essential and inessential facts. It's easy to see that this is a complicated task, for two reasons. First, how can the programmer decide a priori, for every possible action, every possible context, and every possible mixture of contexts, what is relevant and what is not? Many would argue that he cannot, and I agree. So if we assume that the programmer doesn't decide it, the machine must. Which again means it must search, and given that a computer "[...] must treat all facts as possibly relevant at all times" (Dreyfus), it must do so exhaustively. It seems that we are facing a trade-off with two options: we can either deliberately exclude possibly relevant facts or let the machine perform an exhaustive search through a possibly infinite set of facts. Neither option seems satisfying.
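To make the scale of the problem concrete, here is a tiny illustration (the facts are hypothetical placeholders, not a real knowledge base): if the machine must treat every fact as possibly relevant, every subset of its facts is a candidate frame of relevance, and the number of candidates doubles with each fact added.

```python
from itertools import combinations

# Toy "knowledge base": each fact is just a labelled placeholder.
facts = [f"fact_{i}" for i in range(16)]

def candidate_frames(facts):
    """Enumerate every subset of facts that an exhaustive reasoner would
    have to consider as a possible frame of relevance."""
    for r in range(len(facts) + 1):
        yield from combinations(facts, r)

# The number of candidate frames is 2^n: 65536 for a mere 16 facts.
n_frames = sum(1 for _ in candidate_frames(facts))
print(n_frames)  # 65536
```

Even this toy setting suggests why exhaustive relevance checking over a realistically sized, let alone infinite, set of facts is hopeless.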
There have been multiple attempts to solve the frame problem, but because of the word limit imposed on this essay I will only focus on the one most relevant to my question; I refer the reader to [1] for a thorough philosophical exploration of the topic. One attempt to solve the frame problem is the idea of heuristics. Based on the assumption that human agents are not ideally rational but exhibit a phenomenon called bounded rationality, the conclusion was drawn that the human brain uses heuristics to guide the search and to reduce the search space by excluding branches of the search tree. The heuristics are a function of the context the agent is in. But how does the system decide which heuristics are relevant and which are not? It will need meta-heuristics that limit the search space of possibly relevant heuristics. But how does the system decide which meta-heuristics are relevant? It will need meta-meta-heuristics, and so on. It's clear that this leads to an infinite regress, as the frame problem strikes again on each level.
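The regress can be made vivid with a deliberately broken sketch (purely illustrative; the function is invented): a selector that must consult the level above it before it can choose anything never reaches a base case.

```python
def select_heuristic(level, context):
    """Hypothetical selector: to pick a heuristic at this level, it first
    needs a meta-heuristic from the level above to say which heuristics
    are relevant -- so the frame problem reappears at every level."""
    meta = select_heuristic(level + 1, context)  # no base case exists
    return f"level-{level} heuristic chosen via {meta}"

# The regress never bottoms out; Python eventually gives up.
try:
    select_heuristic(0, context={"task": "poker"})
except RecursionError:
    print("infinite regress of meta-heuristics")
```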
Given these insights, I argue that as long as we depend on search, we can't escape the frame problem, and this is where I think Wheeler has found a promising escape route, even though he doesn't mention it explicitly.

Eliminating search

Wheeler's philosophical exploration is built upon Heidegger's concept of thrownness. Thrownness means that a human agent is always embedded in a situation, the environment, and every single part of it, including the human agent, is connected with every other. The human agent isn't just a passive observer who builds a mental model of the world, but is directly embedded in the holistic system of relationships. Accepting thrownness means discarding the dualism of GOFAI and accepting that everything is one huge holistic system with infinitely many connections between its constituents. This notion has wide implications, as it allows us to explain how the human agent is able to act in contextually appropriate ways. Because the human agent is part of the environment from day one, and because his brain is powerful enough to learn causal structure through learning, exploring and making experiences, he merges more and more with the environment. The brain forms itself in such a way that it always roughly knows what to expect from a given situation. The context is not built explicitly after the sense organs forward raw sense data into the brain; it is already interwoven with the raw data itself. More strongly, the human brain can't even escape the context that is provided by the raw sense data, much as a spaceship can't do anything but obey the laws of physics. If we accept this, it is possible to show that we don't need context-sensitive search anymore, as we are always within a context. The environment itself puts us in a meaningful context, and there is no need for additional cognitive processing.
The first question we must answer is how the human brain forms itself with respect to the environment. I want to refer to Brey (2001) and his analysis of Dreyfus' view on neural networks. As cognitive science has shown, the brain is essentially a huge neural network consisting of neurons that are connected to other neurons by synapses. Neurons can be viewed as information processing systems that act upon stimulus input. The input can either come from other neurons or directly from sense organs. Communication is done by electro-chemical signals which are sent to connected neurons, glands, or muscles. Cognitive scientists think that intelligence is a function of the body and of the graph that the neural network forms. Learning is the process of continuous modification of the neural network: some neurons die off, others get strengthened, connections are made and broken, and so on. So learning depends on the sensory input from the environment, and given that we are part of the environment from day one, all our learning is a function of it. The connections that are formed between neurons are formed because of environmental input. Neurons die because they are no longer necessary, and they are no longer necessary because the environment didn't provide input that makes use of them. One may have noticed that I claim that all our learning depends on sensory input, and object that this is not true because we also learn by introspection and self-reflection. My answer to this is that I see thoughts and imagination also as sensory input. If we agree that thoughts (and mental images) are produced by the brain, and not by some Cartesian mental substance, then they are obviously a function of the current brain state. Because the brain state is a product of

the environment, thoughts are also a product of the environment, fed back directly into the nervous system. I consider this feedback loop a kind of sensory input and conclude that therefore all learning depends on the environment. Of course, this feedback loop introduces a whole new dimension, as neurons interact with each other and thus form very complex patterns that subtly alter the current brain state from within.
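This feedback loop can be caricatured in a few lines (random, untrained weights; a sketch of the idea, not a brain model): even when the sense organs are silent, internally generated output re-enters the network as input and keeps the state evolving from within.

```python
import numpy as np

rng = np.random.default_rng(2)
W_in = rng.normal(0, 0.5, (4, 4))    # input pathway into the network
W_out = rng.normal(0, 0.5, (4, 4))   # pathway generating "thoughts"

state = rng.normal(size=4)
external = np.zeros(4)               # the sense organs are silent

trajectory = [state.copy()]
for _ in range(5):
    thought = np.tanh(W_out @ state)              # internally generated signal
    state = np.tanh(W_in @ (external + thought))  # re-enters like sense data
    trajectory.append(state.copy())
```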
So how can context be interwoven with raw sensory data? Assume that an external signal, for instance a sound wave, enters the nervous system through a sense organ, in this case the ear. The intrinsic properties of the sound wave (e.g. amplitude and frequency) will trigger only those neurons that are trained to react to these properties. Some neurons will be triggered more heavily than others. The neurons fire, which means that they release an electro-chemical signal through the synapses to their adjacent neurons, which then again fire, and so on. This cascading effect propagates the signal downstream to the auditory cortex. From there, the nervous system propagates the signal to other areas of the brain, for instance the motor cortex, the amygdala, etc. It's important to see that this cascading effect is a function of the sound wave. My argument is that precisely this cascading effect is the interwoven context that comes with raw sensory data. Meaning therefore arises from the holistic response of the nervous system. Again, the recurrent property of the nervous system becomes very important. At any given time, there are always multiple modalities at play. The agent is seeing something, and the light waves are entering through the eye into the nervous system. The agent is feeling a light breath of wind, and the haptic system forwards this input to the brain. This combination of inputs simultaneously affects the current brain state, and the sum of all interwoven contexts, together with thoughts, forms a context-of-contexts.
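A minimal recurrent sketch (random, untrained weights; the dimensions are arbitrary) makes the point: feed the identical stimulus into the same network twice, but with different prior states, and the cascade differs, because the response is a function of both.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.5, (5, 3))    # sense organ -> network
W_rec = rng.normal(0, 0.5, (5, 5))   # recurrent connections

def respond(state, stimulus):
    """The cascading response depends on the stimulus AND on the current
    state, which carries everything that came before."""
    return np.tanh(W_in @ stimulus + W_rec @ state)

sound = np.array([1.0, 0.2, -0.5])   # identical raw sense data
calm = np.zeros(5)
aroused = np.ones(5)

r_calm = respond(calm, sound)
r_aroused = respond(aroused, sound)
# Same input, different cascade: the context is interwoven with the data,
# not computed by a separate search step afterwards.
```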
As we see, context is just there; there is no escape from context. Hence it is no longer necessary to search some knowledge base in order to select the appropriate kind of action. We can eliminate search from the equation and conclude that the nervous system doesn't suffer from the frame problem. But one question remains open: how does the human agent learn which actions are best given a context? This is a broad question and I will not fully attempt to answer it, but I will give a small account of my view. As we have seen, the human agent is part of the environment from day one. In the beginning, he is a small and helpless child that can't do much besides crying out for help. But as he uses this ability, he soon learns that crying has an effect, namely that the mother comes and feeds him. He will also discover that he gets attention whenever he does certain things and that the reaction differs with respect to his behaviour. He will also discover his sense organs and that they provide feedback from the environment. Gradually the brain becomes more adapted to the environment and learns to interact with the holistic system. During all this time, the nervous system is evolving rapidly by, as explained above, generating and destroying neurons and connections. Good actions will lead to effects that are positive for him, and bad actions cause the opposite. The brain somehow gets trained in such a way that it favours good actions. So isn't it still searching for the right action? No, because actions are a function of brain state, and

brain state is a function of the environment. And the way the brain state is formed is due to experience. Hence actions are deterministically chosen given a situation. There is much more to explore here, and the whole discussion is of huge philosophical interest, especially the question of how it can then be that we act as if we have free will. I'd argue that this is due to the unbelievable complexity of the nervous system and its generative facility for thoughts and mental imagery. But this is out of scope for this essay, and I politely ask the reader to accept my view for the sake of the argument.
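A toy sketch of this view (invented situations and a crude reward rule, not a claim about real neural learning): feedback from the environment shapes the weights, and afterwards the action is read off the state in a single deterministic forward pass, with no enumeration of a knowledge base.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy situations (hungry, tired) as one-hot features, two actions
# (0 = cry, 1 = sleep). The environment rewards the "good" pairing.
situations = np.eye(2)
good_action = [0, 1]

W = np.zeros((2, 2))   # action preferences, shaped only by feedback

def act(s):
    """Deterministic policy: the action is a function of the state."""
    return int(np.argmax(W @ s))

# Crude reward-modulated learning: strengthen whatever was tried when the
# environment responds positively, weaken it otherwise.
for _ in range(200):
    i = int(rng.integers(2))
    a = int(rng.integers(2))               # infant-like trial and error
    reward = 1.0 if a == good_action[i] else -1.0
    W[a] += 0.1 * reward * situations[i]

print([act(s) for s in situations])        # [0, 1]: cry when hungry, sleep when tired
```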

Multi-Modal Recurrent Neural Networks

I am now ready to tackle my hypothesis, namely that Multi-Modal Recurrent Neural Networks are a promising approach to answering the question of how we can abstract the mechanisms needed for a Global Workspace to emerge in an artificial agent. To do this, I first need to explain what the global workspace is. Global workspace theory was first conceived by Bernard Baars in 1988 and is described in his book A Cognitive Theory of Consciousness. It is essentially a model of working memory, the space in the brain which we use to combine sensory inputs with thoughts and mental imagery in order to produce effective action. Working memory holds information only for some 10 to 30 seconds, but plays an essential role in conscious and unconscious processes, as it is able to interact with nearly every part of the brain. It can be seen as the reasoning engine of a logical system, in which algorithms are applied to a set of axioms in order to find new theorems. Cognitive scientists have found that working memory is important for decision making, and this makes it clear why global workspace theory is so important when we attempt to model an intelligent agent. As we've seen above, intelligence occurs whenever an agent chooses a particular action in a context-sensitive way. We have also seen that working memory is the part of the brain where such action is chosen. So it makes sense to conclude that we must model working memory to be able to manufacture an intelligent agent. Global workspace theory is one of these models, so the question arises how we can put context into the global workspace. I have argued earlier that the human nervous system doesn't suffer from the frame problem because it doesn't search: context is always there as a function of the situation. Thus any attempt to model working memory by means of global workspace theory needs to provide context without resorting to search, as search would reintroduce the frame problem and annihilate any attempt at generalised intelligence.
So what do we have at our disposal? In recent years, deep neural networks have made great advances in computer vision, audio recognition and other classification tasks (see blabla). Moreover, deep recurrent neural networks have been used to generate speech and natural-language sentences, and generative deep networks have even been used to create novel art. So there is much potential, and I argue that multi-modal recurrent neural networks are the way forward. The way convolutional neural networks recognise images is very close to the way the eye perceives them, and they work better than any previous attempt, so we are getting closer to a more realistic model of the sense organs. But we need to combine them. Multi-modal means that we must start to combine the different sensory modalities into one big network. For instance, we could use one deep convolutional neural network for images and another one for sound, and combine them into a single larger network. By recurrent I mean that we connect neurons deeper within the network with neurons closer to its periphery by means of interdependent connections. Then the network responsible for audio can affect the network responsible for vision, and vice versa. In 2014, DeepMind presented the Neural Turing Machine, a neural network that can model short-term memory and learn to store and recall sequences. This could be an interesting starting point for modelling the global workspace: multiple modalities can be used as input to the Neural Turing Machine, which acts as the global workspace in order to produce action. The reasoning why I think this is a promising approach
goes as follows: If we manage to create an artificial environment for an artificial agent, with which it can interact and learn from day one, and if we find the means to provide negative and positive feedback with respect to its actions, then this agent will become one with its environment and thereby demonstrate that the notion of thrownness is correct. The multi-modal neural network that forms its brain will develop as a function of the environment, just as a human nervous system is a function of the real world, and the agent will be able to take action. It's important to note that the point of the whole exercise is not to model an agent that has qualia like a human - this is impossible, as its world of experience is totally different - but to show that we can model a global workspace that is able to interact with an environment without having to use search. If this works, then we know that we have solved the frame problem. Of course the question remains how such an environment should be modelled, but I leave that to the A.I. researchers of the future.
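To close, here is how the pieces proposed in this section might fit together in code. This is a deliberately minimal sketch with made-up dimensions and untrained random weights: two modality encoders feed a recurrent core, and the core's state is softly written to and read from a small memory using NTM-style content-based addressing, so "retrieval" is a differentiable blend rather than a symbolic search.

```python
import numpy as np

rng = np.random.default_rng(0)
d_mod, d_core, n_slots = 4, 6, 8

# Hypothetical, untrained weights; a real agent would learn these from
# environmental feedback (and use deep convolutional encoders).
W_audio = rng.normal(0, 0.5, (d_core, d_mod))
W_vision = rng.normal(0, 0.5, (d_core, d_mod))
W_rec = rng.normal(0, 0.5, (d_core, d_core))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_weights(memory, key, beta=5.0):
    """NTM-style content addressing: each slot is weighted by its sharpened
    cosine similarity to the key -- a soft lookup, not a symbolic search."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    return softmax(beta * (memory @ key) / norms)

def step(state, memory, audio, vision):
    """One tick of the 'workspace': both modalities and the previous state
    merge into a new state, which is blended into memory and read back."""
    state = np.tanh(W_audio @ audio + W_vision @ vision + W_rec @ state)
    w = content_weights(memory, state)
    memory = memory + np.outer(w, state)               # soft write
    context = content_weights(memory, state) @ memory  # soft read
    return np.tanh(state + context), memory

state, memory = np.zeros(d_core), np.zeros((n_slots, d_core))
for _ in range(5):
    audio, vision = rng.normal(size=d_mod), rng.normal(size=d_mod)
    state, memory = step(state, memory, audio, vision)
```

Whether such an architecture would actually give rise to a global workspace is exactly the empirical question this essay proposes; the sketch only shows that every step can be phrased without an explicit search.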

References
[1] Michael Wheeler. Cognition in context: phenomenology, situated robotics and the frame problem. International Journal of Philosophical Studies, 16(3):323-349, 2008.
