Towards Combining Data and Process Oriented Models of Language Parsing?

Hielke Prins (hielke.prins@student.uva.nl)
Institute for Interdisciplinary Studies, University of Amsterdam
Abstract ∙ As opposed to data oriented models of language processing, process oriented models
provide a more detailed account of the time steps involved in solving the problems of comprehension
and generation. This paper compares the two approaches to see where they meet. In particular, I
explore the feasibility of implementing Data Oriented Parsing (Bod 1992) within the
information-processing cognitive architecture ACT-R (John R Anderson et al. 2004). I illustrate
the constraints adopted by ACT-R using a number of models from the literature and show why these
prevent a straightforward implementation. I end with a brief discussion of this result.
OUTLINE

Introduction
  Data oriented models
  Process oriented models
  Unification of data and processes
Frameworks
  Data Oriented Parsing models
  The ACT-R framework
  The contours of DOP in ACT-R
Literature
  Grammar representation
  Transmission limitations
  Adaptive transformation
  Linguistic information-processing
Discussion
Conclusion
References

INTRODUCTION

In one of its broadest definitions, cognitive science attempts to understand how agents acquire
and use knowledge about the world to perform various complex tasks (John R Anderson et al. 2004).
Language processing is one of the most astonishing complex tasks that human agents have to deal
with, and modeling it is one of the most difficult ones.

Data oriented models

Models of natural language processing traditionally focus on parsing representations of linguistic
knowledge. They formalize explicit storage and efficient use of linguistic data that is acquired
from realistic corpora of human utterances. Models and human performance are usually compared in
terms of error rates in production and judgments about ambiguity or grammaticality in
comprehension of sentences. Although cognitive plausibility of linguistic models has been a topic
of interest since the inception of theories that assume an innate grammar, performance of the
models has received the most emphasis. Data Oriented Parsing (DOP) models (Bod 1992) aim at
explaining human performance by replicating correct and incorrect parsing using only the data.
Without the requirement of innate rules, they are cognitively more plausible as models of
linguistic data representation.
Figure 1 (panels a to f): An example derivation in DOP using substitution of constituent subtrees.
Shortest derivation will prefer the solution shown here (c-f), even when two smaller subtrees
(a, b) are included in the treebank, because it tries to minimize the number of substitutions
necessary to obtain the result.
However, realistic cognitive models will have to meet a second constraint: it should be possible
to implement the process of handling these representations within the substrate of a human brain.
For this reason continuous models of language processing have been proposed that typically rely on
a neural substrate. The Recurrent Neural Network (RNN) models of grammatical relations (Elman
1991) and the Hierarchical Prediction Networks (HPN) are examples of such continuous neural
models. The latter (Borensztajn et al. 2009) is proposed as a potential neural implementation of
DOP models.

Process oriented models

Outside the linguistic domain, an information-processing approach to human cognition has long been
the most dominant. Within this approach, the focus is not so much on the representations used in
performing a task, but on the functional mechanisms that are proposed to operate on these
representations. In particular, the time needed by each of the mechanisms is usually explicitly
modeled in order to compare model predictions with human reaction times.

As opposed to continuous models, discrete information-processing models of human performance
typically describe a set of separately modifiable processes called modules. Modules aligned in a
chain without temporal overlap form a stage. Using tools like the Additive Factor Method (AFM),
the existence of independent stages can be determined by analyzing the interaction of two or more
experimental factors that only have an influence on the processing times of different modules
(Sternberg 1969b). The serial alignment of modules in a stage thus is one of the basic assumptions
behind the use of additive factor methods. Total reaction time is taken to be the sum of the
processing times of all individual stages. This assumption often needs to be compromised because
many of the functional mechanisms operate in parallel, presumably in different specialized brain
regions. For example, when copy-typing, humans seem to be able to read new material and reproduce
already read material at the same time.

As pointed out by Miller (1988), discreteness and continuity cannot be regarded as a simple
dichotomy. Rather, the individual stages in an information-processing model can be described as
discrete or continuous with respect to the representation of the data they process
(representation), the nature of this processing (transformation) and the way input and output
representations are transferred to the next stage (transmission).

Language processing seems to be an incremental task that can accurately be described in terms of
sequential serial processing using discrete context-dependent representations (Friederici 1995).
Linguistic representations on the level of words and multi-word combinations are discrete by their
nature, although some of the transformations used to process them, such as manipulation of
retrieval and resolving, might be continuous. A cognitively plausible model of language processing
is thus most probably a hybrid model.
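
The additive-factor logic described above for serial stages can be made concrete with a small
numerical sketch. It assumes a hypothetical 2x2 experiment with three strictly serial stages; the
stage names and all durations are invented for illustration and do not come from any of the cited
studies. If each factor influences a different stage, the difference of differences in total
reaction time is zero.

```python
from itertools import product

# Hypothetical stage durations in milliseconds; none of these values come from the paper.
ENCODING_MS = {"clear": 100, "degraded": 140}   # factor A affects the encoding stage only
RESPONSE_MS = {"easy": 200, "hard": 260}        # factor B affects the response stage only
COMPARISON_MS = 150                             # a stage influenced by neither factor


def reaction_time(stage_durations):
    """Total reaction time of strictly serial stages: the sum of their durations."""
    return sum(stage_durations)


def interaction_contrast(rt):
    """Difference of differences in a 2x2 design; zero means the factors are additive."""
    return (rt[("degraded", "hard")] - rt[("degraded", "easy")]) - (
        rt[("clear", "hard")] - rt[("clear", "easy")]
    )


# Reaction time for every combination of the two factor levels.
rt = {
    (a, b): reaction_time([ENCODING_MS[a], COMPARISON_MS, RESPONSE_MS[b]])
    for a, b in product(ENCODING_MS, RESPONSE_MS)
}
print(interaction_contrast(rt))
```

A non-zero contrast in real data would suggest that both factors influence a common stage, or
that the stages overlap in time, which is the compromise mentioned above.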
Unification of data and processes

Adaptive Control of Thought – Rational (ACT-R) is an integrated theory of human cognitive
architecture (John R Anderson et al. 2004) that aims to provide a set of common constraints to a
variety of information-processing models. Similar to the application of DOP models to other
domains of cognition (Bod 2005), cognitive architectures are an attempt to arrive at a unified
theory of cognition. Where the DOP approach to unification leads to the notion of trees as a
Universal Representation (Bod 2002), an ACT-R approach yields universal constraints on processing
and transmitting representations.

Figure 2: Buffers and transmission between them in the ACT-R architecture

In the remainder of this paper I will discuss how aspects from data and process oriented models
might be combined in order to give a more detailed description of possible implementations and to
reproduce a broader range of phenomena in language processing.

First, a brief description of a data oriented and a more process oriented theoretical framework
is given. DOP serves as the former, ACT-R as the latter. Some of the general principles required
to implement aspects of DOP within the ACT-R architecture are stipulated. Finally, I will discuss
some examples from the ACT-R literature that bring data and process oriented approaches closer
together, and whether they do so in a DOP compatible way.

FRAMEWORKS

Data Oriented Parsing models

DOP models of human language processing follow the tradition in linguistic modeling in their
strong focus on representations. DOP models postulate tree representations of linguistic data and
a derivation process that divides and combines these tree representations in subtrees of
arbitrary size (Figure 1). Predictive results explain an interesting range of phenomena seen in
human output.

Furthermore, unsupervised DOP (UDOP) models explain induction of grammatical regularities from
concrete exemplars (Bod 2006). Such a mechanism is necessary to account for language acquisition
when one does not want to assume an innate (universal or language specific) language faculty.

Where the representations are stored in the brain and how the brain implements the postulated
processes of derivation are questions that are currently not answered within the DOP framework.
However, DOP models can provide predictions of reaction times by expressing the time required for
the fastest derivation directly in terms of the frequency of the representations used during that
derivation plus a constant (Bod 2001). Retrieval time for a single subtree representation is then
defined as:

    t_i = \frac{1}{1 + \log(f_i)}    (1)
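
Equation (1) can be evaluated directly. The sketch below reads it as t_i = 1 / (1 + log f_i) and
makes several assumptions that are not stated in the text: the logarithm is taken to be natural,
the time of a derivation is taken to be the sum of the retrieval times of its subtrees plus the
constant, and the function names and example frequencies are hypothetical.

```python
import math


def subtree_retrieval_time(frequency):
    """Retrieval time of one subtree under the assumed reading of equation (1)."""
    if frequency < 1:
        raise ValueError("a subtree must occur at least once in the treebank")
    return 1.0 / (1.0 + math.log(frequency))  # natural logarithm is an assumption


def derivation_time(frequencies, constant=0.0):
    """Assumed combination rule: sum of subtree retrieval times plus a constant."""
    return sum(subtree_retrieval_time(f) for f in frequencies) + constant


# Invented frequencies: a derivation using one frequent subtree is predicted to be
# faster than a derivation that needs two rarer subtrees.
print(derivation_time([500]), derivation_time([40, 12]))
```

Under these assumptions a more frequent subtree is always retrieved faster, but the gain shrinks
logarithmically with frequency.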
An ACT-R model on priming (Reitter et al. 2010), discussed later in this paper, will provide
evidence for the claim that such a direct derivation of reaction times from frequency of
occurrence in a corpus is only valid as an approximation of long-term memory effects. To include
short-term priming effects, DOP models have to be extended by decay and recency functions.¹

¹ The need to account for decay and recency effects is discussed in the homework assignments for
week 2.

The ACT-R framework

As an information-processing framework, the core of the processing constraints postulated by
ACT-R are the buffers specialized in specific cognitive processes. Figure 2 gives an overview of
the ACT-R buffers and the transmission of representations between them.

Like human cognitive agents, a model within the ACT-R framework communicates with the outside
world through visual and manual operations executed by the model upon requests and within
specialized modules. Each module is associated with a buffer that temporarily stores transmitted
representations. Although not of further concern here, this enables models to replicate the whole
process of cognitive processing including eye movements, reading times, key presses or auditory
processing (in an auditory module not shown here).

Declarative and production modules

Most relevant for the task of language processing considered here are the declarative and the
procedural modules. The declarative module stores facts assumed to be known by the model. These
facts are encoded by chunks, a list of attributes with associated values.

Attributes can be considered as slots. Their values are available for replacement with other
chunks. The production module is responsible for this process of assignment and does this in
cycles of three subsequent steps: matching, selection and execution.

Matching entails assignment of chunks currently available in the module buffers to slots of
specialized goal chunks in the goal buffer. Asymmetric rules in the procedural module determine
the conditions that chunks have to meet to be selected depending on the current goal chunk.

The right hand side of a production rule specifies the actions that will be taken when the
conditions are met. For instance, it could request retrieval of another chunk from declarative
memory, replace the current goal chunk or execute a key press by placing a chunk in the motor
buffer. Only one rule can be applied at a time.

Apart from their relevance to the current context described in the goal buffer, selection and
execution of these rules is based on their relative utility.

Subsymbolic processes

Utility of production rules as well as the retrieval probabilities (defined as activation) of
chunks in the declarative module are subject to subsymbolic processes in ACT-R.

Activation A_i of chunks is defined in terms of a base-level activation B_i and the weighted
association with slots in the current goal chunk:

    A_i = B_i + \sum_{j \in C} W_j S_{ji}    (2)

Where W_j is the ratio W_T / n between total goal activation W_T (usually 1 but believed to
reflect individual differences) and the total number of attributes n of the chunk currently in
the goal buffer.

The weighted association with a chunk currently held in the goal buffer is known as the spreading
activation mechanism (J. R Anderson 1983). S_{ji} is estimated by the number of other chunks m
associated with the goal:

    S_{ji} = S - \ln m    (3)
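
Equations (2) and (3) can be combined in a short numerical sketch. It assumes one fan value m per
slot of the goal chunk and treats the base-level activation as a given number; the function
names, the maximum associative strength S = 2.0 and the example values are hypothetical choices
made for illustration only.

```python
import math


def association_strength(max_strength, fan):
    """S_ji = S - ln(m), with m the number of chunks associated with the source."""
    return max_strength - math.log(fan)


def activation(base_level, fans, total_source=1.0, max_strength=2.0):
    """A_i = B_i + sum_j W_j * S_ji, with W_j = W_T / n for the n goal slots.

    fans holds one value m per goal slot that is associated with chunk i.
    """
    n = len(fans)
    if n == 0:
        return base_level  # no context: activation equals the base level
    weight = total_source / n
    return base_level + sum(weight * association_strength(max_strength, m) for m in fans)


# The same chunk receives less of a boost when its cue is shared with many other chunks.
print(activation(0.0, fans=[2]), activation(0.0, fans=[10]))
```

Because the source activation W_T is divided over all goal slots, adding slots or increasing the
fan of a slot both reduce the boost that any single associated chunk receives.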
Base-level activation B_i is defined as:

    B_i = \ln\left( \sum_{j=1}^{n} t_j^{-d} \right)    (4)

Where n is the number of times chunk i is retrieved, t_j the time since the j-th use and d a
decay parameter (usually 0.5).

Activation of chunks determines both the probability (5) and the time needed for retrieval (6).
For a single matching chunk these are:

    p_i = \frac{1}{1 + e^{-(A_i - \tau)/s}}    (5)

    t_i = F e^{-A_i}    (6)

Here s is the noise term and \tau defines the retrieval threshold. A chunk can only be retrieved
when its activation value is above this threshold. Activation in (6) is scaled by a parameter F
to real time, the unit of cost in ACT-R. As in DOP we have to add a constant to the retrieval
time to estimate final reaction time. As might be expected from an information-processing
framework, this constant is explicitly modeled as the time needed to perform the rule's action,
routinely estimated at 50 ms.

Competition

In ACT-R, chunks in declarative memory compete for selection by production rules by means of
their activation. This is a stochastic process that follows from selection of the highest
activated chunk (2), but larger differences in activation compared to other matching chunks
increase the probability of retrieving the target. Over many trials this is approximately
described by the chunk choice equation:

    P_i = \frac{e^{A_i/s}}{\sum_{j} e^{A_j/s}}    (7)

A similar formula (the conflict-resolution equation) exists for production rules, with chunk
activation replaced by rule utility. Utility of a rule is subject to reinforcement learning using
an equation similar to the delta learning rule developed for neural models.

    (1): Chunk activation is the sum of a base-level activation and an associative activation.
    (2): Base-level activation will show the influence of practice and time-based decay.
    (3): Associative activation will depend on how many chunks are associated to a cue.
    (5): Probability of retrieving a chunk is the probability that its noisy match score will be
         above a threshold.
    (6): Latency in retrieving a chunk is an exponential function of its match score.
    (7): Probability of retrieving one chunk over others is the probability that its noisy match
         score will be the largest.

Box 1: Summary of the subsymbolic processing in ACT-R adapted from Anderson (1998). Partial
matching is left out since current implementations instead promote the use of spreading
activation for this purpose.

The contours of DOP in ACT-R

Given the description of chunks and their activation values above, the declarative module in
ACT-R seems to be a natural candidate for storage of the treebank from DOP models. The
subsymbolic processes that handle retrieval of chunks provide a built-in account for decay and
recency effects.

Furthermore, linking chunks by assigning them to each other's attribute slots allows for the
hierarchical representations of data required to represent DOP trees. To stretch this idea to the
limit, consider a hypothetical example of 'data oriented retrieval times' without any decay of
chunk activation.

Without decay (d = 0) the base-level activation (4) simply becomes the natural logarithm of the
number of past retrievals, ln n (8).
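
The subsymbolic quantities in equations (4) to (7) can be sketched numerically as follows. The
sketch reads them as B_i = ln(sum of t_j^(-d)), p_i = 1 / (1 + e^(-(A_i - tau)/s)),
t_i = F e^(-A_i) and P_i = e^(A_i/s) / sum_j e^(A_j/s). It is a plain re-implementation for
illustration, not the ACT-R software itself; the function names and all parameter values in the
usage lines are hypothetical.

```python
import math


def base_level(times_since_use, decay=0.5):
    """Equation (4): B_i = ln(sum over past uses of t_j ** -d)."""
    return math.log(sum(t ** -decay for t in times_since_use))


def retrieval_probability(activation, threshold, noise):
    """Equation (5): logistic function of the distance to the retrieval threshold."""
    return 1.0 / (1.0 + math.exp(-(activation - threshold) / noise))


def retrieval_latency(activation, latency_factor=1.0):
    """Equation (6): latency decreases exponentially with activation."""
    return latency_factor * math.exp(-activation)


def chunk_choice(activations, noise):
    """Equation (7): probability of retrieving each of the competing chunks."""
    weights = [math.exp(a / noise) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]


# Invented usage history: three past uses, 3, 10 and 50 seconds ago.
history = [3.0, 10.0, 50.0]
print(base_level(history))              # with the usual decay of 0.5
print(base_level(history, decay=0.0))   # without decay this equals ln(3)
print(chunk_choice([0.2, 1.0, -0.5], noise=0.4))
```

Setting the decay to zero makes every past use count equally, which is the no-decay case
considered in the text.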
Assuming the context-dependent spreading activation to be a constant c, the retrieval time of a
chunk becomes inversely related to the frequency of retrieved past occurrences (9), albeit
linearly rather than logarithmically as in the case of subtrees in DOP (equation 1):

    B_i = \ln\left( \sum_{j=1}^{n} t_j^{-0} \right) = \ln n    (8)

    t_i = F e^{-(\ln n + c)} = \frac{F e^{-c}}{n}    (9)

Let the goal chunk represent the surface structure of a sentence just read from the screen. The
DOP way to process this sentence using the shortest derivation criterion is to retrieve the
smallest number of subtrees from the treebank that together account for the whole grammatical
structure of the sentence.

A logical next step thus seems to be an implementation of some sort of derivation rule in the
procedural buffer that mimics the behavior of (for instance) shortest derivation. The largest
matching parse trees of labeled chunks that are already in declarative memory would then be
retrieved and be used for parsing the current sentence. With or without decay, activation and
thus reaction times will depend on previous use.

However, when representing (sub)trees of a grammatical structure as trees of chunks in
declarative memory, there are a couple of constraints in ACT-R that prevent models from directly
implementing such an approach. Buffers in ACT-R (including the retrieval buffer used to access
declarative memory) can only store a single chunk at a time and thus operate as a
capacity-limited processing bottleneck.

These constraints are direct consequences of the theoretical commitments ACT-R adopts on
representation and transmission. The next section will discuss some of these constraints and how
currently implemented ACT-R models of various tasks requiring sequence parsing deal with them.

LITERATURE

Grammar representation

An excellent example of the use of the Additive Factor Method mentioned in the introduction is
provided by the Sternberg paradigm. Constraints on representations for an information-processing
model of serial memory search are derived from manipulation of factors that influence reaction
times (Sternberg 1969a). In list memory experiments testing the paradigm, participants learn
relatively short lists of items and then have to judge whether a newly presented item was present
on that list. Reaction times seem to increase linearly with set size (i.e. the length of the list
that is learned).

Various models in the literature explain other effects of factor manipulations on reaction times
in list memory experiments (Raaijmakers & Shiffrin 1992), including ones that try to provide an
integrated account for all of these manipulations. ACT-R too claims to provide the architecture
for an integrated theory of list memory.

A model developed to support this claim (J. R Anderson et al. 1998) correctly predicts various
aspects of list memory tasks, including the linear relationship between set size and reaction
times from the Sternberg paradigm. In doing so, the model adopts a tree representation encoding
the position of items within a group and groups within the list. Both nodes and links within
these trees are encoded as chunks in declarative memory. A context slot in the link chunks keeps
track of the list they represent.

Since a sentence is essentially a serial list of words that have to be remembered during
processing, the tree representation in the list memory models has been extended to the domain of
syntactic and semantic parsing (J.R. Anderson et al. 2001), successfully predicting the results
of six memory retention experiments.

In this model the link between a noun node and
its constituent “dog” in Figure 1e is represented by a chunk like this:

    isa-syntactic-chunk
    child : dog
    parent : N1
    referent : N
    role : Head
    context : S1

Note that such a chunk represents the lexical Context Free Grammar (CFG) rule N → dog. A
different example of a syntax chunk specification, proposed for a computational model of
syntactic priming (Reitter et al. 2010), makes this even clearer:

    isa-syntactic-chunk
    left : S
    combinator : \
    right : NP

Chunks of this kind are used by the model to generate sentences from semantic representations
using Combinatory Categorial Grammar (CCG). Since chunks have subsymbolic properties that
determine retrieval time and probability, they could also be seen as weighted rules forming a
weighted CFG (WCFG) or a probabilistic CFG (PCFG) after normalization to values between 0 and 1
upon retrieval (eq. 5).

CFG rules are a special case of DOP trees with a depth of one. In favor of DOP, which allows
subtrees of arbitrary size, it is argued that context-free rewrite rules do not estimate the
adequate probabilities of a parse (Bod 1993; Remko Scha et al. 1999). They fail precisely because
they do not take context into account when defining probabilities of their productive units
(context-free rules). Despite the analogy with a PCFG, language processing models in ACT-R are
not affected by this line of argumentation because the declarative chunks do not fully specify
the grammar and their probability of retrieval is dependent on the context, as shown in the next
section.

Instead of representing grammatical knowledge as representations in declarative memory similar to
data oriented models like DOP, other models of language comprehension represent grammar using
rules in procedural memory (R. L Lewis & S. Vasishth 2005). Arguments in favor of declarative
representation range from unification between lexical and syntactic knowledge, as in Anderson
(2001) and Reitter (2010), to symmetry between comprehension and generation that is difficult to
obtain with asymmetric production rules (R. L Lewis & S. Vasishth 2005).

Arguments for procedural representation cited by Lewis include empirical evidence that not all
grammatical knowledge is lexicalized (Frazier 1995), the additional costs of declarative memory
retrieval in ACT-R, and neuroimaging data supporting declarative lexical processing and
procedural grammatical processing in separate brain structures (Ullman 2004) previously
associated with corresponding buffers in ACT-R (J. R Anderson 2005). The additional costs of
retrieval from declarative memory are especially important given the constraints on transmission
discussed in the next section.

Another significant advantage of procedural representation of syntactic information, especially
within the ACT-R framework, is that language processing can then be treated as a skill (R. L
Lewis & S. Vasishth 2005), acquired through the symbolic and subsymbolic learning mechanisms for
production rules that have been successful in modeling increasing expertise in other domains
(J. R Anderson 2005).

Either way, grammar representations are chained by production rules into larger complex
structures whose retrieval times and probabilities are not estimated independently but in order
to improve performance of the model as a whole (see Adaptive transformation), countering
extension of the argument provided by Bod and Scha (1992, 1993) against grain size of productive
units in PCFG to ACT-R models.

Transmission limitations

Reitter's choice for CCG is motivated by its ability to account for the syntactic incremental
nature of language generation. An incremental generator selects and adjoins words to the current
syntactic representation as it is produced (Reitter et al. 2010). Because CCG parsing requires
very little buffering, it is compatible with the constraints on transmission adopted by ACT-R.

Early versions of ACT-R provided a goal stack of unbounded depth, free from decay and accessible
without retrieval time costs. Although convenient when implementing conventional parsers in ACT-R
(Ball 2003), recent versions of the framework reject such a representation of the current problem
state for being cognitively implausible. Instead it is argued that goal memory is not different
from other memory systems and goals should be stored in declarative memory (J. R Anderson & S.
Douglass 2001).

As mentioned above, the interface to declarative memory in ACT-R is the retrieval buffer (Figure
2), whose capacity is limited to a single chunk. Unhindered retrieval of the complete tree of
chunks representing the current problem state (i.e. a partial parse of the sentence observed so
far) is thus no longer possible.

The spreading activation mechanism offers a way to work around this constraint by boosting
activation of chunks in declarative memory that are currently relevant to the goal (equation 7),
easing and shortening their retrieval. The context slots of link representations in declarative
memory, proposed for parsing based on list memory (J.R. Anderson et al. 2001), exploit this
mechanism by providing an explicit association between root node (the list or sentence) and its
constituents.

Spreading is limited to a single level of the derivation² and the total amount of source
activation is constant (J. R Anderson et al. 1996):

    \sum_{j \in C} W_j = \text{constant}    (10)

² Note however that not everyone agrees on this depth limit, with the Stroop interference model
(van Maanen & van Rijn 2007) and work by Ball (2003) being notable exceptions.

Thus the more attributes the current goal chunk has and the more declarative chunks it is
associated with, the less effective boosting the relevant goal related chunks becomes.
Ultimately, source activation is no longer a significant contribution in passing the threshold.

A parser implemented in ACT-R therefore is necessarily incremental, as it is severely limited in
access to stored representations of both recent previous processing (working memory of chunks
related to the current goal state) and existing parts of a solution (long-term memory).

Adaptive transformation

The model by Reitter (Reitter et al. 2010) is built to generate natural sentences from lexical
forms, as opposed to comprehension and thus parsing of existing sentences. The model predicts a
number of effects related to priming. Reitter defines priming as the repeated choice for semantic
or syntactic structures, occurring more often than one would expect based on the prior
probabilities of the relevant structures alone (Reitter et al. 2010).

Prior probabilities are in ACT-R reflected by the base-level activation B_i of a chunk (eq. 4)
and give the context independent probability that it matches a production rule. Adaptation of B_i
using equation 4 is called base-level learning. Spreading activation gives the conditional
probability (likelihood) that the chunk is needed given the context (eq. 7). Together they
provide a Bayesian interpretation of adaptive chunk activation (A_i, eq. 2) that is designed to
ensure optimal availability of a chunk when it is needed most (John R. Anderson & Milson 1989).

It might come as no surprise that ACT-R models commonly explain priming in terms of the context
conditional probabilities (i.e. spreading activation), for instance in the Stroop task (van
Maanen & van Rijn 2007). However, the base-level activation accounts for frequency (eq. 8) as
well as recency (eq. 4) effects. If a chunk just used before is retrieved again before full decay
of
activation to base line, these effects add up, unifying short-term priming (based on repeated
retrieval before decay) and long-term priming (based on frequency).

Reitter (2010) combines base-level learning and spreading activation in a single model that
predicts qualitative and quantitative differences between short-term and long-term priming. In
particular, spreading activation accounts for the selective effect of lexical boosts on
short-term priming because it requires the lexical items providing the boost to be in one of the
buffers.

The same interaction between decay in base-level learning and spreading activation has been
proposed to explain results in the probe-digit task (Waugh & Norman 1965), as an alternative to
an explanation based on interference. The probe-digit task is a classical list memory experiment
(see Grammar representation). Participants were offered a list of unique digits and had to recall
the successor of one of them afterwards. They were asked to focus on the currently presented
stimulus without rehearsal.

Interference is believed to increase with the number of candidate retrieval items, while decay is
a time-based effect. The original publication reports an effect of the number of items between
probe digit and retrieval cue but not of the amount of time between them. The amount of time
available for decay was manipulated using two conditions: slow and fast presentation of the
digits. Reanalysis of the results yields a better fit for a model based on ACT-R's base-level
learning equation complemented by mimicking its spreading activation (Altmann & Schunn 2002).

Although Altmann and Schunn rehabilitate the role of decay, they do not abandon interference;
both play a role. Interference in ACT-R arises because of the competition between active chunks
in declarative memory (eq. 7). Spreading activation narrows down the competition to cue relevant
and similar but irrelevant chunks. Early rapid decay in base-level learning likewise selectively
boosts the activation of the most recent chunks. Together they make interference a matter of the
relative decay between recently and currently active chunks that are similar to a common cue.

Subsymbolic continuous transformations of the availability of representations to subsequent
stages in an ACT-R model thus explain priming and interference effects. Furthermore, they are an
important mechanism to deal with the locality and capacity constraints in working memory
illustrated in the previous sections. They do so by increasing retrieval probabilities of
representations that are likely to be necessary given recent and long-term parsing history and
within the context of the current state of processing.

Linguistic information-processing

The elements of decay, activation boosts on retrieval and similarity-based interference are
combined in a proposal for an integrated theory of working memory for cue-based sentence
processing (Richard L. Lewis et al. 2006). It exploits spreading activation to retrieve partial
representations using a cue in the incremental way described above. Cues are not simply the
features of a lexical item (as is the initial simplification by Reitter) but are derived from the
current word and context.

Similarity-based interference provides the model with an explanation for storage load effects
such as dependence of reading time on expectations raised due to yet unfinished grammatical
constructions (still expected verbs, for instance, represented by competing cue-matching argument
chunks in working memory).

Extending probed retrieval in the revised list memory experiments of Altmann and Schunn (2002)
using spreading context activation, decay and interference explain locality and antilocality
effects. That is, they account for the costs of keeping syntactic predictions in memory as a
relation of the distance between the two constituents involved (Gibson 1998), the
constituents being the cue and the retrieval target.

DISCUSSION

This paper compared two frameworks for data oriented (DOP) and processing oriented (ACT-R) models
of language processing. In particular, I tried to investigate whether the constraints of the
latter allow for an implementation of the former.

Although ACT-R and DOP strive for a similar goal of unification (Bod 2005), they do so in
different ways. As a cognitive architecture inspired by information-processing models, ACT-R puts
the emphasis on shared constraints on transmission whereas DOP assumes a common representation.
Does ACT-R provide the functional primitives to implement DOP in a cognitively plausible fashion?

Discussion of its constraints along the three dimensions of discreteness phrased by Miller (1988)
showed that DOP-like representations are in principle available in ACT-R. However, constraints on
discrete transmission following from capacity-limited working memory governed by continuous
availability transformations force the processing models discussed here to construct and retrieve
these representations using an incremental parser.

DOP models are not by themselves incremental since disambiguation depends on evaluation of all
possible derivations of an input sentence in the treebank. However, HPN (Borensztajn et al. 2009)
currently uses a left-corner parser to address the same issues on locality and its compressor
node has a limited capacity.

An ordered set of slots on a compressor node constitutes a production that is matched against
input or previously compressed sequences. A process highly analogous to production rule matching
in ACT-R. Neural implementations of

with the compressor mechanism in HPN.³ Matching in HPN takes place in a multi-dimensional
substitution space in order to allow for graded syntactic category membership that is
bootstrapped from scratch. Borensztajn (2009) argues that the classical notion of a category from
the symbolic paradigm typically requires innate categories that are initially left without
members.

On its symbolic level, acquisition of categories in ACT-R is indeed a discrete process but
subsymbolic processes allow for graded associations of chunks with each other and with production
rules. Moreover, these continuous transformations are governed by a Bayesian learning algorithm
that in principle maximizes usefulness of the acquired corpus of representations, given the
model's environment.

A recent extension of DOP, Darwinised Data Oriented Parsing (DDOP), uses backoff strategies to
bootstrap from an initially empty treebank in an incremental way (Cochran 2009). DDOP replicates
successfully applied representations by reinserting those that were used in generating output in
the treebank, favoring more useful abstract generalizations while competing for the same
substitution sites (since they are likely to be used more often).

In ACT-R chunks compete for retrieval when matching conditions of production rules. Procedural
learning in ACT-R on a symbolic level generally favors instantiation over abstraction to reduce
the amount of expensive retrievals from declarative memory, but like DDOP, it can account for
both.

Existing multiple production rules that consistently fire in sequence may compile into a single
one while encapsulating attribute values from chunks retrieved from declarative memory during the
process. The routing model by Stocco
the procedural buffer that have been proposed, 3 Most notable difference is the use of gates to remain
such as the conditional routing model (Stocco et control over the temporal order of production
al. 2010) or the one in ACTRN (C. Lebiere & J. execution. In ACTRN gate nodes represent specific
R Anderson 2008) presumably share features production rules and their association with the chunks
required in the lefthand side of that rule.
10 / 14
(2010) claims to provide a potential neural solution and an implementation describing
implementation of such production compilations. instantiation on the available substrate. As shown
In the other direction, new rules may be obtained here by the three different approaches, trees fit in
by analogy using examples from working or all of them.4
declarative memory (John R. Anderson &
Fincham 1994).
CONCLUSION
I showed that direct implementation of DOP
within ACTR is hindered by the constraints it
adopts on informationprocessing. The models
from Anderson (2001), Reitter (2010), and Lewis
(2005) I discussed, provide encouraging evidence
that modeling important aspects of human
language processing is feasible within these
constraints. In the discussion I furthermore
showed that other models actually adopt some of
same constraints, either to improve acquisition
capabilities (DDOP) or to increase cognitive
plausibility (HPN).
Implementation within a general cognitive
architecture might be advantageous over a neural
implementation or an evolutionary account for a
number of reasons. First, it allows for integration
with other aspects of cognition in general and
language processing in particular. As a special
case it allows for combining semantic and
syntactic processing (as in Anderson and Reitter)
without formal representation (as in Bonnema et
al. 1997). Second, as opposed to neural models it
provides a transparent symbolic interface to the
processes involved. Third, at least in the ACTR
framework, prediction of reaction times follows
naturally from the inherited discrete nature of
stage models (Sternberg 1969b). Finally,
predictions might eventually extend to other
measures of mental chronometry (Prins 2010) or
brain activity (as in (J. R Anderson 2005).
That said, the different levels of implementations
in HPN, ACTR and DOP do not directly
compete with each other. Marr (1982) famously
identified three separate levels of information
processing: a computation describing the
problem, an algorithm describing the steps in a 4 With apologies to Rens Bod
11 / 14
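To make the interplay of decay and interference referred to in the discussion more concrete, the following minimal Python sketch computes chunk activation from the commonly cited ACT-R equations (base-level learning plus spreading activation, as described in Anderson et al. 2004). It is an illustration under stated assumptions, not the implementation of any model discussed in this paper, and it need not match the notation of the numbered equations earlier in the text. The function names, the parameter values (decay 0.5, maximum associative strength 1.5, latency factor 1.0) and the three example chunks are hypothetical.

```python
import math


def base_level(presentation_times, now, decay=0.5):
    """Base-level activation B = ln(sum_k (now - t_k)^(-d)).

    Assumes every presentation time lies strictly before `now`.
    Recent and frequent presentations yield higher activation (decay).
    """
    return math.log(sum((now - t) ** -decay for t in presentation_times))


def spreading(cue_fans, max_strength=1.5, total_source=1.0):
    """Spreading activation sum_j W_j * S_j with S_j = S - ln(fan_j).

    `cue_fans` lists, for each retrieval cue, how many chunks are
    associated with that cue. A higher fan means more competitors
    and less activation per chunk (interference).
    """
    if not cue_fans:
        return 0.0
    weight = total_source / len(cue_fans)  # source activation split evenly
    return sum(weight * (max_strength - math.log(fan)) for fan in cue_fans)


def activation(presentation_times, now, cue_fans, decay=0.5):
    """Total activation: base level plus spreading (noise omitted)."""
    return base_level(presentation_times, now, decay) + spreading(cue_fans)


def retrieval_latency(act, latency_factor=1.0):
    """Retrieval time F * exp(-A): lower activation means slower retrieval."""
    return latency_factor * math.exp(-act)


# Hypothetical constituents retrieved at time 10 (arbitrary units):
recent = activation([9.0], now=10.0, cue_fans=[1])   # created recently, unique cue
old = activation([1.0], now=10.0, cue_fans=[1])      # created long ago, unique cue
crowded = activation([9.0], now=10.0, cue_fans=[3])  # recent, but cue shared by 3 chunks

for name, act in [("recent", recent), ("old", old), ("crowded", crowded)]:
    print(name, round(act, 3), round(retrieval_latency(act), 3))
```

In this toy setting both a longer distance to the retrieval target (decay) and a cue shared with similar chunks (interference) lower activation and lengthen retrieval, which is the kind of cost the locality accounts above attribute to memory retrieval.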
REFERENCES
Altmann, E.M. & Schunn, C.D., 2002. Integrating decay and interference: A new look at an old
interaction. In Proceedings of the 24th annual conference of the Cognitive Science Society. p.
65–70.
Anderson, John R et al., 2004. An integrated theory of the mind. Psychological Review, 111,
pp.1036–1060.
Anderson, John R. & Fincham, J.M., 1994. Acquisition of procedural skills from examples. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 20(6), pp.1322–1340.
Anderson, John R. & Milson, R., 1989. Human memory: An adaptive perspective. Psychological
Review, 96(4), pp.703–719.
Anderson, J. R, 1983. A spreading activation theory of memory. Journal of verbal learning and verbal
behavior, 22(3), p.261–295.
Anderson, J. R, 2005. Human symbol manipulation within an integrated cognitive architecture.
Cognitive Science: A Multidisciplinary Journal, 29(3), p.313–341.
Anderson, J. R et al., 1998. An integrated theory of list memory. Journal of Memory and Language, 38,
p.341–380.
Anderson, J.R., Budiu, R. & Reder, L.M., 2001. A theory of sentence memory as part of a general
theory of memory. Journal of Memory and Language, 45, pp.337–367.
Anderson, J. R & Douglass, S., 2001. Tower of Hanoi: Evidence for the cost of goal retrieval. Journal
of experimental psychology: learning, memory, and cognition, 27(6), p.1331.
Anderson, J. R, Reder, L.M. & Lebiere, C., 1996. Working memory: Activation limitations on retrieval.
Cognitive Psychology, 30(3), p.221–256.
Ball, J.T., 2003. Beginnings of a language comprehension module in ACT-R 5.0. In Proceedings of
the Fifth International Conference on Cognitive Modeling. p. 231–232.
Bod, R., 2001. Sentence memory: Storage vs. computation of frequent sentences. In Proceedings
CUNY’2001 Conference.
Bod, R., 1992. A computational model of language performance: Data Oriented Parsing. In
Proceedings of the 14th conference on Computational linguistics - Volume 3. p. 855–859.
Bod, R., 2002. A unified model of structural organization in language and music. Journal of Artificial
Intelligence Research, 17(2002), p.289–308.
Bod, R., 2006. Exemplar-based syntax: How to get productivity from examples. Linguistic review,
23(3), p.291.
Bod, R., 2005. Towards Unifying Perception and Cognition: The Ubiquity of Trees, Citeseer.
Bod, R., 1993. Using an annotated corpus as a stochastic grammar. In Proceedings EACL’93.
Bonnema, R., Bod, R. & Scha, R., 1997. A DOP model for semantic interpretation. In Proceedings of
the 35th Annual Meeting of the Association for Computational Linguistics and Eighth
Conference of the European Chapter of the Association for Computational Linguistics. p. 159–
167.
Borensztajn, G., Zuidema, W. & Bod, R., 2009. The hierarchical prediction network: towards a neural
theory of grammar acquisition. In Proceedings of CogSci.
Cochran, D., 2009. Darwinised data-oriented parsing: statistical NLP with added sex and death. In
Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language
Acquisition. CACLA ’09. Stroudsburg, PA, USA: Association for Computational Linguistics, p.
42–50. Available at: http://portal.acm.org/citation.cfm?id=1572461.1572469 [Accessed April 30,
2011].
Elman, J.L., 1991. Distributed representations, simple recurrent networks, and grammatical structure.
Machine Learning, 7(2), p.195–225.
Frazier, L., 1995. Constraint satisfaction as a theory of sentence processing. Journal of Psycholinguistic
Research, 24(6), pp.437–468.
Friederici, A.D., 1995. The Time Course of Syntactic Activation During Language Processing: A
Model Based on Neuropsychological and Neurophysiological Data. Brain and Language, 50(3),
pp.259–281.
Gibson, E., 1998. Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1), p.1–76.
Lebiere, C. & Anderson, J. R, 2008. A connectionist implementation of the ACT-R production system.
Lewis, Richard L., Vasishth, Shravan & Van Dyke, J.A., 2006. Computational principles of working
memory in sentence comprehension. Trends in Cognitive Sciences, 10(10), pp.447–454.
Lewis, R. L & Vasishth, S., 2005. An activation-based model of sentence processing as skilled memory
retrieval. Cognitive Science: A Multidisciplinary Journal, 29(3), p.375–419.
van Maanen, L. & van Rijn, H., 2007. An accumulator model of semantic interference. Cognitive
Systems Research, 8(3), p.174–181.
Marr, D., 1982. Vision: A computational investigation into the human representation and processing of
visual information.
Miller, J., 1988. Discrete and continuous models of human information processing: Theoretical
distinctions and empirical results. Acta Psychologica, 67(3), pp.191–257.
Prins, H., 2010. Comparison between EEG data and ACT-R buffer activity during the Attentional Blink
using Independent Component Analysis. University of Groningen.
Raaijmakers, J.G.W. & Shiffrin, R.M., 1992. Models for recall and recognition. Annual review of
psychology, 43(1), p.205–234.
Reitter, D., Keller, F. & Moore, J.D., 2010. A Computational Cognitive Model of Syntactic Priming.
Cognitive Science.
Scha, R., Bod, R. & Sima’an, K., 1999. A Memory-Based Model of Syntactic Analysis: Data-Oriented
Parsing. Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.23.4139
[Accessed May 2, 2011].
Sternberg, S., 1969a. Memory-scanning: Mental processes revealed by reaction-time experiments.
American Scientist, 57(4), p.421–457.
Sternberg, S., 1969b. The discovery of processing stages: Extensions of Donders’ method. Acta
psychologica, 30, p.276–315.
Stocco, A., Lebiere, C. & Anderson, J. R, 2010. Conditional routing of information to the cortex: A
model of the basal ganglia’s role in cognitive coordination. Psychological review, 117(2), p.540–
574.
Ullman, M.T., 2004. Contributions of memory circuits to language: The declarative/procedural model.
Cognition, 92(1–2), p.231–270.
Waugh, N.C. & Norman, D.A., 1965. Primary memory. Psychological review, 72(2), p.89.