1. Introduction
This is a paper on grammatical formalism. However, it differs in purpose
from two important papers on grammatical formalism: Peters and Ritchie (1973)
and Ginsburg and Partee (1969). Each of these papers presented a formalism that
would be wide enough in scope to permit most then countenanced syntactic
theories to be represented. In effect, these papers were presenting a precise
scientific language for syntactic theories to make use of. This is not our purpose.
In this paper we are attempting to present a particular theory of syntax in a precise
way. Many of the operations describable within other theories cannot be expressed
within this theory. However, the converse does not appear to be true. In this way
our theory is very restrictive. The class of formal objects that can serve as
transformational rules is narrower than that in any theory we know of. It is vastly
narrower than that in the above-mentioned formalizations, particularly with
respect to allowable structural descriptions.
There are several reasons for attempting to construct a very restrictive
theory. The first is, simply, that the "best" theory is the most falsifiable theory
(Popper (1959)). This means that in the absence of strong evidence falsifying
a particular linguistic theory, if that theory predicts the occurrence of fewer
grammar-like formal objects than another theory, the former must be preferred
to the latter. The first theory is making claims that are easier to prove false, and
Copyright by Walter de Gruyter & Co.
174 Howard Lasnik and Joseph J. Kupin
as long as those claims are not falsified, it is a better theory. Appropriate counter-
evidence would consist of well-documented, highly productive phenomena which
cannot be accounted for within the theory. Even such counterevidence should
not lead to the abandonment of all restrictions, however, but to the search for a
well-motivated minimal "enrichment" of the theory to allow description of the
phenomenon. The more restrictive theory should not be abandoned until that
minimally more powerful theory is found.
The second reason for positing a restrictive theory confronts the question
of language acquisition. We follow Chomsky (1965) in the belief that children
acquire their grammar from an environment that seriously underdetermines it, and
that some evaluation metric is employed to select the appropriate grammar for any
particular language. Certainly if the class of possible grammars is smaller, the
evaluation task becomes simpler. By restricting the class of allowable grammars,
we thus approach an explanation of how language can be acquired.
There is a second, less important, difference between this paper and the two
works mentioned above. Each of these treated equally all of the following phe-
nomena, discussed in Emonds (1970): root transformations, cyclic rules, minor
movement rules and agreement (feature copying) rules. Here we will further the
general program of Chomsky (1972) of distinguishing rules by their formal and
functional characteristics and positing as many grammatical components as neces-
sary to account for the formal constraints. As Emonds suggests, the above classes
of phenomena are formally distinct, and we feel that each should be assigned to a
different component of the grammar. In this paper we will be concerned only with
the cyclic transformational phenomena. We will attempt to present the most
restrictive theory that has any hope of accounting for these phenomena.
Section two of this paper explains the definitions and constructs that are
needed for our analysis. Section three presents a detailed example from English
in which the definitions are used to construct a derivation, and our concluding
comments are in section four.
By the notation {±A₁, ±A₂, ..., ±Aₙ} we mean a collection of sets in which each set
contains +Aᵢ or −Aᵢ (but not both) for each i, 1 ≤ i ≤ n. That is,

{±A₁, ±A₂, ...} =def {{+A₁, +A₂, ...}, {+A₁, −A₂, ...}, ..., {−A₁, −A₂, ...}}
² Vₙ is not a non-terminal vocabulary as defined in Chomsky (1959) p. 129, axiom #2:
"A ∈ Vₙ iff there are φ, ψ, ω such that φAψ → φωψ." Vₙ as we use it here is the closest
analog in the transformational component to the set defined by that axiom, which is
appropriate for a base component. We will extend the conventions of Chomsky (1959)
to Vₙ as if it were a non-terminal vocabulary.
12 TL IV
phrase marker.5 Trees (12) and (13) (among others discussed below) would both
be associated with RPM (11).
(11) {S, Ab, Cb, aB, ab}
(12)      S
         / \
        A   B
        |   |
        C   b
        |
        a

(13)      S
         / \
        C   B
        |   |
        A   b
        |
        a
For this special case of domination, we have constructed the dominates
predicate so that Ab dominates Cb and Cb dominates Ab. We could have defined
the predicate so that neither of the two was true, but no definition could make
one of them true and the other false.
The choice of this representation, then, constitutes an empirical claim about
human language. All grammars in this theory will necessarily treat (12) and (13)
identically since they have identical representations, namely, (11).
A second consequence of this choice of representation is "pruning" of the
strongest possible sort. Both the following trees, and many others besides, would
be associated with (11).
(14), (15) [tree diagrams not recoverable]
(17), (18) [tree diagrams not recoverable]
No reduced phrase marker can have both aDb and ab in it since neither
precedes nor dominates the other. Assuming that in the base every non-terminal
introduces a terminal, this difference in descriptive power is only relevant under
one particular definition of deletion. Our definition, which obviates this difference,
is given below.
In this theory, pruning thus becomes a non-issue, since the repeated nodes never exist
to be pruned. There is never a conversion to more tree-like objects so the issue never comes
up. Thus, the effects of pruning, if indeed there are any, are unavoidable.
⁷ It seems that, in general, movement is restricted to cases where source and goal have
identical specifications in the transformation. For example, NP movement is into an NP
position. This is one version of the structure preserving hypothesis, cf. Emonds (1970). This
could be captured in our formalism by stipulating that if f = (i/j) then Aᵢ = Aⱼ. Since there
are a number of unresolved issues pertaining to movement, we will not pursue this question
here.
⁸ I is the language-specific set of "insertable elements". In English, I apparently
includes DO (for do-support) and THERE (for there-insertion). We follow Chomsky (1976a) in
the view that transformations do not insert lexical material. Note that the lexically inserted
homophones of the DO and THERE of I are of different syntactic categories. Lexical DO is a
main verb (while do-support DO is an auxiliary) and lexical THERE is an adverb (while there-
insertion THERE is an NP).
(20) a. for all i, 1 ≤ i ≤ n, φᵢ dominates XᵢBᵢZᵢ in 𝒫,
and b. Φ satisfies the conditions of
  i. basic analyzability
  ii. subjacency
  iii. tensed sentence
  iv. COMP island,
and c. if f = (b/j) or f = (i/j) then Φ satisfies the condition of lexical conservation,
and d. there is no set Ψ which satisfies a, b, and c and which is more prominent than Φ.
(22) B is more specific than A if there exists an index i, and sets of features μ and ν
such that B = (i, μ) and A = (i, ν) and μ ⊇ ν.
(23) subjacency
Φ satisfies the condition of subjacency for the triple (T, σ, 𝒫) if there is at
most one string σ′ such that σ′ is cyclic and such that for some i and j, σ′
dominates XᵢBᵢZᵢ in 𝒫 and not: σ′ dominates XⱼBⱼZⱼ in 𝒫.
(35) SC((30), S, (28)) =
{S, Ab, Hcb, hCb, hcD, hcb}

          S
         / \
        A   D
       / \   \
      H   C   b
      |   |
      h   c

(36) SC((31), S, (28)) =
{S, A, Hc, hC, hc}

          S
          |
          A
         / \
        H   C
        |   |
        h   c

(37) SC((32), S, (28)) =
{S, AAh, HcAh, tCAh, tcD, tcDh, tcAH, tcAh}

[tree diagram not recoverable]

(38) SC((33), S, (28)) =
{S, AhA, HchA, tChA, tcD, tcHA, tchD, tcAh}

[tree diagram not recoverable]
g(φ) = xₖ wₖ φ′   if φ = xₖ yₖ φ′
     = φ′ wₖ zₖ   if φ = φ′ yₖ zₖ
     = φ          otherwise
As is conventional in definition by parts, we require that the first condition of (43)
that is applicable is selected.
¹⁰ There is a very direct relationship between derivations in a base component and
the RPM's generated by that base component. See Kupin (In press) for further discussion.
base component. Each later RPM must be the result of a transformational
mapping from the immediately preceding RPM. Certain other conditions must
hold. Among them are our version of the cycle, (45), a modified form of Kiparsky's
(1973) Elsewhere Condition, (46), and one filtering function precluding Δ's from
the last RPM in a derivation, (47).
and b. for all i, 1 ≤ i < n, 𝒫ᵢ₊₁ = SC(T, σ, 𝒫ᵢ), for some σ ∈ 𝒫ᵢ and T ∈ 𝒯,
and c. 𝒫ᵢ = 𝒫ⱼ implies i = j.
(𝒫₁, ..., 𝒫ₙ) obeys the following conditions with respect to 𝒯:
d. strict cycle condition
e. specificity condition
f. surface filter condition
In definition (45), for all k, 1 ≤ k < n,
let σₖ be the string such that 𝒫ₖ₊₁ = SC(Tₖ, σₖ, 𝒫ₖ).
(46) specificity
(𝒫₁, ..., 𝒫ₙ) satisfies the specificity condition with respect to 𝒯 if for all i,
if SC(T, σ, 𝒫ᵢ) = 𝒫′ and SC(T′, σ, 𝒫ᵢ) = 𝒫″
and T = (X, f) and T′ = (X′, f) and X ≠ X′ and spec(X, X′), then 𝒫ᵢ₊₁ ≠ 𝒫″.
above.11 (45) is a partial ordering of the application of rules that says only that
given a particular set of cyclic non-terminals that cover one mapping, no later
mapping can have a set of covering cyclic non-terminals that properly includes the
first. That is, one can not use a "lower S node" as a domain once a rule has been
done "higher up in the tree". This is parallel to what Chomsky (1973) has called
the strict cycle. (45) says nothing about two mappings whose covering set of non-
terminals are not in the subset relation. This is the case in which transformations
are done in two S's in two different places in the sentence as in "Bill knows S₁
and S₂". This theory makes no claim about whether transformations within S₁ or S₂
need be applied earlier. Conceptually, (45) is somewhat different from many other
statements of the principle of the cycle. The principle is often taken to be a
requirement that rule applications begin on the most deeply embedded cyclic
domain, and from there proceed to the "next domain up", and so on. Chomsky
(1973) proposed that the notion "transformational cycle" be sharpened by the
addition of the "strict cycle condition" (our (45)). What we suggest is that the
"strict cycle condition" is not merely a part of the cyclic principle, but rather
that it exhausts that principle. It should be noted that though the principle of the
cycle is related to the subjacency condition (23) in that both have to do with
cyclic domains, the two can not be collapsed. The subjacency condition is strictly
Markovian, depending like everything else in structural descriptions only on
"current" structure. The strict cycle, on the other hand, is properly part of the
definition of derivation, since it depends on all earlier stages of the derivation.
Condition (46) says that if two transformations are applicable and one is
more general than the other, the more general one may not be chosen for
application.
Condition (44c) entails that no derivation includes any vacuous subderivations
(cf. Levine (1976)). This requirement gives the ordered set constituting a derivation
one of the properties of an unordered set. We find in this a potentially interesting
similarity to the case of phrase markers vis-a-vis phrase structure derivations.
Condition (47) allows us to "soften" the effects of the optionality of all
transformations in the following way. The effect of condition (47) is that if a Δ is
introduced somewhere in a structure, no particular transformation becomes obligatory, but
rather it is obligatory that something be done somewhere along the line to remove
that Δ; otherwise, the derivation must be "thrown out". This seems to be the
proper generalization. What is obligatory is not the means used, but the end
achieved.
There is also an ordering inherent in the "feeding" or "bleeding" action of T's. That
is, the application of a transformation sometimes creates a situation where another becomes
applicable (feeding) or creates a situation where another cannot apply within the same
sentence (bleeding).
3. Example
In this section a fairly complicated structure and three transformations are
presented and part of a derivation is constructed to illustrate the definitions given
above. In what follows, certain details irrelevant to the present investigation have
been omitted. We believe that, to the level of detail we can attempt here, the
structure and the transformations will be part of any adequate analysis of American
English. Lasnik (forthcoming) presents a detailed analysis of the English auxiliary
essentially within this same framework.
The transformations to be considered are:
(50) T₁: (COMP WH, (2/1)) WH fronting¹², where WH = (3, {+WH})
(51) T₂: (NP NP, (2/1)) NP preposing
(52) T₃: (NP NP, (1/2)) NP postposing
We will discuss their application in the derivation of the sentence:
(53) Who knows which gifts Paul and Bill were given by John?
The RPM below labeled 𝒫₁ is assumed to be the initial RPM in this derivation.
We will use the line letters in this listing to refer to elements of 𝒫₁ in the dis-
cussion below. For the reader's convenience, one of the phrase structure trees
associated with 𝒫₁ is given. The nodes in trees (54') and (64') are labelled with
superscripts a, b, c, ... in correspondence with the elements a, b, c, ... in RPM's (54)
and (64), respectively.
(54) 𝒫₁:
a. S
b. COMP wh pres know J. past be en give P. and B. wh gifts by
c. S
d. NP pres know J. past be en give P. and B. wh gifts by
e. Δ wh pres VP
f. wh pres V J. past be en give P. and B. wh gifts by
g. wh pres know S
h. wh pres know COMP J. past be en give P. and B. wh gifts by
i. wh pres know S
j. wh pres know NP past be en give P. and B. wh gifts by
k. wh pres know J. past VP
l. wh pres know J. past PASS give P. and B. wh gifts by
¹² As is well known, WH fronting applies to NP's, adverb phrases, adjective phrases
and quantifier phrases. We conclude that all of these phrases have the same number of bars.
It is not totally clear what this number should be. For concreteness we have chosen 3 as
the number. We assume that the lowest "phrase-level" (3-bar) non-terminal dominating
a WH word is specified +WH in its phrase structure derivation. In this example we will not
explicitly mark the difference between NP's with feature +WH and other WH NP's.
(54') [tree diagram not recoverable]
To see that 𝒫₁ is an RPM consider that each element in the listing either
dominates or precedes each of the following elements in the listing. For example
(54g) dominates (54h), (54i), and (54j) since each of these is of the form:
wh pres know ψ, and in no case is ψ NP or ∅. (54p) precedes (54q), (54r), and
(54s) since each of these is of the form: wh pres know J. past be en give P. and ψ.
To illustrate the SubP function, we have listed below some sub-RPM's in 𝒫₁,
some of which will be used later in the discussion.
(55) SubP(a, 𝒫₁) = 𝒫₁
(56) SubP(b, 𝒫₁) = {COMP, ...}
(57) SubP(·, 𝒫₁) = {NP, NP and B., P. and NP, P. and B.}
(58) ...
(59) ...
(60) SubP(·, 𝒫₁) = {NP, wh gifts}
Each of these can be seen to follow the definition given in section 2.1.
Now we are prepared to consider how the three transformations can apply
to 𝒫₁. We will begin by illustrating certain ordered pairs of elements and explaining
why they do or do not qualify as proper analyses of 𝒫₁ for the transformation under
consideration.
(61) For T₁: SD(T₁, ·, 𝒫₁) ≠ (b, q) because of tensed sentence & prominence
      ≠ (h, q) because of prominence
    SD(T₁, ·, 𝒫₁) = (h, i)
      ≠ (b, q) because of covering cycle (20a)
(62) For T₂: SD(T₂, ·, 𝒫₁) ≠ (l, n) because of conservation
      ≠ (s, j) because "s precedes j" is false (see analyzability)
(63) For T₃: SD(T₃, ·, 𝒫₁) ≠ (d, o) because of subjacency, and conservation
z₁ = J. past be en give P. and B. wh gifts by
2 = source index = j = k:
x₂ = wh pres know J. past be en give P. and B.
A₂ = NP
z₂ = by
y₁ = Δ
v = J. past be en give P. and B.
y₂ = wh gifts
{COMP, NP, wh gifts}
w₁ = wh gifts
w₂ = t
primary change (T₁, σ, 𝒫₁):
Here and below, Φ's with superscripts are strings taking the place of φ′ in
the definition of g.
g(φᵇ, T₁, 𝒫₁) = Φᵇ w₁ v w₂ z₂ =
COMP wh pres know wh gifts J. past be en give P. and B. t by
g. wh pres know S
h. wh pres know COMP J. past be en give P. and B. t by
i. wh pres know NP J. past be en give P. and B. t by
j. wh pres know wh gifts S
k. wh pres know wh gifts NP past be en give P. and B. t by
l. wh pres know wh gifts J. past VP
m. wh pres know wh gifts J. past PASS give P. and B. t by
n. wh pres know wh gifts J. past be en V P. and B. t by
o. wh pres know wh gifts J. past be en give NP t by
p. wh pres know wh gifts J. past be en give NP and B. t by
q. wh pres know wh gifts J. past be en give P. and NP t by
r. wh pres know wh gifts J. past be en give P. and B. NP by
s. wh pres know wh gifts J. past be en give P. and B. t PP
t. wh pres know wh gifts J. past be en give P. and B. t by NP
u. wh pres know wh gifts J. past be en give P. and B. t by
(64') [tree diagram not recoverable; terminal string: wh pres know wh gifts J. past be en give P. and B. t by]
We have now completed the first step in one possible derivation of the
sentence "Who knows which gifts Paul and Bill were given by John." Rather
than proceed with a second step in the same detail, we will sketch in the remaining
steps. From 𝒫₂ there are only two possible moves: SC(T₂, σ, 𝒫₂) on the analysis
(·, ·), and SC(T₃, σ, 𝒫₂) on the analysis (k, f). Note that SD(T₃, ·, 𝒫₂) is not
(·, ·) due to the COMP island constraint.¹³ No new structural descriptions have
resulted from the change from 𝒫₁ to 𝒫₂. If we, as we eventually must, remove
the last Δ in the "lower" sentence by applying T₃ to 𝒫₂, we will begin the
"passive" chain of transformations. The second half of this chain, NP preposing,
is not forced to apply by any syntactic requirement on derivations. It is allowed
to apply optionally and the derivation in which it does not apply is discarded on
semantic grounds. For semantic interpretation, movement traces (t's) must be
"properly bound" by the moved item (Fiengo (1974, 1977)), and in SC(T₃, σ, 𝒫₂)
the trace of J. is not properly bound. The application of T₂ will replace that t with
(57), leaving behind a trace that is properly bound. NP preposing creates 𝒫₄, and
finally SC(T₁, σ, 𝒫₄) will end the derivation.
We applied the transformations in the following order:
lower S cycle: T₁, T₃, T₂; higher S cycle: T₁.
They also could have been applied in either of the following orders:
lower S cycle: T₃, T₁, T₂; higher S cycle: T₁.
lower S cycle: T₃, T₂, T₁; higher S cycle: T₁.
These are the only possible successful derivations from 𝒫₁, and all produce the
desired result.
4. Conclusion
4.0. Some Consequences
It has been our intention to present a restrictive transformational theory in a
revealing formalism. For this reason, it would have been inappropriate to begin
with an all encompassing notation of roughly the Peters-Ritchie sort, and then tack
on the necessary restrictions. Instead, we have attempted to develop a formalism
in which the constraints follow from prior definitions and in this way form part
of a coherent whole. Thus our choice of representation has empirical consequences.
If any of the central constraints are shown to be invalid, our theory of grammar
will be falsified. For example, the straightforward definition of transformation
given in 2.2 transparently embodies most of the generalizations listed in that
section.
¹³ Note that there is an apparent difficulty in that movement out of COMP even into
another COMP will be blocked quite generally by our tensed S condition, preventing the
derivation of "Who do you think Bob saw?". Movement of the WH word into the COMP
of the embedded sentence is permitted, but movement from this COMP into the higher
COMP is blocked by (24). There are a number of possible modifications that will allow
COMP to function as an "escape hatch" as in Chomsky (1973). For example (20) could be
changed in such a way that when f = (i/j), (24) must be satisfied only when one of the non-
terminals indexed by i and j is not COMP. We might also mention that recent work (see in
particular Huang (1977)) indicates that COMP has internal structure: one substructure
for sentence introducers such as English THAT and FOR, and another for WH phrases. Clearly
it is only the latter that is relevant to COMP to COMP movement and to (26).
We have thus far ignored one of the major research questions of recent work
on grammatical formalism, namely, that of weak generative capacity. It seems clear
to us that our theory shares the defect of the Aspects theory noted by Peters and
Ritchie (1973). Our deletion operation presumably results in grammars that lack
the survivor property of Peters (1973); hence, our theory provides a grammar for
every r. e. set. Nonetheless, on one level, it makes sense to say that our theory is
better than the Aspects theory (as articulated by Peters and Ritchie). Peters and
Ritchie's proof depends upon the presence in grammars of a deletion operation of
a particular sort, and is virtually independent of all other grammatical properties,
many of which are important to linguistic investigation. It is with respect to these
other properties that the theories in question diverge. In comparing two theories,
it is reasonable to abstract away from their common virtues and shortcomings.
In the present instance, such an abstraction leaves our theory much less powerful.
Notice that we use the term "powerful" not with respect to the character
of the languages generated but rather with respect to the relative size of the classes
of grammars allowed. In 1., we argued that the theory is best that allows the
smallest subset of grammars consistent with empirical evidence. From this point
of view, the fact that some of the languages generated may be non-recursive is of
subsidiary importance. The relevant consequence of the Peters-Ritchie proof is
that a grammar is available for every r. e. set.
To state this in another way, our concern throughout this paper has been
with restricting the class of grammars compatible with reasonably limited data,
and not with resolving the decidability problem for the sentences of particular
grammars. We have not considered this second problem and are not convinced that
it is of any inherent linguistic import.
REFERENCES
ANDERSON, S.R., and P. KIPARSKY, eds. (1973), A Festschrift for Morris Halle, Holt, Rinehart,
and Winston, New York.
BRESNAN, J. (1976), On the Form and Functioning of Transformations. Linguistic Inquiry 7,
3-40.
CHOMSKY, N. (1955), The Logical Structure of Linguistic Theory (Plenum, New York 1975).
CHOMSKY, N. (1956), Three Models for the Description of Language. I.R.E. Transactions on
Information Theory, IT-2, 113-124.
CHOMSKY, N. (1959), On Certain Formal Properties of Grammars. Information & Control 2,
137-167.
CHOMSKY, N. (1965), Aspects of the Theory of Syntax. MIT Press, Cambridge, Massachusetts.
CHOMSKY, N. (1972), Studies on Semantics in Generative Grammar. Mouton, The Hague.
CHOMSKY, N. (1973), Conditions on Transformations. In Anderson and Kiparsky.
CHOMSKY, N. (1976a), On Wh-Movement. Presented at the Irvine Conference on Formal
Syntax.
CHOMSKY, N. (1976b), Conditions on Rules of Grammar. Linguistic Analysis 2, 303-351.
CHOMSKY, N., and M. HALLE (1968), The Sound Pattern of English. Harper & Row, New York.
EMONDS, J. (1970), Root and Structure Preserving Transformations. Unpublished MIT diss.
FIENGO, R. (1974), Semantic Conditions on Surface Structure. Unpublished MIT diss.
FIENGO, R. (1977), On Trace Theory. Linguistic Inquiry 8, 35-61.
FIENGO, R., and H. LASNIK (1976), Some Issues in the Theory of Transformations. Linguistic
Inquiry 7,182-191.
GINSBURG, S., and B. PARTEE (1969), A Mathematical Model of Transformational Grammars.
Information & Control 15, 297-334.
HUANG, P. (1977), Wh-fronting and Related Processes. Unpublished University of Connecticut diss.
JACKENDOFF, R. (1972), Semantic Interpretation in Generative Grammar. MIT Press, Cambridge,
Massachusetts.
JACKENDOFF, R. (1976), X Syntax. To appear as Linguistic Inquiry Monograph No. 2.
KIPARSKY, P. (1973), Elsewhere in Phonology. In Anderson and Kiparsky.
KUPIN, J. (In press), A Motivated Alternative to Phrase Markers. Linguistic Inquiry.
LASNIK, H. (1976), Remarks on Coreference. Linguistic Analysis 2, 1-22.
LASNIK, H. (forthcoming), Restricting the Theory of Transformations: A Case Study.
LEVINE, A. (1976), Why Argue about Rule Ordering? Linguistic Analysis 2, 115-124.
PETERS, S. (1973), On Restricting Deletion Transformations. In M. Gross et al., eds., The Formal
Analysis of Natural Language, Mouton, The Hague.
PETERS, S., and R.W. RITCHIE (1973), On the Generative Power of Transformational
Grammars. Information Sciences 6, 49-83.
POPPER, K. (1959), The Logic of Scientific Discovery. Basic Books, New York.
POSTAL, P. (1974), On Raising. MIT Press, Cambridge, Massachusetts.