
P. Schlenker - Ling 1 - Introduction to the Study of Language, UCLA

Introduction to Language - Lecture Notes 4B

Sentence Structure II: Phrase Structure Grammars


Goal: How are sentences built (or 'generated', as linguists say)? Corresponding to the two hypotheses that were considered in the preceding Lecture Notes, we discuss two possibilities. The first hypothesis, based on a 'word chain device' (formally called a 'finite state model' or a 'Markov model'), yields sentences that have a flat structure. We already found an argument against this hypothesis in the preceding Lecture Notes: the sentences of English do not have a flat structure. We show that the hypothesis has other defects as well. The second hypothesis, by contrast, generates (=produces) sentences that do not have a flat structure. It involves Phrase Structure Rules, which yield trees with labels added to indicate the syntactic category of each constituent (e.g. Noun Phrase, Verb Phrase, etc.). The resulting tree recapitulates the process by which a sentence is generated (=produced) by the rules of grammar: a group of elements forms a constituent whenever they have been introduced by the application of the same rule.

1 Review: Constituency

1.1 Summary: Trees

(i) In every sentence, certain groups of words form 'natural units' (=constituents) and may:
- stand alone
- be moved as a unit
- be replaced as a unit by a pronoun
(ii) Trees encode the information about constituents: two expressions form a natural unit (=constituent) if there is a sub-tree that contains them and nothing else.
(iii) A sentence that can be analyzed as two different trees is structurally ambiguous (e.g. Lucy will hit the student with the book).
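The criterion in (ii) can be sketched in code. The nested-tuple encoding of trees below is my own (it is not the notation of the Lecture Notes), and the tree shown is one of the two plausible analyses of the ambiguous sentence:

```python
# A sketch (not from the notes) of the constituency criterion in (ii):
# a group of words is a constituent iff some sub-tree contains
# exactly those words and nothing else.

def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree:
        result.extend(leaves(child))
    return result

def subtrees(tree):
    """Yield every sub-tree, including the tree itself."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)

def is_constituent(tree, word_sequence):
    """True iff some sub-tree spans exactly `word_sequence`."""
    return any(leaves(t) == list(word_sequence) for t in subtrees(tree))

# One plausible tree for "Lucy will hit the student with the book"
# (the reading on which [with the book] names the instrument of hitting):
tree = ("Lucy", ("will", (("hit", ("the", "student")),
                          ("with", ("the", "book")))))

print(is_constituent(tree, ["the", "student"]))   # True: a sub-tree spans it
print(is_constituent(tree, ["student", "with"]))  # False: no sub-tree does
```

On the other reading of the sentence, [the student with the book] would itself be a sub-tree, which is exactly how the two trees encode the ambiguity.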

1.2 A Puzzle Explained: Question Formation

The Puzzle (repeated from earlier Lecture Notes)

Pinker discusses in Chapter 2 of The Language Instinct (p. 29) the example of question formation. If we wish to form a question that corresponds to the assertion John is in the garden, we may simply move the auxiliary is to the beginning of the sentence, yielding Is John __ in the garden? [here __ simply indicates that a word has been displaced]. In a slightly more complex case, such as John is in the garden next to someone who is asleep, we form the corresponding question by moving to the beginning of the sentence the first is, yielding Is John __ in the garden next to someone who is asleep? If we tried instead to move the second is, we would obtain a sharply ungrammatical result ('ungrammatical' in the descriptive sense we will use throughout this course): *Is John is in the garden right next to someone who __ asleep? These contrasts are recapitulated in (1):

(1) a. John is in the garden next to someone who is asleep.
b. Is John __ in the garden next to someone who is asleep? (Move the first is)
c. *Is John is in the garden right next to someone who __ asleep? (Move the second is)

From these one might be tempted to infer that the rule of question formation is to systematically move to the beginning of the sentence the first is which is uttered. Pinker shows that this hypothesis is incorrect, since it predicts (incorrectly) that the question corresponding to (2)a is (2)b:

(2) a. A unicorn that is eating a flower is in the garden.
b. *Is a unicorn that __ eating a flower is in the garden? (Move the first is)
c. Is a unicorn that is eating a flower __ in the garden? (Move the second is)

We do not discuss at this point what the correct rule is (it will turn out that it must be stated in more abstract terms than 'moving the first is' or 'moving the second is'). But we observe that a child who has only heard simple cases of question formation (e.g. Is John __ in the garden?) would have to infer a rather complex and subtle rule from limited data. For the same reason as in the case of integers mentioned above, the child must have something to guide his acquisition of a rule that goes beyond the sentences that he has heard.

The Solution: 'move the auxiliary which is immediately under the right-hand daughter of the root'

The solution to the puzzle is that the rule of question formation should be stated in terms of structure (i.e. in terms of syntactic trees) rather than in terms of strings (=linear order). The rule of question formation in English is to move to the beginning of the sentence (i.e. to add to the tree) the auxiliary which is immediately under the right-hand daughter of the root (the root is the top-most node of the tree).

(3) a. b. [tree diagrams not reproduced]

If Mary is replaced with the person who will be hired (clearly a constituent: for instance it may be replaced with the pronoun he or she), the general structure of the sentence is not affected, and in particular the same word will is moved as was moved in the simple sentence. Crucially, the word will contained in the person who will be hired is not moved, which is as one wants. This is illustrated in (4) [note that a triangle stands for a constituent whose internal structure is omitted for simplicity; in homework you should specify the complete structure of a tree, i.e. you should not use triangles, unless the exercise tells you to do so]:


(4) a. b. [tree diagrams not reproduced]

Going back to our original puzzle with A unicorn is in the garden, we can apply exactly the same reasoning. Constituency tests would lead one to posit the following structure, where a unicorn is a single constituent.

(5) [tree diagram not reproduced]

The rule of question formation can then be applied in the same way as in our earlier examples:

(6) [tree diagram not reproduced]

And just as we want, the rule functions in exactly the same way when a unicorn is replaced with a unicorn that is eating flowers; and the right result is obtained:
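The structure-dependent rule can be sketched in code. The pair encoding below is my own assumption (an IP as (NP, I'), with I' in turn a pair (auxiliary, VP)); it is not the notes' tree notation, but it makes the point that the rule picks out the auxiliary under the right-hand daughter of the root, never an auxiliary buried inside the subject:

```python
# A sketch (assumed representation) of the rule: front the auxiliary
# that sits immediately under the right-hand daughter (I') of the root.

def words(tree):
    """Flatten a nested tuple of words into a left-to-right list."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(words(child))
    return out

def make_question(ip):
    """ip = (NP, (aux, VP)); move the aux under the right-hand daughter."""
    np, (aux, vp) = ip
    return " ".join([aux.capitalize()] + words(np) + words(vp)) + "?"

# Simple subject:
print(make_question((("Mary",), ("will", ("be", "hired")))))
# -> Will Mary be hired?

# Complex subject containing its own 'will': the embedded auxiliary is
# never touched, because it is not under the right-hand daughter of the root.
subject = ("the", ("person", ("who", ("will", ("be", "hired")))))
print(make_question((subject, ("will", ("leave",)))))
# -> Will the person who will be hired leave?
```

Because the rule inspects the tree rather than the string, it never needs to count which is or will comes first, which is exactly the point of the puzzle.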


(7) a. b. [tree diagrams not reproduced]

An Incorrect Model: Finite State Grammars (=Markov Model)

A plausible (but incorrect) model is discussed by Pinker in Chapter 4 of The Language Instinct: the Finite State Model (also called 'Markov Model'; Pinker also calls it a 'word chain device'). It is both natural and historically important, since it was considered plausible until the 1950's. In a nutshell, it attributes to a speaker a simple mental system that allows him or her to determine whether a given word can or cannot follow another given word. Here is the example of a Finite State Model discussed by Pinker (I have added 'START' and 'ACCEPT' states, which are implicit in Pinker's discussion; the idea is that you feed the sentence to the machine, starting with the first word, one word after the other. If you end up in the ACCEPT state after the last word has been processed, the sentence is accepted; otherwise the sentence is rejected):

(8) [word-chain diagram not reproduced; as the examples in (9) show, its arcs run from START through the/a/one, an optional loop on happy, then boy/girl/dog, then eats, then ice cream/hot dogs/candy, to ACCEPT]

(9) Examples of sentences that are generated by (8):
a. the boy eats ice cream
b. the happy boy eats ice cream
c. the happy happy boy eats hot dogs
d. a happy happy girl eats candy
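A machine like (8) can be written out as a table of transitions. The state names and the treatment of ice cream / hot dogs as single tokens are my assumptions; the arcs follow the word-chain device described above:

```python
# A minimal finite-state acceptor in the spirit of (8): each entry says
# which state you reach from a given state on a given word.

TRANSITIONS = {
    ("START", "a"): "Q1", ("START", "the"): "Q1", ("START", "one"): "Q1",
    ("Q1", "happy"): "Q1",                     # 'happy' may loop any number of times
    ("Q1", "boy"): "Q2", ("Q1", "girl"): "Q2", ("Q1", "dog"): "Q2",
    ("Q2", "eats"): "Q3",
    ("Q3", "ice cream"): "ACCEPT",
    ("Q3", "hot dogs"): "ACCEPT",
    ("Q3", "candy"): "ACCEPT",
}

def accepts(tokens):
    """Feed tokens one at a time; accept iff we end in ACCEPT."""
    state = "START"
    for tok in tokens:
        state = TRANSITIONS.get((state, tok))
        if state is None:              # no arc for this word here: reject
            return False
    return state == "ACCEPT"

print(accepts(["the", "boy", "eats", "ice cream"]))                    # True, (9a)
print(accepts(["the", "happy", "happy", "boy", "eats", "hot dogs"]))   # True, (9c)
print(accepts(["boy", "the", "eats", "ice cream"]))                    # False, (10a)
```

Note that the machine has no memory beyond its current state: it knows only which word it just read, which is precisely the limitation exploited in Argument 2 below.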

(10) Examples of ungrammatical sentences that are not generated by (8):
a. *boy the eats ice cream
b. *happy boy eats hot dogs
c. *hot dogs eats the dog


(11) Examples of grammatical sentences that are not generated by (8):
a. some boy eats ice cream
b. the dog that the dog eats eats ice cream
c. either the boy eats ice cream or the girl eats candy

There are two important arguments against the Finite State Model:
- Argument 1: It does not account for the tree-like structure of sentences that we observed in Lecture Notes 3B.
- Argument 2: It cannot properly account for 'long distance dependencies', i.e. constructions in which two elements that depend on each other are separated by an arbitrary number of words.

(12) Example of a long distance dependency: either ... or ...
a. Either John is sick or he is depressed
b. Either John thinks that he is sick or he is depressed
c. Either Mary knows that John thinks that he is sick or she is depressed
d. Either the boy eats hot dogs or the dog eats hot dogs
e. Either the happy happy boy eats hot dogs or the dog eats candy
etc.

We could try to integrate the either ... or construction into our Finite State Model, but no simple solution would work. To see this, observe that in the following model nothing requires that a sentence that starts with either should also contain or somewhere down the road. And for good reason: in order to 'remember' this, the model would need some kind of memory, which it lacks completely. The problem turns out to be very severe. In fact, Noam Chomsky became famous in the 1950's by proving that no matter how complex a finite state machine is, it cannot handle all constructions of English.

(13) [word-chain diagram not reproduced; it extends (8) with arcs for either, or, if and then]

(14) Some grammatical sentences generated by (13):
a. Either a girl eats candy or a boy eats hot dogs
b. Either a happy girl eats candy or a boy eats hot dogs


(15) Some ungrammatical sentences generated by (13):
a. *Either a girl eats candy
b. *Either a happy girl eats candy
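The failure can be made concrete. The machine below is my own reconstruction in the spirit of (13), with a reduced vocabulary: once a clause is complete the machine is in an accepting state, whether or not the sentence began with either, because no state records how the sentence started:

```python
# A sketch of why a word-chain device fails on 'either ... or': the arc
# for 'either' is simply optional, and nothing later can check for it.

TRANSITIONS = {
    ("START", "either"): "S",
    ("START", "a"): "N", ("S", "a"): "N",
    ("N", "girl"): "V", ("N", "boy"): "V",
    ("V", "eats"): "O",
    ("O", "candy"): "END", ("O", "hot dogs"): "END",
    ("END", "or"): "S",            # a second clause MAY follow ...
}
ACCEPTING = {"END"}                # ... but nothing FORCES it to

def accepts(tokens):
    state = "START"
    for tok in tokens:
        state = TRANSITIONS.get((state, tok))
        if state is None:
            return False
    return state in ACCEPTING

# A well-formed either ... or sentence, cf. (14):
print(accepts(["either", "a", "girl", "eats", "candy",
               "or", "a", "boy", "eats", "candy"]))          # True
# The defect: (15a) is wrongly accepted, since the device has
# no memory of the initial 'either'.
print(accepts(["either", "a", "girl", "eats", "candy"]))     # True (wrongly)
```

One could patch this particular machine by duplicating every state into an "after-either" copy, but each further construction requiring memory multiplies the states again; Chomsky's point is that no finite amount of such duplication handles unboundedly nested dependencies.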

A Better Model: Phrase Structure Grammars

Our goal, then, is to devise a system of rules that addresses the two criticisms given in Argument 1 and Argument 2 above. In other words, the system we are trying to design should:
Requirement 1: Account for the tree-like structure that sentences have, and
Requirement 2: Provide an analysis of long-distance dependencies, i.e. constructions in which two elements that depend on each other are separated by an arbitrary number of words.

We start with some properties that are satisfied by all or most sentences:
(i) All sentences have a verb (e.g. sleep, eat, claim) and an inflection, which may appear as an auxiliary (will, might, can, should, did, do, does) or as an affix on the verb (the latter case will not be discussed here, as it involves further complexities).
(ii) All sentences include, normally before the verb, a group of words that contains a noun, be it a common noun (e.g. man, woman, table) or a proper name (John, Mary).

This is illustrated in the following sentences:
(16) a. John will sleep
b. The director will sleep
c. Mary will hit John
d. The director will criticize John

If we performed constituency tests on these sentences, we would see that they all start in the same way:
- first, they contain a constituent that includes a noun;
- second, they contain a constituent of the form [Inflection + ___], where ___ is a constituent that contains a verb.

(17) a. [John] [will [sleep]]
b. [The director] [will [sleep]]
c. [Mary] [will [hit John]]
d. [The director] [will [criticize John]]

The initial group that contains a noun we will call a Noun Phrase, NP for short. The group that contains a verb, referred to as ___ above, will be called a Verb Phrase. The group [Inflection + Verb Phrase] will be called I' (pronounced 'I bar': I for inflection, ' (i.e. bar) to indicate that it contains other things in addition). Because each sentence contains an inflection, the sentence itself is called an 'Inflection Phrase', symbolized as IP. With this background, we can start writing our grammar:

IP → NP I' (a sentence consists of a Noun Phrase followed by an I bar)
I' → I VP (an I bar consists of an Inflection followed by a Verb Phrase)

We can now write the rest of the grammar:


I → will, might, can, should, does, did (an Inflection is: will, or might, or can, or should, or does, or did)
NP → PN | D N (a Noun Phrase comprises either a Proper Name/Pronoun alone, or a Determiner and a Noun)
PN → John, Bill, Mary, Sam, he, she...
N → President, director, boy, girl, Dean, friend, mother...
VP → Vi | Vt NP | Vs CP (a Verb Phrase comprises either an intransitive verb Vi alone, or a transitive verb Vt followed by a Noun Phrase, or a verb of speech or thought Vs followed by a Complementizer Phrase)
D → the, some, a, every, my, his, her...
Vi → sleep, run, snore, fall...
Vt → meet, date, hit, kill, criticize...
Vs → think, say, believe, claim...
CP → C IP (a Complementizer Phrase comprises a Complementizer followed by an Inflection Phrase)
C → that

Let us first go through some very simple examples. The tree is constructed from the top, applying one rule at each step. For instance the fact that IP is the mother of NP and I' indicates that we have applied the rule IP → NP I'. Similarly the fact that I' is the mother of I and VP indicates that we have applied the rule I' → I VP, etc.

(18) [IP [NP [PN Mary]] [I' [I will] [VP [Vi sleep]]]]

(19) [IP [NP [D the] [N President]] [I' [I will] [VP [Vi sleep]]]]

[the original tree diagrams are rendered here as labeled bracketings]

We can also generate some of the sentences that occupied us in Lecture Notes 3B:
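The grammar above is small enough to run. Below is a sketch of it as a Python dictionary together with a top-down generator that, like the construction just described, applies one rule at each step; the depth cap is my addition to guarantee that generation stops (the first rule of each category never recurses):

```python
import random

# The phrase structure grammar above, as category -> list of right-hand
# sides.  Symbols not listed as categories are terminal words.

GRAMMAR = {
    "IP": [["NP", "I'"]],
    "I'": [["I", "VP"]],
    "I":  [["will"], ["might"], ["can"], ["should"], ["does"], ["did"]],
    "NP": [["PN"], ["D", "N"]],
    "PN": [["John"], ["Bill"], ["Mary"], ["Sam"], ["he"], ["she"]],
    "N":  [["President"], ["director"], ["boy"], ["girl"], ["friend"]],
    "VP": [["Vi"], ["Vt", "NP"], ["Vs", "CP"]],
    "D":  [["the"], ["some"], ["a"], ["every"]],
    "Vi": [["sleep"], ["run"], ["snore"], ["fall"]],
    "Vt": [["meet"], ["date"], ["hit"], ["criticize"]],
    "Vs": [["think"], ["say"], ["believe"], ["claim"]],
    "CP": [["C", "IP"]],
    "C":  [["that"]],
}

def generate(category, depth=0):
    """Expand a category top-down, choosing one rule at each step."""
    if category not in GRAMMAR:        # a terminal word
        return [category]
    rules = GRAMMAR[category]
    # Beyond a small depth, always take the first rule; since no first
    # rule leads back to IP, this guarantees termination.
    rhs = rules[0] if depth > 8 else random.choice(rules)
    out = []
    for symbol in rhs:
        out.extend(generate(symbol, depth + 1))
    return out

random.seed(1)
print(" ".join(generate("IP")))   # one randomly generated sentence
```

Because the Vs + CP rule can reintroduce IP, repeated runs produce sentences of unbounded length, which is the recursion discussed below.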


(20) [IP [NP [PN Mary]] [I' [I will] [VP [Vt meet] [NP [D the] [N President]]]]]

(21) [IP [NP [D Your] [N friend]] [I' [I will] [VP [Vt meet] [NP [D the] [N President]]]]]

[the original tree diagrams are rendered here as labeled bracketings]

Crucially, we observe that our Phrase Structure Grammar generates sentences 'with the right structure', i.e. with the tree-like structure that was discussed in Lecture Notes 3B. The only difference is that some non-branching nodes have been added (reminder: a non-branching node is a node with just one daughter). When the non-branching nodes and the labels are disregarded, we obtain exactly the trees that were argued for in Lecture Notes 3B:

Mary

will meet the President

(23)

will Your friend meet the President


We also note that our little grammar can generate more complex sentences, thanks in particular to our rule for verbs of speech and thought (e.g. believe, think, claim, etc.), which can embed an Inflection Phrase within another Inflection Phrase, as is shown below (the embedding of a constituent of a given category within another constituent of the same category is called recursion; it is essential to generate an infinite language):

(24) [IP [NP [PN Mary]] [I' [I will] [VP [Vs claim] [CP [C that] [IP [NP [PN John]] [I' [I will] [VP [Vi sleep]]]]]]]]
Recursion of IP (=an IP is embedded within another IP); the original tree diagram is rendered here as a labeled bracketing.
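The effect of repeated IP-recursion can be illustrated with a few lines of code. The helper below is my own sketch, not part of the notes; each pass around the loop corresponds to one more application of the Vs + CP rule:

```python
# A small illustration of how IP-recursion yields infinitely many
# sentences: each extra rule application wraps the previous IP
# inside a new one, as in (24).

def embedded_ip(depth):
    """depth 0: 'John will sleep'; each level adds one embedding."""
    sentence = "John will sleep"
    for _ in range(depth):
        sentence = "Mary will claim that " + sentence
    return sentence

for d in range(3):
    print(embedded_ip(d))
# John will sleep
# Mary will claim that John will sleep
# Mary will claim that Mary will claim that John will sleep
```

Since depth can be any integer, the grammar generates a distinct sentence for every depth, hence an infinite language from finitely many rules.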

Observe that nothing would prevent us from embedding the IP in (24) within a larger IP, e.g. John will think that ____. Since this procedure can be repeated as many times as we want, our grammar can generate an infinite number of sentences. At this point it should already be clear that we have met Requirement 1: our grammar does account for the tree structure that was argued for in Lecture Notes 3B. What about Requirement 2, then? Do we now have an account of long-distance dependencies? We do, as soon as we add one rule to our little grammar:

IP → either IP1 or IP2
IP → if IP1 then IP2

This rule generates trees such as the following:

(25) [IP either [IP1 ...] or [IP2 ...]]


It is then clear that by adding under IP1 or IP2 any of the trees that can be generated by our grammar, we obtain a grammatical sentence. Requirement 2 has thus been met as well.
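The reason this rule succeeds where the word-chain device failed can be made explicit in code. The toy clause strings below are mine, standing in for full IPs generated by the grammar:

```python
# A sketch of IP -> either IP1 or IP2: a single rule application
# introduces BOTH 'either' and 'or', so the two words are paired by
# the tree itself, no matter how much material ends up inside IP1.

def either_or(ip1, ip2):
    """One rule application inserts both halves of the dependency."""
    return "either " + ip1 + " or " + ip2

print(either_or("John is sick", "he is depressed"))
# -> either John is sick or he is depressed
print(either_or("Mary knows that John thinks that he is sick",
                "she is depressed"))
# -> either Mary knows that John thinks that he is sick or she is depressed
```

The 'memory' that the finite state machine lacked lives in the tree: by the time the words are read off left to right, the pairing of either with or has already been fixed by a single node.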

The Head Parameter

[This part of the Lecture Notes will probably not be discussed until Thursday, February 5th, 2004]

The constituents generated by our Phrase Structure Grammar have labels that indicate which element gives them their 'crucial' properties. For instance a Verb Phrase is so called because it always contains a verb in a specified position. We say that the verb is the head of the Verb Phrase. A major property of natural languages is that their constituents are headed.

We make a further observation, which is specific to English: a head always comes before its sister. Linguists call the sister of a head its complement. Thus we can express the same fact by stating that in English the head always comes before its complement. For instance, the inflection I comes before its complement VP; the complementizer C comes before its complement IP; and a transitive verb Vt (e.g. hate) comes before its complement NP (e.g. the President).

Interestingly, the position of the head relative to its complement depends on the language. This is one additional parameter which can account for language variation (reminder: we discussed the 'Null Subject Parameter' in previous lectures). English is uniformly head-initial, in the sense that in every construction the head comes before its complement. By contrast, Japanese is uniformly head-final, in the sense that the head always comes after its complement. While this does not account for all syntactic differences between English and Japanese, it accounts for quite a few, and brings out the similarities between two apparently very different word orders:

(26) John-ga Mary-o but-ta
John-particle Mary-particle hit-PAST
'John hit Mary'

(27) [IP [NP [PN John]] [I' [VP [NP [PN Mary]] [V hit]] [I PAST]]]
(the original tree diagram, rendered as a labeled bracketing; note that each complement precedes its head)

(28) Bill-wa John-ga Mary-o but-ta to omot-ta
Bill-particle John-particle Mary-particle hit-PAST that think-PAST
'Bill thought that John hit Mary'



(29) [IP [NP [PN Bill]] [I' [VP [CP [IP [NP [PN John]] [I' [VP [NP [PN Mary]] [V hit]] [I PAST]]] [C that]] [V think]] [I PAST]]]
(the original tree diagram, rendered as a labeled bracketing)

It should be noted that English and Japanese are two extreme examples: head-initial for all constructions (English), or head-final for all constructions (Japanese). Some languages display a mixed pattern, in which some constructions (e.g. Verb Phrases) are head-initial, while others (e.g. Complementizer Phrases) are head-final.
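The Head Parameter can be illustrated as a single switch in how trees are read off into strings. The (head, complement) pair encoding below is my own simplification (subjects are omitted), but it shows how one parameter setting derives two opposite word orders from the same structure:

```python
# A sketch of the Head Parameter: the same head + complement pairs are
# linearized head-first (English-style) or head-last (Japanese-style).

def linearize(node, head_initial):
    """node is a word, or a (head, complement) pair."""
    if isinstance(node, str):
        return [node]
    head, comp = node
    head_words = linearize(head, head_initial)
    comp_words = linearize(comp, head_initial)
    return head_words + comp_words if head_initial else comp_words + head_words

# An I' whose head is the inflection PAST and whose complement is a VP
# with head 'hit' and complement NP 'Mary' (subject left out):
clause = ("PAST", ("hit", "Mary"))

print(" ".join(linearize(clause, head_initial=True)))   # PAST hit Mary
print(" ".join(linearize(clause, head_initial=False)))  # Mary hit PAST
```

The head-final output mirrors the Japanese order in (26), where the object Mary-o precedes the verb and the past-tense inflection follows it, while the head-initial output mirrors the English auxiliary-verb-object order.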



Appendix. Contents of Chapter 4 of Pinker's Language Instinct

4. How Language Works (Syntax)
(i) General properties of grammar
- Two 'tricks':
(a) Arbitrariness of the sign [Saussure] (75)
(b) Infinite use of finite means [Humboldt] (75)
- Syntax is a discrete combinatorial system (75)
- Syntax is autonomous from cognition (76)
(a) Meaningful sentences that are ungrammatical
(b) Grammatical sentences that are meaningless
(ii) The Markov Model (=the Finite State Model) (81)
(iii) Syntactic Trees (90)
(a) Basic components (90)
(b) Structural ambiguity (94)
(iv) Phrase Structure (97)
(a) Parts of speech (98)
(b) X' theory (99)
(c) The Head Parameter (103)
(d) Thematic roles (105)
(e) Case (107)
(f) IP (110)
(g) Function words (111)
(h) Deep structures (113)

