
Word knowledge: intuition and assessment

Alina Villalva (FLUL, CLUL) & Carina Pinto (ESSLei, CLUL)


alinavillalva@campus.ul.pt carina@letras.ulisboa.pt

1. Many linguistic frameworks of analysis regard the lexicon as the locus of idiosyncrasy, where all the raw material of language use (i.e. words,
affixes, lexicalized phrases) can be found. Although there are claims about the status of the lexicon in grammar models, and although its internal
structure may be modulated and the format of lexical specifications may be hypothesized, no theoretical approach seems willing to take the risk
of speculating on which words real individual speakers know.

2. Textual corpora allow us to emulate the lexicon of a language, but the distance between the words in a corpus and the words of a language
may be huge. In fact, textual corpora are biased by their own defining features: written vs. oral texts; text types (e.g. literature,
administration, journalism); adult vs. child language, etc. Nevertheless, textual corpora are powerful tools that give access to real language use.
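As a toy illustration of point 2 (a corpus only emulates the lexicon, and what it yields depends on the texts it contains), the sketch below builds token frequency lists from two mini-corpora. Both mini-corpora are invented for illustration and are not taken from the CRPC; the tokenizer (lowercase, letters only) is a crude assumption, not a real corpus pipeline.

```python
import re
from collections import Counter

def frequency_list(text: str) -> Counter:
    """Build a token frequency list from raw text (a crude lexical corpus)."""
    # Lowercase, then keep runs of letters only (accented Portuguese letters included).
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))

def cao_nouns(freq: Counter) -> dict:
    """Keep only word forms ending in -ção, ordered by descending frequency."""
    pairs = ((w, c) for w, c in freq.items() if w.endswith("ção"))
    return dict(sorted(pairs, key=lambda wc: (-wc[1], wc[0])))

# Two invented mini-corpora standing in for different text types (hypothetical).
corpus_x = "A situação do país exige uma utilização cuidadosa dos recursos. A situação melhorou."
corpus_y = "A utilização do formulário é obrigatória. A utilização indevida implica sanção."

freq_x = frequency_list(corpus_x)
freq_y = frequency_list(corpus_y)
print(cao_nouns(freq_x))  # {'situação': 2, 'utilização': 1}
print(cao_nouns(freq_y))  # {'utilização': 2, 'sanção': 1}
```

The same word receives a different count, and in one case a count of zero, depending on which corpus is consulted: a frequency list is a property of the corpus, not of any individual speaker's lexicon.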

3. Lexical corpora, a by-product of textual corpora, are often used to generate word frequency rates, which may be relevant for lexical studies
that evaluate, for instance, the ratio of nouns, adjectives and verbs, or the productivity of a given word formation process. It is important
to keep in mind that conclusions based on lexical frequency drawn from Corpus X may differ from those drawn from Corpus Y. Most importantly,
they certainly differ from conclusions drawn from the lexicon of individual speakers, even though, for the time being, such an appraisal is
beyond our reach.

4. Although its assessment still lacks robustness, approaches to word processing often take frequency as a core criterion. We may therefore
suspect that the central role of frequency accounts for some of the distortions visible in the nonsensical results of many experiments: word
processing experimentation is based on individual responses, whereas frequency values rely on biased sets of linguistic data. Frequency
estimates from ever-growing textual corpora will progressively become more reliable, but they are not there yet.

5. Meanwhile, is it possible to devise another tool that could take the core place of frequency in word processing research?
Our previous research (cf. Pinto 2017, Villalva & Pinto 2018) suggests that individual knowledge of words may be more relevant for VWP than
abstract word frequency. This was our initial motivation to perform an offline test on the interpretation of Portuguese deverbal nouns. This
choice is based on two assumptions:
①  The comprehension of derivatives is not related to frequency.
②  Knowledge of the meaning of the base word and the compositionality of the derivative are the prevailing criteria.

6. Our linguistic data comprise 153 deverbal action nouns that contain the suffix –ção (e.g. utilização 'the use'). The data set was divided
according to two criteria: the number of syllables (2 to 6) and frequency values, based on the CRPC. In this database, we can find circa 12,000
nouns in –ção. Their frequency ranges from 142,408 tokens (situação 'situation') to 1 token (webização 'web+ization').
The corpus we selected consists of compositional words with different frequency values. The following charts display their frequency per
number of syllables. Although longer words present lower frequency values, all word lengths include higher- and lower-frequency words:
[Chart: token frequency of each base verb and its derivative, one panel per word length (2-, 3-, 4-, 5- and 6-syllable base verbs and
derivatives); y-axis 0 to 60,000 tokens]
[Chart: average frequency of base verbs and derivatives for 2-, 3-, 4-, 5- and 6-syllable pairs]
These charts also show that the frequency of the derivative is unrelated to the frequency of the base verb. The average values presented in the
chart on the right also suggest that frequency decreases as word length increases, for base verbs and derivatives alike.

7. The subject sample consists of native Portuguese speakers who are university students (mean age = 21.74 ± 5.1 years). All the informants
have normal or corrected vision and present no language pathology. We obtained 37 to 51 answers for each lexical item.

8. To perform the offline test we used Google Forms. All the informants were physically present in the same room, and each individual testing
session was completed in about 90 min. Subjects were asked to perform the following production task:
1. Question: Does Xção mean 'the act of X'?
2. Informants who replied 'yes' were asked to use the base verb (e.g. usar 'to use') in a sentence, which allowed us to check the answer.
3. Informants who replied 'no' were asked to write a sentence with the derivative (e.g. utilização 'use'), which again was important to check
the previous reply.
4. Informants who replied 'I don't know' were invited to assign a meaning to the base verb, which allowed us to better understand this choice.

9. The present results concern a selection of 76 words that conveniently illustrate the global results, allowing us to conclude that:
i. There is a gap between the alleged knowledge of derivatives and the 'real' knowledge demonstrated through sentence production. This gap
seems to be smaller in medium-length words.
[Chart: variation between alleged knowledge of the derivative and proven knowledge, for 2-, 3-, 4-, 5- and 6-syllable words]
Taking these preliminary results into account, we decided to consider only the results of real knowledge in the remaining analysis.
ii. Word knowledge is unrelated to word length. Every syllable-count level includes words with better and worse results:
[Chart: real word knowledge of the derivative (%) vs. number of syllables, one series each for 2, 3, 4, 5 and 6 syllables]
iii. Word knowledge is also unrelated to frequency. Since frequency is expressed as a number of tokens and knowledge as a percentage, the
graphs for this comparison were obtained by superimposing the knowledge line on the frequency columns, per number of syllables.

10. In sum, these results demonstrate that word knowledge is independent of word length and frequency. This is a very interesting outcome, as
it allows a new measurement principle for online testing to be set, tailored to each specific test. In other words, based on this experiment,
we claim that experimentation on VWP must be framed by an offline test of this kind. We have already conducted a priming experiment that uses
the same corpus, so the next stage of our research will use word knowledge to examine reaction times (RTs).

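A minimal sketch of how the response coding in point 8 and the comparisons in point 9 could be computed. Several assumptions are not stated on the poster: 'alleged' knowledge is taken to be the share of informants answering 'yes', 'proven' knowledge the share whose control sentence was also judged correct, and all item labels, counts and frequencies below are invented. The poster compares knowledge and frequency visually (superimposed graphs); the Spearman rank correlation used here is our substitute, chosen because token counts are highly skewed (1 to 142,408 in the CRPC) and a rank statistic is insensitive to that skew.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Response:
    answer: str        # "yes", "no" or "dont_know" (hypothetical coding of point 8)
    sentence_ok: bool  # researcher's judgement of the control sentence

@dataclass
class Item:
    word: str          # hypothetical label for a -ção derivative
    syllables: int     # word length in syllables (2 to 6 on the poster)
    tokens: int        # token frequency of the derivative (e.g. from a corpus)
    responses: list[Response]

def alleged_knowledge(item: Item) -> float:
    """Assumed definition: % of informants who answered 'yes'."""
    yes = sum(r.answer == "yes" for r in item.responses)
    return 100 * yes / len(item.responses)

def proven_knowledge(item: Item) -> float:
    """Assumed definition: % who answered 'yes' and produced a correct control sentence."""
    ok = sum(r.answer == "yes" and r.sentence_ok for r in item.responses)
    return 100 * ok / len(item.responses)

def knowledge_gap(item: Item) -> float:
    """Alleged minus proven knowledge, in percentage points."""
    return alleged_knowledge(item) - proven_knowledge(item)

def mean_by_length(items: list[Item], score) -> dict[int, float]:
    """Average a per-item score within each syllable-count group."""
    groups: dict[int, list[float]] = {}
    for it in items:
        groups.setdefault(it.syllables, []).append(score(it))
    return {n: mean(vals) for n, vals in sorted(groups.items())}

def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of ranks (undefined if either input is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def toy_item(word, syllables, tokens, yes_ok, yes_bad, other):
    """Build an invented item from counts of correct 'yes', incorrect 'yes' and other answers."""
    responses = ([Response("yes", True)] * yes_ok + [Response("yes", False)] * yes_bad
                 + [Response("dont_know", False)] * other)
    return Item(word, syllables, tokens, responses)

# Invented toy data, NOT results from the study.
items = [
    toy_item("item_a", 2, 5000, 8, 1, 1),
    toy_item("item_b", 3, 120, 9, 1, 0),
    toy_item("item_c", 3, 40, 5, 2, 3),
    toy_item("item_d", 5, 800, 6, 2, 2),
]
print(mean_by_length(items, proven_knowledge))  # {2: 80.0, 3: 70.0, 5: 60.0}
print(mean_by_length(items, knowledge_gap))     # {2: 10.0, 3: 15.0, 5: 20.0}
rho = spearman([proven_knowledge(it) for it in items], [it.tokens for it in items])
print(round(rho, 2))  # 0.4
```

With real data, `items` would hold the selected words and their 37 to 51 responses each, and `mean_by_length(items, lambda it: it.tokens)` would give an average frequency per word length comparable to the averages mentioned in point 6. A rho close to zero over many items would be one numerical counterpart of the visual claim in point 9.iii; with only four toy items, the value above means nothing.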
[CRPC] Corpus de Referência do Português Contemporâneo. http://www.clul.ul.pt/pt/recursos/183-reference-corpus-of-contemporary-portuguese-crpc (accessed 13/11/2018).
Pinto, C. (2017). O Papel da Estrutura Morfológica nos Processos de Leitura. Faculdade de Letras da Universidade de Lisboa.
Villalva, A. & Pinto, C. (2018). Morphological complexity and lexical processing costs. Alfa, 62(1), 149-168.

Workshop "Language in mind and brain", LMU Munich, 10-12 December 2018
