1. Many linguistic frameworks of analysis look at the lexicon as the locus of idiosyncrasy, where all the raw matter of language use (i.e. words,
affixes, lexicalized phrases) can be found. Though there are claims about the status of the lexicon in grammar models, and though its internal
structure may be modulated, and the format of lexical specifications may be hypothesized, no theoretical approach seems to be willing to take
the risk of speculating on which words real individual speakers know.
2. Textual corpora make it possible to emulate the lexicon of a language, but the distance between the words in a corpus and the words of a language
may be huge. In fact, textual corpora are biased by their own defining features: written vs. oral texts; text types (e.g. literature, administration,
journalism, etc.); adults vs. children, etc. Nevertheless, textual corpora are powerful tools that give access to real language use.
3. Lexical corpora, a by-product of textual corpora, are often used to generate word frequency rates, which may be relevant for lexical studies that evaluate, for instance, the ratio of nouns, adjectives and verbs, or the productivity of a given word formation process.
It is important to keep in mind that conclusions based on lexical frequency drawn from Corpus X may differ from the conclusions drawn from Corpus Y. Most importantly, they certainly differ from the conclusions that would be drawn from the lexicon of individual speakers, even though, for the time being, the appraisal of that lexicon is beyond our reach.
4. Although its assessment still lacks robustness, approaches to word processing often take frequency as a core criterion. We may therefore suspect that the centrality of frequency accounts for some of the distortions that are often visible in the nonsensical results of many experiments: word processing experimentation is based on individual responses, whereas frequency values rely on biased sets of linguistic data. The assessment of frequency from ever-growing textual corpora will progressively become more reliable, but it is not there yet.
5. Meanwhile, is it possible to devise another tool that could take the [...]
Our previous research (cf. Pinto 2017, Villalva & Pinto 2018) suggests that individual knowledge of words may be more relevant for VWP than abstract word frequency. This was our initial motivation to perform an offline test related to the interpretation of Portuguese deverbal nouns. This choice is based on two assumptions:
① The comprehension of derivatives is not related to frequency.
② The knowledge of the meaning of the base word and the [...]
[...] contain the suffix -ção (e.g. utilização ‘the use’). The data set was split into 2 subsets, according to the number of syllables (2-6), and according to their frequency values, based on the CRPC. In this database, we can find circa 12,000 nouns in -ção. Their frequency value ranges from 142,408 tokens (situação ‘situation’) to 1 token (webização ‘web+ization’).
The corpus we selected consists of compositional words with different frequency values. The following charts display their frequency per number of syllables. Although longer words present lower frequency values, all word lengths have higher and lower frequency words:
[Charts: frequency of base verb and derivative, for 2- to 6-syllable words]
These charts also show that the frequency of the derivative is unrelated to the frequency of the [...] in the chart on the right also suggest that frequency decreases with longer word length in both cases.
[Chart: frequency of base verb and derivative, for 2- to 6-syllable pairs]
7. The subject sample consists of native Portuguese speakers who are university students (mean age = 21.74 ± 5.1). All the informants have normal or corrected vision and present no language pathology.
We obtained 37 to 51 answers for each lexical item.
8. To perform the offline test we used Google Forms. All the informants were physically present in the same room, and each individual testing process was completed in about 90 min.
Subjects were asked to perform the following production task:
1. Question: Does Xção mean the act of X?
2. The informants who replied ‘yes’ were asked to use the base verb (e.g. usar ‘to use’) in a sentence, which allowed us to check the answer.
3. The informants who replied ‘no’ were asked to write a sentence with the derivative (e.g. utilização ‘use’), which again served to check the previous reply.
4. The informants who replied ‘I don’t know’ were invited to assign a meaning to the base verb, which allowed us to better understand this choice.
9. The present results concern a selection of 76 words that conveniently illustrate the global results, allowing us to conclude that:
i. There is a gap between the alleged knowledge of derivatives and the ‘real’ knowledge demonstrated through sentence production. This gap seems to be smaller in medium-sized words. Taking these preliminary results into account, we decided to consider only the results of real knowledge in the remaining analysis.
[Chart: variation between alleged knowledge of the derivative and [...], for 2- to 6-syllable words]
ii. Word knowledge is unrelated to word length. Every syllable count includes words with better and worse results:
[Chart: word knowledge per number of syllables]
iii. Word knowledge is also unrelated to frequency. Since frequency is expressed as a number of tokens and knowledge as a percentage, the following graphs are obtained by superimposing the knowledge line on the frequency columns, per number of syllables.
[Charts: knowledge line superimposed on frequency columns, for 2- to 6-syllable pairs]
10. In sum, these results demonstrate that word knowledge is [...] This is a very interesting outcome, as it allows a new measurement principle for online testing, tailored to each specific test, to be set. In other words, based on this experiment, we claim that experimentation on VWP must be framed by an offline test of this kind. We have already conducted a priming experiment that uses the same corpus, so the next stage of our research will use word knowledge to examine RTs.
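The comparison between alleged knowledge, ‘real’ knowledge and frequency reported above can be illustrated with a minimal sketch. It is not the procedure used in this study: the response records, the word labels, the frequency values and the scoring rule (a ‘yes’ reply counts as real knowledge only when the control sentence was judged adequate) are all hypothetical, and the Spearman rank correlation stands in for the visual superimposition of the knowledge line on the frequency columns.

```python
import math
from collections import defaultdict

# Hypothetical toy data, NOT taken from the study:
# (word label, reply to "Does Xção mean the act of X?", control sentence adequate?)
RESPONSES = [
    ("word_a", "yes", True), ("word_a", "yes", True),
    ("word_a", "yes", False), ("word_a", "no", False),
    ("word_b", "yes", True), ("word_b", "yes", True),
    ("word_b", "yes", True), ("word_b", "dont_know", False),
    ("word_c", "yes", True), ("word_c", "yes", False),
    ("word_c", "no", False), ("word_c", "no", False),
]
# Hypothetical corpus frequencies (tokens) for the same labels.
FREQUENCY = {"word_a": 50000, "word_b": 120, "word_c": 3000}


def knowledge_rates(responses):
    """Return {word: (alleged %, real %)} under the assumed scoring rule."""
    counts = defaultdict(lambda: [0, 0, 0])  # total, alleged, real
    for word, reply, sentence_ok in responses:
        tally = counts[word]
        tally[0] += 1
        if reply == "yes":
            tally[1] += 1          # alleged knowledge: the informant said 'yes'
            if sentence_ok:
                tally[2] += 1      # real knowledge: the sentence confirmed it
    return {w: (100 * a / t, 100 * r / t) for w, (t, a, r) in counts.items()}


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either list is constant.
    """
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rates = knowledge_rates(RESPONSES)
    words = sorted(rates)
    for w in words:
        alleged, real = rates[w]
        print(w, "alleged:", alleged, "real:", real, "gap:", alleged - real)
    rho = spearman([FREQUENCY[w] for w in words], [rates[w][1] for w in words])
    print("Spearman rho (frequency vs. real knowledge):", rho)
```

With real data, a coefficient close to zero would be in line with the claim that word knowledge is unrelated to frequency. With as few items as in this toy set the value carries no weight.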
[CRPC] – Corpus de Referência do Português Contemporâneo. http://www.clul.ul.pt/pt/recursos/183-reference-corpus-of-contemporary-portuguese-crpc (13/11/2018).
Pinto, C. (2017). O Papel da Estrutura Morfológica nos Processos de Leitura. Faculdade de Letras da Universidade de Lisboa.
Villalva, A. & Pinto, C. (2018). Morphological complexity and lexical processing costs. Alfa, 62 (1), pp. 149-168.
Workshop "Language in mind and brain", LMU Munich, 10-12 December 2018