Universidade do Algarve / INESC-ID Lisboa / L2F: Mapping Grammatical Structures onto Proficiency Levels

Mapping Grammatical Structures onto Proficiency Levels
Rui Talhadas
Universidade do Algarve / INESC-ID Lisboa / L2F
rtalhadas@gmail.com
Abstract
The relative frequency of use differs across verbal tenses, and frequency of
use is known to correlate with language proficiency levels. It is thus
expected that the Present Simple, being more frequent, is learnt more quickly
than the less frequent Conditional. Through the analysis of corpora, we expect
to determine at which stage of learning the use of the different tenses is
mastered.
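The comparison described above can be sketched as a relative-frequency count of tense labels per CEFR level. This is only an illustration: the function name `tense_shares_by_level`, the tense labels and the toy observations are assumptions, and in the project the tense tags would come from the annotated corpora rather than from a hand-written list.

```python
from collections import Counter, defaultdict

def tense_shares_by_level(observations):
    """Return {level: {tense: share}} from (level, tense) pairs."""
    counts = defaultdict(Counter)
    for level, tense in observations:
        counts[level][tense] += 1
    shares = {}
    for level, tense_counts in counts.items():
        total = sum(tense_counts.values())
        shares[level] = {t: n / total for t, n in tense_counts.items()}
    return shares

# Invented toy observations: one (CEFR level, tense label) pair per finite verb.
toy = [
    ("A1", "present"), ("A1", "present"), ("A1", "present"), ("A1", "preterite"),
    ("B1", "present"), ("B1", "preterite"), ("B1", "imperfect"), ("B1", "conditional"),
]
shares = tense_shares_by_level(toy)
print(shares["A1"]["present"])      # 0.75
print(shares["B1"]["conditional"])  # 0.25
```

A tense that first reaches a stable share at a given level would be a candidate for "mastered at that level", although the threshold for that decision is not specified here.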
ID                 CEFR  Sent.  words
ar007CVITI         A1        3     58
ar008CVETD         A2        6     98
AL_2_11_6.1B       B        13    260
AL_2_11_70.3Q      B         8    269
CI.CA.C1.17.6.1B   C1       10    239
CI.CA.C1.01.50.2L  C1       20    322

Columns t:C, t:IP, Vinf, pass-e, pass-s (per-text values; column assignment unclear):
2 1 3 3 7 5
1 0 1 1 1 0
1 2 2 1 0 3
3 5 6 3 5 6
0 0 1 0 2 1

Linguistic features
Discourse connectors
When beginning to learn PFL, learners mostly use quite simple sentences, as
they are not entirely at ease with the language. According to some authors,
the use of conjunctions, conjunctive adverbs and other discourse connectors
can help to place the learner at an adequate CEFR level.
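As a rough illustration of how connector use could be quantified, the sketch below counts whole-word occurrences of a few Portuguese connectors in a text. The connector list, the function name `count_connectors` and the sample sentence are assumptions made for this example; they are not the inventory or the tooling used in the project.

```python
import re

# Illustrative connector list only (an assumption, not the project's inventory).
CONNECTORS = ["no entanto", "portanto", "porque", "embora", "mas"]

def count_connectors(text, connectors=CONNECTORS):
    """Count whole-word, case-insensitive occurrences of each connector."""
    lowered = text.lower()
    counts = {}
    for connector in connectors:
        # Lookarounds keep "mas" from matching inside words such as "calmas".
        pattern = r"(?<!\w)" + re.escape(connector) + r"(?!\w)"
        counts[connector] = len(re.findall(pattern, lowered))
    return counts

# Invented sample sentence.
sample = "Gosto de Lisboa, mas prefiro Faro porque é mais calma. No entanto, visito Lisboa."
counts = count_connectors(sample)
print(counts)                # {'no entanto': 1, 'portanto': 0, 'porque': 1, 'embora': 0, 'mas': 1}
print(sum(counts.values()))  # 3
```

Dividing the total by the number of sentences would give a per-sentence rate that is comparable across texts of different lengths.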
Clausal internal structure
The clause-internal structures will also be analyzed, based on the sequence
of elementary syntactic constituents (or chunks), e.g. noun phrase (NP), verb
phrase (VP), prepositional phrase (PP), and their syntactic dependencies. This
analysis will make it possible to assess whether, and how, clausal complexity
increases across learning levels.
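A much cruder stand-in for the chunk-based analysis is mean sentence length per level. The sketch below computes it from the sentence and word counts of the six sample texts on the poster, assuming that the Sent. and words values pair with the text IDs in the order listed. The function name is hypothetical, and words per sentence is only a proxy chosen for this example, not the complexity measure of the project.

```python
from collections import defaultdict

# (text ID, CEFR level, sentences, words) for the six sample texts.
SAMPLES = [
    ("ar007CVITI", "A1", 3, 58),
    ("ar008CVETD", "A2", 6, 98),
    ("AL_2_11_6.1B", "B", 13, 260),
    ("AL_2_11_70.3Q", "B", 8, 269),
    ("CI.CA.C1.17.6.1B", "C1", 10, 239),
    ("CI.CA.C1.01.50.2L", "C1", 20, 322),
]

def mean_sentence_length_by_level(samples):
    """Pool sentences and words per level, then divide words by sentences."""
    totals = defaultdict(lambda: [0, 0])
    for _text_id, level, sentences, words in samples:
        totals[level][0] += sentences
        totals[level][1] += words
    return {level: words / sentences for level, (sentences, words) in totals.items()}

lengths = mean_sentence_length_by_level(SAMPLES)
for level, value in lengths.items():
    print(level, round(value, 1))
# A1 19.3
# A2 16.3
# B 25.2
# C1 18.7
```

With only six texts no trend can be read from these figures; the point is the shape of the computation, which would be repeated with chunk counts once parsed output is available.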
Passive construction
The use of passive constructions, with both auxiliaries ser (to be) and estar
(to be), and of the pronominal passive (with the so-called passive particle),
demonstrates knowledge of various lexical-grammatical structures.
Solution
A number of relevant features will be extracted from publicly available
learner corpora composed of texts written by PFL students. These corpora will
be processed by the STRING natural language processing chain [3], and its XML
output will be fed into an automatic text classifier, CLAVIS [2], which
extracts statistical features from texts. This tool will be adapted to extract
the linguistic features studied here. The resulting data will then be analyzed
with machine learning techniques [4] in order to map the use of these
linguistic features onto the different CEFR levels and to trace its evolution
across levels.
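The last step, mapping feature vectors onto CEFR levels, can be illustrated with a very small classifier. The sketch below uses a nearest-centroid rule in plain Python as a stand-in; it is not STRING, CLAVIS or the machine learning toolkit of [4], and the two features and all training values are invented for the example.

```python
import math
from collections import defaultdict

def train_centroids(examples):
    """Average the feature vectors of each level: {level: centroid}."""
    grouped = defaultdict(list)
    for features, level in examples:
        grouped[level].append(features)
    return {
        level: [sum(column) / len(vectors) for column in zip(*vectors)]
        for level, vectors in grouped.items()
    }

def predict(centroids, features):
    """Return the level whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda level: math.dist(centroids[level], features))

# Invented training data: (connectors per sentence, words per sentence) -> level.
training = [
    ([0.2, 8.0], "A1"), ([0.3, 9.0], "A1"),
    ([0.6, 14.0], "B1"), ([0.7, 15.0], "B1"),
    ([1.1, 21.0], "C1"), ([1.2, 22.0], "C1"),
]
centroids = train_centroids(training)
print(predict(centroids, [0.65, 14.5]))  # B1
```

In a real setting the features would need to be put on comparable scales before distances are computed, and the classifier would be evaluated on held-out texts.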
The moment in the learning process when a learner starts to use a word seems
to be related to the frequency with which that word is used in the language:
the most frequent words are learnt before the rarest.
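One way to relate a learner text to word frequency is a frequency-band profile: the share of its tokens that fall among the most frequent words, the next most frequent, and so on. The sketch below assumes a ranked word list is available; the ranking, the band size and the function name `frequency_band_profile` are invented for illustration and are not taken from any real frequency list.

```python
def frequency_band_profile(tokens, ranked_words, band_size):
    """Share of tokens in each frequency band; unknown words go to 'off-list'."""
    rank = {word: i for i, word in enumerate(ranked_words)}
    profile = {}
    for token in tokens:
        position = rank.get(token.lower())
        band = "off-list" if position is None else position // band_size + 1
        profile[band] = profile.get(band, 0) + 1
    return {band: n / len(tokens) for band, n in profile.items()}

# Invented ranking, most frequent first.
ranked = ["de", "a", "o", "que", "casa", "livro"]
tokens = ["o", "livro", "de", "a", "casa", "amarela"]
profile = frequency_band_profile(tokens, ranked, band_size=3)
print(profile[1])  # 0.5
```

If the frequency hypothesis holds, texts from lower levels would be expected to concentrate more of their tokens in the first bands.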
Verbal tenses and modes
CLAVIS
Corpus: Cople2, Cople2, RePLE, RePLE, PEAPL2, PEAPL2
Laboratório de Sistemas de Língua Falada
Vocabulary
CEFR Levels
CEFR [1] was built to help plan language learning programs. It is mainly
defined from a communicative perspective, setting out a set of competences
that students should acquire in order to attain a certain communicative
proficiency level. CEFR generically mentions:
- Grammatical competences
- Morphological competences
- Lexical competences
However, the specific lexical and grammatical contents to be taught are not
made explicit.
Investigação e Desenvolvimento em Lisboa
Linguistic Features
In developing scientifically validated curricula that promote a consistent
and appropriate learning process of progressive complexity, it is necessary to
determine at what stage of this process students of Portuguese as a Foreign
Language (PFL) are linguistically prepared to learn and use the different
language structures. This project intends to map the use of various
grammatical and lexical structures, namely: (i) vocabulary; (ii) verbal tenses
and modes; (iii) conjunctive adverbs, conjunctions and other discourse
connectors; (iv) internal sentence structures; and (v) the passive
construction, in correlation with the learning levels defined in the Common
European Framework of Reference for Languages (CEFR), and to trace their
evolution over the PFL learning process.
Problem
Instituto de Engenharia de Sistemas e Computadores
[Figure labels: Weka; Language models; A1, A2, B1, B2, C1]
Applications
Results from this project may be used for:
- development of different Computer-Assisted Language Learning applications
- improving the precision of automatic text readability classifiers
- unsupervised PFL testing and proficiency assessment
- development and improvement of PFL syllabi
References:
[1] Conselho da Europa (2001). Quadro Europeu Comum de Referência para as Línguas: Aprendizagem, ensino, avaliação. Edições ASA.
[2] Curto, P., Baptista, J., Mamede, N. (2015). Assisting European Portuguese Teaching: Linguistic features extraction and automatic readability classifier. In Computer Supported Education, Selected Papers from CSEDU 2015, Lecture Notes in Computer Science/Communications in Computer and Information Science (CCIS) series, vol. 583, pp. 81-96. Springer-Verlag.
[3] Mamede, N., Baptista, J., Diniz, C., Cabarrão, V. (2012). STRING - A Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese. In: Abad, A. (ed.) PROPOR 2012 Demo session (https://string.l2f.inesc-id.pt).
[4] Witten, I., Frank, E., Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 3rd ed.
Acknowledgment: The University of Algarve-FCHS (PhD program in Language Sciences) partially funded participation in this conference.
