
Mapping Grammatical Structures onto Proficiency Levels
Rui Talhadas

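Universidade do Algarve / INESC-ID Lisboa (Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa) / L2F (Laboratório de Sistemas de Língua Falada)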

rtalhadas@gmail.com

Abstract

Verbal tenses and moods

Verbal tenses differ in their relative frequency of use, and frequency of use is known to correlate with language proficiency levels. Thus, the Present Simple, being more frequent, is expected to be learnt more quickly than the less frequent Conditional. By analyzing corpora, we expect to determine at which stage of learning the use of each tense is mastered.
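To make "relative frequency" concrete, here is a minimal sketch that assumes each learner text has already been reduced to its CEFR level plus a list of tense labels for its verb forms. The labels `pres`, `pret` and `cond` and the function name are hypothetical illustrations, not the tagset or output format of STRING or CLAVIS. The sketch computes, per level, the share of each tense among all verb forms.

```python
from collections import Counter, defaultdict

def tense_relative_frequency(texts):
    """texts: iterable of (cefr_level, tense_labels) pairs, one per learner text.
    Returns {level: {tense: share of that tense among all verb forms at that level}}."""
    counts = defaultdict(Counter)
    for level, tenses in texts:
        counts[level].update(tenses)
    result = {}
    for level, counter in counts.items():
        total = sum(counter.values())
        # A level whose texts contain no verb forms simply gets an empty mapping.
        result[level] = {tense: n / total for tense, n in counter.items()}
    return result

# Hypothetical, hand-made input; the tense labels are illustrative, not a real tagset.
toy = [
    ("A1", ["pres", "pres", "pres", "pret"]),
    ("B1", ["pres", "pret", "cond", "pres"]),
]
print(tense_relative_frequency(toy))
```

Comparing these shares across levels is one simple way to see when a less frequent tense, such as the Conditional, starts to appear.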

Table: Sample texts with CEFR level, number of sentences (Sent.) and number of words; the feature columns pass-s, pass-e, Vinf, t:IP and t:C are also annotated per text.

Corpus  ID                  CEFR  Sent.  Words
Cople2  ar007CVITI          A1    3      58
Cople2  ar008CVETD          A2    6      98
RePLE   AL_2_11_6.1B        B     13     260
RePLE   AL_2_11_70.3Q       B     8      269
PEAPL2  CI.CA.C1.17.6.1B    C1    10     239
PEAPL2  CI.CA.C1.01.50.2L   C1    20     322
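Raw feature counts grow with text length, so comparing texts usually starts by normalizing them. As a minimal sketch, the following computes mean sentence length (words per sentence) from the Sent. and words values of the sample table, per text and pooled per CEFR level. The function names are illustrative, and the per-level pooling is an assumption about how such figures might be aggregated, not the project's stated method.

```python
# (ID, CEFR, sentences, words) as listed in the sample table; level "B" is kept as given.
SAMPLE = [
    ("ar007CVITI", "A1", 3, 58),
    ("ar008CVETD", "A2", 6, 98),
    ("AL_2_11_6.1B", "B", 13, 260),
    ("AL_2_11_70.3Q", "B", 8, 269),
    ("CI.CA.C1.17.6.1B", "C1", 10, 239),
    ("CI.CA.C1.01.50.2L", "C1", 20, 322),
]

def words_per_sentence(rows):
    """Mean sentence length (words / sentences) of each text."""
    return {text_id: words / sents for text_id, _level, sents, words in rows}

def pooled_by_level(rows):
    """Total words divided by total sentences for each CEFR level."""
    totals = {}
    for _id, level, sents, words in rows:
        s, w = totals.get(level, (0, 0))
        totals[level] = (s + sents, w + words)
    return {level: w / s for level, (s, w) in totals.items()}

print(words_per_sentence(SAMPLE))
print(pooled_by_level(SAMPLE))
```

For level B, for instance, the pooled value is 529 words over 21 sentences. With only one or two texts per level, such averages illustrate the computation rather than a trend.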

Linguistic features

Discourse connectors

When beginning to learn PFL, learners mostly use quite simple sentences, as they are not yet entirely at ease with the language. According to some authors, the use of conjunctions, conjunctive adverbs and other discourse connectors can help to place the learner at the appropriate CEFR level.
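One simple way to turn connector use into a feature is a count normalized by the number of sentences. The sketch below assumes a tiny, hypothetical connector inventory (single and multi-word Portuguese connectors) and naive punctuation-based sentence splitting; a real inventory would be much larger, and ambiguous forms would need the syntactic analysis that a chain like STRING provides.

```python
import re

# Hypothetical, deliberately tiny connector inventory; multi-word connectors are token tuples.
CONNECTORS = [("mas",), ("porque",), ("portanto",), ("embora",),
              ("no", "entanto"), ("além", "disso")]

def tokenize(text):
    # \w is Unicode-aware for str patterns, so accented letters stay inside words.
    return re.findall(r"\w+", text.lower())

def count_connectors(text):
    """Number of connector occurrences, matching multi-word connectors as token sequences."""
    tokens = tokenize(text)
    total = 0
    for conn in CONNECTORS:
        n = len(conn)
        total += sum(1 for i in range(len(tokens) - n + 1)
                     if tuple(tokens[i:i + n]) == conn)
    return total

def connectors_per_sentence(text):
    """Connectors divided by sentences, with sentences split naively on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return count_connectors(text) / len(sentences) if sentences else 0.0

sample = "Eu gosto de Lisboa, mas prefiro o Porto. No entanto, moro em Faro porque trabalho lá."
print(count_connectors(sample), connectors_per_sentence(sample))
```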

Clause-internal structure

Clause-internal structures will also be analyzed, based on the sequence of elementary syntactic constituents (or chunks), e.g. noun phrase (NP), verb phrase (VP) and prepositional phrase (PP), and on their syntactic dependencies. This analysis will make it possible to assess whether, and how, clausal complexity increases across learning levels.
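As a rough illustration of how chunk sequences could be turned into complexity measures, the sketch below assumes each clause is already available as a list of chunk labels (NP, VP, PP) and computes the mean number of chunks per clause and the frequency of adjacent chunk pairs. It covers only the chunk-sequence part, not the dependencies, and the input format is an assumption rather than STRING's actual output.

```python
from collections import Counter

def chunk_profile(clauses):
    """clauses: list of chunk-label sequences, one per clause, e.g. ["NP", "VP", "PP"].
    Returns the mean number of chunks per clause and counts of adjacent chunk pairs."""
    if not clauses:
        return {"mean_chunks": 0.0, "bigrams": Counter()}
    bigrams = Counter()
    for chunks in clauses:
        bigrams.update(zip(chunks, chunks[1:]))
    mean_chunks = sum(len(c) for c in clauses) / len(clauses)
    return {"mean_chunks": mean_chunks, "bigrams": bigrams}

# Hypothetical clauses: short NP-VP clauses vs. clauses with stacked PPs.
beginner = [["NP", "VP"], ["NP", "VP", "NP"]]
advanced = [["NP", "VP", "NP", "PP", "PP"], ["PP", "NP", "VP", "NP"]]
print(chunk_profile(beginner))
print(chunk_profile(advanced))
```

Longer clauses and a wider variety of chunk pairs (e.g. PP followed by PP) are one possible proxy for increasing clausal complexity.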

Passive construction

The use of passive constructions, both with the auxiliaries ser (to be) and estar (to be) and in the pronominal passive (with the so-called passive particle se), demonstrates knowledge of various lexical-grammatical structures.
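A minimal sketch of how these three passive types could be counted, assuming the text is already tokenized and tagged as (form, lemma, tag) triples with a hypothetical tagset ("V" verb, "PTCP" past participle, "ADV" adverb, "CL" clitic). The rules are deliberately crude: ser/estar followed by a participle (allowing one adverb in between) counts as an analytic passive, and any se next to a verb counts as a candidate pronominal passive, so reflexive and impersonal se are not filtered out.

```python
# Hypothetical tagset: "V" verb, "PTCP" past participle, "ADV" adverb, "CL" clitic pronoun.
def count_passives(tokens):
    """tokens: list of (form, lemma, tag) triples for one text.
    Counts analytic passives with ser/estar and candidate pronominal passives with se."""
    counts = {"ser": 0, "estar": 0, "se": 0}
    for i, (form, lemma, tag) in enumerate(tokens):
        if tag == "V" and lemma in ("ser", "estar"):
            # Look at the next two tokens, allowing one intervening adverb.
            for j in (i + 1, i + 2):
                if j >= len(tokens):
                    break
                if tokens[j][2] == "PTCP":
                    counts[lemma] += 1
                    break
                if tokens[j][2] != "ADV":
                    break
        elif tag == "CL" and form.lower() == "se":
            # Crude: any "se" next to a verb; reflexive and impersonal se are not excluded.
            prev_verb = i > 0 and tokens[i - 1][2] == "V"
            next_verb = i + 1 < len(tokens) and tokens[i + 1][2] == "V"
            if prev_verb or next_verb:
                counts["se"] += 1
    return counts

# "A casa foi construída em 1900. Vendem-se casas. A loja está fechada." (hand-tagged)
toy_tokens = [
    ("A", "o", "DET"), ("casa", "casa", "N"), ("foi", "ser", "V"),
    ("construída", "construir", "PTCP"), ("em", "em", "PREP"), ("1900", "1900", "NUM"),
    ("Vendem", "vender", "V"), ("se", "se", "CL"), ("casas", "casa", "N"),
    ("A", "o", "DET"), ("loja", "loja", "N"), ("está", "estar", "V"),
    ("fechada", "fechar", "PTCP"),
]
print(count_passives(toy_tokens))
```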

Solution

A number of relevant features will be extracted from publicly available learning corpora composed of texts written by PFL students. These corpora will be processed by the NLP chain STRING [3], and its XML output will be fed into CLAVIS [2], an automatic text classifier that extracts statistical features from texts. This tool will be adapted to extract the linguistic features studied here. The resulting data will then be analyzed with machine learning techniques [4] in order to map the use of these linguistic features onto the different CEFR levels and to trace their evolution across levels.

Figure: pipeline diagram (Corpus, CLAVIS, Weka, Language models; levels A1, A2, B1, B2, C1)
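The mapping step can be pictured as supervised classification: each text becomes a feature vector labeled with its CEFR level, and a model learns to predict the level of new texts. The sketch below uses a nearest-centroid classifier in plain Python as a swapped-in, minimal stand-in; it is not Weka's API and not the technique the project commits to. The two features and all training values are made up for illustration, and real features would normally be standardized first so that no single scale dominates the distance.

```python
import math
from collections import defaultdict

def train_centroids(samples):
    """samples: list of (feature_vector, cefr_level). Returns {level: mean feature vector}."""
    sums = {}
    counts = defaultdict(int)
    for vec, level in samples:
        acc = sums.setdefault(level, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[level] += 1
    return {level: [s / counts[level] for s in acc] for level, acc in sums.items()}

def predict_level(centroids, vec):
    """Assign the CEFR level whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda level: math.dist(centroids[level], vec))

# Made-up training vectors with two hypothetical features per text,
# e.g. connectors per sentence and passives per 100 words.
train = [
    ([0.2, 0.0], "A1"), ([0.4, 0.1], "A1"),
    ([1.0, 0.5], "B1"), ([1.2, 0.7], "B1"),
    ([2.0, 1.5], "C1"), ([2.2, 1.7], "C1"),
]
centroids = train_centroids(train)
print(predict_level(centroids, [1.1, 0.6]))
```

A nearest-centroid model is chosen here only because it fits in a few lines and makes the "map features onto levels" idea explicit; the toolkit cited in [4] offers many stronger classifiers.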

Vocabulary

The point in the learning process at which a learner starts to use a word seems to be related to how frequently that word is used in the language: the most frequent words are learnt before the rarest ones.
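If frequent words are learnt first, a learner text can be profiled by how much of it falls within the most frequent words of the language. This sketch assumes a hypothetical frequency-ranked word list (most frequent first, as might be derived from a reference corpus of Portuguese) and reports the share of tokens within the top N ranks and the share absent from the list altogether. The function name, the toy list and the cutoff are illustrative assumptions.

```python
import re

def frequency_profile(text, ranked_words, top_n=1000):
    """ranked_words: words ordered from most to least frequent (hypothetical input).
    Returns the share of tokens among the top_n most frequent words and the share
    of tokens not found in the list at all."""
    rank = {w: i + 1 for i, w in enumerate(ranked_words)}
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {"top_n_share": 0.0, "unknown_share": 0.0}
    in_top = sum(1 for t in tokens if rank.get(t, float("inf")) <= top_n)
    unknown = sum(1 for t in tokens if t not in rank)
    return {"top_n_share": in_top / len(tokens), "unknown_share": unknown / len(tokens)}

# Toy ranked list standing in for a real frequency list.
ranked = ["o", "a", "de", "e", "que", "eu", "casa", "gosto", "da", "minha"]
print(frequency_profile("Eu gosto da minha casa.", ranked, top_n=7))
print(frequency_profile("A habitação é espaçosa.", ranked, top_n=7))
```

Under the hypothesis above, texts from lower CEFR levels would be expected to show a higher top-N share.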


CEFR Levels

The CEFR [1] was built to help plan language learning programs. It is mainly framed from a communicative perspective, defining a set of competences that students should acquire in order to attain a given communicative proficiency level. The CEFR mentions, in generic terms:
- grammatical competences
- morphological competences
- lexical competences
However, the specific lexical and grammatical contents to be taught are not made explicit.


Problem

In developing scientifically validated curricula that promote a consistent and appropriate learning process of progressive complexity, it is necessary to determine at what stage of this process students of Portuguese as a Foreign Language (PFL) are linguistically prepared to learn and use the different language structures. This project intends to map the use of various grammatical and lexical structures, namely: (i) vocabulary; (ii) verbal tenses and moods; (iii) conjunctive adverbs, conjunctions and other discourse connectors; (iv) internal sentence structures; and (v) the passive construction, in correlation with the learning levels defined in the Common European Framework of Reference for Languages (CEFR) and with the evolution of the PFL learning process.

Applications

Results from this project may be used in:
- the development of different Computer-Assisted Language Learning applications
- improving the precision of automatic text readability classifiers
- unsupervised PFL testing and proficiency assessment
- the development and improvement of PFL syllabi

References:
[1] Conselho da Europa (2001). Quadro Europeu Comum de Referência para as Línguas: Aprendizagem, ensino, avaliação. Edições ASA.
[2] Curto, P., Baptista, J., Mamede, N. (2015). Assisting European Portuguese Teaching: Linguistic features extraction and automatic readability classifier. In Computer Supported Education, Selected Papers from CSEDU 2015, Lecture Notes in Computer Science / Communications in Computer and Information Science (CCIS) series, vol. 583, pp. 81-96. Springer-Verlag.
[3] Mamede, N., Baptista, J., Diniz, C., Cabarrão, V. (2012). STRING: A Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese. In: Abad, A. (ed.) PROPOR 2012 Demo session (https://string.l2f.inesc-id.pt).
[4] Witten, I., Frank, E., Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 3rd ed.
Acknowledgment: University of Algarve-FCHS (PhD program in Language Sciences) partially funded participation in this conference.
