Grammatical Structures
onto Proficiency Levels
Rui Talhadas
Universidade do Algarve / INESC-ID Lisboa / L2F
rtalhadas@gmail.com
Abstract
PEAPL2

ID                  CEFR  Sent.  words
ar007CVITI          A1    3      58
ar008CVETD          A2    6      98
AL_2_11_6.1B        B     13     260
AL_2_11_70.3Q       B     8      269
CI.CA.C1.17.6.1B    C1    10     239
CI.CA.C1.01.50.2L   C1    20     322

Further feature columns: pass-s, pass-e, Vinf, t:IP, t:C, Discourse connectors.
Per-text values for these columns, in extraction order (column assignment unclear):
2 1 3 3 7 5
1 0 1 1 1 0
1 2 2 1 0 3
3 5 6 3 5 6
0 0 1 0 2 1
Linguistic features
Passive construction
Solution
A number of relevant features will be extracted from publicly available learner corpora
composed of texts written by PFL students. These corpora will be processed by the NLP chain
STRING [3], and the XML output will be fed into an automatic text classifier, CLAVIS [2],
which extracts statistical features from texts. This tool will be adapted to extract the
linguistic features studied here. The resulting data will then be analyzed with machine
learning techniques [4] in order to map the use of these linguistic features onto the
different CEFR levels and to trace its evolution across levels.
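As a rough illustration of what mapping a feature onto CEFR levels can look like, the sketch below averages one simple statistical feature (words per sentence) per level, using the sentence and word counts listed for the six sample texts. It is not the CLAVIS or Weka implementation: the function names are hypothetical, and words per sentence merely stands in for the linguistic features (passives, discourse connectors, etc.) that the adapted tool would extract.

```python
from collections import defaultdict
from statistics import mean

# (text ID, CEFR level, sentences, words) for the six sample texts.
SAMPLES = [
    ("ar007CVITI", "A1", 3, 58),
    ("ar008CVETD", "A2", 6, 98),
    ("AL_2_11_6.1B", "B", 13, 260),
    ("AL_2_11_70.3Q", "B", 8, 269),
    ("CI.CA.C1.17.6.1B", "C1", 10, 239),
    ("CI.CA.C1.01.50.2L", "C1", 20, 322),
]


def words_per_sentence(sentences: int, words: int) -> float:
    """Hypothetical stand-in feature: average sentence length in words."""
    return words / sentences


def mean_feature_by_level(samples):
    """Group texts by CEFR level and average the feature within each level."""
    by_level = defaultdict(list)
    for _text_id, level, sentences, words in samples:
        by_level[level].append(words_per_sentence(sentences, words))
    return {level: mean(values) for level, values in by_level.items()}


profile = mean_feature_by_level(SAMPLES)
for level in sorted(profile):
    print(f"{level}: {profile[level]:.1f} words per sentence")
```

With only six texts the per-level averages are purely illustrative; a per-level profile of this kind is the sort of table a classifier would be trained on once the features are extracted from the full corpora.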
[Figure labels: Corpus (Cople2, RePLE, PEAPL2); CLAVIS; Vocabulary; Linguistic Features; Weka; CEFR Levels (A1, A2, B1, B2, C1)]
Problem
Applications
Language models
References:
[1] Conselho da Europa (2001). Quadro Europeu Comum de Referência para as Línguas: Aprendizagem, ensino, avaliação. Edições ASA.
[2] Curto, P., Baptista, J., Mamede, N. (2015). Assisting European Portuguese Teaching: Linguistic features extraction and automatic readability classifier. In Computer Supported Education, Selected Papers from CSEDU2015, Lecture Notes in Computer Science/Communications in Computer and Information Science (CCIS) series, vol. 583: pp. 81-96. Springer-Verlag.
[3] Mamede, N., Baptista, J., Diniz, C., Cabarrão, V. (2012). STRING - A Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese. In: Abad, A. (ed.) PROPOR 2012 Demo session (https://string.l2f.inesc-id.pt).
[4] Witten, I., Frank, E., Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 3rd ed.
Acknowledgment: University of Algarve-FCHS (PhD program on Language Sciences) has partially funded the participation in this conference.