
J. Sci. & Devel. 2015, Vol. 13, No. 6: 976-988
Tạp chí Khoa học và Phát triển 2015, tập 13, số 6: 976-988
www.vnua.edu.vn

THE X.ENT TOOL FOR EXTRACTING ENTITIES AND RELATIONS BETWEEN ENTITIES AND FOR SUPPORTING DATA ANALYSIS IN FRENCH AGRICULTURAL EPIDEMIOLOGY JOURNALS

Phan Trọng Tiến*, Ngô Công Thắng

Faculty of Information Technology, Vietnam National University of Agriculture

Email*: ptgtien@vnua.edu.vn

Received: 22.07.2015    Accepted: 03.09.2015

SUMMARY

Entity extraction is the task of extracting information from text and classifying it into predefined categories such as names of persons, organizations, locations and times. A further step is to find relations between entities, for example between a person's name and an organization's name. The x.ent tool was built for this task; it uses entity dictionaries and extraction rules. To extract relations between entities we apply two methods: analysis of the text structure, and an unsupervised learning model that analyses the co-occurrence frequency of entities. The x.ent tool is available on the R home site at: http://cran.r-project.org/web/packages/x.ent/index.html.
Keywords: finite automata, named entity recognition, Perl, R, information extraction, entity extraction, relation extraction.

X.ent Package for Extraction of Entities, Relationships between Entities and Support
Data Analysis in Epidemiological Journals in French Agriculture

ABSTRACT

Entity extraction is the task of extracting information from text and classifying its elements into categories such as the names of persons, organizations, locations and times. A further step is to find relationships between entities, for example between persons and organizations. The x.ent tool was built to solve this task. It uses dictionary matching and hand-crafted rules for extraction. To extract relationships between entities, we applied two methods: analysis of text structure and an unsupervised learning approach called co-occurrence analysis. The tool is available from the R site at: http://cran.r-project.org/web/packages/x.ent/index.html.
Keywords: Entity Extraction, Information Extraction (IE), Named Entity Recognition (NER), Perl, Relation Extraction, R.

1. INTRODUCTION

We live in an era of explosive growth in information technology. According to statistics, every day 540 million text messages are sent worldwide, 143 billion emails are exchanged, 40,000 gigabytes of data are produced by the Large Hadron Collider (LHC), 400 million status updates are posted on the social network Twitter, 104,000 hours of video are added to YouTube, and so on (according to NASATI), and these figures will keep growing.

Processing and analysing big data draws on research in many fields, including computer science, statistics, mathematics, data engineering, pattern recognition, visualization, artificial intelligence, machine learning and high-performance computing. Such large volumes of data may contain redundant information, so information extraction (IE) is a very important step in obtaining the information needed for data analysis. Information extraction is now used in many application areas, such as identifying the main business trends of users, disease prevention, crime prevention, bioinformatics and stock analysis.

X.ent is a tool that we built for text data extraction (extraction of entities and of relations between entities). We also built a number of graphical functions, written in R, that let users analyse the data after extraction. The tool combines different programming languages: Perl for the extraction part and R for the analysis of results. After completing the tool we submitted it to CRAN (the repository of R packages), where it was accepted by the statistics experts; users can now download and install it directly from the CRAN servers. The tool is the product of a project completed during master's studies in France in 2012-2014.

2. MATERIALS AND METHODS

2.1. Materials

The data we extract from are French reports on the prevention of crop diseases. We are interested in 12 entities: crops, diseases, pests, other beneficial organisms (auxiliaries), geographical locations (regions, towns), report dates (date), report numbers (issues), chemicals used (chemicals), crop developmental stages (developmental stage), damage to crops (crop damage), climate, and degree of negativity (negative). The relations between entities that we are interested in are crop-disease and crop-pest.

In France, agronomists produce weekly reports informing farmers about attacks by diseases and insects on crops. The aim of these reports is to encourage farmers to use treatments against harmful organisms. The first issue appeared in 1946, and the early issues were all typewritten (printed); since 2001 all issues have been published in PDF format. France is divided into 22 regions plus the overseas regions, and each region publishes its own reports.

The project's data source comprises 50,000 reports, of which about 20,000 are printed pages. These paper copies had to be scanned; they were shared with the BNF library (Bibliothèque François-Mitterrand) and then converted to text using OCR (Optical Character Recognition) by Jouve Corp.

The project is funded by the French Ministry of Agriculture and Research. It involves biologists and ecologists who study pathogens, covering epidemiology and environmental science (pest and disease forecasting), together with a network called PIC (Integrated Crop Protection). Four potato and wheat experts from PIC accompanied us in this project, which is named VESPA (Valeur et optimisation des dispositifs d'épidémiosurveillance dans une stratégie durable de protection des cultures: assessment and optimization of epidemiological surveillance systems in a sustainable crop protection strategy).

2.2. Methods

Information extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. In most cases this activity involves processing human-language texts, in other words natural language processing (NLP).

Our main goal is to extract relations between crop entities and the agents that harm crops, together with the degree of damage they cause. Information extraction is a well-suited natural language processing technique for this goal. The steps in the information extraction process are:


Step 1: Named entity recognition (NER)
Step 2: Relation extraction
Step 3: Extraction of contextual information such as degree of damage, crop developmental stage, climate and geography

Figure 1. A crop disease report for the Bourgogne and Franche-Comté regions

Many algorithms and methods exist for named entity recognition (NER), for example pattern-based classification algorithms (relying on extraction rules written by experts) and statistical algorithms such as HMM (Hidden Markov Model), MaxEnt (Maximum Entropy Modeling) or CRF (Conditional Random Fields).

2.2.1. Named entity extraction

a. Dictionary-based extraction

For some entities we can build dictionaries to drive the extraction, for example dictionaries of crops, diseases, pests, other beneficial organisms (auxiliaries), geographical locations (regions, towns) and treatment chemicals. We built the dictionaries on the following principle: each entry starts with the root keyword, followed by its category, where N marks a node (the root of other categories) and L marks a leaf of the category. A root keyword can have variant forms such as singular, plural, unaccented spelling, synonyms and abbreviations.

b. Rule-based extraction

For some types of entity no dictionary can be built, for example crop developmental stages, assessments of the degree of damage to crops, or date values. For these we had to build extraction rules using the Unitex tool, available at http://www-igm.univ-mlv.fr/~unitex/ (Paumier et al.) and developed at Université Paris-Est.


Figure 2. Dictionary structure and statistics of the dictionaries we built

The extraction rules are finite automata, built through a graphical interface. For example, to extract the dates in a report we rely on the structure of dates in sample texts: if they have the format xx {January|February} xxxx, we can build a rule like the one in Figure 3.

In this project, with the support of agricultural experts, we built various extraction rules, that is, grammars. Some of the patterns used to capture data are:

- <words in the dictionary>
- <start marker keyword>. <end of sentence>
- <start marker keyword>. <end marker keyword>
- <word in the dictionary> <end marker keyword>
- <start marker keyword> <word in the dictionary>

Figure 3. Date extraction rule built with the Unitex tool

2.2.2. Relation extraction

Extracting relations between entities remains a relatively complex problem. Many extraction methods have been proposed, such as hand-built relation extraction rules, bootstrapping, supervised learning, distant supervision and unsupervised methods (Zettlemoyer, 2013). We propose two relation extraction methods: document structure analysis, and an unsupervised learning model based on the co-occurrence frequency of entities.
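The date rule of Section 2.2.1(b), xx {January|February} xxxx, is a finite automaton and can therefore be approximated by a regular expression. The sketch below is not the Unitex graph of Figure 3: the full list of month names and the function name are assumptions (the paper shows only two months, and the French reports would use French month names).

```python
import re

# Month names are an assumption: the paper only shows {January|February}.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# xx {month} xxxx: a one- or two-digit day, a month name, a four-digit year.
DATE_RULE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b"
)


def extract_dates(text):
    """Return (day, month, year) tuples for every date matching the rule."""
    return [(int(d), m, int(y)) for d, m, y in DATE_RULE.findall(text)]


print(extract_dates("Bulletin no. 12 of 14 February 2011, next issue on 3 March 2011."))
```

The number 12 in "no. 12" is not captured because it is not followed by a month name, which is the same discrimination the graphical rule provides.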


Figure 4. Grammar for extracting the assessment of the degree of damage to crops

a. Document structure analysis

The organization of a document (title, subtitles, references, sections, tables, images, introduction, summary, discussion) can influence extraction. We call this the architecture of a document. However, many architectures exist and the set of heuristics is unbounded.

Heuristic 1: Main entity
The main entity occurs in the title or subtitle of a paragraph, or of part of a paragraph. In Figure 5 the main entity occurs at the start of each paragraph; in this example it is the crop entity (blé, betterave).

Heuristic 2: Take the first value
For some entities several values may be found in the data, but we keep only the first value in the report. In Figure 5 this applies to entities such as the geographical location, the publication date of the report and the report number.

Heuristic 3: No-search zones
Some passages of a text may carry headings under which entities occur that are not linked to the main entity or to the contextual information, for example auxiliary information, notes, or information quoted from another data source.

b. Unsupervised learning model based on co-occurrence

Definition 1: Text unit and entity
A text unit (TU) is a linked list containing words W and entities E. An entity can be a single word or a sequence of consecutive words.

Definition 2: Entity position
Let E_i be a root entity. A document is divided into text units (TU); a text unit can be part of a paragraph, a sentence or a whole paragraph. Let p_i be the position in the document of the keyword or heading that realizes entity E_i. We define a window in which WL is the number of words to the left of p_i and WR is the number of words to the right of p_i. When WL is unbounded, the window starts at the beginning of the text; likewise, when WR is unbounded, the window extends to the end of the text.
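The window of Definition 2 can be illustrated with a short sketch. The function names are hypothetical, and using `None` to mean "no limit on that side" is an assumption of this example, since the value used in the paper for an unbounded window is not recoverable here.

```python
def window_bounds(position, n_words, wl=None, wr=None):
    """Return the [start, end] word indices of the window around `position`.

    wl / wr are the numbers of words kept to the left / right; None is used
    here (an assumption of this sketch) to mean "no limit on that side".
    """
    start = 0 if wl is None else max(0, position - wl)
    end = n_words - 1 if wr is None else min(n_words - 1, position + wr)
    return start, end


def words_in_window(words, position, wl=None, wr=None):
    """Return the words of the text unit that fall inside the window."""
    start, end = window_bounds(position, len(words), wl, wr)
    return words[start:end + 1]


words = "le colza est attaqué par le puceron cendré cette semaine".split()
print(words_in_window(words, words.index("colza"), wl=0, wr=5))
```

With WL = 0 and WR = 5 the window starts at the root entity "colza" and reaches "puceron", so a pest mentioned shortly after the crop falls inside it.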


Figure 5. Manual annotation of a document from the project
Note: yellow: crops; green: crop developmental stages; brown: crop diseases; red: geographical locations; sky blue: pests; purple: beneficial organisms; dark blue: time.

Type 1: Text-unit co-occurrence. Let E_i be the root entity and E_j another entity. We define co-occurrence by a binary function cooc(E_i, E_j) as follows:

$$
cooc(E_i, E_j) =
\begin{cases}
1 & \text{if } E_i \in TU \text{ and } E_j \in TU \\
0 & \text{otherwise}
\end{cases}
$$
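Reading type 1 as "both entities occur in the same text unit", the binary function and the resulting frequency can be sketched as below. The text units are hypothetical and are reduced to sets of entity names; this is an illustration of the definition, not the x.ent implementation.

```python
def cooc(entity_i, entity_j, text_unit_entities):
    """Binary co-occurrence: 1 if both entities appear in the text unit."""
    both = entity_i in text_unit_entities and entity_j in text_unit_entities
    return 1 if both else 0


def count_cooccurrences(entity_i, entity_j, text_units):
    """Number of text units in which the two entities co-occur."""
    return sum(cooc(entity_i, entity_j, unit) for unit in text_units)


# Hypothetical text units, each reduced to the set of entities found in it.
text_units = [
    {"colza", "mildiou"},
    {"colza", "puceron"},
    {"blé", "puceron"},
    {"colza", "mildiou", "puceron"},
]
print(count_cooccurrences("colza", "mildiou", text_units))
```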

Type 2: Window co-occurrence. As type 1, but subject to the window constraint, where p_i and p_j are the positions of E_i and E_j:

$$
cooc(E_i, E_j) = 1 \quad \text{if } (p_i - WL) \le p_j \le (p_i + WR)
$$

Type 3: Constrained co-occurrence. As type 1 or type 2, but given a list of markers m_k, at least one marker m_k must lie between E_i and E_j, so that:

$$
cooc(E_i, E_j) = 1 \quad \text{if } \exists\, m_k : \min(p_i, p_j) < p_{m_k} < \max(p_i, p_j)
$$

2.2.3. Input and output data formats

The extraction results are stored in a CSV-like format (Figure 6, right): first the name of the report file, then the code of the entity (r for region, p for crop, ...) or of the relation (p:m for the relation between crop and disease, ...), then the extracted data attached to that entity or relation.

In addition, to assess the effectiveness of x.ent we compared its extraction results with other tools (http8, http9, 2014). For this we had to convert the data to the CoNLL (Conference on Natural Language Learning) format, which is the input format of machine learning models based on statistical methods. We manually annotated 37 files to evaluate the results. The data format (Figure 6, left) has two columns: the first contains the words, split in sentence order, and the second contains the category of each word, where O marks a word belonging to no category, PLA a word belonging to the crop-name category, and so on.
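The two-column evaluation format described above (one word and its category per line) can be read back into entities with a few lines of code. This is a sketch under assumptions: the columns are taken to be whitespace-separated, the PET tag for pests is borrowed from the abbreviations used in the result tables, and consecutive words with the same tag are merged into one entity, which would wrongly join two adjacent entities of the same category.

```python
def read_conll(lines):
    """Parse 'token tag' lines into (token, tag) pairs, skipping blank lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if line:
            token, tag = line.rsplit(None, 1)
            pairs.append((token, tag))
    return pairs


def group_entities(pairs):
    """Merge consecutive tokens sharing a non-O tag into one entity."""
    entities, current_tokens, current_tag = [], [], None
    for token, tag in pairs + [("", "O")]:  # sentinel flushes the last entity
        if tag == current_tag and tag != "O":
            current_tokens.append(token)
            continue
        if current_tokens:
            entities.append((" ".join(current_tokens), current_tag))
        current_tokens, current_tag = ([token], tag) if tag != "O" else ([], None)
    return entities


sample = ["Le O", "colza PLA", "est O", "touché O", "par O",
          "la O", "mouche PET", "du PET", "chou PET", ". O"]
print(group_entities(read_conll(sample)))
```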


Figure 6. Input and output in the CoNLL format, and the output format of x.ent

3. RESULTS AND DISCUSSION

3.1. Evaluation of extraction results

To assess the effectiveness of x.ent, we compared its extraction results with those of other extraction tools.

First, for named entity extraction, we compared it with LingPipe (http9, 2014), which extracts by matching against dictionary data, and with SNER (http8, 2014), which uses the supervised machine learning model CRF.

The evaluation measures are the F-score or F1 (formula 3), recall (formula 2) and precision (formula 1).

The extraction results of x.ent are as good as those of LingPipe. LingPipe also offers approaches based on the hidden Markov model, but they give poorer results.

Next, we compared the relation extraction results of x.ent using structure analysis with the co-occurrence approach under different window parameters, that is, with the width of the window of a text unit varied to the left and to the right of the root entity. Figure 7 shows the results obtained when varying the number of words around the root entity; we tested left and right windows ranging from 0 to 500 words. The best results were obtained when the number of words on the left tends to 0 (close to the root entity) and the number of words on the right tends to 500.

Table 2 shows that, for relation extraction on this data set, the structure analysis method is more effective, with an F-score of about 55%, against about 42% for the co-occurrence method. For structured data sets, structure analysis is the more effective way to find relations; conversely, the co-occurrence method is more effective on unstructured data sets. In the tables below, PET stands for the crop pest entity, MAL for the crop disease entity, PLA for the crop name entity and REG for the geographical location entity; TOT is the average result over the entities. PLA-MAL and PLA-PET are the relations between these entities described above.


Figure 7. Comparison of relation extraction results using co-occurrence with different window parameters

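$$
0 \le P \le 1, \quad P = \frac{\#\,\text{correctly extracted items}}{\#\,\text{extracted items}} \tag{1}
$$

$$
0 \le R \le 1, \quad R = \frac{\#\,\text{correctly extracted items}}{\#\,\text{items that should be extracted}} \tag{2}
$$

$$
0 \le F_1 \le 1, \quad F_1 = \frac{2 \cdot P \cdot R}{P + R} \tag{3}
$$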
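Formulas (1) to (3) can be computed directly from the set of extracted items and the set of expected (manually annotated) items. The sketch below uses the standard definitions of precision, recall and F1; the relation pairs are hypothetical and serve only to exercise the function.

```python
def precision_recall_f1(extracted, expected):
    """Compute precision, recall and F1 from two sets of extracted items."""
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and system output for crop-pest relations.
expected = {("colza", "puceron"), ("blé", "puceron"), ("colza", "mouche du chou")}
extracted = {("colza", "puceron"), ("blé", "puceron"), ("blé", "mildiou"),
             ("maïs", "puceron")}

p, r, f1 = precision_recall_f1(extracted, expected)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of P and R, it always lies between the two values, which is a quick sanity check on any reported row.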

Table 1. Evaluation of named entity extraction results (%)

        X.ENT                    SNER                     LINGPIPE
        P       R       F1       P       R       F1       P       R       F1
PET     96.46   95.52   95.98    92.66   71.41   80.52    96.45   95.53   95.99
MAL     96.97   95.53   96.24    95.46   77.38   85.38    96.97   95.52   96.24
PLA     88.80   98.67   93.47    93.99   82.68   87.94    88.80   98.67   93.47
REG     100     100     100      93.20   73.73   81.92    100     100     100
TOT     94.33   96.67   95.48    93.68   76.85   84.41    94.34   96.65   95.48

Table 2. Evaluation of relation extraction results between entities (%)

            X.ENT                 COOCCURRENCE
            P      R      F1      P      R      F1
PLA-PET     53.4   75.8   52.7    36.4   50.5   42.3
PLA-MAL     58.1   69.5   63.3    41.3   38.7   40.0
TOT         55.3   73.1   62.9    38.1   45.4   41.4


3.2. Analysis and statistics on extracted data

The x.ent tool is developed in Perl for the entity and relation extraction functions, is packaged as an R package and is available on the R platform (R Development Core Team). The package also provides R functions that help users analyse and explore the results after extraction: co-occurrence plots, frequency histograms, Venn diagrams, stacked bar charts, and statistical hypothesis tests for checking the association between relations.

Figure 8 shows an example of a parallel display of the co-occurrence of two entities (e1 and e2). Here e1 is the root entity whose relations we are looking for, and e2 is an entity of another type; for example "mouche du chou" is an instance of the crop pest entity and "mildiou" is an instance of the disease entity. In R, you can type:

    xplot(e1 = "colza", e2 = c("mouche du chou", "mildiou"))

Time constraints can be added, for example:

    xplot(e1 = "colza", e2 = c("mouche du chou", "mildiou"), t = c("09.2010", "02.2011"))

From the plot, users can see in which reports the relation exists and in which it does not. A red symbol means the relation exists in the report; a purple symbol means it does not.

Figure 8. Plot comparing whether or not entities co-occur in the documents

Figure 9. Histogram of frequency over time across the reports


The frequency histogram counts how many reports contain an entity, or a given relation, over time. Figure 9 was produced by the command:

    xhist("colza:mildiou")

From the plot, users can see in which periods the disease "mildiou" appears most often on the crop "colza".

The stacked bar chart is another way for users to analyse relations between entities, for example relations involving crops: based on the extracted data, users can see which crops are often attacked by which pests, and which are not. Figure 10 was produced by the command:

    xprop(c("blé", "maïs", "tournesol", "colza"), c("mouche du chou", "puceron"))

The resulting plot shows that "colza" (rapeseed) can be attacked by "mouche du chou" (cabbage fly) and "puceron" (aphid), whereas other crops such as "tournesol" (sunflower), "maïs" (maize) and "blé" (wheat) are attacked only by "puceron".

Another problem that arises after extraction is to analyse the co-occurrence of entities or of relations across reports. Figure 11 compares the co-occurrence of the crops blé, orge de printemps and tournesol; in R this can be done as follows:

    xvenn(c("blé", "orge de printemps", "tournesol"))

Figure 10. Stacked bar chart

Figure 11. Venn diagram
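The counts behind a Venn diagram such as Figure 11 amount to grouping reports by the exact combination of entities they mention. The sketch below shows that computation on hypothetical data; it is not how xvenn is implemented, and the report names and contents are invented for the example.

```python
from itertools import combinations


def venn_counts(report_entities, names):
    """Count reports by the exact subset of `names` they mention."""
    counts = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            counts[frozenset(subset)] = 0
    for entities in report_entities.values():
        present = frozenset(name for name in names if name in entities)
        if present:
            counts[present] += 1
    return counts


# Hypothetical extraction output: report file -> crops mentioned in it.
reports = {
    "r1.txt": {"blé", "tournesol"},
    "r2.txt": {"blé"},
    "r3.txt": {"blé", "orge de printemps", "tournesol"},
    "r4.txt": {"colza"},
}
crops = ["blé", "orge de printemps", "tournesol"]
for subset, n in venn_counts(reports, crops).items():
    if n:
        print(sorted(subset), n)
```

Each key corresponds to one region of the diagram; reports that mention none of the selected crops, such as r4.txt here, are left out.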


Table 3. Comparison of relation pairs

      Relation                                            KOLMOGOROV   WILCOXON   STUDENT   GrowthCurves
700   blé:méligèthe / blé:thrips                          1.00         0.13       0.13      0.02
543   blé:cicadelle / blé:pyrale                          1.00         0.00       0.00      0.02
613   blé:criocère / blé:thrips                           1.00         0.00       0.00      0.02
689   blé:méligèthe / blé:puceron des épis de céréales    0.91         0.00       0.00      0.02

To assess whether relations between different entities can occur together, we also propose using probability distributions to measure the similarity of relations; for crops and diseases, for instance, they are used to assess which diseases may occur at the same time. We propose the Kolmogorov, Wilcoxon, Student and GrowthCurves tests to compute the similarity of relations with one another. The resulting p-values help users judge whether a pair of relations occurs at the same time or not.

3.3. Integration of extraction results

The x.ent tool performs information extraction and produces results in a CSV-like format, which is often difficult for ordinary users to work with. We therefore built a Web application named PESTOBSERVER, at http://www.pestobserver.eu, which integrates the extracted data and links it to the original crop report. Its interface lets users search for crops, and for crop-disease and crop-pest relations, within a given period; it then returns all reports related to the topic the user supplied.

Figure 12. End-user interface integrating the x.ent results


4. CONCLUSION

We have successfully built a tool named x.ent and applied it to information extraction from French reports on the prevention of crop diseases. The tool extracts crops/diseases and crops/pests relations with an F-score of 62%.

We also built a user-friendly platform that integrates the extraction results with the geographical location of the outbreaks and links them to the original reports.

We are also interested in helping users discover potential relations between entities. We have pursued, and continue to pursue, two directions. The first is to provide a visual graphical interface (plots and tables) that lets users easily compare results and draw conclusions, with co-occurrence plots, frequency histograms, Venn diagrams and stacked bar charts, and to apply statistical distributions to evaluate the results. The second is to integrate the extraction results into a user-friendly platform combined with real-world information, where users can browse the corpus through relations and auxiliary information (geographical location, degree of damage) using a map, and can go back to the original documents.

Vietnamese is rather complex compared with English in aspects such as word structure and grammar. We are continuing our research to improve the tool so that it can handle Vietnamese.

ACKNOWLEDGEMENTS

I would like to express special thanks to my supervisor, Dr. Nicolas Turenne (Paris-Est University), who stood by me throughout the project; to Prof. Kurt Hornik (Vienna University), who provided critical feedback on technical aspects; to Roselyne Corbière (INRA - Rennes center) and Vincent Cellier (INRA - Dijon center) for their suggestions on the interface design and end-user functions; and to Jean-Noel Aubertot (INRA - Toulouse center) for ideas on building the data set on crop disease prevention. Thanks also to the colleagues at the INRA-LIGM laboratory for their technological and technical help during my project there.

REFERENCES

Abacha A.B., Zweigenbaum P. et Max A. (2012). Extraction d'information automatique en domaine médical par projection inter-langue: vers un passage à l'échelle (Automatic Information Extraction in the Medical Domain by Cross-Lingual Projection) [in French]. La conférence JEP-TALN-RECITAL 2012, volume 2: TALN, p. 15-28.
Carpenter B. (2007). LingPipe for 99.99% Recall of Gene Mentions. Proceedings of the 2nd BioCreative workshop, Valencia, Spain.
Constant M., Tellier I., Duchier D., Dupont Y., Sigogne A. et Billot S. (2011). Intégrer des connaissances linguistiques dans un CRF: application à l'apprentissage d'un segmenteur-étiqueteur du français. TALN. Montpellier, p. 1-12.
Faure C., Delprat S., Mille A. et Boulicaut J.-F. (2006). Utilisation des réseaux bayésiens dans le cadre de l'extraction de règles d'association. Actes 6ème Journées Francophones Extraction et Gestion de Connaissances EGC'06, p. 569-580.
Finkel J.R., Grenager T. and Manning C. (2005). Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (Stroudsburg, PA, USA, 2005), p. 363-370.
http1 Stackoverflow (2014). http://stackoverflow.com.
http2 Manuel d'Utilisateur Writing R Extensions (2014). http://cran.r-project.org/doc/manuals/R-exts.html.
http3 O beautiful code, How R Searches and Finds Stuff (2014). http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/.
http4 Précision et rappel (2007). http://benhur.teluq.ca/SPIP/inf6104/article.php3?id_article=98&id_rubrique=10&sem=Semaine%208.
http5 Wikipedia (2014). http://fr.wikipedia.org.
http6 Les Réseaux Bayésiens (2014). http://w3.jouy.inra.fr/unites/miaj/public/matrisq/Contacts/abari.07_03_12.expo2.pdf


http7 Traitement Automatique du Langage Naturel (2014). http://lipn.univ-paris13.fr/~audibert/pages/enseignement/TAL_ITCN.pdf.
http8 Stanford Named Entity Recognizer (2014). http://nlp.stanford.edu/software/CRF-NER.shtml.
http9 LingPipe (2014). http://alias-i.com/lingpipe/.
http10 Information Extraction And Named Entity Recognition (2014). https://web.stanford.edu/class/cs124/lec/Information_Extraction_and_Named_Entity_Recognition.pdf.
http11 Les Réseaux Bayésiens. http://www.bayesia.com/fr/technologie/reseaux-bayesiens.php.
Lafferty J., McCallum A. et Pereira F.C.N. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Dep. Pap. CIS.
Moncla L. (2013). Automatic Annotation of Motion Expressions and Place Named Entities. 2nd Unitex/GramLab.
Paumier S. et Martineau C. (2006). Manuel d'Utilisateur Unitex 3.1 Beta. Université Paris-Est Marne-la-Vallée. version 1.2.
Sutton C. et McCallum A. (2010). An Introduction to Conditional Random Fields for Relational Learning. 1011.4088 [stat], p. 5-32.
R Development Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0. URL http://www.R-project.org/
Tannier X. (2012). Traitement Automatique des Langues. Université Paris-Sud.
Turenne N. (2013). Knowledge Needs and Information Extraction. Wiley-ISTE.
Zettlemoyer L. (2012). Relation Extraction. University of Washington.
