
Introduction

ASM Algorithm
OWL Framework
Future Works
Bibliography

An OWL framework for rule-based recognition of places in Italian non-structured text
Domenico Cantone Andrea Fornaia
Marianna Nicolosi-Asmundo Daniele Francesco Santamaria
Emiliano Tramontana
Department of Mathematics and Computer Science, University of Catania

KDWEB 2016 - Cagliari, September 9, 2016


Work supported by the project PRIME - Piattaforma di Reasoning Integrata,
Multimediale, Esperta - PON FESR Sicilia 2007/2013 and by the FIR project
COMPACT: Computazione affidabile su testi firmati, code: D84C46.
Cantone, Fornaia, Nicolosi-Asmundo, Santamaria, Tramontana

KDWEB 2016 - Cagliari, September 9, 2016

1 / 28


Contents

Introduction
Motivation
Approach Used

ASM Algorithm
Pipe and Filter Multi-Agent Model
Phase 1: Preliminaries
Phase 2: Rules
Phase 3: Further filtering

OWL Framework
Semantic Web: Overview
The Ontologies
The Ontologies: Places Modelling
The Ontologies: Estimation Modelling
The Ontologies: Algorithms Modelling
Examples

Future Works

Bibliography

Motivation
Recognising location names inside non-structured text has several practical applications:
in the investigative field, to analyse transcriptions of intercepted conversations;
in the social media context, to reveal visited places in order to target advertisements.

Different approaches have been studied in the literature:
machine learning, maximum entropy, conditional random fields, ...

Linked data and ontologies have been applied to the problem only in the last decade.
Existing approaches mainly focus on English and are hard to generalise to other languages.

Approach Used

Recognition of names of locations in Italy appearing in non-structured Italian text.
We first apply a rule-based algorithm to detect location names.
Then we use several Semantic Web tools to store the data and reason on them, even in case of ambiguities:
open geographic datasets;
OWL ontologies.


Pipe and Filter Multi-Agent Model


Agents implement grammar rules to spot location names.
Il vertice che si terrà oggi a Bruxelles (The meeting taking place today in Brussels)

Three phases, connected in a pipe & filter style.

Preliminary results: maximum F1 score of 0.67.
[Figure: the pipeline. Phase 1: sentence splitting. Phase 2: Rule 1, Rule 2, Rule 3. Phase 3: Filter0 (capital letters), Filter1 (non-places), Filter2 (verbs).]
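
The three phases can be pictured as a chain of functions. The sketch below is a minimal Python illustration under our own assumptions. The names `run_pipeline`, `naive_split` and `word_after_a` are hypothetical. The toy rule and the capitalisation filter only stand in for the actual ASM rules and filters.

```python
from typing import Callable, List, Set

Tokens = List[str]
Splitter = Callable[[str], List[Tokens]]  # Phase 1: text -> tokenised sentences
Rule = Callable[[Tokens], Set[str]]       # Phase 2 agent: sentence -> candidate places
Filter = Callable[[str], bool]            # Phase 3: keep the candidate when True


def run_pipeline(text: str, split: Splitter, rules: List[Rule],
                 filters: List[Filter]) -> Set[str]:
    # Phase 1: the splitter turns the raw text into tokenised sentences.
    sentences = split(text)
    # Phase 2: each rule (agent) scans every sentence on its own;
    # the candidates proposed by all agents are merged.
    candidates: Set[str] = set()
    for tokens in sentences:
        for rule in rules:
            candidates |= rule(tokens)
    # Phase 3: a candidate survives only if every filter accepts it.
    return {c for c in candidates if all(f(c) for f in filters)}


# Toy components, purely illustrative (hypothetical, not the ASM rules or filters).
def naive_split(text: str) -> List[Tokens]:
    return [s.split() for s in text.split(".") if s.strip()]


def word_after_a(tokens: Tokens) -> Set[str]:
    return {tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == "a"}


print(run_pipeline("Il vertice che si terrà oggi a Bruxelles.",
                   naive_split, [word_after_a], [str.istitle]))
# prints {'Bruxelles'}
```

Each agent is an independent function, which mirrors the independence of the rules. Adding a rule means appending to the list, and the filters never need to know which agent proposed a candidate.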


Phase 1: Preliminaries

Longer texts, e.g. a novel, are split into sentences.
Each sentence is tokenised based on a lexicon:
Articles. Italian definite and indefinite articles.
Prepositions. Both simple and composite ones, excluding con (with).
Verbs. Verbs that may be related to places.
Descriptors. Adverbs that may be related to places.
Non-places. An additional customisable list of words already known to be false positives.
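
Phase 1 can be sketched as follows. This assumes a naive punctuation-based sentence splitter and a tiny, hypothetical lexicon: the word lists and tag names are ours, and the actual lexicon used by the algorithm is not reproduced here.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, heavily abridged lexicon (illustration only).
LEXICON: Dict[str, Set[str]] = {
    "ARTICLE": {"il", "lo", "la", "i", "gli", "le", "l'", "un", "uno", "una", "un'"},
    # Simple and composite prepositions; "con" (with) is deliberately left out.
    "PREPOSITION": {"di", "a", "da", "in", "su", "per", "tra", "fra",
                    "del", "al", "dal", "nel", "sul", "alla", "dalla", "nella"},
    "VERB": {"andare", "andando", "vado", "tornare", "partire"},
    "DESCRIPTOR": {"vicino", "lontano", "presso", "dentro"},
    "NON_PLACE": {"tilt"},
}


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenise(sentence: str) -> List[Tuple[str, str]]:
    # Words (keeping an elided apostrophe, e.g. "l'") become (token, tag) pairs;
    # words not found in the lexicon get the generic tag "WORD".
    tagged = []
    for tok in re.findall(r"\w+'?", sentence):
        low = tok.lower()
        tag = next((cat for cat, words in LEXICON.items() if low in words), "WORD")
        tagged.append((tok, tag))
    return tagged


for s in split_sentences("Vado a Roma. Poi torno vicino a Napoli!"):
    print(tokenise(s))
```

Tagging each token once in Phase 1 lets the later automata work on tags such as PREPOSITION or ARTICLE rather than on raw strings.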


Phase 2: Rules
Three rules, based on Italian sentence patterns possibly related to a location.
Each agent implements one rule by means of an automaton that works at the token level.
The accepting state of the automaton identifies a candidate location.
Each rule is independent of the others.

Rule 1: Da Roma (From Rome).
Rule 2: Vicino a Roma (Near Rome).
Rule 3: Andando a Roma (Going to Rome).
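
An agent of this kind can be sketched as a small finite automaton over tagged tokens. The transition table below is a hypothetical Rule-1-like pattern (a preposition, an optional article, then an unknown word), invented for illustration. It is not the actual automaton of Rule 1.

```python
from typing import Dict, List, Optional, Set, Tuple

Token = Tuple[str, str]  # (word, tag) pairs, e.g. from a lexicon-based tokeniser

# Hypothetical transition table for a Rule-1-like pattern ("Da Roma", "per la Sicilia"):
#   START --PREPOSITION--> PREP --WORD--> ACCEPT
#                          PREP --ARTICLE--> ART --WORD--> ACCEPT
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("START", "PREPOSITION"): "PREP",
    ("PREP", "ARTICLE"): "ART",
    ("PREP", "WORD"): "ACCEPT",
    ("ART", "WORD"): "ACCEPT",
}


def rule1(tokens: List[Token]) -> Set[str]:
    """Run the automaton from every token position; each accepting run yields a candidate."""
    candidates: Set[str] = set()
    for start in range(len(tokens)):
        state: Optional[str] = "START"
        for word, tag in tokens[start:]:
            state = TRANSITIONS.get((state, tag))
            if state is None:       # no transition defined: this run rejects
                break
            if state == "ACCEPT":   # the token that reaches ACCEPT is the candidate location
                candidates.add(word)
                break
    return candidates


print(rule1([("Parto", "VERB"), ("per", "PREPOSITION"), ("la", "ARTICLE"), ("Sicilia", "WORD")]))
```

Such an automaton would also accept phrases like da mangiare (to eat) when mangiare is missing from the verb lexicon. This kind of over-generation is what the Phase 3 filters address.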


Phase 2: Rule 1 (Da Roma)


Phase 2: Rule 2 & Rule 3


Phase 2: Rule 1, 2 and 3 Unified


Phase 3: Further filtering


Filter 0 (optional). Places are only words starting with a capital letter.
This is a strong assumption (e.g. it does not hold for automatic transcriptions).

Filter 1. Places are not known false positives (FPs).
A user-defined dictionary collects the FPs found so far.
Example: Andare in tilt (To go haywire), caught by Rule 3.

Filter 2. Places are not (conjugated) verbs.
Candidate words are stemmed and discarded if recognised as verbs.
Example: Il ministro esce camminando lentamente (The minister goes out walking slowly), caught by Rule 3.

The remaining candidates are accepted as locations and passed to the ontology model.
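
The three filters can be sketched as predicates applied in sequence. The false-positive dictionary, the verb stems and the crude suffix-stripping "stemmer" below are hypothetical stand-ins chosen for illustration. A real system would use a proper Italian stemmer and verb lexicon.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical user-defined dictionary of known false positives (Filter 1).
KNOWN_FALSE_POSITIVES: Set[str] = {"tilt"}

# Hypothetical verb stems (Filter 2); not a real verb lexicon.
VERB_STEMS: Set[str] = {"cammin", "and", "torn", "part"}

# A few Italian endings, longest first, used for crude suffix stripping.
SUFFIXES = ("iamo", "ando", "endo", "are", "ere", "ire", "ate", "ete", "ite",
            "ano", "ono", "a", "e", "i", "o")


def crude_stem(word: str) -> str:
    """Strip the first matching ending, keeping a stem of at least three letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def filter0_capitalised(candidate: str) -> bool:
    return candidate[:1].isupper()


def filter1_not_known_fp(candidate: str) -> bool:
    return candidate.lower() not in KNOWN_FALSE_POSITIVES


def filter2_not_verb(candidate: str) -> bool:
    return crude_stem(candidate) not in VERB_STEMS


def apply_filters(candidates: Iterable[str], use_filter0: bool = True) -> List[str]:
    filters: List[Callable[[str], bool]] = [filter1_not_known_fp, filter2_not_verb]
    if use_filter0:  # Filter 0 is optional, e.g. switched off for automatic transcriptions
        filters.insert(0, filter0_capitalised)
    return [c for c in candidates if all(f(c) for f in filters)]


print(apply_filters(["Roma", "tilt", "camminando", "milano"], use_filter0=False))
```

Choose the stem list with care: a place name whose stem coincides with a listed verb stem would also be discarded.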


Overview of the Semantic Web


"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries." (Tim Berners-Lee)
How can data be shared and reused?
By giving information an explicit meaning, so that it can be automatically processed and integrated by machines.
With automated reasoning procedures, implicit information can be extracted from the data, giving a deeper knowledge of the domain.
Several tools are available.

Overview of the Semantic Web

What do we need?
A markup language such as XML.
IRIs (Internationalized Resource Identifiers).
Formal semantics and a strict syntax.
A description language.
A description of the world we want to model (an ontology).
A reasoner and a query language.


Overview of the Semantic Web

How do we provide a description?
Thinking about the world in terms of Subject-Predicate-Object statements (triples).
Using standard vocabularies to define sets, individuals, datatypes, and predicate features.
Combining our data with descriptions provided by other users.
Sharing data.


Overview of the Semantic Web


What does a triple mean?
- Michelangelo made the Sistine Chapel ceiling
<http://wwww.unict.it/art.rdf#Michelangelo>
<http://wwww.unict.it/art.rdf#makes>
<http://wwww.unict.it/art.rdf#Sistine Chapel Ceiling>
- Michelangelo is an artist
<http://wwww.unict.it/art.rdf#Michelangelo>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://wwww.unict.it/art.rdf#Artist>
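
The two statements above can be mimicked with plain (subject, predicate, object) tuples and a pattern match in which a missing term acts as a wildcard. This is a loose analogue of a SPARQL triple pattern, not an RDF library. The `http://www.example.org/art#` namespace is a hypothetical stand-in. Because IRIs may not contain spaces, the ceiling is written `SistineChapelCeiling`.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

ART = "http://www.example.org/art#"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph: Set[Triple] = {
    (ART + "Michelangelo", ART + "made", ART + "SistineChapelCeiling"),
    (ART + "Michelangelo", RDF_TYPE, ART + "Artist"),
}


def match(g: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Set[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return {t for t in g
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


# "Who is an artist?"  Roughly: SELECT ?x WHERE { ?x rdf:type art:Artist }
artists = {s for s, _, _ in match(graph, p=RDF_TYPE, o=ART + "Artist")}
print(artists)
```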

The Ontologies

OWL/SWRL.
Protégé editor.
Reuse of the LinkedGeoData ontology.
OpenStreetMap dataset.
An OWL/SWRL-compliant reasoner such as Pellet.
OWL API and Java to develop software that interacts with the ontologies.
A SPARQL query processor such as Jena.


The Ontologies: Places Modelling


The Ontologies: Estimation Modelling

Modelling a localisation estimation:
A candidate in the text.
An estimation (the method and tools used).
One or more matches, each with its degree of belief.
A geographical place for every match.
The match with the greatest degree of belief is the best result.
What if the estimation changes?
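
One possible reading of this model is sketched below as plain Python data classes rather than OWL classes. The class and field names, place identifiers, method labels and degrees are all hypothetical and are not the ontology's actual vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Match:
    place: str      # a geographical place, e.g. an IRI from an open geographic dataset
    degree: float   # degree of belief (assumed here to lie in [0, 1])


@dataclass
class Estimation:
    method: str                                   # method and tools used
    matches: List[Match] = field(default_factory=list)


@dataclass
class Candidate:
    text: str                                     # the candidate as it appears in the text
    estimations: List[Estimation] = field(default_factory=list)

    def best_match(self) -> Optional[Match]:
        """Match with the greatest degree of belief over all estimations, or None."""
        matches = [m for e in self.estimations for m in e.matches]
        return max(matches, key=lambda m: m.degree, default=None)


# Hypothetical data: identifiers and degrees are invented for illustration.
roma = Candidate("Roma", [Estimation("rule-based + OpenStreetMap lookup",
                                      [Match("ex:Rome_city", 0.9),
                                       Match("ex:Via_Roma_Catania", 0.4)])])
print(roma.best_match().place)  # prints ex:Rome_city
# A later estimation is appended instead of overwriting the first one.
roma.estimations.append(Estimation("context-aware", [Match("ex:Via_Roma_Catania", 0.95)]))
print(roma.best_match().place)  # prints ex:Via_Roma_Catania
```

Appending a new estimation instead of replacing the old one is one possible answer to "What if the estimation changes?". The history is preserved and the best match is simply recomputed. The sketch does not claim the ontology works this way.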


The Ontologies: Algorithms Modelling


Examples


Future Works

Extending the algorithm with further rules and filters;
Extending both the algorithm and the ontology with context sensitivity;
Using a local dataset that also includes data from local government or end users;
Applying the framework to Digital Humanities: OntoCeramic and the Benedictine Monastery of Catania (San Nicolò l'Arena).


Bibliography
E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM Conference on Digital Libraries, ACM, 2000, pp. 85-94.
A. Ballatore, D. C. Wilson, and M. Bertolotto. A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web. In Quality Issues in the Management of Web Information, ISRL 50, Springer, 2013, pp. 93-120.
D. Cantone, C. Longo, M. Nicolosi-Asmundo, and D. F. Santamaria. Web ontology representation and reasoning via fragments of set theory. In Web Reasoning and Rule Systems, 9th International Conference, RR 2015, Berlin, Germany, August 4-5, 2015, Proceedings, 2015, pp. 61-76.
D. Cantone, M. Nicolosi-Asmundo, D. F. Santamaria, and F. Trapani. Ontoceramic: an OWL ontology for ceramics classification. In Proc. of the 30th Italian Conference on Computational Logic, CILC 2015, Genova, Italy, July 1-3, 2015, CEUR Workshop Proceedings, ISSN 1613-0073, vol. 1459, pp. 122-127.
D. Caruso, R. Giunta, D. Messina, G. Pappalardo, and E. Tramontana. Rule-based location extraction from Italian unstructured text. In Proc. of the 16th Workshop From Objects to Agents, WOA 2015, Naples, Italy, June 17-19, 2015, CEUR Workshop Proc., ISSN 1613-0073, vol. 1382, pp. 46-52.
J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML), 2001, pp. 282-289.
LinkedGeoData, linkedgeodata.org/.
L. J. B. Nixon, R. Volz, F. Ciravegna, and R. Studer. Ontology based entity disambiguation with natural language patterns. In Fourth International Conference on Digital Information Management, ICDIM 2009, November 1-4, 2009, University of Michigan, Ann Arbor, Michigan, USA, pp. 19-26.
K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, vol. 1, 1999, pp. 61-67.
Web Ontology Language (OWL), http://www.w3.org/2001/sw/wiki/OWL.
OpenStreetMap, www.openstreetmap.org/.
S. Sarawagi. Information extraction. Foundations and Trends in Databases, vol. 1, no. 3, 2008, pp. 261-377.