Yaregal Assabie
2014/15—Sem II
Natural Language Processing (NLP)
Definition
Levels of Linguistic Analysis
Importance of NLP
Disambiguation
Difficulty of NLP
Approaches to NLP
Brief History
Applications of NLP
Course Coverage and Knowledge Requirement
Related Fields
NLP: Definition
Natural Language
Natural language refers to human languages (Amharic, Afaan Oromo, Tigrigna, English,
Arabic, Chinese, etc.), as opposed to artificial/programming languages such as C++,
Java, Pascal, etc.
Natural language can be represented as text in either spoken or written form.
Department of Computer Science, Addis Ababa University Lecture 01: NLP: Background and Overview 2/30
NLP: Definition
...interdisciplinary field...
Several fields including linguistics, psycholinguistics, mathematics, computer science,
and electrical engineering contribute to the research and development of NLP.
...computational techniques...
Multiple models, methods and algorithms are employed to accomplish a particular type
of language analysis.
...naturally occurring texts...
Texts can be in spoken or written form, representing the natural languages that
humans use to communicate with one another.
...levels of linguistic analysis...
Multiple types of language processing are known to be at work when humans produce
or comprehend language.
...human-like language processing...
NLP strives for human-like performance and is therefore considered a discipline within
Artificial Intelligence.
...tasks or applications...
The goal of NLP is to accomplish human-like language processing for various tasks and
applications such as machine translation, information retrieval, question-answering,
etc.
NLP: Definition
Closely related (and overlapping) fields are Natural Language Understanding and
Computational Linguistics.
The field of NLP was originally referred to as Natural Language Understanding (NLU) in
the early days of Artificial Intelligence. A full NLU system would be able to:
paraphrase an input text
translate the text into another language
answer questions about the contents of the text
draw inferences from the text.
NLP: Definition
An alternative view of NLP is that it concerns computer systems that use natural language
as input and/or output. In this view, NLP has two distinct focuses:
Natural Language Understanding and Natural Language Generation.
The task of Natural Language Understanding corresponds to the role of the reader/listener,
whereas the task of Natural Language Generation corresponds to that of the writer/speaker.
• People generally don’t appreciate how intelligent they are as natural language processors.
For them, natural language processing is deceptively simple because it requires no conscious
effort.
• Since computers perform calculations orders of magnitude faster than humans, many find it hard
to believe that computers are not good at processing natural languages.
Research and development on NLP started along with the advent of computers.
The field emerged in the US from a strong desire for a machine translation
system that would automatically translate texts from Russian journals into English.
It was thought that the technical details of natural languages were manageable, and
early work in machine translation took the simplistic view that the only differences
between languages resided in their vocabularies and permitted word orders.
However, the initial efforts to develop an accurate machine translation system were
not successful, because automatic translation could not be achieved just by translating words.
It was then understood that human-like translations require analyses of languages at
different levels such as:
word level
phrase and sentence level
sequential sentences
whole text context
beyond the text (knowledge about the world).
This understanding helped many researchers and developers realize that they
needed a more adequate theory of language. Key contributors include:
Noam Chomsky, for his work on generative grammars.
Claude Shannon, for his work applying probabilistic models to automata for
language.
John Backus and Peter Naur, for their work on context-free grammars for
programming languages.
Course Coverage and Knowledge Requirement
Topic | Subtopics | Knowledge requirement
Levels of Linguistic Analysis | Morphology, Syntax, Semantics, Discourse, Pragmatics | Linguistics; Psycholinguistics + Linguistics
Disambiguation | |
Approaches to NLP | Rule-based, Statistical, Connectionist | Mathematics + Psycholinguistics + Linguistics
Applications of NLP | Information Retrieval, Information Extraction, Dialogue Systems, Question-Answering, Machine Translation, Text Summarization, (Spelling Correction), (Grammar Checking), (Speech Synthesis) | Computer Science + Mathematics + Psycholinguistics + Linguistics
Related Fields (Signal Processing) | Speech Recognition, Optical Character Recognition | Electrical Engineering + Computer Science
Levels of Linguistic Analysis
Morphology
Morphology is the study of the internal structure of words.
At the morphological level, words are analyzed into their smallest meaning-bearing parts
(morphemes), such as roots and affixes.
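To make morphological analysis concrete, here is a minimal sketch. It splits an English word into prefixes, a stem and suffixes by greedily stripping affixes from a small hand-made list. The affix lists, the `segment` function and the `min_stem` threshold are illustrative assumptions and are not part of the lecture. Real morphological analyzers, especially for richly inflected languages such as Amharic, use finite-state transducers or learned models rather than naive stripping.

```python
# Hypothetical affix lists for a toy English segmenter (illustration only).
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ful", "ing", "ed", "s"]


def segment(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_stem=3):
    """Greedily strip known affixes, returning (prefixes, stem, suffixes).

    An affix is removed only if at least `min_stem` characters remain,
    so short words such as "ring" are not split into "r" + "ing".
    """
    found_prefixes, found_suffixes = [], []
    stem = word
    # Strip prefixes from the left, trying longer candidates first.
    changed = True
    while changed:
        changed = False
        for p in sorted(prefixes, key=len, reverse=True):
            if stem.startswith(p) and len(stem) - len(p) >= min_stem:
                found_prefixes.append(p)
                stem = stem[len(p):]
                changed = True
                break
    # Strip suffixes from the right, trying longer candidates first.
    changed = True
    while changed:
        changed = False
        for s in sorted(suffixes, key=len, reverse=True):
            if stem.endswith(s) and len(stem) - len(s) >= min_stem:
                found_suffixes.insert(0, s)
                stem = stem[: -len(s)]
                changed = True
                break
    return found_prefixes, stem, found_suffixes


for w in ["unhelpful", "rethinking", "unkindness", "ring"]:
    print(w, segment(w))
```

The minimum-stem guard is a crude stand-in for the lexicon check a real analyzer would perform before accepting a split.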
Syntax
Syntax refers to the study of structural relationships between words in a sentence.
Syntactic analysis requires both a grammar and a parser. The parser's output is a
representation of the sentence that reveals the structural dependency relationships
between the words. This structural dependency can be represented using trees.
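The sketch below illustrates a grammar plus a parser producing a tree. It implements the CKY algorithm for a toy context-free grammar in Chomsky Normal Form. The grammar, the lexicon and the nested-tuple tree format are assumptions made for this example. It keeps only one tree per nonterminal and span, so it does not enumerate every parse of an ambiguous sentence.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky Normal Form (illustration only).
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "chased": {"V"}, "saw": {"V"},
}
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}


def cky_parse(words):
    """Return one parse tree rooted in S as nested tuples, or None if there is none."""
    n = len(words)
    # chart[i][j] maps each nonterminal to one tree covering words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, set()):
            chart[i][i + 1][tag] = (tag, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (b, left), (c, right) in product(chart[i][k].items(), chart[k][j].items()):
                    for a in BINARY_RULES.get((b, c), set()):
                        chart[i][j].setdefault(a, (a, left, right))
    return chart[0][n].get("S")


print(cky_parse("the dog chased a cat".split()))
print(cky_parse("dog the chased".split()))  # ungrammatical: prints None
```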
Semantics
Semantics deals with the meaning of words, phrases and sentences.
Semantic analysis requires knowledge of:
Lexical semantics – the meanings of the component words
Compositional semantics – how components combine to form larger meanings
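The split between lexical and compositional semantics can be sketched as follows. Each word is given a meaning in a tiny hypothetical world: nouns and verbs are membership tests, and determiners are functions. The meaning of a "Det N V" sentence is computed by applying the determiner's meaning to the noun's meaning and then to the verb's. The domain, the word meanings and `interpret` are invented for illustration. This is a simplified model-theoretic treatment, not a general semantic analyzer.

```python
# Hypothetical toy world: the individuals that exist.
DOMAIN = {"abebe", "kebede", "fido"}

# Lexical semantics: the meaning of each word in this toy world.
MEANING = {
    "dog": lambda x: x == "fido",
    "student": lambda x: x in {"abebe", "kebede"},
    "barks": lambda x: x == "fido",
    "sleeps": lambda x: x in {"abebe", "fido"},
    # Determiners take a noun meaning, then a verb meaning, and return True/False.
    "every": lambda noun: lambda verb: all(verb(x) for x in DOMAIN if noun(x)),
    "some": lambda noun: lambda verb: any(noun(x) and verb(x) for x in DOMAIN),
}


def interpret(sentence):
    """Compositional semantics for 'Det N V': [[Det]]([[N]])([[V]])."""
    det, noun, verb = sentence.lower().split()
    return MEANING[det](MEANING[noun])(MEANING[verb])


for s in ["Every dog barks", "Every student sleeps", "Some student sleeps"]:
    print(s, "->", interpret(s))
```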
Discourse
The discourse level deals with the properties of the text as a whole that convey meaning by
making connections between component sentences.
Several methods are used in discourse processing, two of the most common being:
Anaphora resolution: replacing words such as pronouns, which are semantically
vacant, with the appropriate entity to which they refer; and
Discourse/text structure recognition: determining the functions of sentences in
the text (which adds to the meaningful representation of the text).
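The following is a minimal sketch of anaphora resolution using a common baseline heuristic: replace each pronoun with the most recent preceding entity that agrees with it in gender. The word lists and `resolve_pronouns` are hypothetical. Practical resolvers also use number agreement, syntactic constraints and salience. This toy version fails whenever the nearest agreeing entity is not the true antecedent.

```python
# Hypothetical gender features for a few pronouns and entities.
PRONOUN_GENDER = {"he": "male", "him": "male", "she": "female", "her": "female", "it": "neuter"}
ENTITY_GENDER = {"Abebe": "male", "Almaz": "female", "book": "neuter"}


def resolve_pronouns(tokens):
    """Replace each pronoun with the most recent earlier entity of the same gender."""
    mentioned = []  # entities seen so far, in order of mention
    resolved = []
    for tok in tokens:
        if tok in ENTITY_GENDER:
            mentioned.append(tok)
            resolved.append(tok)
        elif tok.lower() in PRONOUN_GENDER:
            gender = PRONOUN_GENDER[tok.lower()]
            antecedent = next((e for e in reversed(mentioned) if ENTITY_GENDER[e] == gender), None)
            resolved.append(antecedent if antecedent is not None else tok)
        else:
            resolved.append(tok)
    return resolved


sentence = "Abebe gave Almaz the book . She read it and thanked him .".split()
print(" ".join(resolve_pronouns(sentence)))
```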
Pragmatics
Pragmatics is concerned with the purposeful use of language in situations and uses
context over and above the contents of the text for understanding.
Pragmatics draws on world knowledge, i.e. knowledge outside the contents of the document.
Disambiguation
Disambiguation refers to the resolution of ambiguities that occur at different levels of
language analysis.
A given text is said to be ambiguous if there are multiple linguistic structures that can
be built for it.
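Lexical (word-sense) ambiguity is one of the kinds of ambiguity a system must resolve. The sketch below uses a simplified version of the Lesk algorithm, a classic dictionary-based technique that the lecture itself does not describe. It chooses the sense whose gloss shares the most content words with the surrounding context. The two senses of "bank", their glosses and the stopword list are made up for illustration.

```python
# Hypothetical sense inventory for "bank" with short glosses.
SENSES = {
    "bank/finance": "an institution that accepts deposits of money and makes loans",
    "bank/river": "sloping land beside a body of water such as a river",
}
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "in", "on",
             "at", "for", "he", "she", "his", "her"}


def content_words(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(context, senses=SENSES):
    """Pick the sense whose gloss overlaps most with the context's content words."""
    ctx = content_words(context)
    return max(senses, key=lambda sense: len(ctx & content_words(senses[sense])))


print(simplified_lesk("she sat on the bank of the river and watched the water"))
print(simplified_lesk("he went to the bank to deposit money for a loan"))
```

Exact word matching misses "deposit"/"deposits" and "loan"/"loans" here. Morphological analysis, as described earlier, would help such an overlap method.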
Approaches to NLP
Rule-based Approach
Rule-based systems are based on explicit representation of facts about language
through well-understood knowledge representation schemes and associated
algorithms.
Rule-based systems usually consist of a set of rules, an inference engine, and a
workspace or working memory.
Knowledge is represented as facts or rules in the rule-based approach.
The inference engine repeatedly selects a rule whose condition is satisfied and
executes the rule.
The primary source of evidence in rule-based systems comes from human-developed
rules (e.g. grammatical rules) and lexicons.
Rule-based approaches have been used for tasks such as information extraction, text
categorization, ambiguity resolution, and so on.
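The rules / inference engine / working memory architecture described above can be sketched as a small forward-chaining loop. Here the working memory holds a token list and the part-of-speech tags assigned so far. Each hypothetical rule (lexicon lookup, an "-ly" suffix rule, a default-noun rule) has a condition and an action. The rule set and the names `Rule` and `run` are assumptions for illustration. The engine fires the first rule whose condition holds, which is only one of several possible conflict-resolution strategies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # tests the working memory
    action: Callable[[dict], None]     # updates the working memory


def run(rules, memory, max_cycles=100):
    """Inference engine: repeatedly fire the first rule whose condition holds."""
    fired = []
    for _ in range(max_cycles):
        rule = next((r for r in rules if r.condition(memory)), None)
        if rule is None:  # no rule applies any more: stop
            break
        rule.action(memory)
        fired.append(rule.name)
    return fired


# Hypothetical knowledge base: a tiny lexicon plus three tagging rules.
LEXICON = {"the": "DET", "barks": "VERB"}


def untagged(m):
    """Positions of tokens that have no tag yet."""
    return [i for i in range(len(m["tokens"])) if i not in m["tags"]]


def tag_from_lexicon(m):
    for i in untagged(m):
        if m["tokens"][i] in LEXICON:
            m["tags"][i] = LEXICON[m["tokens"][i]]


def tag_ly_adverbs(m):
    for i in untagged(m):
        if m["tokens"][i].endswith("ly"):
            m["tags"][i] = "ADV"


def tag_default_noun(m):
    for i in untagged(m):
        m["tags"][i] = "NOUN"


RULES = [
    Rule("lexicon", lambda m: any(m["tokens"][i] in LEXICON for i in untagged(m)), tag_from_lexicon),
    Rule("suffix-ly", lambda m: any(m["tokens"][i].endswith("ly") for i in untagged(m)), tag_ly_adverbs),
    Rule("default-noun", lambda m: bool(untagged(m)), tag_default_noun),
]

memory = {"tokens": ["the", "dog", "barks", "loudly"], "tags": {}}
fired = run(RULES, memory)
print(fired)  # ['lexicon', 'suffix-ly', 'default-noun']
print([memory["tags"][i] for i in range(len(memory["tokens"]))])  # ['DET', 'NOUN', 'VERB', 'ADV']
```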
Statistical Approach
Statistical approaches employ various mathematical techniques and often use large
text corpora to develop approximate generalized models of linguistic phenomena
based on actual examples of these phenomena provided by the text corpora without
adding significant linguistic or world knowledge.
The primary source of evidence in statistical systems comes from observable data (e.g.
large text corpora).
Statistical approaches have typically been used in tasks such as speech recognition,
parsing, part-of-speech tagging, statistical machine translation, statistical grammar
learning, and so on.
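The following is a minimal example of learning a model from observable data rather than hand-written rules. It estimates bigram probabilities by maximum likelihood (relative frequency), P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1}), from a toy corpus. The corpus and `train_bigram` are illustrative assumptions. Real systems train on large corpora and add smoothing so that unseen bigrams do not receive zero probability.

```python
from collections import Counter


def train_bigram(sentences):
    """Return prob(prev, word) = C(prev word) / C(prev), estimated by maximum likelihood."""
    context_counts, bigram_counts = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]  # sentence boundary markers
        context_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


corpus = [
    ["I", "am", "Sam"],
    ["Sam", "I", "am"],
    ["I", "do", "not", "like", "green", "eggs"],
]
p = train_bigram(corpus)
print(p("<s>", "I"))   # 2 of the 3 sentences start with "I"
print(p("am", "Sam"))  # 0.5
```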
Connectionist Approach
A connectionist model is a network of interconnected simple processing units with
knowledge stored in the weights of the connections between units.
Similar to the statistical approaches, connectionist approaches also develop
generalized models from examples of linguistic phenomena.
What separates connectionism from other statistical methods is that connectionist
models combine statistical learning with various theories of representation.
In addition, linguistic models are harder to observe in connectionist systems, because
connectionist architectures are less constrained than statistical ones.
Connectionist approaches have been used in tasks such as word-sense disambiguation,
language generation, syntactic parsing, limited domain translation tasks, and so on.
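In its smallest form, a connectionist model is a single processing unit whose knowledge lives entirely in its connection weights. The sketch trains such a unit (a perceptron) to label short phrases as positive or negative from bag-of-words inputs. The training data, vocabulary and function names are invented for illustration. Practical connectionist NLP systems use networks of many units trained with gradient-based methods.

```python
def featurize(text, vocab):
    """Binary bag-of-words input: 1.0 if the vocabulary word occurs in the text."""
    words = set(text.lower().split())
    return [1.0 if v in words else 0.0 for v in vocab]


def train_perceptron(data, vocab, epochs=10, lr=1.0):
    """Perceptron learning rule: adjust weights only when an example is misclassified."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:  # label is +1 or -1
            x = featurize(text, vocab)
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if label * activation <= 0:
                weights = [w + lr * label * xi for w, xi in zip(weights, x)]
                bias += lr * label
    return weights, bias


def predict(text, weights, bias, vocab):
    x = featurize(text, vocab)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Hypothetical toy training data.
train = [("good great film", 1), ("bad boring film", -1),
         ("great acting", 1), ("boring plot", -1)]
vocab = sorted({w for text, _ in train for w in text.split()})
weights, bias = train_perceptron(train, vocab)
print(dict(zip(vocab, weights)))  # the learned "knowledge" is in these weights
print(predict("great film", weights, bias, vocab), predict("boring film", weights, bias, vocab))
```

No linguistic rule is written anywhere. The preference for "great" over "boring" exists only as numeric weights, which is why such models are harder to inspect than rule sets.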
Applications of NLP
Information Retrieval
Information Retrieval provides a list of potentially relevant documents in response to a
user’s query.
Example (search results page with an Amharic interface): "Results 1 - 10 of about 38,700 search results about Addis Ababa University (0.28 seconds)"
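The following is a minimal sketch of the ranking step behind information retrieval. Documents and the query are represented as TF-IDF weighted term vectors, and documents are ranked by cosine similarity to the query. TF-IDF with cosine similarity is one standard vector-space technique, not necessarily what the search engine in the example uses. The documents and `tfidf_rank` are illustrative assumptions, and tokenization is a naive whitespace split.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Return indices of documents with non-zero cosine similarity, best match first."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    idf = {t: math.log(n / df[t]) for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}  # unseen query terms get weight 0

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query.lower().split())
    scores = [(cosine(q, vec(toks)), i) for i, toks in enumerate(tokenized)]
    return [i for score, i in sorted(scores, key=lambda s: (-s[0], s[1])) if score > 0]


docs = [
    "Addis Ababa University offers computer science courses",
    "The Nile is a river in Africa",
    "Natural language processing is taught at Addis Ababa University",
]
for i in tfidf_rank("addis ababa university", docs):
    print(docs[i])
```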
Information Extraction
Information Extraction focuses on the recognition, tagging, and extraction of certain
key elements of information (e.g. persons, companies, locations, organizations, etc.)
from large collections of text into a structured representation.
Example
Text: Firm XYZ is a full service advertising agency specializing in direct and
interactive marketing. Located in Bole, Addis Ababa, Firm XYZ is looking for an
Assistant Account Manager to help manage and coordinate interactive
marketing initiatives. Experience in online marketing and/or the advertising
field is a plus. Depending on the experiences of the applicants, the company
pays an attractive salary of Birr 3,000- Birr 5,000 per month.
Extracted Information:
INDUSTRY Advertising
POSITION Assistant Account Manager
LOCATION Bole, Addis Ababa
COMPANY Firm XYZ
SALARY Birr 3,000 - Birr 5,000 per month
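For this particular advertisement, the extraction above can be approximated with hand-written regular-expression patterns, one per field. The patterns, the `RULES` table and `extract` are hypothetical and tuned to this single example. They would not generalize to other job ads, which is why practical IE systems combine such patterns with named-entity recognizers or learned models.

```python
import re

# Hypothetical extraction rules: field -> (regex, function that formats the match).
RULES = {
    "INDUSTRY": (r"(\w+) agency", lambda m: m.group(1).capitalize()),
    "POSITION": (r"looking for an? (.+?) to ", lambda m: m.group(1)),
    "LOCATION": (r"Located in ([A-Z]\w+, [A-Z]\w+ [A-Z]\w+)", lambda m: m.group(1)),
    "COMPANY": (r"^(.+?) is an? ", lambda m: m.group(1)),
    "SALARY": (r"Birr ([\d,]+)\s*-\s*Birr ([\d,]+) per month",
               lambda m: f"Birr {m.group(1)} - Birr {m.group(2)} per month"),
}


def extract(text):
    """Apply each rule; fields whose pattern does not match are left out."""
    record = {}
    for field, (pattern, fmt) in RULES.items():
        m = re.search(pattern, text)
        if m:
            record[field] = fmt(m)
    return record


ad = ("Firm XYZ is a full service advertising agency specializing in direct and "
      "interactive marketing. Located in Bole, Addis Ababa, Firm XYZ is looking for an "
      "Assistant Account Manager to help manage and coordinate interactive "
      "marketing initiatives. Experience in online marketing and/or the advertising "
      "field is a plus. Depending on the experiences of the applicants, the company "
      "pays an attractive salary of Birr 3,000- Birr 5,000 per month.")

for field, value in extract(ad).items():
    print(field, value)
```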
Machine Translation
Addis Ababa is the capital of Ethiopia. The city was built in the late 1800s by Emperor Menelik.
The site was selected because of the hot springs, which were considered sacred. The
emperor had palaces and other monumental buildings built by foreign architects and builders.
Architectural styles from Switzerland, India and Yemen were mixed, giving rise to a stone and
wooden architecture known as the Addis Ababa style. The railway linking Djibouti with Addis
Ababa (via Dire Dawa) was completed in 1917.
Question-Answering
Question-Answering provides the user with either just the text of the answer itself or
answer-providing passages.
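One simple way to return an answer-providing passage is to pick the sentence of a document that shares the most content words with the question. The sketch below does this over part of the Addis Ababa text used in the machine translation example. The stopword list and `answer_passage` are assumptions, and word overlap is only a baseline.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "was", "were", "of", "in", "by", "because",
             "which", "who", "what", "when", "why", "how"}


def content_words(text):
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def answer_passage(question, document):
    """Return the sentence with the largest content-word overlap with the question."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    q = content_words(question)
    return max(sentences, key=lambda s: len(q & content_words(s)))


doc = ("Addis Ababa is the capital of Ethiopia. "
       "The city was built in the late 1800s by Emperor Menelik. "
       "The site was selected because of the hot springs, which were considered sacred.")
print(answer_passage("Who built the city?", doc))
print(answer_passage("Why was the site selected?", doc))
```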
Dialogue Systems
Dialogue Systems are agents that converse with human beings in a coherent structure
using several modes of communication such as text, speech, gesture, etc.
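A common way to keep a task-oriented dialogue coherent is a frame-based dialogue manager. The system keeps a frame of slots to fill and asks about whichever slot is still empty. The sketch handles text input only. The bus-ticket task, the slot names, the city list and the keyword patterns are all hypothetical.

```python
import re

# Hypothetical slots for a bus-ticket dialogue and the question asked for each.
QUESTIONS = {
    "origin": "Where are you travelling from?",
    "destination": "Where do you want to go?",
    "date": "On which day do you want to travel?",
}
CITIES = ["addis ababa", "dire dawa", "bahir dar", "hawassa"]
DAYS = r"\b(today|tomorrow|monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"


def update_frame(frame, utterance):
    """Fill slots from one user turn using simple keyword patterns."""
    text = utterance.lower()
    for city in CITIES:
        if re.search(rf"\bfrom {city}\b", text):
            frame["origin"] = city
        if re.search(rf"\bto {city}\b", text):
            frame["destination"] = city
    m = re.search(DAYS, text)
    if m:
        frame["date"] = m.group(1)
    return frame


def next_system_turn(frame):
    """Ask about the first unfilled slot, or confirm once the frame is complete."""
    for slot, question in QUESTIONS.items():
        if slot not in frame:
            return question
    return f"Booking a ticket from {frame['origin']} to {frame['destination']} for {frame['date']}."


frame = {}
for user_turn in ["I want to go to Dire Dawa", "From Addis Ababa, tomorrow"]:
    update_frame(frame, user_turn)
    print("User:", user_turn)
    print("System:", next_system_turn(frame))
```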
Text Summarization
Text Summarization is an application of NLP that reduces a larger text into a shorter,
yet richly constituted representation of the original document.
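Here is a minimal extractive summarizer, sketched under simple assumptions. Each sentence is scored by the average frequency, across the whole text, of its content words, and the top-scoring sentences are kept in their original order. This frequency heuristic is only one approach. The stopword list and `summarize` are illustrative, and abstractive systems that rewrite the text work very differently.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "at", "is", "was", "and", "to", "in"}


def content_words(text):
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)


text = ("NLP systems process language. "
        "Language processing systems analyse language at many levels. "
        "The weather was pleasant.")
print(summarize(text))
print(summarize(text, 2))
```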
Related Fields
Speech Recognition
Speech Recognition is the process of converting spoken words (acoustic signals) into
equivalent text.
Speech Synthesis, also known as Text-to-Speech system, performs the reverse process,
i.e. artificially produces human speech from a given text.
[Figure: Speech Recognition and Speech Synthesis]
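Optical Character Recognition
[Figure: printed text is converted by OCR, and handwritten text by ICR (Intelligent Character Recognition), into editable text]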