
Natural Language Processing (COSC 709)

Lecture 01: NLP: Background and Overview

Department of Computer Science,


Addis Ababa University

Yaregal Assabie

2014/15—Sem II
Natural Language Processing (NLP)
Definition
Levels of Linguistic Analysis
Importance of NLP
Disambiguation
Difficulty of NLP
Approaches to NLP
Brief History
Applications of NLP
Course Coverage and Knowledge Requirement
Related Fields

NLP: Definition

Natural Language
Natural language refers to human languages (Amharic, Afaan Oromo, Tigrigna, English,
Arabic, Chinese, etc.), as opposed to artificial/programming languages such as C++,
Java, Pascal, etc.
Natural language is represented using texts in spoken or written forms.

Natural Language Processing


NLP is the computerized approach to analyzing text that is based on both a set of
theories and a set of technologies.
A more comprehensive definition of NLP is given as:
An interdisciplinary field of study dealing with computational techniques for analyzing
and representing naturally occurring texts at one or more levels of linguistic analysis
for the purpose of achieving human-like language processing for a range of tasks or
applications.

Department of Computer Science, Addis Ababa University Lecture 01: NLP: Background and Overview 2/30

...interdisciplinary field...
Several fields including linguistics, psycholinguistics, mathematics, computer science,
and electrical engineering contribute to the research and development of NLP.
...computational techniques...
Multiple models, methods and algorithms are employed to accomplish a particular type
of language analysis.
...naturally occurring texts...
Texts can be in spoken or written form, representing the natural languages that
humans use to communicate with one another.
...levels of linguistic analysis...
Multiple types of language processing are known to be at work when humans produce
or comprehend language.
...human-like language processing...
NLP strives for human-like performance, and is thus considered a discipline within
Artificial Intelligence.
...tasks or applications...
The goal of NLP is to accomplish human-like language processing for various tasks and
applications such as machine translation, information retrieval, question-answering,
etc.


Closely related (and overlapping) fields are Natural Language Understanding and
Computational Linguistics.
The field of NLP was originally referred to as Natural Language Understanding (NLU) in
the early days of Artificial Intelligence. A full NLU system would be able to:
paraphrase an input text
translate the text into another language
answer questions about the contents of the text
draw inferences from the text.

Computational Linguistics emerged as a field of study in linguistics with the purpose of
providing computational models for various linguistic phenomena.
Currently, Natural Language Understanding is regarded as the goal of NLP, one that has
not yet been accomplished, whereas Computational Linguistics is regarded as a related
term used by linguists, with more emphasis on the linguistics side than NLP.


An alternative view on NLP is that it is a computer system which uses natural language
as input and/or output. In this view, NLP is considered to have two distinct focuses—
Natural Language Understanding and Natural Language Generation.
The task of Natural Language Understanding is equivalent to the role of reader/listener,
whereas the task of Natural Language Generation is that of the writer/speaker.

Natural Language (spoken or written)
    → Natural Language Understanding → computer system → Natural Language Generation
    → Natural Language (spoken or written)


NLP: Importance of NLP

• Natural language is the preferred medium of communication for people.
    - People communicate with each other in natural languages.
    - Scientific articles, magazines, etc. are all in natural languages.
    - Billions of web pages are also in natural languages.

• Computers can do useful things for us if:
    - Data is in structured form, e.g. databases, knowledge bases.
    - Specifications are in a formal language, e.g. programming languages.

• NLP bridges the communication gap between people and computers.
    - It can lead to better and more natural communication with computers.
    - It makes it possible to process the ever-increasing amount of natural language
      data generated by people, e.g. to extract required information from the web.


NLP: Difficulty of NLP

• People generally don’t appreciate how intelligent they are as natural language processors.
    - For them, natural language processing is deceptively simple because no conscious
      effort is required.

• Since computers are orders of magnitude faster than people at calculation, many find it
  hard to believe that computers are not good at processing natural languages.

• NLP is hard because of:
    - Ambiguity: a word, term, phrase or sentence could mean several possible things.
      Computer languages, by contrast, are designed to be unambiguous.
    - Variability: there are many ways to express the same thing.
      Computer languages also have variability, but the equivalence of
      expressions can be detected automatically.


NLP: Brief History

Research and development on NLP started along with the advent of computers.
The field emerged in the US from a strong desire for a Machine Translation
system that would automatically translate texts from Russian journals into English.
It was thought that the technical details of natural languages were manageable, and
early work in machine translation took the simplistic view that the only differences
between languages resided in their vocabularies and the permitted word orders.
However, the initial efforts to develop an accurate machine translation system were
not successful, because automatic translation could not be achieved just by translating words.
It was then understood that human-like translations require analyses of languages at
different levels such as:
word level
phrase and sentence level
sequential sentences
whole text context
beyond the text (knowledge about the world).


Such understandings helped many researchers and developers realize that they
needed a more adequate theory of language. Key contributors in this field include:
Noam Chomsky, in his work on generative grammars.
Claude Shannon, in his work applying probabilistic models to automata for
language.
John Backus and Peter Naur, in their work on context-free grammars for
programming languages.

These developments gave rise to the field of Natural Language Processing.


Historically, the field has been treated very differently in the departments of computer
science, linguistics and psychology. Because of this diversity, NLP encompasses a
number of different but overlapping fields in these different departments:
Computational Linguistics in linguistics, Natural Language Processing in computer
science, Computational Psycholinguistics in psychology. Speech Recognition has also
been studied as a closely related subject in electrical engineering.


NLP: Course Coverage and Knowledge Requirement

Topic                          Coverage                            Knowledge requirement
-----------------------------  ----------------------------------  ----------------------------------
Levels of Linguistic Analysis  Morphology, Syntax, Semantics,      Linguistics;
and Disambiguation             Discourse, Pragmatics,              Psycholinguistics + Linguistics
                               Disambiguation

Approaches to NLP              Rule-based, Statistical,            Mathematics + Psycholinguistics +
                               Connectionist                       Linguistics

Applications of NLP            Information Retrieval,              Computer Science + Mathematics +
                               Information Extraction,             Psycholinguistics + Linguistics
                               Dialogue Systems,
                               Question-Answering,
                               Machine Translation,
                               Text Summarization,
                               (Spelling Correction),
                               (Grammar Checking)

Related Fields                 Speech Recognition,                 Electrical Engineering +
                               (Speech Synthesis),                 Computer Science
                               (Signal Processing),
                               Optical Character Recognition


Levels of Linguistic Analysis: Morphology

Morphology
Morphology is the study of the componential nature of words.
At the morphological level, words are analyzed into morphemes, the smallest parts of
words that carry meaning, such as stems and affixes.

English Morphology (Examples)

preregistration  →  pre-registr-ation
books            →  book-s
converted        →  convert-ed
converts         →  convert-s
converting       →  convert-ing
converter        →  convert-er
convertible      →  convert-ible
unconvertible    →  un-convert-ible

Amharic Morphology (Examples)

በልጅነታቸው      →  በልጅነ[ትኣ]ቸው
ልጆች           →  ል[ጅኦ]ች
አይናማ          →  አይ[ንኣ]ማ
ቤተ መንግስት     →  ቤ[ትኧ] መንግስት
ወሰድኩ          →  ወሰድ-ኩ
ወሰድኩበት        →  ወሰድ-ኩ-በት
ሳልወስድለት       →  ሳ-ል-ወስድ-ለት
አልወሰድኩበትም     →  አል-ወሰድ-ኩ-በት-ም
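
The English segmentations above can be approximated with simple affix stripping. The following is a minimal sketch, not a full morphological analyzer: the affix lists, the stem list and the function name `segment` are assumptions made for illustration, and spelling changes at morpheme boundaries are ignored.

```python
# Toy affix lists (assumptions); a real analyzer would use a full lexicon
# and handle spelling changes at morpheme boundaries.
PREFIXES = ["un", "pre"]
SUFFIXES = ["ible", "ing", "ed", "er", "s"]


def segment(word, stems):
    """Split a word into prefix, stem and suffix using a list of known stems."""
    for prefix in [""] + PREFIXES:
        for suffix in [""] + SUFFIXES:
            if len(prefix) + len(suffix) >= len(word):
                continue  # nothing would be left for the stem
            if word.startswith(prefix) and word.endswith(suffix):
                stem = word[len(prefix):len(word) - len(suffix)]
                if stem in stems:
                    return [m for m in (prefix, stem, suffix) if m]
    return [word]  # no analysis found: return the word unsegmented


STEMS = {"convert", "book"}
for w in ["books", "converted", "converting", "unconvertible"]:
    print(w, "->", "-".join(segment(w, STEMS)))
```

A word whose stem is not in the list, such as "banana", is returned unsegmented, which shows how strongly this approach depends on its lexicon.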


Levels of Linguistic Analysis: Syntax

Syntax
Syntax refers to the study of structural relationships between words in a sentence.
Syntactic analysis requires both a grammar and a parser, the output of which is a
representation of the sentence that reveals the structural dependency relationships
between the words. This structural dependency can be represented using trees as
shown in the following examples.

English Syntactic Tree (Example)

Sentence
├── Noun Phrase
│   └── Noun: Kassa
└── Verb Phrase
    ├── Verb: gave
    ├── Noun Phrase
    │   ├── Determiner: a
    │   └── Noun: book
    └── Prepositional Phrase
        ├── Preposition: to
        └── Noun: Aster

Amharic Syntactic Tree (Example)

Sentence
├── Noun Phrase
│   └── Noun: ካሳ
└── Verb Phrase
    ├── Prepositional Phrase: ለአስቴር
    ├── Noun Phrase
    │   └── Noun: መፅሃፍ
    └── Verb: ሰጣት
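
To make the roles of the grammar and the parser concrete, here is a small sketch of a top-down parser with backtracking for the English example. The grammar rules and lexicon are hand-written assumptions that mirror the tree for "Kassa gave a book to Aster"; the function names are hypothetical, and the grammar covers only this sentence pattern.

```python
# Toy grammar and lexicon (assumptions) covering only the example sentence.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP", "PP"]],
    "PP": [["P", "N"]],
}
LEXICON = {"Kassa": "N", "gave": "V", "a": "Det",
           "book": "N", "to": "P", "Aster": "N"}


def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` matches words[start:]."""
    if symbol not in GRAMMAR:  # part-of-speech category: consume one word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(rhs, words, start, [symbol])


def expand(rhs, words, pos, tree):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield tuple(tree), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(rhs[1:], words, nxt, tree + [child])


def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


for tree in parse_sentence("Kassa gave a book to Aster"):
    print(tree)
```

Each tree is a nested tuple whose first element is the category label, so the printed result has the same shape as the English tree above. A word sequence the grammar does not license yields an empty list.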


Levels of Linguistic Analysis: Semantics

Semantics
Semantics deals with the meaning of words, phrases and sentences.
Semantic analysis requires knowledge of:
Lexical semantics – the meanings of the component words
Compositional semantics – how components combine to form larger meanings

English Semantics (Example)


Sentence: Fruit flies like a banana.
Meanings: 1. Small insects (fruit flies) like to feed on bananas.
          2. Fruit flies through the air in the same way as a banana does.
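
The two readings of the sentence above come from words that can take more than one part of speech. The sketch below enumerates the tag sequences allowed by a toy lexicon and keeps those that match a known sentence pattern. The lexicon, the two patterns and the function name `readings` are assumptions for illustration only.

```python
from itertools import product

# Toy lexicon (an assumption): each word lists the parts of speech it can take.
LEXICON = {
    "fruit": ["N"],
    "flies": ["N", "V"],
    "like": ["V", "P"],
    "a": ["Det"],
    "banana": ["N"],
}

# Tag patterns this toy setup accepts as complete sentences.
VALID_PATTERNS = {
    ("N", "N", "V", "Det", "N"): "compound noun 'fruit flies' + verb 'like'",
    ("N", "V", "P", "Det", "N"): "noun 'fruit' + verb 'flies' + preposition 'like'",
}


def readings(sentence):
    """Return every (tag sequence, description) pair licensed for the sentence."""
    words = sentence.lower().rstrip(".").split()
    options = [LEXICON[w] for w in words]
    return [(tags, VALID_PATTERNS[tags])
            for tags in product(*options) if tags in VALID_PATTERNS]


for tags, description in readings("Fruit flies like a banana."):
    print(tags, "->", description)
```

Both patterns survive, so syntax alone cannot choose between the readings; lexical and compositional semantics, or wider context, has to do so.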

Amharic Semantics (Example)


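Sentence: ጤንነታችንን ለመጠበቅ ዋና ተጠቃሽ ስፖርት ነው።
          (The word ዋና can mean either "swimming" or "main".)
Meanings: 1. መዋኘት ጤንነታችንን የሚጠብቅ የስፖርት ዓይነት ነው።
             (Swimming is a kind of sport that protects our health.)
          2. ጤንነታችንን ከሚጠብቁ ነገሮች ውስጥ በዋነኛነት የሚጠቀሰው ስፖርት መስራት ነው።
             (Among the things that protect our health, doing sport is the one chiefly cited.)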


Levels of Linguistic Analysis: Discourse

Discourse
Discourse level deals with the properties of the text as a whole that convey meaning by
making connections between component sentences.
Several methods are used in discourse processing, two of the most common being:
Anaphora resolution: replacing words such as pronouns, which are semantically
vacant, with the appropriate entity to which they refer; and
Discourse/text structure recognition: determining the functions of sentences in
the text (which adds to the meaningful representation of the text).

English Discourse Processing (Example)


Text: Fruit flies like a banana. They also feed on apples.
Meaning: Small insects (fruit flies) like to feed on bananas and apples.
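
Anaphora resolution for the example above can be sketched as picking the most recent earlier mention that agrees in number with the pronoun. This is a deliberately naive heuristic under stated assumptions: the mentions and their number features are annotated by hand, and the names `PRONOUNS` and `resolve` are hypothetical.

```python
# Number feature of each pronoun the sketch knows about (an assumption).
PRONOUNS = {"they": "plural", "it": "singular"}


def resolve(pronoun, candidates):
    """Return the most recent candidate whose number agrees with the pronoun.

    `candidates` is a list of (phrase, number) pairs in order of mention.
    """
    number = PRONOUNS[pronoun.lower()]
    for phrase, candidate_number in reversed(candidates):
        if candidate_number == number:
            return phrase
    return None  # no agreeing antecedent found


# Entities mentioned in "Fruit flies like a banana." (hand-annotated).
mentions = [("fruit flies", "plural"), ("a banana", "singular")]
print(resolve("They", mentions))
```

Number agreement is enough here because only one plural entity has been mentioned; real texts usually need gender, syntactic role and world knowledge as well.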

Amharic Discourse Processing (Example)


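Text: ውሃ ለሰው ልጅ በሙሉ ብዙ ጥቅም ይሰጣል። ጤንነታችንን ለመጠበቅ ዋና ተጠቃሽ
ስፖርት ነው። ስለዚህ ውሃ ላይ እየተዝናናን ጤናችን እንዲጠበቅ ማድረግ እንችላለን።
      (Water gives all of humankind many benefits. [The ambiguous sentence from the
      semantics example.] So we can look after our health while enjoying ourselves in water.)
Meaning: መዋኘት ጤንነታችንን የሚጠብቅ የስፖርት ዓይነት ነው።
      (Swimming is a kind of sport that protects our health.)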


Levels of Linguistic Analysis: Pragmatics

Pragmatics
Pragmatics is concerned with the purposeful use of language in situations and utilizes
context over and above the contents of the text for understanding.
Pragmatics deals with world knowledge – outside the contents of the document.

English Pragmatic Processing (Examples)


Text: The city councilors refused the demonstrators a permit because they feared violence.
Understanding: The city councilors feared violence.
Text: The city councilors refused the demonstrators a permit because they advocated revolution.
Understanding: The demonstrators advocated revolution.

Amharic Pragmatic Processing (Examples)


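Text: ተማሪዎች አስተማሪዎቻቸውን ባገኙበት ወቅት ፈተና እየተፈተኑ ነበር።
      (When the students met their teachers, they were taking an exam.)
Understanding: ተማሪዎች እየተፈተኑ ነበር። (The students were taking the exam.)
Text: ተማሪዎች አስተማሪዎቻቸውን ባገኙበት ወቅት ፈተና እየፈተኑ ነበር።
      (When the students met their teachers, they were giving an exam.)
Understanding: አስተማሪዎች እየፈተኑ ነበር። (The teachers were giving the exam.)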


Disambiguation

Disambiguation
Disambiguation refers to the resolution of ambiguities that occur at different levels of
language analysis.
A given text is said to be ambiguous if there are multiple linguistic structures that can
be built for it.

English Ambiguity (Example)


Text: Your father understands you like your mother.
Ambiguities: 1. Your father understands you as well as your mother understands you.
2. Your father understands you as well as he understands your mother.
3. Your father understands (that) you like your mother.

Amharic Ambiguity (Example)


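Text: ጤንነታችንን ለመጠበቅ ዋና ተጠቃሽ ስፖርት ነው።
Ambiguities: 1. መዋኘት ጤንነታችንን የሚጠብቅ የስፖርት ዓይነት ነው።
                (Swimming is a kind of sport that protects our health.)
             2. ጤንነታችንን ከሚጠብቁ ነገሮች ውስጥ በዋነኛነት የሚጠቀሰው ስፖርት መስራት ነው።
                (Among the things that protect our health, doing sport is the one chiefly cited.)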


Approaches to NLP: Rule-based

Rule-based Approach
Rule-based systems are based on explicit representation of facts about language
through well-understood knowledge representation schemes and associated
algorithms.
Rule-based systems usually consist of a set of rules, an inference engine, and a
workspace or working memory.
Knowledge is represented as facts or rules in the rule-based approach.
The inference engine repeatedly selects a rule whose condition is satisfied and
executes the rule.
The primary source of evidence in rule-based systems comes from human-developed
rules (e.g. grammatical rules) and lexicons.
Rule-based approaches have been used in tasks such as information extraction, text
categorization, ambiguity resolution, and so on.
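
The interplay of rules, working memory and inference engine described above can be sketched as a small forward-chaining loop. The rules and the fact strings below are hypothetical examples written for this illustration; they are not taken from any particular rule-based NLP system.

```python
# Each rule is (set of condition facts, fact to add). Hypothetical toy rules
# that tag the words "a book" and group them into a noun phrase.
RULES = [
    ({"word:a"}, "tag:a=Det"),
    ({"tag:a=Det", "next:a=book"}, "tag:book=N"),
    ({"tag:a=Det", "tag:book=N"}, "phrase:a book=NP"),
]


def run(rules, facts):
    """Fire rules until no rule adds a new fact; return the working memory."""
    memory = set(facts)
    fired = True
    while fired:
        fired = False
        for conditions, conclusion in rules:
            # A rule is applicable when all its conditions are in memory.
            if conditions <= memory and conclusion not in memory:
                memory.add(conclusion)
                fired = True
    return memory


print(sorted(run(RULES, {"word:a", "next:a=book"})))
```

All linguistic knowledge sits in `RULES`, written by hand, while `run` is a generic engine that knows nothing about language.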


Approaches to NLP: Statistical

Statistical Approach
Statistical approaches employ various mathematical techniques and often use large
text corpora to develop approximate generalized models of linguistic phenomena
based on actual examples of these phenomena provided by the text corpora without
adding significant linguistic or world knowledge.
The primary source of evidence in statistical systems comes from observable data (e.g.
large text corpora).
Statistical approaches have typically been used in tasks such as speech recognition,
parsing, part-of-speech tagging, statistical machine translation, statistical grammar
learning, and so on.
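
As one possible illustration of building a model from observed data alone, the sketch below estimates bigram probabilities by counting word pairs. The three-sentence corpus is made up for this example, and the function names are assumptions; a usable model would need a far larger corpus and some form of smoothing.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count context words and word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return unigrams, bigrams


def prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


# Made-up toy corpus (an assumption, not real data).
corpus = ["fruit flies like a banana",
          "time flies like an arrow",
          "fruit flies like apples"]
uni, bi = train_bigrams(corpus)
print(prob("flies", "like", uni, bi))
print(prob("like", "a", uni, bi))
```

No grammar rule appears anywhere: the only evidence is how often words follow each other in the corpus.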


Approaches to NLP: Connectionist

Connectionist Approach
A connectionist model is a network of interconnected simple processing units with
knowledge stored in the weights of the connections between units.
Similar to the statistical approaches, connectionist approaches also develop
generalized models from examples of linguistic phenomena.
What separates connectionism from other statistical methods is that connectionist
models combine statistical learning with various theories of representation.
In addition, in connectionist systems, linguistic models are harder to observe due to the
fact that connectionist architectures are less constrained than statistical ones.
Connectionist approaches have been used in tasks such as word-sense disambiguation,
language generation, syntactic parsing, limited domain translation tasks, and so on.
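
The idea that knowledge is stored in connection weights can be shown with the simplest possible network, a single perceptron unit. The task below is a made-up toy: decide whether a word is a noun from two binary context features. The features, the data and the function names are assumptions chosen so that the example is small and linearly separable.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights and a bias with the classic perceptron update rule."""
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in examples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            output = 1 if activation > 0 else 0
            error = target - output
            # Adjust each weight in proportion to its input and the error.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (noun) if the weighted sum exceeds zero, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Features: [preceded by a determiner, followed by a verb]; label 1 = noun.
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(DATA)
print(w, b, [predict(w, b, x) for x, _ in DATA])
```

After training, nothing resembling a rule is stored; what the unit has learned exists only as the numbers in `w` and `b`.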


Applications of NLP: Spelling Correction and Grammar Checking

Spelling Correction (Example from Microsoft Word)

Grammar Checking (Example from Microsoft Word)


Applications of NLP: Information Retrieval

Information Retrieval
Information Retrieval provides a list of potentially relevant documents in response to a
user’s query.

Example from Google (http://www.google.com.et/)

Query: አዲስ አበባ ዩኒቨርሲቲ (Addis Ababa University)
Results 1 - 10 of about 38,700 (0.28 seconds)

1. Wikipedia - አዲስ አበባ ዩኒቨርስቲ
   አዲስ አበባ ዩኒቨርሲቲ ዋና አስተዳደሩን ሰድስት ኪሎ በሚገኘው በዋናው ጊቢ ያደረገ ሲሆን በአምስት ኪሎ (የቴክኖሎጂ ፋኩለቲ-
   ሰሜን)፣ ...
   am.wikipedia.org/wiki/አዲስ_አበባ_ዩኒቨርስቲ

2. Wikipedia - አዲስ አበባ
   ከእንጦጦ ጋራ ግርጌ ያለችው መዲና የአዲስ አበባ ዩኒቨርሲቲ መገኛ ሆናለች። ይህም በመስራቹ የቀድሞው ንጉሠ ነገሥት ስም
   ቀዳማዊ ኀይለ ...
   am.wikipedia.org/wiki/አዲስ_አበባ

3. አዲስ አበባ ዩኒቨርሲቲ ለመጀመሪያ ጊዜ ለሠራተኞቹ ሽልማት ...
   10 ፌብሩ 2010 ... You are here: ዜና አዲስ አበባ ዩኒቨርሲቲ ለመጀመሪያ ጊዜ ለሠራተኞቹ ... አዲስ አበባ ዩኒቨርሲቲ
   እሑድ ጥር 30 ቀን 2002 ዓ.ም. ...
   www.ethiopianreporter.com/index.php?...id...

4. Wapedia - Wiki: አዲስ አበባ ዩኒቨርስቲ
   31 ዲሴም 2009 ... አዲስ አበባ ዩኒቨርሲቲ ዋና አስተዳደሩን ሰድስት ኪሎ በሚገኘው በ ዋናው ጊቢ ያደረገ ሲሆን በ አምስት
   ኪሎ (የቴክኖሎጂ ...
   wapedia.mobi/am/አዲስ_አበባ_ዩኒቨርስቲ
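
One common way to produce such a ranked list is to score each document by how often it contains the query terms, weighting rare terms more heavily (TF-IDF). The sketch below is an assumption-laden illustration, not a description of how Google ranks pages: the three documents are made up, and `rank` is a hypothetical function name.

```python
import math
from collections import Counter


def rank(query, documents):
    """Return indices of matching documents, best TF-IDF score first."""
    tokenized = [doc.lower().split() for doc in documents]
    n = len(tokenized)
    scores = []
    for i, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df:
                score += counts[term] * math.log((n + 1) / df)
        scores.append((score, i))
    ordered = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ordered if score > 0]


# Made-up document collection (an assumption).
DOCS = ["Addis Ababa University is in Addis Ababa",
        "Addis Ababa is the capital of Ethiopia",
        "Machine translation converts text between languages"]
print(rank("Addis Ababa University", DOCS))
```

The third document shares no term with the query, so it is left out of the result list entirely.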

Applications of NLP: Information Extraction

Information Extraction
Information Extraction focuses on the recognition, tagging, and extraction of certain
key elements of information (e.g. persons, companies, locations, organizations, etc.)
from large collections of text into a structured representation.

Example
Text: Firm XYZ is a full service advertising agency specializing in direct and
interactive marketing. Located in Bole, Addis Ababa, Firm XYZ is looking for an
Assistant Account Manager to help manage and coordinate interactive
marketing initiatives. Experience in online marketing and/or the advertising
field is a plus. Depending on the experiences of the applicants, the company
pays an attractive salary of Birr 3,000- Birr 5,000 per month.

Extracted Information:
INDUSTRY   Advertising
POSITION   Assistant Account Manager
LOCATION   Bole, Addis Ababa
COMPANY    Firm XYZ
SALARY     Birr 3,000 - Birr 5,000 per month
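
For a single, regular text like the advertisement above, extraction into a structured record can be sketched with hand-written patterns. The regular expressions below are assumptions tailored to this one advertisement and would not generalize; practical systems combine lexicons, taggers and learned models.

```python
import re

AD = ("Firm XYZ is a full service advertising agency specializing in direct and "
      "interactive marketing. Located in Bole, Addis Ababa, Firm XYZ is looking for an "
      "Assistant Account Manager to help manage and coordinate interactive "
      "marketing initiatives. Experience in online marketing and/or the advertising "
      "field is a plus. Depending on the experiences of the applicants, the company "
      "pays an attractive salary of Birr 3,000- Birr 5,000 per month.")

# One hand-written pattern per field (assumptions fitted to this text only).
PATTERNS = {
    "INDUSTRY": r"(\w+) agency",
    "POSITION": r"looking for an? ((?:[A-Z]\w+ ?)+)",
    "LOCATION": r"Located in ([A-Z]\w+(?:, [A-Z]\w+(?: [A-Z]\w+)*)?)",
    "COMPANY": r"^(.+?) is a\b",
    "SALARY": r"salary of (Birr [\d,]+\s*-\s*Birr [\d,]+ per \w+)",
}


def extract(text):
    """Fill a record with the first match of each field pattern."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            record[field] = match.group(1).strip()
    return record


for field, value in extract(AD).items():
    print(f"{field:10}{value}")
```

The values are returned exactly as they appear in the text, so normalization such as capitalizing the industry name would be a separate step.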


Applications of NLP: Machine Translation

Machine Translation
Machine Translation is the automatic translation of text from one language to another.

Example from Google Translate (http://translate.google.com/#sv|e), Swedish to English translation

Source text (Swedish):
Addis Abeba är huvudstaden i Etiopien. Staden anlades i slutet av 1800-talet av
kejsaren Menelik. Platsen valdes på grund av de varma källorna, vilka ansågs vara
heliga. Kejsaren lät uppföra palats och andra monumentalbyggnader med hjälp av
utländska arkitekter och byggare. Arkitektoniska stilar från Schweiz, Indien och Jemen
blandades och gav upphov till en sten- och träarkitektur som kallas Addis Abeba-stil.
Järnvägen, som förbinder Addis Abeba med Djibouti (via Dire Dawa), anlades under
1930-talet.

Machine translation (English):
Addis Ababa is the capital of Ethiopia. The city was built in the late 1800s by Emperor Menelik.
The site was selected because of the hot springs, which were considered sacred. The
emperor had built palaces and other monumental buildings by foreign architects and builders.
Architectural styles from Switzerland, India and Yemen were mixed and raised a stone and
wooden architecture known as the Addis Ababa-style. The railway, linking Djibouti with Addis
Ababa (via Dire Dawa), were built in the 1930s.


Applications of NLP: Question-Answering

Question-Answering
Question-Answering provides the user with either just the text of the answer itself or
answer-providing passages.

Example from WikiAnswers


Applications of NLP: Dialogue Systems

Dialogue Systems
Dialogue Systems are agents that converse with human beings in a coherent structure
using several modes of communication such as text, speech, gesture, etc.

Example from ELIZA Program


User: You are like my father in some ways.
ELIZA: WHAT RESEMBLANCE DO YOU SEE
User: You are not very aggressive but I think you don’t want me to notice that.
ELIZA: WHAT MAKES YOU THINK I AM NOT AGGRESSIVE
User: You don’t argue with me.
ELIZA: WHY DO YOU THINK I DON’T ARGUE WITH YOU
User: You are afraid of me.
ELIZA: DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU
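
Replies of this kind can be produced by matching the input against patterns and echoing part of it back with the pronouns swapped. The sketch below covers three of the exchanges above; the rule list, the fallback reply and the function names are assumptions written for this illustration and are not the original ELIZA script.

```python
import re

# Swap first- and second-person words so the reply addresses the user.
REFLECT = {"i": "you", "me": "you", "my": "your",
           "you": "I", "your": "my", "am": "are", "are": "am"}

# (pattern, reply template) pairs, tried in order (hypothetical rules).
RULES = [
    (r".*\blike\b", "WHAT RESEMBLANCE DO YOU SEE"),
    (r"you are (.+)", "DOES IT PLEASE YOU TO BELIEVE I AM {0}"),
    (r"you don't (.+)", "WHY DO YOU THINK I DON'T {0}"),
]


def reflect(fragment):
    """Swap pronouns word by word, e.g. 'afraid of me' -> 'afraid of you'."""
    return " ".join(REFLECT.get(w, w) for w in fragment.lower().split())


def respond(utterance):
    """Return the reply of the first rule whose pattern matches."""
    text = utterance.lower().replace("’", "'").rstrip(".!?")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            groups = [reflect(g) for g in match.groups()]
            return template.format(*groups).upper()
    return "PLEASE GO ON"  # fallback reply (an assumption)


print(respond("You are afraid of me."))
print(respond("You don't argue with me."))
```

The program has no model of what the user means; the appearance of understanding comes entirely from surface pattern matching.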


Applications of NLP: Text Summarization

Text Summarization
Text Summarization is an application of NLP that reduces a larger text into a shorter,
yet richly constituted representation of the original document.

Example from GreatSummary


Original Text:
Addis Ababa University (AAU) is one of the largest higher learning institutions in Africa that was established at the
end of the 1940s. Formerly known as Haile Selassie I University, AAU was established by Ministry of Education in
1949 as a Trinity College with 71 students and 9 academic staff. It was granted a charter in July 1950 as an
autonomous higher learning institution under a different name of the University College of Addis Ababa (UCAA). This
makes AAU one of the oldest, if not the oldest, modern African university. The Ethiopian government created several
institutions since UCAA was established in 1950s. These include a College of Agriculture in Alemaya, Harar, and
College of Building Technology in Addis Ababa. In 1961, the different institutions of higher learning came under a
central administration to form what is to become the AAU. It should be noted that many of the institutions in the
country that have now become separate institutions, were part of AAU at one time.

Summarized Text (constrained to 4 sentences):


Addis Ababa University (AAU) is one of the largest higher learning institutions in Africa that was established at the
end of the 1940s. These include a College of Agriculture in Alemaya, Harar, and College of Building Technology in
Addis Ababa. Formerly known as Haile Selassie I University, AAU was established by Ministry of Education in 1949 as
a Trinity College with 71 students and 9 academic staff. It should be noted that many of the institutions in the
country that have now become separate institutions, were part of AAU at one time.
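
A summary that selects whole sentences from the original, as in the example above, is called extractive. One simple way to choose the sentences is to score each by the average frequency of its content words. The sketch below is an illustration under stated assumptions: it is not the method used by GreatSummary, the stopword list is a small hand-picked set, and the short test text is made up.

```python
import re
from collections import Counter

# Small hand-picked stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and",
             "to", "that", "it", "as", "by", "at", "with"}


def content_words(text):
    """Lowercase alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))[:k]
    return [sentences[i] for i in sorted(top)]


TEXT = "NLP studies language. NLP systems process language data. Cats sleep."
print(summarize(TEXT, k=2))
```

Returning the chosen sentences in their original order is a design choice of this sketch; it avoids readability problems such as a sentence starting with "These include" appearing before the things it refers to.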


Related Fields: Modes of Language Representation

Spoken Language
Written Language
    Editable Text
    Non-Editable Text


Related Fields: Speech Recognition

Speech Recognition
Speech Recognition is the process of converting spoken words (acoustic signals) into
equivalent text.
Speech Synthesis, also known as Text-to-Speech system, performs the reverse process,
i.e. artificially produces human speech from a given text.

Spoken Language → Speech Recognition → Editable Text
Editable Text → Speech Synthesis → Spoken Language


Related Fields: Optical Character Recognition

Optical Character Recognition


Optical Character Recognition (OCR) is a computerized system that converts
non-editable text to machine-encoded text.
If the text to be converted is handwritten, the system is also known as Intelligent
Character Recognition (ICR).
Printing → Non-Editable Text → OCR → Editable Text
Writing → Non-Editable Text → ICR → Editable Text

TOC: Course Syllabus

Previous:
Current: NLP: Background and Overview

Next: Morphological Analysis
