You are on page 1of 61

A Profile of Arabic Script Languages

Bushra Zawaydeh, Ph.D., Senior Linguist

June 7, 2007

Proprietary Information of Basis Technology Corp.

History of the Arabic Script

ƒ Derived from the Nabataean script, which was used in Petra in the
2nd century BC.


ƒ The Nabataean script is an offshoot from the Aramaic script.

ƒ The Aramaic script developed from the Phoenician script.

ƒ The Phoenician script was a model for the Greeks to develop the
Greek writing system (around 1000 B.C.), from which English, and
all Western alphabets were based on.

Development of Phoenician Script

Development of Arabic Script

ƒ Arabic inscriptions became widely available after the birth of


ƒ The Quran descended upon the prophet Mohammad in the year

A.D. 612 (Khan, 2001)


ƒ Before the descension of the Quran, Arabic was primarily an oral


ƒ Arabic is considered a holy language because it is the language of

the Quran. Hence it is the primary prayer language for Muslims.

ƒ Arabic spread through the spread of Islam. By the 11th century,

Arabic became the common medium of expression from China to

Types of Arabic Calligraphy: Kufic

ƒ The earliest manuscripts of the Quran (8th – 10th century) were

written in the Kufic style of Arabic writing (Campbell, 1997).
ƒ Kufic script is angular, which was most likely a product of
inscribing on hard surfaces such as wood or stone.

Types of Arabic Calligraphy: Naskhi

ƒ Since the 11th century, the cursive style that is known as Naskhi
was developed.

Arabic Abjad

ƒ There are different writing systems that

languages use, such as:

ƒ Alphabet – denotes both consonants and vowels.

Ex: English.

ƒ Abjad – denotes consonants.

Ex: Arabic, Hebrew.

ƒ Syllabary - characters denote syllables.

Ex: Japanese Hiragana

Spread of Arabic

ƒ The Muslim Arab civilization flourished in the Arabian Peninsula,

and was embraced by the Turks, Iranians, Afghans, Indians, North
Africans, Spanish Andalusians.

ƒ Arabic became the language of art, science, and technology.

ƒ Islamic Calligraphy became a noble art, that was appreciated more

than any other form of art.

Samples of Arabic Calligraphy

Cursive Arabic Calligraphy

Features of the Arabic Script

ƒ The Arabic alphabet contains 28 letters.

ƒ complex text language, because it has bidirectional script. It is written

right to left, except for numbers and Latin words are written left to right.

ƒ Many letters change their form depending on whether they appear alone,
at the beginning, middle or end of the word.

ƒ Letters that change form, are always joined in both hand-written and
printed Arabic. Hence, it is cursive, as in the English hand writing.

ƒ Only 3 long vowels are written.

ƒ Diacritics indicate things like short vowels and gemination.

Arabic Abjad

Arabic Letters in Different Positions

Letters in Different Positions

Arabic Diacritics

More Features of the Arabic Script

ƒ Lack of capital letters.

ƒ Lack of word division word finally.

ƒ Unlike many other alphabetic scripts, it denotes a high phonetic

accuracy, when diacritics are added.

Arabic Ligatures

ƒ Arabic script uses ligatures. A compulsory one is the lam followed

by an aleph:


ƒ Optional/ stylistic

Arabic Language

ƒ Arabic is a Semitic language.

ƒ 221 million speakers.

ƒ Countries it is spoken in:

ƒ Afghanistan, Algeria, Bahrain, Chad, Cyprus, Djibouti, Egypt, Eritrea,
Iran, Iraq, Israel, Jordan, Kenya, Kuwait, Lebanon, Libya, Mali,
Mauritania, Morocco, Niger, Oman, Palestinian West Bank & Gaza,
Qatar, Saudi Arabia, Somalia, Sudan, Syria, Tajikistan, Tanzania,
Tunisia, Turkey, UAE, Uzbekistan and Yemen.

Worldwide use of the Arabic Abjad

•Dark green → Countries where the Arabic script is the only official orthography.
•Light green → Countries where the Arabic script is used alongside other

Arabic Abjad Usage in Other Languages

ƒ Arabic Abjad is used in a large number of languages other than


ƒ Abjad spread in the world through the Islamic conquests (7-8th


ƒ It is the second most widespread script in the world.

Writing Systems of the World Today

Languages Using the Arabic Script Presently

1. Arabic 11. Berber languages.

2. Persian/ Dari 12. Moplah (dialect of Malayalam)
3. Urdu 13. Malagasy
4. Pashto 14. Sulu
5. Baluchi
6. Kurdish
7. Lahnda
8. Kashmiri
9. Sindhi
10. Uyghur

Languages that Abandoned the Arabic Script

ƒ Languages now using Roman

ƒ Indonesian (Malay)
ƒ Hausa
ƒ Somali
ƒ Sudanese
ƒ Swahili
ƒ Turkish
ƒ Caucasian languages now using Cyrillic
ƒ Chechen
ƒ Kabardian
ƒ Lak
ƒ Avar
ƒ Lezgi

Adoption of the Arabic Script

ƒ When the Arabic Abjad was adopted, it was augmented to fit the
phonologies of the non-semitic languages.

ƒ The alphabet was extended by the different languages. The 28

basic Arabic letters were extended to more than 100 letters
(Esfahbod, 2004).

Method of Adoption

ƒ All the Arabic letters are borrowed directly to preserve the Arabic

ƒ When borrowing Arabic loanwords, the pronunciation would depend on

the phonology of the borrowing language.

ƒ Arabic specific sounds that are not present in the borrowing language,
would be pronounced as a sound that is present in that language. Ex:
the Arabic gutturals and interdentals.

Arabic Gutturals

ƒ Sounds produced with a constriction in the back part of the vocal

tract (Zawaydeh, 1999)
ƒ Emphatics (T, D, S, Z) ‫ ظ‬،‫ ص‬،‫ ض‬،‫ط‬
ƒ Uvulars (q, X) ‫ خ‬،‫ق‬
ƒ Pharyngeals (H, Eiyn) ‫ ع‬،‫ح‬
ƒ Laryngeals (glottal stop, h) ‫ ء‬،‫ﻩ‬

Rendition of Arabic Gutturals and Interdentals

ƒ The Arabic emphatics are not pronounced as uvularized, but rather

as plain, non-uvularized sounds.

ƒ Persian:
ƒ Pharyngeal ‫ ع‬sound is pronounced as a glottal stop.
ƒ Pharyngeal ‫ ح‬sound is pronounced as a [h].

ƒ Persian phonetic redundancies:

ƒ Persian /s/ is rendered as ‫ص‬ ،‫ س‬،‫ث‬
ƒ Persian /z/ is rendered as ‫ ز‬،‫ ذ‬،‫ ض‬،‫ظ‬

Nastaliq Script

ƒ A writing style which is used, with extra letters, to write:

ƒ Farsi
ƒ Urdu
ƒ Pashto
ƒ Kashmiri
ƒ Sindhi
ƒ Turkish - (Under the Ottoman Empire before 1920).

Nastaliq Samples


ƒ Locally called:
ƒ Farsi in Iran.
ƒ Dari in Afghanistan
ƒ Tajiki in Central Asia (former Soviet Union countries)
ƒ Dialects:
ƒ Lari (in Iran)
ƒ Hazaragi (in Afghanistan),
ƒ Darwazi (In Afghanistan and Tajikistan)

Persian Language Map

Status of Languages in Iran

ƒ Main languages:
ƒ Persian and its dialects 58%
ƒ Azeri and other Turkic languages 26%
ƒ Kurdish 9%
ƒ Balochi 1%
ƒ Arabic 1%

ƒ Official language is Persian.

ƒ Ethnologue reports 71 languages!

Strategies for Modifying Arabic Script: Persian

ƒ Basic Strategy:
ƒ Add more dots to certain letters to create new letters.
ƒ Persian added 4 more letters.
ƒ Persian /p/ is: ‫ پ‬while Arabic /b/ is ‫ ب‬.
ƒ Persian /ʒ / is: ‫( ژ‬while ‫ ج‬is /ʤ/)
ƒ Persian /ʧ/ is: ‫چ‬
ƒ Persian /g/ is: ‫ – گ‬this originally had three dots.

Persian/ Dari Alphabet

ƒ 32 letters. Red is the Persian additional letters.

Persian vs. Arabic

ƒ Æ used for Izafet compounds.

ƒ Persian Kaf and Ya

Other Persian Orthographic Modifications

ƒ ‫ إ‬Æ‫ا‬
ƒ ‫ ة‬Æ ‫ ﻩ‬or ‫ت‬
ƒ Arabic words with hamza, may be spelled in various ways,
example: ‫ ﻣﺴﺆول‬is spelled as ‫ﻣﺴﺌﻮل‬.
ƒ Damma is pronounced as an [o] not an [u] as in Arabic.

Languages Extending the Persian Alphabet

ƒ Some languages used the Persian alphabet as a base, which in turn

is based on Arabic, and added more letters that are not in Persian
or Arabic.
ƒ Examples:
ƒ Urdu
ƒ Pashto
ƒ Sindhi

Status of Languages in Pakistan

ƒ Major languages in Pakistan are:

ƒ Punjabi, Saraiki, Sindhi, Pashto, Urdu, Balochi, Hindko, and Brahui.

ƒ Official language is English.

ƒ National Language is Urdu.

ƒ Language Distribution
ƒ Punjabi 44%
ƒ Pashto 15%
ƒ Sindhi 14%
ƒ Siraiki 11%
ƒ Urdu 8%
ƒ Balochi 4%
ƒ others 4%

Languages in Pakistan

Status of Languages in Pakistan

ƒ Urdu and Sindhi have standardized spellings. If a speaker from the

other languages needs to write their language, they would use
either Urdu or Sindhi.

ƒ In Pakistan, the classical spelling standard of Pashto is not always

followed. There is a tendency to use the Urdu forms of letters
instead of the Pashto forms (UCLA Language Materials Project).

Urdu Alphabet

ƒ Red is Persian Letters.

ƒ Blue is the Urdu letters

Urdu Alphabet

ƒ Uses the emphatic ‫ ط‬above the letter to mark sounds that are
retroflex, which are the “d, t, and r”.

ƒ Uses the shape of the Arabic nun ‫ ن‬without the dot, to indicate
nasalized vowels: ‫ ﻣﺎں‬mãː “Arab”

ƒ For aspirated consonants, follows the letter.

ƒ Urdu [h] appears in the following forms:
ƒ Distinguishes between [i] and [e, ɛ] sounds word finally:
ƒ ‫ ﻟﮍﮐﯽ‬laɽkiː “girl”.
ƒ ‫ ﻟﮍﮐﮯ‬laɽke “boys”.

Status of Languages in Afghanistan

ƒ Official languages are Pashto and Dari (Afghan Persian).

ƒ Turkic languages (Uzbek and Turkmen).
ƒ Other languages: Baluchi, Pahsai, Nirisani, etc.


ƒ Uses a modified form of the Perso-Arabic script.

ƒ Improvised the Perso-Arabic script by adding letters that don’t
appear in any other script.
ƒ Used 4 Persian letters.
ƒ Added 8 more letters:
ƒ 4 Retroflex consonants /t/, /d/, /r/, /n/. Written with “pandak”,
“gharwandah”, or “skarraen”: ‫ ټ ډ ړ‬and ‫ڼ‬
ƒ Letters “ge” and “xin”: ‫ښږ‬
ƒ dental affricates /dz/ ‫ ځ‬and /ts/ ‫څ‬
ƒ [g] is written either in the Persian style or as: ‫ګ‬


Pashto Zwarakay

ƒ Pashto has a 4th vowel diacritic, which looks like a horizontal line.

Pashto diacritics

Arabic Numbers

ƒ The decimal numbering system originated in India.

ƒ It got adapted by the Arabic world.
ƒ The Europeans adopted the Arabic numbers.

Arabic Numbers

ƒ The number 4, 5, 6, 7 have various forms in the languages of Iran,

Pakistan, and India.

Basis Technology Products Handling Arabic Script
ƒ Arabic
ƒ Base Linguistics ƒ Urdu
ƒ Arabic Chatroom Reverse ƒ Base Linguistics
ƒ Entity Extractor
ƒ Entity Extractor
ƒ Name Matching ƒ Name Matching
ƒ Name Translation ƒ Name Translation
ƒ Arabic Editor ƒ Language Identification
ƒ Transliteration Assistant
ƒ Digital Forensics
ƒ Pashto
ƒ Language Identification ƒ Transliteration Assistant
ƒ Name Matching
ƒ Persian ƒ Name Translation
ƒ Base Linguistics ƒ Language Identification
ƒ Entity Extractor
ƒ Transliteration Assistant
ƒ Name Matching
ƒ Name Translation
ƒ Digital Forensics
ƒ Language Identification
ƒ Afghan Transitional Islamic Administration. Ministry of Communications. United Nations
Development Program. Computer Local Requirements for Afghanistan.
ƒ Bhurghi, Abdul-Majid. Enabling Pakistani Languages through Unicode. (Written for Microsoft).
ƒ Campbell, George. 1997. Handbook of Scripts and Alphabets. New York: Routledge.
ƒ Eid, Mushira, et. Al. 2006. Encyclopedia of Arabic Language and Linguistics. Volume I.
ƒ Ishida, Richard. 2004. Urdu script notes [Draft].
ƒ Kew, Jonathan. 2005. Notes on some Unicode Arabic characters: recommendations for usage.
Draft 2.
ƒ Khan, Gabriel Mandel. 2001. Arabic Script. New York: Abbeville Press.
ƒ Milo, Thomas. 2002. Authentic Arabic: A case Study. 20th International Unicode Conference.
Washington, DC.
ƒ Salloum, Habeeb. The Odyssey of the Arabic Language and its Script.
ƒ UZT 1.01 & Unicode Mapping for Urdu. Center for Research in Urdu Language Processing.
National University of Computer and Emerging Sciences.
ƒ Unicode Standard 4.0.
ƒ Zawaydeh, 1999. The Phonetics and Phonology of Gutturals in Arabic. Ph.D. Dissertation.
Indiana University.