Beruflich Dokumente
Kultur Dokumente
Phonetics
-2014-
Mingshu Zhao
Abstract ......................................................................................................................... 7
Declaration .................................................................................................................... 8
Acknowledgements ...................................................................................................... 10
1 Introduction ......................................................................................................... 11
1.1 Motivation ............................................................................................................ 11
1.2 Aims ..................................................................................................................... 12
1.3 Objectives ........................................................................................................... 13
1.4 Structure of dissertation .................................................................................... 14
1.5 Summary ............................................................................................................. 15
2 Background .......................................................................................................... 16
2.1 History of Chinese character ............................................................................ 17
2.2 Chinese character structures ........................................................................... 20
2.3 Chinese character components ....................................................................... 23
2.3.1 Strokes ............................................................................................................ 23
2.3.2 Radicals .......................................................................................................... 24
2.3.3 Phonetic component ..................................................................................... 25
2.4 Pinyin system ..................................................................................................... 26
2.5 Barriers of westerners learning Chinese ........................................................ 27
2.6 Effective way of learning Chinese ................................................................... 28
2.7 Summary ............................................................................................................. 30
3 Methodology ....................................................................................................... 31
3.1 Database technology......................................................................................... 31
3.1.1 Relational database ...................................................................................... 31
3.1.2 Entity-relationship model .............................................................................. 32
3.1.3 Relational database management system ................................................ 34
3.2 Programming languages................................................................................... 34
3.2.1 Structured query language ........................................................................... 35
3.2.2 Java ................................................................................................................. 36
3.3 Tools and methods............................................................................................. 37
3.3.1 Database management software ................................................................ 37
3.3.2 Visual database design tool ......................................................................... 38
3.3.3 Development environment ........................................................................... 39
2
3.3.4 Selection of encoding system ...................................................................... 39
3.4 Summary ............................................................................................................. 40
6 Evaluation ............................................................................................................ 71
6.1 A review of current Chinese database ............................................................ 71
6.2 Contribution......................................................................................................... 72
6.2.1 Contribution to Chinese teaching ................................................................ 72
6.2.2 Contribution to the platforms for Chinese enquiries ................................. 73
6.2.3 Contribution to Chinese research ............................................................... 73
6.3 Performance evaluation .................................................................................... 74
6.3.1 Single table test ............................................................................................. 74
6.3.2 Two-tables joint test ...................................................................................... 76
6.3.3 Three-tables joint test ................................................................................... 77
6.3.4 Application test............................................................................................... 77
6.4 Summary ............................................................................................................. 78
7 Conclusion ........................................................................................................... 79
7.1 Deliverables ........................................................................................................ 79
7.2 Project limitation ................................................................................................. 80
3
7.3 Future work ......................................................................................................... 80
7.3.1 Suggestion of database optimization ......................................................... 80
7.3.2 Suggestion of learning system development ............................................ 82
Bibliography ................................................................................................................. 83
4
List of Figures
5
List of tables
6
Abstract
Along with China is playing an increasingly important role driving the world’s
western learners tend to find leaning Chinese difficult because the writing
Chinese characters, investigating the main barriers for western learners then
summarizes the efficient way for learning Chinese. After defining the
7
Declaration
8
Intellectual property statement
ii. Copies of this dissertation, either in full or in extracts and whether in hard
or electronic copy, may be made only in accordance with the Copyright,
Designs and Patents Act 1988 (as amended) and regulations issued under it
or, where appropriate, in accordance with licensing agreements which the
University has entered into. This page must form part of any such copies
made.
iii. The ownership of certain Copyright, patents, designs, trade marks and
other intellectual property (the “Intellectual Property”) and any Page 6 of 14
Guidance for the Presentation of Taught Masters’ Dissertations August 2010
reproductions of copyright works in the dissertation, for example graphs and
tables (“Reproductions”), which may be described in this dissertation, may
not be owned by the author and may be owned by third parties. Such
Intellectual Property and Reproductions cannot and must not be made
available for use without the prior written permission of the owner(s) of the
relevant Intellectual Property and/or Reproductions.
9
Acknowledgements
I would like to thank Dr. Richard Banach for his invaluable support throughout
this project and for his patience in supervising and on-going supporting me. I
also would like to thank my partner Yuhang Lei who has supported me
Last but not least, I would like to thank my family for providing the funding
10
1 Introduction
1.1 Motivation
China towns have been built in most metropolises in western countries. Many
buses and on the show window of luxury shops. Along with Chinese is widely
Chinese students in western counties tends to rise every year. They have
that in the field of high-paying jobs, the lack of language skills results in their
11
in Europe. In France, the annual growth of Chinese learners is 25 percent. In
United Kingdom, Most of Britain's private schools have set Chinese classes.
In Italy, there are more than forty universities have Chinese courses.1
Europe. In addition, the existing learning materials online are usually lack of
which can support Europeans Chinese learning could have bright prospect.
also specifies the combination law and pronunciation rules to help learners to
1.2 Aims
instance, characters that include radical “火” (huo, fire) are associated with
pronunciation, that is to say characters that include same phonetic may have
the same pronunciation to a great extent. Since radicals and phonetics differ
in function, the awareness of the two kinds of components can be helpful for
1
Mandarin fever: http://www.cdrb.com.cn/html/2013-09/03/content_1915531.htm
12
Chinese character database with searching function, which specifies the
1.3 Objectives
The significance of this project manifests not only in its powerful retrieval
function, but also in its excellent support for higher-level software such as an
online Chinese learning system. To achieve the aims of the project, specific
3) For each character, its detailed information will be listed in the database
table.
be developed.
13
1.4 Structure of dissertation
following:
Chapter 2: Background.
terms of the history, structure, basic components and Pinyin system. Then
Chapter 3: Methodology
technology that will be used in the development. In addition, tools for creating
In the design and implementation chapter, the conceptual and logical models
of the database are established. According to the models, data tables are
created. After a statistics-based collection, selected data are input into tables.
14
during the programming is posted.
Chapter 6: Evaluation
Chapter 7: Conclusion
The conclusion chapter mainly summarizes the delivery of this project. It also
1.5 Summary
This chapter mainly represents the motivation of doing the project and
clarifies the problems that need to be solved. Based on that, the objectives
that the project plans to achieve are determined. The whole project will focus
on finding out a solution to accomplish the goals. Finally, the structure of the
Besides, in the next sections, all the mentioned Chinese characters will be
annotated with their Pinyin and general meaning so that readers are able to
15
2 Background
ancient han ethnic group for the need of communication in the long-term
shaped structures.
the largest population at the same time. In the past years, Chinese has been
used by ethnics in East and Southeast Asia (such as Korea, Japan and
Vietnam) over a long period of time. For now, in addition to China, Chinese
characters are still used in Japan, Korea, Singapore and Malaysia. There is
While the alphabets used in western languages are only phonetic, Chinese
symbols, most of which are the abstracts of the objects in real world. Hence
sound.
For example: character “赢” (ying, win) can be divided into five independent
16
figures: “亡”, ”口”, ”月”, ”贝”, ”凡”. United means all those figures will be
3) Tridimensional structure
(ping, apple) has a top-bottom structure while character “呼” (hu, breath) has
a left-right structure.
language system in the world.2 They are believed to have more than 4500
primitive man to record their life. Later, graphical features were used to
structure. The body of inscriptions on oracle bones from the late Shang
dynasty is the earliest evidence discovered that could confirm Chinese origin.
Shang Dynasty. (Keightley D N, 1996) For now, among the 4500 symbols
After the first emperor of Qin dynasty unified the mainland of China, to
17
dynasties. Figure 2.1 shows the evolution of character “马” (ma, horse).
example: in ancient China, there was only one kind of aquatic vehicle, which
was represented as “舟” (zhou, boat) in Chinese. So far, “舨” (ban, sampan),
“艇” (ting, light boat), “船” (chuan, ship), “舰” (jian, warship) evolved from “舟”
widely used and powerful method of creating Chinese characters. For now,
Over time they have been stylized, simplified and standardized to make
the similar shape with the objects they represent. Figure 2.2 shows the
stick-picture of a fish, but today character “鱼’ has been simplified that
3
Chinese etymology, character “马”:
http://www.chineseetymology.org/CharacterEtymology.aspx?submitButton1=Etymology&characterInput=%E9%
A9%AC
18
people cannot imagine it as fish any more. (Qiu zhiwen, 2008)
not find a specific object to express their feelings, they used abstract
symbols. Take character “凶” (xiong, danger) as example, “凵” stood for
a deep hole on the ground, while “×” stood for the feeling of panic.
Hence character ”凶” means the situation of danger. (Qiu Zhiwen, 2008)
(cao) means grass and “田” (tian) means cropland, so “苗” (miao) which
19
the largest percentage of Chinese characters. These characters
generally consist of two parts: one semantic radical, which indicates the
For instance, in character “桐” (tong, tung), component “木” (mu, wood)
Component “同” indicates that character “桐” and character “同” (tong,
traditional and simplified characters, the former are used by the residents live
in Taiwan, Hong Kong, Macau and North America, while the latter are usually
Southeast Asia.5
5
Chinese characters: http://www.zdic.net/appendix/f21.htm
20
acknowledged by GB18030-2000 standard and ISO/IEC16046 standard. (Lu
21
No. Category Name Fram Sample
The structures of Chinese characters may have layers. The character “助”
(zhu, help) has one-level structure (left-right) while character “莅” (li, be
decomposed four times into components in the way showed in Figure 2.3.
22
2.3 Chinese character components
Chinese character. For example: “好" (hao, good) consists of "女"(nv, female)
and "子"(zi, son). In this case, "女" and "子" are components. In the past
years, when Pinyin system has not been created, people search a character
radical and phonetic component may have some internal laws. For example,
may appear in the left side to a great extent and in vertical characters, radical
see the character "鲂" (fang, gurnard), even though he or she hasn't learned
it before, his lexical knowledge can make the reader comprehend that the
character might possibly indicate a kind of fish because in Chinese “鱼” (yu,
fish) means fish, and that the pronunciation of the character has high
possibility to be the same with "方" (fang, square) (though not necessarily the
same tone).
2.3.1 Strokes
23
one, such as "一" (yi, one) while the character "𪚥" (zhe, garrulous) that
consists of four "龍"(long, dragon) has the most strokes (68 strokes in total).
Figure 2.4 shows the basic eight strokes in Chinese character. Basically, all
2.3.2 Radicals
Radicals were selected in Han dynasty. At first, radicals were the same
appeared. For now, 189 radicals have been included in Xinhua Dictionary,
which has the highest approval among different standards. Since radicals are
characters, some of them are characters as well while some are not; they are
just the compounds of strokes, which cannot be read. For some characters, it
6
Chinese E-learning platform: strokes: http://other.allad.com.tw/chinese2/newpage.php
24
is difficult to define their radicals such as those simple characters, then the
changed their appearance when they were used as radicals. For example,
character “手” was simplified as “扌” when it was a radical in characters “打”
(da, hit), “扔” (reng, drop) and so on. In addition, one radical in different
positions tend to have different physical features. For example: character “火”
(huo, fire) as a radical, become oblique when locating on the left side (e.g.
“烛” zhu, candle). But when appearing at the bottom, it becomes 灬 (huo,
fire). Generally, a radical will be associated with one or several meaning. For
indicate the meaning of water and it can be confirmed in the characters “江”
(jiang, river), “湖” (hu, lake), “海” (hai, sea) and “洋” (yang, ocean).
has the similar pronunciation with characters made of it, such as “晴” (qing,
sunny), ”清” (qing, clear), ”精” (jing, smart), ”睛” (jing, eye). Some phonetic
the meaning as well together with radicals. Take character “婚” (hun,
wedding) as an example, “女” (nv, woman) is the radical of it, which implies
the character is associated with woman. The phonetic component “昏” (hun,
dusk) not only gives the pronunciation, but also points the meaning, because
However, unlike radicals, there is no official materials that can define which
25
phonetic components. According to a statistic analysis of seven thousand
1325 phonetic components were used among them. For now, the most
accurately. For example, character “骨” pronounces “gu”, but its compounds
“滑” and “猾” pronounce “hua”. According to a recent survey, among 2500
representing about 65 percentages, but only 490 compounds have the same
initial and one final. The initials, which are usually the first letters of the
spellings, are consonants while finals are all possible combine of medials.
The initials and all combine of finals are listed in Table 2.2:
Initials
b p m f d t n l g k
h j q x zh ch sh r z c
s y w
Finals
a o e i u ü ai ei ui ao
ou iu ie üe er an en in un ün
26
Finals
ang eng ing ong ia ie iao ian iang ua
ue uai uan uang
Unlike other languages, the pronunciation of Chinese has tones, which are
described as tonal mark in Pinyin system. For each phonetic symbol, there
can be five tones. The tonal marks are always located at the top of the finals.
For example, the five tones of spelling ma are: ma, mā, má, mǎ, mà.
Today, Pinyin system has become a tool for foreigners learning Chinese.
deficiency of GPC rules of Chinese characters may become the major barrier
context. Take character “长” for example, when meaning “long”, it pronounces
learning Chinese.
objects. (Qiu Zhiwen, 2008) For alphabet-based language, letters are the
prototypes while for Chinese, The components making up characters are the
terms of learning method, this project mainly takes a cue from a issued
learners who are proficient in the usage of radicals and character formation
rules can still improve their recognition ability of Chinese characters. In other
words, by knowing one radical, they can further grasp a set of similar
characters.
28
one hundred people who took English as their first language, 27 as other
alphabetic language native speaker and two who had logographic language
characters. On the contrary, the main activities of the comparison group were
phrase practice, sentence making and text reading. After three weeks’
29
learners understand Chinese orthographic knowledge. (Chen H C, Hsu C C,
hand, the database will have excellent support in higher-level application. For
2.7 Summary
30
3 Methodology
essential. In this chapter, the technologies and tools used in the process of
and a wide range of application area since its birth till now. Along with the
data models have been put forward in order to face the challenge of new
data form. For now, several data models are widely used according to the
model and semi-structured model. In this project, all the data are organized
Data stored in computers or servers has its logical structure, which may be
each table composed of columns and rows. A record stored in a row while its
31
The existences of relationship can be among columns within one table or
Tables in relational database should conform to the integrity rules, which are
also known as the constraints, to keep the data in the tables accurate and
accessible. Integrity rules include two principle rules: entity integrity and
referential integrity. Before introducing the two rules, two terms should be
clarified first: primary key and foreign key. A primary key is a column of
attributions within a table, with each value can uniquely identify one row. A
foreign key is a common key that links the primary key column to another
table. In other words, foreign key must be the primary key in a relationship in
the meantime. In order to ensure the data integrity of tables, the values of
primary key and foreign key must follow some constrains. Entity integrity rule
defines that the value of primary key must mot repeat and null. Referential
integrity rule defines that if a set of attributions is the primary key and foreign
key in the same time. Then the value of the foreign key column can be only
32
easily understood by people who know computer little because of its
graphically way for presenting entities. Hence ER model is widely used for
ER model abstracts the real world, generating diagrams, which include three
attributes. One entity type can possess one or more attributes, which can be
others for objects have relations in real world. ER diagram describes those
entities also have types. For instance, the relationship between student entity
and teacher entity can be named “teach” and “learn from”. It is essential to
name every single relationship so that the diagram can be easily understood
by readers.
There can be multiple ways of the entities connecting to others. Therefor the
involved.
33
3.1.3 Relational database management system
data presenting in tables. RDBMSs are used for managing the storage,
or modify data, it just need make a “call” to the RDBMS. There are varieties
based while others are open source. All though different products may differ
in the manipulating and interface, Each RDBMS software will provide the
in this project. Structured Query Language (SQL) is used for building and
development.
34
3.2.1 Structured query language
is the basis for using SQL. As mentioned above, SQL includes data definition
language, data manipulation language, data query language and data control
SELECT: retrieves data from one or more than more tables according
combine programming language and the manipulation of SQL. There are two
main ways for integrating the programming language and SQL: embedded
35
SQL and open database connectivity (ODBC).
Embedded SQL is SQL statement written inline with the source code of the
platforms with few changes to the source code. Java database connectivity
3.2.2 Java
achieved by compiling source code into java bytecode rather than platform-
Java platform provides an inclusive set of standard class library, including the
36
standard classes, applications can call the functions defined in those classes
(API). Among those standard classes, there are functions that support
accessing and processing data stored in a data source. In this project, the
following five classes of the java.sql will be imported in the source code. 7
drives.
This section is mainly about the selection of tools for database managing and
Before creating the database, a RDBMS needs to be selected. For now, the
most widely used RDBMSs include DB2, Oracle, MySQL and Microsoft SQL
7
Overview package classes: http://docs.oracle.com/javame/config/cdc/opt-pkgs/api/jsr169/java/sql/package-
summary.html
37
Server. DB2 and Oracle are widely applied for the storage and management
project, MySQL and Microsoft SQL Server Express, which is the free version
of SQL Server, are more suitable. However, Microsoft SQL Server Express
extensibility. 8 For now, MySQL is the second widely used RDBMS. Its
completely free and open source makes it very popular among developers.
MySQL support multiple platforms, including Windows, Linux and Mac OS. It
provides API for almost all the mainstream programming language such as
C, C++, Java, PHP, Python and so on. Besides, it offers several database
connection method, like TCP/IP, JDBC and ODBC. MySQL can not only work
and BIG5 can be applied in MySQL. Many graphical user interface (GUI)
This project selects MySQL Workbench as the graphic user interface tool for
managing the MySQL database. Workbench is a free unified visual tool for
8
SQL Server: http://technet.microsoft.com/en-us/sqlserver/default
9
MySQL 5.1 Reference Manual: http://dev.mysql.com/doc/refman/5.1/en/
38
rather than command-based management interface.10
with MySQL. After installing JDBC driver, Netbeans enables users graphically
addition, the built-in SQL editor of Netbeans allows users execute SQL
each character to a given repertoire with something else such as a bit pattern
encoding system must be selected before creating the database. The next
Chinese characters.
1) UTF-8
system, which can link every character in the unique Unicode character set.
UTF-8 system encodes each character using one to four 8-bits bytes. 11
10
MySQL::Workbench: http://www.mysql.fr/products/workbench/
11
UTF-8, a transformation format of ISO 10646 http://tools.ietf.org/html/rfc3629
39
Specifically, UTF-8 system encodes English character using one byte while
system has become the dominant character encoding for the World Wide
2) GBK
character set for simplified Chinese characters, used in China. GBK system
GBK encodes Chinese characters using one or two bytes. For now CBK
3) Big5
in Taiwan, Hong Kong and Macau for traditional Chinese characters. In fact
the characters in Big5 are included in GBK system. Unless there is special
Based on the analysis above, GBK system is more space efficient than UTF-
3.4 Summary
The methodology chapter interpreted the main technologies that will be used
12
Usage of character encodings for website: http://w3techs.com/technologies/overview/character_encoding/all
40
in the project so that readers are able to understand the implementation
41
4 Design and implementation
design and implementation steps. First the model of the database will be
4.1 Design
tends to store data more efficiently and meet the data requirements and
superior database model, the design procedure needs to follow some certain
42
As described in Figure 4.1, there are four steps in the design procedure:
quantity and type of data that may be used in the system and clarify the
application system.
entities.
model. In other words, the entities and their relationships shown in the
43
4) Physical design: in this step, the physical computer architecture will be
taken into consideration. For example: the access method and access
On the basis of the design above, data will be collected and database will be
database usually experiences repeating the design circle for several times. In
building of this project will be introduced follow the steps in the flow-chart.
The main mission in this step is to gain a clear idea of users’ actual needs,
proper way. For this project, users’ main activities are analyzed to determine
the primary way for beginners using the database. Their activities may
include:
44
For skilled Chinese learners, they tend to know a lot about the pronouncing
retrieve a character by searching its Pinyin and their main aim of searching a
character is likely to get the structure and stroke order of the certain
Pinyin and their relative information such as English meaning. In addition, the
Character entity: the set of all characters, which has ID, Unicode,
GBK code, Stroke number and order and related words as its
attributes.
Radical entity: which includes all the radicals and has ID, Stroke
number and order, English explain and related words as its attributes.
Pinyin entity: that will specify the ID, tone and associated characters
of each Pinyin.
45
Phonetic entity: in addition to ID and associated characters, each
The relationships between entities are shown clearly in the Figure 4.2:
correspond to one single radical and one kind of structure. On the other
hand, each structure, radical, Pinyin and phonetic may be associated with
46
multiple characters. Radical and phonetic components will also correspond to
Each entity can be converted into tables while the attributes of entities will be
following sections.
some extent. Hence the design phase of creating the tables plays an
important role in the whole project. As showed in the ER model, the five
entities can be directly converted into five tables. In addition, there are two
key problems to be solved at the design stage. The first one is how to choose
primary key. In the relational database, a primary key is a unique key of each
row that can define the characteristics. The essential feature of primary key is
that primary keys should be immutable, that is, not changed until the record
to be as primary key for the three tables of radical, phonetic and character
character a unique value. Since Pinyin is not character and do not have
Unicode value, each Pinyin in the table is allocated a unique ID number as its
primary key. The second issue is how to make connection among the tables.
47
The database is supposed to support search function across tables. For
also get its pronunciation and relevant characters. The solution of the second
1) Radical table
Table 4.1 shows the structure of Radical table. The Unicode value is the
primary key of radical table. The Pinyin column is linked to Pinyin table and
character table. The value of Pinyin is the Pinyin_ID in the Pinyin table.
Hence, Pinyin column can be the foreign key of Radical table. The value of
2) Phonetic table
Table 4.2 shows the design of Phonetic component table. Similarly, each
48
phonetic will have Unicode and ID value. The pronunciations of phonetics will
phonetic.
3) Pinyin table
Table 4.3 shows the details of Pinyin table. The value of tone is from zero to
The Characters column is linked to the character table. The value of it is the
To create the above three tables are relatively easy task for radicals,
49
phonetics and Pinyin are objective existences. But structures are the internal
the structure table, that is, how to decompose the characters then present
To tackle this problem, this project took a cue from a published thesis, which
text) are designed for computer to store and decompose Chinese characters.
learning.
50
Although there is certain degree of difficulty to collect and build a complete
set of character elements that can cover all the Chinese characters, a
table. In order to tell the encoding of structures from elements, the structures
“from structure encoding to element encoding, from top to bottom, from left to
right and from outside to inside”. The principle will be explained in the
examples:
44)”, in which H stands for the left-right structure, 47 stands for “亻”
51
(ren, people) and 44 stands for “十” (shi, ten).
up-down structure, 2 strands for “一” (yi, one), 17 stands for “口”
(kou, mouth).
52
Three-level structure character: the encoding of character ”侗” (tong,
introduced in the last two examples. Figure 4.5 shows the structure
of “侗”.
In this project, there will be an element table and structure table to store the
column in the character table. By means of connecting the three tables, the
4) Structure table
Table 4.4 shows the design of structure table. ID is the order number and
primary key of each row. The Name column contains the Chinese name of
each kind of structure while the English_name column stores the English
names. Code attribute is the structure encoding which will be used in the
structure.
5) Element table
Table 4.5 shows the design of element table. There are only two columns in
53
the element table. Element_ID column is the primary key and encoding of
6) Character table
Character table is the most complex table for it contains all information about
each Chinese character. It takes Unicode as the unique primary key of rows.
Radical_ID, Phonetic_ID and Pinyin_ID are foreign keys that have to be the
same with the corresponding ones in the Radical, Phonetic and Pinyin tables.
54
Field Type Null Key Default
Phonetic_ID Int Yes Null
Element_Order Varchar Yes Null
Related_Words Varchar Yes Null
The structures of six tables the Chinese character databases have been
the six tables in the selected RDBMS and input data in. The details will be
4.2 Implementation
Since the structure of the database has been designed. Tables can be
created according to the architecture and data are supposed to be input after
that. Hence, the main task this step is to collect data that need to be stored in
The collection of radicals is relatively easy task because all the radicals are
For example: “火” (huo, fire) as a radical, become oblique when locating on
the left side (e.g. “烛” zhu, candle) but when appearing at the bottom, it
55
4.2.2 Collection of phonetic components
focuses on the research of phonetic components has been found. The paper
characters and their components (if there are). 490 phonetic components
had been selected because they had the same pronunciation with the
found in the paper can only compose one character, then the value of storing
them into database can be considered relatively small while the high frequent
People's Republic of China, it has been summarized that the number of times
Unlike the other three kinds of data, Pinyin has its internal order. Letters on
each position are arranged according to the English alphabetical order, like
from “bai” to “ban” and from “bang” to “bei“. For each Pinyin, its tone is from
zero to five. Hence the only tasks of collecting Pinyin is to arrange all the
number.
56
the letters, which can be regarded as the elements of alphabetic languages,
Chinese character elements give the characters meaning. They reflect the
elements should include basic symbols that created for recording objects in
real world.
can be recorded in the element table directly because all the compound
234 simple characters and 200 radicals (GB18030, 2000). Although the
quantity of character symbols tends to be huge, the basic symbols for making
2009).
processing:
57
characters with various appearances.
Individual structure:
Vertical structure:
Horizontal structure:
58
divided into multiple columns.
Overlapping structure:
to right.
Enclosure structure:
divided into inside and outside two parts. The outside part is an
enclosed component.
can be divided into inside and outside two parts. The inside part is
direction.
and can be divided into inside and outside two parts. The inside part
direction.
can be divided into inside and outside two parts. The inside part is
direction.
59
Lower right-hand enclosure: contains two or more components and
can be divided into inside and outside two parts. One part is
can be divided into inside and outside two parts. One part is
can be divided into inside and outside two parts. One part is
Embedded structure:
Characters can be divided into two parts and the two parts are
mutual embedded.
The frame of each character is showed in Table 4.7. the structured code are
Structure Structure
Frame Name Frame Name
code code
Complete
G Simple P
enclosure
Upper
H Upper-lower Q
three-side
60
Structure Structure
Frame Name Frame Name
code code
Upper-mid- Left-hand
I R
lower three-side
Lower
J Multi-row S
three-side
Lower
K Left-right T
right-hand
Left-middle- Upper
L U
right right-hand
Multi- Lower left-
M V
column hand
Frame-
Pyramid
N W based
overlapping
embedded
Double Mutual
O X
overlapping embedded
6) Collection of characters
simplified Chinese characters. The list has two parts, one is for frequently
used characters, including 2500 characters, and the other is for secondary
(Zhang Xichang, 2007) Hence, it is reasonable that enter the frequently used
4.3 Summary
structure design. The six tables have been created according to the
The remaining work is to evaluate the performance of the database and for
62
5 Application development
that contains searching function of the data stored in the database. Firstly,
application is supposed to enable users view the data stored in the database.
In this section, two widely used learning systems with high performance will
Windows and Ipad platforms. The interface of the Hanzi Master is showed in
the top of the window. In the central part of the interface, the detailed
pronunciation, meaning, evolution process and so on. On the left and right
side of the central window, the words composed by the character will be
listed with their English explain. In the lower three windows, the characters
with the same radical, same phonetic component and same pronunciation
63
will be displayed respectively.
In summary, Hanzi Master provides initial learners with the great support of
However, the menu layouts and display manners tend to be complicated and
daily dialog courses and so on. The strongest feature of Chinese Master is
that it includes massive audio resources. For each character, word and
sentence, users can get their pronunciation, which enables learners grasp
13
Hanzi Master: Features: http://www.learnchinese.cn/interface.htm
64
characters combined with pronunciation. Figure 5.2 shows the main functions
of Chinese Master.
The conceptual structure will define the included functions of the application.
14
Chinese Master: interface: http://www.learnchinese.cn/interface.htm
65
According to the requirements analysis in the database design step, the main
Pinyin or other conditions then get the detailed related information of the
characters. Hence the outline of the application is that users can input the
searching terms such as radicals and Pinyin then matched results will be
At first, users can choose what he or she wants to search, radical, phonetic
users need to input the stroke number a, then system will select radicals or
equals a. All the results will be displayed on the screen. From the results
users can select the one he or she wants. The information of the radical or
radical or phonetic, then similarly, he or she needs to input the stroke number
displayed on the screen, users can get all the related characters related to
the radical or phonetic. Then users need to choose the one he or she wants.
66
Figure 5.3: Process flow
5.3 Implementation
67
the characters into codes that can be identified by computers. Figure 5.4
figure 0-9, letter a-z, A-Z and so on. By using ASCII, those characters can be
well. The above class accepts input Chinese character string then return
can be retrieved and viewed. The whole searching function can be divided
Chinese character class. Different classes will differ from each other mainly
in the query statement. The methods of writing those classes will remain
68
example. Figure 5.5 shows the source code of searching a character’s
The SQL query statements of radical table and phonetic table are assigned
this class is called by main function, it will remind users to enter a character.
Then the character will be converted into ACSII code by calling converting
machine will be connected to the MySQL database, then the RDBMS will
execute the two queries and return the results. The System.out.println
69
function will display the radical and phonetic component on the screen.
be divided into modules. Each module consists of one or more classes. Each
Due to the time limitation, the application has achieved the coding part but
without a user interface. For now, the application can be operated and tested
evaluation chapter.
5.4 Summary
70
6 Evaluation
current existed databases use character as the smallest unit. In other word,
process can draw a character using its ID. Such a model of database can
components first rather than the whole appearance when meeting some
focuses more on the shapes and structures of characters. Initial learners are
divided and extracted from characters. Users of databases are only able to
know set of characters from the very beginning rather than the basic
71
components.
It cannot satisfy the application demands of the whole society. The universal
of Chinese system, some rare characters are still used in special situations.
Chinese characters used around the world in total. GB18030-2005 can only
account for about seventy percent of the whole set of Chinese characters.
From 1978 to 2000, four standards have been carried out. Chinese is a
growing set of characters. Existed Chinese databases cannot keep pace with
6.2 Contribution
oriented Chinese database built in this project has solved the first problem
well.
generally more effective than other teaching method. (Huang Yaping, 2008).
72
component. For the recognizable elements of characters are components
rather than strokes, such a teaching method tend to reduce the errors in
shapes and meanings of radicals and simple characters, learners are able to
as their English explain and stroke order, which can help beginners’ cognize
The functions above can basically meet the general needs of Chinese
learners.
73
contribute to other research in Chinese field. For example, it can be used as
field.
The performance of the database will be tested in this section. For the
Workbench is used as the tool to operate the testing. It is a free official GUI
evaluation will aim at testing if the database can support westerners’ learning
The queries were conducted within one single table to test if the returned
results are correct. Two examples will be provided which were operated on
radical table and character table. The first query is to search the radicals
according to the stroke number. All the radicals which stroke number equals
two are correctly returned with their English explain and related words. The
74
Figure 6.1: Radical table test
The second query was operated on the character table, which searches
equals two. The results showed in Figure 6.2 are sorted by their Unicode.
75
In the single table test step, six queries were conducted on the six tables
In this step, the main mission is to test if two tables can connect correctly.
Queries will be conducted on two joint tables. For most learners may retrieve
characters according to the radical. The test will be conducted on radical and
character table.
Pinyin are written as “li”. The related words are required to be returned
together with the results. 5 results are returned, which are exactly the whole
76
6.3.3 Three-tables joint test
The connectivity among three tables are about to be tested in this step.
Queries will be conducted on character table, radical table and Pinyin table.
As showed in Figure 6.4: first, character table and radical table were
table was connected with Pinyin table to return the Pinyin of selected
The application test will specify one searching function since the operational
77
radical search function testing.
“机” (ji, machine) is the input character that needs to be analyzed its radical.
After executing the function, the radical “木” (mu, wood) was successfully
returned.
6.4 Summary
The evaluation chapter compared the database developed in this project with
the current existed Chinese databases, pointing out the advantages and
78
7 Conclusion
the problems that are supposed to be solved in this project. After researching
7.1 Deliverables
The database includes six tables: character table, radical table, phonetic
component table, Pinyin table, character element table and structure table.
Connection has been built among tables. For now, 263 characters’
information has been collected and input in the character table. Two hundred
The phonetic component table has 182 rows collected from available results
65 out of 877 selected character elements are input into element table. The
more than ten thousand rows was created, waiting to be completed the
detailed information.
79
7.2 Project limitation
commonly used characters can reach two thousand, the mission of collecting
data and input the into the database can be tremendous workload,
and ElementOrder are the keys to connect character table to the other five
Hence, although a huge character table which include more than ten
thousand rows has been established, which can cover almost all the
their absent information. The only solution of solving this difficulty is to invest
field, some suggestions can be provided for the future study in this project
80
database. Moreover, the following optimization suggestions may improve the
performance of database:
online resource which specifies those variants. For example: the top part of
the character “角” (jiao, horn) is the radical, which is a variant of character “刀”
(dao, knife). But it cannot be typed in through keyboards. Hence, the radical
table did not include it. Such a problem is still remained to be solved by
It is better to use 36-ary code for encoding the character elements rather
than decimal digits. Since there are hundreds of character elements have
been selected so far, to encode the elements using decimal digits can be
that have large encoding number, then the code of resenting the combination
rules of the character can be fairly complicated. The 36-ary coding system is
able to solve the problem. It uses ten decimal digits and twenty-six English
letters to present one to thirty-six. A two-digit code can cover 362 (1296)
elements, which can satisfy the encoding requirement of current data size.
structures. This proposition can be achieved in MySQL. The BLOB type can
be used for storing the figure samples of structures so that users can
81
graphically browse the structure information. BLOB stands for binary large
Since the application developed in this project can only execute some basic
searching tasks, more modules that can operate complex queries need to be
designed which can present the searching function clearly so that users can
82
Bibliography
[1] “History of Chinese writing shown in museums”. CCTV online. Retrieved 2010-
03-20.
[4] Qiu Zhiwen. The research of Chinese prototype of intelligent making Chinese
2008.)
[6] Chen Xuezhi, Zhang Canjun, Qiu Yuxiu, at al. The application of Chinese
Chinese teaching.
83
(张熙昌. 论形声字声旁在汉字教学中的作用[J]. 语言教学与研究, 2007, 2.)
[10] Wang, Jian, et al., eds. Reading Chinese script: A cognitive analysis.
19.)
[13] The Table of Indexing Chinese Character Component, State Language Affairs
0011-2009, 2009
2010.)
mehod.
84
[17] GB18030-2000. Chinese National Standard GB 18030-2000: Information
2000
2005
[19] List of Frequently Used Characters in Modern Chinese, State Language Affairs
<http://en.artintern.net/index.php/news/main/html/1/1101>
85
<http://tools.ietf.org/html/rfc3629>
<http://w3techs.com/technologies/overview/character_encoding/all>
<http://docs.oracle.com/javame/config/cdc/optpkgs/api/jsr169/java/sql/package-
summary.html>
[33] Hanzi Master: Features. [Image Online] Available at: < Hanzi Master: Features:
http://www.learnchinese.cn/interface.htm >
[35] Richard Sears, Chinese etymology-character “马”. [Image Online] Available at:
<http://www.chineseetymology.org/CharacterEtymology.aspx?submitButton1=E
tymology&characterInput=%E9%A9%AC>
[36] Richard Sears, Chinese etymology-character “鱼”. [Image Online] Available at:
<http://www.chineseetymology.org/CharacterEtymology.aspx?submitButton1=E
tymology&characterInput=%E9%B1%BC>
86