
Unicode and Keyboards on Windows

Michael S. Kaplan
Cathy Wissink
Windows Globalization, Microsoft Corporation

1. Introduction
To implementers, it seems inputting data into applications via keyboards should be one of the
fundamentally simple features on Windows. However, once additional complexities like fonts
and rendering engines are taken into consideration, input appears to be not quite so simple
anymore. Adding many different keyboard layouts on top of over 135 locales further complicates
the issue. And finally, once you include the ability to define keyboard layouts (whether by
Microsoft interfaces or third party products) where all of Unicode can be supported, it becomes
downright complex!
This paper will discuss the many features that keyboard layouts support (such as dead keys, shift
states and ligatures), the interaction between input, fonts, and rendering engines, the issue of
code pages vs. Unicode, when IMEs are preferred and when they are not, and the collation issues
that enter into the equation. In the end, it will be clear that on Windows, virtually any
Unicode character can be input, even if in some cases more work is required than was
originally expected.

2. The low-level details


Before diving into the details of a keyboard layout, it might be helpful to include a definition of a
keyboard layout. A keyboard layout is the collection of data for each keystroke and shift state
combination within a particular keyboard driver. It is not the physical keyboard that a user types
on, but rather, the software that the hardware calls to output text streams to applications.
Generally, anywhere this paper refers to a keyboard, keyboard layout is implied.

Starting with scan codes


Keyboard input starts at the hardware level. The keys on the physical keyboard each have a value
assigned to them called a scan code, and these scan codes are sent whenever you type a key. To
complicate things, keyboard hardware varies depending on the geographical market; in many of
the markets, you will find slightly different relationships between physical keys and scan codes.
Because of this, layout maps (like the full Windows XP list, which can be found at
http://microsoft.com/globaldev/keyboards/keyboards.asp) can be somewhat inaccurate in
some parts of the world, since the maps assume that: (a) physical key placement is identical and
(b) keys will have the same meaning, even if the hardware is different. For two examples of scan
code maps that cover the main part of the keyboard, see Figure 1 for US keyboards and Figure 2
for most European keyboards.

23rd Internationalization and Unicode Conference

Prague, Czech Republic, March 2003


Figure 1: Scan codes for US keyboard hardware

Figure 2: Scan codes for European keyboard hardware


Note the different placement of scan codes between these two types of keyboards. For example,
scan code 0x2b is on the second row of the US keyboard, but is at the end of the third row of the
European keyboard. Scan code 0x56 is an additional scan code on the European keyboard, which
is not on the US keyboard. The shape of the enter key is also different.
(Note also that these maps here do not show other keys such as the numeric keypad or the
function keys; since those types of keys do not change when the language of the keyboard
changes, they are not covered by this paper.)
Scan code values in the hardware are invariant. Allowing scan codes to change would make the
support of multiple languages exceptionally difficult. This brings us to the Virtual Key values.

Virtual Key (VK) values


As we progress from the hardware and move to the software level, what becomes crucial is the
VK or Virtual Key value. These values fit within a byte (0x00 to 0xff) and are defined in
winuser.h: the Platform SDK header file that contains procedure declarations, constant
definitions and macros for the USER subsystem of Windows. You can see the virtual keys for the
US English keyboard layout in Figure 3. The decision of how scan codes and virtual keys map to
each other is made in the keyboard layout.


Figure 3: Virtual keys in the US English keyboard


Unfortunately for the implementer, the bulk of the most important VKs are not officially defined
but are implied in the comments:
/*
* VK_0 - VK_9 are the same as ASCII '0' - '9' (0x30 - 0x39)
* 0x40 : unassigned
* VK_A - VK_Z are the same as ASCII 'A' - 'Z' (0x41 - 0x5A)
*/
The rest of the virtual keys in use are explicitly defined constants. No rule ties a given
virtual key to the same physical key on every keyboard; the assignment can change from one
keyboard layout to another. Note that in Figure 3, the implicit keys are all light gray,
while the explicit "OEM" keys are white [1]. You can obtain an array containing the state of every
VK by calling the GetKeyboardState API.
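The rule in the winuser.h comment quoted above can be written down directly. The following is a minimal Python sketch; the helper name vk_for_alnum is hypothetical and is not part of any Windows API. It only covers the implied range (digits and letters) and makes no claim about the explicitly defined OEM keys.

```python
def vk_for_alnum(ch: str) -> int:
    """Return the implied virtual-key value for an ASCII digit or letter.

    Follows the winuser.h comment: VK_0..VK_9 equal ASCII '0'..'9' and
    VK_A..VK_Z equal ASCII 'A'..'Z'. The function name is illustrative only.
    """
    if len(ch) != 1 or not ch.isascii():
        raise ValueError("expected a single ASCII character")
    upper = ch.upper()  # letters share one VK regardless of case
    if "0" <= upper <= "9" or "A" <= upper <= "Z":
        return ord(upper)
    raise ValueError(f"{ch!r} has no implied VK value")


print(hex(vk_for_alnum("q")))  # the VK for the Q key
```

Case does not matter here because the VK identifies the key, not the character; the shift state is tracked separately.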
The VK values are important for the window messages that have to deal with keystrokes before
they are processed by the USER subsystem in Windows, such as WM_KEYDOWN. VK assignments do
not change much between keyboard layouts, although the position of a VK can differ slightly
even when the character it produces is the same. Here is an example of a typical change: the
letter "Q" is represented by VK_Q on both the French and US English keyboards, though on the
French keyboard the "Q" and "A" keys are in reversed positions relative to the US keyboard
(see the VK map for the French keyboard in Figure 4 for comparison with the US layout in
Figure 3).
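One way to picture the Q/A example is as a per-layout table from scan code to VK. The sketch below is a toy model in Python, not the actual layout data: the table name and function are hypothetical, and the two scan codes (0x10 and 0x1E, the positions labelled Q and A on US hardware) are an assumption, since the scan code figures are not reproduced here.

```python
VK_A, VK_Q = 0x41, 0x51  # implied VK values (ASCII 'A' and 'Q')

# Hypothetical excerpt of two layouts: scan code -> virtual key.
# Scan codes 0x10 and 0x1E are assumed, not taken from the paper's figures.
SCAN_TO_VK = {
    "US English": {0x10: VK_Q, 0x1E: VK_A},
    "French": {0x10: VK_A, 0x1E: VK_Q},
}


def vk_from_scan(layout: str, scan_code: int) -> int:
    """Look up the VK that a layout assigns to a hardware scan code."""
    return SCAN_TO_VK[layout][scan_code]


# The same physical key yields a different VK on each layout.
print(hex(vk_from_scan("US English", 0x10)), hex(vk_from_scan("French", 0x10)))
```

The hardware sends the same scan code in both cases; only the layout's mapping differs, which is why VK_Q still means "Q" on both keyboards.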

[1] The OEM keys are keys that add punctuation and symbols. The ones that commonly change
with different keyboards are OEM_1 through OEM_8, OEM_102, OEM_COMMA,
OEM_PERIOD, OEM_PLUS, and OEM_MINUS. On these keyboard layout maps they are
abbreviated with an O* prefix, followed by enough information to uniquely identify the key (e.g.
O2 for OEM_2 and OP for OEM_PERIOD).


Figure 4: Virtual Keys in the French (France) keyboard


The position of the OEM keys often changes between different layouts as well. Most of the other
VK positions are static. The changes are all quite minor when compared with the next step,
where those keystrokes are processed.

Processing keystrokes
When a Windows message loop handles a VK in the WM_KEYDOWN message, it can pass the
VK to the DefWindowProc API. To handle the message, the code in the USER subsystem will
process the keystroke and convert it (when appropriate) to a character, passed as a WM_CHAR
message (in a typical message loop, it is the TranslateMessage call that requests this
conversion). This processing requires several pieces of information:

• the shift state
• the virtual key
• the current keyboard layout

Once all of this information is collected by the USER subsystem (that is, the keyboard layout is
known for each thread and the WM_KEYDOWN message contains the VK and shift state), the
code is then able to come up with the appropriate character, taking all the information about
shift states, VKs and current layout into account. (Arrow keys, for example, are not expected
to insert characters, so USER does not run any of this extra character-based work for them.)
You can mimic this behavior with several different Win32 APIs (see Table 1 for a list
of the APIs that can be useful for this).
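The lookup described above (layout, virtual key and shift state in, character or nothing out) can be modelled with a small table. This Python sketch is a toy stand-in for what USER does, not a binding to the Win32 functions in Table 1; the LAYOUTS table, its handful of entries and the function name translate_key are all assumptions made for illustration.

```python
from typing import Optional

VK_LEFT = 0x25  # arrow key: produces no character
VK_1 = 0x31     # implied VK for the '1' key
VK_A = 0x41     # implied VK for the 'A' key

# Hypothetical excerpts: (virtual key, shift pressed) -> character.
LAYOUTS = {
    "US English": {
        (VK_1, False): "1", (VK_1, True): "!",
        (VK_A, False): "a", (VK_A, True): "A",
    },
    "French": {
        (VK_1, False): "&", (VK_1, True): "1",
        (VK_A, False): "a", (VK_A, True): "A",
    },
}


def translate_key(layout: str, vk: int, shift: bool) -> Optional[str]:
    """Return the character for a keystroke, or None if it inserts nothing."""
    return LAYOUTS[layout].get((vk, shift))


print(translate_key("US English", VK_1, True))   # shifted digit key on the US layout
print(translate_key("French", VK_1, False))      # same VK, different layout
print(translate_key("US English", VK_LEFT, False))  # no character for an arrow key
```

All three inputs are needed: dropping any one of them (for example, ignoring the layout) makes the result ambiguous.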
Table 1: Keyboard input functions and what they do

Function          Description
keybd_event       Synthesizes a keystroke given a VK, a scan code, etc. (superseded by the SendInput API)
MapVirtualKey     Maps between scan codes, VKs, and characters for the current keyboard layout
MapVirtualKeyEx   Maps between scan codes, VKs, and characters for a specified keyboard layout (layout must be loaded)
OemKeyScan        Maps OEMASCII codes to OEM scan codes and shift states
SendInput         Synthesizes a keystroke given a VK, a scan code, etc.
ToAscii           Maps a VK and shift state to a character on the current keyboard layout's associated codepage
ToAsciiEx         Maps a VK and shift state to a character on the specified keyboard layout's associated codepage (layout must be loaded)
ToUnicode         Maps a VK and shift state to a Unicode character per the current keyboard layout
ToUnicodeEx       Maps a VK and shift state to a Unicode character per the specified keyboard layout (layout must be loaded)
VkKeyScan         Converts a character to a VK and shift state for the current keyboard layout
VkKeyScanEx       Converts a character to a VK and shift state for the specified keyboard layout (layout must be loaded)

The functions in Table 1 are interesting in that when you read the descriptions, the functions
appear to be duplicates of each other. However, once you start needing these functions in an
application, you will see that the small differences between them can actually have
a great deal of importance for obtaining the features you need [2].
In any case, your code has now passed a character onto an application and inserted text! You can
look at a few of the many keyboards supported on Windows (Figures 5-8) to help you see the
wide variety of possible characters to be inserted.

Figure 5: The Divehi Phonetic keyboard layout

[2] As an example, the definitions of MapVirtualKey and VkKeyScan seem similar, but the former
does not handle shifted characters while the latter does. For more information, you can look at
the Platform SDK:
http://msdn.microsoft.com/library/en-us/winui/WinUI/WindowsUserInterface/UserInput/KeyboardInput.asp


Figure 6: The Georgian keyboard layout

Figure 7: The Gujarati keyboard layout

Figure 8: The Thai Kedmanee keyboard layout

3. Language features and their influence on input


There are many features that keyboard input can require. These include:

• single character keystrokes
• ligatures
• dead keys
• shift states
• AltGr shift states
• Control shift states
• Caps lock key
• SGCap shift states
• extended shift states

Each of them is described below.

Single character keystrokes


The mainstay of many keyboard layouts is a simple 1:1 mapping of keystrokes to characters,
and this is what the bulk of most keyboard layouts will consist of. Some languages will use
many other features as well, but all of them are likely to have at least a few single
character keystrokes.

Ligatures
There are many times that a single keystroke needs to enter more than one character. In keyboard
nomenclature, these 1:many mappings are called ligatures.
Note that this definition of ligature is not identical to the one used in typography or in language
orthographies; "ligature" here is used to identify multiple UTF-16 code points that are input by a
single keystroke. This could be used in a number of ways: to represent a linguistic character
consisting of multiple UTF-16 code points (such as Sri and Ksa seen on the Tamil keyboard,
shown in Figure 9); to represent multiple linguistic characters which often work together in the
language; or to develop a keyboard layout to handle a language represented by supplementary
characters (such as the Deseret keyboard layout in Figure 10) [3]. (Technically, one could even
create a keyboard with a keystroke that would insert "mike" or "cath" or "hiya" using a legal
keyboard layout ligature, as seen in the silly keyboard layout in Figure 11.)
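To make the surrogate case concrete, the Python sketch below splits a string into 16-bit UTF-16 values (what this paper calls UTF-16 code points) and checks it against the limit of 4 per key that the paper cites. The function names and the constant MAX_UNITS_PER_KEY are hypothetical; only the limit itself comes from the paper.

```python
MAX_UNITS_PER_KEY = 4  # per-key ligature limit cited in this paper


def utf16_units(text: str) -> list:
    """Return the 16-bit UTF-16 values that encode the given text."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fits_on_one_key(text: str) -> bool:
    """True if the text could be produced by a single ligature keystroke."""
    return 1 <= len(utf16_units(text)) <= MAX_UNITS_PER_KEY


deseret_long_i = "\U00010400"  # a supplementary (non-BMP) Deseret letter
print([hex(u) for u in utf16_units(deseret_long_i)])  # high and low surrogate
print(fits_on_one_key("mike"), fits_on_one_key("hello"))
```

A supplementary character always occupies two of the four slots, so a single key can carry at most two of them.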

Figure 9: The Tamil keyboard in the shifted state, showing linguistic characters Sri and Ksa as ligatures

[3] Since keyboards support UTF-16 code points on Windows, the only way to handle
supplementary characters on keyboards is via ligatures (the high surrogate and the low surrogate
make a ligature). The process is seamless from the user perspective; the user will not experience
any difference between supplementary characters and characters on the BMP, aside from a
limitation of 4 UTF-16 code points on a single key.


Figure 10: A keyboard layout for Deseret, a script encoded as supplementary characters (each
represented by "ligatures" of UTF-16 high and low surrogates)

Figure 11: A very silly (but real!) keyboard layout (created by a developer for personal use). This shows the
4 UTF-16 character limit for a single keystroke.

Dead keys
The dead key mechanism is either very intuitive or incredibly confusing, depending on your
experience with legacy European keyboards. The basic concept is that you type a character
defined on the particular keyboard as a dead key, then type a specific second character known as
a base character. Rather than displaying these two characters, a unique third character known as
a combining character will be shown. The reason the first character is defined as a "dead" key is
that this character is not shown, and the cursor does not advance.
Dead keys are most commonly used in European keyboard layouts; a diacritic is generally used
as the dead key. An example of this can be found on the Finnish keyboard, where typing a
diaeresis (U+00A8) will initially do nothing, but then typing any of the characters in the first
column in Table 2 will cause the character in the second column of Table 2 to be displayed. For
example, if a user types a diaeresis, followed by a small letter a, Latin small letter a with
diaeresis (ä) will be displayed.



Table 2: The Diaeresis dead key on the Finnish keyboard

Base Character         Combining Character
U+0020                 U+00A8 (¨)
Any other character    U+00A8 + other character

The last two rows of the above table are important to note. The first of them is a
common convention on most keyboards with dead keys; if you type the dead key and then a
space, you will get the spacing version of the character. The second one is not a part of the
keyboard layout definition, but is simply what happens if you type a dead key followed by a
character that is not defined in the keyboard layout as a base character for that dead key: the
dead key character is input, followed by that second character. For example, Latin small letter C
is not defined in the keyboard layout as being a base character for the diaeresis dead key. If
U+00A8 is typed, followed by c, those two code points will be input. No combining character
will be created.
While dead keys are not limited to European keyboard layouts, that is where they are most
commonly used.
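The three outcomes described above (a defined base character, a space, and any other character) can be modelled as a lookup with a fallback. This Python sketch is illustrative only: the table holds just the a/ä pair and the space rule mentioned in the text, not the full Finnish layout, and the names DEAD_KEY_TABLE and resolve_dead_key are hypothetical.

```python
DIAERESIS = "\u00a8"  # spacing diaeresis, the dead key character

# Hypothetical excerpt of a dead key table: dead key -> {base -> result}.
DEAD_KEY_TABLE = {
    DIAERESIS: {
        "a": "\u00e4",    # a -> Latin small letter a with diaeresis
        " ": DIAERESIS,   # space -> spacing version of the dead key itself
    },
}


def resolve_dead_key(dead: str, base: str) -> str:
    """Return the text produced by typing a dead key followed by a base key."""
    combos = DEAD_KEY_TABLE.get(dead, {})
    if base in combos:
        return combos[base]
    # Undefined pair: the dead key character is input, then the second character.
    return dead + base


print(resolve_dead_key(DIAERESIS, "a"))
print(resolve_dead_key(DIAERESIS, "c"))
```

The fallback branch is the part that is not stored in the layout; it is simply what happens when no entry matches.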

Shift states
A keyboard layout typically has only 47 or 48 assigned physical keys on it; even the English
alphabet would not fit if you wanted both uppercase and lowercase A to Z (and there wouldn't
even be room for punctuation characters). Therefore, keyboards usually contain another set of
47 or 48 key assignments that can be accessed by pressing Shift in tandem with a character (for
examples, see Figures 12 and 13 for the Greek keyboard in both the unshifted and shifted states).


Figure 12: The Greek keyboard layout (unshifted)

Figure 13: The Greek keyboard layout (shifted)


Note how most of the letter keys are actually cased versions of each other (also note the light gray
keys; those are dead keys). By convention, a letter that has a cased version will usually have
that version in the shifted state. However, some languages have no notion of case, so
they do not need to use the shift state for this purpose.

AltGr shift states


Some languages need more than 96 characters to be input properly. Using just the shift
state is not sufficient, so an additional shift state is added when Control+Alt is pressed. A
shortcut to this key combination is to use the Right Alt key, also known as the AltGr key. This
behavior is only expected for the keyboard layouts that define characters in the Control+Alt shift
state. An example of this is the Polish keyboard layout (see Figures 14-16 for the unshifted,
shifted, and AltGr states of this keyboard).
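A shift state can be thought of as a combination of modifier bits, with AltGr acting as shorthand for Control+Alt. The Python sketch below is a minimal model of that idea; the bit values, the function names and the 48-key figure used in the example are assumptions for illustration, not values read from a real layout.

```python
# Illustrative modifier bits (assumed values, chosen for this sketch).
SHIFT = 0x1
CTRL = 0x2
ALT = 0x4


def shift_state(shift=False, ctrl=False, alt=False, altgr=False) -> int:
    """Combine the pressed modifiers into one shift-state value.

    AltGr is treated as Control+Alt, as described in the text.
    """
    state = 0
    if shift:
        state |= SHIFT
    if ctrl or altgr:
        state |= CTRL
    if alt or altgr:
        state |= ALT
    return state


def layout_capacity(keys: int, states) -> int:
    """Number of characters reachable with the given keys and shift states."""
    return keys * len(set(states))


base, shifted, altgr = shift_state(), shift_state(shift=True), shift_state(altgr=True)
print(layout_capacity(48, [base, shifted]))         # two states
print(layout_capacity(48, [base, shifted, altgr]))  # adding the AltGr state
```

Each additional shift state adds one more full set of key assignments, which is why a layout that needs more than 96 characters must go beyond plain Shift.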

Figure 14: The Polish keyboard layout (unshifted)


Figure 15: The Polish keyboard layout (shifted)

Figure 16: The Polish keyboard layout (Alt+Ctrl or AltGr)


You can also have an AltGr+Shift state; thankfully, few keyboards need this, as users find
it difficult to type such characters.

Control shift states


While it is technically possible to use the Control (CTRL) key as a shift character as well, it is
highly discouraged. The reason is that many programs use the CTRL key for various command
functions (such as Ctrl+S to mean "Save...") and many times if keystrokes are assigned in the
keyboard layout, those keystrokes will not work properly in programs that specifically handle
them for other purposes.

Caps Lock key


The caps lock key is usually intended to be a version of the shift key that (a) only shifts characters
that are cased versions of each other, and (b) stays shifted without having to hold down the key.
On keyboard layouts for languages without a notion of case, the caps lock may do nothing, or it
may be used for some other purpose entirely.
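Point (a) above, that caps lock only affects keys whose two characters are cased versions of each other, can be sketched as follows. This Python function is a simplified model with a hypothetical name; it ignores the interaction between caps lock and a held Shift key, and it uses Python's own case mapping as a stand-in for whatever a real layout declares.

```python
def key_output(unshifted: str, shifted: str, caps_on: bool) -> str:
    """Pick the character a key produces with Shift up and caps lock on or off.

    Caps lock only takes effect when the shifted character is the uppercase
    form of the unshifted one (a simplifying assumption of this sketch).
    """
    is_cased_pair = unshifted != shifted and unshifted.upper() == shifted
    if caps_on and is_cased_pair:
        return shifted
    return unshifted


print(key_output("a", "A", caps_on=True))  # letter key: caps lock applies
print(key_output("1", "!", caps_on=True))  # digit key: caps lock is ignored
```

For a script without case, no key forms a cased pair, so in this model caps lock does nothing, which matches one of the behaviors described above.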

SGCap shift states


Some keyboards use the Caps Lock key as an access point for an entirely independent shift state
for some of the keys. Originally named for its use in the "Swiss German" keyboard, the SGCaps
shift state is also used in the Czech and Hebrew keyboards to allow this extra shift state. Like
dead keys, SGCaps shift states are either very intuitive if you are used to them or incredibly
confusing if you aren't. The only real distinction of the SGCap shift states is that the Caps Lock
key opens one to two entirely new shift states (an additional 96 characters, between the shifted
and unshifted state). Using SGCap shift states in any other keyboards is discouraged unless you
want a keyboard layout to have the same feel as one of the keyboards that uses the functionality.

Extended shift states


It is technically possible to add up to three additional keys as "Shift" keys. When combined with
all possible combinations of the other shift keys this would allow a total of 55 other shift states.
Thankfully this feature is not used in any keyboards to its fullest extent; the Canadian
Multilingual Standard keyboard layout is the only one that uses even a single extended shift
state.

4. Other technologies and their impact on keyboards


Many other features and functionalities in Windows can influence what is done with the text
created by keyboards, depending on the complexity of the writing system. Several of them are
listed in this section.

Rendering engines and what they do


The rendering engine has a difficult job. It is tasked with properly displaying complex script
text [4] in Windows and any running applications, which is a job made much more difficult by the
wide variety of scripts and languages supported on Windows. On versions of Windows prior to
Windows 2000, many clues about the language/script came from the HKL ("handle to a
keyboard layout", now known as an input locale), since the LOWORD of the HKL is a language
ID [5]. This usage has largely been deprecated on the newer versions of Windows [6], which use the
far more sophisticated Uniscribe (Microsoft's shaping engine technology) and its various
engines that render text based on the writing system of the appropriate language. On downlevel
platforms, however, you can still see a great deal of information being obtained from this value.
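Extracting the language ID from an HKL is a matter of masking off the low 16 bits. The Python sketch below works on plain integers rather than calling GetKeyboardLayout, so the sample value 0x04090409 is an assumed example of a US English input locale and the function names are hypothetical. The split into primary and sublanguage mirrors the PRIMARYLANGID and SUBLANGID macros from the Windows headers.

```python
def langid_from_hkl(hkl: int) -> int:
    """Return the language ID stored in the low word of an HKL value."""
    return hkl & 0xFFFF


def primary_langid(langid: int) -> int:
    """Low 10 bits of a language ID: the primary language."""
    return langid & 0x3FF


def sublangid(langid: int) -> int:
    """Upper 6 bits of a language ID: the sublanguage (regional variant)."""
    return langid >> 10


hkl = 0x04090409  # assumed sample value for a US English input locale
langid = langid_from_hkl(hkl)
print(hex(langid), hex(primary_langid(langid)), sublangid(langid))
```

This value says which language the input locale was installed for; it says nothing about the script of any particular character typed, which is one reason it is a weak hint for rendering.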

[4] A complex script is any writing system that needs additional processing in order to properly
display. For example, Arabic needs contextual shaping as well as bidirectional behavior,
Vietnamese needs diacritic positioning, and Indic scripts sometimes need rearrangement of
vowel marks. Uniscribe handles this kind of processing.

[5] For more information, see the Platform SDK
(http://msdn.microsoft.com/library/en-us/winui/WinUI/WindowsUserInterface/UserInput/KeyboardInput/KeyboardInputReference/KeyboardInputFunctions/GetKeyboardLayout.asp)

[6] This includes any NT-based version of Windows after Windows NT 4 (Windows 2000,
Windows XP, and the upcoming Windows .NET Server 2003).


Figure 17: The relationship between a keyboard, the rendering engine and display in a complex
script (Kannada, an Indic script language).
[Diagram: input method (Keyboard.dll; Kbdinkan.dll, unshifted VK_I and unshifted VK_F) ->
code points U+0C97, U+0CBF -> Uniscribe shaping engine (Language? Kannada; Script? Indic;
Basis of analysis? Syllable; engine breaks run into syllables) -> OpenType Layout Services
(glyph substitution, glyph positioning) -> glyphs -> display. Code points also go to storage,
collation, etc.]

Fonts
What has diminished the importance of the HKL of a keyboard has been the increased selection
of fonts available, as well as font linking (the borrowing of information from multiple fonts to
obtain glyphs not in the current font), which was introduced in Windows 2000 and improved for
Windows XP. Obviously for a keyboard to work well, it is assumed that there will be at least one
font somewhere on the machine to assist in displaying the inputted text, lest every character be
replaced by a null glyph [7].

IMEs -- when are they preferred?


An Input Method Editor (IME) is a program that allows computer users to enter complex
characters and symbols, such as Japanese Kanji characters, by using a standard keyboard. It is a
solution to the issue of ideographic languages having tens of thousands of characters, or more.
IMEs allow different, alternate means of input for such cases.
Attached to each IME is a keyboard layout. On Windows the convention has always been to
attach it to the US English keyboard layout, although some third-party IMEs might be attached to
other keyboards. The reason that the US English keyboard is usually preferred is that
non-Unicode applications using CJK languages would be relying on default system code pages that
would not include the text for other languages. Using a US English keyboard simplifies matters.
For more information on IMEs, see the Platform SDK [8].

[7] A null glyph is used when the font is not available on the system, generally in the shape of a
box.

Dealing with code pages


Although Windows keyboards are exclusively Unicode, it is important to note that if a keyboard
is used with a non-Unicode application, some effort should be made to support this application
when possible by choosing characters that fit with the appropriate Windows code page (ACP).
Obviously this is not always feasible, since some languages are only supported by Unicode on
Windows (e.g., Armenian, Georgian, Hindi, etc.), and thus do not have a system code page.
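Whether a candidate character "fits" a given Windows code page can be checked by attempting to encode it. The Python sketch below uses Python's built-in cp1250 and cp1252 codecs as stand-ins for the corresponding Windows code pages; the function name and the sample characters are chosen for illustration and do not come from any particular keyboard layout.

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """True if every character in text can be encoded in the given code page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


print(fits_code_page("\u0142", "cp1250"))  # Polish l with stroke, Central European page
print(fits_code_page("\u10d0", "cp1252"))  # Georgian letter an, Western European page
```

A layout designer could run a check like this over every character assigned to a key to see which ones a non-Unicode application on that code page would be unable to receive.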

Sorting out collation issues


For the most part, collation and keyboards do not have to interact. There is one major area where
they can have an impact, and that is the fact that many keyboards (both ones from Microsoft and
those provided by third parties) fail to have a consistent story in their use of composite versus
precomposed characters. This can require an extra normalization step if the input is going to be
used in XML and other technologies that expect normalized data.
Collation itself is handled well on Windows, with the proper equivalences between the
composite and precomposed forms being an important part of the sorting data kept by the OS [9].
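The extra normalization step mentioned above can be illustrated with Python's standard unicodedata module. In this sketch the two strings stand for the output of two hypothetical keyboards, one emitting a base letter plus a combining mark and one emitting the precomposed letter; normalizing both to NFC makes them identical. The choice of NFC is an assumption for the example, since the appropriate form depends on the consuming technology.

```python
import unicodedata

composite = "a\u0308"    # 'a' followed by combining diaeresis
precomposed = "\u00e4"   # Latin small letter a with diaeresis


def to_nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


print(composite == precomposed)                  # raw input differs
print(to_nfc(composite) == to_nfc(precomposed))  # equal after normalization
```

Without this step, two users typing what looks like the same word on different layouts can produce strings that fail a simple binary comparison.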

5. Keeping it under the covers


One of the most important features of keyboard layouts under Windows is the seamless
behavior: everything discussed in this paper (the USER subsystem, font technology, shaping) is
not noticed by the vast majority of the people using the OS. Users simply run setup and choose a
language, and everything seems to work. Obviously it is easy for this to not work properly if the
user does not know what the content of their keyboard layout is, and their assumptions about
what the layout should be turn out to be wrong. It is in fact the user's expectations and
assumptions around their keyboard choices that will often lead to the availability of multiple
keyboard layout choices for a single language. For example, there are both Divehi Phonetic and
Divehi Typewriter keyboard layouts in Windows XP, so that the user wanting to type Divehi text
is more likely to find a layout that they prefer.

6. Factors in keyboard layout creation


When developing keyboards for a particular market, a number of factors should be taken into
consideration:

• Is there some kind of keyboard standard for the region or country? It is sometimes
required to have an input method which is sanctioned by the government or an
appropriate governing body. Implementers should consider contacting their local or
national standards body prior to developing a keyboard. In addition, implementers
should consider de facto standards (that is, standards which are not official, but are used
by so many people that they are considered standard).

• What languages will the keyboard support? This should be explicitly determined before
allocating keys to characters.

• Does the keyboard provide input of all needed linguistic characters for the appropriate
language(s)? This requirement can be met in a number of ways: via dead keys or
additional shift states, for example (not all characters need to be on the unshifted state).
High frequency linguistic characters should be positioned where they are easy to type,
ideally in the unshifted state. (Note that if the keyboard supports multiple languages, the
high frequency keys may change.)

• Does the keyboard focus on code points, and not glyphs? It is important to not place the
burden of display or shaping onto the keyboard. All technologies related to visual
display are decoupled from the keyboard (and should be handled by fonts and a
rendering engine if needed; see section 4 for more information).

• Do all characters on the keyboard exist in Unicode? Since all input on Windows is based
on Unicode (UTF-16), any code points not encoded in Unicode cannot be handled.

• Are supplementary characters (non-BMP characters) encoded in UTF-16 and handled in
the ligature section of the keyboard? Is the limit of 2 supplementary characters (4 UTF-16
code points) met on each key?

• Ideally, a keyboard should be consistent in its behavior concerning precomposed
vs. composite characters.

[8] See http://msdn.microsoft.com/library/en-us/intl/ime_5tiq.asp.

[9] For more detailed information on collation on Windows, please see our talk "Sorting it all
out: an introduction to collation", available at
http://www.microsoft.com/globaldev/Presentations/unicode22/016.doc

7. Myths about keyboard layouts


We hear many misconceptions about keyboards and what they can do. This section will
hopefully clear up a few of these.
I get the feeling Microsoft just makes up these keyboards by themselves. Why don't they represent my
language the way I expect them to?
New keyboards for a market always get tested in their respective market. A great deal of
research does go into the keyboards shipped with the system, with feedback from linguists,
government officials, other internationalization experts, and local software providers.
Unfortunately, it is often a case of Becker's law applying (that is, for each expert, there is
an equal and opposite expert).
I don't like the keyboard layout Windows ships for my language; can we remove it or change it?
In an ideal world, customers could customize their keyboard infinitely (and there are some
projects out there that will simplify this process, which will be discussed at the presentation), but
due to backwards compatibility, we cannot simply remove a keyboard or change keys. There are
simply too many customers who count on consistent behavior across releases (even if the
behavior is not ideal). In addition, while a customer may not like the keyboard, this may be a
national standard for the language, and there may be a requirement to support this particular
keyboard. There are a number of other input options to help users input characters not on their
keyboards, including:

• Character Map (available from Accessories|System Tools)
• The Insert Symbol Dialog (available in Office)
• The ALT+X option, also available in Office. (Typing ALT+X after a character gives you
the Unicode value; typing ALT+X after a Unicode value gives you the character.)

I want to make sure I have every single visual variant of my characters on the keyboard; the canonical (or
isolate) version of the code point is not sufficient.
As is discussed in the other technologies section, keyboards on Windows only deal with code
points, not with glyphs. Code points are used exclusively for text processing, except for display.
At the point of display, technologies such as fonts and rendering engines map between code
points and glyphs. There is an important technical boundary between code points and glyphs,
and this exists in order to maintain at least a modicum of simplicity within the system. (Imagine if
every single visual variant of a code point had to be maintained for text processing!) For this
reason, keyboards focus exclusively on code points, and leave the work of linking code points to
the appropriate visual display to fonts and shaping engines.
I want to have an IME rather than a keyboard for my language.
This is generally heard from customers working with complex script languages who feel that
they need to have all visual variants of a code point on an input method. Input Method Editors
really make sense with ideographic languages such as Chinese or Korean, where there are
literally thousands of characters needed for the language. Each of these ideographic characters is
semantically distinct. Compare this with complex scripts, where the number of semantically
distinct characters is generally less than 100, but the number of visually distinct characters is
considerable (into the hundreds). Again, keyboards work with code points, not with glyphs.
Since code points are semantically distinct and not visually distinct, a complex script language
can easily be handled via a keyboard; as noted earlier, the code points are linked to the
appropriate visual display by other non-keyboard technologies.

8. Summary
As has been described in this paper, the inner workings of keyboards are more complicated than
a developer would probably like them to be. What is crucial is understanding the association
between the virtual keys, the scan codes and the shift states in a keyboard. In addition,
developers should understand the relationship input has to other technologies, once the
keyboard passes on the code points (e.g., Uniscribe, font technologies and IMEs). This paper has
only touched upon many of the issues, but we hope that it has provided implementers enough
knowledge to avoid pitfalls, and provide customers with a seamless input experience.
