
Lesson: Internationalization of Network Resources

In a modern Internet community, many users are no longer satisfied with using only ASCII symbols
to identify a domain name or a web resource. For example, they would like to be able to register a
new domain using their native characters in Arabic or Chinese or to define a new URI using Unicode
characters. That is why the internationalization of network resources is a cornerstone in widening
horizons for the World Wide Web.

This lesson describes the internationalization of two kinds of network resources: domain names and
resource identifiers.

Internationalized Domain Name

This section explains how to map between a Unicode domain name and its ASCII form.

Internationalized Resource Identifier

This section shows how to use the mapping methods of the URI class to convert between an IRI and a
URI.

Internationalized Domain Name


Historically, Internet domain names contained ASCII symbols only. Lately, however, the number of
users who want to use Unicode characters when registering their domain names has increased steeply.
The domain name resolving system, though, does not accept Unicode characters.

Internationalizing Domain Names in Applications (IDNA) was adopted as the standard for converting
Unicode characters into standard ASCII domain names, thus preserving the stability of the domain
name system.

Examples of the internationalized domain names:

• http:// .cn
• http://www.транспорт.com

If you follow one of these links, you may notice that the Unicode domain name shown in the
address bar is substituted by its ASCII string.

You may wonder how to perform such a conversion in your own application.
According to RFC 3490, IDNA does not extend the service offered by DNS to applications.
Instead, applications (and, by implication, users) continue to see an exact-match lookup
service.

Two main operations accomplish the conversion between ASCII and non-ASCII formats:

• The ToASCII operation is used before sending an IDN to the domain name resolving system or
  writing an IDN into a file where ASCII characters are expected (such as a DNS master file).
• The ToUnicode operation is used when displaying names to users, for example names
  obtained from a DNS zone.

The java.net.IDN class in Java™ SE performs these operations. It provides two methods for each
operation. The toASCII(String input, int flag) method converts Unicode characters to ASCII.

The flag parameter defines the behavior of the conversion process. The ALLOW_UNASSIGNED flag
permits code points that are unassigned in Unicode 3.2, and the USE_STD3_ASCII_RULES flag
enables the check against STD-3 ASCII rules. You can use these flags separately or OR them
together. If no flag is needed, you can pass zero to the two-argument method or simply invoke
the one-argument counterpart:

toASCII(input);

If the input argument doesn't conform to RFC 3490, this method throws an
IllegalArgumentException.

String ace_name = IDN.toASCII("http:// .cn/");
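The effect of the flag parameter can be sketched as follows. This is a minimal illustration, not a definitive recipe: the class name IdnFlagsDemo, its helper methods, and the host names are hypothetical and chosen only for this example. An underscore is not a letter, digit, or hyphen, so the name is expected to be rejected once the STD-3 check is enabled.

```java
import java.net.IDN;

final class IdnFlagsDemo {
    // Converts with no flags set (flag == 0), using the one-argument method.
    static String toAsciiDefault(String name) {
        return IDN.toASCII(name);
    }

    // Returns true if the name converts successfully with the STD-3 ASCII check enabled.
    static boolean isStd3Compliant(String name) {
        try {
            IDN.toASCII(name, IDN.USE_STD3_ASCII_RULES);
            return true;
        } catch (IllegalArgumentException e) {
            // toASCII signals a non-conforming input with this exception.
            return false;
        }
    }

    public static void main(String[] args) {
        String withUnderscore = "my_host.example"; // hypothetical host name
        String withHyphen = "my-host.example";     // hypothetical host name

        System.out.println(toAsciiDefault(withUnderscore));
        System.out.println(isStd3Compliant(withUnderscore));
        System.out.println(isStd3Compliant(withHyphen));

        // Both flags can be combined with a bitwise OR.
        int both = IDN.ALLOW_UNASSIGNED | IDN.USE_STD3_ASCII_RULES;
        System.out.println(IDN.toASCII(withHyphen, both));
    }
}
```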

The toUnicode method translates a string from ASCII Compatible Encoding (ACE) to Unicode
code points. This method never fails: in case of any error, the input string is returned
unmodified.
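A round trip through both methods might look like the sketch below. The class name IdnRoundTrip, its helpers, and the domain "bücher.example" are assumptions made for illustration; the non-ASCII letter is written as a Unicode escape so that the source file stays pure ASCII.

```java
import java.net.IDN;

final class IdnRoundTrip {
    // ToASCII: use before handing a name to the resolver.
    static String toAce(String unicodeName) {
        return IDN.toASCII(unicodeName);
    }

    // ToUnicode: use when showing a name to the user.
    static String toDisplay(String aceName) {
        return IDN.toUnicode(aceName);
    }

    public static void main(String[] args) {
        String unicodeName = "b\u00fccher.example"; // hypothetical domain
        String ace = toAce(unicodeName);

        System.out.println(ace);                                 // xn--bcher-kva.example
        System.out.println(toDisplay(ace).equals(unicodeName));  // true
        // A plain ASCII name passes through toUnicode unchanged.
        System.out.println(toDisplay("example.com"));            // example.com
    }
}
```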

Security concern

IDN introduces a potential security risk because it allows websites to use Unicode names. This
makes it easier to create a website whose domain name, security certificates, or even outward
appearance look exactly like those of your own site, but which is in fact used for phishing, that
is, for collecting private information from your site's visitors. Such sites are called spoofed
websites.

For example, somebody can register a domain name that looks identical to yours by substituting a
small Latin "a" or "o" with the look-alike Cyrillic "а" or "о". The new domain then points users
to another site and exposes them to homograph attacks.
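The substitution described above can be made visible in code. In this sketch, the class name HomographDemo, its helper, and the domain "paypal.example" are hypothetical; the look-alike name replaces each Latin "a" with the Cyrillic letter U+0430. The two strings render almost identically, yet they are different strings, and only the look-alike is rewritten to an ACE form.

```java
import java.net.IDN;

final class HomographDemo {
    // Returns true if ToASCII has to rewrite the name, i.e. it contains non-ASCII labels.
    static boolean needsAceEncoding(String name) {
        return !IDN.toASCII(name).equals(name);
    }

    public static void main(String[] args) {
        String latin = "paypal.example";           // hypothetical genuine name
        String spoof = "p\u0430yp\u0430l.example"; // Cyrillic U+0430 in place of Latin 'a'

        System.out.println(latin.equals(spoof));     // false
        System.out.println(needsAceEncoding(latin)); // false
        System.out.println(needsAceEncoding(spoof)); // true
        // The ACE form of the look-alike starts with the "xn--" prefix.
        System.out.println(IDN.toASCII(spoof));
    }
}
```

Showing the ACE form instead of the Unicode form is one way an application can make such a substitution obvious to the user.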

This issue has been well known since the IDN concept was first introduced. You can avoid it by
turning off IDN support entirely: in a Mozilla-based browser, type "about:config" into the address
bar, find the "network.enableIDN" setting, and change its value to "false".

Also, both Mozilla and Opera have announced per-domain whitelists that selectively switch on IDN
for those domains which take appropriate anti-spoofing precautions. You can adjust the
"network.IDN.whitelist.<lang>" settings to enable or disable a whitelist for a particular
language.

Internationalized Resource Identifier


An Internationalized Resource Identifier (IRI), like an IDN, may contain Unicode characters, while
a Uniform Resource Identifier (URI) is limited to ASCII symbols only.

According to RFC 3987, IRIs are meant to replace URIs in identifying resources for protocols,
formats, and software components that use a UCS-based character repertoire.

At first sight, you might expect this task to be solved by the same means as for IDN.
But that is not quite the case. Consider the structure of a resource identifier:

You may notice that it has several components.

The authority component of a URI is parsed according to the following syntax:

[user-info@]host[:port]

where the characters @ and : stand for themselves. The host component can be an IP-literal, an
IPv4address, or just a name.

When the host is a domain name, the IDN approach, that is, the mapping, can be applied.
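The authority syntax above can be explored with the accessor methods of java.net.URI. The following sketch is only an illustration: the class name AuthorityDemo, its helpers, and the example URIs are hypothetical. It reads the host out of a URI, and it builds a URI whose host has first been mapped with IDN.toASCII, leaving the other components untouched.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

final class AuthorityDemo {
    // Extracts the host part of the authority component.
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    // Applies the IDN mapping to the host only, then assembles an http URI.
    static URI withAsciiHost(String unicodeHost, String path) {
        try {
            // Arguments: scheme, user-info, host, port (-1 = undefined), path, query, fragment.
            return new URI("http", null, IDN.toASCII(unicodeHost), -1, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://user@www.example.com:8080/docs/index.html");
        System.out.println(uri.getUserInfo()); // user
        System.out.println(uri.getHost());     // www.example.com
        System.out.println(uri.getPort());     // 8080

        // Hypothetical Unicode host, written with an escape for the non-ASCII letter.
        System.out.println(withAsciiHost("b\u00fccher.example", "/index.html"));
    }
}
```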

But in general, the URI structure is more complicated. Applications can use the URI-reference
syntax to refer to a URI instead of always using the generic syntax rule above. A URI-reference is
either a URI or a relative reference. If a URI-reference doesn't specify a scheme, it is said to be
a relative reference. Usually, a relative reference expresses a URI reference relative to the name
space of another URI.
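The difference between a URI and a relative reference can be sketched with java.net.URI as follows. The class name RelativeRefDemo, its helpers, and the example addresses are assumptions for illustration only.

```java
import java.net.URI;

final class RelativeRefDemo {
    // A reference without a scheme is a relative reference.
    static boolean isRelative(String reference) {
        return !URI.create(reference).isAbsolute();
    }

    // Resolves a reference against the name space of a base URI.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/docs/guide/index.html"; // hypothetical base URI
        String ref = "../images/logo.png";                            // hypothetical reference

        System.out.println(isRelative(ref));    // true
        System.out.println(isRelative(base));   // false
        System.out.println(resolve(base, ref)); // http://www.example.com/docs/images/logo.png
    }
}
```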

Nevertheless, instances of the java.net.URI class can represent IRIs whenever they contain
non-ASCII characters.

This class was enhanced by the following methods to perform the operations and conversions
according to RFC 3987:

• toASCIIString() - converts an IRI to a URI and returns its content as a US-ASCII string.
• toString() - returns the content of this URI as a string in its original Unicode form.
• toIRIString() - converts this URI to an IRI and returns its content as a string. Note that
  this method is not part of the java.net.URI API in released versions of Java™ SE.
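The first two methods can be tried out with a small sketch. The class name IriToUriDemo, its helpers, and the address are hypothetical. The example keeps the host in ASCII and places the non-ASCII character in the path, where toASCIIString() is expected to replace it with the percent-encoded bytes of its UTF-8 form.

```java
import java.net.URI;

final class IriToUriDemo {
    // Returns the US-ASCII (URI) form of an identifier that may contain Unicode characters.
    static String toUri(String iri) {
        return URI.create(iri).toASCIIString();
    }

    // Returns the identifier in its original Unicode form.
    static String original(String iri) {
        return URI.create(iri).toString();
    }

    public static void main(String[] args) {
        // Hypothetical IRI; \u00e9 is "e" with an acute accent.
        String iri = "http://www.example.com/caf\u00e9";

        System.out.println(toUri(iri));                // http://www.example.com/caf%C3%A9
        System.out.println(original(iri).equals(iri)); // true
    }
}
```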

Consider the following code:

URI uri = new URI("http:// .cn/");


HttpURLConnection conn = (HttpURLConnection)
uri.toURL().openConnection();
conn.getResponseCode();

Unfortunately, this does not work yet; support is planned for the next release of Java™ SE.
