-Z- (Z@Gundam.Com)
Tue, 04 Jan 2000 20:04:25 -0800


Over the Christmas and New Year holidays, I converted my Gundam: High
Frontier web pages from ISO-8859-1 (Western European) to UTF-8 (Unicode)
encoding. I did this once before, then backed off when I discovered that
many browsers that I'd thought Unicode capable, including UNIX versions of
Netscape Navigator, wouldn't parse it properly. I've now decided that
Unicode is the wave of the future and that it's incumbent on those who make
browsers to catch it. I was won over by the simplicity of mixing several
languages with completely different character sets on one page, without the
need for GIF images or plug-ins.
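
For anyone wanting to try the same conversion, here is a minimal sketch in Python of re-encoding ISO-8859-1 bytes as UTF-8. It is only an illustration, not necessarily how these pages were converted, and the function name latin1_to_utf8 is made up for the example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 (Latin-1) bytes as UTF-8.

    Hypothetical helper for illustration only.
    """
    # Every byte value is a valid ISO-8859-1 character, so decoding cannot fail.
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")


# An accented letter takes one byte in ISO-8859-1 but two bytes in UTF-8.
sample = "café".encode("iso-8859-1")
converted = latin1_to_utf8(sample)
print(sample, converted)
```

Plain ASCII text comes through unchanged; only the accented characters grow. The page's charset declaration has to be changed to match, or browsers will misread the new bytes.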

It started with that "Turn A" character, which can be easily encoded in a
UTF-8 page as the character reference &#8704; and browsed correctly with
Internet Explorer 3.0 or higher, although the much-vaunted Netscape still
barfs on it on most platforms.

But now the Unicode fever has really caught up with me, because I can
encode katakana and hiragana just as I did the "Turn A" character. Since 1
January 2000, my Gundam: High Frontier pages have included the Japanese
phrase "Kidoh Senshi" in hiragana and the word "Gundamu" in katakana, with
the title of a Russian science magazine presented in Cyrillic.

And starting today, the phrase "Kidoh Senshi" is displayed as KANJI!

What's really neat is that these non-English character sets are displayed
in the same font as the surrounding text. I love it! Unfortunately, not
everyone who browses the pages will see this, because only Unicode-capable
browsers can parse it and, alas, some major browsers still aren't
Unicode-capable.

To which I say "Hey! Wake up and smell the Java!"

For those who'd like to drop the name of their favorite show onto their Web
pages, here's the Unicode 3.0 encoding for it:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=utf-8">

&#12300;&#27231;&#21205; &#25126;&#22763;
&#12460;&#12531;&#12480;&#12512;&#12301;

&#12300; and &#12301; are the upper-left and lower-right corner brackets
that the Japanese use as quotation marks.

&#27231; is the kanji KI, &#21205; is the kanji DOH, &#25126; is the kanji
SEN and &#22763; is the kanji SHI.

&#12460; is the katakana GA, &#12531; is the katakana N, &#12480; is the
katakana DA and &#12512; is the katakana MU.
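
If you'd rather not look each number up by hand, the references can be generated from the characters themselves. Here is a small sketch in Python, assuming you can type or paste the phrase into a UTF-8 source file; the helper name to_ncr is made up for this example. Each reference is simply the character's Unicode code point written in decimal.

```python
import html


def to_ncr(text: str) -> str:
    """Write every non-ASCII character as a decimal character reference.

    Hypothetical helper for illustration only; ASCII is passed through.
    """
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


phrase = "「機動 戦士 ガンダム」"
encoded = to_ncr(phrase)
print(encoded)

# Round trip: the standard library turns the references back into characters.
assert html.unescape(encoded) == phrase
```

The round-trip check at the end is a cheap way to catch a mistyped digit before the page goes live.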

You'll find all of this and more at the Unicode home page:

http://www.unicode.org

For charts of the various Unicode character sets, browse:

http://charts.unicode.org

Charts of the Chinese/Japanese/Korean (CJK) ideographs are in the Unihan
section:

http://charts.unicode.org/unihan

If your browser doesn't parse these characters, you'll get either a
question mark or an empty box where each character should be, and you need
to consider getting another browser.

-Z-

+-----------+
| W E L C O |
| M E   T O |
|   T H E   |
|  N E X T  |
| L E V E L |
+-----------+

-
Gundam Mailing List Archives are available at http://gundam.aeug.org/
