Unicode Chinese and Thunderbird

Steve Hay 20 years ago

All,

I just found an interesting extension for all of you Unicode fans out there who post Chinese characters. There is an extension for Mozilla Thunderbird called "Mnenhy" that allows encoding and decoding of text between various formats. You control what is converted by highlighting a selection.

For example, if I see 茶, I can determine its Unicode value by highlighting it and selecting "Encode->Decimal", and it is replaced with 33590, the Unicode value for 茶 (cha).
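The same lookup can be done without an extension. This is a minimal Python sketch, not what Mnenhy does internally; the helper names `decimal_code_points` and `describe` are made up for illustration.

```python
def decimal_code_points(text: str) -> list[int]:
    """Return the decimal Unicode code point of each character in text."""
    return [ord(ch) for ch in text]


def describe(ch: str) -> str:
    """Format one character as decimal, U+hex, and an HTML numeric reference."""
    cp = ord(ch)
    return f"{ch} decimal={cp} hex=U+{cp:04X} html=&#{cp};"


print(decimal_code_points("茶"))  # [33590]
print(describe("茶"))             # 茶 decimal=33590 hex=U+8336 html=&#33590;
print(chr(33590))                 # 茶 (decoding goes the other way)
```

The decimal form is the one used in HTML numeric references (`&#33590;`), while the hexadecimal form (U+8336) is what Unicode charts use.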

I'm sure there are other ways to do this but I found this and thought some folks here might find it helpful.

I also found:

formatting link

which looks like it might be quite helpful for anyone trying to find Japanese translations. It also seems to translate some Chinese, although maybe the languages have similarities in their noun-space. For example, I found pu-erh tea in there. It translated it as 普アル茶. This is close to what Mike has on his website (普洱茶), although it seems to replace the middle character with two characters.

Space Cowboy 20 years ago

There are more web pages in the native language character sets than in Unicode. I've developed routines that convert from the two major Chinese native character sets (GB2312 for Simplified, BIG5 for Traditional) and the two Japanese ones (JISX208 and SHIFT_JIS) to Unicode. I use Unicode.Org to see the glyph and the Unicode character for Google searches. I did some previous posts on the process. In summary: download the Unicode CJK table from Unicode.Org, then use the Simplified and Traditional language pairs to look up the Unicode value. The JISX208 Japanese code stored in the Unicode table is the KUTEN value, so you need to convert from JISX208 and SHIFT_JIS to KUTEN. All 32 bit MS OSes are Unicode compliant except for 95, 98, and Me, which are 16 bit. It takes up to 4 bytes to store a UTF-8 or UTF-16 value.
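For anyone who wants to try the conversions without building lookup tables, here is a rough Python sketch. It is not the routines described above: it leans on Python's built-in `gb2312`, `big5`, `shift_jis`, and `euc_jp` codecs instead of the Unicode.Org CJK table, and the helper names (`to_unicode`, `kuten`, `shift_jis_to_kuten`) are invented for this example. The KUTEN (row/cell) pair is derived from the EUC-JP bytes, where each byte is the row or cell number plus 0xA0.

```python
def to_unicode(raw: bytes, encoding: str) -> str:
    """Decode bytes in a legacy CJK encoding into a Unicode string."""
    return raw.decode(encoding)


def kuten(ch: str) -> tuple[int, int]:
    """Return the (ku, ten) row/cell pair of a JIS X 0208 character."""
    raw = ch.encode("euc_jp")
    # JIS X 0208 characters are exactly two bytes in EUC-JP, both >= 0xA1.
    if len(raw) != 2 or raw[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return raw[0] - 0xA0, raw[1] - 0xA0


def shift_jis_to_kuten(raw: bytes) -> tuple[int, int]:
    """Convert one Shift_JIS encoded character to its (ku, ten) pair."""
    return kuten(raw.decode("shift_jis"))


tea = "茶"
for enc in ("gb2312", "big5", "shift_jis"):
    raw = tea.encode(enc)  # stand-in for bytes read from a legacy web page
    print(enc, raw.hex(), "->", f"U+{ord(to_unicode(raw, enc)):04X}")

print("kuten:", kuten(tea))

# Byte lengths depend on the character; 4 bytes is the maximum, not the rule.
print(len(tea.encode("utf-8")), len(tea.encode("utf-16-le")))    # 3 2
print(len("😀".encode("utf-8")), len("😀".encode("utf-16-le")))  # 4 4
```

Going through EUC-JP avoids the fiddlier arithmetic needed to unpack Shift_JIS lead and trail bytes directly.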

Jim

PS: I'll let Kuri expla

Space Cowboy 20 years ago

The other thing I noticed is that the Chinese character for ER3 only exists in the JIS212 fontset. I don't know what would happen if you pasted it into a typical JIS208 IME; probably what you described. You will get some hits on Japanese webpages for Puer with the Unicode string that Steve provided, but probably because of the same paste problem you described. I also understand why Unicode.Org didn't provide any information for the two Unicode characters but defaulted to erroneous Japanese Unicode strings from the Japanese WWW Edict server which provided Steve's string in the first place. I tried, and I don't know how he came up with the string in the first place. In other words, you can't plug the two characters back into EDICT and find a definition for either.

Jim

kuri 20 years ago

No, it isn't a translation. The word was transformed when it was pasted into a Japanese program.

"アル" (aru) is the reading of the 2nd character, written in katakana (the Japanese phonetic script). The problem is that most Japanese programs don't reliably display the 洱 character because they don't have the fonts. If you insist on writing 普洱茶, a Japanese computer that doesn't have fonts for the 2nd character will transform it. Here it was transformed into its *reading* (it could also have been cut into 2 characters, replaced by something unrelated, not displayed...).
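One way to see which characters a given legacy character set can hold is to try encoding them. This is only a sketch using Python's built-in codecs (the helper names `can_encode` and `unencodable` are made up), and it tests character-set coverage rather than installed fonts, so treat it as a rough proxy for the display problem described above.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def unencodable(text: str, encoding: str) -> list[str]:
    """List the characters of text that the encoding cannot represent."""
    return [ch for ch in text if not can_encode(ch, encoding)]


# Check the three characters of "pu-erh tea" against several encodings.
for enc in ("shift_jis", "euc_jp", "gb2312", "big5", "utf-8"):
    print(enc, unencodable("普洱茶", enc))
```

Any character reported for `shift_jis` is one that a program limited to that character set has to drop, replace, or spell out some other way.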

Usually, to avoid display problems, they write "puer" phonetically: ﾌﾟｱｰﾙ茶 or プアル茶 or プーアール茶, and even "puer cha" completely in phonetics: プーアールチャ. On packages in Japan, they write the Chinese characters with the Japanese reading.
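If you search for these spellings, note that the first one is written in half-width katakana. As a small aside, Unicode NFKC normalization folds half-width katakana into the ordinary full-width form. This Python sketch uses the standard `unicodedata` module; `normalize_kana` is just an illustrative name, and normalization only unifies character width, not the different spellings.

```python
import unicodedata


def normalize_kana(text: str) -> str:
    """Fold half-width katakana (and other compatibility forms) via NFKC."""
    return unicodedata.normalize("NFKC", text)


variants = ["ﾌﾟｱｰﾙ茶", "プアル茶", "プーアール茶"]
for v in variants:
    print(v, "->", normalize_kana(v))

# The half-width spelling becomes full-width プアール茶; the other two are
# already full-width and come back unchanged.
```

The three spellings still differ after normalization, so a search has to try each one.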

Kuri
