Japanese Chinese tea web sites

I'm not sure exactly what you mean here. Do you mean pasting a CJK character into something that would pull up the appropriate Unihan page?

/Lew

Lewis Perin

I just Googled for the Chinese character for tea, limiting it to *.jp sites. The first one that came up

formatting link
has nothing about UTF-8 in its source, but does have

Then why is it that, when I copy the tea character from that Japanese page into Google and search again, the search term is identical? (By the way, the search term is URL-encoded UTF-8: %E8%8C%B6)
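The percent-encoded form is easy to check. Here is a small sketch in Python (my choice of language, nothing to do with how Google does it) that uses only the standard library's urllib.parse to show the UTF-8 bytes of the tea character, U+8336, and their URL-encoded form:

```python
from urllib.parse import quote, unquote

tea = "\u8336"  # the tea character, Unicode code point U+8336

# UTF-8 turns this one code point into three bytes.
utf8_bytes = tea.encode("utf-8")
print(utf8_bytes.hex().upper())      # E88CB6

# quote() percent-encodes those UTF-8 bytes, as a browser does in a query string.
print(quote(tea))                    # %E8%8C%B6

# Going the other way recovers the same character.
print(unquote("%E8%8C%B6") == tea)   # True
```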

I don't have access to Google's source code, but it seems clear to me that they're not confused by Big5 vs. GB vs. JIS. They're probably converting everything to Unicode codepoints before indexing.

They use different codepoints for *some* glyphs - but mostly they use the same codepoints for glyphs they share. The fact that e.g. GB enumerates the Cha character differently than JIS doesn't affect the fact that they both use the same Unicode codepoint.
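To make that concrete, here is a rough sketch using Python's built-in codecs. The codec names are Python's, and picking "shift_jis" to stand in for JIS is my assumption. The tea character gets different bytes in each national encoding, but decoding any of them lands on the same Unicode codepoint:

```python
tea = "\u8336"  # the tea character

# Python codec names; "shift_jis" standing in for "JIS" is an assumption.
codecs_to_try = ("gb2312", "big5", "shift_jis")

# Encode the same character in each national character set.
encoded = {codec: tea.encode(codec) for codec in codecs_to_try}

for codec, raw in encoded.items():
    decoded = raw.decode(codec)  # national bytes -> Unicode
    print(f"{codec:10} bytes={raw.hex().upper()}  ->  U+{ord(decoded):04X}")
```

The byte column differs from row to row, while the last column is U+8336 every time.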

/Lew

Lewis Perin

Something like that: CJK national characters (GB, BIG5, JIS, KS) to Unicode. I can go from Unicode to CJK national characters. Zhongwen, Mandarintools, and Babelfish require Unicode. If I know the Unicode, I can use Unihan to look at a graphical representation of the character without loading charsets, including Unicode, for MS. I still don't know how you are getting GB2312 webpages to show you Unicode values. NJ Star Communicator apparently can do that, but it would be overkill for my limited use.

Jim

Space Cowboy

Try this:

formatting link

You need Javascript, but I promise it won't do anything evil.

/Lew

Lewis Perin

At this point I think we are talking past each other. For example, I want to take the GB codepoint 1872 and translate it into Unicode codepoint 8336. Agreed, the different codepoints for tea in the CJK language packs all point to 8336, which is the UTF-16 representation, consistent with the language pairs for non-Roman language packs. I know Google will take my Unicode strings and return matches it finds in websites coded in charsets other than UTF. At that point I can't cut and paste any characters from those webpages into Babelfish, MandarinTools, or Zhongwen because they're not Unicode. From what I understand, NJ Star Communicator, for example, will flip between charset=GB2312 and charset=UTF-8. If I find anything pertinent on going from language pack codepoints to Unicode codepoints, I'll report back. I can already go from Unicode codepoints to language pack codepoints.
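For what it's worth, here is a minimal sketch of that 1872 to 8336 translation in Python. It assumes 1872 is the GB2312 row-cell (qu-wei) number, that is, row 18 and cell 72 in decimal. The helper name quwei_to_unicode is made up for illustration. Adding 0xA0 to each half gives the two bytes used in a GB2312 (EUC-CN) webpage, and Python's built-in "gb2312" codec supplies the mapping table from there:

```python
def quwei_to_unicode(quwei: str) -> str:
    """Convert a 4-digit GB2312 row-cell code (e.g. "1872") to a Unicode character.

    Hypothetical helper; assumes the input is decimal row (2 digits) + cell (2 digits).
    """
    row, cell = int(quwei[:2]), int(quwei[2:])
    # EUC-CN stores row and cell as one byte each, offset by 0xA0.
    raw = bytes([row + 0xA0, cell + 0xA0])
    # The codec's lookup table does the actual GB2312 -> Unicode mapping.
    return raw.decode("gb2312")


ch = quwei_to_unicode("1872")
print(f"U+{ord(ch):04X}")  # U+8336
```

Row 18 becomes byte B2 and cell 72 becomes byte E8, so the same character can be written as 1872 or as B2E8 depending on which notation a table uses.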

Jim

Lewis Per> > As I said before the tea codepoint for GB2312 is 1872, BIG5 AFF9,

Space Cowboy

I have a routine that does the same thing offline. It takes Unicode strings, determines their hex value, and calls Unihan. I was hoping it would also take CJK language pack strings, for example by pasting in the GB or JIS codepoint character for tea. There has to be an easy way of going from language pack codepoints to Unicode codepoints.
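One possible shape for such a routine, sketched in Python rather than Javascript: decode the raw bytes using the charset the page declares, then build one Unihan lookup address per character. The function name unihan_urls is hypothetical, and the URL pattern is my assumption about the Unihan lookup page, so check it before relying on it.

```python
# Assumed URL pattern for the Unihan lookup page; verify before relying on it.
UNIHAN_LOOKUP = "https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint="


def unihan_urls(raw: bytes, charset: str) -> list[str]:
    """Decode bytes from a national charset and return one lookup URL per character.

    Hypothetical helper; `charset` is any Python codec name such as "gb2312" or "big5".
    """
    text = raw.decode(charset)  # national-charset bytes -> Unicode string
    return [UNIHAN_LOOKUP + format(ord(ch), "04X") for ch in text]


# The two bytes B2 E8 are how a GB2312 page stores the tea character.
print(unihan_urls(bytes.fromhex("B2E8"), "gb2312"))
```

The only thing the caller has to know is which charset the bytes came from; the codec handles the table lookup.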

Jim

Space Cowboy

Sorry, I really don't know what you mean by a "CJK language pack string". The page I cited lets you paste a CJK character from a Chinese website and get back the corresponding Unihan page.

/Lew

Lewis Perin

Just download NJ Star Communicator, and you can convert into any of 21 options. It's simple and easy to use. But beware, some characters don't convert properly. It's a machine conversion, and it doesn't replace human conversion. For example, this software in GB mode only supports about 7 000 characters, or something like that. But in Big5 mode, it supports 15 000 characters. So there are going to be many characters that don't get converted, or are converted into another character, rendering the text meaningless.
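The same kind of loss can be shown with Python's built-in codecs. This is only an illustration of the general problem, not of what NJ Star Communicator does internally. The traditional dragon character U+9F8D is in Big5 but not in GB2312, so a strict conversion fails and a forgiving one silently substitutes a question mark:

```python
text = "\u9f8d\u8336"  # traditional "dragon" followed by "tea"

# Big5 covers both characters, so this succeeds.
print(text.encode("big5").hex().upper())

# GB2312 has no slot for the traditional dragon character: strict mode raises.
try:
    text.encode("gb2312")
except UnicodeEncodeError as err:
    missing = err.object[err.start:err.end]
    print(f"not in GB2312: U+{ord(missing):04X}")

# With errors="replace" the conversion "works", but the character becomes "?".
lossy = text.encode("gb2312", errors="replace")
print(lossy)
```

The second form is the dangerous one, because nothing tells the reader that a character was dropped.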

And 15 000 is not a lot of characters. For common, everyday Chinese language, it's fine. But for some scholarly or artistic work, I often can't find the character I am looking for in my software, because it's not in there. When it comes to Chinese, computers are still way behind, and woefully inadequate. But somehow, we still get by. Amazing, isn't it? Chinese fonts are another big beef of mine. But anyway, save that for later.

niisonge

Sort of like searching for references to French fries on a French web site?

bridger

Here is an interesting site for GB2312 to UNICODE conversion

formatting link
I found yesterday. As I previously suspected, it is a mapping and not a mathematical routine, even though the table was generated by a Java program with a bunch of but-ifs. I didn't see anything right off the bat that would prevent Javascript from doing the same thing mathematically as Java. The table says B2E8 is the GB value for the Unicode value 8336, and not 1872 as mentioned in Unihan. I can tell I'm going to have some fun. Also, if I had DOTNET loaded, there is a simple routine where you indicate the language pack, such as GB2312, and it gives the corresponding Unicode value.

The charCodeAt routine in Javascript just turns a Unicode character into its Unicode value, which can then be shown in hex. The two-byte hex view of a Unicode character on disk is not the same as the result of the charCodeAt conversion, because the bytes are flipped around in the way the character is stored. In a file the Unicode tea character is stored as 3683. Notepad will store the Unicode tea character as four bytes, with the first two being the byte-order mark FFFE.
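The flipped bytes can be reproduced with a short Python sketch. It assumes that Notepad's "Unicode" save option means little-endian UTF-16 with a byte-order mark, which matches the four bytes described above:

```python
import codecs

tea = "\u8336"

# The code point itself, the same number charCodeAt reports in Javascript.
print(f"{ord(tea):04X}")                # 8336

# Little-endian UTF-16 puts the low byte first, so the hex view looks flipped.
le = tea.encode("utf-16-le")
print(le.hex().upper())                 # 3683

# Big-endian keeps the bytes in the same order as the code point.
be = tea.encode("utf-16-be")
print(be.hex().upper())                 # 8336

# A little-endian file with a byte-order mark: FF FE, then the character.
with_bom = codecs.BOM_UTF16_LE + le
print(with_bom.hex().upper())           # FFFE3683
```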

Jim

Space Cowboy

You said before it really doesn't expire. What do you mean by that? Most of the time you'll lose some functions, such as printing, or be limited in file size. If I stay with Unicode I am fine for tea terms, but occasionally I would like to use native language packs.

Thanks, Jim

Space Cowboy

What I mean by "doesn't expire" is that for the first 30 days, you can use the software fine. After 30 days, you get a splash screen that reminds you to buy the software. It counts down 1 second for each day you use it beyond the 30 days. Then, after 50 days, the screen kind of stays there permanently. But it's movable. So you can move it right off the desktop, out of your way. Then, you can still use the software without being bothered by that screen. Just don't click "I agree" after the 50-day period.

Some other weird things happen too, but the software is still fully functional.

The only thing the download version doesn't include is Chinese fonts. But that doesn't matter if you download the Asian Languages pack for MS Office. You can use the MS Office fonts instead, like Simsun, Mingliu, etc. But they're not very good fonts, just basic ones.

I have used this software for over a year without problems. It has a lot of features that Asiansuite doesn't have.

niisonge

DrinksForum website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.