[REBOL] Re: Rebol & XML encoding; use encoding="windows-1252"
From: al:bri:xtra at: 5-Jul-2002 21:40
Actually, I'm fairly sure now that I'm partially wrong!
I believe it's a bug in the MS operating system.
I've been reading Ed Batutis' web site here:
http://www.batutis.com/i18n/papers/mlang/samples/
and I've been trying out his MLangDet on my Windows XP system (with all the
latest upgrades from Microsoft) on a text file, and came across an
interesting problem with the MLangDet software. With a simple .txt file that
contains just the following:
Telephone: +64-6-9748241
with one empty line before and after, the MLangDet program reports this .txt
file as Unicode (UTF-7). If I simply replace both of the "-" with a space,
like this:
Telephone: +64 6 9748241
Then MLangDet reports the .txt file as US-ASCII.
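One way to see why the hyphens might matter: in UTF-7, a "+" opens a run of
base64 characters and a "-" can close it, so "+64-" has the shape of a UTF-7
shift sequence. The Python sketch below is only an illustration of that idea.
The looks_like_utf7 helper and its pattern are hypothetical, and keying on an
explicit "-" terminator is an assumption chosen to match the observation
above, not a description of what MLangDet actually does.

```python
import re

# Hypothetical heuristic (an assumption, not MLangDet's real algorithm):
# a "+", one or more base64 characters, then an explicit "-" terminator.
UTF7_SHIFT = re.compile(r"\+[A-Za-z0-9+/]+-")


def looks_like_utf7(text: str) -> bool:
    """Return True if text contains something shaped like a UTF-7 shift sequence."""
    return UTF7_SHIFT.search(text) is not None


for sample in ("Telephone: +64-6-9748241", "Telephone: +64 6 9748241"):
    print(repr(sample), looks_like_utf7(sample))
```

Under this assumption the hyphenated number is flagged because of "+64-",
while the version with spaces is not, since nothing closes the run after "+64".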
I've also noticed that in MS Internet Explorer, when the first line of text
is placed in XML/XHTML, the browser also declares that the page is now UTF-7
(instead of UTF-8) and shows the telephone number as:
6-9748241
instead of:
+64-6-9748241
I think this behaviour is because both MS Internet Explorer and MLangDet use
the same operating system function to detect the encoding scheme of a document.
When I turn off MS Internet Explorer's automatic encoding detection, the correct
telephone number is shown.
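The missing "+64-" is consistent with the text being decoded as UTF-7. The
base64 run "64" carries only 12 bits, which is less than one 16-bit character,
so a forgiving decoder could drop the whole run. Here is a minimal Python
sketch of that idea. The lenient_utf7_decode function is hypothetical, it only
handles runs that are closed by an explicit "-", and dropping incomplete runs
silently is an assumption rather than documented browser behaviour.

```python
import base64
import re

# "+", zero or more base64 characters, then the closing "-".
SHIFT = re.compile(r"\+([A-Za-z0-9+/]*)-")


def lenient_utf7_decode(text: str) -> str:
    """Decode explicitly terminated UTF-7 runs, dropping incomplete ones."""

    def decode_run(match: re.Match) -> str:
        b64 = match.group(1)
        if not b64:
            return "+"  # "+-" is the UTF-7 escape for a literal plus sign
        n_bytes = (len(b64) * 6 // 16) * 2  # whole UTF-16 code units only
        if n_bytes == 0:
            return ""  # too few bits for even one character: drop the run
        n_chars = -(-n_bytes * 8 // 6)  # base64 characters needed (ceiling)
        chunk = b64[:n_chars]
        chunk += "=" * (-len(chunk) % 4)  # pad so the base64 module accepts it
        raw = base64.b64decode(chunk)[:n_bytes]
        return raw.decode("utf-16-be", errors="replace")

    return SHIFT.sub(decode_run, text)


print(lenient_utf7_decode("Telephone: +64-6-9748241"))

# A conforming UTF-7 encoder escapes the plus sign, which avoids the ambiguity.
print("Telephone: +64-6-9748241".encode("utf-7"))
```

The first print reproduces the truncated number under these assumptions. The
second uses Python's built-in utf-7 codec to show how the same text looks when
it really is UTF-7: the literal plus sign is written as "+-".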
This is a very curious problem!
Andrew Martin
ICQ: 26227169 http://valley.150m.com/