Mailing List Archive: 49091 messages

[REBOL] Re: Rebol & XML encoding; use encoding="windows-1252"

From: al:bri:xtra at: 5-Jul-2002 21:40

Actually, I'm fairly sure now that I'm partially wrong! I believe it's a bug in the MS operating system. I've been reading Ed Batutis' web site here:

http://www.batutis.com/i18n/papers/mlang/samples/

and I've been trying out his MLangDet on my Windows XP system (with all the latest upgrades from Microsoft) on a text file, and came across an interesting problem with the MLangDet software. With a simple .txt file that contains just the following:

    Telephone: +64-6-9748241

with one empty line before and after, the MLangDet program reports this .txt file as Unicode (UTF-7). If I simply replace both of the "-" with a space, like this:

    Telephone: +64 6 9748241

then MLangDet reports the .txt file as US-ASCII.

I've also noticed that in MS Internet Explorer, when the first line of text is placed in XML/XHTML, the browser also declares that the page is now UTF-7 (instead of UTF-8) and shows the telephone number as:

    6-9748241

instead of:

    +64-6-9748241

I think this behaviour is because both MS Internet Explorer and MLangDet use the same operating system function to detect the various encoding schemes. When I turn off MS Internet Explorer automatic detection, then the correct telephone number is shown. This is a very curious problem!

Andrew Martin
ICQ: 26227169
http://valley.150m.com/
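The truncated number is consistent with how UTF-7 treats "+": it opens a base64 run that a following "-" closes, so "+64-" looks like a complete (if meaningless) shifted sequence. The sketch below is only an illustration in Python, assuming a lenient decoder that silently drops a base64 run too short to form a 16-bit character. The function name lenient_utf7_decode is hypothetical, and nothing here is claimed to be the routine Windows or Internet Explorer actually uses.

```python
# Base64 alphabet used inside UTF-7 shifted sequences.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def lenient_utf7_decode(text: str) -> str:
    """Decode UTF-7 leniently (hypothetical helper for illustration).

    Leftover bits that do not fill a 16-bit unit are dropped instead of
    being reported as an error.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "+":
            out.append(ch)          # ordinary character, copied as-is
            i += 1
            continue
        i += 1                      # skip the '+' that opens the run
        if i < len(text) and text[i] == "-":
            out.append("+")         # "+-" is the escape for a literal '+'
            i += 1
            continue
        bits, nbits, units = 0, 0, []
        while i < len(text) and text[i] in B64:
            bits = (bits << 6) | B64.index(text[i])
            nbits += 6
            if nbits >= 16:         # a full UTF-16 code unit is available
                nbits -= 16
                units.append((bits >> nbits) & 0xFFFF)
                bits &= (1 << nbits) - 1
            i += 1
        if i < len(text) and text[i] == "-":
            i += 1                  # the closing '-' is absorbed
        raw = b"".join(u.to_bytes(2, "big") for u in units)
        out.append(raw.decode("utf-16-be", errors="replace"))
    return "".join(out)


print(lenient_utf7_decode("Telephone: +64-6-9748241"))
```

Here "64" supplies only 12 bits, which is less than one 16-bit unit, so the run contributes no characters and the closing "-" is consumed with it. That leaves "6-9748241", matching what the browser displayed.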