World: r3wp
[!REBOL3-OLD1]
PeterWood 11-Sep-2009 [17494] | I think that to binary! will encode a Rebol string! to UTF-8: >> to binary! "^(20ac)" ;; Unicode code point for the Euro sign == #{E282AC} ;; UTF-8 byte sequence for the Euro sign |
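Peter's console result can be cross-checked outside REBOL. Here is a minimal sketch in Python, used only as a neutral reference (none of this is REBOL's API). It encodes U+20AC and then decodes the bytes back:

```python
euro = "\u20ac"                         # the Euro sign, code point U+20AC
euro_utf8 = euro.encode("utf-8")        # one character becomes three bytes

print(euro_utf8.hex().upper())          # E282AC, the same bytes as REBOL's #{E282AC}
print(len(euro), len(euro_utf8))        # 1 3
assert euro_utf8.decode("utf-8") == euro  # decoding restores the original string
```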
Maxim 11-Sep-2009 [17495x3] | Maybe Peter's excellent encoding script on rebol.org could be used as a basis for converting ASCII -> UTF-8 when using R3 binary as input, while R3 has them built-in. |
while = until | |
sort of like: print to-ascii to-binary "some text" | |
Pekr 11-Sep-2009 [17498] | I don't want to encode anything for simple CGI purposes, gee ;-) |
Maxim 11-Sep-2009 [17499x2] | But R3 is now fully Unicode-encoded, which is REALLY nice. You don't have a choice. Resistance is futile ;-) |
And the fact that binary! gives us the real byte array without any automatic conversion is also VERY nice for building TCP handlers... In fact, it would have made my life much simpler in the past. | |
Pekr 11-Sep-2009 [17501x2] | But this is a low-level issue I should not have to care about. It displays the Czech codepage correctly. Also, scripts are said to be UTF-8 by default, which is a superset of ASCII. IIRC it was said that as long as we don't use special chars, it will work transparently. If it works on input, it should also work on output, no? |
OK, so we have HTTP headers, which are supposed to be in ASCII, and then the HTML content, which can be encoded. Whose responsibility is it to provide the correct encoding? The coder's, or the HTTP server's? Hmm, maybe the coder's, as I am issuing HTTP content headers in my scripts? | |
PeterWood 11-Sep-2009 [17503] | Pekr: Just try a quick test with: print to binary! "Content-type: text/html^/" print to binary! get-env "REQUEST_METHOD" print to binary! get-env "QUERY_STRING" print to binary! get-env "REMOTE_ADDR" to see if it is an encoding problem. |
Pekr 11-Sep-2009 [17504x2] | I think I tried, but it printed binaries ... |
#{436F6E74656E742D74797065 #{474 #{ #{3132372E3 #{0 | |
Maxim 11-Sep-2009 [17506] | But loading actually does a re-encoding. UTF-8 is compact, but it's slow because you cannot skip ahead without traversing the string char by char, which is why strings are internally converted to 8- or 16-bit Unicode chars... It seems strings become 16 bits a bit too often (maybe a change in later releases, where they are always converted to 16 bits for some reason). |
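Maxim's point about skipping can be made concrete. The sketch below is in Python, and the helper names (`utf8_char_offset`, `fixed_width_offset`) are hypothetical. It assumes the 16-bit fixed-width case only holds for characters in the Basic Multilingual Plane, which matches the 8- or 16-bit internal widths described above. Finding the n-th character in UTF-8 requires a scan from the start, while a fixed-width layout reduces the lookup to arithmetic:

```python
def utf8_char_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th (0-based) code point in UTF-8 data.

    UTF-8 is variable width, so we must walk the bytes from the start and
    count lead bytes; continuation bytes always look like 10xxxxxx.
    """
    count = -1
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte: a new code point starts here
            count += 1
            if count == n:
                return offset
    raise IndexError("code point index out of range")


def fixed_width_offset(n: int, width: int) -> int:
    """With fixed-width units (e.g. 2 bytes for BMP-only UTF-16) it is just arithmetic."""
    return n * width


text = "a\u20acb"                     # 'a', Euro sign, 'b'
utf8 = text.encode("utf-8")           # 5 bytes: 61 E2 82 AC 62
utf16 = text.encode("utf-16-le")      # 6 bytes: 2 per character (all in the BMP)

print(utf8_char_offset(utf8, 2))      # 4, found by scanning past the 3-byte Euro sign
print(fixed_width_offset(2, 2))       # 4, computed directly without scanning
print(utf16[4:6].decode("utf-16-le")) # b
```

A fixed-width internal representation avoids that linear scan on every index. The cost is more memory for mostly-ASCII text.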
PeterWood 11-Sep-2009 [17507x2] | The contents of the binaries are fine, but their format is a problem. Sorry, I forgot about that when I suggested trying them. |
I tested your show.cgi with Apache on OS X. It runs fine and displays the expected output: GET 10.0.1.198 | |
Pekr 11-Sep-2009 [17509] | Should I test with Apache too? I don't think Cheyenne is the problem though. But I already downloaded WAMP, so I will unpack it and check over the weekend ... |
Maxim 11-Sep-2009 [17510x5] | Possibly the Windows version defaults to 16 bits more quickly than the Linux and OS X versions... :-/ |
'cause IIRC the Linux shell doesn't expect Unicode as much as the Windows console does. | |
(as per past reading on the R3 blogs and previous discussions about this) | |
Probably why people say that CGI isn't working on Windows. | |
Or maybe the Windows console (or some versions of the OS) doesn't understand UTF-8 at all, just 8- or 16-bit Unicode... which could explain why the Windows version dumps to stdout in 16 bits all the time. :-( | |
PeterWood 11-Sep-2009 [17515] | As I understand it, the Windows console only handles single-byte encodings (i.e. Windows code pages). |
BrianH 11-Sep-2009 [17516] | Windows Unicode works in UTF-16. Linux and OSX work in UTF-8. |
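A comparison of the two encodings of a pure-ASCII header line shows why a UTF-16 stdout would break CGI. This Python sketch assumes, as Maxim suspects above, that the Windows build writes stdout as UTF-16 (little-endian, as the Windows wide-char APIs use). It does not show what R3 itself does:

```python
header = "Content-type: text/html\n"

as_utf8 = header.encode("utf-8")        # what Linux/OS X consumers expect
as_utf16 = header.encode("utf-16-le")   # UTF-16 little-endian, no BOM

print(as_utf8[:4])                      # b'Cont'
print(as_utf16[:8])                     # b'C\x00o\x00n\x00t\x00'
print(len(as_utf8), len(as_utf16))      # 24 48
```

A web server expecting ASCII headers would receive zero bytes interleaved between the characters.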
PeterWood 11-Sep-2009 [17517] | Pekr: One difference when I ran the CGI was that I used the -c option, not the -q option. Perhaps you could try the -c option in case Carl has done something under the surface about character encoding. |
Pekr 11-Sep-2009 [17518] | Peter - it is the same with both options, -c and -q ... |
BrianH 11-Sep-2009 [17519] | When last I heard, CGI wasn't working on Windows yet. Thanks for the info - now I know why. |
Maxim 11-Sep-2009 [17520x2] | Yep, it's pretty clear now :-) |
Maybe a CGI-specific version of print could be added as a mezz that handles the encoding properly, so that console and CGI printing both work on all distros without needing to change the source. | |
BrianH 11-Sep-2009 [17522] | Maybe there's a trick that --cgi could do with I/O ports. |
Maxim 11-Sep-2009 [17523x4] | Ah yes... --cgi could just tell the core not to apply the UTF-16 encoding on stdout... |
But if we need to output Latin-1 afterwards (while dumping the HTML content, for example), the output encoding should be selectable as a "current default", and all --cgi would do is set that default to, for example, UTF-8. | |
Since, AFAICT, the PRINT native already encodes the internal string! representation to whatever the host needs, choosing that encoding manually would simplify porting to other platforms, because the default host code would already have this flexibility covered. | |
And some systems pipe stdout to push it remotely to other systems... which may expect a different encoding than the one used by the local engine. I've had this situation in my render-farm management software, as a real-life example. | |
BrianH 11-Sep-2009 [17527] | The trick is that the headers are pushed in ASCII, but the contents in whatever binary encoding the headers specify. |
Maxim 11-Sep-2009 [17528x2] | Yep... which is why it should be switchable, since REBOL now does the encoding for us. :-) |
Some systems, like RSS, even support multiple encodings in the same XML document! | |
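The following Python sketch shows roughly how the switchable default could combine with BrianH's header/body split. It illustrates the proposal only. `EncodedPrinter` and `cgi_page` are hypothetical names, not an existing REBOL or Python API, and ISO-8859-2 is just an example body charset (it covers Czech). A real CGI script would pass `sys.stdout.buffer` instead of the in-memory buffer:

```python
import io
from typing import BinaryIO, Optional


class EncodedPrinter:
    """Hypothetical print with a switchable "current default" output encoding."""

    def __init__(self, stream: BinaryIO, encoding: str = "ascii") -> None:
        self.stream = stream          # raw byte stream, e.g. sys.stdout.buffer
        self.encoding = encoding      # current default used by every print call

    def print(self, text: str, encoding: Optional[str] = None) -> None:
        # Encode explicitly so no host-specific conversion (e.g. UTF-16) sneaks in.
        self.stream.write((text + "\n").encode(encoding or self.encoding))


def cgi_page(out: EncodedPrinter, body: str, charset: str) -> None:
    out.encoding = "ascii"                                  # HTTP headers are plain ASCII
    out.print(f"Content-Type: text/html; charset={charset}")
    out.print("")                                           # blank line ends the headers
    out.encoding = charset                                  # body uses the declared charset
    out.print(body)


if __name__ == "__main__":
    buf = io.BytesIO()                                      # stand-in for sys.stdout.buffer
    cgi_page(EncodedPrinter(buf), "<p>\u017dlu\u0165ou\u010dk\u00fd k\u016f\u0148</p>", "iso-8859-2")
    print(buf.getvalue())
```

The script holds the encoding as explicit state rather than letting the host choose it. This is what would let the same script behave the same way on Windows, Linux and OS X.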
Pekr 11-Sep-2009 [17530] | How is it that Linux and OS X don't experience any problems? They do use UTF-8, but that is not ASCII either, no? |
Maxim 11-Sep-2009 [17531x2] | UTF's lower 128 codes (0-127) are the same as ASCII and single-byte, so if you don't use special chars, or the null char, you are basically dumping ASCII... This is the reason for its existence. |
(UTF-8) | |
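Maxim's ASCII-compatibility point can be checked exhaustively. This short Python sketch confirms that every code point from 0 to 127 encodes to the same single byte in ASCII and UTF-8. It also shows that non-ASCII characters (here é and two Czech letters) become multi-byte sequences in which every byte has the high bit set, so they can never be mistaken for ASCII:

```python
# Code points 0-127 encode to the very same single byte in ASCII and UTF-8,
# which is why pure-ASCII output is already valid UTF-8.
for cp in range(128):
    ch = chr(cp)
    assert ch.encode("utf-8") == ch.encode("ascii") == bytes([cp])

# Beyond ASCII, UTF-8 uses multi-byte sequences in which every byte is >= 0x80,
# so no byte of a "special char" can be confused with an ASCII character.
for ch in ("\u00e9", "\u010d", "\u0159"):   # é, č, ř
    encoded = ch.encode("utf-8")
    assert len(encoded) > 1 and all(b >= 0x80 for b in encoded)
    print(f"U+{ord(ch):04X}", encoded.hex().upper())
```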
Pekr 11-Sep-2009 [17533] | Hmm, and why does Windows use UTF-16? Is it because the Windows console defaults to UTF-16? |
Maxim 11-Sep-2009 [17534x3] | Probably it doesn't even support UTF-8 in any way. |
IIRC the whole Windows API is either ASCII or UTF-16. | |
FYI: I posted a note on the September plan where I proposed a clear meaning for the different releases and detailed a better approach to deliveries, where different versions run concurrently, each one addressing a different aspect of the release cycle... This is how releases are handled at the larger places I have worked at... Maybe Carl could comment on this!? | |
Pekr 11-Sep-2009 [17537x4] | Apple open-sourced Grand Central - I wonder if they use a good concurrency system, so we could copy the mechanism :-) http://www.osnews.com/story/22152/Apple_Releases_Grand_Central_Dispatch_as_Open_Source |
Max - your four-version release plan sounds complicated to me. You see? Carl is not reacting to those things. IMO he thought he would put R3 into beta in a few weeks, and knowing him, he might even do so; that is what I fear. I agree that until some things are stable enough, there is no point in going to beta. | |
On the other hand, I know Doc would like to use R3 to port Cheyenne. But the concurrency model can't be rushed in, IMO. Why couldn't it come in 3.1? We still have lots of stuff to test; e.g. I think the module system has not been properly tested by developers yet. Rushing concurrency into R3 now might also make testing difficult - modules plus concurrency ... | |
But the situation is more difficult. If we want people to start using R3, we have to provide them with at least R2 functionality, and we are far from that. IMO no one has given proper thought to networking protocols. My question is - would they benefit from the concurrency model? If so, we should definitely stay in alpha and work toward a really feature-complete Core (kernel) first ... | |
Maxim 11-Sep-2009 [17541x3] | Which is why I say that releasing a beta with the stuff that is already working is a good idea, while continuing to work on an alpha that lets us give valuable feedback on stuff to come. CureCode bug squashing concentrates on the beta version; design and core implementation are limited to alpha/experimental versions. |
This way he can let out an experimental version which has threads and let any of us with the know-how and use cases test extensions and concurrency in parallel, while he continues to squash beta bugs. | |
The problem with RT is the parallelism of the development. It's already 100 times better than it was with R2, since we are giving Carl & friends hundreds of man-hours of testing which they couldn't perform anyway. | |