
AltME groups: search


results summary

world: hits
r4wp: 40
r3wp: 321
total: 361

results window for this page: [start: 201 end: 300]

world-name: r3wp

Group: Core ... Discuss core issues [web-public]
PeterWood:
10-Apr-2009
Not yet. It is part of some encoding utilities that I am writing 
to help resolve the character encoding issues in REBOL.org. I have 
a number of other conversion functions to write. I will then publish 
them on REBOL.org.
Sunanda:
30-May-2009
Peter's code detects the encoding, and can do several conversions 
between encoding types:
http://www.rebol.org/view-script.r?script=str-enc-utils.r
Graham:
8-Aug-2009
But if I do a wireshark trace, I see this

GET /20090806.7z HTTP/1.0
Accept: */*
Connection: close
User-Agent: REBOL View 2.7.6.3.1
Host: remr.s3.amazonaws.com

HTTP/1.0 403 Forbidden
Date: Sat, 08 Aug 2009 21:08:07 GMT
Content-Type: application/xml
x-amz-request-id: D03B3FA12CC875D5

x-amz-id-2: u3b7TkPzJc5NBwvov4HRQuMsCsosD7le9xfRMSGiCN2BXgeae6kKMVQAbhzqRDwY
Server: AmazonS3
Via: 1.1 nc1 (NetCache NetApp/6.0.5P1)

<?xml version="1.0" encoding="UTF-8"?>

<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>D03B3FA12CC875D5</RequestId><HostId>u3b7TkPzJc5NBwvov4HRQuMsCsosD7le9xfRMSGiCN2BXgeae6kKMVQAbhzqRDwY</HostId></Error>
BrianH:
30-Jan-2010
invalid-utf?: funct [

 "Checks for proper UTF encoding and returns NONE if correct or position 
 where the error occurred."
	data [binary!]
	/utf "Check encodings other than UTF-8"
	num [integer!] "Bit size - positive for BE negative for LE"
] compose [
	ascii: (charset [#"^(00)" - #"^(7F)"])
	utf8+1: (charset [#"^(C2)" - #"^(DF)"])
	utf8+2: (charset [#"^(E0)" - #"^(EF)"])
	utf8+3: (charset [#"^(F0)" - #"^(F4)"])
	utf8rest: (charset [#"^(80)" - #"^(BF)"])
	switch/default any [num 8] [
		8 [ ; UTF-8
			unless parse/all/case data [(pos: none) any [
				pos: ascii | utf8+1 utf8rest |
				utf8+2 2 utf8rest | utf8+3 3 utf8rest
			]] [as-binary pos]
		]
		16 [ ; UTF-16BE
			pos: data
			while [not tail? pos] [
				hi: first pos
				case [
					none? lo: pick pos 2 [break/return pos]
					55296 > w: hi * 256 + lo [pos: skip pos 2]  ; #{D800}
					57343 < w [pos: skip pos 2]  ; #{DFFF}
					56319 < w [break/return pos]  ; #{DBFF}
					none? hi: pick pos 3 [break/return pos]
					none? lo: pick pos 4 [break/return pos]
					56320 > w: hi * 256 + lo [break/return pos]  ; #{DC00}
					57343 >= w [pos: skip pos 4]  ; #{DFFF}
				]
				none
			] ; none = valid, break/return pos = invalid
		]
		-16 [ ; UTF-16LE
			pos: data
			while [not tail? pos] [
				lo: first pos
				case [
					none? hi: pick pos 2 [break/return pos]
					55296 > w: hi * 256 + lo [pos: skip pos 2]  ; #{D800}
					57343 < w [pos: skip pos 2]  ; #{DFFF}
					56319 < w [break/return pos]  ; #{DBFF}
					none? lo: pick pos 3 [break/return pos]
					none? hi: pick pos 4 [break/return pos]
					56320 > w: hi * 256 + lo [break/return pos]  ; #{DC00}
					57343 >= w [pos: skip pos 4]  ; #{DFFF}
				]
				none
			] ; none = valid, break/return pos = invalid
		]
		32 [ ; UTF-32BE
			pos: data
			while [not tail? pos] [
				if any [
					4 > length? pos
					negative? c: to-integer pos
					1114111 < c  ; to-integer #{10FFFF}
				] [break/return pos]
				pos: skip pos 4  ; advance, or the loop never terminates
			]
		]
		-32 [ ; UTF-32LE
			pos: data
			while [not tail? pos] [
				if any [
					4 > length? pos
					negative? c: also to-integer reverse/part pos 4 reverse/part pos 4
					1114111 < c  ; to-integer #{10FFFF}
				] [break/return pos]
				pos: skip pos 4  ; advance, or the loop never terminates
			]
		]
	] [
		throw-error 'script 'invalid-arg num
	]
]

; Note: Native in R3, which doesn't support or screen the /utf option yet.

; See http://en.wikipedia.org/wiki/Unicode for charset/value explanations.
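BrianH's PARSE rule encodes the legal UTF-8 lead-byte and continuation-byte ranges directly. As a cross-check, here is a minimal Python sketch (an illustration, not code from the thread) reporting the same kind of result - None for valid input, the byte offset of the error otherwise - using Python's built-in decoder, which also rejects overlong forms:

```python
def invalid_utf8(data: bytes):
    """Return None if data is valid UTF-8, else the byte offset of the error."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start

assert invalid_utf8(b"hello") is None
assert invalid_utf8("£".encode("utf-8")) is None
assert invalid_utf8(b"\xc0\xaf") == 0   # overlong encoding of "/" is rejected
assert invalid_utf8(b"hi\xff") == 2     # 0xFF can never appear in UTF-8
```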
Geomol:
25-May-2010
This can be even more complicated when talking UTF encoding. Hm, 
who knows how R3 does this...
Andreas:
14-Jul-2010
I am mainly wondering how to regain a decimal representation from 
IEEE754 encoding.
Group: View ... discuss view related issues [web-public]
PeterWood:
30-Oct-2008
I've come across what seems to be an oddity with View on the Mac. It 
seems that the Rebol/View console is using UTF-8 encoding but that 
View is using MacRoman.
Group: !REBOL3-OLD1 ... [web-public]
Henrik:
27-Jan-2008
I think the plan is to release a new alpha once the unicode changes 
are done so you can get to test it. We already have a Unicode version 
internally, but it contains only a few of the required changes. Carl 
reviewed the CHECKSUM and ENCLOAK functions yesterday and mentioned 
how CHECKSUM is now binary only. It won't work on strings directly 
anymore, because encoding issues would make it work incorrectly.
BrianH:
25-Jul-2008
That bug can't be fixed without the string-to-binary encoding and 
decoding infrastructure being there. Those native functions don't 
exist yet because their design is not finalized.
BrianH:
25-Jul-2008
For that matter, I recall that there was some talk of changing the 
SAVE and LOAD functions completely. It is an unresolved design issue, 
unless Carl's current work includes string encoding and decoding 
as well.
BrianH:
25-Jul-2008
So to answer Louis' question: Not yet, as far as we know. The data 
structures for Unicode strings are there, as are UTF-8 word! values, 
but binary encoding and decoding is not yet there, and there are 
some limits to Unicode input and output (mostly due to the Windows 
console). The encoding/decoding work seems likely to get done as 
a part of Carl's GUI work, as that will probably include text display. 
The console IO limits are likely to remain until the written-in-REBOL 
GUI console is adopted.
PeterWood:
28-Oct-2008
So does this mean that the graphics library is still treating a string 
as being 8-bit encoded?  No doubt according to the current Windows 
codepage?


does READ-STRING convert  utf-8 to whatever 8-bit encoding the graphics 
library is using?
BrianH:
28-Oct-2008
As far as your code is concerned, a string! will be a series of Unicode 
codepoints. Internally, who cares? The implementation of string! 
is likely to be the same as the native implementation on the platform 
is running on, or whatever is more efficient. I think that string! 
is now UTF-16 on Windows, and the symbols behind word! values are 
internally UTF-8.


Still, it doesn't matter what strings are internally because AS-STRING 
and AS-BINARY are gone. All string-to-binary conversions will need 
encoding. REBOL scripts are going to be UTF-8 encoded though, as 
I recall.
BrianH:
28-Oct-2008
READ-STRING is a temporary function; it is intended to be replaced 
by a full encoding and decoding infrastructure supporting multiple 
formats and encodings. Until then, we have READ-STRING and WRITE-STRING.
Henrik:
29-Oct-2008
h264 realtime encoding is CPU intensive :-)
BrianH:
31-Oct-2008
Gabriele, cool, I was just concerned about speed. I suppose calls 
to external APIs are likely to be less frequent than internal manipulations, 
and UCS encoding would make the internal code faster. Either way 
I'm sure that it will be handled :)
PeterWood:
1-Jan-2009
Not trusting non-ascii characters implies that the current design 
of RebDev is "ignorant" of character encoding. If that is the case, 
it is a shame, as RebDev could have been a great example of an "up-to-date" 
application built with R3.
PeterWood:
1-Jan-2009
Even if the server is running on R2, all the strings could be stored 
with a consistent encoding method, such as ISO-8859-1. Of course, 
there'd be a lot of work detecting the client encoding method and 
converting all input strings to the chosen consistent method. Most 
of this work would be needed even if the server supported Unicode 
strings.
PeterWood:
1-Jan-2009
Personally, I think ignoring character encoding does say something 
about the design of RebDev.
BrianH:
2-Jan-2009
That would have to be the case with R2 clients, as the client is 
the part that handles character encoding. However, there are no R2 
clients yet. The messages appear to be UTF-8 encoded end-to-end, 
stored in binary on the server, which is encoding agnostic. Once 
we have R2 clients, they will have to handle the codepage-to-UTF-8 
encoding, or just stick to ASCII.
BrianH:
2-Jan-2009
And yes, it does say something about the design of RebDev, that character 
encoding issues of R2 won't affect it, by design.
Reichart:
2-Jan-2009
This is one of those things where a picture is worth a thousand words. 
 We need a diagram of the hardware and software set up, and show 
WHERE encoding becomes a problem.

For example, if you paste some text from a Word doc into a web browser, 
this then gets moved to the server.  Then it gets rendered out again... you 
will run into problems with encoding.

Word uses some SPECIAL encoding for things like " : - and '
Reichart:
3-Jan-2009
Gab, not an issue of "fault", I'm simply modeling examples of problems 
I see on dozens of websites, due to encoding "issues".  Don't care 
where the fault is, just that we need better black box tools for 
dealing with it.
PeterWood:
3-Jan-2009
Reichart: From my point of view, the root of the problem is not so 
much that Word replaces certain key sequences with other characters, 
but one of character encoding. The text will look okay on your machine 
but, unless it is correctly converted, may display incorrectly on other 
machines.


As I understand it, Rebol/View uses the user's default "codepage" on 
Windows and MacRoman encoding on Mac. AltME doesn't take into account 
the different text encodings, so when I type £ (a British 
pound sign) you will probably see something different.
Sunanda:
3-Jan-2009
REBOL.org shows a ? because it blindly emits all AltME pages as charset=utf-8.

If you change your default encoding for the page (this works in 
Firefox: View / Character Encoding / Western ISO-8859-1) then:
-- Peter's post shows a GBP [for his char 163]
-- Chris' post shows a 1/2 [for his char 189]
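What Sunanda describes can be reproduced in a couple of lines of Python (an illustration, not code from the thread): byte 163 is "£" in ISO-8859-1, but a lone 0xA3 byte is not valid UTF-8, so a page served as charset=utf-8 can only show a replacement character:

```python
raw = bytes([163])                                  # Peter's char 163
assert raw.decode("iso-8859-1") == "£"              # ISO-8859-1: the pound sign
assert raw.decode("utf-8", "replace") == "\ufffd"   # UTF-8: replacement char

assert bytes([189]).decode("iso-8859-1") == "½"     # Chris' char 189
```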
Reichart:
3-Jan-2009
Peter....I'm confused....

Neither Word nor REBOL has anything to do with the problem....


Encoding problems happen on hundreds of websites (big, popular websites) 
that do not use REBOL, and where Word is not the source.


I'll state again... we need strong clear black box logic that unifies 
all character maps (yeah, all).

We need a single unified character system.
PeterWood:
4-Jan-2009
Reichart ... you are right, the problem is one of encoding. My point 
is that because Rebol/View uses different encoding systems on different 
platforms, it is left to the application to either ignore the encoding 
differences or handle them.


This may be quite difficult if, as Chris indicated, it is not possible 
to determine which Windows Codepage is in use from Rebol/View. 


There is a single unified character system (Unicode), but there 
are at least five different ways of representing it (UTF-8, UTF-16LE, 
UTF-16BE, UTF-32LE & UTF-32BE). Standardisation is a long way off.
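The five representations Peter lists differ only in how they serialize the same codepoints. A quick Python illustration (not from the thread), using U+00A3 (£):

```python
ch = "\u00a3"  # £
# Serialize one codepoint in each of the five Unicode transformation formats.
forms = {enc: ch.encode(enc).hex() for enc in
         ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")}
assert forms == {
    "utf-8":     "c2a3",        # two-byte UTF-8 sequence
    "utf-16-be": "00a3",        # 16-bit code unit, big-endian
    "utf-16-le": "a300",        # same unit, byte-swapped
    "utf-32-be": "000000a3",    # 32-bit code unit, big-endian
    "utf-32-le": "a3000000",    # same unit, byte-swapped
}
```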
BrianH:
21-Jan-2009
However, RIF was intended to store its data in Rebin format (binary 
encoding of REBOL values).
BrianH:
21-Jan-2009
I meant the function SCRIPT? not your text encoding :)
BrianH:
10-Apr-2009
Codecs are like port schemes, but for encoding and decoding. Different 
thing.
Geomol:
16-Apr-2009
I'm having second thoughts about auto encoding. The reason is, if url! 
has auto encoding of some characters, then it would be expected 
that e.g. file! auto encodes too. How do you specify a file named 
% on disk? In R3, you write: %%25

If the % char should be auto encoded, then you should write that 
filename as: %%
But what if your file is named %25 on disk?
It's a bit confusing, but what is the best way? Encoding or not?
Geomol:
16-Apr-2009
Actually file! does have auto encoding of space. You can specify 
a filename like this:
%"a b"
which will give %a b
So maybe auto encoding is a good thing in general?
Oldes:
16-Apr-2009
Actually the auto encoding was causing me some problems some time 
ago. I'm not sure if it was fixed. Also with the auto encoding of urls 
there is a problem: for example, the second @ char in Pekr's 
url must not be encoded.
Gabriele:
17-Apr-2009
Geomol: PLEASE NO!!!! The bug that REBOL has is exactly THAT. I beg 
you guys, please NO! Encoding is there for a reason. If it could 
be done automatically, there would be no need for encoding!
Geomol:
17-Apr-2009
Gabriele, so you mean auto encoding should be avoided? Should auto 
encoding be removed from these examples:

>> %"a b"
== %a b
>> a%[b-:-c]
== [a%25b-:-c]
>> a<>[b-:-c]
== [a%3C%3Eb-:-c]


My view is that there is a lot of auto encoding already. If auto 
encoding should be there, it should be done right in all cases. Else 
it should be avoided altogether. This situation - with some auto 
encoding in some cases but not all - is not good.
Geomol:
17-Apr-2009
I guess, auto encoding is user-friendly, if it can be done right 
in all cases. With auto encoding, you don't have to remember all 
the strange encoding rules for different datatypes (especially url 
and email).


No auto encoding is technical-programmer-friendly. It's for the programmer, 
who knows all the strange rules and want complete control.


It goes beyond url and email. How should a space be represented in 
an issue! datatype? Like:

>> to-issue "a b"
== #a?b

Today you just see a question mark, but it's a space in there.
Oldes:
17-Apr-2009
Geomol, yes. I would like to avoid auto encoding. It's exactly the 
case where I had the problems. If I write a file as %"a b" and it's 
a valid file, I prefer to have it the same when I, for example, print it.
Oldes:
17-Apr-2009
Instead of auto encoding I would like to see basic functions 
like an official url-encode present in Rebol. (Of course we have our 
own - another %user.r usage)
Geomol:
17-Apr-2009
I understand the concern against auto encoding. But without it, and 
with all the datatypes we have in REBOL, good documentation about 
what encoding we have to use for every datatype is required.
BrianH:
17-Apr-2009
I don't mind the ? issue! display in this case, but I'd like MOLD/all 
issue! to return a serialized encoding like: #[issue! "a b"]
BrianH:
17-Apr-2009
In general I prefer simple encoding syntax rules over autoencoding, 
because it is easier to remember explicit rules than it is to remember 
magic dwim patterns.
BrianH:
17-Apr-2009
Gabriele, RFC compliance of url encoding is important and will be 
fixed in upcoming R3 releases, even if I have to fix it myself. R2 
as well if I end up being the R2 release manager (it's possible).
Geomol:
18-Apr-2009
Are you having a bad week?


I'm in doubt about auto encoding, whether it's a good idea or not. 
And I talk in general, not just one datatype. In R3, you can use 
% in an email:

>> a%[b-:-c]
== [a%25b-:-c]


I first thought, it was an error. After some talk here, I realized, 
it's auto encoding of the % character. In R2, you have to write the 
encoding yourself:

>> [a%25b-:-c]
== a%[b-:-c]


So it's the other way around between R2 and R3. Clearly Carl tries 
to make REBOL smart - to make it figure out what the programmer means. 
In general with computers, I tend to dislike the systems that try 
to be smart if they don't get it 100% correct in every situation 
(Windows), and I like the systems that don't try to be smart 
but put the user in charge (Amiga).


So at this point, I think, auto encoding should be avoided. And avoid 
it in all datatypes, not just url. I may change my mind, if auto 
encoding can be done 100% correct in all datatypes. For url, it would 
mean e.g. this:

>> to url! "ftp://[me-:-inter-:-net]:[pass-:-server-:-net]"
== ftp://me%40inter.net:[pass-:-server-:-net]


So my question is, can auto encoding be done 100% correct for all 
datatypes? If not, avoid it. If auto encoding should be there in 
some cases but not all, I would like to hear the arguments for that.
Gabriele:
19-Apr-2009
I'm talking about "escaping", while you use the term "encoding" ambiguously 
to mean both encoding and escaping. THEY ARE TWO DIFFERENT THINGS.
Gabriele:
21-Apr-2009
Now, if your array was representing a url, you could encode it to 
UTF-8 using the % encoding as well to stay in the ascii subset. This 
is encoding, but still, it will not solve your @ problem. each @ 
in the array of integers will become an @ (which is an ascii char) 
in the final string.
Gabriele:
21-Apr-2009
it is in your *source array* (re: shouting, i just want to give emphasis 
but we don't have rich text, and the * thing does not work very well 
for long text) that you must distinguish between @ (the field separator) 
and % 4 0 (an escaped @, part of the url field text). There is no 
encoding process that can *automatically* go from your array of integers 
to the correct url string.
Geomol:
21-Apr-2009
Maybe we got unicode encoding end escape encoding confused.


As I see it, given correct rules, auto converting of user input to 
correct url can be achieved. I made this function to illustrate, 
what I mean (it's not optimized, but should be easy to read):

encode-url: func [input /local url components host] [
	components: parse input "@"
	host: back tail components

	url: clear ""
	append url components/1
	components: next components

	forall components [
		either components = host [
			append url "@"
			append url components/1
		][
			append url "%40"
			append url components/1
		]
	]
	url
]


I can use it both with and without specifying %40 for the first @ 
in the url:

>> encode-url "ftp://[name-:-home-:-net]:[pass-:-server-:-net]"
== "ftp://name%40home.net:[pass-:-server-:-net]"
>> encode-url "ftp://name%40home.net:[pass-:-server-:-net]"
== "ftp://name%40home.net:[pass-:-server-:-net]"


It will give correct result in both cases (I use strings, but of 
course it should be url! datatype in REBOL). Now comes unicode. Given 
precise rules, how that should happen, I see no problem with encoding 
this in e.g. UTF-8.


So I think it's possible to do this correctly. But maybe it's better 
to keep it simple and not do such auto conversions. In any case, 
the behaviour needs to be well documented, so users can figure out 
how to create a valid url. I had the same problem as Pekr years ago, 
and I missed documentation of that.
Geomol:
21-Apr-2009
unicode encoding *and* escape encoding
sqlab:
21-Apr-2009
I think it is good to have a flexible encoding method, but it should 
not be invoked automatically.
BrianH:
27-Apr-2009
ReBin - binary encoding for REBOL values. Carl is working on it now 
- as the new host interfaces require it. We will have it very soon.
BrianH:
7-Jul-2009
Having a margin of error is standard operating procedure for IEEE754 
floating point numbers, because anything over 15 digits is subject 
to rounding errors inherent in the encoding.
BrianH:
7-Jul-2009
For instance:
>> 0.3 < (0.1 + 0.1 + 0.1)
== false
>> 0.3 <= (0.1 + 0.1 + 0.1)
== true


Those values differ in the greater-than-15-digits range due to encoding 
errors.
BrianH:
7-Jul-2009
Those encoding errors are inherent in the IEEE754 format. The standard 
way to work around this is to not consider differences in the past-15-digits 
range. This is the case for all sorts of systems.
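The same sum in Python, where comparisons are exact rather than tolerant as in REBOL, makes the encoding error BrianH describes directly visible; math.isclose applies the kind of past-15-digits margin he mentions (an illustration, not code from the thread):

```python
import math

s = 0.1 + 0.1 + 0.1
assert repr(s) == "0.30000000000000004"  # the error past the 15th digit
assert s != 0.3                          # an exact comparison sees it
assert math.isclose(s, 0.3)              # a tolerant comparison does not
```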
BrianH:
9-Jul-2009
It's about finding UTF-8 encoding errors, particularly the overlong 
forms that are used for security breaches. We can't do that check 
in TO-STRING because of the overhead (+50%), but it can still be 
a good idea to check in some cases, and the code is better written 
in C than REBOL.
Maxim:
11-Sep-2009
optionally encoding them in ascii first... http headers are ascii.
Maxim:
11-Sep-2009
it being so old, it's possible the default encoding was still ASCII 
at that point.
Pekr:
11-Sep-2009
REBOL 3.0 accepts UTF-8 encoded scripts, and because UTF-8 is a superset 
of ASCII, that standard is also accepted.

If you are not familiar 
with the UTF-8 Unicode standard, it is an 8 bit encoding that accepts 
ASCII directly (no special encoding is needed), but allows the full 
Unicode character set by encoding them with characters that have 
values 128 or greater.
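Pekr's point, shown concretely (a Python illustration, not from the thread): ASCII bytes pass through UTF-8 unchanged, while codepoints of 128 and above become multi-byte sequences:

```python
assert "REBOL".encode("utf-8") == b"REBOL"          # pure ASCII: bytes identical
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"      # é (U+00E9): two bytes
assert "\u20ac".encode("utf-8") == b"\xe2\x82\xac"  # € (U+20AC): three bytes
```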
Maxim:
11-Sep-2009
maybe Peter's excellent encoding script on rebol.org could be used 
as a basis for converting between ascii -> utf8 when using R3 binary 
as an input, while R3 has them built in.
PeterWood:
11-Sep-2009
Pekr: Just try a quick test with: 
 print to binary! "Content-type: text/html^/"
 print to binary! get-env "REQUEST_METHOD"
 print to binary! get-env "QUERY_STRING"
 print to binary! get-env "REMOTE_ADDR"

to see if it is an encoding problem.
Maxim:
11-Sep-2009
but the loading actually does a re-encoding.  utf-8 is compact, but 
it's slow because you cannot skip; you must traverse the string char 
by char.  which is why they are internally converted to 8 or 16 bit 
unicode chars... it seems strings become 16 bits a bit too often 
(maybe a change in later releases, where they are always converted 
to 16 bits for some reason).
PeterWood:
11-Sep-2009
As I understand it the Windows console only handles single-byte encoding 
(ie Windows CodePages).
PeterWood:
11-Sep-2009
Pekr: One difference when I ran the cgi was that I used the -c option 
not the -q option. Perhaps you could try with the -c option in case 
Carl has done something under the surface about character encoding.
Maxim:
11-Sep-2009
maybe a cgi-specific version of print could be added as a mezz which 
handles the proper encoding issues to make sure that console and 
cgi printing are both functional on all distros without needing to 
change the source.
Maxim:
11-Sep-2009
ah yess.. --cgi could just tell the core to prevent the UTF-16 encoding 
being done on stdout...
Maxim:
11-Sep-2009
but if we need to output latin-1 afterwards (while dumping the html 
content, for example), the output encoding  should be selectable 
as a "current default", and all the --cgi would do is set that default 
to UTF-8 for example.
Maxim:
11-Sep-2009
and some systems pipe the std to have it pushed remotely to other 
systems... which can expect a different encoding than what is being 
used by the local engine... I've had this situation in my render-farm 
management software, as a real-life example.
BrianH:
11-Sep-2009
The trick is that the headers are pushed in ASCII, but the contents 
in whatever binary encoding the headers specify.
Maxim:
11-Sep-2009
yep... which is why it should be switcheable since rebol now does 
the encoding for us.  :-)
Maxim:
23-Sep-2009
you must realize that the format  of a document (encoding of the 
layout) isn't directly  tied to its content.
BrianH:
8-Oct-2009
Any encoding is none of the business of the CGI channel - it is a 
matter between the script and the client.
Maxim:
30-Oct-2009
I also think the "default" user text format should be configurable. 
  I have absolutely no desire to start using utf-8 for my code and 
data, especially when I have a lot of stuff that already is in iso 
latin-1 encoding.
Maxim:
30-Oct-2009
but for data, I would like to have default encoding of my choice.
PeterWood:
30-Oct-2009
Loading programs are not totally immune from encoding problems. An 
unlikely but possible example:

if name = "Ashley TrŸter" [print "Hello Ashley"]
Maxim:
30-Oct-2009
handling encoding is complex in any environment... I had a lot of 
"fun" handling encodings in php, which uses such a unicode datatype... 
it's not really easier... 'cause you can't know from the text if it's 
unicode or ascii or binary values unless you tell it to load a sequence 
of bytes AS one or the other.
Maxim:
30-Oct-2009
cause there is just ONE encoding.
Maxim:
30-Oct-2009
but having some kind of default for read/write could be useful, 
instead of having to add a refinement all the time, and force a script 
to expect a specific encoding.
Maxim:
30-Oct-2009
then it would be easier to change it in one place and do all I/O without 
the refinement.  and less work for another to change encoding for 
the whole app, without having to put conditionals every time we use read/write.
Maxim:
30-Oct-2009
I put a suggestion on the blog about allowing user-created encoding 
maps... otherwise, you can load it as binary in R3 and just convert 
the czech chars to utf-8 multi-byte sequences and convert the binary 
to string using decode.
Maxim:
30-Oct-2009
is the czech encoding the standard windows ansi  encoding?
Maxim:
30-Oct-2009
R3 will interpret literal strings and decode them using utf-8 (or 
the header encoding, if it's supported) so in this case no.


but if the data is stored within binaries (equivalent to R2 which 
doesn't handle encoding) then, yes, since the binary represents the 
sequence of bytes not chars.


if you use a utf-8 editor, and type characters above 127 and look 
at them in  notepad, you will then see the UTF-8 byte sequences (which 
will look like garbled text, obviously).
Maxim:
30-Oct-2009
I don't know if R3 has a way of specifying the encoding literally... 
like  UTF8{}  UTF16{}  or WIN1252{} ... this would be nice.
Gabriele:
31-Oct-2009
Petr: notepad, as most windows stuff, uses utf-16. much easier to 
detect though, and R3 could do that (actually, didn't Carl just add 
that recently?) most "real" editors allow you to use whatever encoding 
you want, and definitely support utf-8.
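UTF-16 text on Windows normally starts with a byte-order mark, which is what makes Gabriele's "much easier to detect" point true. A minimal BOM sniffer sketched in Python (an illustration of the idea; R3's actual detection may differ):

```python
import codecs

def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in ((codecs.BOM_UTF8, "utf-8-sig"),
                      (codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if data.startswith(bom):
            return name
    return None

# Python's "utf-16" codec emits a BOM, so the sniffer identifies it.
assert sniff_bom("hi".encode("utf-16")) in ("utf-16-le", "utf-16-be")
assert sniff_bom(b"plain ascii") is None
```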
Gabriele:
1-Nov-2009
Max, maybe i was not clear. If your rebol scripts are latin1 by default, 
while my rebol scripts are utf-8 by default, when i send you a rebol 
script IT WILL NOT WORK in the same way in your machine. the *script*'s 
encoding *must* be a standard everyone agrees on. then, the script 
can do whatever it wants with the data, it's your fault if you make 
it so data cannot be exchanged easily among systems.
Maxim:
1-Nov-2009
although having an encoding parameter in the header would allow us 
to tell the interpreter in what format the text is without breaking 
anything.
Maxim:
1-Nov-2009
actually, it is a problem in R2.  if you store your code, and I open 
it with a different codepage version of windows... some letters will 
be skewed. 


In an application I wrote, I couldn't write out proper strings for 
the netherlands, as an example.


unicode is slowly becoming the standard for text... especially utf-8. 
 but yes, users have to be educated.  


within your apps, though, you can handle the encoding as you want... 
only the rebol sources have to be UTF-8.  as R3 matures, more encodings 
will most probably be included in string codecs to support 8-bit 
extended ascii from different areas of the world.


and even high-profile applications like Apple's iweb have issues 
with text encoding... so this is a problem for the whole industry 
& users to adapt to.
BrianH:
1-Nov-2009
Even if we had a text encoding header for R3, it would be a *bad* 
idea to ever use encodings other than UTF-8. So don't.
BrianH:
9-Nov-2009
From your code, it looks like this is the problem:
>> round/floor 3.3 / 1.1
== 2.0
>> 3.3 / 1.1
== 3.0


1.1 and 3.3 aren't exactly representable in IEEE754 encoding, so 
the 3.0 value you see is actually a little less than 3.0.
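The same effect reproduced in Python (an illustration, not code from the thread): neither 3.3 nor 1.1 is exactly representable in IEEE754, so the quotient lands just below 3.0 and flooring truncates it to 2:

```python
import math

q = 3.3 / 1.1
assert q < 3.0             # repr(q) shows a value just under 3.0
assert math.floor(q) == 2  # so the floor is 2, not 3
```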
Geomol:
19-Nov-2009
See e.g. http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters
Chris:
19-Nov-2009
Just percent encoding, of the percent symbol.
BrianH:
19-Nov-2009
Chris, url::%23 and url::# should not be the same. The purpose of 
percent encoding is to allow you to specify character values without 
them being treated as syntax. If you specify a # directly in an http 
url, for instance, it should be taken as the start of the anchor 
portion of the url. If you percent encode it, it shouldn't be an 
anchor.
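BrianH's distinction can be demonstrated with Python's urllib (an illustration, not code from the thread): a literal "#" is URL syntax (it starts the fragment), while "%23" is data that merely means "#":

```python
from urllib.parse import quote, unquote, urlsplit

assert quote("#") == "%23"     # percent-encoding the character
assert unquote("%23") == "#"   # and decoding it back

# A raw "#" starts the fragment; a percent-encoded one is just path data.
assert urlsplit("http://example.com/a#b").fragment == "b"
assert urlsplit("http://example.com/a%23b").fragment == ""
```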
BrianH:
20-Nov-2009
The main gotcha so far to the keep-encoded approach is whether INSERT 
and APPEND should do some magic percent encoding or not. It seems 
that it may be a better approach to just assume that the programmer 
knows what they are doing and just insert what they say to insert 
as is, as long as the url character set restrictions are met. This 
would mean that the programmer would need to handle their own percent 
encoding where needed, and that INSERT or APPEND would not do any 
encoding or decoding. Or perhaps some non-syntax characters, such 
as space, could be encoded by MOLD instead of rejected and DECODE-URL 
just adjusted to not freak out when it sees them. What do you think?
Maxim:
20-Nov-2009
I vote for NO automatic encoding.
Chris:
20-Nov-2009
I think I'd look for at least the following behaviour:

	>> url::%23#
	== url::%23#
	>> join url:: "%23#"
	== url::%23#

 >> join url:: " " ; space is not in the uri spec, so could arguably 
 be converted
	== url:: 
	>> read url::%23# ; dependent on the scheme, I guess
	== "GET %23"


The problem with magic percent encoding is with the special characters. 
 As it is now, it is impossible (so far as I can ascertain) to build 
an http url that encodes special characters eg "#=&%" - Twitter being 
a great case where an encoded # is integral to the service.  Given 
though that the list of special characters is short and well defined, 
perhaps they could be the exception to a magic encoding rule.
BrianH:
21-Nov-2009
The standard TELLS you when

 - No it doesn't. The standard doesn't cover R3 internals, not even 
 in a generic non-language-specific way. The "when" I was talking 
 about has nothing to do with the encoding itself - it has to do with 
 internal data formats.
BrianH:
16-Dec-2009
I mean 32-bit integer! type, not 32-bit binary encoding converting 
to the current 64-bit integer! type.
Pekr:
11-Jan-2010
hmm, but /as was proposed to specify just the type of encoding IIRC, 
not some other functionality ... some of us wanted /as to be more 
general, allowing you to specify a codec to decode. Codecs are so 
far inefficient (not streamed), because you have to read all data 
first, then pass it to encode/decode.

Carl never posted a resolution to the read/write case ....
Group: !Cheyenne ... Discussions about the Cheyenne Web Server [web-public]
Graham:
19-Aug-2009
this is the request


GET /md/creategoogledoc.rsp?gdoc=simple-letter.rtf&patientid=2832&encounter=none 
HTTP/1.1
Host: gchiu.no-ip.biz:8000

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) 
Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://gchiu.no-ip.biz:8000/md/Listgoogledocs.rsp
Cookie: RSPSID=QZPTPCZIWWMMYBKWHWRQETGM
Will:
19-Aug-2009
answer from the redirection:
HTTP/1.1 302 Moved Temporarily
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Wed, 19 Aug 2009 21:43:58 GMT

Set-Cookie: WRITELY_UID=001dfpwvx2b|928b9de9e7bf56448b665282fc69988b; 
Path=/; HttpOnly

Set-Cookie: GDS_PREF=hl=en;Expires=Sat, 17-Aug-2019 21:43:58 GMT;HttpOnly

Set-Cookie: SID=DQAAAHcAAAB0kldc4zZSC_0FoiL6efkWE11k9SQkAIn-N3WfAzIOVe1cM-remnLUtV3Z4M-BFRf5eknz7hr_U3YzW94nECo0-aDnpxrLGiBglWGN4VkfLr5Hh7t2XNyRCA3VWd005SfCmZ9D8-1MUltjRI8X56VLde5Wy8HD92gh-8YkJBJxQA;Domain=.google.com;Path=/;Expires=Sat, 
17-Aug-2019 21:43:58 GMT

Location: https://www.google.com/accounts/ServiceLogin?service=writely&passive=true&nui=1&continue=http%3A%2F%2Fdocs.google.com%2FDoc%3Fdocid%3D0AcdrOHdpKfrWZGZwd3Z4MmJfMnNxcDJkNmZu%26amp%3Bhl%3Den&followup=http%3A%2F%2Fdocs.google.com%2FDoc%3Fdocid%3D0AcdrOHdpKfrWZGZwd3Z4MmJfMnNxcDJkNmZu%26amp%3Bhl%3Den&ltmpl=homepage&rm=false
Content-Encoding: gzip
X-Content-Type-Options: nosniff
Content-Length: 325
Server: GFE/2.0
Will:
19-Aug-2009
more redirection:
HTTP/1.1 302 Moved Temporarily

Set-Cookie: WRITELY_SID=DQAAAHoAAADh80lBIw7e5Hg06TLEBgCY33XQGJ1aUH5OrCF_ir1xLwffKNaCqNdUL6qYfvgjNppDBI4lTNBSTjJWMG_Ze0_qJnveBCAtihBDFwBlOb-H7RlkfgJwM7pBbyKV7bm4M3mqUivD1emtpxgl32vG8CEP1poQ2479HQXrlobsp7Egzw;Domain=docs.google.com;Path=/;Expires=Thu, 
03-Sep-2009 21:43:59 GMT

Location: http://docs.google.com/Doc?docid=0AcdrOHdpKfrWZGZwd3Z4MmJfMnNxcDJkNmZu&amp%3Bhl=en&pli=1
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Date: Wed, 19 Aug 2009 21:43:59 GMT
Expires: Wed, 19 Aug 2009 21:43:59 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
Content-Length: 232
Server: GFE/2.0
Will:
19-Aug-2009
and then the target page:
HTTP/1.1 200 OK

Set-Cookie: WRITELY_SID=DQAAAHoAAADh80lBIw7e5Hg06TLEBgCY33XQGJ1aUH5OrCF_ir1xLwffKNaCqNdUL6qYfvgjNppDBI4lTNBSTjJWMG_Ze0_qJnveBCAtihBDFwBlOb-H7RlkfgJwM7pBbyKV7bm4M3mqUivD1emtpxgl32vG8CEP1poQ2479HQXrlobsp7Egzw;Domain=docs.google.com;Path=/;Expires=Thu, 
03-Sep-2009 21:43:59 GMT

Set-Cookie: GDS_PREF=hl=en;Expires=Sat, 17-Aug-2019 21:43:59 GMT;HttpOnly

Set-Cookie: user=; Expires=Tue, 18-Aug-2009 21:43:59 GMT; Path=/; 
HttpOnly

Set-Cookie: login=; Expires=Tue, 18-Aug-2009 21:43:59 GMT; Path=/; 
HttpOnly
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Wed, 19 Aug 2009 21:43:59 GMT
Content-Encoding: gzip
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
Server: GFE/2.0