AltME groups: search
results summary
world | hits |
r4wp | 40 |
r3wp | 321 |
total: | 361 |
results window for this page: [start: 201 end: 300]
world-name: r3wp
Group: Core ... Discuss core issues [web-public] | ||
PeterWood: 10-Apr-2009 | Not yet. It is part of some encoding utilities that I am writing to help resolve the character encoding issues in REBOL.org. I have a number of other conversion functions to write. I will then publish them on REBOL.org | |
Sunanda: 30-May-2009 | Peter's code detects the encoding, and can do several conversions between encoding types: http://www.rebol.org/view-script.r?script=str-enc-utils.r | |
Graham: 8-Aug-2009 | But if I do a wireshark trace, I see this

GET /20090806.7z HTTP/1.0
Accept: */*
Connection: close
User-Agent: REBOL View 2.7.6.3.1
Host: remr.s3.amazonaws.com

HTTP/1.0 403 Forbidden
Date: Sat, 08 Aug 2009 21:08:07 GMT
Content-Type: application/xml
x-amz-request-id: D03B3FA12CC875D5
x-amz-id-2: u3b7TkPzJc5NBwvov4HRQuMsCsosD7le9xfRMSGiCN2BXgeae6kKMVQAbhzqRDwY
Server: AmazonS3
Via: 1.1 nc1 (NetCache NetApp/6.0.5P1)

<?xml version="1.0" encoding="UTF-8"?> <Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>D03B3FA12CC875D5</RequestId><HostId>u3b7TkPzJc5NBwvov4HRQuMsCsosD7le9xfRMSGiCN2BXgeae6kKMVQAbhzqRDwY</HostId></Error> | |
BrianH: 30-Jan-2010 |

invalid-utf?: funct [
    "Checks for proper UTF encoding and returns NONE if correct or position where the error occurred."
    data [binary!]
    /utf "Check encodings other than UTF-8"
    num [integer!] "Bit size - positive for BE negative for LE"
] compose [
    ascii: (charset [#"^(00)" - #"^(7F)"])
    utf8+1: (charset [#"^(C2)" - #"^(DF)"])
    utf8+2: (charset [#"^(E0)" - #"^(EF)"])
    utf8+3: (charset [#"^(F0)" - #"^(F4)"])
    utf8rest: (charset [#"^(80)" - #"^(BF)"])
    switch/default any [num 8] [
        8 [ ; UTF-8
            unless parse/all/case data [(pos: none) any [
                pos: ascii | utf8+1 utf8rest | utf8+2 2 utf8rest | utf8+3 3 utf8rest
            ]] [as-binary pos]
        ]
        16 [ ; UTF-16BE
            pos: data
            while [not tail? pos] [
                hi: first pos
                case [
                    none? lo: pick pos 2 [break/return pos]
                    55296 > w: hi * 256 + lo [pos: skip pos 2] ; #{D800}
                    57343 < w [pos: skip pos 2] ; #{DFFF}
                    56319 < w [break/return pos] ; #{DBFF}
                    none? hi: pick pos 3 [break/return pos]
                    none? lo: pick pos 4 [break/return pos]
                    56320 > w: hi * 256 + lo [break/return pos] ; #{DC00}
                    57343 >= w [pos: skip pos 4] ; #{DFFF}
                ]
                none
            ] ; none = valid, break/return pos = invalid
        ]
        -16 [ ; UTF-16LE
            pos: data
            while [not tail? pos] [
                lo: first pos
                case [
                    none? hi: pick pos 2 [break/return pos]
                    55296 > w: hi * 256 + lo [pos: skip pos 2] ; #{D800}
                    57343 < w [pos: skip pos 2] ; #{DFFF}
                    56319 < w [break/return pos] ; #{DBFF}
                    none? lo: pick pos 3 [break/return pos]
                    none? hi: pick pos 4 [break/return pos]
                    56320 > w: hi * 256 + lo [break/return pos] ; #{DC00}
                    57343 >= w [pos: skip pos 4] ; #{DFFF}
                ]
                none
            ] ; none = valid, break/return pos = invalid
        ]
        32 [ ; UTF-32BE
            pos: data
            while [not tail? pos] [
                if any [
                    4 > length? pos
                    negative? c: to-integer pos
                    1114111 < c ; to-integer #{10FFFF}
                ] [break/return pos]
            ]
        ]
        -32 [ ; UTF-32LE
            pos: data
            while [not tail? pos] [
                if any [
                    4 > length? pos
                    negative? c: also to-integer reverse/part pos 4 reverse/part pos 4
                    1114111 < c ; to-integer #{10FFFF}
                ] [break/return pos]
            ]
        ]
    ] [
        throw-error 'script 'invalid-arg num
    ]
]
; Note: Native in R3, which doesn't support or screen the /utf option yet.
; See http://en.wikipedia.org/wiki/Unicode for charset/value explanations.
 | |
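BrianH's validator walks the input with the standard UTF-8 lead-byte classes (C2-DF for 2-byte, E0-EF for 3-byte, F0-F4 for 4-byte sequences) and reports where the first error is. A rough Python sketch of the same UTF-8 check follows; the function name and structure are illustrative, not from the original:

```python
def invalid_utf8(data: bytes):
    """Return the offset of the first invalid byte, or None when data is
    well-formed UTF-8. Uses the same lead-byte classes as the REBOL version:
    C2-DF (2-byte), E0-EF (3-byte), F0-F4 (4-byte), continuations 80-BF."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:            # ASCII
            need = 0
        elif 0xC2 <= b <= 0xDF:  # 2-byte lead (C0/C1 would be overlong)
            need = 1
        elif 0xE0 <= b <= 0xEF:  # 3-byte lead
            need = 2
        elif 0xF0 <= b <= 0xF4:  # 4-byte lead (F5 and above exceed U+10FFFF)
            need = 3
        else:                    # bare continuation or invalid lead byte
            return i
        for j in range(i + 1, i + 1 + need):
            if j >= n or not (0x80 <= data[j] <= 0xBF):
                return i         # truncated sequence or bad continuation
        i += 1 + need
    return None
```

Like the REBOL parse rule, this checks byte classes only; it does not reject every overlong or surrogate sequence (e.g. ED A0 80), which a fully strict decoder must also do.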
Geomol: 25-May-2010 | This can be even more complicated when talking about UTF encoding. Hm, who knows how R3 does this... | |
Andreas: 14-Jul-2010 | I am mainly asking about how to regain a decimal representation from IEEE754 encoding. | |
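Recovering a decimal value from raw IEEE754 bits can be sketched in Python with the standard struct module (the helper names here are mine, for illustration):

```python
import struct

def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE754 double (big-endian layout)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def double_to_bits(x: float) -> int:
    """Extract the raw 64-bit IEEE754 encoding of a double."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# 0x3FF0000000000000 is the canonical encoding of 1.0:
# sign 0, exponent 0x3FF (bias 1023 -> 2^0), mantissa all zero.
print(bits_to_double(0x3FF0000000000000))
```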
Group: View ... discuss view related issues [web-public] | ||
PeterWood: 30-Oct-2008 | I've come across what seems to be an oddity with View on the Mac. It seems that the Rebol/View console is using UTF-8 encoding but that View is using MacRoman. | |
Group: !REBOL3-OLD1 ... [web-public] | ||
Henrik: 27-Jan-2008 | I think the plan is to release a new alpha once the unicode changes are done so you can get to test it. We already have a Unicode version internally, but it contains only a few of the required changes. Carl reviewed the CHECKSUM and ENCLOAK functions yesterday and mentioned how CHECKSUM is now binary only. It won't work on strings directly anymore, because encoding issues would make it work incorrectly. | |
BrianH: 25-Jul-2008 | That bug can't be fixed without the string-to-binary encoding and decoding infrastructure being there. Those native functions don't exist yet because their design is not finalized. | |
BrianH: 25-Jul-2008 | For that matter, I recall that there was some talk of changing the SAVE and LOAD functions completely. It is an unresolved design issue, unless Carl's current work includes string encoding and decoding as well. | |
BrianH: 25-Jul-2008 | So to answer Louis' question: Not yet, as far as we know. The data structures for Unicode strings are there, as are UTF-8 word! values, but binary encoding and decoding is not yet there, and there are some limits to Unicode input and output (mostly due to the Windows console). The encoding/decoding work seems likely to get done as a part of Carl's GUI work, as that will probably include text display. The console IO limits are likely to remain until the written-in-REBOL GUI console is adopted. | |
PeterWood: 28-Oct-2008 | So does this mean that the graphics library is still treating a string as being 8-bit encoded? No doubt according to the current Windows codepage? Does READ-STRING convert UTF-8 to whatever 8-bit encoding the graphics library is using? | |
BrianH: 28-Oct-2008 | As far as your code is concerned, a string! will be a series of Unicode codepoints. Internally, who cares? The implementation of string! is likely to be the same as the native implementation on the platform is running on, or whatever is more efficient. I think that string! is now UTF-16 on Windows, and the symbols behind word! values are internally UTF-8. Still, it doesn't matter what strings are internally because AS-STRING and AS-BINARY are gone. All string-to-binary conversions will need encoding. REBOL scripts are going to be UTF-8 encoded though, as I recall. | |
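BrianH's point that "all string-to-binary conversions will need encoding" is easy to illustrate: the same codepoints yield different bytes under different encodings, so a bare cast like the old AS-BINARY is ambiguous. A Python sketch (not R3's internals):

```python
s = "café"  # U+00E9 needs 2 bytes in UTF-8, 2 in UTF-16, 4 in UTF-32

print(s.encode("utf-8"))     # 5 bytes
print(s.encode("utf-16-le")) # 8 bytes for the same four codepoints

# Which byte sequence should a cast return? There is no single answer,
# which is why an explicit encoding step replaces AS-STRING/AS-BINARY.
assert s.encode("utf-8") != s.encode("utf-16-le")
```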
BrianH: 28-Oct-2008 | READ-STRING is a temporary function; it is intended to be replaced by a full encoding and decoding infrastructure supporting multiple formats and encodings. Until then, we have READ-STRING and WRITE-STRING. | |
Henrik: 29-Oct-2008 | h264 realtime encoding is CPU intensive :-) | |
BrianH: 31-Oct-2008 | Gabriele, cool, I was just concerned about speed. I suppose calls to external APIs are likely to be less frequent than internal manipulations, and UCS encoding would make the internal code faster. Either way I'm sure that it will be handled :) | |
PeterWood: 1-Jan-2009 | Not trusting non-ASCII characters implies that the current design of RebDev is "ignorant" of character encoding. If that is the case, it is a shame, as RebDev could have been a great example of an "up-to-date" application built with R3. | |
PeterWood: 1-Jan-2009 | Even if the server is running on R2, all the strings could be stored with a consistent encoding method, such as ISO-8859-1. Of course, there'd be a lot of work detecting the client encoding method and converting all input strings to the chosen consistent method. Most of this work would be needed even if the server supported Unicode strings. | |
PeterWood: 1-Jan-2009 | Personally, I think ignoring character encoding does say something about the design of RebDev. | |
BrianH: 2-Jan-2009 | That would have to be the case with R2 clients, as the client is the part that handles character encoding. However, there are no R2 clients yet. The messages appear to be UTF-8 encoded end-to-end, stored in binary on the server, which is encoding agnostic. Once we have R2 clients, they will have to handle the codepage-to-UTF-8 encoding, or just stick to ASCII. | |
BrianH: 2-Jan-2009 | And yes, it does say something about the design of RebDev, that character encoding issues of R2 won't affect it, by design. | |
Reichart: 2-Jan-2009 | This is one of those things where a picture is worth a thousand words. We need a diagram of the hardware and software setup, and to show WHERE encoding becomes a problem. For example, if you paste some text from a Word doc into a web browser, this then gets moved to the server. Then it gets rendered out again... you will run into problems with encoding. Word uses some SPECIAL encoding for things like " : - and ' | |
Reichart: 3-Jan-2009 | Gab, not an issue of "fault", I'm simply modeling examples of problems I see on dozens of websites, due to encoding "issues". Don't care where the fault is, just that we need better black box tools for dealing with it. | |
PeterWood: 3-Jan-2009 | Reichart: From my point of view, the root of the problem is not so much that Word replaces certain key sequences with other characters, but rather one of character encoding. The text will look okay on your machine but, unless it is correctly converted, may display incorrectly on other machines. As I understand it, Rebol/View uses the user's default "codepage" on Windows and MacRoman encoding on Mac. AltME doesn't take into account the different text encodings, so when I type £ (a British pound sign) you will probably see something different. | |
Sunanda: 3-Jan-2009 | REBOL.org shows a ? because it blindly emits all AltME pages as charset=utf-8. If (this works in Firefox) you change your default for the page -- View / Character Encoding / Western ISO-8859-1 -- then: -- Peter's post shows a GBP [for his char 163] -- Chris' post shows a 1/2 [for his char 189] | |
Reichart: 3-Jan-2009 | Peter....I'm confused.... Neither Word nor REBOL has anything to do with the problem.... Encoding problems happen on hundreds of websites (big, popular websites) that do not use REBOL, and where Word is not the source. I'll state again... we need strong clear black box logic that unifies all character maps (yeah, all). We need a single unified character system. | |
PeterWood: 4-Jan-2009 | Reichart ... you are right, the problem is one of encoding. My point is that because Rebol/View uses different encoding systems on different platforms, it is left to the application to either ignore the encoding differences or handle them. This may be quite difficult if, as Chris indicated, it is not possible to determine which Windows codepage is in use from Rebol/View. There is a single unified character system (Unicode) but there are at least five different ways of representing it (UTF-8, UTF-16LE, UTF-16BE, UTF-32LE & UTF-32BE). Standardisation is a long way off. | |
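Those five representations can be shown concretely with Peter's own pound-sign example. A Python sketch (note that Python's plain "utf-16"/"utf-32" codec names also prepend a BOM, so the explicit -le/-be variants are used here):

```python
ch = "\u00a3"  # '£', the British pound sign from Peter's example

# One character set (Unicode), five byte-level representations:
print(ch.encode("utf-8"))     # two bytes
print(ch.encode("utf-16-le")) # low byte first
print(ch.encode("utf-16-be")) # high byte first
print(ch.encode("utf-32-le")) # four bytes, little-endian
print(ch.encode("utf-32-be")) # four bytes, big-endian
```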
BrianH: 21-Jan-2009 | However, RIF was intended to store its data in Rebin format (binary encoding of REBOL values). | |
BrianH: 21-Jan-2009 | I meant the function SCRIPT? not your text encoding :) | |
BrianH: 10-Apr-2009 | Codecs are like port schemes, but for encoding and decoding. Different thing. | |
Geomol: 16-Apr-2009 | I get second thought about auto encoding. The reason is, if url! have auto encoding of some characters, then it would be expected, that e.g. file! auto encode too. How do you specify a file named % on disk? In R3, you write: %%25 If the % char should be auto encoded, then you should write that filename as: %% But what if your file is named %25 on disk? It's a bit confusing, but what is the best way? Encoding or not? | |
Geomol: 16-Apr-2009 | Actually file! does have auto encoding of space. You can specify a filename like this: %"a b" which will give %a b So maybe auto encoding is a good thing in general? | |
Oldes: 16-Apr-2009 | Actually the auto encoding was causing me some problems some time ago. I'm not sure if it was fixed. Also with the auto encoding of urls there is a problem: for example, the second @ char in Pekr's url must not be encoded. | |
Gabriele: 17-Apr-2009 | Geomol: PLEASE NO!!!! The bug that REBOL has is exactly THAT. I beg you guys, please NO! Encoding is there for a reason. If it could be done automatically, there would be no need for encoding! | |
Geomol: 17-Apr-2009 | Gabriele, so you mean auto encoding should be avoided? Should auto encoding be removed from these examples: >> %"a b" == %a b >> a%[b-:-c] == [a%25b-:-c] >> a<>[b-:-c] == [a%3C%3Eb-:-c] My view is that there is a lot of auto encoding already. If auto encoding should be there, it should be done right in all cases. Else it should be avoided altogether. This situation - with some auto encoding in some cases but not all - is not good. | |
Geomol: 17-Apr-2009 | I guess auto encoding is user-friendly, if it can be done right in all cases. With auto encoding, you don't have to remember all the strange encoding rules for different datatypes (especially url and email). No auto encoding is technical-programmer-friendly. It's for the programmer who knows all the strange rules and wants complete control. It goes beyond url and email. How should a space be represented in an issue! datatype? Like: >> to-issue "a b" == #a?b Today you just see a question sign, but it's a space in there. | |
Oldes: 17-Apr-2009 | Geomol, yes. I would like to avoid auto encoding. It's exactly the case where I had the problems. If I write a file as %"a b" and it's a valid file, I prefer to have it the same when I, for example, print it | |
Oldes: 17-Apr-2009 | Instead of auto encoding I would like to see basic functions like an official url-encode provided in Rebol. (Of course we have our own - another %user.r usage) | |
Geomol: 17-Apr-2009 | I understand the concern against auto encoding. But without it, and with all the datatypes we have in REBOL, good documentation about what encoding we have to use for every datatype is required. | |
BrianH: 17-Apr-2009 | I don't mind the ? issue! display in this case, but I'd like MOLD/all issue! to return a serialized encoding like: #[issue! "a b"] | |
BrianH: 17-Apr-2009 | In general I prefer simple encoding syntax rules over autoencoding, because it is easier to remember explicit rules than it is to remember magic dwim patterns. | |
BrianH: 17-Apr-2009 | Gabriele, RFC compliance of url encoding is important and will be fixed in upcoming R3 releases, even if I have to fix it myself. R2 as well if I end up being the R2 release manager (it's possible). | |
Geomol: 18-Apr-2009 | Are you having a bad week? I'm in doubt about auto encoding, whether it's a good idea or not. And I talk in general, not just one datatype. In R3, you can use % in an email: >> a%[b-:-c] == [a%25b-:-c] I first thought it was an error. After some talk here, I realized it's auto encoding of the % character. In R2, you have to write the encoding yourself: >> [a%25b-:-c] == a%[b-:-c] So it's the other way around between R2 and R3. Clearly Carl tries to make REBOL smart, to make it figure out what the programmer means. In general with computers, I tend to dislike the systems that try to be smart, if they don't get it 100% correct in every situation (Windows), and I like the systems that do not try to be smart but put the user in charge (Amiga). So at this point, I think auto encoding should be avoided. And avoid it in all datatypes, not just url. I may change my mind, if auto encoding can be done 100% correct in all datatypes. For url, it would mean e.g. this: >> to url! "ftp://[me-:-inter-:-net]:[pass-:-server-:-net]" == ftp://me%40inter.net:[pass-:-server-:-net] So my question is: can auto encoding be done 100% correct for all datatypes? If not, avoid it. If auto encoding should be there in some cases but not all, I would like to hear the arguments for that. | |
Gabriele: 19-Apr-2009 | I'm talking about "escaping", while you use the term "encoding" ambiguously to mean both encoding and escaping. THEY ARE TWO DIFFERENT THINGS. | |
Gabriele: 21-Apr-2009 | Now, if your array was representing a url, you could encode it to UTF-8 using the % encoding as well to stay in the ascii subset. This is encoding, but still, it will not solve your @ problem. each @ in the array of integers will become an @ (which is an ascii char) in the final string. | |
Gabriele: 21-Apr-2009 | it is in your *source array* (re: shouting, i just want to give emphasis but we don't have rich text, and the * thing does not work very well for long text) that you must distinguish between @ (the field separator) and % 4 0 (an escaped @, part of the url field text). There is no encoding process that can *automatically* go from your array of integers to the correct url string. | |
Geomol: 21-Apr-2009 | Maybe we got unicode encoding and escape encoding confused. As I see it, given correct rules, auto converting of user input to a correct url can be achieved. I made this function to illustrate what I mean (it's not optimized, but should be easy to read):

encode-url: func [input /local url components host] [
    components: parse input "@"
    host: back tail components
    url: clear ""
    append url components/1
    components: next components
    forall components [
        either components = host [
            append url "@"
            append url components/1
        ][
            append url "%40"
            append url components/1
        ]
    ]
    url
]

I can use it both with and without specifying %40 for the first @ in the url:

>> encode-url "ftp://[name-:-home-:-net]:[pass-:-server-:-net]"
== "ftp://name%40home.net:[pass-:-server-:-net]"
>> encode-url "ftp://name%40home.net:[pass-:-server-:-net]"
== "ftp://name%40home.net:[pass-:-server-:-net]"

It will give the correct result in both cases (I use strings, but of course it should be the url! datatype in REBOL). Now comes unicode. Given precise rules for how that should happen, I see no problem with encoding this in e.g. UTF-8. So I think it's possible to do this correctly. But maybe it's better to keep it simple and not do such auto conversions. In any case, the behaviour needs to be well documented, so users can figure out how to create a valid url. I had the same problem as Pekr years ago, and I missed documentation of that. | |
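Geomol's encode-url logic — escape every @ except the last one, which separates the password from the host — ports to Python as follows (the function name is mine; this is a sketch of his illustration, not a general URL encoder):

```python
def encode_url(url: str) -> str:
    # Split on '@'; the final '@' (before the host) stays literal,
    # earlier ones are escaped as %40, mirroring the REBOL version.
    parts = url.split("@")
    if len(parts) < 2:
        return url
    return "%40".join(parts[:-1]) + "@" + parts[-1]
```

Like the original, it is effectively idempotent on already-escaped input, since "%40" contains no "@" for the split to find.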
Geomol: 21-Apr-2009 | unicode encoding *and* escape encoding | |
sqlab: 21-Apr-2009 | I think it is good to have a flexible encoding method, but it should not be invoked automatically. | |
BrianH: 27-Apr-2009 | ReBin - binary encoding for REBOL values. Carl is working on it now - as the new host interfaces require it. We will have it very soon. | |
BrianH: 7-Jul-2009 | Having a margin of error is standard operating procedure for IEEE754 floating point numbers, because anything over 15 digits are subject to rounding errors inherent in the encoding. | |
BrianH: 7-Jul-2009 | For instance: >> 0.3 < (0.1 + 0.1 + 0.1) == false >> 0.3 <= (0.1 + 0.1 + 0.1) == true Those values differ in the greater-than-15-digits range due to encoding errors. | |
BrianH: 7-Jul-2009 | Those encoding errors are inherent in the IEEE754 format. The standard way to work around this is to not consider differences in the past-15-digits range. This is the case for all sorts of systems. | |
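The same comparison reproduces in any IEEE754 language; in Python, the standard workaround BrianH describes (ignore differences past the ~15-digit range) is spelled math.isclose:

```python
import math

x = 0.1 + 0.1 + 0.1
print(repr(x))               # 0.30000000000000004 — the sum overshoots 0.3
print(x > 0.3)               # True, matching 0.3 < (0.1 + 0.1 + 0.1) in REBOL
print(math.isclose(x, 0.3))  # True: compare with a tolerance instead of ==
```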
BrianH: 9-Jul-2009 | It's about finding UTF-8 encoding errors, particularly the overlong forms that are used for security breaches. We can't do that check in TO-STRING because of the overhead (+50%), but it can still be a good idea to check in some cases, and the code is better written in C than REBOL. | |
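The overlong forms BrianH mentions encode a codepoint in more bytes than necessary, so byte-level filters that run before decoding can be bypassed. A strict decoder must reject them, as Python's does (a minimal illustration):

```python
# '/' is 0x2F; the two-byte sequence C0 AF is an overlong encoding of it,
# historically used to smuggle "../" past byte-level path filters.
overlong_slash = b"\xc0\xaf"
try:
    overlong_slash.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True   # a strict UTF-8 decoder refuses overlong forms
print(rejected)
```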
Maxim: 11-Sep-2009 | optionally encoding them in ascii first... http headers are ascii. | |
Maxim: 11-Sep-2009 | It being so old, it's possible the default encoding was still ASCII at that point. | |
Pekr: 11-Sep-2009 | REBOL 3.0 accepts UTF-8 encoded scripts, and because UTF-8 is a superset of ASCII, that standard is also accepted. If you are not familiar with the UTF-8 Unicode standard, it is an 8 bit encoding that accepts ASCII directly (no special encoding is needed), but allows the full Unicode character set by encoding them with characters that have values 128 or greater. | |
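Pekr's superset point is easy to check: a pure-ASCII string has identical bytes under ASCII and UTF-8, and only codepoints above 127 produce bytes with values 128 or greater (a Python sketch):

```python
s = "REBOL 3.0"
# ASCII text is already valid UTF-8, byte for byte:
assert s.encode("ascii") == s.encode("utf-8")
assert all(b < 128 for b in s.encode("utf-8"))

# Non-ASCII characters are the ones that introduce high bytes:
assert any(b >= 128 for b in "héllo".encode("utf-8"))
print("ok")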
Maxim: 11-Sep-2009 | maybe peter's excellent encoding script on rebol.org could be used as a basis for converting between ascii -> utf8 when using R3 binary as an input. while R3 has them built-in | |
PeterWood: 11-Sep-2009 | Pekr: Just try a quick test with: print to binary! "Content-type: text/html^/" print to binary! get-env "REQUEST_METHOD" print to binary! get-env "QUERY_STRING" print to binary! get-env "REMOTE_ADDR" to see if it is an encoding problem. | |
Maxim: 11-Sep-2009 | but the loading actually does a re-encoding. utf-8 is compact, but it's slow because you cannot skip unless you traverse the string char by char. which is why they are internally converted to 8 or 16 bit unicode chars... it seems strings become 16 bits a bit too often (maybe a change in later releases, where they are always converted to 16 bits for some reason). | |
PeterWood: 11-Sep-2009 | As I understand it the Windows console only handles single-byte encoding (ie Windows CodePages). | |
PeterWood: 11-Sep-2009 | Pekr: One difference when I ran the cgi was that I used the -c option not the -q option. Perhaps you could try with the -c option in case Carl has done something under the surface about character encoding. | |
Maxim: 11-Sep-2009 | maybe a cgi-specific version of print could be added as a mezz which handles the proper encoding issues to make sure that console and cgi printing are both functional on all distros without needing to change the source. | |
Maxim: 11-Sep-2009 | ah yess.. --cgi could just tell the core to prevent the UTF-16 encoding being done on stdout... | |
Maxim: 11-Sep-2009 | but if we need to output latin-1 afterwards (while dumping the html content, for example), the output encoding should be selectable as a "current default", and all the --cgi would do is set that default to UTF-8 for example. | |
Maxim: 11-Sep-2009 | and some systems pipe the std to have it pushed remotely to other systems... which can expect a different encoding than what is being used by the local engine... I've had this situation in my render-farm management software, as a real-life example. | |
BrianH: 11-Sep-2009 | The trick is that the headers are pushed in ASCII, but the contents in whatever binary encoding the headers specify. | |
Maxim: 11-Sep-2009 | yep... which is why it should be switchable since rebol now does the encoding for us. :-) | |
Maxim: 23-Sep-2009 | you must realize that the format of a document (encoding of the layout) isn't directly tied to its content. | |
BrianH: 8-Oct-2009 | Any encoding is none of the business of the CGI channel - it is a matter between the script and the client. | |
Maxim: 30-Oct-2009 | I also think the "default" user text format should be configurable. I have absolutely no desire to start using utf-8 for my code and data, especially when I have a lot of stuff that already is in iso latin-1 encoding. | |
Maxim: 30-Oct-2009 | but for data, I would like to have default encoding of my choice. | |
PeterWood: 30-Oct-2009 | Loading programs are not totally immune from encoding problems. An unlikely but possible example: if name = "Ashley TrŸter" [print "Hello Ashley"] | |
Maxim: 30-Oct-2009 | handling encoding is complex in any environment... I had a lot of "fun" handling encodings in php, which uses such a unicode datatype... it's not really easier... 'cause you can't know by the text if it's unicode or ascii or binary values unless you tell it to load a sequence of bytes AS one or the other. | |
Maxim: 30-Oct-2009 | cause there is just ONE encoding. | |
Maxim: 30-Oct-2009 | but having some kind of default for read/write could be useful, instead of having to add a refinement all the time, and force a script to expect a specific encoding. | |
Maxim: 30-Oct-2009 | then it would be easier to change it in one place and do all I/O without the refinement - and less work for another person to change the encoding for the whole app, without having to put conditionals every time we use read/write. | |
Maxim: 30-Oct-2009 | I put a suggestion on the blog about allowing user-created encoding maps... otherwise, you can load it as binary in R3 and just convert the czech chars to utf-8 multi-byte sequences and convert the binary to string using decode. | |
Maxim: 30-Oct-2009 | is the czech encoding the standard windows ansi encoding? | |
Maxim: 30-Oct-2009 | R3 will interpret literal strings and decode them using utf-8 (or the header encoding, if it's supported), so in this case no. but if the data is stored within binaries (equivalent to R2, which doesn't handle encoding) then yes, since the binary represents the sequence of bytes, not chars. if you use a utf-8 editor, and type characters above 127 and look at them in notepad, you will then see the UTF-8 byte sequences (which will look like garbled text, obviously). | |
Maxim: 30-Oct-2009 | I don't know if R3 has a way of specifying the encoding literally... like UTF8{} UTF16{} or WIN1252{} ... this would be nice. | |
Gabriele: 31-Oct-2009 | Petr: notepad, as most windows stuff, uses utf-16. much easier to detect though, and R3 could do that (actually, didn't Carl just add that recently?) most "real" editors allow you to use whatever encoding you want, and definitely support utf-8. | |
Gabriele: 1-Nov-2009 | Max, maybe i was not clear. If your rebol scripts are latin1 by default, while my rebol scripts are utf-8 by default, when i send you a rebol script IT WILL NOT WORK in the same way in your machine. the *script*'s encoding *must* be a standard everyone agrees on. then, the script can do whatever it wants with the data, it's your fault if you make it so data cannot be exchanged easily among systems. | |
Maxim: 1-Nov-2009 | although having an encoding parameter in the header would allow us to tell the interpreter in what format the text is without breaking anything. | |
Maxim: 1-Nov-2009 | actually, it is a problem in R2. if you store your code, and I open it with a different codepage version of windows... some letters will be skewed. In an application I wrote, I couldn't write out proper strings for the Netherlands, as an example. unicode is slowly becoming the standard for text... especially utf-8. but yes, users have to be educated. within your apps, though, you can handle the encoding as you want... only the rebol sources have to be UTF-8. as R3 matures, more encodings will most probably be included in string codecs to support 8-bit extended ascii from different areas of the world. and even high-profile applications like Apple's iWeb have issues with text encoding... so this is a problem for the whole industry & users to adapt to. | |
BrianH: 1-Nov-2009 | Even if we had a text encoding header for R3, it would be a *bad* idea to ever use encodings other than UTF-8. So don't. | |
BrianH: 9-Nov-2009 | From your code, it looks like this is the problem: >> round/floor 3.3 / 1.1 == 2.0 >> 3.3 / 1.1 == 3.0 1.1 and 3.3 aren't exactly representable in IEEE754 encoding, so the 3.0 value you see is actually a little less than 3.0. | |
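The same thing in Python: the division already lands just under 3.0 (even though a short print may round it to 3.0), so flooring it gives 2.

```python
import math

q = 3.3 / 1.1
print(repr(q))        # slightly less than 3.0, e.g. 2.9999999999999996
print(math.floor(q))  # 2, matching ROUND/FLOOR in BrianH's example
```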
Geomol: 19-Nov-2009 | See e.g. http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters | |
Chris: 19-Nov-2009 | Just percent encoding, of the percent symbol. | |
BrianH: 19-Nov-2009 | Chris, url::%23 and url::# should not be the same. The purpose of percent encoding is to allow you to specify character values without them being treated as syntax. If you specify a # directly in an http url, for instance, it should be taken as the start of the anchor portion of the url. If you percent encode it, it shouldn't be an anchor. | |
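BrianH's distinction shows up directly in any URL parser: a raw # starts the fragment, while %23 is an ordinary path character that the parser leaves alone. A Python illustration with urllib:

```python
from urllib.parse import quote, urlsplit

# Percent-encoding the syntax character itself:
print(quote("#"))   # %23

raw = urlsplit("http://host/a#b")
enc = urlsplit("http://host/a%23b")
# '#' is taken as syntax (anchor); '%23' stays as data in the path:
print(raw.path, raw.fragment)
print(enc.path, enc.fragment)
```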
BrianH: 20-Nov-2009 | The main gotcha so far to the keep-encoded approach is whether INSERT and APPEND should do some magic percent encoding or not. It seems that it may be a better approach to just assume that the programmer knows what they are doing and insert what they say to insert as-is, as long as the url character set restrictions are met. This would mean that the programmer would need to handle their own percent encoding where needed, and that INSERT or APPEND would not do any encoding or decoding. Or perhaps some non-syntax characters, such as space, could be encoded by MOLD instead of rejected, and DECODE-URL just adjusted to not freak out when it sees them. What do you think? | |
Maxim: 20-Nov-2009 | I vote for NO automatic encoding. | |
Chris: 20-Nov-2009 | I think I'd look for at least the following behaviour: >> url::%23# == url::%23# >> join url:: "%23#" == url::%23# >> join url:: " " ; space is not in the uri spec, so could arguably be converted == url:: >> read url::%23# ; dependent on the scheme, I guess == "GET %23" The problem with magic percent encoding is with the special characters. As it is now, it is impossible (so far as I can ascertain) to build an http url that encodes special characters eg "#=&%" - Twitter being a great case where an encoded # is integral to the service. Given though that the list of special characters is short and well defined, perhaps they could be the exception to a magic encoding rule. | |
BrianH: 21-Nov-2009 | The standard TELLS you when - No it doesn't. The standard doesn't cover R3 internals, not even in a generic non-language-specific way. The "when" I was talking about has nothing to do with the encoding itself - it has to do with internal data formats. | |
BrianH: 16-Dec-2009 | I mean 32-bit integer! type, not 32-bit binary encoding converting to the current 64-bit integer! type. | |
Pekr: 11-Jan-2010 | hmm, but /as was proposed to specify just the type of encoding IIRC, not some other functionality ... some of us wanted /as to be more general, allowing you to specify a codec to decode. Codecs are so far inefficient (not streamed), because you have to read all data first, then pass it to encode/decode. Carl never posted a resolution to the read/write case .... | |
Group: !Cheyenne ... Discussions about the Cheyenne Web Server [web-public] | ||
Graham: 19-Aug-2009 | this is the request

GET /md/creategoogledoc.rsp?gdoc=simple-letter.rtf&patientid=2832&encounter=none HTTP/1.1
Host: gchiu.no-ip.biz:8000
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://gchiu.no-ip.biz:8000/md/Listgoogledocs.rsp
Cookie: RSPSID=QZPTPCZIWWMMYBKWHWRQETGM | |
Will: 19-Aug-2009 | answer from the redirection:

HTTP/1.1 302 Moved Temporarily
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Wed, 19 Aug 2009 21:43:58 GMT
Set-Cookie: WRITELY_UID=001dfpwvx2b|928b9de9e7bf56448b665282fc69988b; Path=/; HttpOnly
Set-Cookie: GDS_PREF=hl=en;Expires=Sat, 17-Aug-2019 21:43:58 GMT;HttpOnly
Set-Cookie: SID=DQAAAHcAAAB0kldc4zZSC_0FoiL6efkWE11k9SQkAIn-N3WfAzIOVe1cM-remnLUtV3Z4M-BFRf5eknz7hr_U3YzW94nECo0-aDnpxrLGiBglWGN4VkfLr5Hh7t2XNyRCA3VWd005SfCmZ9D8-1MUltjRI8X56VLde5Wy8HD92gh-8YkJBJxQA;Domain=.google.com;Path=/;Expires=Sat, 17-Aug-2019 21:43:58 GMT
Location: https://www.google.com/accounts/ServiceLogin?service=writely&passive=true&nui=1&continue=http%3A%2F%2Fdocs.google.com%2FDoc%3Fdocid%3D0AcdrOHdpKfrWZGZwd3Z4MmJfMnNxcDJkNmZu%26amp%3Bhl%3Den&followup=http%3A%2F%2Fdocs.google.com%2FDoc%3Fdocid%3D0AcdrOHdpKfrWZGZwd3Z4MmJfMnNxcDJkNmZu%26amp%3Bhl%3Den&ltmpl=homepage&rm=false
Content-Encoding: gzip
X-Content-Type-Options: nosniff
Content-Length: 325
Server: GFE/2.0 | |
Will: 19-Aug-2009 | more redirection:

HTTP/1.1 302 Moved Temporarily
Set-Cookie: WRITELY_SID=DQAAAHoAAADh80lBIw7e5Hg06TLEBgCY33XQGJ1aUH5OrCF_ir1xLwffKNaCqNdUL6qYfvgjNppDBI4lTNBSTjJWMG_Ze0_qJnveBCAtihBDFwBlOb-H7RlkfgJwM7pBbyKV7bm4M3mqUivD1emtpxgl32vG8CEP1poQ2479HQXrlobsp7Egzw;Domain=docs.google.com;Path=/;Expires=Thu, 03-Sep-2009 21:43:59 GMT
Location: http://docs.google.com/Doc?docid=0AcdrOHdpKfrWZGZwd3Z4MmJfMnNxcDJkNmZu&%3Bhl=en&pli=1
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Date: Wed, 19 Aug 2009 21:43:59 GMT
Expires: Wed, 19 Aug 2009 21:43:59 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
Content-Length: 232
Server: GFE/2.0 | |
Will: 19-Aug-2009 | and the target page:

HTTP/1.1 200 OK
Set-Cookie: WRITELY_SID=DQAAAHoAAADh80lBIw7e5Hg06TLEBgCY33XQGJ1aUH5OrCF_ir1xLwffKNaCqNdUL6qYfvgjNppDBI4lTNBSTjJWMG_Ze0_qJnveBCAtihBDFwBlOb-H7RlkfgJwM7pBbyKV7bm4M3mqUivD1emtpxgl32vG8CEP1poQ2479HQXrlobsp7Egzw;Domain=docs.google.com;Path=/;Expires=Thu, 03-Sep-2009 21:43:59 GMT
Set-Cookie: GDS_PREF=hl=en;Expires=Sat, 17-Aug-2019 21:43:59 GMT;HttpOnly
Set-Cookie: user=; Expires=Tue, 18-Aug-2009 21:43:59 GMT; Path=/; HttpOnly
Set-Cookie: login=; Expires=Tue, 18-Aug-2009 21:43:59 GMT; Path=/; HttpOnly
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Wed, 19 Aug 2009 21:43:59 GMT
Content-Encoding: gzip
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
Server: GFE/2.0 |