AltME groups: search
results summary
world | hits |
r4wp | 28 |
r3wp | 173 |
total: | 201 |
results window for this page: [start: 101 end: 200]
world-name: r3wp
Group: All ... except covered in other channels [web-public] | ||
Louis: 31-Oct-2006 | I rather badly need a PageMaker PM5 file converted to ASCII format. My copy of PageMaker has been corrupted, and I just want to print a document using LaTeX. The file is about 309 MB. Is there anyone here who can do this for me? | |
Group: !AltME ... Discussion about AltME [web-public] | ||
PeterWood: 16-Jan-2011 | It's not really a problem if you remember that AltME is designed to support 7-bit ASCII across operating systems. It's just these users that want to write some fancy characters ;-) | |
Group: Core ... Discuss core issues [web-public] | ||
Gordon: 29-Sep-2006 | When you import data using "data: read/binary {sometextfile}" you seem to get a string of hex values. Ex: probe 'data' of a file containing the word "Hello" results in #{48656C6C6F}, but if you probe first data it returns 72. So when you probe the entire data stream it returns it in hexadecimal format, but when you probe each character it returns a decimal value. At any rate, how do you convert the characters in the variable 'data' back into ASCII values? IOW, how do you convert the decimal value of 72 back into an "H", or the #{48656C6C6F} back into "Hello"? | |
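The byte/character relationship Gordon asks about is language-independent; a minimal Python sketch (not REBOL) of the same round trip:

```python
# A "binary" is just a sequence of byte values; 72 is the code for "H".
data = bytes([72, 101, 108, 108, 111])  # the bytes behind #{48656C6C6F}

print(data.hex().upper())    # 48656C6C6F - hex view of the whole buffer
print(data[0])               # 72 - one element is a decimal byte value
print(chr(data[0]))          # H - code 72 back to a character
print(data.decode("ascii"))  # Hello - the whole buffer back to text
```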
Maxim: 19-Oct-2006 | not saying /lines has an issue, but I have loaded 700MB ascii files on a 1GB RAM computer... 150 is peanuts. but I never use the /lines argument. | |
Jerry: 20-Oct-2006 | The following code:

unicode-to-ascii: func [from to /local fs ts sz] [
    fs: open/binary/direct/read from
    ts: open/binary/direct/write to
    sz: size? from
    fs/1 fs/1 ; discard the first two bytes, FFFE
    for i 3 sz 2 [
        append ts to-char fs/1
        fs: skip fs 1 ; SKIP is the problem
    ]
    close fs
    close ts
]
unicode-to-ascii %/c/Unicode.txt %/c/Ascii.txt

In REBOL/View 1.2.7.3.1 12-Sep-2006 Core 2.6.0:
** CRASH (Should not happen) - Expand series overflow
In REBOL/View 1.3.2.3.1 5-Dec-2005 Core 2.6.3:
** Script Error: Not enough memory
** Where: do-body
** Near: fs: skip fs 1 | |
Rebolek: 2-Nov-2007 | I need to sort some French words but REBOL's SORT puts accented characters on the end (sorts just by ASCII). Has anybody got some enhanced SORT for French? | |
james_nak: 18-Feb-2008 | Slight change of subject, but here I am all happy saving/all and loading my objects and then it hits me: just what is this "serialized" data? How is it different (outside of the fact that its ASCII representation is different)? I don't know if I need to know to use it, but in case I'm ever on TV I want to answer it correctly. | |
Louis: 20-Sep-2008 | Ok, I found the problem. When I saved the file with the Windows program, I saved it in utf8 format. Resaving it in ascii format solved the problem. I realized the problem when I noticed some Chinese characters in the output past what I pasted in above. | |
BrianH: 5-Mar-2009 | kib2: "Does that mean that we can use unicode encoding with the help of r2-forward ?" No, I can only spoof datatypes that don't exist in R2, and R2 has a string! type. The code should be equivalent if the characters in the string are limited to the first 256 codepoints of Unicode (aka Latin-1), though only the first 128 codepoints (aka ASCII) can be converted from binary! to string! and have the binary data be the same as minimized UTF-8. | |
Sunanda: 30-May-2009 | I have a printable? function that checks if a string has only ASCII printable characters. Would that meet your need, Maxim? | |
BrianH: 30-Jan-2010 | ascii?: funct [
    "Returns TRUE if value or string is in ASCII character range (below 128)."
    value [string! file! email! url! tag! issue! char! integer!]
] compose [
    ascii: (charset [#"^(00)" - #"^(7F)"])
    either any-string? value [parse/all/case value [any ascii]] [value < 128]
] ; Note: Native in R3. | |
BrianH: 30-Jan-2010 | invalid-utf?: funct [
    "Checks for proper UTF encoding and returns NONE if correct or position where the error occurred."
    data [binary!]
    /utf "Check encodings other than UTF-8"
    num [integer!] "Bit size - positive for BE negative for LE"
] compose [
    ascii: (charset [#"^(00)" - #"^(7F)"])
    utf8+1: (charset [#"^(C2)" - #"^(DF)"])
    utf8+2: (charset [#"^(E0)" - #"^(EF)"])
    utf8+3: (charset [#"^(F0)" - #"^(F4)"])
    utf8rest: (charset [#"^(80)" - #"^(BF)"])
    switch/default any [num 8] [
        8 [ ; UTF-8
            unless parse/all/case data [(pos: none) any [
                pos: ascii | utf8+1 utf8rest |
                utf8+2 2 utf8rest | utf8+3 3 utf8rest
            ]] [as-binary pos]
        ]
        16 [ ; UTF-16BE
            pos: data
            while [not tail? pos] [
                hi: first pos
                case [
                    none? lo: pick pos 2 [break/return pos]
                    55296 > w: hi * 256 + lo [pos: skip pos 2] ; #{D800}
                    57343 < w [pos: skip pos 2] ; #{DFFF}
                    56319 < w [break/return pos] ; #{DBFF}
                    none? hi: pick pos 3 [break/return pos]
                    none? lo: pick pos 4 [break/return pos]
                    56320 > w: hi * 256 + lo [break/return pos] ; #{DC00}
                    57343 >= w [pos: skip pos 4] ; #{DFFF}
                ]
                none
            ] ; none = valid, break/return pos = invalid
        ]
        -16 [ ; UTF-16LE
            pos: data
            while [not tail? pos] [
                lo: first pos
                case [
                    none? hi: pick pos 2 [break/return pos]
                    55296 > w: hi * 256 + lo [pos: skip pos 2] ; #{D800}
                    57343 < w [pos: skip pos 2] ; #{DFFF}
                    56319 < w [break/return pos] ; #{DBFF}
                    none? lo: pick pos 3 [break/return pos]
                    none? hi: pick pos 4 [break/return pos]
                    56320 > w: hi * 256 + lo [break/return pos] ; #{DC00}
                    57343 >= w [pos: skip pos 4] ; #{DFFF}
                ]
                none
            ] ; none = valid, break/return pos = invalid
        ]
        32 [ ; UTF-32BE
            pos: data
            while [not tail? pos] [
                if any [
                    4 > length? pos
                    negative? c: to-integer pos
                    1114111 < c ; to-integer #{10FFFF}
                ] [break/return pos]
            ]
        ]
        -32 [ ; UTF-32LE
            pos: data
            while [not tail? pos] [
                if any [
                    4 > length? pos
                    negative? c: also to-integer reverse/part pos 4 reverse/part pos 4
                    1114111 < c ; to-integer #{10FFFF}
                ] [break/return pos]
            ]
        ]
    ] [
        throw-error 'script 'invalid-arg num
    ]
]
; Note: Native in R3, which doesn't support or screen the /utf option yet.
; See http://en.wikipedia.org/wiki/Unicode for charset/value explanations. | |
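BrianH's UTF-16 branches hinge on surrogate pairs: lead units #{D800}-#{DBFF} must be followed by trail units #{DC00}-#{DFFF}. A sketch of the same UTF-16BE check in Python (an illustration, not the R3 native):

```python
def invalid_utf16be(data: bytes):
    """Return None if valid UTF-16BE, else the byte offset of the error."""
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            return i                      # odd trailing byte
        w = data[i] << 8 | data[i + 1]
        if w < 0xD800 or w > 0xDFFF:
            i += 2                        # ordinary code unit
        elif w > 0xDBFF:
            return i                      # lone trail surrogate
        elif i + 4 > len(data) or not (
            0xDC00 <= (data[i + 2] << 8 | data[i + 3]) <= 0xDFFF
        ):
            return i                      # lead surrogate without a trail
        else:
            i += 4                        # valid surrogate pair
    return None

print(invalid_utf16be("a€".encode("utf-16-be")))  # None (valid)
print(invalid_utf16be(b"\xD8\x00\x00\x41"))       # 0 (lead without trail)
```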
Geomol: 24-May-2010 | The only way using SWITCH, I see, is to operate with ascii values, and that isn't good. | |
Henrik: 13-Jun-2010 | ascii: charset [#"^(00)" - #"^(7F)"]
ascii-rule: [
    copy transfer [ascii some ascii] ( ; <- problem
        head insert tail output-string transfer
    )
]
This rule does not look correct. I replaced [ascii some ascii] with [some ascii] and now it works. | |
Graham: 15-Sep-2010 | ascii printable characters ... we are talking about saving ink here! | |
Group: View ... discuss view related issues [web-public] | ||
Gabriele: 4-Dec-2006 | insert is a word because there is no char for it in ascii; there is a char for delete, so it's a char :) | |
Jerry: 9-Dec-2006 | Gabriele, actually, Oldes is right. Showing two-byte characters is good enough. IME is not necessary for REBOL/View, because every Chinese/Japanese/Korean OS has proper IMEs installed. The IME sends the codes, encoded in the OS codepage, to the focused window. For example, if the codepage used by Windows XP is Big5 and I type in the character which means "one" (#{A440} in Big5, #{4E00} in Unicode, see http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4E00), my REBOL/View program will get two key events sequentially, which are #{A4} and #{40}. REBOL/View shows it as two characters instead of one. I hope that REBOL/View can let the OS do the text-drawing, like the REBOL/Core console does. The REBOL/Core console doesn't have the Chinese-character-showing issue, because it basically sends the #{A4} and #{40} to the console and lets the OS do the text-drawing. The OS knows that #{A4} and #{40} should be combined into one Big5 character, so it shows it as one character. Of course, if I type in two ASCII characters, the OS is smart enough not to combine them into one "non-existing" Big5 character. CJK encodings are supersets of ASCII, just like UTF-8 is a superset of ASCII. It has nothing to do with Unicode, so it is not too difficult to fix, I guess. Please fix this in 2.7.5 or 2.7.6 ... It's on my wish list for Santa Claus this year. | |
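Jerry's Big5 example can be checked with any codec library; in Python:

```python
# Big5 bytes A4 40 decode to the single character U+4E00 ("one").
pair = b"\xA4\x40"
ch = pair.decode("big5")
print(ch == "\u4e00")      # True - two bytes, one character
print(len(ch))             # 1
print("A".encode("big5"))  # b'A' - ASCII is a subset of Big5
```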
Sunanda: 3-Jan-2009 | Base64 in REBOL is, basically, a type of ASCII representation. It can stand a certain amount of damage (like whitespace being inserted -- imagine it is sent as an email) and can still be reconstructed: str: "abcdefabcdef" ;; a string s64: enbase str ;; enbased to base-64 by default replace/case/all s64 "W" " W " ;; whitespace polluted in transit str = to-string debase s64 ;; do we get it back intact? | |
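Sunanda's round trip maps directly onto Python's base64 module, whose decoder likewise discards whitespace inserted in transit (the default validate=False behavior):

```python
import base64

s = b"abcdefabcdef"                        # a string
s64 = base64.b64encode(s).decode("ascii")  # enbased to Base64
polluted = s64.replace("W", " W ")         # whitespace polluted in transit
print(base64.b64decode(polluted) == s)     # True - we get it back intact
```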
Group: I'm new ... Ask any question, and a helpful person will try to answer. [web-public] | ||
Gregg: 21-Jun-2009 | In my largest grammar, where incoming data may be malformed, I've found it invaluable to have the rule tracing built in, enabled by a flag. e.g.

TSAFE-CHAR: [
    (rule-trace "TSAFE-CHAR IN")
    copy =TSAFE-CHAR charset-21 | charset-22 | charset-23 | charset-24 | charset-25 | NON-US-ASCII
    (rule-trace "TSAFE-CHAR OUT")
]
rule-trace: func [value /local rule-name action] [
    rule-name: first parse value none
    ;print [tab rule-name tab found? find don't-trace rule-name]
    action: second parse value none
    if all [
        any [
            trace-rules? = true
            action = form trace-rules?
        ]
        not found? find don't-trace rule-name
    ][
        val: attempt [mold get to word! join "=" rule-name]
        print ["===" value any [val ""]]
    ]
]

Don't-trace allows you to turn off selected rules that may get called a lot. You could also set tracing levels if you wanted. | |
joannak: 26-Dec-2009 | I have no plans on jumping into R3 at this point, since there is so much even on R2 I need to learn. But for future reference, is there any plan for a tool (or mode in Rebol itself) to help flag those R2->R3 differences... For example, I remember seeing that PICK works differently in R3 (right, unlike R2, which is offset by one); it'll be quite hard to spot all those from source alone, since parameters are often defined at runtime? Some changes will of course be obvious to spot, like sockets, since their parameters have been changed a lot. But differences in data reading/writing (ascii/binary/unicode etc.) may hide for quite a while. | |
Davide: 30-Jun-2010 | >> append #{} 15
== #{3135}
>> append #{} "15"
== #{3135}
Why, if I append an integer to a binary, is it first converted to an ASCII string? IMHO it should be like this:
>> append #{} to-char 15
== #{0F} | |
Anton: 2-Aug-2010 | Then there's ascii char 160, which you can generate in rebol with to-char 160. I think they call it a 'hard space' or something. | |
Endo: 8-Dec-2011 | then I found a very simple way to convert a unicode file to ascii in DOS: TYPE my-unicode-file > my-ascii-file This line converts the file to ascii; non-convertible characters just look weird, but the rest is ok. | |
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
Chris: 22-Oct-2009 | Is there any advantage in breaking up charsets that represent a large varied range of the 16-bit character space? For example, XML names are defined as below (excluding > 2 ** 16), but are most commonly limited to the ascii-friendly subset:

w1: charset [
    #"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)"
    #"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)"
    #"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)"
    #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
]
w+: charset [
    #"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)"
    #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)"
    #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)"
    #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
    #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
]
word: [w1 any w+] | |
Chris: 22-Oct-2009 | Both w1 and w+ appear to be very large values. Would it be smart to perhaps do: [[aw1 | w1] any [aw+ | w+]] Where 'aw1 and 'aw+ are limited to ascii values? | |
Maxim: 20-Sep-2010 | claude... so, did you try to run it as a script? One thing I would do, since this is a strange error, is to retype this: " ^-" in your editor... to make sure it's not using the wrong ASCII character on your OS... it should not be a problem... but there is something weird going on here. | |
Group: !RebGUI ... A lightweight alternative to VID [web-public] | ||
Ashley: 17-Feb-2007 | Use REBOL/Core and an ASCII interface then! ;) | |
Group: Tech News ... Interesting technology [web-public] | ||
Pavel: 8-Apr-2011 | The Data Matrix definition specifies a capacity of at most 2335 bytes per symbol of size 144x144 pixels; with some built-in compression it can be 3116 ASCII characters (readable chars are less than 8-bit encoded), and a scanner may read multiple symbols at once. A much more important characteristic is the use of Reed-Solomon self-repairing codes to ensure readability with up to 30% picture damage for each symbol. | |
Group: !REBOL3-OLD1 ... [web-public] | ||
Graham: 10-Oct-2007 | ascii art | |
Pekr: 14-Dec-2007 | As for UTF-8 - is it compatible with the current +128 char extension? I mean e.g. the Czech alphabet uses special characters above ASCII value 128 .... | |
BrianH: 14-Dec-2007 | UTF-8 is a strict extension of ASCII, but ASCII is only defined between 0 and 127. Characters 128+ are not ASCII, they are extensions, and their meaning depends on the codepage. The codepage of an 8-bit string is unknown, unless you specify it externally (or do a lot of statistical calculations). Strings or scripts with characters extended by a codepage will have to be translated by a codepage-to-UTF-8 function or process specific to the particular codepage, ahead of time. Fortunately, such a process can be fast and even implemented in a byte-oriented language easily. | |
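The codepage-to-UTF-8 translation BrianH describes is a decode-then-encode pass; a Python sketch, assuming Windows-1250 (a codepage commonly used for Czech) as the source encoding:

```python
# Legacy 8-bit bytes -> text (using the externally known codepage) -> UTF-8.
cp1250_bytes = "čeština".encode("cp1250")
text = cp1250_bytes.decode("cp1250")  # meaning depends on the codepage
utf8_bytes = text.encode("utf-8")

print(len(cp1250_bytes))  # 7 - one byte per character
print(len(utf8_bytes))    # 9 - accented characters grow to two bytes
```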
BrianH: 14-Dec-2007 | ASCII characters fit in one byte, the rest take some more. It can progress up to 5 bytes but those are rare. | |
PeterWood: 7-Dec-2008 | I can understand how Pekr, Graham and many others feel about the lack of R3 releases, especially given all the early announcements and the first public alpha. However, I really feel that Carl is still prototyping R3; he is a long way from settling on the design of R3. There is too much missing for the current version to be considered an alpha (e.g. no modules, no threads, ASCII GUI, host environment & runtime core in a single executable). This seems to be Carl's way of working and something that he has to work through step by step. | |
BrianH: 31-Dec-2008 | I would not trust non-ascii characters for now. With any luck the server saves the messages as binary UTF-8, don't know yet. | |
PeterWood: 1-Jan-2009 | Not trusting non-ASCII characters implies that the current design of RebDev is "ignorant" of character encoding. If that is the case, it is a shame, as RebDev could have been a great example of an "up-to-date" application built with R3. | |
BrianH: 2-Jan-2009 | That would have to be the case with R2 clients, as the client is the part that handles character encoding. However, there are no R2 clients yet. The messages appear to be UTF-8 encoded end-to-end, stored in binary on the server, which is encoding agnostic. Once we have R2 clients, they will have to handle the codepage-to-UTF-8 encoding, or just stick to ASCII. | |
btiffin: 3-Jan-2009 | If I was a betting man, by 2020 UTF-8 will reign and compsci grads will need a history book to learn about ASCII. | |
Chris: 4-Jan-2009 | Brian -- ASCII is a subset of UTF-8... | |
Chris: 4-Jan-2009 | With QM, I try to assume (and enforce) UTF-8 (declaring on forms, html escaping everything ASCII+), but it's definitely a chore. | |
Maxim: 7-Jan-2009 | but load can also understand just about all human readable ascii data ALSO. | |
Maxim: 7-Jan-2009 | then, maybe, although it does the same thing AS load, it wouldn't be used by the interpreter, and would explicitly allow the interpreter to use the loading functionality which already understands about 95% of human-readable ascii text as it is. | |
Maxim: 7-Jan-2009 | really brian, I can't recall how many times I've had ascii files from sources which I could almost just load as-is, where the extra syntax was just useless decoration which could be ignored. | |
[unknown: 5]: 21-Jan-2009 | It can be used on binary data as well as ascii data and will carve out the blocks of the buffer. | |
kib2: 15-Feb-2009 | BrianH: ok, thanks. What about allowing ASCII chars in user names until it's really finished? | |
Gabriele: 21-Apr-2009 | Now, if your array was representing a url, you could encode it to UTF-8 using the % encoding as well to stay in the ascii subset. This is encoding, but still, it will not solve your @ problem. each @ in the array of integers will become an @ (which is an ascii char) in the final string. | |
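Percent-encoding as Gabriele describes it keeps a URL within the ASCII subset by escaping the UTF-8 bytes; Python's urllib shows the mechanics (an illustration, not his code):

```python
from urllib.parse import quote

print(quote("café"))           # caf%C3%A9 - UTF-8 bytes, %-escaped to ASCII
print(quote("a@b"))            # a%40b - "@" can be escaped too...
print(quote("a@b", safe="@"))  # a@b - ...but as an ASCII char it may pass through
```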
Geomol: 31-Jul-2009 | Some languages only allow 7-bit ascii in the source except for strings. | |
BrianH: 31-Jul-2009 | All standard functions and syntax in REBOL fit within 7-bit ASCII, which is why R3 source is UTF-8. | |
Maxim: 11-Sep-2009 | optionally encoding them in ascii first... http headers are ascii. | |
Maxim: 11-Sep-2009 | the header MUST be printed out in ASCII. | |
Maxim: 11-Sep-2009 | askin = ascii | |
Maxim: 11-Sep-2009 | AFAIK unicode -> ascii is possible in R3 but don't know how... not having done it myself. IIRC its on the R3 wiki or docs pages somehow.... googling it should give you some clues. | |
Pekr: 11-Sep-2009 | REBOL 3.0 accepts UTF-8 encoded scripts, and because UTF-8 is a superset of ASCII, that standard is also accepted. If you are not familiar with the UTF-8 Unicode standard, it is an 8 bit encoding that accepts ASCII directly (no special encoding is needed), but allows the full Unicode character set by encoding them with characters that have values 128 or greater. | |
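The "UTF-8 accepts ASCII directly" claim quoted above is easy to verify with any Unicode-aware language; in Python:

```python
# Pure-ASCII text encodes to identical bytes under ASCII and UTF-8.
ascii_text = "REBOL 3.0"
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))  # True

# Characters outside ASCII are encoded using byte values 128 or greater.
print(list("ř".encode("utf-8")))  # [197, 153]
```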
Pekr: 11-Sep-2009 | It should accept Ascii directly .... | |
Maxim: 11-Sep-2009 | string! printing, to be more precise. UTF and ASCII are converted to two byte strings IIRC. which is why you must re-encode them before spitting them via print. | |
Maxim: 11-Sep-2009 | maybe peter's excellent encoding script on rebol.org could be used as a basis for converting between ascii -> utf8 when using R3 binary as an input. while R3 has them built-in | |
Maxim: 11-Sep-2009 | sort of like: print to-ascii to-binary "some text" | |
Pekr: 11-Sep-2009 | But this is some low-level issue I should not care about. It displays the Czech codepage correctly. Also the script is said to be by default UTF-8, which is a superset of ASCII. IIRC it was said that unless we use special chars, it will work transparently. If it works on input, it should work also on output, no? | |
Pekr: 11-Sep-2009 | OK, so we have http headers, which are supposed to be in ASCII, and then html content, which can be encoded. Which responsibility is it to provide correct encoding? A coder, or an http server? Hmm, maybe coder, as I am issuing http content headers in my scripts? | |
BrianH: 11-Sep-2009 | The trick is that the headers are pushed in ASCII, but the contents in whatever binary encoding the headers specify. | |
Pekr: 11-Sep-2009 | How is it that Linux and OS X don't experience any problems? They do use UTF-8, but that is not ASCII either, no? | |
Maxim: 11-Sep-2009 | UTF-8's lower 127 codes are the same as ASCII and single byte. So if you don't use special chars, or the null char, you are basically dumping ASCII... this is the reason for its existence. | |
Maxim: 11-Sep-2009 | IIRC the whole windows API is either ASCII or UTF-16. | |
BrianH: 8-Oct-2009 | CGI output should be binary, and the headers output in 7bit ASCII (not UTF-8) through that binary output. | |
Pekr: 30-Oct-2009 | if in ascii, it will be loaded ok, no? | |
Maxim: 30-Oct-2009 | ascii is 127 bytes... we are talking about the upper 127 chars. | |
Pekr: 30-Oct-2009 | Ascii is 255 ;-) | |
Maxim: 30-Oct-2009 | upper 127 are NOT ascii. | |
Maxim: 30-Oct-2009 | http://en.wikipedia.org/wiki/ASCII | |
Maxim: 30-Oct-2009 | if you only use ascii (lower 127 chars) you will see no difference. | |
Maxim: 30-Oct-2009 | hum... cause everything I use is ascii or latin-1 ? | |
Maxim: 30-Oct-2009 | but utf-8 editors aren't rare nowadays, and using utf-8 sequences isn't hard... really, if you truly want to keep using an ascii editor | |
Maxim: 30-Oct-2009 | handling encoding is complex in any environment... I had a lot of "fun" handling encodings in php, which uses such a unicode datatype... it's not really easier... cause you can't know from the text if it's unicode or ascii or binary values unless you tell it to load a sequence of bytes AS one or the other. | |
PeterWood: 30-Oct-2009 | A script could have two different encodings if differently encoded files were included. For example, you could use a script from Rebol.org in one of your scripts. You probably use Windows Code Page 1250, but most scripts in the library use other encodings. This doesn't cause big problems, as most of the code in the library is "pure" ASCII. | |
Maxim: 1-Nov-2009 | actually, it is a problem in R2. if you store your code, and I open it with a different codepage version of windows... some letters will be skewed. In an application I wrote, I couldn't write out proper strings for the Netherlands, as an example. unicode is slowly becoming the standard for text... especially utf-8. but yes, users have to be educated. within your apps, though, you can handle the encoding as you want... only the rebol sources have to be UTF-8. as R3 matures, more encodings will most probably be included in string codecs to support 8-bit extended ascii from different areas of the world. and even high-profile applications like Apple's iWeb have issues with text encoding... so this is a problem for the whole industry & users to adapt to. | |
Geomol: 16-Dec-2009 | In R2: >> to binary! 10000 == #{3130303030} So we get the ascii value of each digit in the number. In R3: >> to binary! 10000 == #{0000000000002710} The number is seen as a 64-bit integer, and we get the binary representation of that. | |
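Geomol's two results can be mirrored byte-for-byte in Python: the R2 form is the ASCII digits of the number, the R3 form a big-endian 64-bit integer:

```python
import struct

# R2: to binary! 10000 gives the ASCII codes of the digits "10000".
print(str(10000).encode("ascii").hex().upper())  # 3130303030

# R3: to binary! 10000 gives the number as a big-endian 64-bit integer.
print(struct.pack(">q", 10000).hex().upper())    # 0000000000002710
```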
Group: !Cheyenne ... Discussions about the Cheyenne Web Server [web-public] | ||
Dockimbel: 17-Sep-2009 | Btw, in order to forge emails to be sent, I've tried to rely on REBOL's builtin email support functions (big mistake!). You should know that they *are not* RFC compliant, the biggest issues being: - emails produced by REBOL use LF as EOL instead of CRLF (RFC 2822); see http://cr.yp.to/docs/smtplf.html - headers are not encoded for non-ASCII-7bit characters (RFC 2047). So, I've deeply patched the builtin code at runtime to work around this, but I would have done better to rewrite it all from scratch (that's what I intend to do when I have enough free time). | |
PeterWood: 7-Jan-2011 | Alan, I'm logged in to AltME from Ubuntu - so many non-ascii characters get displayed incorrectly. In your script the closing double-quote after /jsontest.cgi doesn't display properly. Perhaps you could check that it really is a double-quote and not a "smart-quote" in the actual source. | |
Group: !REBOL2 Releases ... Discuss 2.x releases [web-public] | ||
BrianH: 2-Jan-2010 | OK, now that we have 2.7.7 released (even though there is more work to do, i.e. platforms and the SDK), it is time to look ahead to 2.7.8 - which is scheduled for release in one month on February 1. The primary goal of this release is to migrate to REBOL's new development infrastructure. This means: - Migrating the RAMBO database to a new CureCode project and retiring RAMBO. - Using Carl's generation code for the manual to regenerate the R2 manual, so we can start to get to work updating it. - Porting the chat client to R2 using the new functions and building a CHAT function into R2 similar to the R3 version. The R2 chat client might be limited to the ASCII character set, though support for the Latin-1 character set might be possible. Still text mode for now, though if anyone wants to write a GUI client (Henrik?) we can put it on the official RT reb site accessible from the View desktop. The server is accessed through a simple RPC protocol and is designed to be easily scriptable. It turns out that Carl already rewrote the installer for 2.7.something, but it was turned off because of a couple minor bugs that we were able to fix in 2.7.7. With any luck, only minor fixes to the registry usage will be needed and we'll be good to go. As for the rest, it's up to you. Graham seems to have a good tweak to the http protocol, and others may want to contribute their fixes. | |
Group: !REBOL3 Extensions ... REBOL 3 Extensions discussions [web-public] | ||
Robert: 28-Nov-2009 | Playing with the extension example: IMO it's made too complicated. - Why do I need make-ext.r? Do I always need it, or just for this specific example? - Why is the init block a const char array and not just plain ASCII text? | |
Oldes: 11-Nov-2010 | So with Cyphre's help I have this function:

char* rebser_to_utf8(REBSER* series) {
    char *uf8str;
    REBCHR* str;
    REBINT result = RL_GET_STRING(series, 0, (void**)&str);
    if (result > 0) {
        // unicode string
        int iLen = wcslen(str);
        int oLen = iLen * sizeof(REBCHR);
        uf8str = malloc(oLen);
        int result = WideCharToMultiByte(CP_UTF8, 0, str, iLen, uf8str, oLen, 0, 0);
        if (result == 0) {
            int err = GetLastError();
            RL->print("err: %d\n", err);
        }
    } else if (result < 0) {
        // bytes string (ascii or latin-1)
        uf8str = malloc(strlen((char *)str));
        strcpy(uf8str, (char *)str);
    }
    return uf8str;
}

and I can then use:

char *filename = rebser_to_utf8(RXA_SERIES(frm, 1));
status = MagickReadImage(current_wand, filename);
free(filename);
if (status == MagickFalse) {
    ThrowWandException(current_wand);
}
return RXR_TRUE; | |
Oldes: 11-Nov-2010 | This seems to be working:

char* REBSER_to_UTF8(REBSER* series) {
    char *uf8str;
    REBCHR* str;
    REBINT result = RL_GET_STRING(series, 0, (void**)&str);
    if (result > 0) {
        // unicode string
        int iLen = wcslen(str);
        //int oLen = iLen * sizeof(REBCHR);
        int oLen = WideCharToMultiByte(CP_UTF8, 0, str, -1, NULL, 0, NULL, NULL);
        uf8str = malloc(oLen);
        int result = WideCharToMultiByte(CP_UTF8, 0, str, iLen, uf8str, oLen, 0, 0);
        if (result == 0) {
            int err = GetLastError();
            RL->print("err: %d\n", err);
        }
        uf8str[oLen - 1] = 0; // terminate inside the buffer (oLen already counts the NUL)
    } else if (result < 0) {
        // bytes string (ascii or latin-1)
        uf8str = strdup((char *)str);
    }
    return uf8str;
} | |
Group: !REBOL3 ... [web-public] | ||
Henrik: 26-Oct-2010 | That is, I get ?? along with a few other chars that I'm not sure are outside the ascii range. | |
BrianH: 18-Nov-2010 | One thing will definitely be easier though: JSON and Javascript define that they have Unicode source, but don't have a way to specify the encoding (they are text standards, not binary). They can be handled easily in R3 once the source is converted to a string though, since that conversion will handle the encoding issues. In R2 you'd have to either stick to ASCII data or use Gabriele's text codecs and then parse the UTF-8. | |
Pavel: 3-Dec-2010 | An idea for an NTP scheme; servers communicate only on UDP port 123. Overview of time services:
Daytime: ASCII response; Graham and Ladislav have written a scheme/tool already. Port 13.
Time: the simplest possible server, listening on port 37; answers with a 32-bit unsigned number of seconds from 1-Jan-1900/0:00 (calculating a human-readable date is not so trivial because of leap seconds inserted into UTC with no rule at all; the Earth is dancing a jive, in fact).
HTTP: use the inserted date-time from any header returned from a server. Port 80.
SNTP: a more precise protocol (also contains the fraction of a second in the reply); a subprotocol of NTP. UDP port 37.
NTP: the most precise available; compares several time servers and calculates with the computed transport delay and phase shift from an evaluated couple of handshaking packets. UDP port 37.
The latter two use minimally 12 32-bit binary packets for request and response; symmetric or asymmetric cryptography is possible (honestly, I've no clue why this). | |
BrianH: 17-Feb-2011 | I'm experimenting to determine the exact syntax of words in R3, and see whether there are any undiscovered bugs. Just sticking to ASCII for now - due to http://issue.cc/r3/1230 - but things look promising so far. I'll convert the results to PARSE rules. | |
Group: !REBOL3 Host Kit ... [web-public] | ||
Oldes: 10-Jan-2011 | RL_GET_STRING returns number > 0 if the source is unicode and < 0 if ascii | |
Group: Core ... Discuss core issues [web-public] | ||
Ladislav: 16-Oct-2010 | not to mention, that I could have put in all 127 ASCII characters | |
BrianH: 16-Oct-2010 | Oh, and not just ASCII; full Unicode. | |
Gabriele: 6-Nov-2010 | well... enbase just converts binary (8-bit) data to a form that is ascii printable. it does not say anything about what the 8-bit data contains. | |
Group: Red ... Red language group [web-public] | ||
BrianH: 29-Mar-2011 | Doc, by multibyte chars I wasn't talking about variable-size, I was talking about fixed-size with Unicode support. A char! would have a single size, but that size would either be 1, 2 or 4 bytes depending on whether the base platform supports ASCII, Unicode2 or full Unicode. | |
Andreas: 29-Mar-2011 | US ASCII only defines 128 characters. | |
BrianH: 29-Mar-2011 | It still doesn't handle the full set of Unicode, just ASCII, but I can reverse the charsets to be complemented opposites and it will handle those too. | |
BrianH: 11-Oct-2011 | http://issue.cc/r3/1302 for the ASCII range in R3. The R3 parser tends to be excessively forgiving outside the ASCII range, accepting too much, though I haven't done a thorough test. | |
Group: World ... For discussion of World language [web-public] | ||
Geomol: 2-Dec-2011 | The lexer is 7 bit, so words can only hold 7-bit ascii characters. String and other data is 8-bit. | |
Oldes: 2-Dec-2011 | Words are probably ok as ascii, but unicode! datatype is a must if you don't want to end with binary data instead which is doable like in R2, but ugly. | |
BrianH: 10-Dec-2011 | I wish you luck with World. It may be a bit difficult for me to use it though, because of the ASCII strings. Any language that isn't built from scratch with Unicode strings, instead having them retrofitted later when the inevitable need arises to support users outside the English-speaking world, will have a great deal of problems. Python 3 is just the latest example of the problems with not having a well-thought-through Unicode string model. One of the best parts of R3 is how well we handled the Unicode transition. | |
Geomol: 11-Dec-2011 | My view is, implementing unicode everywhere will add unnecessary complexity. Each such level of complexity is a sure step to downfall. My first rule of development is simplicity, then performance, then low footprint, then maybe features. Words in World can hold 7-bit ASCII. Chars and strings can hold 8-bit characters. That's the level of simplicity I aim at. I will have to deal with unicode, of course, and I'll do that when World is a bit more mature. There could be a unicode! datatype. | |
Geomol: 13-Dec-2011 | That's cool, Brian! :) A note about KWATZ!: you suggest it to be text!, but it's not quite. It sure can be e.g. UTF-8 data: (Setting my Terminal program to character encoding Unicode (UTF-8) and trying to load 3 ASCII letters, 3 Danish letters and 3 Greek letters) w> load "abc æøå αβγ" == [abc #{C3A6C3B8C3A5} #{CEB1CEB2CEB3}] (Notice World isn't prepared for unicode yet, but can load it, as it can just be seen as bytes.) But besides text, KWATZ! can also handle all other data, like escape codes or any binary format, maybe combined with more understandable data, you wanna load. | |
Group: REBOL Syntax ... Discussions about REBOL syntax [web-public] | ||
BrianH: 17-Feb-2012 | All of the syntax characters in R3 fit in the ASCII range. That is why there are no Unicode delimiters, such as the other space characters. | |
BrianH: 23-Feb-2012 | That's a good start! I'm really curious about whether urls and emails deal with chars over 127, especially in R3. As far as I know, the URI standards don't support them directly, but various internationalization extensions add recodings for these non-ASCII characters. It would be good to know exactly which chars are supported in the data model, so we can hack the code that supports that data to match. |