AltME groups: search


results summary

world     hits
r4wp      28
r3wp      173
total     201

results window for this page: [start: 101 end: 200]

world-name: r3wp

Group: All ... except covered in other channels [web-public]
Louis:
31-Oct-2006
I rather badly need a PageMaker (PM5) file converted to ASCII 
format. My copy of PageMaker has been corrupted, and I just want 
to print a document using LaTeX. The file is about 309 MB. Is there 
anyone here who can do this for me?
Group: !AltME ... Discussion about AltME [web-public]
PeterWood:
16-Jan-2011
It's not really a problem if you remember that AltME is designed 
to support 7-bit ASCII across operating systems.


It's just these users that want to write some fancy characters ;-)
Group: Core ... Discuss core issues [web-public]
Gordon:
29-Sep-2006
When you import data using "data: read/binary {sometextfile}" you 
seem to get a string of hex values. Ex:

probe 'data' of a file containing the word "Hello" results in #{48656C6C6F}, 
but if you probe first data it returns 72. So when you probe the entire 
data stream it returns it in hexadecimal format, but when you probe 
each character it returns a decimal value.


At any rate, how do you convert the characters in the variable 'data' 
back into ASCII values? IOW, how do you convert the decimal value 
of 72 back into an "H" or the #{48656C6C6F} back into "Hello"?
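
For reference, a minimal R2 sketch of the conversions being asked about, using the values from the question:

    to string! #{48656C6C6F}   ; == "Hello"  - binary back to a string
    to char! 72                ; == #"H"     - decimal value back to a character
    to integer! #"H"           ; == 72       - and back again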
Maxim:
19-Oct-2006
not saying /lines has an issue, but I have loaded 700MB ascii files 
on a 1GB RAM computer... 150 is peanuts.  but I never use the /lines 
argument.
Jerry:
20-Oct-2006
The following code: 

unicode-to-ascii: func [ from to /local fs ts sz] [
    fs: open/binary/direct/read from
    ts: open/binary/direct/write to
    sz: size? from
    fs/1 fs/1 ; discard the first two bytes, FFFE
    for i 3 sz 2 [
        append ts to-char fs/1 
        fs: skip fs 1 ; SKIP is the problem
    ]
    close fs
    close ts
]
unicode-to-ascii %/c/Unicode.txt %/c/Ascii.txt

In REBOL/View 1.2.7.3.1 12-Sep-2006 Core 2.6.0
** CRASH (Should not happen) - Expand series overflow

In REBOL/View 1.3.2.3.1 5-Dec-2005 Core 2.6.3
** Script Error: Not enough memory
** Where: do-body
** Near: fs: skip fs 1
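
One untested workaround sketch (not an official fix): read the whole file at once and step through the binary instead of skipping the /direct port. It assumes a little-endian UTF-16 file containing only ASCII characters:

    unicode-to-ascii: func [from to /local data out] [
        data: skip read/binary from 2                   ; read everything, skip the FFFE byte-order mark
        out: make string! length? data
        forskip data 2 [append out to-char first data]  ; low byte of each UTF-16LE pair
        write to out
    ]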
Rebolek:
2-Nov-2007
I need to sort some French words, but REBOL's SORT puts accented characters 
at the end (it sorts just by ASCII). Has anybody got an enhanced SORT 
for French?
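
A rough, untested sketch of one common workaround: strip the accents in a comparator passed to SORT/compare. The de-accent name, the (incomplete) accent table and the words block are only illustrative, and single-byte Latin-1 strings are assumed:

    de-accent: func [s [string!] /local out] [
        out: copy s
        foreach [from to] ["é" "e" "è" "e" "ê" "e" "ë" "e" "à" "a" "â" "a" "ç" "c" "ô" "o" "ù" "u" "û" "u"] [
            replace/all out from to
        ]
        out
    ]
    sort/compare words func [a b] [(de-accent a) < (de-accent b)]  ; words: the block of French strings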
james_nak:
18-Feb-2008
Slight change of subject, but here I am all happy saving/all and 
loading my objects, and then it hits me: just what is this "serialized" 
data? How is it different (other than the fact that its ASCII representation 
is different)? I don't know if I need to know this to use it, but in case 
I'm ever on TV I want to answer it correctly.
Louis:
20-Sep-2008
Ok, I found the problem. When I saved the file with the Windows program, 
I saved it in UTF-8 format. Resaving it in ASCII format solved the 
problem. I realized the problem when I noticed some Chinese characters 
in the output past what I pasted in above.
BrianH:
5-Mar-2009
kib2: "Does that mean that we can use unicode encoding with the help 
of r2-forward ?"

No, I can only spoof datatypes that don't exist in R2, and R2 
has a string! type. The code should be equivalent if the characters 
in the string are limited to the first 256 codepoints of Unicode 
(aka Latin-1), though only the first 128 codepoints (aka ASCII) can 
be converted from binary! to string! and have the binary data be the 
same as minimized UTF-8.
Sunanda:
30-May-2009
I have a printable? function that checks if a string has only ASCII 
printable characters. Would that meet your need, Maxim?
BrianH:
30-Jan-2010
ascii?: funct [
	"Returns TRUE if value or string is in ASCII character range (below 128)."
	value [string! file! email! url! tag! issue! char! integer!]
] compose [
	ascii: (charset [#"^(00)" - #"^(7F)"])
	either any-string? value [parse/all/case value [any ascii]] [value < 128]
]
; Note: Native in R3.
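
A quick usage sketch, assuming R2 with the function above loaded:

    ascii? "Hello"     ; == true   - all chars below 128
    ascii? "héllo"     ; == false  - é is outside the 7-bit range
    ascii? 200         ; == false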
BrianH:
30-Jan-2010
invalid-utf?: funct [
	"Checks for proper UTF encoding and returns NONE if correct or position where the error occurred."
	data [binary!]
	/utf "Check encodings other than UTF-8"
	num [integer!] "Bit size - positive for BE negative for LE"
] compose [
	ascii: (charset [#"^(00)" - #"^(7F)"])
	utf8+1: (charset [#"^(C2)" - #"^(DF)"])
	utf8+2: (charset [#"^(E0)" - #"^(EF)"])
	utf8+3: (charset [#"^(F0)" - #"^(F4)"])
	utf8rest: (charset [#"^(80)" - #"^(BF)"])
	switch/default any [num 8] [
		8 [ ; UTF-8
			unless parse/all/case data [(pos: none) any [
				pos: ascii | utf8+1 utf8rest |
				utf8+2 2 utf8rest | utf8+3 3 utf8rest
			]] [as-binary pos]
		]
		16 [ ; UTF-16BE
			pos: data
			while [not tail? pos] [
				hi: first pos
				case [
					none? lo: pick pos 2 [break/return pos]
					55296 > w: hi * 256 + lo [pos: skip pos 2]  ; #{D800}
					57343 < w [pos: skip pos 2]  ; #{DFFF}
					56319 < w [break/return pos]  ; #{DBFF}
					none? hi: pick pos 3 [break/return pos]
					none? lo: pick pos 4 [break/return pos]
					56320 > w: hi * 256 + lo [break/return pos]  ; #{DC00}
					57343 >= w [pos: skip pos 4]  ; #{DFFF}
				]
				none
			] ; none = valid, break/return pos = invalid
		]
		-16 [ ; UTF-16LE
			pos: data
			while [not tail? pos] [
				lo: first pos
				case [
					none? hi: pick pos 2 [break/return pos]
					55296 > w: hi * 256 + lo [pos: skip pos 2]  ; #{D800}
					57343 < w [pos: skip pos 2]  ; #{DFFF}
					56319 < w [break/return pos]  ; #{DBFF}
					none? lo: pick pos 3 [break/return pos]
					none? hi: pick pos 4 [break/return pos]
					56320 > w: hi * 256 + lo [break/return pos]  ; #{DC00}
					57343 >= w [pos: skip pos 4]  ; #{DFFF}
				]
				none
			] ; none = valid, break/return pos = invalid
		]
		32 [ ; UTF-32BE
			pos: data
			while [not tail? pos] [
				if any [
					4 > length? pos
					negative? c: to-integer pos
					1114111 < c  ; to-integer #{10FFFF}
				] [break/return pos]
			]
		]
		-32 [ ; UTF-32LE
			pos: data
			while [not tail? pos] [
				if any [
					4 > length? pos
					negative? c: also to-integer reverse/part pos 4 reverse/part pos 4
					1114111 < c  ; to-integer #{10FFFF}
				] [break/return pos]
			]
		]
	] [
		throw-error 'script 'invalid-arg num
	]
]

; Note: Native in R3, which doesn't support or screen the /utf option yet.

; See http://en.wikipedia.org/wiki/Unicode for charset/value explanations.
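
A usage sketch for the default UTF-8 check (untested here; the results follow from the rules above):

    invalid-utf? #{48C3A9}    ; "Hé" as UTF-8    == none (valid)
    invalid-utf? #{48E9}      ; "Hé" as Latin-1  == #{E9} (position of the offending byte)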
Geomol:
24-May-2010
The only way using SWITCH, I see, is to operate with ascii values, 
and that isn't good.
Henrik:
13-Jun-2010
ascii: charset [#"^(00)" - #"^(7F)"]

ascii-rule: [
      copy transfer [ascii some ascii] ( ; <- problem
        head insert tail output-string transfer
      )
    ]


This rule does not look correct. I replaced [ascii some ascii] with 
[some ascii] and now it works.
Graham:
15-Sep-2010
ascii printable characters ... we are talking about saving ink here!
Group: View ... discuss view related issues [web-public]
Gabriele:
4-Dec-2006
insert is a word because there is no char for it in ascii; there 
is a char for delete, so it's a char :)
Jerry:
9-Dec-2006
Gabriele, 

Actually, Oldes is right. Showing two-byte characters is good enough. 
An IME is not necessary for REBOL/View, because every Chinese/Japanese/Korean 
OS has proper IMEs installed. The IME sends the codes, encoded in the 
OS codepage, to the focused window. For example, if the codepage used 
by Windows XP is Big5 and I type in the character which means "one" 
(#{A440} in Big5, #{4E00} in Unicode, see http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4E00
), my REBOL/View program will get two key events sequentially, which 
are #{A4} and #{40}. REBOL/View shows it as two characters instead 
of one. I hope that REBOL/View can let the OS do the text drawing, 
like the REBOL/Core console does. The REBOL/Core console doesn't have 
the Chinese-character-showing issue, because it basically sends the 
#{A4} and #{40} to the console and lets the OS do the text drawing. 
The OS knows that #{A4} and #{40} should be combined into one Big5 character, 
so it just shows them as one character. Of course, if I type in two 
ASCII characters, the OS is smart enough not to combine them into 
one "non-existing" Big5 character. CJK encodings are supersets of 
ASCII, just like UTF-8 is a superset of ASCII.


It has nothing to do with Unicode, so it is not too difficult to fix, 
I guess. Please fix this in 2.7.5 or 2.7.6 ...

It's on my wish list for Santa Claus this year.
Sunanda:
3-Jan-2009
Base64 in REBOL is, basically, a type of ASCII representation. It 
can stand a certain amount of damage (like whitespace being inserted 
-- imagine it is sent as an email) and can still be reconstructed:
    str: "abcdefabcdef"      ;; a string
    s64: enbase str    ;; enbased to base-64 by default

    replace/case/all s64 "W" "  W  "   ;; whitespace polluted in transit
    str = to-string debase s64    ;; do we get it back intact?
Group: I'm new ... Ask any question, and a helpful person will try to answer. [web-public]
Gregg:
21-Jun-2009
In my largest grammar, where incoming data may be malformed, I've 
found it invaluable to have the rule tracing built in, enabled by 
a flag. e.g. 

    TSAFE-CHAR: [
        (rule-trace "TSAFE-CHAR IN")
        copy =TSAFE-CHAR charset-21
        | charset-22
        | charset-23
        | charset-24
        | charset-25
        | NON-US-ASCII
        (rule-trace "TSAFE-CHAR OUT")
    ]

    rule-trace: func [value /local rule-name action] [
        rule-name: first parse value none

        ;print [tab rule-name tab found? find don't-trace rule-name]
        action: second parse value none
        if all [
            any [
                trace-rules? = true
                action = form trace-rules?
            ]
            not found? find don't-trace rule-name
        ][
            val: attempt [mold get to word! join "=" rule-name]
            print ["===" value  any [val ""]]
        ]
    ]


Don't-trace allows you to turn off selected rules that may get called 
a lot. You could also set tracing levels if you wanted.
joannak:
26-Dec-2009
I have no plans on jumping into R3 at this point, since there is 
so much even in R2 I need to learn. But for future reference, 
is there any plan for a tool (or a mode in REBOL itself) to help flag 
those R2->R3 differences? For example, I remember seeing that 
PICK works differently in R3 (right, unlike R2, which is offset 
by one); it'll be quite hard to spot all those from source alone, 
since parameters are often defined at runtime.


Some changes will of course be obvious (for spotting), like sockets, 
since their parameters have been changed a lot, but differences in 
data reading/writing (ASCII/binary/Unicode etc.) may hide themselves 
for quite a while.
Davide:
30-Jun-2010
>> append #{} 15
== #{3135}
>> append #{} "15"
== #{3135}


Why, when I append an integer to a binary, is it first converted to 
an ASCII string?

IMHO it should be like this:
>> append #{} to-char 15
== #{0F}
Anton:
2-Aug-2010
Then there's ascii char 160, which you can generate in rebol with 
to-char 160. I think they call it a 'hard space' or something.
Endo:
8-Dec-2011
Then I found a very simple way to convert a Unicode file to ASCII 
in DOS:
TYPE my-unicode-file > my-ascii-file


This line converts the file to ASCII; the non-convertible characters 
look weird, but the rest is OK.
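
A comparable one-off in REBOL (an untested sketch, assuming a UTF-16 file whose useful content is plain ASCII; everything outside the 7-bit range, including the byte-order mark, is simply dropped):

    data: read/binary %my-unicode-file
    out: make string! length? data
    foreach b data [if all [b > 0  b < 128] [append out to char! b]]
    write %my-ascii-file out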
Group: Parse ... Discussion of PARSE dialect [web-public]
Chris:
22-Oct-2009
Is there any advantage in breaking up charsets that represent a large 
varied range of the 16-bit character space? For example, XML names 
are defined as below (excluding > 2 ** 16), but are most commonly 
limited to the ascii-friendly subset:

	w1: charset [
		#"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)"
		#"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)"
		#"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)"
		#"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
	]
	w+: charset [
		#"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)"
		#"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)"
		#"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)"
		#"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
		#"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
	]
	word: [w1 any w+]
Chris:
22-Oct-2009
Both w1 and w+ appear to be very large values.  Would it be smart 
to perhaps do:

	[[aw1 | w1] any [aw+ | w+]]

Where 'aw1 and 'aw+ are limited to ascii values?
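
For example, the ASCII-only subsets could be just (a sketch; whether the split actually helps depends on how charset bitsets are stored and tested):

	aw1: charset [#"A" - #"Z" #"_" #"a" - #"z"]
	aw+: charset [#"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z"]
	word: [[aw1 | w1] any [aw+ | w+]]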
Maxim:
20-Sep-2010
claude... so, did you try to run it as a script?


one thing I would do... since this is a strange error is to retype 
this:

 " ^-"


in your editor... to make sure it's not using the wrong ASCII character 
on your OS... it should not be a problem... but there is something 
weird going on here.
Group: !RebGUI ... A lightweight alternative to VID [web-public]
Ashley:
17-Feb-2007
Use REBOL/Core and an ASCII interface then! ;)
Group: Tech News ... Interesting technology [web-public]
Pavel:
8-Apr-2011
The DataMatrix definition specifies a capacity of max 2335 bytes per 
symbol of size 144x144; with some built-in compression it can be 
3116 ASCII characters (readable chars are encoded in less than 8 bits), 
and a scanner may read multiple symbols at once. A much more important 
characteristic is the use of Reed-Solomon self-repairing codes to ensure 
readability with up to 30% picture damage for each symbol.
Group: !REBOL3-OLD1 ... [web-public]
Graham:
10-Oct-2007
ascii art
Pekr:
14-Dec-2007
As for UTF-8 - is it compatible with the current +128 char extensions? I 
mean e.g. the Czech alphabet uses special characters above ASCII 
value 128 ....
BrianH:
14-Dec-2007
UTF-8 is a strict extension of ASCII, but ASCII is only defined between 
0 and 127. Characters 128+ are not ASCII, they are extensions, and 
their meaning depends on the codepage. The codepage of an 8-bit string 
is unknown, unless you specify it externally (or do a lot of statistical 
calculations). Strings or scripts with characters extended by a codepage 
will have to be translated by a codepage-to-utf-8 function or process 
specific to the particular codepage, ahead of time. Fortunately, 
such a process can be fast and even implemented in a byte-oriented 
language easily.
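
For the common Latin-1 case such a byte-oriented translation really is tiny; a rough, untested R2 sketch (the latin1-to-utf8 name is just illustrative, and other codepages would need a lookup table instead of the arithmetic):

    latin1-to-utf8: func [s [string!] /local out c] [
        out: make binary! 2 * length? s
        foreach ch s [
            c: to integer! ch
            either c < 128 [
                append out to char! c                            ; ASCII passes through unchanged
            ][
                append out either c < 192 [#"^(C2)"] [#"^(C3)"]  ; UTF-8 lead byte for U+0080-U+00FF
                append out to char! 128 + (c and 63)             ; continuation byte
            ]
        ]
        out
    ]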
BrianH:
14-Dec-2007
ASCII characters fit in one byte, the rest take some more. It can 
progress up to 5 bytes but those are rare.
PeterWood:
7-Dec-2008
I can understand how Pekr, Graham and many others feel about the 
lack of R3 releases, especially given all the early announcements 
and the first public alpha. However, I really feel that Carl is still 
prototyping R3; he is a long way from settling on the design of R3. 
There is too much missing for the current version to be considered 
an alpha (e.g. no modules, no threads, ASCII GUI, host environment 
& runtime core in a single executable).


This seems to be Carl's way of working and something that he has 
to work through step by step.
BrianH:
31-Dec-2008
I would not trust non-ascii characters for now. With any luck the 
server saves the messages as binary UTF-8, don't know yet.
PeterWood:
1-Jan-2009
Not trusting non-ASCII characters implies that the current design 
of RebDev is "ignorant" of character encoding. If that is the case, 
it is a shame, as RebDev could have been a great example of an "up-to-date" 
application built with R3.
BrianH:
2-Jan-2009
That would have to be the case with R2 clients, as the client is 
the part that handles character encoding. However, there are no R2 
clients yet. The messages appear to be UTF-8 encoded end-to-end, 
stored in binary on the server, which is encoding agnostic. Once 
we have R2 clients, they will have to handle the codepage-to-UTF-8 
encoding, or just stick to ASCII.
btiffin:
3-Jan-2009
If I was a betting man, by 2020 UTF-8 will reign and compsci grads 
will need a history book to learn about ASCII.
Chris:
4-Jan-2009
Brian -- ASCII is a subset of UTF-8...
Chris:
4-Jan-2009
With QM, I try to assume (and enforce) UTF-8 (declaring on forms, 
html escaping everything ASCII+), but it's definitely a chore.
Maxim:
7-Jan-2009
but LOAD can also understand just about all human-readable ASCII 
data.
Maxim:
7-Jan-2009
then, maybe, although it does the same thing as LOAD, it wouldn't 
be used by the interpreter, and would explicitly allow the interpreter 
to use the loading functionality, which already understands about 
95% of human-readable ASCII text as it is.
Maxim:
7-Jan-2009
really brian, I can't recall how many times I've had ASCII files 
from sources which I could almost just load as-is, where the 
extra syntax was just useless decoration that could be ignored.
[unknown: 5]:
21-Jan-2009
It can be used on binary data as well as ascii data and will carve 
out the blocks of the buffer.
kib2:
15-Feb-2009
BrianH: ok, thanks. What about  allowing ASCII chars in user names 
until it's really finished?
Gabriele:
21-Apr-2009
Now, if your array was representing a url, you could encode it to 
UTF-8 using the % encoding as well to stay in the ascii subset. This 
is encoding, but still, it will not solve your @ problem. each @ 
in the array of integers will become an @ (which is an ascii char) 
in the final string.
Geomol:
31-Jul-2009
Some languages only allow 7-bit ascii in the source except for strings.
BrianH:
31-Jul-2009
All standard functions and syntax in REBOL fit within 7-bit ASCII, 
which is why R3 source is UTF-8.
Maxim:
11-Sep-2009
optionally encoding them in ascii first... http headers are ascii.
Maxim:
11-Sep-2009
the header MUST be printed out in ASCII.
Maxim:
11-Sep-2009
askin = ascii
Maxim:
11-Sep-2009
AFAIK unicode -> ASCII is possible in R3, but I don't know how... not 
having done it myself. IIRC it's on the R3 wiki or docs pages somewhere.... 
googling it should give you some clues.
Pekr:
11-Sep-2009
REBOL 3.0 accepts UTF-8 encoded scripts, and because UTF-8 is a superset 
of ASCII, that standard is also accepted.


If you are not familiar with the UTF-8 Unicode standard, it is an 8-bit 
encoding that accepts ASCII directly (no special encoding is needed), 
but allows the full Unicode character set by encoding it with characters 
that have values 128 or greater.
Pekr:
11-Sep-2009
It should accept Ascii directly ....
Maxim:
11-Sep-2009
string! printing, to be more precise.  UTF and ASCII are  converted 
to two byte strings IIRC.  which is why you must re-encode them before 
spitting them via print.
Maxim:
11-Sep-2009
maybe Peter's excellent encoding script on rebol.org could be used 
as a basis for converting between ASCII -> UTF-8 when using R3 binary 
as an input, while R3 has them built in.
Maxim:
11-Sep-2009
sort of like:

print to-ascii to-binary "some text"
Pekr:
11-Sep-2009
But this is some low-level issue I should not have to care about. It 
displays the Czech codepage correctly. Also, the script is said to be 
UTF-8 by default, which is a superset of ASCII. IIRC it was said that 
unless we use special chars, it will work transparently. If it works 
on input, it should work on output too, no?
Pekr:
11-Sep-2009
OK, so we have http headers, which are supposed to be in ASCII, and 
then html content, which can be encoded. Which responsibility is 
it to provide correct encoding? A coder, or an http server? Hmm, 
maybe coder, as I am issuing http content headers in my scripts?
BrianH:
11-Sep-2009
The trick is that the headers are pushed in ASCII, but the contents 
in whatever binary encoding the headers specify.
Pekr:
11-Sep-2009
How is it that Linux and OS X don't experience any problems? They do 
use UTF-8, but that is not ASCII either, no?
Maxim:
11-Sep-2009
UTF-8's lower 127 codes are the same as ASCII and single-byte. So if 
you don't use special chars, or the null char, you are basically 
dumping ASCII... this is the reason for its existence.
Maxim:
11-Sep-2009
IIRC the whole windows API is either ASCII or UTF-16.
BrianH:
8-Oct-2009
CGI output should be binary, and the headers output in 7bit ASCII 
(not UTF-8) through that binary output.
Pekr:
30-Oct-2009
if in ascii, it will be loaded ok, no?
Maxim:
30-Oct-2009
ascii is 127 bytes... we are talking about the upper 127 chars.
Pekr:
30-Oct-2009
Ascii is 255 ;-)
Maxim:
30-Oct-2009
upper 127 are NOT ascii.
Maxim:
30-Oct-2009
http://en.wikipedia.org/wiki/ASCII
Maxim:
30-Oct-2009
if you only use ascii (lower 127 chars)  you will see no difference.
Maxim:
30-Oct-2009
hum... cause everything I use is ascii or latin-1 ?
Maxim:
30-Oct-2009
but UTF-8 editors aren't rare nowadays, and using UTF-8 sequences 
isn't hard... really, if you truly want to keep using an ASCII editor
Maxim:
30-Oct-2009
handling encoding is complex in any environment... I had a lot of 
"fun" handling encodings in PHP, which uses such a unicode datatype... 
it's not really easier... cause you can't know by the text if it's 
unicode or ASCII or binary values unless you tell it to load a sequence 
of bytes AS one or the other.
PeterWood:
30-Oct-2009
A script could have two different encodings if differently encoded 
files were included. For example, you could use a script from Rebol.org 
in one of your scripts. You probably use Windows code page 1250, but 
most scripts in the library use other encodings.


This doesn't cause big problems, as most of the code in the library 
is "pure" ASCII.
Maxim:
1-Nov-2009
actually, it is a problem in R2.  if you store your code, and I open 
it with a different codepage version of windows... some letters will 
be skewed. 


In an application I wrote, I couldn't write out proper strings for 
the netherlands, as an example.


unicode is slowly becoming the standard for text... especially utf-8. 
 but yes, users have to be educated.  


within your apps, though, you can handle the encoding as you want... 
only the REBOL sources have to be UTF-8. As R3 matures, more encodings 
will most probably be included in string codecs to support 8-bit 
extended ASCII from different areas of the world.


and even high-profile applications like Apple's iweb have issues 
with text encoding... so this is a problem for the whole industry 
& users to adapt to.
Geomol:
16-Dec-2009
In R2:
>> to binary! 10000
== #{3130303030}

So we get the ascii value of each digit in the number. In R3:

>> to binary! 10000
== #{0000000000002710}


The number is seen as a 64-bit integer, and we get the binary representation 
of that.
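
If you want the R2-style (character) result in R3, going through a string should give it back, assuming ASCII digits:

>> to binary! to string! 10000
== #{3130303030}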
Group: !Cheyenne ... Discussions about the Cheyenne Web Server [web-public]
Dockimbel:
17-Sep-2009
Btw, in order to forge emails to be sent, I've tried to rely on REBOL's 
built-in email support functions (big mistake!). You should know that 
they *are not* RFC compliant, the biggest issues being:


- emails produced by REBOL use LF as EOL instead of CRLF (RFC 
2822). See http://cr.yp.to/docs/smtplf.html

- headers are not encoded for non-ASCII (7-bit) characters (RFC 2047)


So, I've deeply patched the built-in code at runtime to work around 
this, but I would have been better off rewriting it all from scratch (that's 
what I intend to do when I have enough free time).
PeterWood:
7-Jan-2011
Alan, I'm logged in to AltME from Ubuntu, so many non-ASCII characters 
get displayed incorrectly. In your script the closing double-quote 
after /jsontest.cgi doesn't display properly. Perhaps you could check 
that it really is a double-quote and not a "smart quote" in the actual 
source.
Group: !REBOL2 Releases ... Discuss 2.x releases [web-public]
BrianH:
2-Jan-2010
OK, now that we have 2.7.7 released (even though there is more work 
to do, i.e. platforms and the SDK), it is time to look ahead to 2.7.8 
- which is scheduled for release in one month on February 1. The 
primary goal of this release is to migrate to REBOL's new development 
infrastructure. This means:

- Migrating the RAMBO database to a new CureCode project and retiring 
RAMBO.

- Using Carl's generation code for the manual to regenerate the R2 
manual, so we can start to get to work updating it.

- Porting the chat client to R2 using the new functions and building 
a CHAT function into R2 similar to the R3 version.


The R2 chat client might be limited to the ASCII character set, though 
support for the Latin-1 character set might be possible. Still text 
mode for now, though if anyone wants to write a GUI client (Henrik?) 
we can put it on the official RT reb site accessible from the View 
desktop. The server is accessed through a simple RPC protocol and 
is designed to be easily scriptable.


It turns out that Carl already rewrote the installer for 2.7.something, 
but it was turned off because of a couple minor bugs that we were 
able to fix in 2.7.7. With any luck, only minor fixes to the registry 
usage will be needed and we'll be good to go.


As for the rest, it's up to you. Graham seems to have a good tweak 
to the http protocol, and others may want to contribute their fixes.
Group: !REBOL3 Extensions ... REBOL 3 Extensions discussions [web-public]
Robert:
28-Nov-2009
Playing with the extension example: IMO it's done too complicated.


- Why do I need make-ext.r? Do I always need it, or just for this 
specific example?

- Why is the init block a const char array and not just plain ASCII 
text?
Oldes:
11-Nov-2010
So with Cyphre's help I have this function:
char* rebser_to_utf8(REBSER* series) {
    char *uf8str;
    REBCHR* str;
    REBINT result = RL_GET_STRING(series, 0 , (void**)&str);
        
    if (result > 0){
        //unicode string
        int iLen = wcslen(str);
        int oLen = iLen *  sizeof(REBCHR);
        uf8str = malloc(oLen);

        int result = WideCharToMultiByte(CP_UTF8, 0, str, iLen, uf8str, oLen, 
        0, 0);
        if (result == 0) {
            int err = GetLastError();
            RL->print("err: %d\n", err);
        }
    } else if (result < 0) {
        //bytes string (ascii or latin-1)
        uf8str = malloc(strlen((char *)str));
        strcpy(uf8str, (char *)str);
    }
    return uf8str;
}

and I can then use:
..
            char *filename = rebser_to_utf8(RXA_SERIES(frm, 1));
            status=MagickReadImage(current_wand, filename);
            free(filename);
            if (status == MagickFalse) {
                ThrowWandException(current_wand);
            }
            return RXR_TRUE;
Oldes:
11-Nov-2010
This seems to be working:
char* REBSER_to_UTF8(REBSER* series) {
    char *uf8str;
    REBCHR* str;
    REBINT result = RL_GET_STRING(series, 0 , (void**)&str);
        
    if (result > 0){
        //unicode string
        int iLen = wcslen(str);
        //int oLen = iLen *  sizeof(REBCHR);

        int oLen = WideCharToMultiByte( CP_UTF8, 0, str, -1, NULL, 0,  NULL, 
        NULL);
        uf8str = malloc(oLen);

        int result = WideCharToMultiByte(CP_UTF8, 0, str, iLen, uf8str, oLen, 
        0, 0);
        if (result == 0) {
            int err = GetLastError();
            RL->print("err: %d\n", err);
        }
        uf8str[oLen] = 0;
    } else if (result < 0) {
        //bytes string (ascii or latin-1)
        uf8str = strdup((char *)str);
    }
    return uf8str;
}
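
One detail worth double-checking: when WideCharToMultiByte is called with -1 as the input length, the size it returns already includes the terminating null, so uf8str[oLen] = 0 writes one byte past the malloc'ed buffer; uf8str[oLen - 1] = 0 (or passing -1 again in the second call) stays inside the allocation.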
Group: !REBOL3 ... [web-public]
Henrik:
26-Oct-2010
That is, I get ?? along with a few other chars that I'm not sure 
are outside the ascii range.
BrianH:
18-Nov-2010
One thing will definitely be easier though: JSON and Javascript define 
that they have Unicode source, but don't have a way to specify the 
encoding (they are text standards, not binary). They can be handled 
easily in R3 once the source is converted to a string though, since 
that conversion will handle the encoding issues. In R2 you'd have 
to either stick to ASCII data or use Gabriele's text codecs and then 
parse the UTF-8.
Pavel:
3-Dec-2010
An idea of an NTP scheme, but servers communicate only on UDP port 123. 
Overview of time services:

Daytime: ASCII response; Graham and Ladislav have written a scheme/tool 
already. Port 13.

Time: the simplest possible server, listening on port 37; it answers with 
a 32-bit unsigned number of seconds since 1-1-1900/0:00 (calculation of a 
human-readable date is not so trivial because of leap seconds inserted 
into UTC with no rule at all; the Earth is dancing a jive, in fact).

HTTP: use the inserted Date-time from any header returned from a server. 
Port 80.

SNTP: a more precise protocol (the reply also contains a fraction of a 
second), a subprotocol of NTP, on UDP port 123.

NTP: the most precise available; it compares several time servers and 
calculates with the computed transport delay and phase shift from an 
evaluated couple of handshaking packets. UDP port 123.

The latter two use a minimum of 12 32-bit words per packet for request 
and response; symmetric or asymmetric cryptography is possible (honestly, 
I've no clue why).
BrianH:
17-Feb-2011
I'm experimenting to determine the exact syntax of words in R3, and 
see whether there are any undiscovered bugs. Just sticking to ASCII 
for now - due to http://issue.cc/r3/1230 - but things look promising 
so far. I'll convert the results to PARSE rules.
Group: !REBOL3 Host Kit ... [web-public]
Oldes:
10-Jan-2011
RL_GET_STRING returns number > 0 if the source is unicode and < 0 
if ascii
Group: Core ... Discuss core issues [web-public]
Ladislav:
16-Oct-2010
not to mention, that I could have put in all 127 ASCII characters
BrianH:
16-Oct-2010
Oh, and not just ASCII; full Unicode.
Gabriele:
6-Nov-2010
well... enbase just converts binary (8-bit) data to a form that is 
ascii printable. it does not say anything about what the 8-bit data 
contains.
Group: Red ... Red language group [web-public]
BrianH:
29-Mar-2011
Doc, by multibyte chars I wasn't talking about variable-size, I was 
talking about fixed-size with Unicode support. A char! would have 
a single size, but that size would either be 1, 2 or 4 bytes depending 
on whether the base platform supports ASCII, Unicode2 or full Unicode.
Andreas:
29-Mar-2011
US ASCII only defines 128 characters.
BrianH:
29-Mar-2011
It still doesn't handle the full set of Unicode, just ASCII, but 
I can reverse the charsets to be complemented opposites and it will 
handle those too.
BrianH:
11-Oct-2011
http://issue.cc/r3/1302 for the ASCII range in R3. The R3 parser 
tends to be excessively forgiving outside the ASCII range, accepting 
too much, though I haven't done the thorough test.
Group: World ... For discussion of World language [web-public]
Geomol:
2-Dec-2011
The lexer is 7 bit, so words can only hold 7-bit ascii characters. 
String and other data is 8-bit.
Oldes:
2-Dec-2011
Words are probably OK as ASCII, but a unicode! datatype is a must if 
you don't want to end up with binary data instead, which is doable like 
in R2, but ugly.
BrianH:
10-Dec-2011
I wish you luck with World. It may be a bit difficult for me to use 
it, though, because of the ASCII strings. Any language that isn't 
built from scratch with Unicode strings, instead having them retrofitted 
later when the inevitable need to support users outside the English-speaking 
world arises, will have a great deal of problems. Python 3 is just the 
latest example of the problems with not having a well-thought-through 
Unicode string model. One of the best parts of R3 is how well we handled 
the Unicode transition.
Geomol:
11-Dec-2011
My view is, implementing Unicode everywhere will add unnecessary 
complexity. Each such level of complexity is a sure step towards downfall. 
My first rule of development is simplicity, then performance, then 
low footprint, then maybe features.


Words in World can hold 7-bit ASCII. Chars and strings can hold 8-bit 
characters. That's the level of simplicity I aim at.


I will have to deal with Unicode, of course, and I'll do that when 
World is a bit more mature. There could be a unicode! datatype.
Geomol:
13-Dec-2011
That's cool, Brian! :)

A note about KWATZ!, you suggest it to be text!, but it's not quite. 
It sure can be e.g. UTF-8 data:


(Setting my Terminal program to character encoding Unicode (UTF-8) 
and trying to load 3 ASCII letters, 3 danish letters and 3 greek 
letters)

w> load "abc ?????? ??????"
== [abc #{C3A6C3B8C3A5} #{CEB1CEB2CEB3}]


(Notice World isn't prepared for Unicode yet, but it can load this, as 
it can just be seen as bytes.)


But besides text, KWATZ! can also handle all other data you want to 
load, like escape codes or any binary format, maybe combined with more 
understandable data.
Group: REBOL Syntax ... Discussions about REBOL syntax [web-public]
BrianH:
17-Feb-2012
All of the syntax characters in R3 fit in the ASCII range. That is 
why there are no Unicode delimiters, such as the other space characters.
BrianH:
23-Feb-2012
That's a good start! I'm really curious about whether urls and emails 
deal with chars over 127, especially in R3. As far as I know, the 
URI standards don't support them directly, but various internationalization 
extensions add recodings for these non-ASCII characters. It would 
be good to know exactly which chars are supported in the data model, 
so we can hack the code that supports that data to match.