AltME groups: search

results summary

world   hits
r4wp    32
r3wp    11
total:  43

results window for this page: [start: 1 end: 43]

world-name: r4wp

Group: #Red ... Red language group [web-public]
DocKimbel:
24-Sep-2012
Yes, Latin-1 / UCS-2 / UCS-4
DocKimbel:
8-Nov-2012
A series buffer has a header, with OFFSET and TAIL pointers that define 
respectively the beginning and end of the series slots. The OFFSET pointer 
allows reserving space at the head of the series to optimize insertions 
at the head. Series slot size can be 1 (binary/UTF-8/Latin-1), 2 (UCS-2), 
4 (UCS-4) or 16 (value!) bytes wide.
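
A minimal C sketch of the layout described above, assuming a fixed-size allocation. The struct, its field names and the helper functions are hypothetical illustrations, not Red's actual runtime definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; names are illustrative, not Red's own. */
typedef struct {
    uint8_t *offset;   /* first used slot; may sit past the start of data[] */
    uint8_t *tail;     /* one past the last used slot */
    size_t   unit;     /* slot width in bytes: 1, 2, 4 or 16 */
    size_t   size;     /* bytes allocated in data[] */
    uint8_t  data[64];
} series_buffer;

/* Start an empty series, reserving `head_room` slots at the head.
   The caller keeps head_room * unit within the allocation. */
static void series_init(series_buffer *s, size_t unit, size_t head_room) {
    s->unit = unit;
    s->size = sizeof s->data;
    s->offset = s->data + head_room * unit;
    s->tail = s->offset;
}

/* Number of slots between OFFSET and TAIL. */
static size_t series_length(const series_buffer *s) {
    return (size_t)(s->tail - s->offset) / s->unit;
}

/* Append one slot at the tail; returns -1 when the buffer is full. */
static int series_append(series_buffer *s, const void *slot) {
    if ((size_t)((s->data + s->size) - s->tail) < s->unit) return -1;
    memcpy(s->tail, slot, s->unit);
    s->tail += s->unit;
    return 0;
}

/* Insert one slot at the head using the reserved space, so no existing
   slot has to move; returns -1 when no head room is left. */
static int series_insert_head(series_buffer *s, const void *slot) {
    if ((size_t)(s->offset - s->data) < s->unit) return -1;
    s->offset -= s->unit;
    memcpy(s->offset, slot, s->unit);
    return 0;
}
```

With head room reserved, an insertion at the head only moves the OFFSET pointer back by one slot instead of shifting the whole series.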
PeterWood:
27-Mar-2013
Actually, Gregg's test works under OS X:

Schulz:Red peter$ ./console

-=== Red Console alpha version ===-
(only Latin-1 input supported)

red>> s: copy ""
== ""
red>> append/dup s #" " 10
== "          "
red>> length? s
== 10
DocKimbel:
17-Apr-2013
BTW, if you stick to Latin-1, you shouldn't have the need for any 
conversion?
DocKimbel:
17-Apr-2013
From a routine, if str is a red-string! pointer, this is the dispatch 
code you would need to use:

s: GET_BUFFER(str)
switch GET_UNIT(str) [
	Latin-1 [...conversion code...]
	UCS-2 [...conversion code...]
	UCS-4 [...conversion code...]
]
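
The same kind of dispatch can be sketched in C. This is only an illustration under assumptions: the enum constants and the function name are hypothetical, and the elided conversion code above is not reproduced; the sketch merely reads one code point from a buffer whose slot width is 1, 2 or 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unit constants: the value is the slot width in bytes. */
enum unit { LATIN1 = 1, UCS2 = 2, UCS4 = 4 };

/* Return the code point stored in slot `i` of a buffer whose slots
   are `unit` bytes wide. */
static uint32_t codepoint_at(const void *buf, enum unit unit, size_t i) {
    switch (unit) {
    case LATIN1: return ((const uint8_t  *)buf)[i];
    case UCS2:   return ((const uint16_t *)buf)[i];
    case UCS4:   return ((const uint32_t *)buf)[i];
    }
    return 0xFFFD; /* unknown unit: U+FFFD REPLACEMENT CHARACTER */
}
```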
Kaj:
26-Apr-2013
I found out that not only does Red not support Unicode, it doesn't 
support Latin-1, not even on Windows
Kaj:
26-Apr-2013
Both the compile time and runtime lexers don't let Latin-1 through
Kaj:
26-Apr-2013
-=== Red Console alpha version ===-
(only Latin-1 input supported)

red>> s: "Español"
== "Espa"
red>> length? s
== 4
Kaj:
26-Apr-2013
A very similar thing happens when I paste Latin-1 into Windows
Kaj:
26-Apr-2013
Here's what happens when I try to compile Latin-1 source code:
DocKimbel:
26-Apr-2013
You can't paste UTF-8 in the console, it supports only Latin-1.
Kaj:
26-Apr-2013
Yes, so the Latin-1 promise is false
Kaj:
26-Apr-2013
You can't paste Latin-1
DocKimbel:
26-Apr-2013
Are you sure you're pasting Latin-1 and not UTF-8?
Kaj:
26-Apr-2013
string/load can only load UTF-8, so only ASCII and UTF-8 files can 
be read, not Latin-1
Kaj:
26-Apr-2013
For example, there's an internal single byte encoding that's marked 
"Latin1", but I now know there is no way to get Latin-1 data in or 
out, so I wonder if this encoding will ever be used for more than 
7-bit ASCII
Kaj:
26-Apr-2013
Actually, I did one test that confirms Andreas' statement. The only 
way to get 8-bit data in is to compile a UTF-8 string literal that 
fits into Latin-1
Kaj:
26-Apr-2013
No, the console says you can input Latin-1, and you can't, not even 
through UTF-8
Kaj:
26-Apr-2013
Neither can you compile Latin-1 nor read Latin-1 files nor other 
source data
Kaj:
26-Apr-2013
-=== Red Console alpha version ===-
(only Latin-1 input supported)

red>> s: "Español"
== "Espa"
red>> length? s
== 4
Kaj:
26-Apr-2013
Yes, and neither works, so there is no Latin-1 support at all, except 
in a corner case internally
Kaj:
26-Apr-2013
Yes, and as you say, it's mislabeled Latin1, so there were several 
things leading me to believe that Red already had Unicode and Latin-1 
support
Kaj:
26-Apr-2013
Yes, I think it's very dangerous to claim that Red has Unicode and 
Latin-1 support
DocKimbel:
28-Apr-2013
1) "I found out that not only does Red not support Unicode, it doesn't 
support Latin-1, not even on Windows" Red *does* "support" Unicode, 
Latin-1 "support" was not claimed in Red, except for the console 
script. I've put quotes around the word "support" because you're mixing 
up internal representation and I/O encoding formats.
DocKimbel:
28-Apr-2013
5) "Yes, I think it's very dangerous to claim that Red has Unicode 
and Latin-1 support". Red *has* Unicode support, string! and word! 
values support Unicode, input Red scripts are Unicode, PRINT outputs 
Unicode characters. Latin-1 is used as an *internal* encoding format, 
I don't remember ever claiming that "Red supports Latin-1 for I/O" 
except for the console script (which is wrong, I agree). OTOH, I 
do remember thinking about supporting it at the beginning for printing, 
then I found it cumbersome to support in addition to Unicode mode 
and dropped it during the implementation.
DocKimbel:
28-Apr-2013
So, about the console issue, the runtime lexer is able to parse Latin-1 
input but the input string gets internalized before being passed 
to the lexer using the UTF-8 loader, which chokes on MSDOS console 
incompatible codepages. For the Unix version, the console input being 
in UTF-8 by default, it passes the internalization, but crashes the 
runtime lexer.
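
A small C sketch of why raw Latin-1 bytes do not survive a UTF-8 loader. It assumes nothing about Red's actual loader: the function name is hypothetical and the check is simplified (it does not reject overlong three- and four-byte forms or surrogates). It counts the characters a UTF-8 decoder would accept before the first malformed sequence.

```c
#include <stddef.h>

/* Count the characters accepted before the first malformed UTF-8
   sequence, or before the end of input if everything is well formed. */
static size_t utf8_valid_prefix(const unsigned char *s, size_t n) {
    size_t i = 0, chars = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t need, k;
        if (c < 0x80) need = 0;                       /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) need = 1;    /* 2-byte lead */
        else if (c >= 0xE0 && c <= 0xEF) need = 2;    /* 3-byte lead */
        else if (c >= 0xF0 && c <= 0xF4) need = 3;    /* 4-byte lead */
        else break;               /* stray continuation or invalid lead */
        if (need > n - i - 1) break;                  /* truncated sequence */
        for (k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80) break;     /* not a continuation */
        if (k <= need) break;
        i += need + 1;
        chars++;
    }
    return chars;
}
```

In Latin-1, "Español" is the bytes 45 73 70 61 F1 6F 6C. The byte F1 looks like the lead byte of a four-byte UTF-8 sequence, but only two bytes follow it, so a decoder of this kind accepts just the first four characters.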
DocKimbel:
28-Apr-2013
Kaj, it seems to me that you were confused by a few things:
- console script banner wrong statement (my fault)

- internal "Latin-1" naming (like in Python's internals) which might 
be misleading (there's no other closer naming in Unicode for one 
byte representation AFAIK, though some people call it "UCS-1", maybe 
we should adopt that too).

- "Unicode support" seems to imply to you that *all* possible Unicode 
encodings have to be supported (with encoders/decoders). It doesn't, 
having just one encoding supporting the full Unicode range (like 
UCS-4) is enough for claiming "Unicode support".
Arnold:
29-Apr-2013
There is, as I read this, a different issue. Doc wants Red to be as 
complete as possible, Kaj wants it to be officially usable. Kaj really 
needs UTF-8 (and/or Latin-1) character support; I guess this has to do 
with the Syllable operating system, amongst others.

I would like Red to support time and random functions as natives 
and (Gregg, is REJOIN one of your mezz funcs? I want that too) be 
able to connect to a MySQL database so I can dump PHP for some web development.

Besides that, we would all love to see a VID(-like) solution for display 
and creating apps. 

We have to be patient, agreed 100% by everybody? While the roadmap 
mentions all the things needed to progress Red, the above things are not 
on that list. I want Red to have enough to make it usable in production 
and after that expand; imho that is the way to really attract more 
funding and enthusiastic programmers and make sure current support does 
not fade or lose interest.
Group: Announce ... Announcements only - use Ann-reply to chat [web-public]
Kaj:
8-Jan-2013
The Red binding uses the same string marshalling as the Red console, 
so the same limitations apply. Latin-1 only and string values of 
64 bytes or longer may not work
Kaj:
27-Apr-2013
I implemented UTF-8 output support for Red. I ended up writing optimised 
versions based more on the Red print backend. I integrated them in 
my I/O routines and made heavy performance optimisations. Thanks 
to Peter for leading the way. There are the following Red/System 
encoders embedded in %common.red:

http://red.esperconsultancy.nl/Red-common/dir?ci=tip


to-UTF8: encodes a Red string into UTF-8 Red/System c-string! format.

to-local-file: encodes a Red string into Latin-1 Red/System c-string! 
format on Windows, and into UTF-8 on other systems. This yields a 
string suitable for the local file name APIs. Latin-1 can be output 
as long as it was input into Red via UTF-8. Non-Latin-1 code points 
cannot be encoded in Latin-1 and yield a NULL for the entire result.


These encoders make use of the Latin1-to-UTF8, UCS2-to-UTF8 and UCS4-to-UTF8 
encoding functions. An example of their use in the Red READ and WRITE 
functions is in %input-output.red
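
The two directions described above can be sketched in C. This is an illustration under assumptions, not the code in %common.red: the function names are hypothetical, and the second function takes UCS-4 input only to show the "NULL for the entire result" rule.

```c
#include <stdint.h>
#include <stdlib.h>

/* Encode `n` Latin-1 bytes as a NUL-terminated UTF-8 string.
   The caller frees the result; NULL means allocation failed. */
static char *latin1_to_utf8(const uint8_t *src, size_t n) {
    char *out = malloc(2 * n + 1);   /* each byte becomes at most 2 bytes */
    size_t i, j = 0;
    if (!out) return NULL;
    for (i = 0; i < n; i++) {
        if (src[i] < 0x80) {
            out[j++] = (char)src[i];                    /* ASCII: unchanged */
        } else {
            out[j++] = (char)(0xC0 | (src[i] >> 6));    /* lead byte C2 or C3 */
            out[j++] = (char)(0x80 | (src[i] & 0x3F));  /* continuation byte */
        }
    }
    out[j] = '\0';
    return out;
}

/* Encode `n` UCS-4 code points as a NUL-terminated Latin-1 string.
   Any code point above U+00FF makes the whole result NULL. */
static char *ucs4_to_latin1(const uint32_t *src, size_t n) {
    char *out = malloc(n + 1);
    size_t i;
    if (!out) return NULL;
    for (i = 0; i < n; i++) {
        if (src[i] > 0xFF) { free(out); return NULL; }
        out[i] = (char)src[i];
    }
    out[n] = '\0';
    return out;
}
```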
Kaj:
27-Apr-2013
I used the new encoding functions in all my Red bindings: those for 
the C library, input/output via files and cURL, 0MQ, SQLite and GTK+. 
In as many places as possible, data marshalled to the external libraries 
now supports UTF-8. File names on Windows support Latin-1. Files 
and URLs are always read and written as UTF-8, including on Windows. 
Red does not support loading Latin-1 strings.
Kaj:
27-Apr-2013
I've updated the binary downloads. The red console interpreters and 
all the Red examples include the above encoding support now, and 
all the latest Red features:

http://red.esperconsultancy.nl/Red-test/dir?ci=tip


For example, the Red/GTK-text-editor now supports writing UTF-8 files 
with UTF-8 or Latin-1 names.


I've added an MSDOS\Red\red-core.exe for Windows 2000, because the 
GTK+ libraries in red.exe require Windows XP+.

world-name: r3wp

Group: Core ... Discuss core issues [web-public]
DanielSz:
14-Nov-2007
BTW, I noticed that rebol.org serves pages in utf-8 encoding, but 
the scripts themselves are latin-1. This is not a problem for the 
code, but it is a problem for the comments, which may contain accented 
characters. For example, names of authors (hint: Robert Müench), 
and they consequently appear garbled. I'm not saying pages should 
be served as latin-1, on the contrary, I am an utf-8 enthusiast, 
I think rebol scripts themselves should be encoded as utf-8, (it 
is possible with python, for example). I hope Rebol3 will be an all 
encompassing utf-8 system (am I dreaming?).
BrianH:
5-Mar-2009
kib2: "Does that mean that we can use unicode encoding with the help 
of r2-forward ?"

No, I can only spoof datatypes that don't exist in R2, and R2 
has a string! type. The code should be equivalent if the characters 
in the string are limited to the first 256 codepoints of Unicode 
(aka Latin-1), though only the first 128 codepoints (aka ASCII) can 
be converted from binary! to string and have the binary data be the 
same as minimized UTF-8.
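
A short C sketch of the byte-level point made above, with hypothetical function names: Latin-1 bytes below 0x80 are already their own UTF-8 encoding, while each byte from 0x80 upward needs two UTF-8 bytes.

```c
#include <stddef.h>

/* Returns 1 when every byte is below 0x80, i.e. the Latin-1 bytes are
   identical to their UTF-8 encoding; returns 0 otherwise. */
static int same_bytes_in_utf8(const unsigned char *s, size_t n) {
    size_t i;
    for (i = 0; i < n; i++)
        if (s[i] >= 0x80) return 0;
    return 1;
}

/* Length in bytes of the UTF-8 encoding of `n` Latin-1 bytes:
   one byte for ASCII, two bytes for 0x80..0xFF. */
static size_t utf8_length_of_latin1(const unsigned char *s, size_t n) {
    size_t i, len = 0;
    for (i = 0; i < n; i++)
        len += (s[i] < 0x80) ? 1 : 2;
    return len;
}
```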
BrianH:
30-Jan-2010
latin1?: func [
	"Returns TRUE if value or string is in Latin-1 character range (below 256)."
	value [string! file! email! url! tag! issue! char! integer!] ; Not binary!
][ ; R2 has Latin-1 chars and strings
	either integer? value [value < 256] [true]
]

; Note: Native (and more meaningful) in R3. For forwards compatibility.
Group: Script Library ... REBOL.org: Script library and Mailing list archive [web-public]
Sunanda:
16-Mar-2009
Thanks guys.
Other scripts with the same problem.....there are a couple. 

About 10% of all scripts have at least one extended ASCII char....But 
most of them are acceptable in LATIN-1 code page / charset (eg copyright 
symbol, some accented letters). It's just a very few scripts that 
use 1/4 and similar symbols that cause the problem.


What other editors? Windows NOTEPAD is one example of a common one 
that gets this wrong.
Group: !REBOL3-OLD1 ... [web-public]
Sunanda:
31-Jul-2009
But it's R2 compatible :)

There are other edge cases -- Latin-1 chars that can be _in_ a word 
but not _start_ one, and do not serialise well.....I did a script 
and found them all once
Maxim:
11-Sep-2009
but if we need to output latin-1 afterwards (while dumping the html 
content, for example), the output encoding  should be selectable 
as a "current default", and all the --cgi would do is set that default 
to UTF-8 for example.
Maxim:
30-Oct-2009
I also think the "default" user text format should be configurable. 
  I have absolutely no desire to start using utf-8 for my code and 
data, especially when I have a lot of stuff that already is in iso 
latin-1 encoding.
Maxim:
30-Oct-2009
hum... cause everything I use is ascii or latin-1 ?
Group: !REBOL2 Releases ... Discuss 2.x releases [web-public]
BrianH:
2-Jan-2010
OK, now that we have 2.7.7 released (even though there is more work 
to do, i.e. platforms and the SDK), it is time to look ahead to 2.7.8 
- which is scheduled for release in one month on February 1. The 
primary goal of this release is to migrate to REBOL's new development 
infrastructure. This means:

- Migrating the RAMBO database to a new CureCode project and retiring 
RAMBO.

- Using Carl's generation code for the manual to regenerate the R2 
manual, so we can start to get to work updating it.

- Porting the chat client to R2 using the new functions and building 
a CHAT function into R2 similar to the R3 version.


The R2 chat client might be limited to the ASCII character set, though 
support for the Latin-1 character set might be possible. Still text 
mode for now, though if anyone wants to write a GUI client (Henrik?) 
we can put it on the official RT reb site accessible from the View 
desktop. The server is accessed through a simple RPC protocol and 
is designed to be easily scriptable.


It turns out that Carl already rewrote the installer for 2.7.something, 
but it was turned off because of a couple minor bugs that we were 
able to fix in 2.7.7. With any luck, only minor fixes to the registry 
usage will be needed and we'll be good to go.


As for the rest, it's up to you. Graham seems to have a good tweak 
to the http protocol, and others may want to contribute their fixes.
Group: !REBOL3 Extensions ... REBOL 3 Extensions discussions [web-public]
Oldes:
11-Nov-2010
So with Cyphre's help I have this function:
char* rebser_to_utf8(REBSER* series) {
    char *uf8str;
    REBCHR* str;
    REBINT result = RL_GET_STRING(series, 0 , (void**)&str);
        
    if (result > 0){
        //unicode string
        int iLen = wcslen(str);
        int oLen = iLen *  sizeof(REBCHR);
        uf8str = malloc(oLen);

        int result = WideCharToMultiByte(CP_UTF8, 0, str, iLen, uf8str, oLen, 
        0, 0);
        if (result == 0) {
            int err = GetLastError();
            RL->print("err: %d\n", err);
        }
    } else if (result < 0) {
        //bytes string (ascii or latin-1)
        uf8str = malloc(strlen((char *)str));
        strcpy(uf8str, (char *)str);
    }
    return uf8str;
}

and I can then use:
..
            char *filename = rebser_to_utf8(RXA_SERIES(frm, 1));
            status=MagickReadImage(current_wand, filename);
            free(filename);
            if (status == MagickFalse) {
                ThrowWandException(current_wand);
            }
            return RXR_TRUE;
Oldes:
11-Nov-2010
This seems to be working:
char* REBSER_to_UTF8(REBSER* series) {
    char *uf8str = NULL;   // stays NULL if nothing could be converted
    REBCHR* str;
    REBINT result = RL_GET_STRING(series, 0 , (void**)&str);
        
    if (result > 0){
        //unicode string
        int iLen = wcslen(str);
        //int oLen = iLen *  sizeof(REBCHR);

        // ask for the required size; with -1 the count includes the terminator
        int oLen = WideCharToMultiByte(CP_UTF8, 0, str, -1, NULL, 0, NULL, NULL);
        if (oLen == 0) return NULL;
        uf8str = malloc(oLen);
        if (uf8str == NULL) return NULL;

        int result = WideCharToMultiByte(CP_UTF8, 0, str, iLen, uf8str, oLen, 0, 0);
        if (result == 0) {
            int err = GetLastError();
            RL->print("err: %d\n", err);
        }
        uf8str[oLen - 1] = 0;   // last valid index; uf8str[oLen] is out of bounds
    } else if (result < 0) {
        //bytes string (ascii or latin-1)
        uf8str = strdup((char *)str);
    }
    return uf8str;
}
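
For comparison, a portable C sketch of the same two-pass approach (measure first, then allocate and encode) that does not depend on WideCharToMultiByte. It is an illustration under assumptions: the function names are hypothetical, the input is treated as plain UCS-2, so surrogate pairs are not combined, and the host-kit types (REBSER, REBCHR) are left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed to encode one 16-bit code unit in UTF-8. */
static size_t utf8_width(uint16_t c) {
    return c < 0x80 ? 1 : c < 0x800 ? 2 : 3;
}

/* Convert `n` UCS-2 code units to a NUL-terminated UTF-8 string.
   The caller frees the result; NULL means allocation failed. */
static char *ucs2_to_utf8(const uint16_t *src, size_t n) {
    size_t i, j = 0, len = 0;
    char *out;
    for (i = 0; i < n; i++) len += utf8_width(src[i]);   /* pass 1: size */
    out = malloc(len + 1);                               /* +1 for the terminator */
    if (!out) return NULL;
    for (i = 0; i < n; i++) {                            /* pass 2: encode */
        uint16_t c = src[i];
        if (c < 0x80) {
            out[j++] = (char)c;
        } else if (c < 0x800) {
            out[j++] = (char)(0xC0 | (c >> 6));
            out[j++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[j++] = (char)(0xE0 | (c >> 12));
            out[j++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[j++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return out;
}
```

Sizing the buffer from the measured length, plus one byte for the terminator, avoids both the undersized allocation and the out-of-bounds terminator write.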