World: r4wp

[#Red] Red language group

Kaj
17-Apr-2013
[7070]
It's worse than having no Unicode: without it, you can at least get 
out what you put in
DocKimbel
17-Apr-2013
[7071]
For Android, Java uses UTF-16, so the conversion from string! is 
(almost) trivial.
Kaj
17-Apr-2013
[7072]
But it's not there yet, is it?
DocKimbel
17-Apr-2013
[7073]
No, I will implement it when I need it, and I have a lot of other 
stuff to code for Android support before that.
Kaj
17-Apr-2013
[7074]
Is it wise that Red won't work on other platforms before it works 
on Android?
DocKimbel
17-Apr-2013
[7075]
The features currently implemented in Red are working.
Kaj
17-Apr-2013
[7076]
Sure, but it's pretty useless like this
DocKimbel
17-Apr-2013
[7077]
That's why it is called an alpha. ;-)
Kaj
17-Apr-2013
[7078]
I was hoping a little more could be done, but I'll have to postpone 
a lot of work
DocKimbel
17-Apr-2013
[7079]
I told you I will have a look at it once the shared libs are done, 
just wait a few more days. If it's critical to you, you might want 
to contribute the required conversion routines?
Kaj
17-Apr-2013
[7080x2]
I could, but I know very little of Unicode, so there would be a lot 
of overhead in getting up to speed
I have no idea how long it will take you to finish the shared libraries. 
It has been a backburner project for a long time
DocKimbel
17-Apr-2013
[7082]
Not very long. I have kept postponing it for almost a year now, and 
it has been getting in my way for Android support for a while, so 
a few weeks ago I scheduled it to be done just after the interpreter 
is finished (Exit/Return support).
Kaj
17-Apr-2013
[7083]
OK, that's fine. It sounded like data out support was of undetermined 
priority
DocKimbel
17-Apr-2013
[7084x11]
The only "data out" support we need for now for building Red is the 
stdout support, and we have had it for a while.
Full Red I/O support is next on my list once the above-mentioned 
tasks are completed.
BTW, if you stick to Latin-1, you shouldn't have the need for any 
conversion?
Also, there might be a cheap way to achieve the conversion in the 
meantime using wsprintf() or a similar function.
Hmm, it might not be enough, so you might want to have a look and 
maybe wrap libiconv:
http://www.gnu.org/software/libiconv/
For once, the API looks good and simple enough (4 functions to wrap).
From a routine, if str is a red-string! pointer, this is the dispatch 
code you would need to use:

s: GET_BUFFER(str)
switch GET_UNIT(str) [
	Latin-1 [...conversion code...]
	UCS-2 [...conversion code...]
	UCS-4 [...conversion code...]
]
Beginning of internal string buffer is given by:

	string/rs-head str	(returns a byte-ptr!)
Should be GET_UNIT(s) above, sorry for the typo.
Another typo: should be Latin1.
Anyway, you don't need any conversion for Latin1, so you just have 
to do it for the other two formats.
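
Editor's sketch of the libiconv suggestion above, written in C since that is the API a Red/System binding would wrap. It assumes the UCS-2 buffer is little-endian and uses only the three standard calls iconv_open(), iconv() and iconv_close(); the function name ucs2le_to_utf8 and its parameters are hypothetical and do not come from the discussion.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert in_len bytes of little-endian UCS-2 into UTF-8.
   Returns the number of bytes written to out, or (size_t)-1 on failure.
   Hypothetical helper for illustration only. */
size_t ucs2le_to_utf8(const char *in, size_t in_len, char *out, size_t out_cap)
{
    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-8", "UCS-2LE");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;      /* iconv's prototype wants a non-const char ** */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    /* iconv advances src/dst and decrements the two counters as it converts. */
    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;       /* invalid input or output buffer too small */
    return out_cap - dst_left;
}
```

With glibc these calls are part of the C library; with standalone GNU libiconv the program would also need to link against it.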
Kaj
17-Apr-2013
[7095x2]
Sticking to Latin1 is not much use these days. Much data, such as 
web sites, is in Unicode. It would be fine if it worked like R2, as 
a transparent passthrough, but Red eats your Unicode and won't give 
it back from its internal format
How does stdout support deal with that? Is there no conversion to 
the platform format there?
PeterWood
17-Apr-2013
[7097x2]
I'd be happy to look at a UCS-2 to UTF-8 conversion function but 
I don't have the time to do it at the moment.
I'm pretty sure that would be enough for Kaj's immediate needs.
Kaj
17-Apr-2013
[7099x2]
Yes
I see there are specialised platform specific print functions only 
for printing the internal format. They look like a base for the general 
purpose conversions, though
PeterWood
17-Apr-2013
[7101x3]
I've written a quick function that will take a Red char (UCS4) and 
output the equivalent UTF-8 as bytes stored in a struct!.


It can be used as the basis for converting a Red string to UTF-8. 
What is needed is to extract the Red char! values from the Red string, 
call the function, and then append the UTF-8 to a c-string!
The function only covers the BMP at the moment.
You can find it at:


https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/ucs4-utf8.reds
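
Editor's sketch of the kind of code point to UTF-8 step described above, in C. This is not Peter's Red/System function: the name utf8_encode is hypothetical, and unlike the BMP-only version mentioned in the chat it also handles the 4-byte range up to U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes required).
   Returns the number of bytes written, or 0 if cp cannot be encoded
   (a surrogate, or a value above U+10FFFF). Hypothetical helper. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: rest of the BMP */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: beyond the BMP */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Returning the byte count lets the caller append directly into a larger output buffer and advance its write position.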
AdrianS
18-Apr-2013
[7104]
It's so nice to see C written that way, Peter.
Pekr
18-Apr-2013
[7105]
Yes, finally a C that makes sense :-) Well, nothing against C, I 
am glad it is still around and going to stay ....
PeterWood
18-Apr-2013
[7106x2]
I've just committed a slightly improved version that returns a c-string! 
instead of a structure.
For me, the big issue in turning the function into the UTF-8 string 
that Kaj wants is "How to allocate a c-string! using the Red Memory 
Manager rather than malloc"

Any suggestions appreciated.
DocKimbel
18-Apr-2013
[7108x2]
It would be best to do the conversions on the fly, that is why I 
want to wait for I/O to get done before implementing such conversion 
routines. 


Anyway, for doing it now, you need to allocate a new string, the 
best way to do it is:

    str: as red-string! stack/push*
    str/header: TYPE_STRING
    str/head: 0
    str/node:  alloc-bytes size


The new string! value will be put on stack, so any other call to 
a Red internal native or action might destroy it. Also, keep in mind 
that the GC is not there yet, so intensive I/O might quickly eat 
up all your RAM.
Oh, you meant a c-string!, not a string!, so it's even easier, just 
use: alloc-bytes size
PeterWood
18-Apr-2013
[7110x2]
Thanks.
Is there any easy way to free the c-string?
DocKimbel
18-Apr-2013
[7112x4]
Currently no, the freeing function requires a memory frame pointer 
in addition to the buffer pointer. It is meant for internal use only 
for now.
Anyway, even freeing it won't help much as long as the GC doesn't 
do the cleanup.
Here's what your main loop would look like for retrieving every codepoint 
from a string! value:

	head: string/rs-head str
	tail: string/rs-tail str
		
	s: GET_BUFFER(str)
	unit: GET_UNIT(s)
		
	while [head < tail][
		cp: switch unit [
			Latin1 [as-integer p/value]
			UCS-2  [(as-integer p/2) << 8 + p/1]
			UCS-4  [p4: as int-ptr! p p4/value]
		]
		...emit UTF-8 char...
		head: head + unit
	]
Oops, you should replace 'head by 'p in the above code.
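
Editor's sketch of the same loop in C, with the 'head to 'p correction applied. It assumes the unit is the code unit size in bytes (1, 2 or 4) and that the buffer is little-endian, which is what the UCS-2 branch above implies (Red/System paths are 1-based, so p/1 is the low byte). The names read_codepoint and decode_buffer are hypothetical, and the "emit UTF-8 char" step is replaced by storing each code point in an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Code unit sizes in bytes; names follow the chat, values are assumed. */
enum { UNIT_LATIN1 = 1, UNIT_UCS2 = 2, UNIT_UCS4 = 4 };

/* Read one code point stored in `unit` little-endian bytes at p. */
static uint32_t read_codepoint(const unsigned char *p, int unit)
{
    switch (unit) {
    case UNIT_LATIN1:
        return p[0];
    case UNIT_UCS2:
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    case UNIT_UCS4:
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    default:
        return 0;
    }
}

/* Walk the bytes in [head, tail) and store each code point in out.
   Stops after max code points; returns how many were stored. */
size_t decode_buffer(const unsigned char *head, const unsigned char *tail,
                     int unit, uint32_t *out, size_t max)
{
    const unsigned char *p = head;
    size_t n = 0;

    if (unit != UNIT_LATIN1 && unit != UNIT_UCS2 && unit != UNIT_UCS4)
        return 0;

    /* Only read while a whole code unit remains before tail. */
    while ((size_t)(tail - p) >= (size_t)unit && n < max) {
        out[n++] = read_codepoint(p, unit);
        p += unit;               /* advance by one code unit */
    }
    return n;
}
```

Reading byte by byte avoids unaligned 32-bit loads, at the cost of hard-coding the byte order.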
PeterWood
18-Apr-2013
[7116]
Many thanks.
DocKimbel
18-Apr-2013
[7117]
cp holds your codepoint as a 32-bit integer.
PeterWood
18-Apr-2013
[7118]
I should be able to turn this into a function for Kaj to include 
in his routine! where he needs UTF-8
DocKimbel
18-Apr-2013
[7119]
I guess that should be enough for his needs.