World: r4wp

[#Red] Red language group

Kaj 17-Apr-2013 [7070]	It's worse than having no Unicode at all: then you can at least get out what you put in
DocKimbel 17-Apr-2013 [7071]	For Android, Java uses UTF-16, so the conversion from string! is (almost) trivial.
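A minimal C sketch (C rather than Red/System, purely for illustration) of why the conversion is only "almost" trivial. It assumes the internal buffer is little-endian with 1, 2 or 4 bytes per codepoint; the function name to_utf16 is hypothetical. Latin1 and UCS-2 units map directly to UTF-16 code units, while UCS-4 codepoints above U+FFFF need a surrogate pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert `count` codepoints stored with `unit` bytes
   each (1 = Latin1, 2 = UCS-2, 4 = UCS-4, little-endian assumed) into
   UTF-16 code units. `out` needs room for 2 * count units in the worst
   case. Returns the number of UTF-16 units written. */
size_t to_utf16(const uint8_t *buf, size_t count, int unit, uint16_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        const uint8_t *p = buf + i * (size_t)unit;
        uint32_t cp;
        if (unit == 1)                     /* Latin1: zero-extend the byte */
            cp = p[0];
        else if (unit == 2)                /* UCS-2: already one UTF-16 unit */
            cp = (uint32_t)p[0] | ((uint32_t)p[1] << 8);
        else                               /* UCS-4: full 32-bit codepoint */
            cp = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

        if (cp < 0x10000) {
            out[n++] = (uint16_t)cp;
        } else {                           /* outside the BMP: surrogate pair */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return n;
}
```

Only the UCS-4 branch does real work; the other two widths are a straight copy per unit.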
Kaj 17-Apr-2013 [7072]	But it's not there yet, is it?
DocKimbel 17-Apr-2013 [7073]	No, I will implement it when I need it, and I have a lot of other stuff to code for Android support before that.
Kaj 17-Apr-2013 [7074]	Is it wise that Red won't work on other platforms before it works on Android?
DocKimbel 17-Apr-2013 [7075]	The features currently implemented in Red are working.
Kaj 17-Apr-2013 [7076]	Sure, but it's pretty useless like this
DocKimbel 17-Apr-2013 [7077]	That's why it is called an alpha. ;-)
Kaj 17-Apr-2013 [7078]	I was hoping a little more could be done, but I'll have to postpone a lot of work
DocKimbel 17-Apr-2013 [7079]	I told you I would have a look at it once the shared libs are done; just wait a few more days. If it's critical to you, you might want to contribute the required conversion routines?
Kaj 17-Apr-2013 [7080x2]	I could, but I know very little of Unicode, so there would be a lot of overhead in getting up to speed
	I have no idea how long it will take you to finish the shared libraries. It has been a back-burner project for a long time
DocKimbel 17-Apr-2013 [7082]	Not very long. I have kept postponing it for almost a year now, and it has been getting in the way of Android support for a while, so for a few weeks now it has been scheduled for right after the interpreter is finished (Exit/Return support).
Kaj 17-Apr-2013 [7083]	OK, that's fine. It sounded like data out support was of undetermined priority
DocKimbel 17-Apr-2013 [7084x11]	The only "data out" support we need for now to build Red is stdout support, and we have had that for a while.
	Full Red I/O support is next on my list once the above-mentioned tasks are completed.
	BTW, if you stick to Latin-1, you shouldn't need any conversion, should you?
	Also, there might be a cheap way to do the conversion in the meantime using wsprintf() or a similar function.
	Hmm, it might not be enough, so you might want to have a look and maybe wrap libiconv: http://www.gnu.org/software/libiconv/
	For once, the API looks good and simple enough (4 functions to wrap).
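A hedged C sketch of how small such a wrapper can be, using the standard iconv_open/iconv/iconv_close calls (as provided by libiconv or glibc). It assumes the source buffer is UCS-2 in little-endian byte order (the "UCS-2LE" encoding name); the helper name ucs2le_to_utf8 is hypothetical, and malloc is used here for the output buffer.

```c
#include <iconv.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: convert `nbytes` bytes of little-endian UCS-2 into
   a freshly malloc'ed, NUL-terminated UTF-8 string. Returns NULL on error;
   the caller frees the result. */
char *ucs2le_to_utf8(const uint8_t *src, size_t nbytes)
{
    iconv_t cd = iconv_open("UTF-8", "UCS-2LE");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return NULL;

    /* Every BMP codepoint needs at most 3 UTF-8 bytes, plus 1 for the NUL. */
    size_t outsize = nbytes / 2 * 3 + 1;
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)src;          /* iconv() takes char ** for the input */
    size_t inleft = nbytes;
    char *op = out;
    size_t outleft = outsize - 1;    /* reserve the terminator byte */

    size_t rc = iconv(cd, &in, &inleft, &op, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *op = '\0';
    return out;
}
```

iconv() advances the in/out pointers and decrements the byte counters itself, so after a successful call `op` points just past the last UTF-8 byte written.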
	From a routine, if str is a red-string! pointer, this is the dispatch code you would need to use:
		s: GET_BUFFER(str)
		switch GET_UNIT(str) [
			Latin-1 [...conversion code...]
			UCS-2   [...conversion code...]
			UCS-4   [...conversion code...]
		]
	The beginning of the internal string buffer is given by: string/rs-head str (returns a byte-ptr!)
	Should be GET_UNIT(s) above, sorry for the typo.
	Another typo: should be Latin1.
	Anyway, you don't need any conversion for Latin1, so you just have to do it for the other two formats.
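A C sketch of the same dispatch (C rather than Red/System, for illustration only), assuming the buffer is stored little-endian; read_cp is a hypothetical name. Each branch turns the unit at the given position into a 32-bit codepoint.

```c
#include <stdint.h>

/* Hypothetical helper: read one codepoint from a buffer whose unit size is
   1 (Latin1), 2 (UCS-2) or 4 (UCS-4), little-endian storage assumed. */
uint32_t read_cp(const uint8_t *p, int unit)
{
    switch (unit) {
    case 1:                                   /* Latin1: byte == codepoint */
        return p[0];
    case 2:                                   /* UCS-2: low byte first */
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    default:                                  /* UCS-4 */
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
}
```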
Kaj 17-Apr-2013 [7095x2]	Sticking to Latin1 is not much use these days. Much data, such as web sites, is in Unicode. It would be fine if it worked like R2, as a transparent passthrough, but Red eats your Unicode and won't give it back from its internal format
	How does stdout support deal with that? Is there no conversion to the platform format there?
PeterWood 17-Apr-2013 [7097x2]	I'd be happy to look at a UCS-2 to UTF-8 conversion function, but I don't have the time to do it at the moment.
	I'm pretty sure that would be enough for Kaj's immediate needs.
Kaj 17-Apr-2013 [7099x2]	Yes
	I see there are specialised, platform-specific print functions only for printing the internal format. They look like a base for the general-purpose conversions, though
PeterWood 17-Apr-2013 [7101x3]	I've written a quick function that takes a Red char (UCS-4) and outputs the equivalent UTF-8 as bytes stored in a struct!. It can be used as the basis for converting a Red string to UTF-8. What is still needed is to extract the char! values from the Red string, call the function on each, and then append the UTF-8 to a c-string!
	The function only covers the BMP at the moment.
	You can find it at: https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/ucs4-utf8.reds
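A hedged C sketch of a codepoint-to-UTF-8 encoder (not Peter's Red/System code, which is at the URL above). Unlike the BMP-only version described, this one also covers the 4-byte range up to U+10FFFF and rejects surrogates; utf8_encode is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical encoder: write the UTF-8 form of `cp` into `out` (room for
   4 bytes). Returns the byte count, or 0 if `cp` is a surrogate or lies
   above U+10FFFF. */
int utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                          /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                       /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                         /* lone surrogate: not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+20AC (the euro sign) encodes as the three bytes E2 82 AC.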
AdrianS 18-Apr-2013 [7104]	It's so nice to see C written that way, Peter.
Pekr 18-Apr-2013 [7105]	Yes, finally a C that makes sense :-) Well, nothing against C; I am glad it is still around and going to stay...
PeterWood 18-Apr-2013 [7106x2]	I've just committed a slightly improved version that returns a c-string! instead of a structure.
	For me, the big issue in turning the function into the UTF-8 string that Kaj wants is: "How do I allocate a c-string! using the Red memory manager rather than malloc?" Any suggestions appreciated.
DocKimbel 18-Apr-2013 [7108x2]	It would be best to do the conversions on the fly, which is why I want to wait for I/O to be done before implementing such conversion routines. Anyway, to do it now, you need to allocate a new string; the best way to do it is:
		str: as red-string! stack/push*
		str/header: TYPE_STRING
		str/head: 0
		str/node: alloc-bytes size
	The new string! value will be put on the stack, so any other call to a Red internal native or action might destroy it. Also, keep in mind that the GC is not there yet, so intensive I/O might quickly eat up all your RAM.
	Oh, you meant a c-string!, not a string!, so it's even easier, just use: alloc-bytes size
PeterWood 18-Apr-2013 [7110x2]	Thanks.
	Is there any easy way to free the c-string?
DocKimbel 18-Apr-2013 [7112x4]	Currently no, the freeing function requires a memory frame pointer in addition to the buffer pointer. It is meant for internal use only for now.
	Anyway, even freeing it won't help much as long as the GC doesn't do the cleanup.
	Here's what your main loop would look like for retrieving every codepoint from a string! value:
		head: string/rs-head str
		tail: string/rs-tail str
		s: GET_BUFFER(str)
		unit: GET_UNIT(s)
		while [head < tail][
			cp: switch unit [
				Latin1 [as-integer p/value]
				UCS-2  [(as-integer p/2) << 8 + p/1]
				UCS-4  [p4: as int-ptr! p p4/value]
			]
			...emit UTF-8 char...
			head: head + unit
		]
	Oops, you should replace 'head by 'p in the above code.
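Putting the loop and the encoder together, a C sketch for illustration only: the names are hypothetical, malloc stands in for Red/System's alloc-bytes, and buffers are assumed little-endian. The cursor is p, as corrected above. It walks the buffer twice: once to compute the exact UTF-8 size, then again to emit the bytes into the allocated c-string.

```c
#include <stdint.h>
#include <stdlib.h>

/* Little-endian read of one codepoint; unit is 1, 2 or 4 bytes. */
static uint32_t read_cp(const uint8_t *p, int unit)
{
    uint32_t cp = 0;
    for (int i = unit - 1; i >= 0; i--)   /* most significant byte last in memory */
        cp = (cp << 8) | p[i];
    return cp;
}

/* UTF-8 encode; returns the byte count (1..4). Assumes a valid scalar value. */
static int utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80)  { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) { out[0] = (unsigned char)(0xC0 | (cp >> 6));
                      out[1] = (unsigned char)(0x80 | (cp & 0x3F)); return 2; }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert the codepoints in [head, tail) to a NUL-terminated UTF-8 c-string.
   The caller owns the result; returns NULL if allocation fails. */
char *to_utf8(const uint8_t *head, const uint8_t *tail, int unit)
{
    unsigned char tmp[4];
    size_t size = 1;                                      /* terminator */
    for (const uint8_t *p = head; p < tail; p += unit)    /* pass 1: exact size */
        size += (size_t)utf8_encode(read_cp(p, unit), tmp);

    unsigned char *out = malloc(size);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    for (const uint8_t *p = head; p < tail; p += unit)    /* pass 2: emit bytes */
        n += (size_t)utf8_encode(read_cp(p, unit), out + n);
    out[n] = '\0';
    return (char *)out;
}
```

The two passes avoid guessing a buffer size; a single pass that allocates 4 bytes per codepoint up front would trade memory for one less walk over the string.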
PeterWood 18-Apr-2013 [7116]	Many thanks.
DocKimbel 18-Apr-2013 [7117]	cp holds your codepoint as a 32-bit integer.
PeterWood 18-Apr-2013 [7118]	I should be able to turn this into a function for Kaj to include in his routine! where he needs UTF-8
DocKimbel 18-Apr-2013 [7119]	I guess that should be enough for his needs.