World: r4wp

[Rebol School] REBOL School

older newer	first last
Endo 8-Aug-2012 [732x2]	I wrote a run length encoding function, may be useful for someone else too: rle: func ["Run length encode" b /local v r i j] [ v: 1 r: copy [] j: next i: b unless empty? b [ until [ either all [not tail? j equal? first i first j] [ v: v + 1 j: next j ] [ append r reduce [v first i] v: 1 i: ++ j ] tail? i ] ] r ]
Endo 8-Aug-2012 [732x2]	here is the tests: >> rle "aaabbcx" == [3 #"a" 2 #"b" 1 #"c" 1 #"x"] >> >> rle [] == [] >> >> rle "" == [] >> >> rle [a] == [1 a] >> >> rle [a a a a a] == [5 a] >> rle [a a a a a b b] == [5 a 2 b]
DocKimbel 8-Aug-2012 [734]	Endo: I think a much faster version could be coded using PARSE.
Endo 8-Aug-2012 [735x2]	But don't I need to write two different set of rules for strings and blocks?
Endo 8-Aug-2012 [735x2]	And I'm not that good using PARSE, my PARSE expriments usually stuck with infinite loops :(
DocKimbel 8-Aug-2012 [737]	I think it should be doable with one set for both datatypes.
BrianH 8-Aug-2012 [738x7]	Not quite. You can almost do it in R3, but you need the QUOTE operation for blocks, and QUOTE doesn't work on strings.
	Here's a version for R3 parse, with some optimizations: rle2: funct ["Run length encode" b [series!]] [ output: copy [] x: none r: either any-block? :b [qr: copy [quote 1] [(qr/2: :x) any qr]] [[any x]] parse :b [any [pos1: set x skip r pos2: ( reduce/into [subtract index? :pos2 index? :pos1 :x] tail output )]] output ]
	The biggest overhead there comes from not preallocating the output block, so there's some reallocation. I don't know how to estimate the size of the output though.
	You should really have it be case-sensitive though.
	rle2: funct ["Run length encode" b [series!]] [ output: copy [] x: none r: either any-block? :b [qr: copy [quote 1] [(qr/2: :x) any qr]] [[any x]] parse/case :b [any [pos1: set x skip r pos2: ( reduce/into [subtract index? :pos2 index? :pos1 :x] tail output )]] output ] >> rle2 [a a A b b c d D d d d] == [2 a 1 A 2 b 1 c 1 d 1 D 3 d]
	I tried to come up with a more optimal R3 version without using parse, but I got blocked by case-sensitivity, without considering binding. I suppose I could just consider binding too, or unbind the results. It also doesn't do structural comparison of functions, either in the parse or procedural version.
	Can someone come up with an equivalent R2 parse version? Of course words aren't case-preserving in R2, nor can they be compared case-sensitively.
DocKimbel 8-Aug-2012 [745]	Here's a R2 solution with same rules for string! and block! series: rle: func [s [series!] /local out c i][ out: make block! 1 parse/case/all s [ any [ [end \| c: ( c: either word? c/1 [to-lit-word c/1][c/1] i: 1 )] skip some [ c (i: i + 1) \| (repend out [i c]) break ] ] ] out ] >> rle "aaabbcx" == [3 #"a" 2 #"b" 1 #"c" 1 #"x"] >> rle [a a a a a] == [5 a] >> rle [a a a a a b b] == [5 a 2 b] >> rle [a a A b b c d D d d d] == [3 a 2 b 1 c 5 d]
BrianH 8-Aug-2012 [746]	Put in an exception for blocks, parens and integers too. In your code above, blocks get treated like sub-rules, parens get executed, and integers can result in an endless loop.
DocKimbel 8-Aug-2012 [747x2]	Another version of 'rle for R2 that uses two pointers (like your R3 version) instead of a counter: rle: func [s [series!] /local out c pos1 pos2][ out: make block! 1 parse/case/all s [ any [ [end \| c: ( c: either word? c/1 [to-lit-word c/1][c/1] )] pos1: skip some [c \| pos2: (repend out [offset? pos1 pos2 c]) break] ] ] out ]
DocKimbel 8-Aug-2012 [747x2]	Brian: I'm just giving a solution that matches the requirements from Endo. The block example he provided contains only words.
BrianH 8-Aug-2012 [749x10]	Ah, yeah, I overspecced the R3 version by using QUOTE. Although there's a standatd workaround for integers (precede it with 1 1), for any-blocks and functions it might be best to do the comparison using REBOL code. The R2 equivalent of the IF operation in R3 parse would help here: http://www.rebol.net/wiki/Parse_Project#IF_.28condition.29
	The replacement for IF (condition) would be something like this: (cont: unless condition [[end skip]]) cont
	Oh right, I forgot OFFSET? so I recreated it :(
	Here's a version of Doc's with all datatypes handled, even unset. It even avoids accidentally converting lit-words to words and lit-paths to paths: rle: func [s [series!] /local out pos1 pos2 cont][ out: make block! 1 parse/all s [ any [ pos1: skip some [ pos2: skip (cont: if any [ not-equal? unset? :pos1/1 unset? :pos2/1 strict-not-equal? :pos1/1 :pos2/1 ] [[end skip]]) cont \| pos2: (repend out [offset? :pos1 :pos2 :pos1/1]) break ] ] ] out ]
	There was no point in using parse/case since the comparisons were being done by strict-not-equal?, not parse.
	Don't know how the speed compares to Endo's original though.
	Whoops, it doesn't handle runs of unset values. More adjustment needed.
	This version handles unsets too: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 1 emit: [(repend out [offset? :pos1 :pos2 :pos1/1])] parse/all s [ any [ pos1: unset! some [pos2: unset! \| pos2: emit break] \| pos1: skip some [ pos2: unset! :pos2 emit break \| pos2: skip ( cont: if strict-not-equal? :pos1/1 :pos2/1 [[end skip]] ) cont \| pos2: emit break ] ] ] out ]
	Slight improvement: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 1 emit: [(repend out [offset? :pos1 :pos2 :pos1/1])] parse/all s [any [ pos1: unset! any unset! pos2: emit \| pos1: skip some [ pos2: unset! :pos2 emit break \| pos2: skip ( cont: if strict-not-equal? :pos1/1 :pos2/1 [[end skip]] ) cont \| pos2: emit break ] ]] out ]
	For old versions of R2, you might want to change IF STRICT-NOT-EQUAL? to UNLESS STRICT-EQUAL?. There used to be a bug in STRICT-NOT-EQUAL? in R2, but I forget how far back it was fixed, one of the last couple versions.
Endo 9-Aug-2012 [759x2]	The last version crash on string! values: rle "" rle "xxx" both crashes the View 2.7.8.3.1
Endo 9-Aug-2012 [759x2]	Here is the benchmark results (execution time for 1.000.000 calls) >> benchmark [rle [a a a b b c A A a a]] ;BrianH (the old one, not the crashing ones) == 0:00:29.765 >> benchmark [rle [a a a b b c A A a a]] ;endo == 0:00:32.953
DocKimbel 9-Aug-2012 [761]	Endo: you should rather bench on one long series rather than 1M times on a small one in order to avoid function calls overhead and get a more fair comparison. When I try with a 1M size string with random a,b,c chars, my parse solution is twice faster than the mezz one (Brian's one is crashing so can't test it). I was expecting a greater difference though.
Steeve 9-Aug-2012 [762]	... test
Sunanda 9-Aug-2012 [763]	Talking of test, I am trying to write a simple function that checks if a data item matches a rebol datatype, so for example: print is-it-a? "number?" "12.5" == true print is-it-a? "number?" "xxx" == false print is-it-a? "number?" "?" == false Except my function goes bad on that third example -- it prints the console help text. Any thoughts on how to check incoming values without executing them as code? Thanks is-it-a?: func [ data-type [string!] value [string!] ][ data-type: first load/all data-type error? try [ value: first load/all value return do reduce [data-type value] ] false ]
Steeve 9-Aug-2012 [764]	is-a: func [f v][ not not all [ f: get/any load f any-function? :f f load v ] ]
Sunanda 9-Aug-2012 [765]	Thanks Steeve, that's much more robust than my code :) Just for info ..... ....It needs some error trapping to handle un-loadable values, eg: >> is-a "number?" "33.e" ** Syntax Error: Invalid decimal -- 33.e ....And (like my code) it's not so good with 'true and 'false owing to the way REBOL works: >> is-a "logic?" "true" == false But it'll do the job!
BrianH 9-Aug-2012 [766x4]	Weird, I found the bug in R2 parse that causes the crash: >> parse "" [integer!] == false >> parse "" [unset!] == true ; should be false
	Two rules it is then. This doesn't crash, and is optimized for strings while we're at it. It's probably slower when doing blocks than Doc's, but it handles all datatypes: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 1 emit: [(repend out [offset? :pos1 :pos2 first :pos1])] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit \| pos1: skip some [ end pos2: emit break \| pos2: unset! :pos2 emit break \| pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont \| pos2: emit break ] ] ]] any-string? :s out ] It also works around the strict-not-equal? bug in pre-2.7.7 R2, and using FIRST instead of path access is another speedup in R2 (path access is faster in R3).
	Change the repend to chained inserts and it gets noticably faster, due to less mezzanine overhead: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 2 emit: [(out: insert/only insert out offset? :pos1 :pos2 first :pos1)] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit \| pos1: skip some [ end pos2: emit break \| pos2: unset! :pos2 emit break \| pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont \| pos2: emit break ] ] ]] any-string? :s head out ]
	You can get rid of this line too for a slight speedup: end pos2: emit break \| It's a leftover when I was trying to work around the bug in PARSE.
Maxim 9-Aug-2012 [770]	providing an optional output buffer (which the user can pre-allocate to some ideal size) would make a VERY big difference on large inputs. usually, when it goes into the hundreds of thousands, repetitive series re-allocation on growing mutable series, will kill any kind of optimisation you can dream of. rle: func [s [series!] /into out [block!] /local out emit pos1 pos2 cont][ out: any [ out make block! 2 ] ... ] this is especially effective on repetitive calls to the above function and using clear on the given buffer so that it auto-grows to an optimal size and is fast on later calls. just today, I was doing some encryption benchmarking and when I hit strings larger than 1MB it was taking several minutes... thats until I realized that it was my dataset generator (a looped string insert) which was taking 98% of the cpu time. !
BrianH 9-Aug-2012 [771x3]	If you follow the /into option standard you can do chained calls to RLE too: rle: func [s [series!] /into out [any-block!] /local emit pos1 pos2 cont][ unless into [out: make block! 2] emit: [(out: insert/only insert :out offset? :pos1 :pos2 first :pos1)] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit \| pos1: skip some [ pos2: unset! :pos2 emit break \| pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont \| pos2: emit break ] ] ]] any-string? :s either into [:out] [head :out] ]
	>> head rle/into "ddeeee" rle/into "aaabbc" make block! 10 == [3 #"a" 2 #"b" 1 #"c" 2 #"d" 4 #"e"]
	Functions with /into work like INSERT when chained together. This works with other block types too, though OFFSET? will be slow with the list! type.
Maxim 9-Aug-2012 [774]	now that's a pretty nice RLE encoder. :-)
BrianH 9-Aug-2012 [775x3]	R3 version, same /into option: rle: funct ["Run length encode" s [series!] /into output [any-block!]] [ unless into [output: make block! 2] x: none r: either any-block? :s [qr: copy [quote 1] [(qr/2: :x) any qr]] [[any x]] parse/case :s [any [pos1: set x skip r pos2: ( output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output )]] either into [:output] [head :output] ]
	Sorry, same unset problems, have to use POKE: rle: funct ["Run length encode" s [series!] /into output [any-block!]] [ unless into [output: make block! 2] x: none r: either any-block? :s [qr: copy [quote 1] [(poke qr 2 :x) any qr]] [[any x]] parse/case :s [any [pos1: set x skip r pos2: ( output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output )]] either into [:output] [head :output] ]
	>> mold/all rle reduce [() () 'a () 1] == "[2 #[unset!] 1 a 1 #[unset!] 1 1]"
Endo 10-Aug-2012 [778x3]	It is what I like about this community :) I knew that when I write a RLE function, BrianH will come up a much better version. Doc and others joined as well and now we have a very good function. Just like the CSV tools. Thanks.
	Ehm.. what about the decoder? how do I decode unset! values? I was using somthing like: decode-rle: func [b /local r] [r: copy [] foreach [x y] b [loop x [append r y]]]
	decode-rle: func [b /local r i] [ i: 0 r: make block! foreach [x y] b [i: i + x] ;better for big blocks? foreach [x y] b [loop x [append r y]] ]
BrianH 10-Aug-2012 [781]	In mezzanine style: decode-rle: func [ "Decode a run length encoded block" rle [any-block!] "Block of [integer value]" /into "Insert into a buffer instead (returns position after insert)" output [series!] "The output buffer (modified)" /local x ] [ unless into [ x: 0 foreach [i v] :rle [x: x + :i] output: make block! x ] foreach [i v] :rle [output: insert/only/dup :output get/any 'v :i] either into [:output] [head :output] ] Instead of testing for strict format compliance of the input block, it uses get-words to keep people from sneaking in functions and then passes the length value to + and INSERT/dup, counting on the type tests of those functions to do the screening for us.
older newer	first last