World: r4wp
[Rebol School] REBOL School
older newer | first last |
BrianH 1-Aug-2012 [730] | The "in some cases faster" depends on whether you're going in reverse, and how much data you have. REVERSE can have a lot of overhead if you have a lot of data, so with enough data it would be faster to just work in reverse using a WHILE loop, or possibly better yet a REPEAT with an index that you subtract. |
Arnold 2-Aug-2012 [731] | Time-out! :D I have enough information to make my script working with reasonable speed. After I publish it on rebol.org we can write out a competition to have it gain speed ;) |
Endo 8-Aug-2012 [732x2] | I wrote a run length encoding function, may be useful for someone else too: rle: func ["Run length encode" b /local v r i j] [ v: 1 r: copy [] j: next i: b unless empty? b [ until [ either all [not tail? j equal? first i first j] [ v: v + 1 j: next j ] [ append r reduce [v first i] v: 1 i: ++ j ] tail? i ] ] r ] |
here is the tests: >> rle "aaabbcx" == [3 #"a" 2 #"b" 1 #"c" 1 #"x"] >> >> rle [] == [] >> >> rle "" == [] >> >> rle [a] == [1 a] >> >> rle [a a a a a] == [5 a] >> rle [a a a a a b b] == [5 a 2 b] | |
DocKimbel 8-Aug-2012 [734] | Endo: I think a much faster version could be coded using PARSE. |
Endo 8-Aug-2012 [735x2] | But don't I need to write two different set of rules for strings and blocks? |
And I'm not that good using PARSE, my PARSE expriments usually stuck with infinite loops :( | |
DocKimbel 8-Aug-2012 [737] | I think it should be doable with one set for both datatypes. |
BrianH 8-Aug-2012 [738x7] | Not quite. You can almost do it in R3, but you need the QUOTE operation for blocks, and QUOTE doesn't work on strings. |
Here's a version for R3 parse, with some optimizations: rle2: funct ["Run length encode" b [series!]] [ output: copy [] x: none r: either any-block? :b [qr: copy [quote 1] [(qr/2: :x) any qr]] [[any x]] parse :b [any [pos1: set x skip r pos2: ( reduce/into [subtract index? :pos2 index? :pos1 :x] tail output )]] output ] | |
The biggest overhead there comes from not preallocating the output block, so there's some reallocation. I don't know how to estimate the size of the output though. | |
You should really have it be case-sensitive though. | |
rle2: funct ["Run length encode" b [series!]] [ output: copy [] x: none r: either any-block? :b [qr: copy [quote 1] [(qr/2: :x) any qr]] [[any x]] parse/case :b [any [pos1: set x skip r pos2: ( reduce/into [subtract index? :pos2 index? :pos1 :x] tail output )]] output ] >> rle2 [a a A b b c d D d d d] == [2 a 1 A 2 b 1 c 1 d 1 D 3 d] | |
I tried to come up with a more optimal R3 version without using parse, but I got blocked by case-sensitivity, without considering binding. I suppose I could just consider binding too, or unbind the results. It also doesn't do structural comparison of functions, either in the parse or procedural version. | |
Can someone come up with an equivalent R2 parse version? Of course words aren't case-preserving in R2, nor can they be compared case-sensitively. | |
DocKimbel 8-Aug-2012 [745] | Here's a R2 solution with same rules for string! and block! series: rle: func [s [series!] /local out c i][ out: make block! 1 parse/case/all s [ any [ [end | c: ( c: either word? c/1 [to-lit-word c/1][c/1] i: 1 )] skip some [ c (i: i + 1) | (repend out [i c]) break ] ] ] out ] >> rle "aaabbcx" == [3 #"a" 2 #"b" 1 #"c" 1 #"x"] >> rle [a a a a a] == [5 a] >> rle [a a a a a b b] == [5 a 2 b] >> rle [a a A b b c d D d d d] == [3 a 2 b 1 c 5 d] |
BrianH 8-Aug-2012 [746] | Put in an exception for blocks, parens and integers too. In your code above, blocks get treated like sub-rules, parens get executed, and integers can result in an endless loop. |
DocKimbel 8-Aug-2012 [747x2] | Another version of 'rle for R2 that uses two pointers (like your R3 version) instead of a counter: rle: func [s [series!] /local out c pos1 pos2][ out: make block! 1 parse/case/all s [ any [ [end | c: ( c: either word? c/1 [to-lit-word c/1][c/1] )] pos1: skip some [c | pos2: (repend out [offset? pos1 pos2 c]) break] ] ] out ] |
Brian: I'm just giving a solution that matches the requirements from Endo. The block example he provided contains only words. | |
BrianH 8-Aug-2012 [749x10] | Ah, yeah, I overspecced the R3 version by using QUOTE. Although there's a standatd workaround for integers (precede it with 1 1), for any-blocks and functions it might be best to do the comparison using REBOL code. The R2 equivalent of the IF operation in R3 parse would help here: http://www.rebol.net/wiki/Parse_Project#IF_.28condition.29 |
The replacement for IF (condition) would be something like this: (cont: unless condition [[end skip]]) cont | |
Oh right, I forgot OFFSET? so I recreated it :( | |
Here's a version of Doc's with all datatypes handled, even unset. It even avoids accidentally converting lit-words to words and lit-paths to paths: rle: func [s [series!] /local out pos1 pos2 cont][ out: make block! 1 parse/all s [ any [ pos1: skip some [ pos2: skip (cont: if any [ not-equal? unset? :pos1/1 unset? :pos2/1 strict-not-equal? :pos1/1 :pos2/1 ] [[end skip]]) cont | pos2: (repend out [offset? :pos1 :pos2 :pos1/1]) break ] ] ] out ] | |
There was no point in using parse/case since the comparisons were being done by strict-not-equal?, not parse. | |
Don't know how the speed compares to Endo's original though. | |
Whoops, it doesn't handle runs of unset values. More adjustment needed. | |
This version handles unsets too: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 1 emit: [(repend out [offset? :pos1 :pos2 :pos1/1])] parse/all s [ any [ pos1: unset! some [pos2: unset! | pos2: emit break] | pos1: skip some [ pos2: unset! :pos2 emit break | pos2: skip ( cont: if strict-not-equal? :pos1/1 :pos2/1 [[end skip]] ) cont | pos2: emit break ] ] ] out ] | |
Slight improvement: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 1 emit: [(repend out [offset? :pos1 :pos2 :pos1/1])] parse/all s [any [ pos1: unset! any unset! pos2: emit | pos1: skip some [ pos2: unset! :pos2 emit break | pos2: skip ( cont: if strict-not-equal? :pos1/1 :pos2/1 [[end skip]] ) cont | pos2: emit break ] ]] out ] | |
For old versions of R2, you might want to change IF STRICT-NOT-EQUAL? to UNLESS STRICT-EQUAL?. There used to be a bug in STRICT-NOT-EQUAL? in R2, but I forget how far back it was fixed, one of the last couple versions. | |
Endo 9-Aug-2012 [759x2] | The last version crash on string! values: rle "" rle "xxx" both crashes the View 2.7.8.3.1 |
Here is the benchmark results (execution time for 1.000.000 calls) >> benchmark [rle [a a a b b c A A a a]] ;BrianH (the old one, not the crashing ones) == 0:00:29.765 >> benchmark [rle [a a a b b c A A a a]] ;endo == 0:00:32.953 | |
DocKimbel 9-Aug-2012 [761] | Endo: you should rather bench on one long series rather than 1M times on a small one in order to avoid function calls overhead and get a more fair comparison. When I try with a 1M size string with random a,b,c chars, my parse solution is twice faster than the mezz one (Brian's one is crashing so can't test it). I was expecting a greater difference though. |
Steeve 9-Aug-2012 [762] | ... test |
Sunanda 9-Aug-2012 [763] | Talking of test, I am trying to write a simple function that checks if a data item matches a rebol datatype, so for example: print is-it-a? "number?" "12.5" == true print is-it-a? "number?" "xxx" == false print is-it-a? "number?" "?" == false Except my function goes bad on that third example -- it prints the console help text. Any thoughts on how to check incoming values without executing them as code? Thanks is-it-a?: func [ data-type [string!] value [string!] ][ data-type: first load/all data-type error? try [ value: first load/all value return do reduce [data-type value] ] false ] |
Steeve 9-Aug-2012 [764] | is-a: func [f v][ not not all [ f: get/any load f any-function? :f f load v ] ] |
Sunanda 9-Aug-2012 [765] | Thanks Steeve, that's much more robust than my code :) Just for info ..... ....It needs some error trapping to handle un-loadable values, eg: >> is-a "number?" "33.e" ** Syntax Error: Invalid decimal -- 33.e ....And (like my code) it's not so good with 'true and 'false owing to the way REBOL works: >> is-a "logic?" "true" == false But it'll do the job! |
BrianH 9-Aug-2012 [766x4] | Weird, I found the bug in R2 parse that causes the crash: >> parse "" [integer!] == false >> parse "" [unset!] == true ; should be false |
Two rules it is then. This doesn't crash, and is optimized for strings while we're at it. It's probably slower when doing blocks than Doc's, but it handles all datatypes: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 1 emit: [(repend out [offset? :pos1 :pos2 first :pos1])] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit | pos1: skip some [ end pos2: emit break | pos2: unset! :pos2 emit break | pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont | pos2: emit break ] ] ]] any-string? :s out ] It also works around the strict-not-equal? bug in pre-2.7.7 R2, and using FIRST instead of path access is another speedup in R2 (path access is faster in R3). | |
Change the repend to chained inserts and it gets noticably faster, due to less mezzanine overhead: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 2 emit: [(out: insert/only insert out offset? :pos1 :pos2 first :pos1)] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit | pos1: skip some [ end pos2: emit break | pos2: unset! :pos2 emit break | pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont | pos2: emit break ] ] ]] any-string? :s head out ] | |
You can get rid of this line too for a slight speedup: end pos2: emit break | It's a leftover when I was trying to work around the bug in PARSE. | |
Maxim 9-Aug-2012 [770] | providing an optional output buffer (which the user can pre-allocate to some ideal size) would make a VERY big difference on large inputs. usually, when it goes into the hundreds of thousands, repetitive series re-allocation on growing mutable series, will kill any kind of optimisation you can dream of. rle: func [s [series!] /into out [block!] /local out emit pos1 pos2 cont][ out: any [ out make block! 2 ] ... ] this is especially effective on repetitive calls to the above function and using clear on the given buffer so that it auto-grows to an optimal size and is fast on later calls. just today, I was doing some encryption benchmarking and when I hit strings larger than 1MB it was taking several minutes... thats until I realized that it was my dataset generator (a looped string insert) which was taking 98% of the cpu time. ! |
BrianH 9-Aug-2012 [771x3] | If you follow the /into option standard you can do chained calls to RLE too: rle: func [s [series!] /into out [any-block!] /local emit pos1 pos2 cont][ unless into [out: make block! 2] emit: [(out: insert/only insert :out offset? :pos1 :pos2 first :pos1)] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit | pos1: skip some [ pos2: unset! :pos2 emit break | pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont | pos2: emit break ] ] ]] any-string? :s either into [:out] [head :out] ] |
>> head rle/into "ddeeee" rle/into "aaabbc" make block! 10 == [3 #"a" 2 #"b" 1 #"c" 2 #"d" 4 #"e"] | |
Functions with /into work like INSERT when chained together. This works with other block types too, though OFFSET? will be slow with the list! type. | |
Maxim 9-Aug-2012 [774] | now that's a pretty nice RLE encoder. :-) |
BrianH 9-Aug-2012 [775x3] | R3 version, same /into option: rle: funct ["Run length encode" s [series!] /into output [any-block!]] [ unless into [output: make block! 2] x: none r: either any-block? :s [qr: copy [quote 1] [(qr/2: :x) any qr]] [[any x]] parse/case :s [any [pos1: set x skip r pos2: ( output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output )]] either into [:output] [head :output] ] |
Sorry, same unset problems, have to use POKE: rle: funct ["Run length encode" s [series!] /into output [any-block!]] [ unless into [output: make block! 2] x: none r: either any-block? :s [qr: copy [quote 1] [(poke qr 2 :x) any qr]] [[any x]] parse/case :s [any [pos1: set x skip r pos2: ( output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output )]] either into [:output] [head :output] ] | |
>> mold/all rle reduce [() () 'a () 1] == "[2 #[unset!] 1 a 1 #[unset!] 1 1]" | |
Endo 10-Aug-2012 [778x2] | It is what I like about this community :) I knew that when I write a RLE function, BrianH will come up a much better version. Doc and others joined as well and now we have a very good function. Just like the CSV tools. Thanks. |
Ehm.. what about the decoder? how do I decode unset! values? I was using somthing like: decode-rle: func [b /local r] [r: copy [] foreach [x y] b [loop x [append r y]]] | |
older newer | first last |