Series comparison techniques

[1/8] from: atruter::labyrinth::net::au at: 14-Aug-2003 13:56

Given the following code: <code> search-series: func [series [series!] operator [string!] value] [ clear rowids switch operator [ "=" [operator: :equal?] "<>" [operator: :not-equal?] "<" [operator: :lesser?] ">" [operator: :greater?] "<=" [operator: :lesser-or-equal?] ">=" [operator: :greater-or-equal?] "like" [operator: :find] ] start: now/time/precise repeat pos length? series [ if operator series/:pos value [insert tail rowids pos] ] print now/time/precise - start rowids ] max-rows: 100000 ;max-rows: 1000000 ;max-rows: 10000000 rowids: make block! max-rows block: make block! max-rows loop max-rows [insert tail block random 1000] </code> I was reasonably happy with the time taken to evaluate: search-series block "=" 1 for 100,000 1,000,000 and 10,000,000 values respectively (taking .14, 1.375 and 14 seconds on my Win2000 AMD1800XP 256Mb system). Increasing the data volumes by a factor of 10 increased the time taken by a similar amount. Since tests for equality are important, I added the following function to do this: <code> find-value: func [series [series!] value] [ clear rowids start: now/time/precise while [not none? series: find series value][ insert tail rowids index? series series: next series ] print now/time/precise - start rowids ] </code> and was [pleasantly] surprised to see that: find-value block 1 returned results in .015, .063 and .59 seconds respectively. Which got me thinking about *how* REBOL could do this given: 1) The series is not sorted 2) Values are not unique 3) Runtime increases at a lesser rate than data volume I am now wondering what other "clever" techniques / algorithms are available to handle other operators (eg. <>, <, >, etc) Thoughts? Regards, Ashley

[2/8] from: andrew:martin:colenso:school at: 14-Aug-2003 17:12

Ashley wrote (in two different functions):

> if equal? series/:pos value [insert tail rowids pos] > series: find series value

I think you'll find that you're measuring the difference in time between if equal? series/:pos value and the internal operation of "find series value". Native! values are faster than mezzanine functions. Andrew J Martin Attendance Officer & Information Systems Trouble Shooter Colenso High School Arnold Street, Napier. Tel: 64-6-8310180 ext 826 Fax: 64-6-8336759 http://colenso.net/scripts/Wiki.r?AJM http://www.colenso.school.nz/ DISCLAIMER: Colenso High School and its Board of Trustees is not responsible (or legally liable) for materials distributed to or acquired from user e-mail accounts. You can report any misuse of an e-mail account to our ICT Manager and the complaint will be investigated. (Misuse can come in many forms, but can be viewed as any material sent/received that indicate or suggest pornography, unethical or illegal solicitation, racism, sexism, inappropriate language and/or other issues described in our Acceptable Use Policy.) All outgoing messages are certified virus-free by McAfee GroupShield Exchange 5.10.285.0 Phone: +64 6 843 5095 or Fax: +64 6 833 6759 or E-mail: [postmaster--colenso--school--nz]

[3/8] from: atruter:labyrinth:au at: 14-Aug-2003 16:11

Hi Andrew,

> I think you'll find that you're measuring the difference in time between > "if equal? series/:pos value" and the internal operation of "find series > value". Native! values are faster than mezzanine functions.

How so? EQUAL?, IF and REPEAT are all native! functions. Cutting the search-series function down to a bare minimum, as in: <code> find-value2: func [series [series!] value] [ clear rowids start: now/time/precise repeat pos length? series [ if series/:pos = value [insert tail rowids pos] ] print now/time/precise - start rowids ] </code> is about 10% faster (.125, 1.219 and 12.156 respectively) but still not in the same league as find. Find seems able to iterate over a series of values faster than a loop (which I understand) but it also seems to take less time [per value] as more values are present (which is what I do not understand). More is less, but how? Regards, Ashley

[4/8] from: g:santilli:tiscalinet:it at: 14-Aug-2003 14:44

Hi Ashley, On Thursday, August 14, 2003, 5:56:03 AM, you wrote: AT> returned results in .015, .063 and .59 seconds respectively. Which got me AT> thinking about *how* REBOL could do this given: I think what you see there is the overhead time. If you try with a bigger series so that the overhead (function call, etc.) is not noticeable you'll probably find that the search is just linear. :) Regards, Gabriele. -- Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/

[5/8] from: nitsch-lists:netcologne at: 14-Aug-2003 16:26

Gabriele Santilli wrote:

>Hi Ashley, > >On Thursday, August 14, 2003, 5:56:03 AM, you wrote: > >AT> returned results in .015, .063 and .59 seconds respectively. Which got me >

to short ~ 0.063 * 10 AFAIK most multitaskers do 50 or so switches a second. so their resolution is ~0.01. if the first example needs really 0.006 seconds, that would be not visible.

[6/8] from: greggirwin:mindspring at: 14-Aug-2003 9:09

Hi Ashley, AT> search-series: func [series [series!] operator [string!] value] [ AT> clear rowids AT> switch operator [ AT> "=" [operator: :equal?] AT> "<>" [operator: :not-equal?] AT> "<" [operator: :lesser?] AT> ">" [operator: :greater?] AT> "<=" [operator: :lesser-or-equal?] AT> ">=" [operator: :greater-or-equal?] AT> "like" [operator: :find] AT> ] Did you consider passing the operators in directly as the argument, rather than passing a string and mapping it? Re: FIND AT> ...returned results in .015, .063 and .59 seconds respectively. Which got me AT> thinking about *how* REBOL could do this given: AT> 1) The series is not sorted AT> 2) Values are not unique AT> 3) Runtime increases at a lesser rate than data volume Natives, like FIND and SELECT, are always preferred for performance reasons. If you look at the line inside your loop: if operator series/:pos value [insert tail rowids pos] You're calling IF on every iteration. Even if it's a native, that's still a function call with overhead. OPERATOR will also require a call, so that's two. Then SERIES/:POS has to be evaluated, which means path evaluation, bounds checking, and lookup. Finally VALUE has to be evaluated as well. And while REPEAT is pretty efficient, it's not going to compete with a native machine code loop. I still find it instructive to test various constructs to see how fast they are. Sometimes things will surprise you. -- Gregg

[7/8] from: nitsch-lists:netcologne at: 14-Aug-2003 18:44

Volker Nitsch wrote:

>Gabriele Santilli wrote: >>Hi Ashley,

<<quoted lines omitted: 5>>

> to short > ~ 0.063 * 10

I messed with spaces.. .015 (to short), .063 and .59 (~ 10 * .063) seconds. which ifts with your *10-increments.

>AFAIK most multitaskers do 50 or so switches a second. so their >resolution is ~0.01. >if the first example needs really 0.006 seconds, that would be not visible. >

Also long ago PC-clocks had such poor resolution. Do not now if that has improved.

[8/8] from: maximo:meteorstudios at: 14-Aug-2003 13:02

> >AFAIK most multitaskers do 50 or so switches a second. so their > >resolution is ~0.01.

I've read once that: windows switches in the tens of switches per second unix switches in the hundreds of switches per second amiga switches in the thousands of switches per second which seems quite right according to my useage of each one ... but I couldn't prove it... of course. -MAx

Notes

Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted