[algo] Intervals and parsing
[1/7] from: didec:tiscali at: 11-Mar-2004 18:24
[algo] Intervals and parsing
Hi all,
For one of my apps, I have to manage a very long list of integer!
I need to know if an integer! is in the list, and i need to add it to the list.
To avoid an infinite block of integer! that will overflow the memory for nothing, and
considering the fact that most of the values will follow themselve,
I use an intervals dialect and some funcs.
The values are stored in a block, but consecutives values are replaced by first and last
values separate by a minus sign "-".
I.e. : [0 1 2 3 4 5 10 11 12 13 15 19 20 21 22]
become [0 - 5 10 - 13 15 19 - 22]
I use 2 functions, one to check if a value exists in list, and one to add the value to
the list.
I post the funcs here for 2 reasons :
1) Maybe someone would need it
2) Maybe someone can do better or simpler
Anyway, comments are welcome
DideC
----- SCRIPT -----
REBOL [
title: "Manage list of integer intervals"
author: "Didier Cadieu"
date: 11-march-2004
version: 1.0
]
intervals: context [
where: none
included?: func [
"Return true if Val is inside an intervals block."
itv [block!] "Block of integers / intervals (x - y)"
val [integer!] "Value to search"
/local pos brk
] [
res: where: brk: none
parse itv [
any [
pos: set i1 integer! '- set i2 integer! (
if any [
all [i1 <= val val <= i2 res: true]
all [any [i1 - val = 1 val - i2 = 1] where: pos]
all [val < i1 not brk where: pos]
] [brk: true]
) |
pos: set i1 integer! (
if any [
all [i1 = val res: true]
all [1 = abs i1 - val where: pos]
all [val < i1 not brk where: pos]
] [brk: true]
)
] pos: end (any [brk res where: pos])
]
res
]
include: func [
"Change intervals block to include Val."
itv [block!] "Block of integers / intervals (x - y)"
val [integer!] "Value to include"
/local pos brk
] [
brk: none
parse itv [
any [
pos: set i1 integer! '- set i2 integer! (
any [
brk
all [i1 <= val val <= i2 brk: true]
all [i1 - val = 1 change pos val brk: true]
all [val - i2 = 1 change next next pos val brk: true]
all [val < i1 insert pos val brk: true]
]
) |
pos: set i1 integer! (
any [
brk
all [i1 = val brk: true]
all [i1 - val = 1 insert pos reduce [val '-] brk: true]
all [val - i1 = 1 insert next pos reduce ['- val] brk: true]
all [val < i1 insert pos val brk: true]
]
)
] pos: end (any [brk insert pos val])
]
regroup itv
]
regroup: func [
"Simplify intervals block if possible"
itv [block!] "Block of integers / intervals (x - y)"
/local pos brk
] [
brk: none
parse itv [
any [
pos: set i1 integer! set i2 integer! (
any [
brk
all [i2 - i1 = 1 (
remove pos
if all [not tail? next pos '- = second pos] [remove/part pos 2]
brk: true)
]
]
) | skip
]
]
]
]
;***** some manual tests
; Type "ii num" to test a num in IT interval
; Type "ato num" to test inclusion of num in IT. IT is never changed.
it: [-1 1 4 - 5 7 - 10 12 20 - 22 30 ]
ii: func [val] [
intervals/included? it val
prin res prin " "
probe intervals/where
]
ato: func [val /local it2] [
intervals/include it2: copy it val
probe head it2
]
;***** Some automatic Tests
random/seed now/time/precise
i: copy []
test1: does [
print "--- Test1 : includes values ---"
loop 50 [intervals/include i v: random 40 print [v "->" mold i]]
ask "Press enter to continue"
print "^/"
]
test2: does [
print "--- Test2 : check if values are included?---"
probe i
loop 50 [print [v: (random 50) - 5 "?" either intervals/included? i v ["yes"][" no"]]]
ask "Press enter to continue"
print "^/"
]
test1 test2
print {Type "test1" or "test2" to redo a test}
halt
[2/7] from: tomc:darkwing:uoregon at: 12-Mar-2004 0:10
On Thu, 11 Mar 2004, Didec wrote:
> [algo] Intervals and parsing
> Hi all,
<<quoted lines omitted: 11>>
> Anyway, comments are welcome
> DideC
Hi DideC,
as much as I do like parse I am unconvinced it is the ideal solution for
managing a list of integers.
I changed your interval datastructure to be a list of pairs!,
each representing an interval, and used binary search instead of parse
to locate solutions/insertion-points.
it has not been tested beyond a few runs of your automatic testing and it
is incomplete as it does not merge adjecent intervals when they meet.
but I would be curious as to how it fairs in comparison to the parse
version on your large data sets.
--- script ---
REBOL [
Title: "binary interval search on pairs"
Author: "Tom Conlin"
Date: 2004-Mar-11
Version: 0.0.1
File: %binary-interval-search.r
Purpose: {DideC: I have to manage a very long list of integer!
I need to know if an integer! is in the list, and i need to add it to the
list.
To avoid an infinite block of integer! that will overflow the memory for
nothing, and considering the fact that most of the values will follow,
I use an intervals dialect and some funcs.
The values are stored in a block, but consecutives values are replaced by
first and last values separate by a minus sign "-".
I.e. : [0 1 2 3 4 5 10 11 12 13 15 19 20 21 22]
become [0 - 5 10 - 13 15 19 - 22]
I use 2 functions, one to check if a value exists in list, and one to add
the value to the list.
}
Comment: {}
History: []
]
;;; adapted from
;;;
http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=binary-search.r
binary-interval-search: func [
interval[block!] {smaller to bigger sequence of disjoint pairs}
key [integer!] "integer to find in the intervals"
/local lo mid hi tmp
][
lo: 1
hi: length? interval
if hi <= lo [return 0]
mid: to-integer (hi / 2)
while[hi >= lo][
tmp: pick interval mid
either (key >= tmp/1) and (key <= tmp/2)
[return mid]
[either key > tmp/2 [lo: mid + 1][hi: mid - 1]]
mid: to-integer (lo + ((hi - lo) / 2))
]
mid
]
intervals: context[
where: 1
included?: func[interval [block!] value [integer!]][
where: binary-interval-search interval value
;true all[not zero? where value >= interval/:where/1 value <interval/:where/2]
]
include: func[interval [block!] value [integer!] /local here][
if included? interval value[return]
here: at interval where
either tail? here[insert here 1x1 * value]
[;print [value 'found first here 'at where]
either value > here/1/2
[either 1 + here/1/2 = value
[here/1/2: value]
[here: next here
either all [not tail? here -1 + here/1/1 value]
[here/1/1: value]
[insert here 1x1 * value]
]
]
[either -1 + here/1/1 = value
[here/1/1: value]
[here: back here
either 1 + here/1/2 = value
[here/1/2: value]
[insert here 1x1 * value]
]
]
]
;;; mearging adjecent pairs is left as an exercise
]
]
;;;;;;;;;;;;;;;;;;;;;;;;
it: [-1x-1 1x1 4x5 7x10 12x12 20x22 30x30 ]
ii: func [val] [
prin intervals/included? it val
prin " "
probe intervals/interval/where
]
ato: func [val /local it2] [
intervals/include it2: copy it val
probe head it2
]
;***** Some automatic Tests
random/seed now/time/precise
i: copy []
test1: does [
print "--- Test1 : includes values ---"
loop 50 [intervals/include i v: random 40 print [v "->" mold i]]
ask "Press enter to continue"
print "^/"
]
test2: does [
print "--- Test2 : check if values are included?---"
probe i
loop 50 [print [v: (random 50) - 5 "?" either intervals/included?
i v ["yes"][" no"]]]
ask "Press enter to continue"
print "^/"
]
test1 test2
print {Type "test1" or "test2" to redo a test}
halt
[3/7] from: didec:tiscali at: 12-Mar-2004 14:07
Re: [algo] Intervals and parsing
See at bottom
> On Thu, 11 Mar 2004, Didec wrote:
> > [algo] Intervals and parsing
<<quoted lines omitted: 31>>
> but I would be curious as to how it fairs in comparison to the parse
> version on your large data sets.
Thank you very much Tom.
I have not make benchmark to compare, but I like the pair! idea and the dichotomical
search. It will be surely faster as the most search occurs on the high value (end of
the series.
Sure, for performance, the best is to reverse the order of the series or to search from
the end (in my case)
But performance is not the main problem, as this system reduce the series from thousand
of values to about 15 intervals.
But I keep your idea in mind.
FOR PARSE GURUS !
How can I stop the parsing if I find the correct value (and considering the test is done
in the code next to the rule, not by the rule himself) ?
DideC
[4/7] from: maarten:vrijheid at: 12-Mar-2004 14:56
> FOR PARSE GURUS !
>
> How can I stop the parsing if I find the correct value (and considering the test is
done in the code next to the rule, not by the rule himself) ?
>
use the 'break keyword from the parse dialect, core 2.5.3 and up
http://www.rebol.com/docs/changes.html#section-3.6
--Maarten
[5/7] from: nitsch-lists:netcologne at: 12-Mar-2004 16:02
On Freitag, 12. M=E4rz 2004 14:07, Didec wrote:
> Re: [algo] Intervals and parsing
>
>
> FOR PARSE GURUS !
>
> How can I stop the parsing if I find the correct value (and considering the
> test is done in the code next to the rule, not by the rule himself) ?
That means
parse s[
copy something (if xy = something [i-want-to-break])
]
and now you don't know how to tell the parser to stop?
I use this trick:
parse s[
copy something (
break-or-not: either xy = something [ 'break ][ [] ]
) break-or-not
]
so setting a dummy-rule following the paren to 'break.
Looks ugly but works.
> DideC
-Volker
[6/7] from: pwawood::mango::net::my at: 13-Mar-2004 8:27
Hi DideC
As your list of integers seems to be very densely populated, have you
considered storing those present rather than those missing ?
Peter
On Friday, Mar 12, 2004, at 21:07 Asia/Kuala_Lumpur, Didec wrote:
[7/7] from: pwawood:mango:my at: 13-Mar-2004 8:40
Oops, I meant storing those missing rather than those presenting.
Peter
On Saturday, Mar 13, 2004, at 08:27 Asia/Kuala_Lumpur, Peter WA Wood
wrote:
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted