Mailing List Archive: 49091 messages

newlines

 [1/47] from: arolls:idatam:au at: 8-Nov-2001 12:22


Joel, I am shocked that you would repeat yourself so much in REBOL. You should have been immediately suspicious that there should be an easier way as soon as you wrote the same thing twice. ;) I am assuming you must have been too busy to spend much time on it. :)

As I wrote in my other post on this thread, the one-line answer is:

    while [p: find s "^/^/^/"][remove p]

where s is the string to process; the result should have no more than two newlines in a row.

Anton.
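For readers following the thread outside REBOL, the effect of Anton's one-liner can be sketched in Python; this is an illustrative equivalent (using a repeated-replace loop), not code from the list:

```python
def collapse_newlines(s: str) -> str:
    """Collapse every run of three or more newlines down to exactly two,
    leaving single and double newlines untouched."""
    # Each pass shortens every run; loop until no triple remains,
    # in the spirit of REBOL's `while [p: find s "^/^/^/"][remove p]`.
    while "\n\n\n" in s:
        s = s.replace("\n\n\n", "\n\n")
    return s
```

For example, `collapse_newlines("a\n\n\n\n\nb")` yields `"a\n\nb"`.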

 [2/47] from: joel:neely:fedex at: 7-Nov-2001 17:07


Hi, Anton, Anton Rolls wrote:
> As I wrote in my other post on this thread
> the one line answer is:
>
> while [p: find s "^/^/^/"][remove p]
In a word, "speed". In a wide range of test scenarios, that is slower than some alternatives. (However, it has a cousin that is one of the fastest...) I'll post some benchmarks and explanations tomorrow.

-jn-

-- What a distressing contrast there is between the radiant intelligence of the child and the feeble mentality of the average adult. -- Sigmund Freud

joel0dot0FIX0PUNCTUATION0neely0at0fedex0dot0com

 [3/47] from: joel:neely:fedex at: 7-Nov-2001 18:09


Hello, again, Anton (and all)
> Anton Rolls wrote:
<<quoted lines omitted: 7>>
> is one of the fastest...)
> I'll post some benchmarks and explanations tomorrow.
OK, so I got done and decided to go ahead and post.

Quick summary: There are several options for squishing runs of newlines down to no more than two consecutive newlines. (For sake of visibility, I've used dots in my tests instead of newlines, but the principle is obviously the same.) Which approach is fastest depends greatly on the length of the string being processed and the probability of various run lengths. However, some approaches are less sensitive to those variations than others.

Candidates: There are several ways to collapse dot-runs in S (a string). The candidates are given short labels, used in the benchmark results table later on.

FIND:

    while [p: find s "..."][remove p]

Advantages - short, simple code.
Disadvantages - bogs down *badly* with longer strings and higher frequency of collapsable runs.

REPL:

    newlen: oldlen: length? s
    until [
        oldlen: newlen
        oldlen = newlen: length? replace/all s "..." ".."
    ]

Advantages - OK performance unless many collapsable runs.
Disadvantages - Slightly longer code.

BINR:

    newlen: length? s
    until [
        oldlen: newlen
        oldlen = newlen: length? replace/all s ".................." ".."
    ]
    replace/all s ".........." ".."
    replace/all s "......" ".."
    replace/all s "...." ".."
    replace/all s "..." ".."

Advantages - Fastest on longer strings and/or more frequent collapsable runs, but also competes well in other cases.
Disadvantages - Design not obvious (until you think about it ;-).

COPY:

    t: make string! length? s
    nb: 0
    foreach ch s [
        either ch = #"." [
            if 2 >= nb: nb + 1 [insert tail t ch]
        ][
            insert tail t ch
            nb: 0
        ]
    ]
    t

Advantages - Reasonably clear and easy to redevelop; one of the faster choices when many collapsable runs are present.
Disadvantages - Poor performance when little collapsing can be done.

PASS:

    p: s
    while [p: find p "..."][remove p]

Advantages - Short and sweet; works best when little collapsing can be done.
Disadvantages - Loses when lots of collapsing on long strings is called for.
Benchmarks: Randomly-generated strings containing only #"x" and #"." are constructed and identical copies are given to each candidate for processing. Lengths of 10000, 20000, and 30000 characters are tested, with order 0 probability of dots (the collapsed character) set at 10%, 50%, and 90%. Each mini-table is preceded by the number of tests averaged together for that case (always 10), the generated string length, and a string showing relative proportions of #"x" and #"." characters. The label for a candidate is followed by an average time and a percentage relative to the longest time in that test.

    10 10000 xxxxxxxxx.
    pass 0.006044    1
    repl 0.0089181   1.5
    binr 0.0120582   2.1
    find 0.015253    2.6
    copy 0.5802009 100

    10 10000 xxxxx.....
    pass 0.1136382   7.2
    binr 0.118034    7.5
    repl 0.1985032  12.6
    copy 0.5821541  36.9
    find 1.5794435 100

    10 10000 x.........
    binr 0.1779586   4.4
    copy 0.4265818  10.6
    pass 0.6116223  15.2
    repl 0.8901854  22.1
    find 4.0279091 100

    10 20000 xxxxxxxxx.
    pass 0.0107016   1
    repl 0.0152895   1.4
    binr 0.0295117   2.6
    find 0.0326937   2.9
    copy 1.1235145 100

    10 20000 xxxxx.....
    binr 0.3812319   6.2
    pass 0.4503699   7.4
    repl 0.5350547   8.7
    copy 1.161134   19
    find 6.1152957 100

    10 20000 x.........
    binr 0.4514966    2.8
    copy 0.7919637    4.8
    repl 2.3939795   14.7
    pass 2.6035071   15.9
    find 16.3293515 100

    10 30000 xxxxxxxxx.
    pass 0.0153792   1
    binr 0.0297831   1.9
    repl 0.0326177   2.1
    find 0.084752    5.3
    copy 1.5871371 100

    10 30000 xxxxx.....
    binr 0.7744618    5.6
    pass 0.957589     7
    repl 1.1838233    8.6
    copy 1.7033773   12.4
    find 13.7269263 100

    10 30000 x.........
    binr 0.9456252    2.6
    copy 1.1857443    3.3
    repl 5.2805771   14.8
    pass 5.7582673   16.1
    find 35.7992586 100

Conclusions: FIND is the worst of the lot. Most cases have BINR and PASS as neck and neck for best, with COPY a close contender for large strings with long runs. However, when PASS loses ground, it loses big. Likewise (only more so) for REPL.
BINR is arguably the best all-purpose solution, as it does reasonably well in a wide variety of scenarios, and even when not the fastest, it doesn't bog down too badly. -jn- -- There is no such thing as exact type measurement. -- Just van Rossum joel]FIX]PUNCTUATION]dot]neely]at]fedex]dot]com
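The binary-reduction idea behind BINR — squash very long runs first, then finish with a fixed ladder of shorter passes — translates directly; here is a Python sketch of the same technique (mine, for illustration; dots stand in for newlines as in Joel's tests):

```python
def binr(s: str) -> str:
    """Collapse runs of 3+ dots to exactly 2 using the BINR ladder:
    loop an 18-dot replace until the length stops changing, then make
    single finishing passes at 10, 6, 4, and 3 dots."""
    prev = -1
    while prev != len(s):
        prev = len(s)
        s = s.replace("." * 18, "..")
    for n in (10, 6, 4, 3):        # the finishing ladder
        s = s.replace("." * n, "..")
    return s
```

The ladder works because after the loop no run exceeds 17 dots, and each pass leaves a remainder that some later, shorter pass can catch.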

 [4/47] from: brett:codeconscious at: 8-Nov-2001 20:00


Hi Joel,

Thanks for another interesting benchmarking post. For curiosity, could you add these late candidates to your benchmark tests?

    method1: [any [".." mark1: any #"." mark2: (remove/part mark1 mark2) | skip]]
    parse/all input method1

    other-char: complement charset {.}
    method2: [any [some other-char | ".." mark1: any #"." mark2: (remove/part mark1 mark2) | skip]]
    parse/all input method2

Brett.

 [5/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 10:38


Hi Joel,

I will try to add my 0,02 CZK.

a) There is another disadvantage to the COPY algorithm you didn't mention. It is not an in-place algorithm, and that is why it needs more memory.

b) Some algorithms are complicated, which is why a different approach (not using REPLACE) should be considered too:

    while [s: find s #"."] [
        parse next s [any #"." t:]
        s: either (index? s) + 2 < (index? t) [
            remove/part skip s 2 t
        ] [t]
    ]

 [6/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 11:25


Hi Brett, your methods don't work, see this:
>> input: "..................a...bcd.s.f.g.k.l.m.n.o.p.............."
== {..................a...bcd.s.f.g.k.l.m.n.o.p..............}
>> parse/all input method1
== true
>> input
== "..a...bcd.s.f.g.k.l.m.n.o.p.."

Regards
Ladislav

 [7/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 11:38


Hi myself,

how about:

    while [s: find/tail s "..."] [
        parse s [any #"." t:]
        s: remove/part back s t
    ]

 [8/47] from: brett:codeconscious at: 8-Nov-2001 22:47


Hi Ladislav,
> == "..a...bcd.s.f.g.k.l.m.n.o.p.."
Well that was a bit of a blow. Thanks for picking it up. I had forgotten that I had made parse's index into the string out of whack when I did the remove. Here are the fixed (and no doubt slightly slower) versions:

    method1: [
        any [
            ".." mark1: any #"." mark2: (remove/part mark1 mark2) :mark1
            | skip
        ]
    ]
    parse/all s method1

    other-char: complement charset {.}
    method2: [
        any [
            some other-char
            | ".." mark1: any #"." mark2: (remove/part mark1 mark2) :mark1
            | skip
        ]
    ]
    parse/all s method2

Regards,
Brett

 [9/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 13:39


I offer another one:

    method3: [any [thru ".." t: any #"." u: (remove/part t u) :t] to end]
    parse/all s method3

 [10/47] from: joel:neely:fedex at: 8-Nov-2001 0:54


Hi, Ladislav, Ladislav Mecir wrote:
> Hi myself,
>
> how about:
>
> while [s: find/tail s "..."] [
>     parse s [any #"." t:]
>     s: remove/part back s t
> ]
Ermmm... I don't think so. What am I missing?
>> s: copy packnewlines/s
== {x......x........x.xx......xx........x......x.x....x...... x..x.........x.x.x....x..x.........xx..............x.x....... ......x.....
>> while [s: find/tail s "..."] [
[    parse s [any #"." t:]
[    s: remove/part back s t
[    ]
== "x.."
>> head s
** Script Error: head expected series argument of type: series port ** Near: head s
>> s
== none -jn- -- We say "gestalt" when things combine to act in ways we can't explain. -- Marvin Minsky joel/dot/neely/at/fedex/FIX/PUNCTUATION/dot/com

 [11/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 14:01


Yet another one:

    method4: [any [thru "..." t: any #"." u: (remove/part t: back t u) :t] to end]
    parse/all s method4

 [12/47] from: brett:codeconscious at: 9-Nov-2001 0:08


> I offer another one:
>
> method3: [any [thru ".." t: any #"." u: (remove/part t u) :t] to end]
> parse/all s method3
It funny, I've got another two as well built around parse. Problem is they fail with exactly the same wrong result (as far as I can tell) as your proposal above. Note that they usually work but on some strings they dont. I have included a compressed version of one such string. Try running the code below. I sincerely hope it comes through the mail properly. For comparison I add my own two new functions at the bottom. I'm at a loss to understand where the problem is. BTW, I've just seen Joel's message re Ladislav's code [ Joel, substitute every s with a p and put a p: s at the start like your code examples. ] Back to my stuff... test-string2: decompress #{ 789C9D590196A53008BB91E7CBF1F7BDD9B10D90008E6FF7AB2D054A93589DE7 C1730E7DF97317EF43D7FF9FF32FF4148F2826DE6B6C81B3B04E4E7F1E8ADBD5 0FCF43717E7C0B574438D41E4E4FE98228621DBD984A88A33BAF495D241E7EAE 81DC7FFCA14D3B86FE7574FCBEE674860A13FC3421E8AAABC213C280272BEC71 52F5CE72F473BAFFC35CC1FDF152C5F17346C5C34413E9A50D7C7FBFFB8E55AD 143599984897288940383F4AA14400D286480A615A32EA21DA88A81A27A065A0 90C82A6A25E4944E53623A0D2D24F7CA09CD1BB219EF5552992B6C245EDD4BD0 183CC1435DE8E8E3E247E6AE214A2002A765207ED6C34600A5F1CE27CBAD3E62 E84A855085962CD673A084A3B4118CABA74ABD14DD3A6FCE06077B78E7E73816 D6BF13A0FAD05E24724C795916552F89487DC2AA3041383428BAC8DA44F8B8A9 6E0055469799E4125E5DB9039C339AE8B452E1422D9DB656DA731F4431454B8A 1D9B97FC5431B5FEBA315FB3201A4B03AA93A6E14FECD09E56044FDA35CE79FE 7AE5D349202E2D806860C320F4A3075F1C2789AB6EFA09BAEC6CED173EB38F8D C0398F2A3F5FCBC119F5D7ED19D5D4D716A6FDD3516A8B3B4F0F6AB3227D69C4 7D2565CD473D00488356D2F7E5B0833A0541F9550EF1A095212B20B9D9789924 0E2744C58EC0DB06E96DF7626D2AACFE8663F3FE6A39C22C8A4F1347B052F3AF F7B11508D0D689CA39451BC1DC64E5B707D86D637C505DAADA1A40A5063BD197 88A850955FB7E8322542AF4E13624A37F79B6DC1CA675BFA855B9A973483BE55 712A2B2654ACF4FFCF07BDAD0C3A377E6D494D609568D529F2F22DB6910699CC DDF46972A85BDC71B115D73FC295DD0A1B29E8A213A60AEFBC3EE90369FB48B5 
A45978061FFF0B32116ABF20302D2F2BB3CEA35947113A710BD4523E0B735053 87EE3E6DD5A41D9272797F0944F101929DF057A83EE7616D085117E1EDE89214 6A01F4E0AF2819E23E94AF25A0790534BEDD527A082EE674EB63E16A25CE113E 112882E142C7EAC5C0D9C45199132F7F86AB8CE902D5220CB02E6BD27D0A3ED6 B40D8F46FA6966101D3D6CF8F784729A4C3B80BECAE5CACBE91E1ED3F97FA3FC BB69713B534436747F1693294D1106ABCACAE605CFADC868C491129824271AED 946B97454B6B1ED35B4730FB353325F92EBC52C93715976DE7D35A284D54538D 6E8597D21B18D7E461A76D24A09155BB16D97A87F5DBAB15F851E5CB9BD1E165 93EB332D969DB79D9B8BDF15E3AC873042DC7AB77A7246127DE447D9029E1632 CD5CBDD2E026321BC786B53EDF82E44FC8394B81F8A61AC553FDC3D74E3E92AF E45419559A484CE8919C4F56811A3F3AB390EDCB9FBDDB067B647C8A96A36BD5 764A6C08BDB7A4B3913293B674472B64FBC959706DC4C82DC10452AF2853717E FBFF017F5639C710270000 } LAD3: function [s][t u method3][ method3: [any [thru ".." t: any #"." u: (remove/part t u) :t] to end] parse/all s method3 RETURN s ] r: lad3 test-string2 if find r "..." [print "Oops. Problem."] BDH3: function [s][p a b][ p: s while [probe index? p parse/all p [ to "..." 2 skip a: some #"." b: (remove/part a b) to end]][ p: a ] RETURN s ] BDH4: function [s][p a b][ parse/all s [ any [to "..." 2 skip a: some #"." b: (remove/part a b) :a] ] RETURN s ]

 [13/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 14:15


Hi Joel, Joel:<< Ermmm... I don't think so. What am I missing?
>>s: copy packnewlines/s
== {x......x........x.xx......xx........x......x.x....x...... x..x.........x.x.x....x..x.........xx..............x.x....... ......x.....
>>
Simple:

    input: s
    while [s: find/tail s "..."] [
        parse s [any #"." t:]
        s: remove/part back s t
    ]
    input
>> s: {x......x........x.xx......xx........x......x.x....x......
{ x..x.........x.x.x....x..x.........xx..............x.x....... { ......x.....} == {x......x........x.xx......xx........x......x.x....x...... x..x.........x.x.x....x..x.........xx..............x.x....... ...
>> while [s: find/tail s "..."] [
[    parse s [any #"." t:]
[    s: remove/part back s t
[    ]
== ""
>> input
== "..a..bcd.s.f.g.k.l.m.n.o.p.." Cheers Ladislav

 [14/47] from: joel:neely:fedex at: 8-Nov-2001 1:02


Ladislav Mecir wrote:
> Hi Joel,
>
> I will try to add my 0,02 CZK.
~LOL! Pardon my ignorance, but can you expand the acronym?
> a) there is another disadvantage to the COPY algorithm you
> didn't mention. It is not an in-place algorithm and that
> is why it needs more memory.
Thanks for pointing that out. I usually work on big-memory boxen and don't think of that often enough. Yet another classic time-vs-space tradeoff!
> b) Some algorithms are complicated, that is why a different
> approach (not using REPLACE) should be considered too:
<<quoted lines omitted: 4>>
> ] [t]
> ]
Thanks! I'll add this to the benchmark, along with Brett's latest versions, and report results when done. -jn- -- We say "gestalt" when things combine to act in ways we can't explain. -- Marvin Minsky joel/dot/neely/at/fedex/FIX/PUNCTUATION/dot/com

 [15/47] from: joel:neely:fedex at: 8-Nov-2001 1:46


Ladislav Mecir wrote:
> I offer another one:
>
> method3: [any [thru ".." t: any #"." u: (remove/part t u) :t] to end]
> parse/all s method3
There's still a bug (which I don't have attention span to hunt down at this point -- need more coffee...) Please check to make sure that I didn't mess it up in transcription!

    PRS3:
    remprs3: func [s [string!] /local t u] [
        parse/all s [
            any [
                thru ".." t: any #"." u: (remove/part t u) :t
            ] to end
        ]
    ]

This still intermittently leaves runs of dots near the end of the string, as shown by the following test output by my benchmarking harness. The second line of each failure message gives the length of the result, the length of the correct result, then the position and first few characters of a segment containing an uncompressed run of dots.

    FAILURE: prs3
    2683 2680 @ 2679 .....
    FAILURE: prs3
    2708 2706 @ 2705 ....
    FAILURE: prs3
    2664 2661 @ 2660 .....
    FAILURE: prs3
    5403 5397 @ 5396 ........
    FAILURE: prs3
    5341 5336 @ 5335 .......
    FAILURE: prs3
    5367 5353 @ 5346 ........x.......x...
    FAILURE: prs3
    5409 5406 @ 5398 .....xxx.x.x
    FAILURE: prs3
    5521 5518 @ 5512 ...x.x....
    FAILURE: prs3
    5290 5284 @ 5279 .....xx.....
    FAILURE: prs3
    7923 7922 @ 7921 ...
    FAILURE: prs3
    8306 8292 @ 8284 ...x.x..............
    FAILURE: prs3
    8208 8202 @ 8201 ........

I'll be glad to look at it again later, but am running short on time now. I'll post the results of what I have working and can rerun the benchmarks as needed.

-jn-

-- I finally understood that the derogatory term "Indian giver" referred, not to the Indians themselves -- but rather, to our treatment of the Indians. -- Stephen Samuel

joel"FIX"PUNCTUATION"dot"neely"at"fedex"dot"com

 [16/47] from: joel:neely:fedex at: 8-Nov-2001 2:09


Hi, all,

Incorporating two from Brett (PRS1 and PRS2) and one from Ladislav (PANY) and pruning out the also-rans from before, here's a new set of benchmarks. I should have pointed out that I owe Anton thanks for the idea that led to PASS; it was based on FIND, but stayed within the same string to avoid the quadratic search time.

My rendering of the new versions (so that Brett and Ladislav can catch any mistakes I made in copying their ideas) are:

    PANY:
    while [s: find/tail s "..."] [
        parse s [any #"." t:]
        s: remove/part back s t
    ]

    PRS1:
    parse/all s [
        any [
            ".." mark1: any #"." mark2: (remove/part mark1 mark2) :mark1
            | skip
        ]
    ]

    PRS2:
    nondot: complement charset {.}
    parse/all s [
        any [
            some nondot
            | ".." mark1: any #"." mark2: (remove/part mark1 mark2) :mark1
            | skip
        ]
    ]

The format of these results is similar to the prior benchmarks, so I won't repeat the format explanations. Here we go!

    10 10000 xxxxxxxxx.
    pass 0.0069106  17
    pany 0.0076094  18.7
    binr 0.0132569  32.6
    prs2 0.0191155  47.1
    prs1 0.0406086 100

    10 10000 xxxxx.....
    prs1 0.0871603  74.6
    prs2 0.0914357  78.3
    pany 0.0934973  80
    pass 0.1151389  98.6
    binr 0.1168064 100

    10 10000 x.........
    prs1 0.0696221  11.8
    prs2 0.0700051  11.9
    pany 0.1082873  18.3
    binr 0.1837931  31.1
    pass 0.5906865 100

    10 20000 xxxxxxxxx.
    pany 0.0100858  13.3
    pass 0.0108703  14.3
    binr 0.0318814  41.9
    prs2 0.035362   46.5
    prs1 0.0760936 100

    10 20000 xxxxx.....
    prs2 0.2665442  72.3
    pany 0.2693659  73.1
    prs1 0.2765645  75
    binr 0.3451197  93.6
    pass 0.3687329 100

    10 20000 x.........
    prs1 0.2450383  11.4
    prs2 0.2724432  12.7
    pany 0.3112488  14.5
    binr 0.4685498  21.9
    pass 2.1415553 100

    10 30000 xxxxxxxxx.
    pass 0.0140541  12.6
    pany 0.0152802  13.7
    binr 0.0286012  25.6
    prs2 0.0626263  56
    prs1 0.1117849 100

    10 30000 xxxxx.....
    prs1 0.4763552  54.6
    pany 0.5014254  57.5
    prs2 0.544646   62.4
    binr 0.635237   72.8
    pass 0.8727375 100

    10 30000 x.........
    prs1 0.4685633   8.7
    prs2 0.4988843   9.3
    pany 0.6283518  11.7
    binr 0.8068603  15
    pass 5.3884561 100

-jn-

-- Then anyone who leaves behind him a written manual, and likewise anyone who receives it, in the belief that such writing will be clear and certain, must be exceedingly simple-minded... -- Plato

FIX^PUNCTUATION^joel^dot^neely^at^fedex^dot^com
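For anyone wanting to reproduce the shape of these measurements today, here is a small harness in the spirit of Joel's setup — random x/dot strings at a given dot probability, a fresh string per run, times averaged over 10 runs. The function names and structure are my own sketch, and absolute numbers will of course bear no relation to the 2001 REBOL figures:

```python
import random
import time

def make_test_string(length: int, dot_prob: float, seed: int = 0) -> str:
    """Random string of #"x" and #".", with dots at probability dot_prob."""
    rng = random.Random(seed)
    return "".join("." if rng.random() < dot_prob else "x"
                   for _ in range(length))

def bench(candidate, length: int, dot_prob: float, runs: int = 10) -> float:
    """Average wall-clock seconds for candidate over `runs` strings."""
    total = 0.0
    for i in range(runs):
        s = make_test_string(length, dot_prob, seed=i)
        t0 = time.perf_counter()
        candidate(s)
        total += time.perf_counter() - t0
    return total / runs
```

Usage is e.g. `bench(lambda s: s.replace("...", ".."), 10000, 0.5)` for a single-pass replace candidate.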

 [17/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 15:35


Hi Joel and Brett, the bug is a result of unexpected (for me) behaviour of PARSE:
>> parse s: "0123456789" [8 skip p: (print p remove/part s 3) :s (print "OK") to end]
89
== false

Joel: <<There's still a bug (which I don't have attention span to hunt down at this point -- need more coffee...)>>

Cheers
Ladislav

 [18/47] from: brett:codeconscious at: 9-Nov-2001 1:38


Thanks for the updated numbers Joel. An enlightening exercise. I would have liked to have included the last two from my stable, but unfortunately they are not guaranteed to finish the race. Both are suffering from the same problem that you encountered with Ladislav's I suspect. To which I'll start a new thread. Thanks Brett.

 [19/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 15:26


Ladislav Mecir wrote:
> Hi Joel,
>
> I will try to add my 0,02 CZK.
~LOL! Pardon my ignorance, but can you expand the acronym?

CZK is an international (AFAIK) shortcut for the Czech Crown (Ceska Koruna, alias Kc).
> a) there is another disadvantage to the COPY algorithm you
> didn't mention. It is not an in-place algorithm and that
> is why it needs more memory.
Thanks for pointing that out. I usually work on big-memory boxen and don't think of that often enough. Yet another classic time-vs-space tradeoff!

Actually, there is an easy way to change it to an in-place algorithm. A Finite State Machine might be ideally suited for that.
> b) Some algorithms are complicated, that is why a different
> approach (not using REPLACE) should be considered too:
<<quoted lines omitted: 4>>
> ] [t]
> ]
Thanks! I'll add this to the benchmark, along with Brett's latest versions, and report results when done. -jn- -- We say "gestalt" when things combine to act in ways we can't explain. -- Marvin Minsky joel/dot/neely/at/fedex/FIX/PUNCTUATION/dot/com

 [20/47] from: brett:codeconscious at: 9-Nov-2001 1:59

Parse failure (was newlines)


> the bug is a result of unexpected (for me) behaviour of PARSE:
>
> >> parse s: "0123456789" [8 skip p: (print p remove/part s 3) :s (print
> "OK") to end]
> 89
> == false
I narrowed down the problem similarly (your example is more concise). I believe remove/part is the trigger for the problem. Most likely it leaves Parse with an obsolete series reference to the input. The index of this reference is beyond the tail of the input. I think that due to this, Parse fails early - just after your first ")" and before the ":s". So I think the real problem is that we do not have an opportunity early enough to correct Parse's reference. Which, if confirmed, is a real issue, because we lose some wonderful manipulation opportunities that I thought we had!

Brett.
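Brett's diagnosis — PARSE left holding a position past the new tail after REMOVE/PART — is the general hazard of mutating a series under an active cursor. Here is a hypothetical Python illustration of the safe discipline (re-checking the index against the current length after every deletion); it models the analogous hazard, not PARSE semantics:

```python
def collapse_inplace(chars: list) -> list:
    """Collapse runs of 3+ dots to 2 by deleting in place.
    The index is re-validated against len(chars) on every step, so it
    can never reference past the shortened list -- the re-sync that
    PARSE had no opportunity to perform after the remove."""
    i = 0
    while i < len(chars):
        if chars[i:i + 3] == [".", ".", "."]:
            del chars[i]   # list shrinks; keep i, which now names the next char
        else:
            i += 1
    return chars
```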

 [21/47] from: dockimbel:free at: 8-Nov-2001 16:39

Re: newlines


Hi Ladislav,

It seems that 'parse is sometimes just too lazy to finish the job. :) Here's the result of my own tests using your method3 rule:

>> method3: [any [thru ".." t: any #"." u: (remove/part t u) :t] to end]
>> s: "x........x....." ; x, 8 dots, x, 5 dots
== "x........x....."
>> parse/all s method3
== true
>> s
== "x..x.." ; it works !
>> s: "x.........x....." ; x, 9 dots, x, 5 dots
>> parse/all s method3
== true
>> s
== "x..x....." ; it fails !?!

If you run these two tests with 'trace set to 'on, you'll see in the second test that 'parse seems to give up when it reaches the second 'x, without any logical reason. IMO, it's probably a bug in 'parse. I've found a workaround to this issue:
>> method3: [any [thru ".." t: opt [any #"." u: (remove/part t u)] :t] to end]
>> s: "x.........x....." ; x, 9 dots, x, 5 dots
>> parse/all s method3
== true
>> s
== "x..x.." ; it works now !

This modified rule also works well with Brett's test-string2.

Cheers,

-DocKimbel

 [22/47] from: brett:codeconscious at: 9-Nov-2001 3:14


Doc,

I applied your medicine to my functions too and they've been healed. Thanks! Your little workaround worked a treat. Here's my final contribution then for small variations on squishing down repeated newlines:

    BDH3: function [s][p a b][
        p: s
        while [
            parse/all p [
                to "..." 2 skip
                a: opt [some #"." b: (remove/part a b)]
                to end
            ]
        ][p: a]
        RETURN s
    ]

    BDH4: function [s][p a b][
        parse/all s [
            any [
                to "..." 2 skip
                a: opt [some #"." b: (remove/part a b)]
                :a
            ]
        ]
        RETURN s
    ]

Based on my rough testing (in a box that has not been rebooted in a while), they are a little bit faster than my previous efforts.

Brett.

 [23/47] from: brett:codeconscious at: 9-Nov-2001 3:14

Re: Parse failure (was newlines)


Interesting how the Doc's medicine fixed this one up too:
>> parse s: "0123456789" [8 skip p: [(print p remove/part s 3)] :s (print "OK") to end]
89
== false
>> parse s: "0123456789" [8 skip p: opt [(print p remove/part s 3)] :s (print "OK") to end]
89
OK
== true

Time for sleep.

Brett.

 [24/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 17:08

Re: newlines


Hi Joel,

perfect. Could you try this one?

    method5: [
        any [
            thru "..." t: any #"." u: [(remove/part t: back t u) :t | :t]
        ] to end
    ]
    parse/all s method5

 [25/47] from: rotenca:telvia:it at: 8-Nov-2001 17:46

Re: Parse failure (was newlines)


In the past, I've often had problems (that I did not understand) when the paren is in the middle of the rule and not at the end. I do not know if it is the same problem you have. The fix was always to move the paren to the end of the rule.

--- Ciao
Romano

 [26/47] from: lmecir:mbox:vol:cz at: 8-Nov-2001 23:01

Re: newlines


Hi,

yet another algorithm:

    method6: [
        opt [to "..." 2 skip t: sub] to end
    ]
    sub: [
        any #"." copy u [
            to "..." 2 skip (w: [(t: change t u) sub])
            | end (w: [(clear t)])
            | to end (w: [(clear change t u)])
        ]
        w
    ]
    parse/all s method6

 [27/47] from: al:bri:xtra at: 9-Nov-2001 14:56


> There are several options for squishing runs of newlines down to
I quite like:

    trim/lines MyStringOfText

Andrew Martin
ICQ: 26227169
http://valley.150m.com/

 [28/47] from: joel:neely:fedex at: 8-Nov-2001 15:33


Hi, Andrew, Andrew Martin wrote:
> > There are several options for squishing runs of newlines down to
>
> I quite like:
> trim/lines MyStringOfText
Well, that's a nice doohickey, but it doesn't do what the original post asked for, which was to reduce all runs of more than two consecutive newlines down to just two newlines (i.e., preserve paragraph-like chunking with a minimum of wasted vertical space). -jn- -- Your mouse has moved. Windows NT must be restarted for this change to take effect. -- Anonymous joel)dot)neely)at)fedex)FIX)PUNCTUATION)dot)com
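The difference Joel is pointing at — normalizing all whitespace versus keeping paragraph breaks — shows up clearly side by side. A Python sketch (the regexes are mine; `trim_lines_like` only approximates what REBOL's trim/lines does):

```python
import re

def trim_lines_like(s: str) -> str:
    """Roughly trim/lines: every whitespace run becomes one space,
    so paragraph structure is lost."""
    return re.sub(r"\s+", " ", s).strip()

def collapse_paragraphs(s: str) -> str:
    """What the original post asked for: runs of 3+ newlines become
    exactly 2, preserving paragraph-like chunking."""
    return re.sub(r"\n{3,}", "\n\n", s)
```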

 [29/47] from: hijim:pronet at: 8-Nov-2001 19:40


Newlines

Hello Anton and Joel,

Thanks for the help. The code below works perfectly on the pages I've tested, though it takes a second or two on long pages. It strips all the html and newlines and extraneous whitespace.

Joel was right about my original problem. I had a space between two newlines, so now I'm removing those too.

I hope this gets through. The last two emails I've sent came back as undeliverable. I'm sending this one via Rebol instead of Netscape.

Jim

btn silver / 1.4 "Strip html" [
    marked-up: copy my-area/text
    parse my-area/text [any [to "<style" begin: thru "</style>" ending:
        (remove/part begin ending) :begin]
    ]
    parse my-area/text [any [to "<script" begin: thru "</script>" ending:
        (remove/part begin ending) :begin]
    ]
    foreach [search-string replace-string][
        "<li>" "* "
        "</" "<"
        "<p>" "^/"
        "<h1>" "^/^/"
        "<h2>" "^/^/"
        "<h3>" "^/^/"
        "<h4>" "^/^/"
        "^-" " "
        "<hr>" "^/----------------------------------^/"
        "&nbsp;" " "
        "&copy;" "(C) "
        "&quot;" {"}
        "&amp;" "&"
    ][
        replace/all my-area/text search-string replace-string
    ]
    parse my-area/text [any [to "<" begin: thru ">" ending:
        (remove/part begin ending) :begin]
    ]
    foreach [search-string]["  " "^-" " ^/" "^/^/^/"] [
        while [p: find my-area/text search-string] [remove p]
    ]
    replace/all my-area/text "&gt;" ">"
    replace/all my-area/text "&lt;" "<"
    trim my-area/text
    show my-area
    get-lines
]

btn silver / 1.4 "Restore" [
    if marked-up = "" [return]
    my-area/text: marked-up
    show my-area
    get-lines
]

 [30/47] from: nitsch-lists:netcologne at: 9-Nov-2001 10:30


RE: [REBOL] Re: newlines

Hi Joel,

another copy-style to benchmark :-)

out: clear ""
parse/all s [any [
    start: thru "..." end1: any "."
    (insert/part tail out start back end1)
]]
insert tail out start
out ; or [head insert clear s out] or [copy out]

here is also an in-place solution, but that is 20 times slower?!

out: s
parse s [any [
    start: thru "..." end1: any "." (
        change/part out start back end1
        out: skip out subtract index? back end1 index? start
    )
]]
change out start
clear skip out length? start
s

-Volker

[lmecir--mbox--vol--cz] wrote:

 [31/47] from: lmecir:mbox:vol:cz at: 10-Nov-2001 2:33


Hi Volker, << another copy-style to benchmark :-) out: clear "" parse/all s [any [ start: thru "..." end1: any "." (insert/part tail out start back end1) ]] insert tail out start out ; or [head insert clear s out] or [copy out] here is also an in-place solution, but that is 20 times slower?! out: s parse s [any [ start: thru "..." end1: any "." ( change/part out start back end1 out: skip out subtract index? back end1 index? start ) ]] change out start clear skip out length? start s -Volker
>>
The way CHANGE/PART works really puzzles me.

 [32/47] from: lmecir:mbox:vol:cz at: 13-Nov-2001 10:36


Hi all,

I was curious how the speed of PARSE-based algorithms compares with the speed of some loop-based algorithms. I used my TIME-BLOCK function (can be found at http://www.sweb.cz/LMecir/) to compute the results; my system was an AMD 600 based, 128 MB RAM, Windows 98 machine.

General results:

1) PARSE-based versions of algorithms are faster than their loop-based counterparts
2) Copy-based algorithms are the fastest for high workload
3) CHANGE/PART's behaviour looks very strange, slow and unnatural. I would suggest some revisions from RT. (any comments?)
4) Recursive algorithms don't work for texts of this size
5) My results slightly differ from Joel's

algorithms: [
    PANY: [
        while [s: find/tail s "..."] [
            parse s [any #"." t:]
            s: remove/part back s t
        ]
    ]
    PRS1: [
        parse/all s [
            any [
                ".." mark1: any #"." mark2: (remove/part mark1 mark2) :mark1
                | skip
            ]
        ]
    ]
    PRS2: [
        nondot: complement charset {.}
        parse/all s [
            any [
                some nondot
                | ".." mark1: any #"." mark2: (remove/part mark1 mark2) :mark1
                | skip
            ]
        ]
    ]
    BINR: [
        newlen: length? s
        until [
            oldlen: newlen
            oldlen = newlen: length? replace/all s ".................." ".."
        ]
        replace/all s ".........." ".."
        replace/all s "......" ".."
        replace/all s "...." ".."
        replace/all s "..." ".."
    ]
    PASS: [
        while [s: find s "..."][remove s]
    ]
    comment {a PARSE based version of the PASS algorithm}
    PARPASS: [
        m: [any [to "..." s: (remove s)] to end]
        parse/all s m
    ]
    comment {a copy-like algorithm}
    PARCHAN: [
        method6: [
            to "..." 2 skip t: sub | to end
        ]
        sub: [
            any [
                any #"." copy u [
                    to "..." 2 skip (w: [(t: change t u)])
                    | end (w: [(clear t) end skip])
                    | to end (w: [(clear change t u) end skip])
                ]
                w
            ]
        ]
        parse/all s method6
    ]
    comment {a PARSE-based version of PANY, a cross of my and Brett's algorithms}
    PARANY: [
        method5: [
            any [to "..." 2 skip t: any #"." u: :t (remove/part t u)]
            to end
        ]
        parse/all s method5
    ]
    comment {Volker's COPY-based algorithm}
    CS: [
        out: clear ""
        parse/all s [any [
            start: thru "..." end1: any "."
            (insert/part tail out start back end1)
        ]]
        insert tail out start
        out
    ]
    comment {Volker's CHANGE/PART-based algorithm}
    IPC: [
        out: s
        parse s [any [
            start: thru "..." end1: any "." (
                change/part out start back end1
                out: skip out subtract index? back end1 index? start
            )
        ]]
        change out start
        clear skip out length? start
        out: s
    ]
]

The results (the last column contains the speed of the algorithm in per cent relative to the fastest algorithm):

10000 xxxxxxxxx.
    CS      3E-4      100
    PARANY  3.06E-4    98.01
    PARPASS 3.1E-4     96.85
    PASS    3.2E-4     93.71
    PANY    3.41E-4    88.17
    PARCHAN 6.15E-4    48.86
    BINR    1.373E-3   21.88
    IPC     2.338E-3   12.84
    PRS2    2.419E-3   12.41
    PRS1    5.915E-3    5.08

10000 xxxxx.....
    CS      3.4E-3    100
    PARCHAN 3.71E-3    91.64
    PARANY  4.662E-3   72.92
    PANY    6.727E-3   50.54
    PARPASS 7.821E-3   43.47
    PASS    9.11E-3    37.32
    PRS2    9.325E-3   36.46
    PRS1    9.491E-3   35.82
    BINR    1.0829E-2  31.39
    IPC     0.102694    3.31

10000 x.........
    PARCHAN 4.414E-3  100
    CS      4.497E-3   98.15
    PARANY  5.786E-3   76.29
    PRS1    6.563E-3   67.26
    PRS2    6.851E-3   64.43
    PANY    8.403E-3   52.53
    BINR    1.8491E-2  23.87
    PARPASS 4.2417E-2  10.41
    PASS    4.9487E-2   8.92
    IPC     0.131909    3.35

20000 xxxxxxxxx.
    CS      6.3E-4    100
    PARPASS 7.11E-4    88.67
    PARANY  7.11E-4    88.67
    PASS    7.62E-4    82.71
    PANY    8.04E-4    78.44
    PARCHAN 1.287E-3   48.98
    BINR    2.818E-3   22.37
    PRS2    5.123E-3   12.31
    IPC     6.915E-3    9.12
    PRS1    1.212E-2    5.2

20000 xxxxx.....
    CS      6.918E-3  100
    PARCHAN 7.562E-3   91.48
    PARANY  1.4603E-2  47.37
    PANY    1.8998E-2  36.41
    PRS2    2.4037E-2  28.78
    PRS1    2.4681E-2  28.03
    PARPASS 2.6615E-2  25.99
    BINR    2.7357E-2  25.29
    PASS    2.931E-2   23.6
    IPC     0.378842    1.83

20000 x.........
    PARCHAN 8.588E-3  100
    CS      8.862E-3   96.91
    PARANY  1.6967E-2  50.62
    PRS1    1.8334E-2  46.84
    PRS2    1.8881E-2  45.49
    PANY    2.2397E-2  38.35
    BINR    4.1928E-2  20.48
    PARPASS 0.144467    5.94
    PASS    0.160795    5.34
    IPC     0.459467    1.87

30000 xxxxxxxxx.
    CS      1.04E-3   100
    PARANY  1.179E-3   88.2
    PARPASS 1.208E-3   86.06
    PASS    1.248E-3   83.37
    PANY    1.299E-3   80.08
    PARCHAN 1.985E-3   52.4
    BINR    4.307E-3   24.15
    PRS2    7.568E-3   13.74
    IPC     1.3799E-2   7.54
    PRS1    1.8096E-2   5.75

30000 xxxxx.....
    CS      1.0313E-2 100
    PARCHAN 1.1387E-2  90.57
    PARANY  2.8916E-2  35.66
    PANY    3.5654E-2  28.92
    PRS2    4.3857E-2  23.51
    PRS1    4.4287E-2  23.29
    BINR    4.9365E-2  20.89
    PARPASS 5.5381E-2  18.62
    PASS    5.9287E-2  17.39
    IPC     0.809209    1.27

30000 x.........
    PARCHAN 1.2888E-2 100
    CS      1.322E-2   97.49
    PARANY  3.3942E-2  37.97
    PRS1    3.6325E-2  35.48
    PRS2    3.7419E-2  34.44
    PANY    4.1911E-2  30.75
    BINR    7.0934E-2  18.17
    PARPASS 0.304841    4.23
    PASS    0.330466    3.9
    IPC     0.974216    1.32

 [33/47] from: joel:neely:fedex at: 13-Nov-2001 1:37


Hello, all (again),

Here's my latest round of benchmarks with all submissions (I think) but Ladislav's METHOD6 (called PRS6 below). I was getting errors from PRS6, but may have made a transcription error.

Source for all new versions (PRS3 thru PRS6, and INS1) is included for anyone to check my typing...

remprs3: func [s [string!] /local p a b] [
    p: s
    while [
        parse/all p [
            to "..." 2 skip a: opt [some #"." b: (remove/part a b)] to end
        ]
    ][
        p: a
    ]
]

remprs4: func [s [string!] /local p a b] [
    parse/all s [
        any [
            to "..." 2 skip a: opt [some #"." b: (remove/part a b)] :a
        ]
    ]
]

remprs5: func [s [string!] /local t u] [
    parse/all s [
        any [
            thru "..." t: any #"." u: [
                (remove/part t: back t u) :t | :t
            ]
        ]
        to end
    ]
]

remprs6: func [s [string!] /local t u w sub] [
    sub: [
        any #"." copy u [
            to "..." 2 skip (w: [(t: change t u) sub])
            | end (w: [(clear t)])
            | to end (w: [(clear change t u)])
        ]
        w
    ]
    parse/all s [opt [to "..." 2 skip t: sub] to end]
]

remins1: func [s [string!] /local out start end1] [
    out: clear ""
    parse/all s [
        any [
            start: thru "..." end1: any "." (
                insert/part tail out start back end1
            )
        ]
    ]
    insert tail out start
    out
]

Benchmark results from the same test harness as used before are as follows:

10 10000 xxxxxxxxx.
    ins1 0.0055324  15.7
    prs4 0.0056476  16.1
    prs5 0.0056954  16.2
    pass 0.0057106  16.2
    prs3 0.0058941  16.8
    pany 0.0060686  17.3
    binr 0.0109558  31.2
    prs2 0.0170093  48.4
    prs1 0.0351644 100

10 10000 xxxxx.....
    ins1 0.0263463  26.2
    prs4 0.0521243  51.9
    prs5 0.057153   56.9
    prs3 0.0660708  65.8
    pany 0.0735148  73.2
    prs1 0.0783158  78
    prs2 0.0814463  81.1
    pass 0.0882192  87.8
    binr 0.100443  100

10 10000 x.........
    ins1 0.0317953   6.1
    prs4 0.0582377  11.2
    prs5 0.0650405  12.5
    prs1 0.0658213  12.6
    prs2 0.0665257  12.7
    prs3 0.0787273  15.1
    pany 0.0864575  16.6
    binr 0.1375208  26.3
    pass 0.5222772 100

10 20000 xxxxxxxxx.
    ins1 0.0071699  10.6
    prs4 0.0086906  12.8
    prs5 0.0088449  13
    prs3 0.0090889  13.4
    pany 0.0095072  14
    pass 0.0169069  24.9
    binr 0.0187407  27.6
    prs2 0.0309285  45.6
    prs1 0.0678235 100

10 20000 xxxxx.....
    ins1 0.0490884  12.6
    prs5 0.2070957  53.2
    prs4 0.2101422  54
    prs3 0.2295755  59
    prs1 0.2382468  61.2
    prs2 0.2507145  64.4
    pany 0.2563284  65.8
    binr 0.2993995  76.9
    pass 0.3894038 100

10 20000 x.........
    ins1 0.0597987   2.6
    prs5 0.2347959  10.3
    prs4 0.2426656  10.6
    prs2 0.2470482  10.8
    prs1 0.2519563  11
    prs3 0.2538821  11.1
    pany 0.2876505  12.6
    binr 0.3857012  16.9
    pass 2.2865924 100

10 30000 xxxxxxxxx.
    ins1 0.0088604   8.7
    prs3 0.013247   13.1
    prs5 0.0132796  13.1
    prs4 0.0133496  13.2
    pass 0.0135603  13.4
    pany 0.0139156  13.7
    binr 0.0284606  28.1
    prs2 0.0465389  45.9
    prs1 0.1014025 100

10 30000 xxxxx.....
    ins1 0.0728014   8.7
    prs5 0.4326869  51.9
    prs4 0.4359396  52.3
    pany 0.4537192  54.5
    prs3 0.4660354  55.9
    prs1 0.5437336  65.3
    prs2 0.5491291  65.9
    binr 0.638432   76.6
    pass 0.8332273 100

10 30000 x.........
    ins1 0.0882684   1.8
    prs4 0.4752613   9.8
    prs2 0.5063892  10.4
    prs5 0.5120834  10.6
    prs3 0.5325967  11
    prs1 0.5439077  11.2
    pany 0.5661401  11.7
    binr 0.7072614  14.6
    pass 4.8524088 100

-jn-
--
If it works, it's obsolete. -- Eric Marten
joel:dot:neely:at:fedex:dot:FIX:PUNCTUATION:com

 [34/47] from: rotenca:telvia:it at: 13-Nov-2001 18:30


Hi, Joel

Perhaps the best result would come from combining BINR with Parse and/or Copy (INS). Don't you think?

What is your system?
---
Ciao
Romano

 [35/47] from: lmecir:mbox:vol:cz at: 13-Nov-2001 18:18


Hi Joel, Volker & all,

I would like to comment on the speed of the INS1 algorithm:

<<
out: clear ""
parse/all s [
    any [
        start: thru "..." end1: any "." (
            insert/part tail out start back end1
        )
    ]
]
insert tail out start
out
>>
The way the algorithm is designed gives it an unfair advantage when it is timed in a loop where it is executed more than once. To receive a fair value for one pass, we would have to modify the first line as follows:

    out: copy ""

Then the algorithm would be slower than its preallocated variant, which is:

    out: make string! length? s

Even the preallocated variant is slower than the "artificial" variant used, if the speed is measured by executing the code in a loop as described. That is why I think that the results for this algorithm are "less valid" than the results of the other algorithms in this arrangement.

Cheers
Ladislav
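[Editorial aside, not part of the original thread: Ladislav's fairness point applies to any mutating algorithm timed in a loop without resetting its input — after the first pass the runs are already collapsed, so later iterations do almost no work. A Python sketch of that effect, with names of my own choosing:]

```python
def collapse_in_place(chars: list) -> list:
    """Remove a dot wherever three in a row are found, mutating the
    list in place (a rough analogue of the REBOL remove-style code)."""
    i = 0
    while i + 2 < len(chars):
        if chars[i] == chars[i + 1] == chars[i + 2] == ".":
            del chars[i]  # stay at i: the run may still be too long
        else:
            i += 1
    return chars
```

The first call does all the removals; a second call over the same (now collapsed) data finds nothing to remove, which is why timing such code in a loop without restoring the input understates the true one-pass cost.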

 [36/47] from: nitsch-lists:netcologne at: 13-Nov-2001 20:31


RE: [REBOL] Re: newlines

Hi Ladislav,

yes, i know, i forgot some lines..

a bit of discussion:

- i was surprised to have the fastest algorithm. i tried to be simple, and to reduce the lots of large movements of the remove-solutions.

- i think the handling is the responsibility of the user. 'copy yourself if needed' is rebol-style.. because often there is something parsed and completely handled before the function is called again. used that way there is no need to copy the result. if there is need, [copy INS1 text] is short. OTOH one could similarly claim the remove-solutions destroy the string, so they should copy first? ;-)

- instead of post-allocation like [copy out], i could do [insert clear s out] without allocations. which may trigger some gc when lots of benchmark-loops. could you benchmark these two solutions too please?

-Volker

[lmecir--mbox--vol--cz] wrote:


 [38/47] from: lmecir:mbox:vol:cz at: 13-Nov-2001 21:33


Hi Volker,

Volker:
<<
a bit of discussion:
- i was surprised to have the fastest algorithm.
>>
My comment was saying that your algorithm wasn't the fastest one. It was only measured as the fastest because of the nature of the measurement. (Your algorithm is faster when executed in a loop than when executed once.)

Volker:
<<
i tried to be simple, and to reduce the lots of large movements of remove-solutions.
>>
Yes, that is the correct approach.

Volker:
<<
- i think the handling is in response of the user. 'copy yourself if needed is rebol-style.. because often there is something parsed and completely handled, before the function is called again. used that way there is no need to copy the result. if there is need, [copy INS1 text] is short.
>>
I wasn't saying there was any need for copying the data. What I tried to say was that the time we received from the measurement was incorrect, because the measurement used a false assumption: that the speed of the algorithm doesn't change. Indeed, a speed measurement using that assumption must be inaccurate, because for your algorithm the assumption doesn't hold.

Volker:
<<
OTOH similar one could claim the remove-solutions destroy the string, so they should copy first? ;-)
>>

From the measurement POV the remove-solutions don't change their speed; that is why the measured time is accurate.

Volker:
<<
- instead of post-allocation like [copy out], i could do [insert clear s out] without allocations. which may trigger some gc when lots of benchmark-loops. could you benchmark this two solutions too please?
-Volker
>>
Could you be more specific, please?

Cheers
Ladislav

 [39/47] from: joel:neely:fedex at: 13-Nov-2001 18:29


Hi, Ladislav, Ladislav Mecir wrote:
> I would like to comment the speed of the INS1 algorithm: > <<
<<quoted lines omitted: 11>>
> The way the algorithm is designed gives it an unfair advantage > when it is timed in a loop where it is executed more than once...
Good observation for follow-up; thanks! I'll be glad to make a version that uses COPY "" and re-run the benchmarks with both. -jn- -- This sentence contradicts itself -- no actually it doesn't. -- Doug Hofstadter joel<dot>neely<at>fedex<dot>com

 [40/47] from: nitsch-lists:netcologne at: 14-Nov-2001 9:14


RE: [REBOL] Re: newlines

Hi Joel, Ladislav,

keep in mind not to trigger unfair gc's because of a non-real-world amount of loops and therefore garbage. it looks like the most time-competing algorithm at 10000 slows down heavily with 30000 without good reason. it uses parse too, but with copy instead of markers, so it produces more garbage. it could go over a limit with 30000 so it starts triggering gc's. so benchmarking with different loop-counts (= different amounts of garbage) is a good idea.

also use a reasonable amount of pre-allocation, like

    out: make string! 50000

without producing garbage.. wrong pre-allocation slows down heavily, AFAIK a bug.. i doubt you will see a slowdown by a single allocation, even if large.

btw you know the trick of using [out: clear []] for static buffers and using [return copy out]? Gabriele noted it a while ago, pretty for self-adjusting programs. after you have done this, switch to garbage-free [insert clear s out], which copies the result back to the input string instead of allocating something. this should run at native speed. IIRC there was no notable slowdown, but i benchmarked a bit lazily.

as i said, i was not aware of having the fastest-looking algorithm..

-Volker

[joel--neely--fedex--com] wrote:

 [41/47] from: nitsch-lists:netcologne at: 14-Nov-2001 9:53


RE: [REBOL] Re: newlines

regarding my last mail: oops, i misread the benchmark list. kill the gc-notes:

<<
keep in mind not to trigger unfair gc's because of a non-real-world amount of loops and therefore garbage. it looks the most time competing algorithm at 10000 slows down heavily with 30000 without good reason. it uses parse too, but with copy instead of markers, so it produces more garbage. could go over a limit with 30000 so it starts triggering gc's. so benchmarking with different loop-counts (=different amount of garbage) is a good idea.
>>
PARCHAN is as fast as CS at every size with xxxxx..... and x......... ; with xxxxxxxxx. it's half as fast. it uses [copy u], which fills the heap too. so gc is no problem, and allocation should not slow things down. hmm, could both be combined? until change/part speeds up :-)

for the real-world problem there is one replacement per paragraph, so there are a lot more x's compared to ".". this should favour the remove-versions, because they have a lot less to do. so a benchmark with lots of "x" too?

these are PARCHAN and CS:

comment {a copy-like algorithm}
PARCHAN: [
    method6: [
        to "..." 2 skip t: sub | to end
    ]
    sub: [
        any [
            any #"." copy u [
                to "..." 2 skip (w: [(t: change t u)])
                | end (w: [(clear t) end skip])
                | to end (w: [(clear change t u) end skip])
            ]
            w
        ]
    ]
    parse/all s method6
]

comment {Volker's COPY-based algorithm}
CS: [
    out: clear ""
    parse/all s [any [
        start: thru "..." end1: any "."
        (insert/part tail out start back end1)
    ]]
    insert tail out start
    out
]

-Volker

[joel--neely--fedex--com] wrote:

 [42/47] from: lmecir:mbox:vol:cz at: 15-Nov-2001 1:57


Hi all,

according to my measurements the CP function defined as follows:

use [b] [
    b: make string! 1'000
    cp: func [
        string [any-string!]
        value [any-string!]
        range [any-string! integer!]
    ] [
        change string head insert/part clear b value range
    ]
]

Where:

    sample: insert/dup "" #"x" 30'000

Used as:

    cp sample skip sample 1'000 skip sample 2'000

is almost 100 times faster than:

    change/part sample skip sample 1'000 skip sample 2'000

Any comments?

Cheers
Ladislav

 [43/47] from: brett:codeconscious at: 15-Nov-2001 14:10


The CP function code being far faster than the CHANGE native function is interesting.

Was having B declared outside of CP crucial, or was it just to avoid a potentially unfair function overhead on CP? Both seem very quick.

How did you measure the difference in speed?

As you increase the size of the sample string or change the reference points to change/copy, are there changes in the results? What of blocks instead of strings?

Brett.

 [44/47] from: lmecir:mbox:vol:cz at: 15-Nov-2001 8:13


Hi Brett,

I declared B so that no garbage collection occurs in this case (see the 1'000 in MAKE STRING!), or in any case where not more than 1'000 elements are changed. That is why I declared it outside of CP.

To measure the time I used my TIME-BLOCK function (can be found at my site http://www.sweb.cz/LMecir under benchmarks).

It seems that (strangely) the speed of CHANGE/PART depends on the total size of SAMPLE only. My guess is that similar results can be obtained using blocks too.

Cheers
Ladislav

<<
The CP function code being far faster than the CHANGE native function is interesting. Was having B declared outside of CP crucial or was it just to avoid a potential unfair function overhead on CP? Both seem very quick. How did you measure the difference in speed? As you increase the size of the sample string or change the reference points to change/copy, is there changes in the results? What of blocks instead of strings? Brett.

 [45/47] from: hijim:pronet at: 6-Nov-2001 20:20


Is there any reason this should not work to be sure there are no more than two newlines in a row in the text? It works fine except for the top of the page. It won't even work if I change it to 'loop 200. It leaves one place with 4 newlines in a row. Maybe there's another character that forces a line feed?

    loop 20 [replace/all my-area/text "^/^/^/" "^/^/"]

Jim

------------------ the whole strip code below

foreach [search-string replace-string][
    "</" "<"
    "<p>" "^/"
    "<h1>" "^/^/"
    "<h2>" "^/^/"
    "<h3>" "^/^/"
    "<h4>" "^/^/"
    "<li>" "* "
    "^-" " "
    "<hr>" "^/----------------------------------^/"
    "&nbsp;" " "
    "&copy;" "(C) "
    "&quot;" {"}
    "&amp;" "&"
][
    replace/all my-area/text search-string replace-string
]
parse my-area/text [any [to "<" begin: thru ">" ending: (remove/part begin ending) :begin]]
loop 5 [replace/all my-area/text "  " " "]
loop 20 [replace/all my-area/text "^/^/^/" "^/^/"]
replace/all my-area/text "&gt;" ">"
replace/all my-area/text "&lt;" "<"

 [46/47] from: arolls:idatam:au at: 7-Nov-2001 16:12


I reckon you should check for any whitespace characters after your first two lines.

Actually, that's not the problem. Try this instead of that useless knee-jerk reaction.

    s: "^/^/^/^/hello there^/^/^/^/^/"
    while [p: find s "^/^/^/"][remove p]

It removes character by character, as long as it keeps finding three in a row, instead of replacing three with two and placing the index after the two.

-- Explanation --

Pretend N is a newline in the original string, and n is a new newline inserted by our function. Here's the string at the beginning, with the index at the first position:

    s: "NNNN"
        ^

Now, after we do [replace s "NNN" "nn"]:

    s == "nnN"
            ^

The index is after the inserted replace string! That means a subsequent REPLACE will only see ONE newline character (the final "N"). But we really need to go back to the beginning and see if the two ok-sized bits have joined forces into an over-sized monster.

Anyway, the while loop above seems to solve it.

Anton.
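[Editorial aside, not part of the original thread: Anton's pitfall is language-independent. A Python sketch (names mine) showing first that a single non-restarting replacement pass can leave an oversized run, and then that removing one character at a time from the match point, as his while loop does, cannot:]

```python
def collapse_by_removal(s: str) -> str:
    """Remove one newline at a time wherever three in a row are found,
    re-scanning from the start, like [while [p: find s "^/^/^/"][remove p]]."""
    while (i := s.find("\n\n\n")) != -1:
        s = s[:i] + s[i + 1:]  # drop one newline; the find restarts the scan
    return s

# the pitfall: one pass of a 3 -> 2 replacement over five newlines
# continues scanning AFTER the inserted pair, leaving a run of four
single_pass = ("\n" * 5).replace("\n\n\n", "\n\n")
```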

 [47/47] from: joel:neely:fedex at: 7-Nov-2001 0:29


Jim Clatfelter wrote:
> Is there any reason this should not work to be sure there are > no more than two newlines in a row in the text? It works fine > except for the top of the page. It won't even work if I change > it to 'loop 200. It leaves one place with 4 newlines in a row. > Maybe there's another character that forces a line feed? > > loop 20 [replace/all my-area/text "^/^/^/" "^/^/"] >
I just did a quick test that seems to confirm that it should work. Here are some suggestions:

1) Verify that there isn't some other nonprinting character (e.g., a tab, carriage-return, etc.) within a run of newlines. For example:

    replace/all my-area/text "^/ ^/" "^/^/"
    replace/all my-area/text "^/^-^/" "^/^/"
    ;...

2) Use TRIM and TRIM/ALL to remove leading and trailing whitespace.

3) Speed up the process by eliminating long runs first.

    replace/all my-area/text "^/^/^/^/^/^/^/^/^/^/" "^/^/"
    replace/all my-area/text "^/^/^/^/^/^/" "^/^/"
    replace/all my-area/text "^/^/^/^/" "^/^/"
    replace/all my-area/text "^/^/^/" "^/^/"

4) Only repeat the REPLACE/ALL enough to get the job done. The code below illustrates (using dots instead of newlines to make the action visible). First we construct a test string with lots of runs of dots:
>> s: ""
== ""
>> loop 200 [append s either 1 = random 20 ["x"] ["."]]
== {.x.x...........x.x.................x....xx.x..........
..........................................................
............x.....

then we take out runs of more than two dots:
>> newlen: oldlen: length? s
== 200
>> until [
[    oldlen: newlen
[    oldlen = newlen: length? replace/all s "..." ".."
[ ]
== true
>> s
== ".x.x..x.x..x..xx.x..x..x..x.x..x.."

by repeatedly applying REPLACE/ALL until the result has the same length as the previous length.

Hope this helps!

-jn-
--
'Tis an ill wind that blows no minds. -- Anonymous
joel[dot[neely[at[fedex[dot[FIX[PUNCTUATION[com
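[Editorial aside, not part of the original thread: Joel's fixed-point idea — knock out long runs first, then repeat the three-to-two replacement until the length stops changing — translates directly. A Python sketch, with the function name and the particular run lengths being my own choices:]

```python
def collapse_to_fixpoint(s: str) -> str:
    """Eliminate long runs first, then repeat the 3 -> 2 replacement
    until the string's length stops changing (the fixed point)."""
    for long_run in ["." * 10, "." * 6, "." * 4]:
        s = s.replace(long_run, "..")
    old_len = None
    while old_len != len(s):
        old_len = len(s)
        s = s.replace("...", "..")
    return s
```

The pre-pass over longer runs is purely an optimization: the fixed-point loop alone would reach the same result, just in more iterations.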

Notes
  • Quoted lines have been omitted from some messages.