Trimming values from a series?

[1/21] from: edanaii:cox at: 23-Jul-2002 17:28

How do I do it?

I know I can use the sort command to put them in order, but how can I trim out the duplicates?

Let's say I have a series that I've sorted as follows: [ "a" "b" "b" "c" "d" "e" "e" ].

Is there a command I can use to reduce it to: [ "a" "b" "c" "d" "e" ]?

--
Sincerely,          | Don't part with your illusions. When they are gone
Ed Dana             | you may still exist, but you have ceased to live.
Software Developer  |    -- Mark Twain
1Ghz Athlon Amiga   |

[2/21] from: louisaturk:coxinet at: 23-Jul-2002 20:11

Hi Ed,

Type help unique at the console for the answer.

Louis

At 05:28 PM 7/23/2002 -0700, you wrote:

[3/21] from: brett:codeconscious at: 24-Jul-2002 11:26

Louis has already pointed out Unique.

If you need your series sorted, it is probably better to sort after the Unique. This way you don't need to hope that Unique keeps the sorting intact.

    sort unique [ "a" "b" "b" "c" "d" "e" "e" ]

Regards,
Brett.

[4/21] from: louisaturk:coxinet at: 23-Jul-2002 20:16

Hi again Ed, Here is an example of use:

>> x: [ "a" "b" "b" "c" "d" "e" "e" ]

== ["a" "b" "b" "c" "d" "e" "e"]

>> unique sort x

== ["a" "b" "c" "d" "e"]

Louis At 05:28 PM 7/23/2002 -0700, you wrote:

[5/21] from: atruter:hih:au at: 24-Jul-2002 11:45

>If you need your series sorted, it is probably better to sort after the
>Unique. This way you don't need to hope that Unique keeps the sorting
>intact.
>
>    sort unique [ "a" "b" "b" "c" "d" "e" "e" ]
>
>Regards,
>Brett.

Also, at least conceptually, there is less for sort to do, so it should be more efficient...

Regards,

Ashley

[6/21] from: edanaii:cox at: 23-Jul-2002 18:26

Ed Dana wrote:

> How do I do it?
>
> I know I can use the sort command to put them in order, how can I trim
> out the duplicates?
>
> Let's say I have a series, that I've sorted as follows: [ "a" "b" "b"
> "c" "d" "e" "e" ].
>
> Is there a command I can use to reduce it to: [ "a" "b" "c" "d" "e" ]?

OK, never mind. I found it. Looks like the UNIQUE command will do the job.

So many functions, so little time... :)

--
Sincerely,          | The problems of two little people don't amount to
Ed Dana             | a hill of beans in this crazy mixed-up world! But
Software Developer  | this is OUR hill, and these are OUR beans!
1Ghz Athlon Amiga   |    -- Naked Gun via Casablanca.

[7/21] from: louisaturk:coxinet at: 23-Jul-2002 22:29

Brett and Ashley,

Thanks. I have learned something new. For some reason I thought that the order had to be:

    unique sort x

but a little experimentation proves you right. My misconception probably stems from the fact that I once wrote a C function similar to unique that wouldn't work unless the input was sorted first.

By the way, everybody, it is good to be back on my favorite list after a fire (it actually burned our cable in two) and other problems cut me off for about 2 months. I have received more help on this list than on any other list I have been on. The regulars on this list are not only brilliant, but also patient and quick to help those in need (often me :>)). Thanks to you all!

Louis

At 11:45 AM 7/24/2002 +1000, you wrote:
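A function of the kind Louis describes might look something like the sketch below. This is only an illustration, not his C code and not how UNIQUE is implemented: SWEEP-ADJACENT is a hypothetical helper that drops a value only when it equals the value kept just before it, which is why it needs sorted input to remove every duplicate.

```rebol
; Hypothetical helper (not part of REBOL): removes ADJACENT duplicates only.
; The result is duplicate-free only if equal values are next to each other,
; for example because the block was sorted first.
sweep-adjacent: func [
    series [block!]
    /local out
][
    out: make block! length? series
    foreach value series [
        ; keep the value if nothing is kept yet, or if it differs
        ; from the most recently kept value
        if any [empty? out  value <> last out] [append/only out value]
    ]
    out
]

probe sweep-adjacent ["a" "b" "b" "c" "d" "e" "e"]   ; sorted input
probe sweep-adjacent ["b" "a" "b"]                   ; unsorted: second "b" survives
```

On unsorted input the repeated "b" is not adjacent to the first one, so it is kept, which matches the behaviour Louis remembers.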

[8/21] from: carl:cybercraft at: 24-Jul-2002 17:39

On 24-Jul-02, Brett Handley wrote:

> Louis has already pointed out Unique
> If you need your series sorted, it is probably better to sort after
> the Unique. This way you don't need to hope that Unique keeps the
> sorting intact.
>
>     sort unique [ "a" "b" "b" "c" "d" "e" "e" ]

Hmmm. Are there cases where unique can change the order, as well as stripping out duplicates? -- Carl Read

[9/21] from: brett:codeconscious at: 24-Jul-2002 17:47

Hi Carl,

> > If you need your series sorted, it is probably better to sort after
> > the Unique. This way you don't need to hope that Unique keeps the
> > sorting intact.
> >
> >     sort unique [ "a" "b" "b" "c" "d" "e" "e" ]
>
> Hmmm. Are there cases where unique can change the order, as well as
> stripping out duplicates?

I don't really know; I haven't tested for it. But even if it is stable in this version, it could change in later versions, and because the help for Unique does not suggest otherwise, I figure a bit of defensive programming is warranted.

My habit is to think: "if I need it like that, then I should ensure it is like that".

Regards,
Brett.

[10/21] from: rotenca:telvia:it at: 24-Jul-2002 11:33

Hi Brett,

I hope that these functions (unique, intersect, exclude) do not mess up the order of the list. Sometimes the order has value but is not the result of sort or of any other function, so it can't be recreated.

---
Ciao
Romano

[11/21] from: anton:lexicon at: 25-Jul-2002 0:09

I think it should be faster this way, too, although it may not matter. sort will have to rearrange fewer items after unique has removed some. I don't think unique would work any faster on data that's already sorted, though I could be wrong. Not a big issue for me at the moment.. :)

Anton.

[12/21] from: anton:lexicon at: 25-Jul-2002 1:09

I can't help you with fire, I am sorry. Too far away. There might be a fireman around your place somewhere in Georgia, though. Anton.

[13/21] from: edanaii:cox at: 24-Jul-2002 17:32

Carl Read wrote:

>On 24-Jul-02, Brett Handley wrote: >>Louis has already pointed out Unique

<<quoted lines omitted: 7>>

>Hmmm. Are there cases where unique can change the order, as well as >stripping out duplicates?

I would expect that UNIQUE first sorts the data in order to trim out the duplicates. It's the most efficient way to find and remove them. Which is why I went looking for it in SORT first.

I doubt that issuing SORT and UNIQUE in any combination is faster because of the redundancy.

I could be wrong, though...

--
Sincerely,          | For long you live and high you fly.
Ed Dana             | And smiles you'll give and tears you'll cry.
Software Developer  | And all you touch and all you see,
1Ghz Athlon Amiga   | Is all your life will ever be.
                    |    -- Pink Floyd, Breathe.

[14/21] from: carl:cybercraft at: 25-Jul-2002 13:46

On 25-Jul-02, Ed Dana wrote:

>> Hmmm. Are there cases where unique can change the order, as well as >> stripping out duplicates?

<<quoted lines omitted: 4>>

> because of the redundancy. > I could be wrong, though...

I think you are. Unique is much faster than sort, at least on Amiga. Umm, well, at least on the block of 1s and 2s I've just tried. (: This suggests unique doesn't use sort, or at least not the sort available to us.

-- Carl Read

[15/21] from: edanaii:cox at: 24-Jul-2002 19:47

Carl Read wrote:

>On 25-Jul-02, Ed Dana wrote: >>>Hmmm. Are there cases where unique can change the order, as well as

<<quoted lines omitted: 13>>

>tried. (: This suggests unique doesn't use sort, or at least not the >sort available to us.

I probably am. :) Sorting this string, for example, produces the following results:

>> UNIQUE "Now is the time for all good men to come to the aid of their country"

== "Now isthemfralgdcuy"

Which suggests that either UNIQUE uses some other algorithm, or it sorts the data and then puts it back in its original order. The latter would be very inefficient, of course, so I vote for the former.

Maybe the algorithm is something similar to what's used for some compression schemes?

Just a thought...

--
Sincerely,          |
Ed Dana             | Courage is fear holding on a minute longer.
Software Developer  |    -- General George S. Patton
1Ghz Athlon Amiga   |

[16/21] from: carl:cybercraft at: 25-Jul-2002 16:24

On 25-Jul-02, Ed Dana wrote:

> Sorting this string, for example, produces the following results: >>> UNIQUE "Now is the time for all good men to come to the aid of

<<quoted lines omitted: 5>>

> Maybe the algorithm is something similar to what's used for some > compression schemes?

Getting the unique values from a string would be simple. Just have an array of 256 flags and tick them as found based on the char values as you parse once through the string. Blocks of mixed datatypes wouldn't be quite so easy though... -- Carl Read
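Carl's flag-array idea could be sketched roughly as follows. This is an assumption-laden illustration, not how UNIQUE is actually implemented: UNIQUE-CHARS is a made-up name, and the 256-entry table assumes 8-bit characters.

```rebol
; Hypothetical sketch of the "256 flags" approach for strings.
; Assumes every char has a code in the range 0-255.
unique-chars: func [
    str [string!]
    /local flags out idx
][
    flags: array/initial 256 false      ; one "seen" flag per char code
    out: make string! length? str
    foreach ch str [
        idx: 1 + to-integer ch          ; blocks are 1-based, codes start at 0
        if not pick flags idx [
            poke flags idx true         ; tick the flag on first sight
            append out ch               ; keep first occurrences in order
        ]
    ]
    out
]

probe unique-chars "hello world"        ; == "helo wrd"
```

The string is walked once and the first occurrence of each character is kept in its original position, so no sorting is involved. Unlike the UNIQUE result Ed posted, where "n" is dropped as a duplicate of "N", this sketch treats upper and lower case as different characters.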

[17/21] from: reffy:ulrich at: 25-Jul-2002 0:21

unique v[sort v] -- heterogenous list chars Is split str unique chars[sort chars]

[18/21] from: ingo:2b1 at: 24-Jul-2002 18:42

Anton Rolls wrote:

> I think it should be faster this way, too, > although it may not matter.

Just for those interested ...

>> a: [ a d e be e sn js am xmed dms d d a s e d s a s e s sa s d f de e s s dd fa s d d x cfas sd sd fa sd fas df asdf sd ]

== [a d e be e sn js am xmed dms d d a s e d s a s e s sa s d f de e s s dd fa s d d x cfas sd sd fa sd fas df asdf sd]

>> profiler/test [ unique sort copy a ] 10000

== [0:00:04.271148]

>> profiler/test [ unique sort copy a ] 10000

== [0:00:04.794179]

>> profiler/test [ unique sort copy a ] 10000

== [0:00:04.29976]

>> profiler/test [ sort unique copy a ] 10000

== [0:00:03.604301]

>> profiler/test [ sort unique copy a ] 10000

== [0:00:03.57617]

>> profiler/test [ sort unique copy a ] 10000

== [0:00:03.555364]

>> profiler/test [ sort unique copy a ] 10000

== [0:00:03.681827]

Of course this isn't really extensive testing, but it hints that you may be right.

Kind regards,

Ingo

[19/21] from: greggirwin:mindspring at: 25-Jul-2002 9:06

Try this out. The datatype affects the performance greatly.

    b: make block! 10000
    h: make hash! 10000
    l: make list! 10000

    repeat i 10000 [append b random i]
    repeat i 10000 [append h random i]
    repeat i 10000 [append l random i]

    t: now/time/precise
    loop 20 [unique b] loop 20 [unique b] loop 20 [unique b]
    print now/time/precise - t

    t: now/time/precise
    loop 20 [unique h] loop 20 [unique h] loop 20 [unique h]
    print now/time/precise - t

    t: now/time/precise
    loop 20 [unique l] loop 20 [unique l] loop 20 [unique l]
    print now/time/precise - t

--Gregg

[20/21] from: edanaii:cox at: 25-Jul-2002 17:32

Carl Read wrote:

>On 25-Jul-02, Ed Dana wrote: >>Maybe the algorithm is something similar to what's used for some

<<quoted lines omitted: 4>>

>you parse once through the string. Blocks of mixed datatypes >wouldn't be quite so easy though...

Of course, but somehow I think I'll need to sort more than 256 things, some day. :)

--
Sincerely,          | Life is pain, Highness. Anyone who says
Ed Dana             | differently is selling something.
Software Developer  |    -- The Princess Bride.
1Ghz Athlon Amiga   |

[21/21] from: joel:neely:fedex at: 29-Jul-2002 14:36

Hi, all, Just back from a week camping in the Great Smoky Mountains National Park, and my inbox prompts me to do a quick benchmark... Ed Dana wrote:

> Carl Read wrote: > >On 25-Jul-02, Ed Dana wrote:

<<quoted lines omitted: 10>>

> Of course, but somehow I think I'll need to sort more than 256 things, > some day. :)

For the test below, both BSU and BUS are initialized as copies of a series that contains 500 copies of the integers from 1 to 500 (making 250000 integer values in all). (Console transcript wrapped for email transmission...)

>> t0: now/time/precise rbus: unique sort bus
t1: now/time/precise rbsu: sort unique bsu
t2: now/time/precise print [to-decimal t1 - t0 to-decimal t2 - t1]
4.347 0.871

I got consistent results over multiple trials, so UNIQUEing first and then SORTing appears faster for blocks of integers than the other order.

Independent tests showed that applying only UNIQUE to such a block is faster than applying SORT, so I conclude that UNIQUE is not simply sort-and-sweep.

-jn-
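The setup for a test like this is not shown in the message; one way it might be built is sketched below. The word DATA and the order in which the values are appended are assumptions, since the post only says that the series holds 500 copies of the integers from 1 to 500.

```rebol
; Hypothetical setup for the benchmark above.
; The element order (1..500 repeated 500 times) is an assumption.
data: make block! 250000
loop 500 [
    repeat i 500 [append data i]
]

; SORT modifies its argument in place, so each test gets its own copy.
bsu: copy data      ; used for: sort unique bsu
bus: copy data      ; used for: unique sort bus

print length? data
```

Separate copies matter here because sorting BUS in place would otherwise hand the second test a series that is already sorted.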

Notes

Quoted lines have been omitted from some messages.