Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

[REBOL] Re: Sort by first part of line

From: joel:neely:fedex at: 8-Sep-2002 7:29

Hi, Sunanda, Very nice summary! May I pick one small nit and reminisce for a moment? [SunandaDH--aol--com] wrote:
> ... thread has developed into two parallel tracks, with > a third on the way. > > TRACK 1 > Has several of us looking at increasingly faster ways to > pre-process the lines to get stable sort keys. >
I should have been more clear about the goal of my submission. I meant to imply (but I didn't state well) that we need to be careful about the force of habit, as it is VERY powerful. We all jumped to the conclusion, based on Louis's first question, that we needed to sort the data. We've even found a way to call the solution I proposed a kind of sort. Actually, I began thinking about the fact that most of the order that Louis requested was already present (i.e., the lines with identical prefixes were already in the desired order, relative to each other), which led me simply to group the data by the one field that had actionable differences, the prefix. Ergo, a solution that actually doesn't involve sorting at all. Now for the reminiscence. I didn't remember it (consiously... ;-) until writing this particular email, but there's a really lovely book titled _Programming_Pearls_ by Jon Bentley of AT&T Bell Labs (the copy in front of me is the first edition by Addison Wesley, (c) 1986, but I believe there's a newer version out). The book is a collection of eponymous columns Bentley wrote for CACM. The very first column, titled "Cracking the Oyster", opens as follows: The programmer's question was simple: "How do I sort on disk?" Before I tell you how I made my first mistake, let me give you a chance to do better than I did. What would you have said? DON'T SCROLL DOWN UNTIL YOU HAVE THOUGHT ABOUT YOUR ANSWER TO BENTLEY'S QUESTION , PLEASE ! He then continues: My mistake was to answer his question. I gave him a thumbnail sketch on how to sort on disk. My suggestion that he dig into Knuth's classic _Sorting_and_Searching_ met with less than enthusiasm -- he was more concerned about solving the problem than with furthering his education. I then told him about the disk sorting program in Chapter 4 of Kernighan and Plaugers's _Software_Tools_. Their program consists of about two hundred lines of Ratfor code in twelve procedures; translating that into several hundred lines of FORTRAN and testing the code would have taken about a week. (Nothing like dating yourself, right? ;-) Bentley goes on to describe how he (fortunately) kept asking questions until the *REAL* nature of the problem became clear, as summarized below: - The system was used for re-organizing political districts; the data were the precinct numbers that made up a district. - Each value was an integer (a precinct number). - It was illegal for a value to appear more than once. - There would be up to 27,000 values (in the range 1..27,000). - The system only had 1K-2K sixteen-bit words of free memory at that point. (I told you I was dating myself! ;-) Given this restatement of the problem, *now* how would you advise the programmer? (other than offering him a ride in your time machine, of course! ;-) -jn- -- ; Joel Neely joeldotneelyatfedexdotcom REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] { | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]