[REBOL] Re: Sort by first part of line
From: joel:neely:fedex at: 8-Sep-2002 7:29
Hi, Sunanda,
Very nice summary!
May I pick one small nit and reminisce for a moment?
[SunandaDH--aol--com] wrote:
> ... thread has developed into two parallel tracks, with
> a third on the way.
>
> TRACK 1
> Has several of us looking at increasingly faster ways to
> pre-process the lines to get stable sort keys.
>
I should have been more clear about the goal of my submission.
I meant to imply (but I didn't state well) that we need to be
careful about the force of habit, as it is VERY powerful. We
all jumped to the conclusion, based on Louis's first question,
that we needed to sort the data.
We've even found a way to call the solution I proposed a kind
of sort.
Actually, I began thinking about the fact that most of the
order that Louis requested was already present (i.e., the lines
with identical prefixes were already in the desired order,
relative to each other), which led me simply to group the data
by the one field that had actionable differences, the prefix.
Ergo, a solution that actually doesn't involve sorting at all.
Now for the reminiscence.
I didn't remember it (consiously... ;-) until writing this
particular email, but there's a really lovely book titled
_Programming_Pearls_ by Jon Bentley of AT&T Bell Labs (the
copy in front of me is the first edition by Addison Wesley,
(c) 1986, but I believe there's a newer version out). The
book is a collection of eponymous columns Bentley wrote for
CACM.
The very first column, titled "Cracking the Oyster", opens
as follows:
The programmer's question was simple: "How do I sort
on disk?" Before I tell you how I made my first
mistake, let me give you a chance to do better than
I did. What would you have said?
DON'T SCROLL DOWN
UNTIL
YOU
HAVE
THOUGHT
ABOUT
YOUR
ANSWER
TO
BENTLEY'S
QUESTION
,
PLEASE
!
He then continues:
My mistake was to answer his question. I gave him a
thumbnail sketch on how to sort on disk. My suggestion
that he dig into Knuth's classic _Sorting_and_Searching_
met with less than enthusiasm -- he was more concerned
about solving the problem than with furthering his
education. I then told him about the disk sorting
program in Chapter 4 of Kernighan and Plaugers's
_Software_Tools_. Their program consists of about two
hundred lines of Ratfor code in twelve procedures;
translating that into several hundred lines of FORTRAN
and testing the code would have taken about a week.
(Nothing like dating yourself, right? ;-)
Bentley goes on to describe how he (fortunately) kept asking
questions until the *REAL* nature of the problem became clear,
as summarized below:
- The system was used for re-organizing political districts;
the data were the precinct numbers that made up a district.
- Each value was an integer (a precinct number).
- It was illegal for a value to appear more than once.
- There would be up to 27,000 values (in the range 1..27,000).
- The system only had 1K-2K sixteen-bit words of free memory
at that point. (I told you I was dating myself! ;-)
Given this restatement of the problem, *now* how would you
advise the programmer? (other than offering him a ride in your
time machine, of course! ;-)
-jn-
--
; Joel Neely joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]