Interleaving of strings question
[1/7] from: grantwparks:yaho:o at: 25-Sep-2000 4:04
I was thinking about the earlier postings about
sorting coordinates by interleaving digits of the 2
axis values. The interleaving, it would seem, is to
make the longitude as significant as the latitude, in
terms of distance from other points. Ignoring the
need to have the lat/long pairs stored that way for
later processing, and ignoring stated expected values
for the coordinates I thought:
1. Wouldn't it be more precise to interleave each
digit?
2. In terms of a regular x,y system, even that would
be incorrect sometimes. E.G.: [ 0,0 1,5 2,2 5,1 ] (a
sorted set of simplified coordinates) isn't correct.
2,2 is closer to 0,0; 1,5 and 5,1 are the same
distance from 0,0.
3. Could pairs or tuples be used? Answer: pairs
didn't sort that way and I have no reason to assume
sort would think of tuples as coordinates.
4. Wouldn't it be the least convoluted to have a
/compare function for sort that understood that all
components of the coordinate were equally significant?
That way there's no manipulating the coordinates, and
it should be the most precise.
Would a Pythagorean type thing work, because it looks
like simply summing the absolute differences between
each coordinate component isn't right:
1,1 to 1,5 = 4 ((1-1)+(5-1))
1,1 to 2,4 = 4 ((2-1)+(4-1))
yet 2,4 is closer to 1,1 than is 1,5. Until the real
distance is needed, I don't think you'd have to take
the square root of the sum (and how do you do that for
3-dimensions, does it become cubed and cube root -
aha)
(ABS(x1-x2) to the nth) + (ABS(y1-y2) to the nth) ...
+ (ABS(n1-n2) to the nth)
Comments? Is this right?
[2/7] from: grantwparks:y:ahoo at: 25-Sep-2000 5:33
Since we want them sorted, we're talking about their
relative distance from an origin ( 0,0,... ). So it
became (x1 to the nth + y1 to the nth) < (x2 to the
nth + y2 to the nth)...
If one of you great REBOLs would show how to do this
using all the niceties of series! traversal I would
appreciate it; I know my version is a pretty sad
procedural version:
coord-sort: func [ a b /local sum1 sum2 ] [
sum1: 0
sum2: 0
dimension: length? a
while [ not tail? a ] [
sum1: add sum1 power first a dimension
a: next a
]
while [ not tail? b ] [
sum2: add sum2 power first b dimension
b: next b
]
sum1 < sum2
]
--- [grantwparks--yahoo--com] wrote:
[3/7] from: joel:neely:fedex at: 25-Sep-2000 8:23
Hi, Grant!
1) A couple of minor tweaks to coord-sort yields (*WARNING*
*UNTESTED CODE FOLLOWS* *TEST BEFORE USING*)
coord-sort: func [a b /local val dim sum] [
dim: length? a
sum: 0
foreach val a [sum: sum + power val dim]
foreach val b [sum: sum - power val dim]
sum > 0
]
2) Your generalization over dimensionality appeals to me as a
mathematician, but as a "software engineer" I am compelled
to point out that if you're dealing with a performance bottle-
neck, the following is faster:
coord-sort-2d: func [a b /local val dim sum] [
dim: length? a
sum: 0
foreach val a [sum: sum + (val * val)]
foreach val b [sum: sum - (val * val)]
sum > 0
]
3) Putting my mathematics hat back on, I am compelled to
point out that if these are really lat/lon coordinates,
then both the taxicab metric (absolute deltas) and squared
Euclidean metric (sum squared deltas) can give wrong answers
to the general case of "which points are nearest to a specific
target point".
Near the poles, lattitude deltas make MUCH more difference
than comparable longitude deltas. For small deltas, a GREAT
improvement in accuracy results from weighting longitude
delta by cos avg lat in the sum. Some additional fiddling
may be required due to the singularites at the poles, but
pole crossings involve large longitude deltas anyway...
4) Putting my s/w eng'ing hat back on, I should add that the
performance of the whole search issue (assuming you might
need to search for a "nearest point" as in the previous
comment) is dominated by the number of points in the entire
search space. If the space is large, with substantial
dispersion of points, consider using the following speedup trick:
The maximum error of the taxicab metric vs. the Euclidean
metric is the square root of two (e.g., the point at (1,1)
has a taxicab metric of 2, instead of the Euclidean metric
of 1.141... -- it's too large by a factor of sqrt 2. Ergo...
Make a prepass across the search space using the taxicab
metric, retaining only those points whose distance from the
target is at most 1.414... times the least distance-
from-target found thusfar. Then make a second pass across
only the set collected in the prepass, using the correct-
but-more-expensive real metric.
Hope some of this is helpful!
-jn-
[grantwparks--yahoo--com] wrote:
[4/7] from: grantwparks::yahoo::com at: 25-Sep-2000 6:42
Yow! I guess it's a lot safer to play in the
theoretical space ;) I was wondering tho' if you
imagine a 3-dimensional space, does changing the power
to 3 from 2 work for Pythagoras?
I recently talked to some GIS guys who pointed out the
inherent errors in GPS because the tectonic plates
move up to 51 cm / year, or something. They got a big
kick out of that, but somehow I missed the punchline.
I did think, though, that this was a definitively more
accurate method than the original discussion (don't
know if you saw it) that was going to interleave the
long digits into the latitude...
--- [joel--neely--fedex--com] wrote:
[5/7] from: joel:neely:fedex at: 25-Sep-2000 10:18
[grantwparks--yahoo--com] wrote:
> Yow! I guess it's a lot safer to play in the
> theoretical space ;) I was wondering tho' if you
> imagine a 3-dimensional space, does changing the power
> to 3 from 2 work for Pythagoras?
>
Yes, if your 3-space is Euclidean.
> I recently talked to some GIS guys who pointed out the
> inherent errors in GPS because the tectonic plates
> move up to 51 cm / year, or something. They got a big
> kick out of that, but somehow I missed the punchline.
>
ROTFL! You should respond, "Yeah, but they more than
compensated for that when they turned of SA!"
> I did think, though, that this was a definitively more
> accurate method than the original discussion (don't
> know if you saw it) that was going to interleave the
> long digits into the latitude...
>
Well, I saw it, but didn't understand what problem was
being solved (and still don't).
Is the task is to:
1) take a given target location and return the point
in the database that is geographically nearest;
2) take a given target location and return a list of
all points from the database that are "near enough";
3) sort a collection of points and present a human-
readable display based on nearest-to-farthest; or
4) something else?
In addition, I'm unclear as to whether:
1) the target is an arbitrary point or is constrained to
be one of the points in the database;
2) how many points are in the database;
3) what resolution is required; and
4) the dispersion of the points in the database:
4a) how wide an area is covered, and
4b) how "clumpy" the distribution is.
All of these would influence my choice of strategy. For
example, if the task is (1) or (2), and the points are
fairly uniformly distributed over a large range of lat/lon
values, I'd consider a two-level strategy of partitioning
the data into coarse-grained lat/lon cells (one-time,
front-end work), then searching only the cell that should
contain the target point (or nearby cells as needed for
near enough
or "empty cell" constraints).
For example -- assuming two-degree cells for the sake of
illustration -- to locate the point 37.4968N x 89.9935W,
look first in the cell containing 36-37N x 88-89W. As
needed, expand the search into the neighboring cells:
...... ...... ...... ...... ......
...... ...... ...... ...... ......
...... 38-39N 36-37N 34-35N ......
...... 90-91W 90-91W 90-91W ......
...... 38-39N 36-37N 34-35N ......
...... 88-89W 88-89W 88-89W ......
...... 38-39N 36-37N 34-35N ......
...... 86-87W 86-87W 86-87W ......
...... ...... ...... ...... ......
...... ...... ...... ...... ......
This strategy lets you "zoom in" quickly to the area
of relevance, then do the more complex calculations
only for a small subset of the space.
Another strategy would be to pre-compute the true
distance-from-origin for each point (storing that value
with each point), then use that as a figure of merit
for limiting the search space. Points near the target
will obviously have a similar distance-from-origin to
the target's value. Of course, the converse is not
true (90Nx0W is the same distance from 0Nx0W as 0Wx90N),
but the point of this trick is just to reduce the nunber
of points that must be individually evaluated.
Working with real data on real computers with real
memory and performance limits can DEFINITELY get
hairy!
Have fun!
-jn-
[6/7] from: lmecir:geocities at: 26-Sep-2000 0:21
Hi,
although your question has a little in common with Rebol, my answers are as
follows:
1. It would be more precise to interleave each bit - ie. each binary digit -
as the smallest unit of information
2. in that case the bit - sort yields:
[0,0 2,2 1,5 5,1]
3. as you pointed out, the result of the sort doesn't depend on the
representation, but on the comparison function, any suitable representation
can do
4. your suggestion doesn't work, because you need close things in the terms
of distance to remain close in the terms of sort. Your suggestion works only
in the neighborhood of the 0,0 point
Best regards
Ladislav
[7/7] from: grantwparks:yaho:o at: 26-Sep-2000 13:33
I think it is an interesting programming topic (and I
did not mean my earlier comments to be absolute - it
was directed at the seemingly unresolvable debates
about whether or not do such and such). Most sorting
is done, it seems, on some sort of scalar level, and I
was interested in the idea of using each sort item as
a description of something else, in this case a point
on a plane. (I'm not sure why your #4 comment means
my function is wrong. Unless you mean that the
resulting sorted list is in order according to "its
distance from a point 0,0 that defines the most
'lower-left' point of an arbitrary plane". I.E. it
only works in quadrant II of that old x-y graph.
Please explain, if not. Maybe what's more useful
would be the function sorting in terms of distance
from any point, where I think the deltas come back
in.) To me it could relate to some sort of
abstraction on sort (using objects, blocks whatever)
that might be useful. I guess I knew I was opening
myself up to a "look who's talking" attitude, but I
was thinking about a real-world problem in terms of
some abstraction and trying to really understand what
the original poster's interleave method was getting
at. Often that leads to a cool generalization that
can be used elsewhere. That's why I considered it to
be a programming topic. You could argue it belonged
in a different list, but this is where the original
post was, and I'm trying to learn REBOL, so it's where
I posted. Is that the same as a thread discussing why
MS is successful, or why they suck? Again, on some
level you could argue it is. Up to you, I guess.
--- [lmecir--geocities--com] wrote: