Interleaving of strings question

[1/7] from: grantwparks:yaho:o at: 25-Sep-2000 4:04

I was thinking about the earlier postings about sorting coordinates by interleaving digits of the 2 axis values. The interleaving, it would seem, is to make the longitude as significant as the latitude, in terms of distance from other points. Ignoring the need to have the lat/long pairs stored that way for later processing, and ignoring stated expected values for the coordinates I thought: 1. Wouldn't it be more precise to interleave each digit? 2. In terms of a regular x,y system, even that would be incorrect sometimes. E.G.: [ 0,0 1,5 2,2 5,1 ] (a sorted set of simplified coordinates) isn't correct. 2,2 is closer to 0,0; 1,5 and 5,1 are the same distance from 0,0. 3. Could pairs or tuples be used? Answer: pairs didn't sort that way and I have no reason to assume sort would think of tuples as coordinates. 4. Wouldn't it be the least convoluted to have a /compare function for sort that understood that all components of the coordinate were equally significant? That way there's no manipulating the coordinates, and it should be the most precise. Would a Pythagorean type thing work, because it looks like simply summing the absolute differences between each coordinate component isn't right: 1,1 to 1,5 = 4 ((1-1)+(5-1)) 1,1 to 2,4 = 4 ((2-1)+(4-1)) yet 2,4 is closer to 1,1 than is 1,5. Until the real distance is needed, I don't think you'd have to take the square root of the sum (and how do you do that for 3-dimensions, does it become cubed and cube root - aha) (ABS(x1-x2) to the nth) + (ABS(y1-y2) to the nth) ... + (ABS(n1-n2) to the nth) Comments? Is this right?

[2/7] from: grantwparks:y:ahoo at: 25-Sep-2000 5:33

Since we want them sorted, we're talking about their relative distance from an origin ( 0,0,... ). So it became (x1 to the nth + y1 to the nth) < (x2 to the nth + y2 to the nth)... If one of you great REBOLs would show how to do this using all the niceties of series! traversal I would appreciate it; I know my version is a pretty sad procedural version: coord-sort: func [ a b /local sum1 sum2 ] [ sum1: 0 sum2: 0 dimension: length? a while [ not tail? a ] [ sum1: add sum1 power first a dimension a: next a ] while [ not tail? b ] [ sum2: add sum2 power first b dimension b: next b ] sum1 < sum2 ] --- [grantwparks--yahoo--com] wrote:

[3/7] from: joel:neely:fedex at: 25-Sep-2000 8:23

Hi, Grant! 1) A couple of minor tweaks to coord-sort yields (*WARNING* *UNTESTED CODE FOLLOWS* *TEST BEFORE USING*) coord-sort: func [a b /local val dim sum] [ dim: length? a sum: 0 foreach val a [sum: sum + power val dim] foreach val b [sum: sum - power val dim] sum > 0 ] 2) Your generalization over dimensionality appeals to me as a mathematician, but as a "software engineer" I am compelled to point out that if you're dealing with a performance bottle- neck, the following is faster: coord-sort-2d: func [a b /local val dim sum] [ dim: length? a sum: 0 foreach val a [sum: sum + (val * val)] foreach val b [sum: sum - (val * val)] sum > 0 ] 3) Putting my mathematics hat back on, I am compelled to point out that if these are really lat/lon coordinates, then both the taxicab metric (absolute deltas) and squared Euclidean metric (sum squared deltas) can give wrong answers to the general case of "which points are nearest to a specific target point". Near the poles, lattitude deltas make MUCH more difference than comparable longitude deltas. For small deltas, a GREAT improvement in accuracy results from weighting longitude delta by cos avg lat in the sum. Some additional fiddling may be required due to the singularites at the poles, but pole crossings involve large longitude deltas anyway... 4) Putting my s/w eng'ing hat back on, I should add that the performance of the whole search issue (assuming you might need to search for a "nearest point" as in the previous comment) is dominated by the number of points in the entire search space. If the space is large, with substantial dispersion of points, consider using the following speedup trick: The maximum error of the taxicab metric vs. the Euclidean metric is the square root of two (e.g., the point at (1,1) has a taxicab metric of 2, instead of the Euclidean metric of 1.141... -- it's too large by a factor of sqrt 2. Ergo... Make a prepass across the search space using the taxicab metric, retaining only those points whose distance from the target is at most 1.414... times the least distance- from-target found thusfar. Then make a second pass across only the set collected in the prepass, using the correct- but-more-expensive real metric. Hope some of this is helpful! -jn- [grantwparks--yahoo--com] wrote:

[4/7] from: grantwparks::yahoo::com at: 25-Sep-2000 6:42

Yow! I guess it's a lot safer to play in the theoretical space ;) I was wondering tho' if you imagine a 3-dimensional space, does changing the power to 3 from 2 work for Pythagoras? I recently talked to some GIS guys who pointed out the inherent errors in GPS because the tectonic plates move up to 51 cm / year, or something. They got a big kick out of that, but somehow I missed the punchline. I did think, though, that this was a definitively more accurate method than the original discussion (don't know if you saw it) that was going to interleave the long digits into the latitude... --- [joel--neely--fedex--com] wrote:

[5/7] from: joel:neely:fedex at: 25-Sep-2000 10:18

[grantwparks--yahoo--com] wrote:

> Yow! I guess it's a lot safer to play in the > theoretical space ;) I was wondering tho' if you > imagine a 3-dimensional space, does changing the power > to 3 from 2 work for Pythagoras? >

Yes, if your 3-space is Euclidean.

> I recently talked to some GIS guys who pointed out the > inherent errors in GPS because the tectonic plates > move up to 51 cm / year, or something. They got a big > kick out of that, but somehow I missed the punchline. >

ROTFL! You should respond, "Yeah, but they more than compensated for that when they turned of SA!"

> I did think, though, that this was a definitively more > accurate method than the original discussion (don't > know if you saw it) that was going to interleave the > long digits into the latitude... >

Well, I saw it, but didn't understand what problem was being solved (and still don't). Is the task is to: 1) take a given target location and return the point in the database that is geographically nearest; 2) take a given target location and return a list of all points from the database that are "near enough"; 3) sort a collection of points and present a human- readable display based on nearest-to-farthest; or 4) something else? In addition, I'm unclear as to whether: 1) the target is an arbitrary point or is constrained to be one of the points in the database; 2) how many points are in the database; 3) what resolution is required; and 4) the dispersion of the points in the database: 4a) how wide an area is covered, and 4b) how "clumpy" the distribution is. All of these would influence my choice of strategy. For example, if the task is (1) or (2), and the points are fairly uniformly distributed over a large range of lat/lon values, I'd consider a two-level strategy of partitioning the data into coarse-grained lat/lon cells (one-time, front-end work), then searching only the cell that should contain the target point (or nearby cells as needed for near enough or "empty cell" constraints). For example -- assuming two-degree cells for the sake of illustration -- to locate the point 37.4968N x 89.9935W, look first in the cell containing 36-37N x 88-89W. As needed, expand the search into the neighboring cells: ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... 38-39N 36-37N 34-35N ...... ...... 90-91W 90-91W 90-91W ...... ...... 38-39N 36-37N 34-35N ...... ...... 88-89W 88-89W 88-89W ...... ...... 38-39N 36-37N 34-35N ...... ...... 86-87W 86-87W 86-87W ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... This strategy lets you "zoom in" quickly to the area of relevance, then do the more complex calculations only for a small subset of the space. Another strategy would be to pre-compute the true distance-from-origin for each point (storing that value with each point), then use that as a figure of merit for limiting the search space. Points near the target will obviously have a similar distance-from-origin to the target's value. Of course, the converse is not true (90Nx0W is the same distance from 0Nx0W as 0Wx90N), but the point of this trick is just to reduce the nunber of points that must be individually evaluated. Working with real data on real computers with real memory and performance limits can DEFINITELY get hairy! Have fun! -jn-

[6/7] from: lmecir:geocities at: 26-Sep-2000 0:21

Hi, although your question has a little in common with Rebol, my answers are as follows: 1. It would be more precise to interleave each bit - ie. each binary digit - as the smallest unit of information 2. in that case the bit - sort yields: [0,0 2,2 1,5 5,1] 3. as you pointed out, the result of the sort doesn't depend on the representation, but on the comparison function, any suitable representation can do 4. your suggestion doesn't work, because you need close things in the terms of distance to remain close in the terms of sort. Your suggestion works only in the neighborhood of the 0,0 point Best regards Ladislav

[7/7] from: grantwparks::yahoo at: 26-Sep-2000 13:33

I think it is an interesting programming topic (and I did not mean my earlier comments to be absolute - it was directed at the seemingly unresolvable debates about whether or not do such and such). Most sorting is done, it seems, on some sort of scalar level, and I was interested in the idea of using each sort item as a description of something else, in this case a point on a plane. (I'm not sure why your #4 comment means my function is wrong. Unless you mean that the resulting sorted list is in order according to "its distance from a point 0,0 that defines the most 'lower-left' point of an arbitrary plane". I.E. it only works in quadrant II of that old x-y graph. Please explain, if not. Maybe what's more useful would be the function sorting in terms of distance from any point, where I think the deltas come back in.) To me it could relate to some sort of abstraction on sort (using objects, blocks whatever) that might be useful. I guess I knew I was opening myself up to a "look who's talking" attitude, but I was thinking about a real-world problem in terms of some abstraction and trying to really understand what the original poster's interleave method was getting at. Often that leads to a cool generalization that can be used elsewhere. That's why I considered it to be a programming topic. You could argue it belonged in a different list, but this is where the original post was, and I'm trying to learn REBOL, so it's where I posted. Is that the same as a thread discussing why MS is successful, or why they suck? Again, on some level you could argue it is. Up to you, I guess. --- [lmecir--geocities--com] wrote: