Mailing List Archive: 49091 messages

[REBOL] Re: Algorithm challenge: selecting a range from multiple blocks

From: dhsunanda:gm:ail at: 22-Sep-2007 3:51

Thanks for the responses so far. I haven't had time to do any detailed timing tests on larger datasets, but what I have checked has worked well. Thanks to all!

*** Tom:
> is it ok for the results to have a mix of new and existing objects
Yes -- the block is ephemeral, so get-subset is just one stage of winnowing it down to a final data structure.
> is 'data only appended to
Yes -- to keep the objects in the same order. There may be ways other than append to achieve that.
> can 'data objects with empty items: [] be safely deleted from 'data?
Yes.
> what is the ratio between updating and querying 'data
> what are typical ranges?
> how often do ranges fall within one items block?
> how big is length? data
You are really asking what the live application is. Good question.

It's REBOL.org's search for Altme world archives. If you look here while not logged on:

   http://www.rebol.org/cgi-bin/cgiwrap/rebol/aga-index.r

you'll see only one world archive right now. But we may add others (eg the original REBOL world, then its successor: REBOL2).

If you are logged on, then you will see multiple world archives: the RUA/user.r world is visible to any logged-on member. Some other world archives exist too (mainly for testing). You'll only see those if your REBOL.org member name is on the list for those world archives.

The CGI search (not yet live) works by searching *all* world archives visible to you, and then windowing the results -- so you may see 100 results to a webpage. Those results may be partially from (say) the R3WP archive and partially from the RUA archive.

What's a typical search? It's hard to say. We want it to work well and fast for edge cases, like a search for the word "the" or "a". Those cases will produce objects with many tens of thousands of entries. If the user has their paging window set to (say) 50 results, typically get-subset will return just one object with 50 entries.

A search for a rare word ("bucket" is in my test data set) produces relatively few hits, so get-subset typically ends up returning all the objects with all the items -- ie the user will see only one page of results.

Though the code to add the pagination and emit HTML is not in place, you can see a sneak preview of the code to date here:

   http://www.rebol.org/cgi-bin/cgiwrap/rebol/aga-search.r?q=bucket

Try it while logged on, and vary the word being searched, and you'll get a feel for the sort of data get-subset will be working on.

To formally map to the algorithm challenge:
* there is one object per visible world archive
* the raw-hits block within each object contains zero or more integers; each maps to an Altme posting that contains the searched word(s).
* get-subset has not (yet!) been applied to the data you see on the webpage

*** More challenge entries welcome!

Sunanda
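
For readers following the thread, here is a minimal sketch of the windowing described above, written in Python rather than REBOL. It is only an illustration under stated assumptions, not the REBOL.org code: the dict keys `name` and `raw_hits`, the function signature `get_subset(data, start, length)`, the 0-based positions, and the sample hit numbers and the "TEST" archive are all hypothetical. It does reflect the answers above in that objects with no hits in the window are dropped and the result is a list of new objects.

```python
def get_subset(data, start, length):
    """Return the archives whose hits fall in the window [start, start + length).

    Positions are 0-based and count hits across all archives, in order.
    Archives with no hits inside the window are left out of the result.
    """
    result = []
    offset = 0            # global position of the first hit of the current archive
    end = start + length  # global position just past the window
    for archive in data:
        hits = archive["raw_hits"]
        lo = max(start - offset, 0)        # window start, local to this archive
        hi = min(end - offset, len(hits))  # window end, local to this archive
        if lo < hi:
            # Build a new object so the original data is left untouched.
            result.append({"name": archive["name"], "raw_hits": hits[lo:hi]})
        offset += len(hits)
        if offset >= end:
            break  # the window is full; later archives cannot contribute
    return result


# Hypothetical sample data: one object per visible world archive.
data = [
    {"name": "R3WP", "raw_hits": [11, 12, 13]},
    {"name": "RUA", "raw_hits": []},
    {"name": "TEST", "raw_hits": [21, 22, 23, 24]},
]

# A page of 3 results starting at global position 2 spans two archives.
page = get_subset(data, 2, 3)
```

With a common word, the window usually falls inside a single large archive, so the loop stops early; with a rare word, the window covers everything and each non-empty archive comes back whole.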