Mailing List Archive: 49091 messages

Challenging script idea

 [1/9] from: grantwparks::yahoo::com at: 29-Aug-2000 12:43


Don't laugh, but... I was noticing in the script library (web section) that mailpage.r and websend.r are identical. So here's the challenge: as powerful as parse (and other language processing features) is, can someone come up with a script that would analyze the tokens in a pair of scripts and determine when they are essentially the same?

 [2/9] from: sterling:rebol at: 29-Aug-2000 14:29


Well, it sounded fun so here's what I've got. The output running it on the two files you talked about is at the bottom. The diff shows a list of blocks with tokens and a number which is how many times that token was in the file. You may see the same token listed in the diff for each file if the number of appearances is different.

Well, enjoy!

Sterling

    REBOL [
        Title: "Simple token diff"
        Purpose: {
            I don't know, really. It just tries to figure out how many
            REBOL tokens are different between two files. Somebody thought it
            would be neat. ;) Maybe they'll make it complete and fix whatever
            lurking bugs there are in this code.}
        Author: "Sterling Newton"
    ]

    a: ask "File or URL #1? "
    b: ask "File or URL #2? "

    ; turn the typed string into a url! or file! value
    get-type: func [item [string!]] [
        switch/default true reduce [
            found? find item "://" [item: to-url item]
            found? find item "%" [item: to-file next item]
        ] [item: to-file item]  ; default: treat the input as a file name
        item
    ]

    a: get-type a
    b: get-type b

    ; the unique tokens and totals blocks
    foreach item [a-tokens b-tokens a-totals b-totals] [
        set item copy []
    ]

    file1: load/next a
    file2: load/next b

    ; walk the loaded values, recording each unique token and its count
    tokenize-block: func [
        blk [block!] tokens [block!] totals [block!] /local tmp idx
    ] [
        while [not empty? blk] [
            either block? blk/1 [
                tokenize-block load/next form blk/1 tokens totals
            ] [
                either tmp: find tokens blk/1 [
                    idx: index? tmp
                    totals/:idx/2: totals/:idx/2 + 1
                ] [
                    append tokens blk/1
                    repend/only totals [blk/1 1]
                ]
            ]
            blk: load/next blk/2
        ]
    ]

    tokenize-block load/next file1 a-tokens a-totals
    tokenize-block load/next file2 b-tokens b-totals

    print ["The two files differ by:" length? difference a-tokens b-tokens "tokens."]
    print ["----- Tokens in" a "not in" b "-----"]
    foreach item intersect diff: difference a-totals b-totals a-totals [
        probe item
    ]
    print ["----- Tokens in" b "not in" a "-----"]
    foreach item intersect diff b-totals [
        probe item
    ]
========== results from the two web page emailing scripts ==========
>> do %/home/moses/temp/diff.r
File or URL #1? http://www.rebol.com/library/html/mailpage.html
File or URL #2? http://www.rebol.com/library/html/websend.html
The two files differ by: 14 tokens.
----- Tokens in http://www.rebol.com/library/html/mailpage.html not in http://www.rebol.com/library/html/websend.html -----
[Email 2]
[a 2]
[Page 1]
[mailpage.r 1]
[10-Sep-1999 1]
[page. 1]
[(simple) 1]
[http://www.rebol.com/releases.html</font> 1]
----- Tokens in http://www.rebol.com/library/html/websend.html not in http://www.rebol.com/library/html/mailpage.html -----
[Page 2]
[Emailer 1]
[websend.r 1]
[20-May-1999 1]
[Fetch 1]
[a 1]
[and 1]
[it 1]
[as 1]
[email. 1]
[email 1]
[http://www.rebol.com</font> 1]
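The counting idea in this message can be sketched in Python. This is a minimal illustration, not Sterling's script: it assumes whitespace-separated words are a good enough stand-in for REBOL's loader, and the names token_counts and token_diff are made up for the example. As in the output above, a token shows up on both sides when its count differs between the two inputs.

```python
from collections import Counter

def token_counts(text):
    """Count whitespace-separated tokens (a stand-in for REBOL's loader)."""
    return Counter(text.split())

def token_diff(text_a, text_b):
    """Return (only_a, only_b): [token, count] pairs whose counts differ."""
    a, b = token_counts(text_a), token_counts(text_b)
    # A token is reported for a file when the other file lacks it
    # or contains it a different number of times.
    only_a = sorted([t, n] for t, n in a.items() if b.get(t) != n)
    only_b = sorted([t, n] for t, n in b.items() if a.get(t) != n)
    return only_a, only_b

only_a, only_b = token_diff("send page to a friend", "send a page as a email")
print(only_a)  # tokens (with counts) reported for the first text
print(only_b)  # tokens (with counts) reported for the second text
```

Counting with a hash-based Counter avoids the linear find over the token block that the original performs for every token, at the cost of ignoring REBOL datatypes entirely.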

 [3/9] from: grantwparks:yaho:o at: 30-Aug-2000 8:37


Could this script evolve in the following ways?

1. Ignore header info like author, date and comments
2. Ignore a difference in URLs, filenames, etc. when they are the target of a particular verb. I.E. if I read a URL, but the specific value differs, the 2 scripts are still doing the same thing. In fact the coolest thing I can think of in this case would be reporting that "these 2 scripts are functionally the same, but the first operates on URL: http://www.url1.com and the second on http://www.url2.com..."

Don't mean to put the work on others, but I have barely had the time to even dabble with Rebol, yet (hey, 'dabble with Rebol' has a cool ring to it!). I do, however, have a knack for grammars and other meta/abstraction concepts. And since one can look at the template of any defined function at run time, it seems possible to determine which tokens have significance in a context, and which have none or less. Sort of like being able to determine that "the names have been changed to protect the innocent", but the story's the same.

What dost thou think?

--- [sterling--rebol--com] wrote:
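The second suggestion, ignoring the specific URL or file a script operates on, could look roughly like this in Python. It is only a sketch under loud assumptions: tokens are whitespace-separated, anything shaped like scheme://... or %name counts as a target, header stripping is not attempted, and normalize and compare are hypothetical names. It compares token shape only and says nothing about what the scripts do when run.

```python
import re

# Assumed notion of a "target": a URL (scheme://...) or a REBOL-style %file.
URL_OR_FILE = re.compile(r"^(?:[a-z][a-z0-9+.-]*://\S+|%\S+)$", re.IGNORECASE)

def normalize(tokens):
    """Replace URL and %file tokens with a placeholder; collect the originals."""
    shape, targets = [], []
    for tok in tokens:
        if URL_OR_FILE.match(tok):
            shape.append("<target>")
            targets.append(tok)
        else:
            shape.append(tok)
    return shape, targets

def compare(script_a, script_b):
    """Classify two scripts by token shape, reporting targets that differ."""
    shape_a, targets_a = normalize(script_a.split())
    shape_b, targets_b = normalize(script_b.split())
    if shape_a != shape_b:
        return "different"
    if targets_a == targets_b:
        return "identical"
    pairs = [(x, y) for x, y in zip(targets_a, targets_b) if x != y]
    return "same shape, different targets: " + ", ".join(f"{x} vs {y}" for x, y in pairs)

print(compare("send luke@example.com read http://www.url1.com",
              "send luke@example.com read http://www.url2.com"))
```

Deciding which tokens are "targets of a verb" properly would need the function templates mentioned above; the regular expression here is just the simplest placeholder for that step.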

 [4/9] from: g:santilli:tiscalinet:it at: 30-Aug-2000 21:50


Hello [grantwparks--yahoo--com]!

On 30-Ago-00, you wrote:

 g> And since one can look at
 g> the template of any defined function at run time, it
 g> seems possible to determine which tokens have
 g> significance in a context, and which have none or
 g> less. Sort of like being able to determine that "the
 g> names have been changed to protect the innocent", but
 g> the story's the same. What dost thou think?

If you want to determine if two scripts are doing the same thing, you'd at least have to simulate their execution. That is, you can't determine that by just statically analyzing REBOL code. If you really find a way to do so, then you've found a way to compile REBOL code.

Regards,
    Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/

 [5/9] from: joel:neely:fedex at: 30-Aug-2000 15:26


Actually, I think one can make a stronger statement. I suspect (without spending many brain-cycles on it) that the GENERAL question "Does function f1 do the same thing as function f2?" is formally unsolvable, in the same way that the halting problem has no general solution.

If you can really find a way to do so, you'll cause a general protection fault that will crash the universe. ;-)

-jn-

[g--santilli--tiscalinet--it] wrote:
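The suspicion above matches a standard reduction from the halting problem, which can be sketched in Python. The name make_pair is invented for this illustration, and program and argument stand for any callable and any input.

```python
def make_pair(program, argument):
    """Build two functions that do the same thing exactly when
    program(argument) halts."""
    def f1():
        program(argument)  # never returns if the program loops forever
        return 0

    def f2():
        return 0

    return f1, f2

# With a program that halts, both functions return 0.
f1, f2 = make_pair(len, "abc")
print(f1() == f2())
```

If some procedure could always decide whether f1 and f2 do the same thing, running it on this pair would decide whether program(argument) halts, which is known to be impossible in general. Restricted cases, such as textually identical source, remain decidable.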

 [6/9] from: sterling:rebol at: 30-Aug-2000 14:01


If it's a GPF that takes down the universe, you would be implying that we're all living in MS Universe 2000 or something like that. Maybe that's why we die, eh? It's a bug in the system.

Sterling

 [7/9] from: jeff:rebol at: 30-Aug-2000 15:25


Howdy, Sterling:
> If it's a GPF that takes down the universe, you would be
> implying that we're all living in MS Universe 2000 or
> something like that. Maybe that's why we die, eh? It's a
> bug in the system.

That theory sounds as good as any to me -- or maybe a bus error in the brain.

If Microsoft created the universe, then we all know that there could be no heaven. "Welcome to heaven, click the start button to begin." Sounds like hell to me.

-jeff

 [8/9] from: galtbarber:mailandnews at: 30-Aug-2000 19:52


Well, there is the trivial case of identical source, assuming that it is run in the same context. This is interesting! Could you please elaborate? Isn't checking to see if two different programs are doing "the same thing" rather difficult for all but the dullest cases, in practically any programming language? -Galt

 [9/9] from: grantwparks:ya:hoo at: 31-Aug-2000 8:36


I see the point about how difficult it would be with Rebol, but in other languages, no, it's not that difficult - and in fact using Rebol to analyze another programming language would probably make it a lot easier. I used to do this using Rexx on IBMs to analyze C and assembler programs - replacing sequences of code with more efficient sequences and finding sequences that should become functions and determining what tokens in that code should become parameters.

--- [galtbarber--MailAndNews--com] wrote:
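The kind of analysis described here (finding repeated sequences that could become functions, and the tokens that would become parameters) might be sketched in Python as follows. This is not the Rexx approach from the message, whose details are not given. It assumes pre-split tokens and fixed-length windows, and the function names and the toy assembler-like input are invented.

```python
from collections import Counter

def repeated_sequences(tokens, length):
    """Return token runs of the given length that occur more than once."""
    windows = (tuple(tokens[i:i + length])
               for i in range(len(tokens) - length + 1))
    counts = Counter(windows)
    return {seq: n for seq, n in counts.items() if n > 1}

def differing_positions(seq_a, seq_b):
    """Positions where two equal-length runs differ (parameter candidates)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]

tokens = "push ax call f pop ax push ax call g pop ax".split()
print(repeated_sequences(tokens, 2))

first = "load r1 add r1 1 store r1".split()
second = "load r2 add r2 1 store r2".split()
print(differing_positions(first, second))
```

Runs that match everywhere except at a few positions are the natural candidates for a shared function, with the differing positions as its parameters.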
