[AWK GREP] AWK and GREP funcs, arg order

[1/11] from: gregg::pointillistic::com at: 15-May-2007 12:23

Design question: AWK and GREP take the program/pattern as their first arg (not counting switches and options), and the file or files as the last arg. I remember Carl saying once that the arg order in REBOL is generally governed by the desire to put the most important arg first. e.g. series functions take the series as the first arg. If you were designing AWK and GREP functions for REBOL, what arg order would you use? -- Gregg

[2/11] from: anton::wilddsl::net::au at: 17-May-2007 13:02

Hi Gregg,

I think the best arg order is the one that puts the smallest and/or most constant-sized args first, and the larger or variable-length args later. This leads to more easily identifiable expressions, and more of the expression can be typed before pausing to consider the more variable parts.

When SWITCH was being added to REBOL, I argued against using Carl's simple arg order rule, which would have put the CASES arg before the VALUE arg. Thank heavens he listened to reason. I generally dislike looking for SELECT's second argument, and NEW-LINE bugs me more, where you have to wrap the thing you're modifying in a sandwich of two arguments. It means more unnecessary keyboard navigation and visual concentration to find the position for the second arg.

Anyway, rant off. Can you give some examples of typical arguments for your AWK and GREP? Then we can consider various orders.

Regards,
Anton.

[3/11] from: gregg::pointillistic::com at: 17-May-2007 17:51

Hi Anton,

AR> Can you give some examples of typical arguments for your
AR> AWK and GREP ? Then we can consider various orders.

    rgrep/deep "REBOL" %*.r
    rgrep "my-func" %my-lib.r

    rawk %*.r [(find __ "REBOL") [print [_filename _fnr]]]
    rawk/deep %*.r [(find __ "REBOL") [print [_filename _fnr]]]

    rawk/deep %*.r [
        (find __ "REBOL") [print [second split-path filename _fnr copy/part __ 20]]
    ]

    files: file-list/deep %*.r
    rawk files [
        (find __ "REBOL") [print [second split-path filename _fnr copy/part __ 20]]
    ]

    files: file-list/deep %*.r
    prog: [
        (find __ "REBOL") [
            print [second split-path filename _fnr copy/part __ 20]]
    ]
    rawk files prog

Both AWK and GREP take the program/pattern first. I did that for both of mine originally, then changed RAWK to take the sources first, which I think I prefer. They could both eventually load their program or pattern specs from disk, and files can be a single name, a block of files, a file spec (glob), or a variable.

Are we applying the program to the files, or the files to the program?

-- Gregg

[4/11] from: anton:wilddsl:au at: 18-May-2007 16:01

Hi Gregg, Ah, well now this reminds me of my FIND-FILE function, which seems to operate similarly. In find-file, I settled on having the filename-pattern first, and the optional contents-pattern second. (Designed for the console, I allow the second arg to be unset!, which is convenient.) I say you're applying the program to the files, because without any files, there's no program. So stage one is determining the files, stage two is examining the contents. Regards, Anton.

[5/11] from: Tom:Conlin:gm:ail at: 18-May-2007 0:23

It is all data. Why not let rawk figure out which is which?

[6/11] from: GedB::Rushcoding::co::uk at: 18-May-2007 10:12

On 5/18/07, Gregg Irwin <gregg-pointillistic.com> wrote:

> Both AWK and GREP take the program/pattern first. I did that for both
> of mine originally, then changed RAWK to take the sources first, which

<<quoted lines omitted: 3>>

> Are we applying the program to the files, or the files to the program?
> -- Gregg

I think the logic is that the program/pattern is most likely to be reused against different files.

For example, if you have a pattern for cleaning up whitespace then that pattern is unlikely to change. You will, however, apply that pattern to different files.

So the part that is most likely to change is kept at the end. This especially makes sense in a command line environment, since it involves retrieving the command's last use, deleting the last parameter and typing a new one.

If the file to be updated came first, you would have to retrieve the last use, navigate to the filename, delete it while being careful to leave the pattern intact, and then insert the new filename.

Ged.

[7/11] from: gregg::pointillistic::com at: 18-May-2007 9:47

Ged:
GB> I think the logic is that the program/pattern is mostly likely to be
GB> reused against different files.

Anton:
AR> I say you're applying the program to the files, because without
AR> any files, there's no program.

Tom:
T> it is all data. why not let rawk figure out which is which

I agree with all of you. :) I like the idea of auto-identification, but there may be ambiguity. The risk may be low enough that it's worth it though.

In order to do that, it needs to be a dialect, rather than a func with args for each one. I thought about that originally, but didn't come up with one I liked, so I fell back on making it a simple func for the time being (I was more concerned with the internals at that point).

What should the dialect look like? We have three categories of values:

* sources -- files to analyze, but I also support blocks and strings.

* program -- probably only one, with the pattern/action pairs as sub-items.

* options -- e.g. /deep

For command line use, the programs and file specs will probably be very simple, but you can write more complete programs and store them in files. How flexible should it be with regard to applying multiple source specs to multiple programs? For example, if you have overlapping functionality, do you need to run multiple programs, or reload a file list from a global spec for each "pass"?

-- Gregg
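As an illustration of the auto-identification idea and the ambiguity it carries, here is a minimal sketch that sorts the values of a spec block into the three categories by datatype alone. It is not Gregg's implementation: the name `classify-spec`, the returned object layout, and the rule that a block holding only file! values counts as a source list are all assumptions made for this example.

```rebol
REBOL [Title: "RAWK dialect classification sketch (hypothetical)"]

; Hypothetical helper, not part of RAWK: classify each value in SPEC
; as a source, a program or an option, using only its datatype.
classify-spec: func [
    "Sort spec values into sources, programs and options."
    spec [block!]
    /local result
][
    result: context [
        sources:  copy []
        programs: copy []
        options:  copy []
    ]
    foreach value spec [
        case [
            refinement? :value [append result/options :value]
            file? :value       [append result/sources :value]
            ; Assumption: a bare string is source text, not a pattern.
            string? :value     [append result/sources :value]
            block? :value [
                ; The ambiguous case: a block may be a file list or a program.
                ; Assumption: a block of nothing but file! values is a file list.
                either parse value [some file!] [
                    append result/sources value
                ][
                    append/only result/programs value
                ]
            ]
        ]
    ]
    result
]

probe classify-spec [/deep %*.r [(find __ "REBOL") [print _fnr]]]
```

With rules like these the values can appear in any order, but the weak spots are visible: a string could just as well be a GREP pattern, and a data block used as a source cannot be told apart from a program without some extra marker.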

[8/11] from: jjmmes::yahoo::es at: 19-May-2007 9:01

Hi Gregg,

Are you using a suffix tree or a suffix array for your implementation?

thanks
jose

[9/11] from: anton::wilddsl::net::au at: 20-May-2007 2:09

Hi Ged,

It depends on the user. For myself, I found the opposite, that I was often using the same filename pattern (*.r) as a REBOL developer, and looking for different contents. But I can certainly imagine particular contents patterns being reused often in other usage scenarios.

Given this ambiguity, I would favour putting the higher-level file-pattern first.

Wouldn't it be nice if we could swap the order of arguments somehow? E.g.

    rawk file-pattern contents-pattern

or

    rawk swap contents-pattern file-pattern

where SWAP changes the order of the arguments to suit RAWK's preferred order.

(Actually, we should put everything into an associative database, then the above problems do not exist.)

Regards,
Anton.
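One way to approximate the SWAP idea is a small wrapper that receives the function itself and calls it with its two arguments reversed. This is only a sketch: `flip` and `fake-rawk` are hypothetical names, and `fake-rawk` merely stands in for a RAWK that takes sources first and the program second. The wrapper goes in front of the function name rather than after it, since an ordinary function placed after RAWK would return a single value and leave RAWK one argument short.

```rebol
REBOL [Title: "Argument-flipping sketch (hypothetical)"]

; Hypothetical helper: call a two-argument function with its
; arguments in the opposite order from the one they were given in.
flip: func [
    "Call F with the two arguments reversed."
    f [any-function!]
    a
    b
][
    f :b :a
]

; Stand-in for RAWK, assumed order: sources first, program second.
fake-rawk: func [sources program] [
    reduce ['sources sources 'program program]
]

; Program first, file pattern second, as the original AWK orders them.
probe flip :fake-rawk [(find __ "REBOL") [print _fnr]] %*.r
```

The function is passed as a get-word (`:fake-rawk`) so that it is handed over as a value instead of being called on the spot.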

[10/11] from: gregg:pointillistic at: 19-May-2007 14:16

Hi Jose,

j> Are you using a suffix tree or a suffix array for your implementation?

No, it's a simple PARSE-based pattern matcher; you can do other file globbing patterns, not just suffixes.

-- Gregg

[11/11] from: gregg::pointillistic::com at: 19-May-2007 14:26

AR> (Actually, we should put everything into an associative
AR> database, then the above problems do not exist.)

We can do anything we want, if we don't care about keeping it close to the tools they're modeled on; maybe we shouldn't worry about that. One advantage of the fixed-position/func-arg model is that it's consistent and easy to write doc strings for. Another consideration, maybe one that drove the design of many *nix utils, is how the data will be used in a pipe-and-filter scenario.

-- Gregg

Notes

Quoted lines have been omitted from some messages.