[AWK GREP] AWK and GREP funcs, arg order
[1/11] from: gregg::pointillistic::com at: 15-May-2007 12:23
Design question:
AWK and GREP take the program/pattern as their first arg (not counting
switches and options), and the file or files as the last arg.
I remember Carl saying once that the arg order in REBOL is generally
governed by the desire to put the most important arg first. e.g.
series functions take the series as the first arg.
If you were designing AWK and GREP functions for REBOL, what arg
order would you use?
-- Gregg
[2/11] from: anton::wilddsl::net::au at: 17-May-2007 13:02
Hi Gregg,
I think the best arg order is the one that puts the
smallest and/or most constant-sized args first, and
the larger or variable-length args later.
This leads to more easily identifiable expressions,
and more of the expression can be typed before pausing
to consider the more variable parts.
When SWITCH was being added to REBOL, I argued
against using Carl's simple arg order rule, which would
have put the CASES arg before the VALUE arg.
Thank heavens he listened to reason.
I generally dislike looking for SELECT's second argument,
and NEW-LINE bugs me more, where you have to wrap the
thing you're modifying in a sandwich of two arguments.
It means more unnecessary keyboard navigation and visual
concentration to find the position for the second arg.
Anyway, rant off.
Can you give some examples of typical arguments for your
AWK and GREP ? Then we can consider various orders.
Regards,
Anton.
[3/11] from: gregg::pointillistic::com at: 17-May-2007 17:51
Hi Anton,
AR> Can you give some examples of typical arguments for your
AR> AWK and GREP ? Then we can consider various orders.
rgrep/deep "REBOL" %*.r
rgrep "my-func" %my-lib.r
rawk %*.r [(find __ "REBOL") [print [_filename _fnr]]]
rawk/deep %*.r [(find __ "REBOL") [print [_filename _fnr]]]
rawk/deep %*.r [
(find __ "REBOL")
[print [second split-path filename _fnr copy/part __ 20]]
]
files: file-list/deep %*.r
rawk files [
(find __ "REBOL")
[print [second split-path filename _fnr copy/part __ 20]]
]
files: file-list/deep %*.r
prog: [
(find __ "REBOL") [
print [second split-path filename _fnr copy/part __ 20]]
]
rawk files prog
Both AWK and GREP take the program/pattern first. I did that for both
of mine originally, then changed RAWK to take the sources first, which
I think I prefer. They could both eventually load their program or
pattern specs from disk, and files can be a single name, a block of
files, a file spec (glob), or a variable.
Are we applying the program to the files, or the files to the program?
-- Gregg
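A minimal sketch of how the sources argument could accept either a single file or a block of files, as described above. TO-SOURCES is a hypothetical helper, not part of RAWK or RGREP, and glob expansion is deliberately left out because the thread does not show how it is done.

```rebol
REBOL []

; Hypothetical helper (an assumption for illustration, not Gregg's code):
; normalize the sources argument to a block of file! values.
to-sources: func [
    "Return a block of files from a single file or a block of files."
    src [file! block!]
][
    either block? src [copy src] [reduce [src]]
]

probe to-sources %my-lib.r        ; a single file becomes a one-item block
probe to-sources [%a.r %b.r]      ; a block is copied as-is
```

With the sources normalized up front, the rest of the function only has to handle one shape, whichever position the argument ends up in.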
[4/11] from: anton:wilddsl:au at: 18-May-2007 16:01
Hi Gregg,
Ah, well now this reminds me of my FIND-FILE function,
which seems to operate similarly.
In find-file, I settled on having the filename-pattern first,
and the optional contents-pattern second.
(Designed for the console, I allow the second arg to be
unset!, which is convenient.)
I say you're applying the program to the files, because without
any files, there's no program.
So stage one is determining the files, stage two is
examining the contents.
Regards,
Anton.
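A sketch of the optional second argument described above, assuming a spec that lists unset! among the accepted types. FIND-FILE-SKETCH and its argument names are hypothetical; the real FIND-FILE spec is not shown in this thread.

```rebol
REBOL []

; Hypothetical stand-in for FIND-FILE; it only reports what it would search for.
find-file-sketch: func [
    "Describe a search by filename pattern, optionally by contents."
    file-pattern [file!]
    contents-pattern [string! unset!] "May be left out at the console"
][
    ; VALUE? on the quoted argument word is false when no value was supplied.
    either value? 'contents-pattern [
        reform ["files matching" mold file-pattern "containing" mold contents-pattern]
    ][
        reform ["files matching" mold file-pattern]
    ]
]

print find-file-sketch %*.r "REBOL"
```

Leaving the second argument off is only safe when the call ends the input, as at the console. Inside a script the function would take whatever value comes next.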
[5/11] from: Tom:Conlin::gmail at: 18-May-2007 0:23
it is all data. why not let rawk figure out which is which
[6/11] from: GedB::Rushcoding::co::uk at: 18-May-2007 10:12
On 5/18/07, Gregg Irwin <gregg-pointillistic.com> wrote:
> Both AWK and GREP take the program/pattern first. I did that for both
> of mine originally, then changed RAWK to take the sources first, which
<<quoted lines omitted: 3>>
> Are we applying the program to the files, or the files to the program?
> -- Gregg
I think the logic is that the program/pattern is most likely to be
reused against different files.
For example, if you have a pattern for cleaning up whitespace then
that pattern is unlikely to change. You will, however, apply that
pattern to different files. So the part that is most likely to change
is kept at the end.
This especially makes sense in a command line environment, since it
involves retrieving the command's last use, deleting the last parameter
and typing a new one.
If the file to be updated came first, you would have to retrieve the
last use, navigate to the filename, delete it while being careful to
leave the pattern intact, and then insert the new filename.
Ged.
[7/11] from: gregg::pointillistic::com at: 18-May-2007 9:47
Ged:
GB> I think the logic is that the program/pattern is most likely to be
GB> reused against different files.
Anton:
AR> I say you're applying the program to the files, because without
AR> any files, there's no program.
Tom:
T> it is all data. why not let rawk figure out which is which
I agree with all of you. :) I like the idea of auto-identification,
but there may be ambiguity. The risk may be low enough that it's worth
it though.
In order to do that, it needs to be a dialect, rather than a func with
args for each one. I thought about that originally, but didn't come up
with one I liked, so I fell back on making it a simple func for the
time being (I was more concerned with the internals at that point).
What should the dialect look like? We have three categories of values:
* sources -- files to analyze, but I also support blocks and strings.
* program -- probably only one, with the pattern/action pairs as
sub-items.
* options -- e.g. /deep
For command line use, the programs and file specs will probably be
very simple, but you can write more complete programs and store them
in files. How flexible should it be with regard to applying multiple
source specs to multiple programs? For example, if you have
overlapping functionality, do you need to run multiple programs, or
reload a file list from a global spec for each "pass"?
-- Gregg
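A minimal sketch of the auto-identification idea, assuming the three categories above can be told apart by datatype alone. CLASSIFY-ARGS is hypothetical and is not Gregg's implementation.

```rebol
REBOL []

; Hypothetical sketch: sort unordered values into sources, program and options.
classify-args: func [
    "Classify a block of values by datatype."
    args [block!]
    /local sources program options
][
    sources: copy []
    program: none
    options: copy []
    foreach val args [
        switch type?/word val [
            file!       [append sources val]    ; a file name or glob is a source
            block!      [program: val]          ; a block is taken as the program
            refinement! [append options val]    ; e.g. /deep
        ]
    ]
    reduce ['sources sources 'program program 'options options]
]

probe classify-args [/deep %*.r [(find __ "REBOL") [print _fnr]]]
```

The ambiguity shows up as soon as blocks or strings are also allowed as sources: a block could be either data to scan or a program, so datatype alone no longer decides it.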
[8/11] from: jjmmes::yahoo::es at: 19-May-2007 9:01
Hi Gregg,
Are you using a suffix tree or a suffix array for your implementation?
thanks
jose
[9/11] from: anton::wilddsl::net::au at: 20-May-2007 2:09
Hi Ged,
It depends on the user.
For myself, I found the opposite, that I was often using the
same filename pattern (*.r) as a REBOL developer,
and looking for different contents.
But I can certainly imagine particular contents patterns being
reused often in other usage scenarios.
Given this ambiguity, I would favour putting the higher-level
file-pattern first.
Wouldn't it be nice if we could swap the order of arguments
somehow? e.g.
rawk file-pattern contents-pattern
or
rawk swap contents-pattern file-pattern
where SWAP changes the order of the arguments to suit
RAWK's preferred order.
(Actually, we should put everything into an associative
database, then the above problems do not exist.)
Regards,
Anton.
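One way to approximate the SWAP idea is a helper that takes the function itself as its first argument. This is only a sketch: FLIP and RAWK-SKETCH are hypothetical names, and the call has to be written as flip :rawk ... rather than rawk swap ..., since a SWAP placed after the function name cannot hand back two reordered values.

```rebol
REBOL []

; Hypothetical helper: call a two-argument function with its arguments reversed.
flip: func [
    "Apply F to B and A, in that order."
    f [function! native! action!]
    a
    b
][
    f :b :a    ; get-words pass the values through without evaluating them again
]

; Stand-in for RAWK that takes the file pattern first (an assumption for this sketch).
rawk-sketch: func [file-pattern contents-pattern] [
    reform ["search" mold file-pattern "for" mold contents-pattern]
]

print flip :rawk-sketch "REBOL" %*.r
```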
[10/11] from: gregg:pointillistic at: 19-May-2007 14:16
Hi Jose,
j> Are you using a suffix tree or a suffix array for your implementation?
No, it's a simple PARSE-based pattern matcher; you can do other
file globbing patterns, not just suffixes.
-- Gregg
[11/11] from: gregg::pointillistic::com at: 19-May-2007 14:26
AR> (Actually, we should put everything into an associative
AR> database, then the above problems do not exist.)
We can do anything we want, if we don't care about keeping it close to
the tools they're modeled on; maybe we shouldn't worry about that.
One advantage of the fixed-position/func-arg model is that it's
consistent and easy to write doc strings for.
Another consideration, maybe one that drove the design of many
*nix utils, is how the data will be used in a pipe-and-filter
scenario.
-- Gregg
Notes
- Quoted lines have been omitted from some messages.