Command Line Parser Module
[1/3] from: greggirwin::mindspring::com at: 6-Oct-2003 22:10
Hi All,
Here's an experimental command line parsing module. Let me know if you
think it's useful (and worth pursuing further), what you like, and what
you don't.
-- Gregg
Command Line Parser Module
=== Read-Me
--- Introduction
This is an *experimental* version of a command line parsing module
for REBOL. It's so experimental that I don't even have a good name
cooked up for it yet. :)
The goal is to make it as easy as possible to define and process
command line interfaces for REBOL scripts and applications. To that
end, there is a dialect and one main function on the front end.
The PARSE-COMMAND-LINE function takes input data--generally something
like system/options/args--and a dialected spec block, very much like
PARSE does, but using its own dialect. All you have to do after that
is read the results.
    args: {--verbose -f %input.txt} ; << system/script/args
    options: parse-command-line args [
        --verbose -v {verbose mode}
        -f input-file {The input file}
    ]
    >> options/verbose
    == true
    >> options/input-file
    == %input.txt
To make your life even easier, standard usage and version options are
built in (though crude right now).
The command line syntax is meant to support standard Unix utility
formats. I'm a Windows guy, but REBOL runs everywhere and I think
this is the best option. Adding support for DOS "/" option syntax,
as an alternative to "-" should be neither hard nor necessary. :)
IMPORTANT NOTE! The input data is converted to a block! before it
is parsed.
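To see what that conversion means in practice, here is a small sketch (REBOL 2 assumed; CMD and ITEMS are made-up names for illustration). TO BLOCK! loads the string, so each token arrives as a REBOL value and not as text:

```rebol
REBOL []

; Illustrative only: CMD is a made-up sample command line.
cmd: {--verbose -f %input.txt 42}

; TO BLOCK! on a string loads it using normal REBOL lexical rules.
items: to block! cmd

; Show each item along with the datatype it was loaded as.
foreach item items [
    print [mold item "->" type? item]
]
```

The dash tokens should come through as word! values, %input.txt as a file!, and 42 as an integer!, which is why anything on the command line has to be loadable REBOL.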
--- Definitions
:option - A switch that takes no arguments.
:opt-arg - A switch that takes one or more arguments.
:operand - A positional argument.
Options come in two varieties: short and long. Short options are
a single character preceded by a single dash (-) and long options
are a word preceded by two dashes (--).
    -q
    --verbose
Opt-Args consume one or more arguments following the option token.
Right now, the input is converted to a block for easy processing,
so standard REBOL lexical rules apply.
For long options, a single argument must be separated from the
token by a space or an equal sign (=); multiple arguments have to be
separate lexical items. If you use the "<opt>=<arg>" syntax, the
argument will be seen as a string. You'll need to convert it
yourself until datatype validations and coercions are in place.
    --file %input.txt
    --file=%input.txt
Short options use the same form as long options, with one addition:
you can put the argument immediately after the switch token. I
don't care for it, but it's an accepted standard.
    -f %input.txt
    -f=input.txt
    -finput.txt
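The "<opt>=<arg>" form can be pulled apart with PARSE/ALL, using "=" as the delimiter; the script's CLEAN-TOKEN and GET-ARGS lean on the same trick. A minimal sketch, assuming REBOL 2 (where PARSE with a string rule splits its input); SPLIT-OPT is a hypothetical helper, not part of the module:

```rebol
REBOL []

; Hypothetical helper: splits "--file=input.txt" into token and argument.
; Relies on REBOL 2 PARSE/ALL splitting a string at the given delimiter.
split-opt: func [token [string!] /local parts] [
    parts: parse/all token "="
    ; The argument is NONE when the token has no "=" in it.
    reduce [first parts  pick parts 2]
]

probe split-opt "--file=input.txt"
probe split-opt "--verbose"
```

Note that the argument comes back as a string, so the caller still has to convert it. Like CLEAN-TOKEN, this sketch does not guard against an empty token.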
Operands are just regular values that have no corresponding switch
token associated with them. They are just accumulated in a block;
there is no support for operands beyond that at this point.
--- The Dialect
You specify options and opt-args much as they would appear in
documentation for your program. e.g.
    --quiet -q {quiet mode}
    --verbose -v {verbose mode}
The rules are simple: Start with one or more words that begin with
a single or double dash, which will be interpreted as option tokens.
Next you can optionally include a word that *doesn't* start with
a dash, which says you're defining an opt-arg instead of just
a plain option. Next, put a string that will be displayed if the
user asks for help on the program. Finally, you can include a spec
block for the internal object that will be used during processing
(See: Objects). The dialect is basically:
    some dash-word!s opt non-dash-word! string! opt block!
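For anyone who wants to play with the grammar, here is one way it might be written as a block PARSE rule. This is only a sketch for REBOL 2, not what the module does internally, and every name in it (DASH?, ALWAYS, NEVER, ENTRY, and so on) is made up for the example. REBOL 2's PARSE has no direct "match only if this test passes" keyword, so the sketch picks an always-succeed or always-fail rule from inside a paren:

```rebol
REBOL []

; All names below are illustrative, not the module's own.
dash?: func [w [word!]] [#"-" = first form w]

always: []           ; a rule that always succeeds
never: [end skip]    ; a rule that always fails

tokens: arg-name: desc: obj-spec: w: check: none

; Match a word, then accept or reject it based on its first character.
dash-word:  [set w word! (check: either dash? w [always] [never]) check]
plain-word: [set w word! (check: either dash? w [never] [always]) check]

; some dash-word!s opt non-dash-word! string! opt block!
entry: [
    (tokens: copy [] arg-name: obj-spec: none)
    some [dash-word (append tokens w)]
    opt [plain-word (arg-name: w)]
    set desc string!
    opt [set obj-spec block!]
]

spec: [
    --verbose -v {verbose mode}
    -f input-file {The input file}
]

; Print the tokens, arg name, and description found for each entry.
parse spec [
    some [entry (print [mold tokens mold arg-name mold desc])]
]
```

Each pass through ENTRY collects the option tokens, the optional arg name, the help string, and the optional spec block for one definition.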
Example:
    -C n {
        Shows n lines of context before and after each change. diff
        marks lines removed from path1 with -, lines added to path2
        with + and lines changed in both files with !. This option
        conflicts with the -e and -f options.
    } [
        name: 'num-context-lines ; overrides 'n
        args: integer!
        action: [value: to integer! arg]
        default: 3
    ]
\note
You can put short strings on the same line as the tokens, but longer
strings should be formatted over multiple lines as you want them to
appear to the user.
/note
If multiple tokens are given for an option, the "name" of the token
will be taken from the first one given. In the case of an opt-arg,
the name will come from the arg name given after the option tokens.
The name is important because that's how you're going to find it
later.
--- Objects
When you call PARSE-COMMAND-LINE, giving it your command line spec,
it will return a COMMAND-PARSER object that is filled with values
that were parsed from the command line data. The name of each option
becomes a word in that object, which is how you read out values that
were set during the parsing process. For example:
    args: {--verbose -f %input.txt}
    options: parse-command-line args [
        --verbose -v {verbose mode}
        -f input-file {The input file}
    ]
    >> options/verbose
    == true
    >> options/input-file
    == %input.txt
In addition to the individual word values, you can get a list of all
the custom words added for each option in the NAMES field.
    >> options/names
    == [verbose input-file help version]
\note
HELP and VERSION are built-in options that provide standard functionality
for you at no charge.
/note
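Since NAMES is just a block of words, you can walk it to dump everything that was set. A small sketch; the OPTIONS object here is a hand-built stand-in (an assumption for the example), not something returned by PARSE-COMMAND-LINE:

```rebol
REBOL []

; Stand-in for the object PARSE-COMMAND-LINE would return (assumed shape).
options: make object! [
    names: [verbose input-file]
    verbose: true
    input-file: %input.txt
]

; IN returns the word bound to the object; GET fetches its value.
foreach name options/names [
    print [name "=" mold get in options name]
]
```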
As your command line spec is parsed, internal objects are created
for each option and opt-arg (and, likely, eventually operands). By
providing a spec block for an option you can perform actions, override
the name, provide default values, and tell it how many and what type
of arguments it takes (though validation and type casting are not in
place yet).
--- Actions
Actions are defined in the spec block for an option.
    -C n {Number of context lines} [
        action: [value: to integer! arg]
    ]
Actions are just blocks of REBOL code. The current implementation is...
um...not all that great in regard to how these are handled. The thing
you really need to know is that VALUE, ARG, and SPEC are special words
in the context of an action. VALUE means the value of the option
object where the action block is defined, ARG refers to the argument(s)
consumed for the current opt-arg, and SPEC refers to the original
command line spec you provided (it's used for automatic USAGE display).
The default action, if none is given, will set the value of an option
to true and the value of an opt-arg to the argument(s) consumed on its
behalf.
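To make the special words a bit more concrete, here is one possible way to run an action block with VALUE and ARG available to it. It is a sketch only, assuming REBOL 2: it binds the block to a throwaway context, which is a different technique from the word substitution the current DO-ACTION uses, and it leaves SPEC out. RUN-ACTION and ARG-VALUE are hypothetical names:

```rebol
REBOL []

; Hypothetical helper, not part of the module: runs ACTION with the words
; VALUE and ARG bound to a throwaway context, then returns VALUE.
run-action: func [action [block!] arg-value /local ctx] [
    ctx: make object! [value: none arg: none]
    ctx/arg: arg-value
    ; Copy first so the caller's block keeps its original bindings.
    do bind copy/deep action in ctx 'value
    ctx/value
]

; An opt-arg action that converts its argument.
probe run-action [value: to integer! arg] "3"

; Roughly what the default action for a plain option amounts to.
probe run-action [value: true] none
```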
;==============================================================
REBOL [
Title: "Command Line Dialect: Experimental Version A"
File: %cl-dialect-ex-a.r
Author: "Gregg Irwin"
Email: [greggirwin--acm--org]
Date: 6-Oct-2003
Version: 0.0.1
Purpose: {
Provide support for easy, automatic, command line parsing.
You define the options in a dialect, which is used to
build internal objects that are used to parse a command line.
}
Comment: {
*****************************
*** THIS IS EXPERIMENTAL! ***
*****************************
I include that caveat because I'm somewhat embarrassed by
how I hacked it together. I have pages and pages of design
ideas and notes, along with visions of an elegant PARSE-based
implementation, but I was spending so much time thinking about
the different ways it could be done, that I never got around
to actually *doing* something with it. :\ So, I decided that
I'd take some of the ideas and just whack something together
to play with, and that's what this is. No, I'm not sure I like
it, but it's something we can all use as a starting point, even
if we only learn what *not* to do from it. :)
One of the things we should use it for is to iron out the
input dialect.
There is a lot of other stuff to do:
- General design needs to be re-thought to avoid all the
ugly binding issues I created with this design (i.e.
option/action contexts).
- usage info for operands
- conflict specification and handling
- data type validation/coercion
- what to do with unknown tokens
- clean out and refactor unused idea bits
- comment and explain things a lot more
- CHOICE args (i.e. one of a set of options)
- where best to get program name and version
- match abbreviated long option names?
- multiple arg names for opt-args?
- action handler for operands?
- named operands?
- more complete program info dialect?
program-info: [
name: version: synopsis: description: options:
operands: examples: environment-variables:
diagnostics: messages: limits:
]
I've looked at a number of modules in other languages that
do this kind of thing, from getopt on up, but the heaviest
influence was the Python Optik module by Greg Ward
(http://optik.sourceforge.net/).
}
]
option!: make object! [
type: 'option
name: tokens: action: conflicts-with: desc: value: none
]
opt-arg!: make option! [
type: 'opt-arg
args: default: none
]
operand!: make object! [
type: 'operand
name: args: default: optional: ordinal: desc: value: none
]
command-parser: make object! [
; For internal use and debugging needs only.
_option: copy []
_opt-arg: copy []
_operand: copy []
_token-map: copy []
_spec: none
; ; Add default options (if we decide not to do it in parse-command-line).
; append _option reduce [
; 'help make option! [
; name: 'help tokens: [--help -h] action: [show-usage]
; desc: "Show usage information"
; ]
; 'version make option! [
; name: 'version tokens: [--version] action: [show-version]
; desc: "Show version information"
; ]
; ]
names: copy [] ; custom option names added to the object.
operands: does [:_operand] ; public access to operands.
clean-token: func [
"Returns token less any attached arguments that come after an = sign."
token
][
; This fails if token is empty or has only spaces before the = sign.
to word! first parse/all token "="
]
do-action: func [
"Execute the action associated with the token."
token arg
/local obj act
][
;!! This routine is kludgey, because I put things together
; in such a way that binding/evaluation issues are problematic.
;print ["do-action" token arg]
;attempt [
obj: obj-from-token clean-token token
either obj [
act: copy any [
obj/action
;?? Should we allow default values for options and use
; "not obj/default" here instead of true?
[set in obj 'value either opt-arg? token [arg][true]]
]
;print mold act
replace/all act 'arg either word? arg [to lit-word! arg][arg]
;!! YAK (yet another kludge). MOLDing to prevent evaluation.
replace/all act 'spec mold _spec
;print ["x" token arg type? attempt [last act] mold act]
do act
][
print ["Unknown token found:" token]
]
;]
]
find-in-map: func [token][find _token-map to word! token]
name-from-token: func [
"Returns an option name given any token that maps to it."
token /local pos
][
either pos: find-in-map token [to word! first find pos lit-word!][none]
]
obj-from-token: func [
"Returns the option object given any token that maps to it."
token /local pos
][
either pos: find-in-map token [first find pos object!][none]
]
opt-arg?: func [
"Returns true if the token maps to an opt-arg; false otherwise."
token
][
attempt [select _opt-arg name-from-token clean-token token]
]
parse-cl: func [
{Parses a command line according to the settings (options, etc.)
in the parent command-parser object. Returns the object filled
with data from the parse operation.}
data
/local
;-- funcs
get-args get-opt-args process-long-opt process-operand
process-opt process-short-opts
;-- vars
args arg arg-str
][
;-- Local Functions
get-args: func [
{Returns the arguments for the given token.}
token [string!]
obj
/short full-token [string!]
][
;print [tab "get-arg:" token mold obj/args]
any [
; For short options, with a single arg, it can be butted
; right up against them.
all [
short
(2 < length? full-token)
; Allow things like "-qfin-file.txt", where
; -q is an option and -f is an opt-arg?
(copy next find full-token last token)
]
; Both short and long args can have their args after an = sign.
; e.g. -a=on, --mode=text
; Handle opt-args using <opt>=<arg> format.
pick parse/all token "=" 2
; Get next <n> items from ARGS.
get-opt-args obj
]
]
get-opt-args: func [
{Consumes the number of arguments specified for the given
opt-arg object from ARGS and returns them.}
obj /local result num-args
][
;print [tab tab "get-opt-args:" obj/name mold obj/args]
num-args: either block? obj/args [length? obj/args][1]
result: either num-args > 1 [
copy/part next args num-args
][
first next args
]
args: skip args num-args
result
]
process-long-opt: func [
{Consumes any arguments for the option and performs its actions.}
token [string!] /local obj arg
][
;print ["Long Opt:" token]
process-opt token
;print [tab "opt-arg?:" either obj [true][false] tab "arg:" arg]
]
process-operand: func [arg] [
;print ["Operand:" arg]
append self/_operand arg
;do-action arg ???
]
process-opt: func [
{Inner option processor, for short opts that have an <opt>=<arg>
format and all long opts.}
token /local obj arg
][
if obj: opt-arg? token [arg: get-args token obj]
;print [
; tab "process-opt:" token tab "opt-arg?:"
; either obj [true][false] tab "arg:" arg
;]
do-action token arg
]
process-short-opts: func [
{Consumes any arguments for the options and performs their actions.
Handles both single and grouped tokens.}
token [string!] /local obj tok arg
][
;print ["Short Opts:" token]
either find token #"=" [
process-opt token
][
foreach char next token [ ; skip leading "-"
arg: none
;print [tab "Short Opt:" join "-" char]
either obj: opt-arg? tok: join "-" char [
arg: get-args/short tok obj token
do-action tok arg
;print [tab "Short opt-arg:" tok tab "arg:" arg]
break
][
do-action tok arg
;print [tab "Short option:" tok]
]
]
]
]
;-- Processing
args: to block! data
while [not tail? args] [
arg-str: form arg: first args
switch/default true reduce [
'-- = arg [
[2/3] from: AJMartin:orcon at: 24-Dec-2003 22:38
Gregg wrote:
> Here's an experimental command line parsing module.
I admire the sheer size of it, Gregg! :)
Andrew J Martin
Grail Jedi
Who's feeling... "inadequate"...
ICQ: 26227169
http://www.rebol.it/Valley/
http://valley.orcon.net.nz/
http://Valley.150m.com/
[3/3] from: greggirwin:mindspring at: 7-Oct-2003 1:28
Hi Andrew,
AJM> I admire the sheer size of it, Gregg! :)
What's that Pascal quote..."I would have written a shorter module, but
I didn't have time." :)
-- Gregg