Command Line Parser Module

[1/3] from: greggirwin::mindspring::com at: 6-Oct-2003 22:10

Hi All,

Here's an experimental command line parsing module. Let me know if you think it's useful (and worth pursuing further), what you like, and what you don't.

-- Gregg

Command Line Parser Module

=== Read-Me

--- Introduction

This is an *experimental* version of a command line parsing module for REBOL. It's so experimental that I don't even have a good name cooked up for it yet. :)

The goal is to make it as easy as possible to define and process command line interfaces for REBOL scripts and applications. To that end, there is a dialect and one main function on the front end. The PARSE-COMMAND-LINE function takes input data (generally something like system/options/args) and a dialected spec block, very much like PARSE does, but using its own dialect. All you have to do after that is read the results.

    args: {--verbose -f %input.txt}  ; << system/script/args
    options: parse-command-line args [
        --verbose -v {verbose mode}
        -f input-file {The input file}
    ]

    >> options/verbose
    == true

    >> options/input-file
    == %input.txt

To make your life even easier, standard usage and version options are built in (though crude right now).

The command line syntax is meant to support standard Unix utility formats. I'm a Windows guy, but REBOL runs everywhere and I think this is the best option. Adding support for DOS "/" option syntax, as an alternative to "-", should be neither hard nor necessary. :)

IMPORTANT NOTE! The input data is converted to a block! before it is parsed.

--- Definitions

:option - A switch that takes no arguments.

:opt-arg - A switch that takes one or more arguments.

:operand - A positional argument.

Options come in two varieties: short and long. Short options are a single character preceded by a single dash (-), and long options are a word preceded by two dashes (--).

    -q
    --verbose

Opt-args consume one or more arguments following the option token. Right now, the input is converted to a block for easy processing, so standard REBOL lexical rules apply. For long options, a single argument must be separated from the token by a space or an equal sign (=); multiple arguments have to be separate lexical items. If you use the "<opt>=<arg>" syntax, the argument will be seen as a string. You'll need to convert it yourself until datatype validations and coercions are in place.

    --file %input.txt
    --file=%input.txt

Short options use the same form as long options, with one addition: you can put the argument immediately after the switch token. I don't care for it, but it's an accepted standard.

    -f %input.txt
    -f=input.txt
    -finput.txt

Operands are just regular values that have no switch token associated with them. They are simply accumulated in a block; there is no support for operands beyond that at this point.

--- The Dialect

You specify options and opt-args much as they would appear in the documentation for your program, e.g.:
    --quiet   -q {quiet mode}
    --verbose -v {verbose mode}

The rules are simple. Start with one or more words that begin with a single or double dash; these are interpreted as option tokens. Next, you can optionally include a word that *doesn't* start with a dash, which says you're defining an opt-arg instead of just a plain option. Next, put a string that will be displayed if the user asks for help on the program. Finally, you can include a spec block for the internal object that will be used during processing (see Objects).

The dialect is basically:

    some dash-word!s  opt non-dash-word!  string!  opt block!

Example:

    -C n {
        Shows n lines of context before and after each change.
        diff marks lines removed from path1 with -, lines added
        to path2 with + and lines changed in both files with !.
        This option conflicts with the -e and -f options.
    } [
        name: 'num-context-lines  ; overrides 'n
        args: integer!
        action: [value: to integer! arg]
        default: 3
    ]

\note

You can put short strings on the same line as the tokens, but longer strings should be formatted over multiple lines as you want them to appear to the user.

/note

If multiple tokens are given for an option, the "name" of the option will be taken from the first one given. In the case of an opt-arg, the name will come from the arg name given after the option tokens. The name is important because that's how you're going to find the value later.

--- Objects

When you call PARSE-COMMAND-LINE with your command line spec, it returns a COMMAND-PARSER object filled with values parsed from the command line data. The name of each option becomes a word in that object, which is how you read out values that were set during the parsing process. For example:

    args: {--verbose -f %input.txt}
    options: parse-command-line args [
        --verbose -v {verbose mode}
        -f input-file {The input file}
    ]

    >> options/verbose
    == true

    >> options/input-file
    == %input.txt

In addition to the individual word values, you can get a list of all the custom words added for each option in the NAMES field.

    >> options/names
    == [verbose input-file help version]

\note

HELP and VERSION are built-in options that provide standard functionality for you at no charge.

/note

As your command line spec is parsed, internal objects are created for each option and opt-arg (and, likely, eventually operands). By providing a spec block for an option you can perform actions, override the name, provide default values, and say how many arguments it takes and of what type (though validation and type casting are not in place yet).

--- Actions

Actions are defined in the spec block for an option.

    -C n {Number of context lines} [
        action: [value: to integer! arg]
    ]

Actions are just blocks of REBOL code. The current implementation is... um... not all that great in how it handles them. The thing you really need to know is that VALUE, ARG, and SPEC are special words in the context of an action. VALUE is the value of the option object where the action block is defined; ARG refers to the argument(s) consumed for the current opt-arg; and SPEC refers to the original command line spec you provided (it's used for the automatic USAGE display).

The default action, if none is given, sets the value of an option to true and the value of an opt-arg to the argument(s) consumed on its behalf.

;==============================================================

REBOL [
    Title:   "Command Line Dialect: Experimental Version A"
    File:    %cl-dialect-ex-a.r
    Author:  "Gregg Irwin"
    Email:   [greggirwin--acm--org]
    Date:    6-Oct-2003
    Version: 0.0.1
    Purpose: {
        Provide support for easy, automatic, command line parsing.
        You define the options in a dialect, which is used to build
        internal objects that are used to parse a command line.
    }
    Comment: {
        *****************************
        *** THIS IS EXPERIMENTAL! ***
        *****************************

        I include that caveat because I'm somewhat embarrassed by
        how I hacked it together.
        I have pages and pages of design ideas and notes, along with
        visions of an elegant PARSE-based implementation, but I was
        spending so much time thinking about the different ways it
        could be done that I never got around to actually *doing*
        something with it. :\

        So, I decided that I'd take some of the ideas and just whack
        something together to play with, and that's what this is. No,
        I'm not sure I like it, but it's something we can all use as a
        starting point, even if we only learn what *not* to do from it. :)

        One of the things we should use it for is to iron out the
        input dialect.

        There is a lot of other stuff to do:

        - General design needs to be re-thought to avoid all the ugly
          binding issues I created with this design (i.e. option/action
          contexts).
        - usage info for operands
        - conflict specification and handling
        - data type validation/coercion
        - what to do with unknown tokens
        - clean out and refactor unused idea bits
        - comment and explain things a lot more
        - CHOICE args (i.e. one of a set of options)
        - where best to get program name and version
        - match abbreviated long option names?
        - multiple arg names for opt-args?
        - action handler for operands?
        - named operands?
        - more complete program info dialect?

            program-info: [
                name:
                version:
                synopsis:
                description:
                options:
                operands:
                examples:
                environment-variables:
                diagnostics:
                messages:
                limits:
            ]

        I've looked at a number of modules in other languages that do
        this kind of thing, from getopt on up, but the heaviest influence
        was the Python Optik module by Greg Ward
        (http://optik.sourceforge.net/).
    }
]

option!: make object! [
    type: 'option
    name: tokens: action: conflicts-with: desc: value: none
]

opt-arg!: make option! [
    type: 'opt-arg
    args: default: none
]

operand!: make object! [
    type: 'operand
    name: args: default: optional: ordinal: desc: value: none
]

command-parser: make object! [

    ; For internal use and debugging needs only.
    _option:    copy []
    _opt-arg:   copy []
    _operand:   copy []
    _token-map: copy []
    _spec: none

    ; ; Add default options (if we decide not to do it in parse-command-line).
    ; append _option reduce [
    ;     'help make option! [
    ;         name: 'help  tokens: [--help -h]  action: [show-usage]
    ;         desc: "Show usage information"
    ;     ]
    ;     'version make option! [
    ;         name: 'version  tokens: [--version]  action: [show-version]
    ;         desc: "Show version information"
    ;     ]
    ; ]

    names: copy []              ; custom option names added to the object.
    operands: does [:_operand]  ; public access to operands.

    clean-token: func [
        "Returns token less any attached arguments that come after an = sign."
        token
    ][
        ; This fails if token is empty or has only spaces before the = sign.
        to word! first parse/all token "="
    ]

    do-action: func [
        "Execute the action associated with the token."
        token arg
        /local obj act
    ][
        ;!! This routine is kludgey, because I put things together
        ;  in such a way that binding/evaluation issues are problematic.
        ;print ["do-action" token arg]
        ;attempt [
        obj: obj-from-token clean-token token
        either obj [
            act: copy any [
                obj/action
                ;?? Should we allow default values for options and use
                ;   "not obj/default" here instead of true?
                [set in obj 'value either opt-arg? token [arg][true]]
            ]
            ;print mold act
            replace/all act 'arg either word? arg [to lit-word! arg][arg]
            ;!! YAK (yet another kludge). MOLDing to prevent evaluation.
            replace/all act 'spec mold _spec
            ;print ["x" token arg type? attempt [last act] mold act]
            do act
        ][
            print ["Unknown token found:" token]
        ]
        ;]
    ]

    find-in-map: func [token][find _token-map to word! token]

    name-from-token: func [
        "Returns an option name given any token that maps to it."
        token
        /local pos
    ][
        either pos: find-in-map token [to word! first find pos lit-word!][none]
    ]

    obj-from-token: func [
        "Returns the option object for any token that maps to it."
        token
        /local pos
    ][
        either pos: find-in-map token [first find pos object!][none]
    ]

    opt-arg?: func [
        "Returns the opt-arg object if the token maps to an opt-arg; none otherwise."
        token
    ][
        attempt [select _opt-arg name-from-token clean-token token]
    ]

    parse-cl: func [
        {Parses a command line according to the settings (options, etc.)
        in the parent command-parser object. Returns the object filled
        with data from the parse operation.}
        data
        /local
            ;-- funcs
            get-args get-opt-args process-long-opt process-operand
            process-opt process-short-opts
            ;-- vars
            args arg arg-str
    ][
        ;-- Local Functions

        get-args: func [
            {Returns the arguments for the given token.}
            token [string!]
            obj
            /short full-token [string!]
        ][
            ;print [tab "get-arg:" token mold obj/args]
            any [
                ; For short options, with a single arg, it can be butted
                ; right up against them.
                all [
                    short
                    (2 < length? full-token)
                    ; Allow things like "-qfin-file.txt", where
                    ; -q is an option and -f is an opt-arg?
                    ; Skip this branch when the opt-arg ends the group
                    ; (e.g. "-qf in.txt"); otherwise ALL returns an empty
                    ; string (which is not NONE) and the real arg is never read.
                    not empty? next find full-token last token
                    (copy next find full-token last token)
                ]
                ; Both short and long args can have their args after an = sign.
                ; e.g. -a=on, --mode=text
                ; Handle opt-args using <opt>=<arg> format.
                pick parse/all token "=" 2
                ; Get next <n> items from ARGS.
                get-opt-args obj
            ]
        ]

        get-opt-args: func [
            {Consumes the number of arguments specified for the given
            opt-arg object from ARGS and returns them.}
            obj
            /local result num-args
        ][
            ;print [tab tab "get-opt-args:" obj/name mold obj/args]
            num-args: either block? obj/args [length? obj/args][1]
            result: either num-args > 1 [
                copy/part next args num-args
            ][
                first next args
            ]
            args: skip args num-args
            result
        ]

        process-long-opt: func [
            {Consumes any arguments for the option and performs its actions.}
            token [string!]
            /local obj arg
        ][
            ;print ["Long Opt:" token]
            process-opt token
            ;print [tab "opt-arg?:" either obj [true][false] tab "arg:" arg]
        ]

        process-operand: func [arg] [
            ;print ["Operand:" arg]
            append self/_operand arg
            ;do-action arg ???
        ]

        process-opt: func [
            {Inner option processor, for short opts that have an
            <opt>=<arg> format and all long opts.}
            token
            /local obj arg
        ][
            if obj: opt-arg? token [arg: get-args token obj]
            ;print [
            ;    tab "process-opt:" token tab "opt-arg?:"
            ;    either obj [true][false] tab "arg:" arg
            ;]
            do-action token arg
        ]

        process-short-opts: func [
            {Consumes any arguments for the options and performs their
            actions. Handles both single and grouped tokens.}
            token [string!]
            /local obj tok arg
        ][
            ;print ["Short Opts:" token]
            either find token #"=" [
                process-opt token
            ][
                foreach char next token [  ; skip leading "-"
                    arg: none
                    ;print [tab "Short Opt:" join "-" char]
                    either obj: opt-arg? tok: join "-" char [
                        arg: get-args/short tok obj token
                        do-action tok arg
                        ;print [tab "Short opt-arg:" tok tab "arg:" arg]
                        break
                    ][
                        do-action tok arg
                        ;print [tab "Short option:" tok]
                    ]
                ]
            ]
        ]

        ;-- Processing

        args: to block! data
        while [not tail? args] [
            arg-str: form arg: first args
            switch/default true reduce [
                '-- = arg [
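The option syntax in the read-me's Definitions section can be made concrete with a rough Python sketch of the same rules. It is not the REBOL module, and it skips REBOL's lexical conversion, so every value stays a string. The sketch covers:

* long options, with a space or `=` before the argument;
* short options, which may be grouped (`-qv`) or have the argument attached (`-finput.txt`);
* operands, which collect everything else.

The function name `parse_command_line` and the `spec` format are hypothetical. `spec` maps each token to a name and a takes-argument flag. Unknown tokens, including a bare `--`, simply raise `KeyError` here. The REBOL code prints an "Unknown token found" message instead.

```python
def parse_command_line(argv, spec):
    """Hypothetical sketch of the read-me's option syntax (not the REBOL API).

    spec maps each token (e.g. "-f", "--file") to (name, takes_arg).
    Returns (values, operands); values maps each name to True, a string
    argument, or None if the option was not given.
    """
    values = {name: None for name, _ in spec.values()}
    operands = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("-") and len(token) > 1:
            if token.startswith("--") or "=" in token:
                # Long option, or a short option in <opt>=<arg> form.
                key, sep, attached = token.partition("=")
                name, takes_arg = spec[key]
                if not takes_arg:
                    values[name] = True
                elif sep:
                    values[name] = attached  # --file=x.txt or -f=x.txt
                else:
                    i += 1
                    values[name] = argv[i]  # --file x.txt
            else:
                # One or more grouped short options: -q, -qv, -qfx.txt
                for pos, ch in enumerate(token[1:], start=1):
                    name, takes_arg = spec["-" + ch]
                    if not takes_arg:
                        values[name] = True
                        continue
                    rest = token[pos + 1:]
                    if rest:
                        values[name] = rest  # argument butted up: -fx.txt
                    else:
                        i += 1
                        values[name] = argv[i]  # argument is the next item
                    break  # an opt-arg ends the group
        else:
            operands.append(token)
        i += 1
    return values, operands
```

With `-q` as an option and `-f` as an opt-arg, both `-qfinput.txt` and `-qf input.txt` set the input file to `input.txt`. In the second form, the opt-arg that closes the group takes the next item as its argument.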
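The dialect grammar in the read-me is `some dash-word!s opt non-dash-word! string! opt block!`. It can be illustrated with a small Python sketch that walks a spec list and builds one definition per option.

Python has no word!/string!/block! distinction, so this sketch assumes:

* words are plain `str`;
* help strings are wrapped in a hypothetical `Desc` class;
* the optional spec block is a `dict`.

The naming rule follows the read-me: a `name` entry in the spec block overrides the opt-arg's argument name, which in turn overrides the first token. `Desc`, `OptionDef` and `parse_spec` are illustrative names, not part of the module.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Desc:
    """Stands in for a REBOL string! help text (hypothetical wrapper)."""
    text: str


@dataclass
class OptionDef:
    tokens: List[str]
    arg_name: Optional[str]  # set only for opt-args
    desc: str
    extra: dict = field(default_factory=dict)  # the optional spec block

    @property
    def name(self) -> str:
        # Spec-block name beats the arg name, which beats the first token.
        return self.extra.get("name") or self.arg_name or self.tokens[0].lstrip("-")


def parse_spec(items: list) -> List[OptionDef]:
    """Match: some dash-word!s  opt non-dash-word!  string!  opt block!"""

    def is_word(x, dashed):
        return isinstance(x, str) and x.startswith("-") == dashed

    defs = []
    i = 0
    while i < len(items):
        tokens = []
        while i < len(items) and is_word(items[i], True):
            tokens.append(items[i])
            i += 1
        if not tokens:
            raise ValueError(f"expected an option token at position {i}")
        arg_name = None
        if i < len(items) and is_word(items[i], False):
            arg_name = items[i]  # a non-dash word makes this an opt-arg
            i += 1
        if i >= len(items) or not isinstance(items[i], Desc):
            raise ValueError(f"expected a help string at position {i}")
        desc = items[i].text
        i += 1
        extra = {}
        if i < len(items) and isinstance(items[i], dict):
            extra = items[i]
            i += 1
        defs.append(OptionDef(tokens, arg_name, desc, extra))
    return defs
```

Here the `-C n` example from the read-me comes out named `num-context-lines`, because its spec block supplies a `name`.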
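Two rules can be mirrored in a few lines of Python:

* GET-OPT-ARGS takes one argument, unless the opt-arg's `args` field is a block. In that case it takes one argument per block item and returns them together.
* The default action from the Actions section sets an option to true and an opt-arg to its argument(s).

This is a hedged sketch. `get_opt_args` and `run_action` are hypothetical names. An action is modelled as a plain function of the argument rather than a REBOL block using VALUE/ARG/SPEC. The returned index points at the last consumed item, the way `args: skip args num-args` leaves ARGS. The truncated main loop presumably advances past that item.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

ArgValue = Union[str, List[str]]


def get_opt_args(argv: Sequence[str], pos: int, arg_types) -> Tuple[ArgValue, int]:
    """Consume the arguments that follow the opt-arg token at argv[pos].

    arg_types plays the role of the opt-arg's ARGS field: a list means one
    argument per entry (returned as a list when there are several);
    anything else means exactly one. Returns (value, index of last item used).
    """
    num_args = len(arg_types) if isinstance(arg_types, list) else 1
    if num_args > 1:
        value: ArgValue = list(argv[pos + 1 : pos + 1 + num_args])
    else:
        value = argv[pos + 1]
    return value, pos + num_args


def run_action(is_opt_arg: bool, arg: Optional[ArgValue],
               action: Optional[Callable[[ArgValue], object]] = None):
    """Custom action if given; otherwise True for an option, the arg(s) for an opt-arg."""
    if action is not None:
        return action(arg)
    return arg if is_opt_arg else True
```

Passing `int` as the action corresponds roughly to the read-me's `action: [value: to integer! arg]`.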

[2/3] from: AJMartin:orcon at: 24-Dec-2003 22:38

Gregg wrote:

> Here's an experimental command line parsing module.

I admire the sheer size of it, Gregg! :)

Andrew J Martin
Grail Jedi
Who's feeling... "inadequate"...
ICQ: 26227169
http://www.rebol.it/Valley/
http://valley.orcon.net.nz/
http://Valley.150m.com/

[3/3] from: greggirwin:mindspring at: 7-Oct-2003 1:28

Hi Andrew,

AJM> I admire the sheer size of it, Gregg! :)

What's that Pascal quote... "I would have written a shorter module, but I didn't have time." :)

-- Gregg