Mailing List Archive: 49091 messages

Perl is too stupid to understand this 1 liner.

 [1/47] from: doug:vos:eds at: 12-Dec-2001 16:24


I love the simplicity of rebol's interaction with humans. Many other scripting languages are so dumb, you have to get down to the dumb level to understand them...

Example: try this at the prompt. - Test for leap year
>> 29-Feb-1999
** Syntax Error: Invalid date -- 29-Feb-1999
** Near: (line 1) 29-Feb-1999

However there is a leap year in 2000, so rebol knows right away what you mean...
>> 29-Feb-2000
== 29-Feb-2000
>> Can perl do that?
== NO
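
For readers following along in a third language, here is a minimal Python sketch of the same validity check. The helper name `is_valid_date` and the `DD-Mon-YYYY` format string are assumptions made for illustration; they are not part of the thread, and the month abbreviation parsing assumes an English locale.

```python
from datetime import datetime


def is_valid_date(text):
    """Return True if text is a real calendar date written as DD-Mon-YYYY."""
    try:
        # strptime raises ValueError for impossible dates such as 29 Feb
        # in a non-leap year.
        datetime.strptime(text, "%d-%b-%Y")
        return True
    except ValueError:
        return False


print(is_valid_date("29-Feb-1999"))  # False: 1999 is not a leap year
print(is_valid_date("29-Feb-2000"))  # True: 2000 is a leap year
```

As in the REBOL session, the leap-year rule is applied by the date machinery itself rather than by hand-written arithmetic.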

 [2/47] from: jseq:mediaone at: 12-Dec-2001 18:07


Can't... resist ... replying to .... flamebait ...

perl -MDate::Manip -e "print q(bad date) unless &ParseDate( q( 29-Feb-1999 ) )"

I admit to cheating and using a CPAN solution, but hey that's the idiomatic way to do it.

I've learned a lot from REBOL about cleaner programming style, and I've looked to apply that knowledge back into the perl coding that I do. Mixing data and script w/<DATA> and <<here, (de-)serializing data structures with Data::Dumper and 'do', using Parse::RecDescent to build grammars, functional manipulation with map/grep, etc. Perl supports the same paradigms, though some (types esp.) are bolted on a bit more coarsely than others.

Anyway, why not be smart and use both?

John Sequeira
http://www.pobox.com/~johnseq
[johnseq--pobox--com]

 [3/47] from: joel:neely:fedex at: 13-Dec-2001 18:15


Hi, Doug,

Vos, Doug wrote:
...
> try this at the prompt. - Test for leap year
> >> 29-Feb-1999
<<quoted lines omitted: 6>>
> >> Can perl do that?
> == NO
I respectfully suggest that you do NOT want to open that door.

The only valid way I know of to use "one-liners" in comparing languages would be to consider:

1) The range of one-liners in language A and language B.
2) The actual utility of EACH such one-liner in each language.
3) The effort required to learn them.

Items 1 and 2 give the credit side of the ledger, while item 3 is the debit side. Net ROI (return-on-investment) is the balance between the two. Note that item 2 includes both "How often would I use this one-liner?" and "Would learning this one-liner prepare me to create other useful one-liners?"

One-liners often encourage (or at least demonstrate) techniques which may not have wide application, and may be more biased to compactness than readability and reusability. That's not always bad, but it means that one-liners may have little to do with the way a professional programmer will write significant pieces of code in the language in question.

Feature count is by no means the only (nor most useful) measure of a language; if it were, REBOL would come out on the short end of *many* comparisons.

REBOL is a flexible and compact scripting language designed primarily for "programming in the small" (according to a quotation from Carl Sassenrath in the document "shoebox" on the REBOL Forces site). However, that compactness comes with a price: limits on features. REBOL is also far from the highest in brute performance of the scripting languages of my acquaintance.

Perl, on the other hand, is a mature, high-performance language with an incredibly rich feature set and even richer standard library of additional components. However, that richness comes with a price: the installation of Perl 5.6 on the Solaris box I'm using today weighs in at about 15 Mb.

It is certainly legitimate to show a piece of code that does a specific thing that one finds useful and cite that as a benefit of the language. I do not think it is fair to take one such example and claim that another language that takes more keystrokes for the same result is "stupid". By that measure, almost every language (including REBOL) is "stupid" compared with APL.

I might add that I recently attended a meeting which included a presentation on Python; the presenter concluded with a single Python expression which, when evaluated, created an ASCII-art image of the Mandelbrot Set. Although the expression was long enough to line-wrap (multiple times! ;-) it was still far more compact than one would reasonably expect, and *way* far more compact than one could write in REBOL.

I am neither advocating for nor against Perl or REBOL here; my oft-stated view is that languages are tools, each with specific capabilities and strengths (and weaknesses). I have no time for arguments regarding whether a handsaw is "better" or "worse" than a screwdriver.

That said, I can play parlor games too! (To avoid appearing to contradict myself, let me point out that I don't take this exercise at all seriously...)

SUPPOSE that I have a memo containing phone numbers. I need to give someone a copy of the memo, but need to blot out the phone numbers, in the best CIA tradition. ;-) Here's the memo:

8<----------
Ms. Antoinette,

I spoke with George Washington at 555-1212 about our pending contract. He referred me to Ben Franklin (800-555-1111) of their technical support department. Ben said that they were testing their latest release on WhizBangOS version 17.3 as we had requested, and that he would have our answer tomorrow.

Ben also said that their lead developer, Betsy Ross, would like to talk to you about the use of complex numbers in the SystemSleepFor function. You may call her office at 123-4576, her cell phone at 987-6543, or page her at 111-1111. She is very eager to describe this new feature.

Sincerely,
Thomas Paine
8<----------

I can protect everybody's phone numbers from lurking telemarketers with the following one-liner in Perl:

8<----------
# perl -p -e 's/\b(\d{3}-)?\d{3}-\d{4}/####/g' memo.txt
Ms. Antoinette,

I spoke with George Washington at #### about our pending contract. He referred me to Ben Franklin (####) of their technical support department. Ben said that they were testing their latest release on WhizBangOS version 17.3 as we had requested, and that he would have our answer tomorrow.

Ben also said that their lead developer, Betsy Ross, would like to talk to you about the use of complex numbers in the SystemSleepFor function. You may call her office at ####, her cell phone at ####, or page her at ####. She is very eager to describe this new feature.

Sincerely,
Thomas Paine
8<----------

Anyone is welcome to propose a minimalist solution in REBOL!

-jn-

 [4/47] from: carl:cybercraft at: 14-Dec-2001 20:49


On 14-Dec-01, Joel Neely wrote:
> That said, I can play parlor games too! (To avoid appearing to
> contradict myself, let me point out that I don't take this exercise
<<quoted lines omitted: 36>>
> 8<----------
> Anyone is welcome to propose a minimalist solution in REBOL!

A two-liner is the best I can do Joel...

rebol[]foreach n parse/all f: read %memo.txt " ^/(),."[if 1 < length?
parse/all n "-"[change/part find f n "####" length? n]]print f

It could've been a bit shorter by making duplicate 1-character words for 'parse and 'length but we want to keep it "readable", right? (:

-- Carl Read

 [5/47] from: joel:neely:fedex at: 14-Dec-2001 6:43


Hi, Carl,

Close, but no cigar!

> > I can protect everybody's phone numbers from lurking telemarketers
> > with the following one-liner in Perl:
> >
> > 8<----------
> > # perl -p -e 's/\b(\d{3}-)?\d{3}-\d{4}/####/g' memo.txt
> > 8<----------
>

I should confess to an obvious typo; the above expression should have read

s/\b(\d{3}-)?\d{3}-\d{4}\b/####/g

of course! How silly of me!

> > Anyone is welcome to propose a minimalist solution in REBOL!
>
> A two-liner is the best I can do Joel...
>
> rebol[]foreach n parse/all f: read %memo.txt " ^/(),."[if 1 < length?
> parse/all n "-"[change/part find f n "####" length? n]]print f
>

The meaning of the Perl regular expression is very specific:

s/            # substitute for this pattern...
  \b          # boundary (e.g. whitespace or beginning of line)
  (           # begin subpattern
    \d{3}     # exactly three digits
    -         # followed by a hyphen
  )?          # end subpattern and make it optional
  \d{3}       # exactly three digits
  -           # followed by a hyphen,
  \d{4}       # exactly four more digits, and
  \b          # a boundary (whitespace or end of line)
/####/gx;     # ... four octothorps wherever possible

Other occurrences of hyphens are not relevant. (The above is also legal Perl, BTW.) This means that if the memo reads:

8<----------
Ms. Antoinette,

On 14-Dec-2001 I spoke with George Washington at 555-1212 about our pending contract. He referred me to Ben Franklin (800-555-1111) of their technical support department. Ben said that they were testing their latest release (described in the letter from Albert Jones-Smythe sent on 12-01-2001) on WhizBangOS version 17.3 as we had requested in our memo of 28-Nov-2001, and that he would have our answer tomorrow.

Ben also said that their lead developer, Betsy Ross, would like to talk to you about the use of complex numbers in the SystemSleepFor function. You may call her office at 123-4576; her cell phone is 987-6543; her pager is 111-1111. She is very eager to describe this new feature.

Sincerely,
Thomas Paine
8<----------

the Perl one-liner produces:

8<----------
Ms. Antoinette,

On 14-Dec-2001 I spoke with George Washington at #### about our pending contract. He referred me to Ben Franklin (####) of their technical support department. Ben said that they were testing their latest release (described in the letter from Albert Jones-Smythe sent on 12-01-2001) on WhizBangOS version 17.3 as we had requested in our memo of 28-Nov-2001, and that he would have our answer tomorrow.

Ben also said that their lead developer, Betsy Ross, would like to talk to you about the use of complex numbers in the SystemSleepFor function. You may call her office at ####; her cell phone is ####; her pager is ####. She is very eager to describe this new feature.

Sincerely,
Thomas Paine
8<----------

but the REBOL code above does this instead:

8<----------
Ms. Antoinette,

On #### I spoke with George Washington at #### about our pending contract. He referred me to Ben Franklin (####) of their technical support department. Ben said that they were testing their latest release (described in the letter from Albert #### sent on ####) on WhizBangOS version 17.3 as we had requested in our memo of ####, and that he would have our answer tomorrow.

Ben also said that their lead developer, Betsy Ross, would like to talk to you about the use of complex numbers in the SystemSleepFor function. You may call her office at #### her cell phone is #### her pager is ####. She is very eager to describe this new feature.

Sincerely,
Thomas Paine
8<----------

Notice that the dates and hyphenated last name were blotted out as well as the phone numbers. Adding the necessary tests for character class and length would make the REBOL version noticeably longer.

-jn-

> It could've been a bit shorter by making duplicate 1-character words
> for 'parse and 'length but we want to keep it "readable", right? (:
>

Right! ;-) Also, including the definitions for P = PARSE and L = LENGTH would just about eat up the 20-character savings anyway, and the unspoken "fairness" rule is that any definitions outside the standard installation features of the language would be included.

-jn-

--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
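
The difference between the two approaches in this message can be reproduced in a short Python sketch. `blot_regex` uses the corrected pattern quoted above; `blot_any_hyphenated` is only a rough analogue of the token-based REBOL two-liner (hide every token that contains a hyphen), written as an assumption about its behavior rather than a translation. Both function names and the sample sentence are made up for illustration.

```python
import re

# The corrected pattern from this message: optional area code, exchange,
# four-digit line number, with a word boundary on each side.
PHONE = re.compile(r"\b(\d{3}-)?\d{3}-\d{4}\b")

# Rough analogue (assumption) of the token approach: any run of
# non-delimiter characters that contains at least one hyphen.
HYPHENATED_TOKEN = re.compile(r"[^\s(),.]*-[^\s(),.]*")


def blot_regex(text):
    """Hide only strings shaped like NNN-NNNN or NNN-NNN-NNNN."""
    return PHONE.sub("####", text)


def blot_any_hyphenated(text):
    """Hide every hyphenated token, whatever it contains."""
    return HYPHENATED_TOKEN.sub("####", text)


memo = "On 14-Dec-2001 I called 555-1212 and (800-555-1111) about Jones-Smythe."
print(blot_regex(memo))
# On 14-Dec-2001 I called #### and (####) about Jones-Smythe.
print(blot_any_hyphenated(memo))
# On #### I called #### and (####) about ####.
```

The second function over-matches for the same reason described above: it tests only for the presence of a hyphen, not for character class or length.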

 [6/47] from: petr:krenzelok:trz:cz at: 14-Dec-2001 14:41


Hi,

Carl used parse without any need to define a rule, e.g. parse some-string none will give you a block of strings separated by space

But why not to use rule? Because of having everything on one line? OK, with rebol - it is pretty legal to have whole script on one line (if you don't use comments of course :-), so:

digits: charset "0123456789"
spacer: #"-"
tel-num: [3 digits spacer some [digits | spacer]]
parse/all text [some [start: tel-num end: (print copy/part start end) | skip]]

555-1212
800-555-1111
123-4576
987-6543
111-1111
== true

I know that tel-num rule allows for some 999------, but that is sufficient for our demo ... Just put on one line, shorten word names, and replace rebol code in parens to get your result ... If someone finds easier solution (e.g. iteration based) for my string replacement, cool then :-)

hide-it: func [s e][ l: length? copy/part s e remove/part s e
insert/dup s "#" l)
parse/all text [some [start: tel-num end: (hide-it) | skip]]
; <----- clear and readable, isn't it? :-)

Well, on the other hand not so straightforward as your Perl example ...

Cheers,
-pekr-
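
As a cross-check of the rule above, here is a small Python sketch. The pattern `\d{3}-[\d-]+` is an assumed regex analogue of `tel-num` (three digits, a hyphen, then one or more digits or hyphens), and `hide_numbers` mirrors what `hide-it` appears to aim for: replacing each match with as many `#` characters as it is long. The names and the sample sentence are illustrative only.

```python
import re

# Assumed regex analogue of: tel-num: [3 digits spacer some [digits | spacer]]
TEL_NUM = re.compile(r"\d{3}-[\d-]+")


def find_numbers(text):
    """List every substring that matches the loose tel-num shape."""
    return TEL_NUM.findall(text)


def hide_numbers(text):
    """Replace each match with '#' repeated to the same length."""
    return TEL_NUM.sub(lambda m: "#" * len(m.group()), text)


memo = "Call 555-1212 or Ben (800-555-1111); pager 111-1111, version 17.3."
print(find_numbers(memo))  # ['555-1212', '800-555-1111', '111-1111']
print(hide_numbers(memo))
```

Like the original rule, this pattern also accepts degenerate strings such as 999------, which is fine for a demo and not for production use.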

 [7/47] from: doug:vos:eds at: 14-Dec-2001 9:46


Thought I would reply to both real quick.

John correctly interpreted my original remarks, as simply being 'funny flamebait' - and provided a great reply in the form of a usable perl 1 liner. John Sequeira wrote thus {
Can't... resist ... replying to .... flamebait ...
>> perl -MDate::Manip -e "print q(bad date) unless &ParseDate( q( 29-Feb-1999 ) )"
}

I also want to commend the ever so wise, poly-linguistic, Joel Neely, for his sage advice, and practical reminder about ROI.

When we talk about ROI in real world business, there are obviously situations where languages like Java and Perl should be leveraged to get the job done quickly. Because of the huge base of code already written in perl for the past 14 years, it has advantages. See for instance the reboLDAP gateway project, http://vvn.net/reboldap which uses both REBOL and PERL to get the job done quickly for the fastest ROI ...

When you talk about tools, (eg. toothpick, toothbrush, hammer, chisel, chain-saw, jack-hammer, duct-tape, paper-clip) you must pick the right one for the job.

I have no interest in promoting a competition of one-liners. Simplicity is what I'm after. Even organic simplicity. Consider the lowly earthworm -- A great tool for converting garbage into good soil. Complex solutions don't impress me. Simple solutions to complex problems impress me.

- Doug

 [8/47] from: rotenca:telvia:it at: 14-Dec-2001 16:30


Hi Petr,

> But why not to use rule? Because of having everything on one line? OK, with
> rebol - it is pretty legal to have whole script on one line (if you don't
> use comments of course :-), so:

For joke not for war, this is my one-instruction, one-line solution:

print (parse/all s: read %memo.txt [(digit: charset [#"0" - #"9"] div: charset [#"^/" #"^-" #"(" #")" #";" #"." #"," #"{" #"}"]) any [[opt div h: [3 digit "-" 3 digit "-" | 3 digit "-" ] 4 digit any digit h1: opt div (change h head insert/dup clear "" "#" (index? h1) - index? h)] | skip]] s)

divisors in div can be changed as you like.

---
Ciao
Romano

 [9/47] from: rpgwriter:y:ahoo at: 14-Dec-2001 10:17


--- Joel Neely <[joel--neely--fedex--com]> wrote:
> Hi, Doug,
> "Vos, Doug" wrote:
<<quoted lines omitted: 164>>
> Anyone is welcome to propose a minimalist solution
> in REBOL!
I haven't used parse much, but it seems like a fairly trivial application of it.

 [10/47] from: chris:starforge at: 14-Dec-2001 19:16


#Friday 14 December 2001 18:17# Message from Christopher Dicely:
> I haven't used parse much, but it seems like a fairly
> trivial application of it.

I don't mean to complain, but was it really necessary to quote 8k of text just to add a two line comment?

Chris

--
.------{ http://www.starforge.co.uk }-----. .---------------------------.
=[ Explorer2260, Designer and Coder \=\ P: TexMaker, ROACH, site \
=[___You_will_obey_your_corporate_masters___]==[ Stack: EETmTmTRRSS------- ]
--
Confession is good for the soul only in the sense that a tweed coat is good for dandruff.
-- Peter de Vries

 [11/47] from: rpgwriter:yaho:o at: 14-Dec-2001 16:13


--- Chris <[chris--starforge--co--uk]> wrote:
> #Friday 14 December 2001 18:17# Message from
> Christopher Dicely:
>
> > I haven't used parse much, but it seems like a fairly
> > trivial application of it.

Well, no, it wasn't, and I actually meant to back out of that response, not send it. Sorry.

Chris

 [12/47] from: carl:cybercraft at: 15-Dec-2001 22:52


On 15-Dec-01, Joel Neely wrote:
> Hi, Carl,
Hi Joel,
> Close, but no cigar!
Aw, gee...
>> I can protect everybody's phone numbers from lurking telemarketers
>> with the following one-liner in Perl:
<<quoted lines omitted: 84>>
> character class and length would make the REBOL version
> noticeably longer.

Well, you didn't say it was for any old phone numbers in any old text, did you? Will your Perl script get this right for instance...

{New Zealand Telecom's phone numbers generally include spaces as can be seen by a quick look at http://www.yellowpages.co.nz/ (Search for hotel or the like.) So these are standard NZ phone numbers...

National: 1-2-345 6789
Local: 345 6789
0800: 0800 123 456
}

? (:

But anyway, the following will parse your 3-4/3-3-4 phone syntax (I think)...

rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some
[a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f

and it's noticeably (6 characters:) shorter (not longer!) than my previous version. (:

I couldn't have done it without Petr's example though, as what I was trying wasn't working. (: But I now know a lot more about parsing than I did yesterday...

-- Carl Read

 [13/47] from: tomc:darkwing:uoregon at: 15-Dec-2001 2:09


Hmmm

While playing with a one liner for this thread I came across a disturbing difference between core on Solaris and Win. here I have split the offending line with a print to see what was happening;

--><--
d: charset{#{0}-#{9}}a: complement d
parse m: read %foo r:[any a p: (print copy/part :p 8)
opt[3 d{-}4 d(change/part :p {###-####} 8)any r]skip any r]
--><--

using the same text (from below)
on win it works fine, on solaris it fails badly

on windows ...
555-1212
800-555-
00-555-1
0-555-11
555-1111
17.3 as
7.3 as w
3 as we
123-4576
987-6543
111-1111

on solaris ...
-1212 ab
00-555-1
0-555-11
-555-111
-1111) o
-4576, h
987-6543
-6543, o
-1111. S

 [14/47] from: carl:cybercraft at: 15-Dec-2001 23:12


On 15-Dec-01, Petr Krenzelok wrote:
> Hi,
> Carl used parse without any need to define a rule, e.g. parse
> some-string none will give you a block of strings separated by space
I planned to use a rule at the start, but then noticed that just a check of length would do the trick. But then Joel changed the rules... (;
> But why not to use rule? Because of having everything on one line?
> OK, with rebol - it is pretty legal to have whole script on one line
<<quoted lines omitted: 17>>
> parse/all text [some [start: tel-num end: (hide-it) | skip]]
> ; <----- clear and readable, isn't it? :-)
The height of clarity - but it doesn't work. (: The hide-it func ends in a ")" not a "]" and where it's used in the parse line the 's and 'e parameters aren't given. I take it you didn't test it? (: But never mind. When fixed it works, unlike what I was doing till I read your post...
> Well, on the other hand
> not so straightforward as your Perl example ...

Perhaps Joel would now like to write a version in Perl that's designed to be as clear as possible as opposed to as short as possible?

> Cheers,
> -pekr-
-- Carl Read

 [15/47] from: carl:cybercraft at: 16-Dec-2001 0:44


On 15-Dec-01, Tom Conlin wrote:
> Hmmm
> While playing with a one liner for this thread I came across a
<<quoted lines omitted: 6>>
> opt[3 d{-}4 d(change/part :p {###-####} 8)any r]skip any r]
> --><--
On Amiga I get the same results as Solaris. I at first thought it might've been an end-of-line problem, but it somehow doesn't look like that, though have you tried read/string or any other of the likely 'read refinements? And are you sure the scripts are exactly the same, along with the text-files?
> using the same text (from below)
> on win it works fine, on solaris it fails badly
<<quoted lines omitted: 65>>
>> [rebol-request--rebol--com] with "unsubscribe" in the
>> subject, without the quotes.
-- Carl Read

 [16/47] from: joel:neely:fedex at: 15-Dec-2001 4:00


Hi, Doug,

Vos, Doug wrote:
> Thought I would reply to both real quick.
>
> John correctly interpreted my original remarks,
> as simply being 'funny flamebait' ...
>

He who posts flamebait had best be wearing asbestos long johns... ;-)

Seriously, I didn't intend to sound negative or critical, but I have a long history with language "holy wars" going back to the days when the wars were over FORTRAN vs. COBOL vs. Assembler.

Almost every language I know has *something* it can do more compactly than almost all other languages. Hence my own knee-jerk reaction that pulling out one such example proves little.

As for the rest, consider my experience of showing REBOL to a Perl bigo^H^H^H^Hfan whose knee-jerk reaction was that since regular expressions are more compact than PARSE rules, REBOL must be stupid. (Those weren't the exact words, but the attitude was clear.) You can imagine how happy I was(N'T) over *that* reaction.

-jn-

--
Pardon the french...
-- Gustav Malmfors (in an international mailing list)
joel!dot!neely!FIX!PUNCTUATION!at!fedex!dot!com

 [18/47] from: joel:neely:fedex at: 15-Dec-2001 4:40


Carl Read wrote:
> Well, you didn't say it was for any old phone numbers in any
> old text, did you?
>

No, but I did say "memo"; I suspect most of us would consider

    Albert Jones-Smythe
    ... on 12-01-2001
    ... of 28-Nov-2001

as neither unusual text nor phone numbers. ;-)

> Will your Perl script get this right for instance...
> So these are standard NZ phone numbers...
>
> National: 1-2-345 6789
> Local: 345 6789
> 0800: 0800 123 456
>

No, it wouldn't. I should have noted my deliberate decision to handle only a common form of US phone numbers (but the regular expression was certainly clear on that point ;-).

Of course, the big danger of trying to write code that handles arbitrary human-written/readable text is that humans can use their context-sensitive intelligence to interpret (and even correct) a wide range of syntactical variation. The only way around that AFAIK is to impose limits on the amount of variation before the program either gives up or raises the case for discussion.

For example, the simplest substitution rule that covers all of your NZ samples, as well as common variations on US phone numbers would be

s/\b[- .()\d]{8,}\b/####/g

which would replace *any* run (of at least 8 characters) of digits, hyphens, spaces, dots, and parentheses with the "blot-out" string. But, of course, that also would affect strings such as

    31-12-2001
and
    3.1415926535

which a human would likely *not* interpret as phone numbers.

IIRC, there was a thread a few months ago about trying to come up with a PARSE rule which would recognize phone numbers from as many countries as possible with as few errors (false positives and false negatives) as possible. As I recall, the result was that the range of variation as one added countries with differing conventions rapidly made the task infeasible.
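
The false-positive problem with the broad rule can be checked with a few lines of Python. The pattern is the one quoted above; the sample list and the name `BROAD` are illustrative, and each sample is tested in isolation as a bare string, not embedded in a sentence.

```python
import re

# The broad rule from the message: any run of 8 or more digits, hyphens,
# spaces, dots and parentheses, bounded by word boundaries.
BROAD = re.compile(r"\b[- .()\d]{8,}\b")

samples = [
    "1-2-345 6789",   # NZ national number
    "345 6789",       # NZ local number
    "0800 123 456",   # NZ 0800 number
    "800-555-1111",   # US number with area code
    "31-12-2001",     # a date, not a phone number
    "3.1415926535",   # a decimal number, not a phone number
]

for s in samples:
    # Every sample, including the date and the decimal, is blotted out.
    print(repr(s), "->", repr(BROAD.sub("####", s)))
```

All six inputs come out as `####`, which is exactly the trade-off described: the rule is wide enough to catch the NZ formats, and therefore wide enough to catch things a human reader would never take for a phone number.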
> rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some
> [a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
>

Very nice! I find it hard to imagine a shorter solution in REBOL.

> I couldn't have done it without Petr's example though, as what
> I was trying wasn't working. (: But I now know a lot more about
> parsing than I did yesterday...
>

And *THAT*, to my way of thinking, is the real payoff to such parlor games as this -- we improve our grasp of what is (or isn't) feasible with one tool or another, and learn to use our tool(s) better.

-jn-

--
One job listing included the qualification "sense of humor" just after "ability to meet stringent deadlines". Man, ain't that the truth!
-- Erik Naggum
joel:dot:neely:at:fedex:FIX:PUNCTUATION:dot:com

 [19/47] from: petr:krenzelok:trz:cz at: 15-Dec-2001 18:01


Carl Read wrote:
>>hide-it: func [s e][ l: length? copy/part s e remove/part s e
>>insert/dup s "#" l)
<<quoted lines omitted: 7>>
>But never mind. When fixed it works, unlike what I was doing till I
>read your post...

Ah, sorry - it was a last minute hack, to have paren in parse rule short and readable :-) Now as I think of it, it would be easier to have some buff: {} to copy result to, and build string from scratch ...

But anyway .... I would still prefer building clear and readable rules, even on several lines and put them into separate .r file, instead of having messy code as shown with perl.

-pekr-

 [20/47] from: petr:krenzelok:trz:cz at: 15-Dec-2001 18:09


Joel Neely wrote:
>Carl Read wrote:
>>Well, you didn't say it was for any old phone numbers in any
<<quoted lines omitted: 40>>
>>
>Very nice! I find it hard to imagine a shorter solution in REBOL.
eh, I can't read it right now - what does 1 2 [3 c "-"] 4 c is doing? :-) Thanks ... I also found out there is some strange 'opt keyword in parse rules :-)
>>I couldn't have done it without Petr's example though, as what
>>I was trying wasn't working. (: But I now know a lot more about
<<quoted lines omitted: 4>>
>feasible with one tool or another, and learn to use our tool(s)
>better.

Yes! That's exactly it. Language flame wars could be seen so often, that I am no longer interested in them :-) On the other hand I have no time to study other languages. Sometimes though I call my friend knowing php e.g. and ask him, how would he solve it in the tool he uses. Then I try to think about it and map to what I know about Rebol, and try to find adequate solution ...

-pekr-

 [21/47] from: joel:neely:fedex at: 15-Dec-2001 5:12


Hi, Carl,

Carl Read wrote:
> I planned to use a rule at the start, but then noticed that just a
> check of length would do the trick. But then Joel changed the
> rules... (;
>

Moi? "Changed"? Mais non! Just "clarified" for those who don't read REs (but can't everybody read regular expressions? ;-)

> >
> > Well, on the other hand
> > not so straightforward as your Perl example ...
>
> Perhaps Joel would now like to write a version in Perl that's
> designed to be as clear as possible as opposed to as short as
> possible?
>

To the everyday Perl hacker, what I published actually is both! No, really! ;-)

The only things I could do to obviousify the script would be to write an explicit read/print loop (that's what the -p switch does) and embed comments in the RE that is the match pattern for the substitution. That would give us something like the following:

8<----------
#!/usr/bin/perl -w
while (<>) {      # loop over all input (all file arguments)
  s/              # substitute in current line for this pattern...
    \b            # boundary (e.g. whitespace or beginning of line)
    (             # begin subpattern
      \d{3}       # exactly three digits
      -           # followed by a hyphen
    )?            # end subpattern and make it optional
    \d{3}         # exactly three digits
    -             # followed by a hyphen,
    \d{4}         # exactly four more digits, and
    \b            # a boundary (whitespace or end of line)
  /####/gx;       # ... four octothorps wherever possible
  print;          # print current line after substitution(s)
}
8<----------

but the addition of all the comments is hardly an improvement to a Perl programmer. That would be like showing someone who knows elementary algebra a paragraph of text that describes the quadratic formula, instead of simply writing (pardon the ASCII art...)

               ____________
          +   /  2
     -b   -  / b  - 4 a c
            V
     -------------------
             2 a

The notation really is intended to be minimalist; the price of using it is taking a little time to learn something new, as is the case with all programming languages. I suspect the person that has never seen REBOL before would find the comparable PARSE rule less than obvious as well.

Of course, I could define variables to hold the parts of the RE and give them mnemonic names ...

8<----------
#!/usr/bin/perl -w
my $areacode = '(\d{3}-)?';    # 3 digits and hyphen, optional
my $exchange = '\d{3}-';       # 3 digits and hyphen
my $localine = '\d{4}';        # 4 digits
my $phonenbr = "$areacode$exchange$localine";
while (<>) {                   # loop over all input
  s/\b$phonenbr\b/####/gx;     # hiding phone numbers in each line
  print;                       # and print the line
}
8<----------

... but anyone who knows Perl will see that I had to do something subtle to make that work. Perhaps that would be more readable to some?

What would one do in REBOL to make the PARSE rules more obvious to someone who doesn't speak REBOL as a native?

-jn-

--
The hardest problem in computer science is finding a problem to solve your solution.
-- Aaron Watters
joel(dot(FIX(PUNCTUATION(neely(at(fedex(dot(com
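
The "named pieces" idea from the second script carries over to other regex dialects. The following is a minimal Python sketch under that assumption; the constant names and `hide_phone_numbers` are hypothetical, and raw strings are used so the backslashes reach the regex engine unchanged.

```python
import re

# Each piece of the pattern gets a mnemonic name, as in the Perl script.
AREA_CODE = r"(?:\d{3}-)?"   # 3 digits and hyphen, optional
EXCHANGE = r"\d{3}-"         # 3 digits and hyphen
LINE_NUMBER = r"\d{4}"       # 4 digits
PHONE_NUMBER = AREA_CODE + EXCHANGE + LINE_NUMBER

# Word boundaries are added once, around the assembled pattern.
PHONE_RE = re.compile(r"\b" + PHONE_NUMBER + r"\b")


def hide_phone_numbers(line):
    """Replace every US-style phone number in line with '####'."""
    return PHONE_RE.sub("####", line)


print(hide_phone_numbers("Call 800-555-1111 or 555-1212."))
# Call #### or ####.
print(hide_phone_numbers("sent on 12-01-2001"))
# sent on 12-01-2001
```

Whether this reads better than the single compact pattern is the same judgment call raised above: the names help a newcomer, and add little for someone fluent in the notation.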

 [22/47] from: joel:neely:fedex at: 15-Dec-2001 7:00


Petr Krenzelok wrote:
>
>
> >>rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some
<<quoted lines omitted: 6>>
> eh, I can't read it right now - what does 1 2 [3 c "-"] 4 c
> is doing?

One or two occurrences of (three digits and a hyphen), all followed by four digits. It seems to me that there's a nice process of refactoring/refinement underlying that one:

    [c c c "-" c c c c] | [c c c "-" c c c "-" c c c c]

    [3 c "-" 4 c] | [3 c "-" 3 c "-" 4 c]

    [[3 c "-"] | [3 c "-" 3 c "-"]] 4 c

    1 2 [3 c "-"] 4 c

-jn-

--
A one-liner is a joke, not a comment.
-- Christopher Hart
joel)dot)neely)at)fedex)FIX)PUNCTUATION)dot)com
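
The four refactoring steps above can be checked for equivalence using regex analogues in Python. These patterns are assumed translations of the PARSE rules, not REBOL itself (PARSE's alternation and backtracking rules differ from a regex engine's), so the sketch only shows that the regex forms accept the same whole strings.

```python
import re

# Regex analogues (assumptions) of the four PARSE forms, in order.
FORMS = [
    r"\d\d\d-\d\d\d\d|\d\d\d-\d\d\d-\d\d\d\d",
    r"\d{3}-\d{4}|\d{3}-\d{3}-\d{4}",
    r"(?:\d{3}-|\d{3}-\d{3}-)\d{4}",
    r"(?:\d{3}-){1,2}\d{4}",
]

SAMPLES = ["555-1212", "800-555-1111", "5551212", "555-121", "1-800-555-1111"]


def verdicts(pattern):
    """Return, for each sample, whether the whole string matches pattern."""
    return [re.fullmatch(pattern, s) is not None for s in SAMPLES]


results = [verdicts(p) for p in FORMS]
# Every form should give the same accept/reject answers.
assert all(r == results[0] for r in results)
print(results[0])  # [True, True, False, False, False]
```

Only the first two samples are accepted by any of the forms, which matches the "one or two groups of three digits and a hyphen, then four digits" reading.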

 [23/47] from: ammonjohnson:yaho:o at: 15-Dec-2001 13:24


<snip> > >>parsing than I did yesterday...
<<quoted lines omitted: 12>>
> adequate solution ... > -pekr-
</Snip> I found that is the most powerful way to program. ;-) Enjoy!! Ammon

 [24/47] from: joel:neely:fedex at: 15-Dec-2001 7:04


Petr Krenzelok wrote:
> .... I would still prefer building clear and readable rules,
> even on several lines and put them into separate .r file,
> instead of having messy code as shown with perl.
>

What aspect strikes you as "messy"? (Not arguing, just curious as to which of the many ways -- including some not discussed on this list -- you might find less "messy"?)

I'll certainly grant that Perl can be cryptic to someone who has not learned some of the notation, but (again) that's true of almost EVERY notation, especially those that are highly compact.

-jn-

P.S.: The sig block below was selected for this email by a random number generator. Coincidence or not? You be the judge! ;-)

--
Real Men don't read instruction manuals.
-- Tim Allen (Home Improvement)
joel)dot)FIX)PUNCTUATION)neely)at)fedex)dot)com

 [25/47] from: carl:cybercraft at: 16-Dec-2001 8:32


On 16-Dec-01, Petr Krenzelok wrote:
>>> rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some
>>> [a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
<<quoted lines omitted: 4>>
> eh, I can't read it right now - what does 1 2 [3 c "-"] 4 c is
> doing? :-)
Find 1 or 2 "000-"s followed by a "0000". The part of my code that was working before I had to steal some of yours to make the rest of it work. (: -- Carl Read

 [26/47] from: carl:cybercraft at: 16-Dec-2001 8:01


On 15-Dec-01, Joel Neely wrote:
> IIRC, there was a thread a few months ago about trying to come > up with a PARSE rule which would recognize phone numbers from > as many countries as possible with as few errors (false positives > and false negatives) as possible. As I recall, the result was > that the range of variation as one added countries with differing > conventions rapidly made the task infeasible.
Part of the problem is RT's decision to allow 12-12-2001 as a date datatype along with other forms for dates. If they hadn't, hyphens-in-digits could've been used as a phone datatype, though I guess there are still places where 4 digits are typical phone numbers. But then, to-phone would get around that, as with to-decimal 10.
>> rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some >> [a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
<<quoted lines omitted: 6>>
> feasible with one tool or another, and learn to use our tool(s) > better.
Yep. -- Carl Read

 [27/47] from: carl:cybercraft at: 16-Dec-2001 8:20


On 15-Dec-01, Joel Neely wrote:
>> Perhaps Joel would now like to write a version in Perl that's >> designed to be as clear as possible as opposed to as short as
<<quoted lines omitted: 38>>
> that has never seen REBOL before would find the comparable > PARSE rule less than obvious as well.
Fair enough, though there is a "read", a "parse" and a "change" in my REBOL example which does give some indication of what's being done with the file-name, while the file-name is the only thing any outsider would recognise in your Perl example.
> Of course, I could define variables to hold the parts of the RE > and give them mnemonic names ...
<<quoted lines omitted: 15>>
> doesn't speak REBOL as a native? > -jn-
-- Carl Read

 [28/47] from: rotenca:telvia:it at: 15-Dec-2001 21:34


Are these correct?
>> f: "123-123-234-2112-3444" parse/all f [some[a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
123-####-3444
>> f: "123-234-2112whatisthis?" parse/all f [some[a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
####whatisthis?
>> f: "k-123+123-2112*2/4" parse/all f [some[a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
k-123+####*2/4
>> f: "k123123-2112-3444" parse/all f [some[a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
k123####-3444

---
Ciao
Romano

 [29/47] from: carl:cybercraft at: 16-Dec-2001 12:20


On 15-Dec-01, Joel Neely wrote:
> Of course, I could define variables to hold the parts of the RE > and give them mnemonic names ...
<<quoted lines omitted: 12>>
> subtle to make that work. > Perhaps that would be more readable to some?
Even without the comments it at least lets you know it's doing something with phone numbers, though you wouldn't pick up they were being blanked out just by the "####" in the line.
> What would one do > in REBOL to make the PARSE rules more obvious to someone who > doesn't speak REBOL as a native?
Well, this is how we're more or less supposed to write REBOL code...

    rebol[]

    blank-phone-numbers: func [
        {Blanks out any phone numbers found in a string.
        Only phone numbers of the form: ###-#### or
        ###-###-#### are changed.}
        text-file [string!]
        /local blank-number digits match-start match-end
    ][
        blank-number: func [num-start num-end][
            change/part num-start "########" num-end
        ]
        digits: charset "0123456789"
        parse/all text-file [
            some [
                match-start: 1 2 [3 digits "-"] 4 digits match-end:
                (blank-number match-start match-end)
                | skip
            ]
        ]
    ]

    text-file: read %memo.txt
    blank-phone-numbers text-file
    print text-file

...and of course nearly always do. (;

Apart from the longer "####" (to reduce line-shrinkage) that performs the same as my two-liner. And someone might just twig that...

    1 2 [3 digits "-"] 4 digits

is how it finds the phone numbers and be able to change it to catch (say)...

    ##-#### or ##-##-####

    1 2 [2 digits "-"] 4 digits

or...

    ###-###-###-#### or ###-###-#### or ###-####

    1 3 [3 digits "-"] 4 digits

and so on. (Made up phone numbering formats of course.)

-- Carl Read
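The point about changing the repeat counts to catch other formats can be sketched with a small pattern builder. This is an illustration in Python rather than REBOL, and `phone_rule` and `blank_numbers` are hypothetical names, not anything from the thread.

```python
import re


def phone_rule(min_groups, max_groups, group_len=3, tail_len=4):
    """Build a regex for min..max groups of `group_len` digits plus "-",
    followed by `tail_len` digits (mirrors  1 2 [3 digits "-"] 4 digits)."""
    return re.compile(
        r"(?:\d{%d}-){%d,%d}\d{%d}" % (group_len, min_groups, max_groups, tail_len)
    )


def blank_numbers(text, rule):
    # "########" is the longer blanking string used in the function above.
    return rule.sub("########", text)


# The made-up ##-#### / ##-##-#### format: 1 to 2 groups of 2 digits.
print(blank_numbers("Ring 12-34-5678 now", phone_rule(1, 2, group_len=2)))
```

Only the three numbers in the call change between formats, which is the same property the named PARSE rule has.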

 [30/47] from: carl:cybercraft at: 16-Dec-2001 13:25


On 16-Dec-01, Romano Paolo Tenca wrote:
> Are these correct? >>> f: "123-123-234-2112-3444" parse/all f [some[a: 1 2[3 c "-"]4 c > b:(change/part a "####" b) | skip] > ]print f > 123-####-3444
No - that's not correct. Joel says phone numbers are of the form ###-#### or ###-###-#### so phoning "123-123-234-2112-3444" would give you a number-not-found error...
>>> f: "123-234-2112whatisthis?" parse/all f [some[a: 1 2[3 c "-"]4 c > b:(change/part a "####" b) | s > kip]]print f > ####whatisthis?
No, "whatisthis?" is not how you write sentences - words are supposed to have spaces between them...
> f: "k-123+123-2112*2/4" parse/all f [some[a: 1 2[3 c "-"]4 c > b:(change/part a "####" b) | skip]]p
<<quoted lines omitted: 5>>
> nt f > k123####-3444
Well, 123-4567 could be a phone number or arithmetic too - though to REBOL's mind it's an invalid date. But the rule is, if in doubt, blot it out! (: So, can you write text containing phone numbers of the form ###-#### or ###-###-#### that /won't/ be blotted out by that parsing rule? (Hint: You can! :)
-- Carl Read

 [31/47] from: joel:neely:fedex at: 16-Dec-2001 1:27


Hi, Romano, Romano Paolo Tenca wrote:
> Are these correct? >
No. And the simple Perl one-liner I offered doesn't get them all right either.
> >>f: "123-123-234-2112-3444" parse/all f [some[a: 1 2[3 c "-"] > 4 c b:(change/part a "####" b) | skip]]print f
<<quoted lines omitted: 8>>
> 4 c b:(change/part a "####" b) | skip]]print f > k123####-3444
Adding your test cases to the memo produces the following when run through my simple pattern match/substitute...

8<----------
Ms. Antoinette,

On 14-Dec-2001 I spoke with George Washington at #### about our pending contract. He referred me to Ben Franklin (####) of their technical support department. Ben said that they were testing their latest release (described in the letter from Albert Jones-Smythe sent on 12-01-2001) on WhizBangOS version 17.3 as we had requested in our memo of 28-Nov-2001, and that he would have our answer tomorrow.

Ben also said that their lead developer, Betsy Ross, would like to talk to you about the use of complex numbers in the SystemSleepFor function. You may call her office at ####; her cell phone is ####; her pager is ####. She is very eager to describe this new feature.

Sincerely,
Thomas Paine

P.S.: Here are Romano's "torture tests" for phone hiding:

123-####-3444
123-234-2112whatisthis?
k-123+####*2/4
k123123-2112-3444
8<----------

The \b actually matches "word boundaries" separating "word characters" (which could be used in a Perl identifier) from non-word characters (all others, including beginning/end of line). The "-" in the first torture case and the "+" and "*" in the third torture case are non-word characters, so they allow the embedded digit-hyphen-digit... pattern to be found.

However, I suspect that the torture cases above have taken us well into what I think of as "fractal territory", a metaphor (based on the Mandelbrot set) for my experience in some kinds of programming tasks.

The entire Mandelbrot set lies within a circle in the complex plane centered on the origin, with a radius of 2. If you just want to know that you've enclosed the entire Mandelbrot set, draw that circle and say, "It's in there." If you need a bit more detail, it's approximately a cardioid with the dimple on the right, and a smaller circle attached at the left. If you really insist on getting all of the details correct, you'll be crunching numbers forever.
By analogy, many problems I've worked with (e.g., parsing phone numbers, mailing addresses, and other human-interpretable text) have a trivial solution that's only right in the most general sense, and more detailed solutions that improve the precision. However, there's almost always some exceptional case that defies the solution at hand, and does so in such a way that one either has to add special-case logic or throw the entire approach away and start over from scratch. The complexity never goes away completely...

I've run into this so many times that I've even claimed naming rights to a new fundamental principle:

    Neely's First Law of Systems
    (also known as "Monotonicity of Complexity")

    Complexity is like entropy; you can't decrease it and doing almost anything increases it. You can hide it, cover it up, pretend it's not there (until later), or make it somebody else's problem, but it won't go away.

;-)

To get back to your examples, as soon as we start trying to "guard" the phone number pattern from pathological contexts, we step into a swamp that I don't know a good way around. For example:

* We could say "must be bounded by spaces (or line ends)", but then we get tripped up if a phone number is at the end of a sentence (i.e. followed by a period).
* Ooops. What about complicated part/document ID numbers, such as "123-4567.890"? Well, the period at the end of a sentence must be followed by whitespace or the end of the line.
* Ooops. What about inside a compound sentence (e.g. followed by a comma or semicolon)? OK; add those to the set of allowed "follow" patterns.
* Ooops. What about bounded by parentheses? OK; add "(" to the set of leaders, and ")" to the set of followers.
* Ooops. What about the US convention of placing the area code in parentheses, such as "(800) 555-1212"?

... and the list goes on ...
At some point, I don't know any better solution than to say, "This will have to be good enough", put it into production, and deal with any subsequent oddities as they arise. Another of my quote-file entries says... Peter Salus -- The difference between theory and practice in theory is smaller than the difference between theory and practice in practice. As always, I'd be very interested to hear if anyone else has a better solution to this meta-problem (or is that "meta-solution"?) -jn- -- With every passing hour our solar system comes forty-three thousand miles closer to globular cluster M13 in the constellation Hercules, and still there are some misfits who continue to insist that there is no such thing as progress. -- Ransom K. Ferm joel^dot^FIX^PUNCTUATION^neely^at^fedex^dot^com
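The word-boundary behaviour described in this message can be reproduced with a short sketch. It is written in Python rather than Perl, and it assumes the substitution under discussion is `s/\b(\d{3}-)?\d{3}-\d{4}\b/####/g` as quoted elsewhere in this thread; `HIDE` and `hide_phones` are hypothetical names.

```python
import re

# Assumed to behave like the Perl substitution quoted in this thread:
#   s/\b(\d{3}-)?\d{3}-\d{4}\b/####/g
HIDE = re.compile(r"\b(?:\d{3}-)?\d{3}-\d{4}\b")


def hide_phones(text):
    """Blank anything that looks like ###-#### or ###-###-#### between word boundaries."""
    return HIDE.sub("####", text)


for case in [
    "123-123-234-2112-3444",
    "123-234-2112whatisthis?",
    "k-123+123-2112*2/4",
    "k123123-2112-3444",
]:
    print(case, "->", hide_phones(case))
```

The part-number case from the list above shows why the guard is not enough on its own: "123-4567.890" becomes "####.890", because the period counts as a non-word character and so satisfies the trailing \b.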

 [32/47] from: joel:neely:fedex at: 16-Dec-2001 2:21


Hi, Carl, Context is all... Carl Read wrote:
> On 15-Dec-01, Joel Neely wrote: > > > Of course, I could define variables to hold the parts of the RE > > and give them mnemonic names ...
...
> > Perhaps that would be more readable to some? > > Even without the comments it at least lets you know it's doing > something with phone numbers... > > > What would one do in REBOL to make PARSE rules more obvious ... > > Well, this is how we're more or less supposed to write REBOL > code... >
... 26 lines snipped ...
> ...and of course nearly always do. (; >
I think a big issue is the setting in which the code is written; I'd certainly write differently in a book/tutorial/article for beginners (where the assumption is that every detail NEEDS to be explained) than in my normal work setting (where the assumption is that my colleagues are all fairly skilled programmers). In the latter case, I suspect that

    #
    # hide US phone numbers, with optional area codes
    #
    s/\b(\d{3}-)?\d{3}-\d{4}\b/####/g;

is quite adequate. I guess I see it as the string-processing analogue of something like the following:

    my $DAYS_TO_LIVE = 45; # delete log entries after this
    ...
    my $expiration = $DAYS_TO_LIVE * 24 * 60 * 60 + time;

Where I expect a professional programmer to know

* that time is in seconds since epoch,
* hours per day, minutes per hour, and seconds per minute,
* that $DAYS_TO_LIVE is the only number capable of being revised by a change in policy.

And all of the work of defining names for the obvious (and immutable!) constants doesn't add any real value. However, audience is critical, and YMMV!

-jn-
--
I pretty much arbitrarily release whatever I have ready on the night before the release date.
-- Guido van Rossum
joel'dot'neely'at'fedex'FIX'PUNCTUATION'dot'com
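For comparison, the expiration arithmetic above looks like this as a Python sketch. The `expiration` function and its `now` parameter are assumptions added so the calculation can be checked with a fixed timestamp; they are not part of the original Perl.

```python
import time

DAYS_TO_LIVE = 45  # the only number a change in policy would revise


def expiration(now=None):
    """Seconds-since-epoch timestamp DAYS_TO_LIVE days after `now`."""
    if now is None:
        now = time.time()
    # hours/day * minutes/hour * seconds/minute, left unnamed on purpose
    return DAYS_TO_LIVE * 24 * 60 * 60 + now


print(expiration(0))  # 45 days expressed in seconds
```

As in the Perl version, only `DAYS_TO_LIVE` gets a name; the 24, 60 and 60 are left as literals.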

 [34/47] from: g:santilli:tiscalinet:it at: 16-Dec-2001 14:42


Hello Joel!

On 15-Dic-01, you wrote:

 JN> Perl bigo^H^H^H^Hfan whose knee-jerk reaction was that since
 JN> regular expressions are more compact than PARSE rules, REBOL
 JN> must be stupid. (Those weren't the exact words, but the

Personally I think we should never stress "compactness" about REBOL (except perhaps for the interpreter itself), but "simplicity".

Also, maybe if someone writes a regexp parser we can show to all those perl fans that we can do that too. ;-)) (Then maybe someone of them could write a PARSE-like function for perl, so that we'll get the extra benefit of making perl a bit better --- even if that will probably mean +5MB in the distribution. ;)

(Ok, ok, couldn't resist a bit of irony, please forgive me... :)

Regards,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/

 [35/47] from: greggirwin:mindspring at: 16-Dec-2001 12:36


Hi Joel,

<< As always, I'd be very interested to hear if anyone else has a better solution to this meta-problem (or is that "meta-solution"?) >>

It seems to me that you pointed it out in an earlier message when you made reference to how humans would identify phone numbers given potentially ambiguous data by using context. This is right up REBOL's alley, though far from the one liner approach. :)

Do you recall my solution to your "table parser" dilemma (look at things like a human)? Same idea, different rules and, to be really good, it should learn. In this case you have some basic rules to start with, given your example text:

    [opt proper-name] 'at [phone-num]
    [proper-name] '( [phone-num] ')
    ['cell | 'phone | 'mobile...] [opt 'number] 'is [phone-num]

What makes up a phone number? Groups of digits, with various separators between. When you hit alpha chars (probably with special checks for "PIN", "x", "ext" followed by more digits), you're done with the number.

Hmmm. Maybe a quick check for multiple numbers?

    [phone-num] 'or [phone-num]

Now, you run it and it shows you the results for confirmation, just like a new assistant you hire. You give it feedback, which it remembers, and over time it builds new rules to account for your style. At some point, it may do so well that you tell it "Don't ask me to proof your work unless you have some doubt about something". If something, like your example, is of a critical nature, you can always tell it "run this by me before you send it out", or "Add your own 'signature' so, if something is wrong, they can blame you." :)

We could, of course, contrive a million examples that would confuse it, just as they might confuse a person who, for example, had never seen a number in an international format.

One of the big things I believe is that we need to get over the perception that computers and software are faultless. We need to use them to lighten our load but any given system must build trust from its users over time and, even then, must be forgiven for its errors, just as we would its human counterpart. The world is just too lumpy.

--Gregg

 [36/47] from: joel:neely:fedex at: 17-Dec-2001 19:14


Hi, Gregg, Gregg Irwin wrote:
...
> Now, you run it and it shows you the results for confirmation, > just like a new assistant you hire. You give it feedback, which
<<quoted lines omitted: 5>>
> out", or "Add your own 'signature' so, if something is wrong, > they can blame you." :)
That's an interesting (although severely non-trivial) approach to the development issue, but I was describing a property of the problem space itself. Using myself as a case in point, I made up a list of the ways I've actually seen US phone numbers written or typed/typeset:

    551-1211
    800-552-1212
    1-800-553-1213
    1+800-554-1214
    800/555-1215
    (800)556-1216
    (800) 557-1217
    800.558.1218
    1.800.559.1219

Even handling this short list (with nicely commented/named code ;-) to find phone numbers in a text file required something roughly like the following:

8<------------------------------------------------------------
phones: make object! [
    defaultarea: "123"
    areadata: none
    exchdata: none
    linedata: none
    digits: charset "0123456789"
    plusminus: charset "+-"
    ldcode: ["1" plusminus | none]
    optgap: [" " | none]
    area: [copy areadata 3 digits]
    exch: [copy exchdata 3 digits]
    line: [copy linedata 4 digits]
    phonepatterns: [
        exch "-" line (areadata: none)
      | ldcode area "-" exch "-" line
      | area "/" exch "-" line
      | "(" area ")" optgap exch "-" line
      | ["1." | none] area "." exch "." line
    ]
    findphones: func [st [string!] /local result] [
        result: clear []
        parse/all st [
            any [
                phonepatterns (
                    append result rejoin [
                        any [areadata defaultarea]
                        "-" exchdata "-" linedata
                    ]
                )
              | skip
            ]
        ]
        result
    ]
    run: func [fn [file!] /local text] [
        text: read fn
        print rejoin [{"^/} text {^/"}]
        foreach phone findphones text [
            print [tab phone]
        ]
        print ""
    ]
]
phones/run %phones.txt
8<------------------------------------------------------------

I'm sure someone could tighten it up a bit, but that's not my main point here. This quick draft version is still sensitive to false positives (e.g. a line with a product number in it resembling

    #AB-1234-56789

). In addition, when I described this to a colleague, she immediately asked, "What about extensions?", raising the issue of multi-line phone systems where the numbers might be written/typed as

    800-555-1212x123
    808-554-1212 x 234
    888-556-1232 ext 456
    889-567-1242 ext. 9876
    898-576-1252/1234

(with the extension in combination with other phone number formats as appear in the code above).

Whether a human manufactures the rules, or a piece of AI software attempts to do so (and I suspect the human will do a better job at this point in history), the problem remains that the size of the rule set itself undergoes a combinatorial explosion as we try to take into account the variations in the data. And we haven't even tackled odd cases like the following

    You may call me at my office at 1-800-555-1212--I expect to be there until 5:00PM--to discuss our presentation.

which ultimately require an actual *understanding* of the text to achieve high accuracy. Hence my description of the problem domain itself as being metaphorically fractal.

-jn-
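The ordered alternatives in `phonepatterns` above translate fairly directly into a list of regular expressions tried in turn at each position. The following is a rough Python sketch of that idea, not a port of the REBOL object: `PHONE_PATTERNS`, `DEFAULT_AREA` and `find_phones` are hypothetical names, and extensions are not handled.

```python
import re

DEFAULT_AREA = "123"  # stands in for defaultarea in the REBOL draft

# One regex per alternative, in the same order as phonepatterns.
PHONE_PATTERNS = [
    re.compile(r"(?P<exch>\d{3})-(?P<line>\d{4})"),
    re.compile(r"(?:1[+-])?(?P<area>\d{3})-(?P<exch>\d{3})-(?P<line>\d{4})"),
    re.compile(r"(?P<area>\d{3})/(?P<exch>\d{3})-(?P<line>\d{4})"),
    re.compile(r"\((?P<area>\d{3})\) ?(?P<exch>\d{3})-(?P<line>\d{4})"),
    re.compile(r"(?:1\.)?(?P<area>\d{3})\.(?P<exch>\d{3})\.(?P<line>\d{4})"),
]


def find_phones(text):
    """Return numbers normalized to area-exch-line, scanning left to right."""
    result = []
    pos = 0
    while pos < len(text):
        for pattern in PHONE_PATTERNS:
            m = pattern.match(text, pos)
            if m:
                # The first pattern has no area group, so fall back to the default.
                area = m.groupdict().get("area") or DEFAULT_AREA
                result.append(f"{area}-{m['exch']}-{m['line']}")
                pos = m.end()
                break
        else:
            pos += 1  # nothing matched here: the equivalent of "| skip"
    return result


print(find_phones("Call 800/555-1215 or (800) 557-1217, or locally 551-1211."))
```

The sketch inherits the false positive mentioned above: a product number such as #AB-1234-56789 still yields a "phone number", because nothing checks what surrounds the match.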

 [37/47] from: greggirwin:mindspring at: 17-Dec-2001 22:48


Hi Joel,

<< That's an interesting (although severely non-trivial) approach to the development issue...>>

Well, I never said it would be *easy*. :)

<<...but I was describing a property of the problem space itself. >>

Right. This is where my thinking diverges...I'm not sure to where. Rather than trying to identify all the ways a phone number could be formatted, because there are just too many of them even if we don't account for typographical errors in formatting, let's just say that a phone number is:

    A bunch of numbers with, possibly, some separators at the start or between them. Once you get to something other than a number, separator, or one of our "special" text strings (e.g. x, ext, pin), that's it.

OK, that's a little vague, so we need to add some more rules.

If it comes after <proper-name> 'at, that's a good clue.

If it comes after <number|cell|...> 'is, that's a good clue.

If it looks like one of the following, that's a *really* good clue.

    551-1211
    800-552-1212
    1-800-553-1213
    1+800-554-1214
    800/555-1215
    (800)556-1216
    (800) 557-1217
    800.558.1218
    1.800.559.1219

If there are from 7 to 12 digits, and they're broken up into tuple-like groups, that's a good clue.

If you find a proper name, and that name is in our address book, you can verify their number or maybe their country at least.

Is this enough to handle all we might ever want? Probably not. Do you want to allow things like 1.2/3)4(5-6.7 8.9? Perhaps...no. :)

--Gregg
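The "collect clues" approach above could be prototyped along these lines. This is a deliberately crude Python sketch: the candidate pattern, the two clues it checks and the one-point-per-clue scoring are all assumptions made for illustration, and `CANDIDATE`, `CUE` and `score_candidates` are hypothetical names.

```python
import re

# A candidate is a run of digits and separators that starts with a digit
# or "(" and ends with a digit.
CANDIDATE = re.compile(r"[(\d][\d\s().+/-]*\d")
# Clue: the candidate comes right after the word "at" or "is".
CUE = re.compile(r"\b(?:at|is)\s*$")


def score_candidates(text):
    """Return (candidate, score) pairs; a higher score means more clues."""
    scored = []
    for m in CANDIDATE.finditer(text):
        digits = sum(ch.isdigit() for ch in m.group())
        score = 0
        if 7 <= digits <= 12:  # "from 7 to 12 digits"
            score += 1
        if CUE.search(text[: m.start()]):  # comes after 'at / 'is
            score += 1
        scored.append((m.group(), score))
    return scored


memo = "You may call her office at 800-555-1215; her pager is 551-1211. Version 17.3 shipped."
for candidate, score in score_candidates(memo):
    print(score, candidate)
```

A scheme like this ranks candidates instead of accepting or rejecting them outright, which leaves room for the confirmation step described in the earlier message.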

 [38/47] from: carl:cybercraft at: 18-Dec-2001 19:46


On 18-Dec-01, Joel Neely wrote:
> Whether a human manufactures the rules, or a piece of AI software > attempts to do so (and I suspect the human will do a better job at > this point in history), the problem remains that the size of the > rule set itself undergoes a combinatorial explosion as we try to > take into account the variations in the data.
Perhaps, instead of trying to make software understand documents written any old which way by humans, we should create strictly formal versions of current human languages that can be tested for correctness by computer? We'd then be able to have documents that could be examined by computer without the need to worry about an infinite number of special cases.
-- Carl Read

 [39/47] from: reichart:prolific at: 18-Dec-2001 1:09


1. Q: CARL WROTE: Perhaps, instead of trying to make software understand documents written any old which way by humans, we should create strictly formal versions of current human languages that can be tested for correctness by computer? We'd then be able to have documents that could be examined by computer without the need to worry about an infinite number of special cases.

A: And when "people" make mistakes, we debit their bank account! I love your plan.

Reichart... [Reichart--Prolific--com]
Be useful.

 [40/47] from: al:bri:xtra at: 18-Dec-2001 22:33


Reichart wrote:
> 1. Q: CARL WROTE: Perhaps, instead of trying to make software understand > documents written any old which way by humans, we should create strictly > formal versions of current human languages that can be tested for > correctness by computer? We'd then be able to have documents that could be > examined by computer without the need to worry about an infinite number of > special cases.
Hmmmmm. I thought the answer is Rebol! :-) Andrew Martin Half on and 'alf off the page... ICQ: 26227169 http://valley.150m.com/

 [41/47] from: carl:cybercraft at: 18-Dec-2001 22:55


On 18-Dec-01, Andrew Martin wrote:
> Reichart wrote: >> 1. Q: CARL WROTE: Perhaps, instead of trying to make software
<<quoted lines omitted: 5>>
>> number of special cases. > Hmmmmm. I thought the answer is Rebol! :-)
Like to write the English dialect Andrew? (: -- Carl Read

 [42/47] from: doncox:enterprise at: 18-Dec-2001 11:29


On 18-Dec-01, Joel Neely wrote:
> That's an interesting (although severely non-trivial) approach to > the development issue, but I was describing a property of the
<<quoted lines omitted: 10>>
> 800.558.1218 > 1.800.559.1219
The rule here seems to be "a set of from 2 to 4 strings of 1 to 4 digits, delimited by +, -, (, ), /, . or "ext" or "EXT"". I guess the first thing is to look for any string of 8 to 24 characters containing only the above characters and spaces. The first character must be a digit or (, so you can use them as a trigger to start examining the next few characters.

The false positives are a problem as a product code might be formatted exactly like a phone number, especially like your first example.

My phone number is 44-1642-881220

Regards
--
Don Cox [doncox--enterprise--net]
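One possible reading of the rule above, written as a Python sketch. The exact interpretation (runs of delimiters between groups, an optional leading parenthesis, no "ext" handling, no 8 to 24 character length check) is an assumption, and `DON_RULE` and `looks_like_phone` are hypothetical names.

```python
import re

# 2 to 4 groups of 1 to 4 digits, separated by runs of the listed
# delimiters or spaces; an optional leading "(" is allowed.
DON_RULE = re.compile(r"\(?\d{1,4}(?:[-+()/. ]+\d{1,4}){1,3}")


def looks_like_phone(text):
    """True if the whole string fits the rule as interpreted above."""
    return DON_RULE.fullmatch(text) is not None


for number in ["551-1211", "(800) 557-1217", "1.800.559.1219", "44-1642-881220"]:
    print(number, looks_like_phone(number))
```

Under this reading the last number is rejected, since its final group has six digits, which illustrates how quickly a plausible rule runs into a real number it does not cover.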

 [43/47] from: doncox:enterprise at: 18-Dec-2001 10:53


On 18-Dec-01, Carl Read wrote:
> Perhaps, instead of trying to make software understand documents > written any old which way by humans, we should create strictly formal > versions of current human languages that can be tested for > correctness by computer? We'd then be able to have documents that > could be examined by computer without the need to worry about an > infinite number of special cases.
What a computer understands is a small subset of what a human understands. How would you train people to write in this limited formal language? It is equivalent to learning to communicate with a horse. The best solution here is probably the "form" structure used on web pages. This limits the response to what can be handled automatically, and the syntax of the replies can be checked before being acted on, with an error message back to the user who gives an invalid answer. In other words, database entry for ordinary semi-trained users. However, if you are doing a marketing survey, for instance, you may still want some free text answers. These have to be either analysed by hand or subjected to a fuzzy search for key words (wrong spelling is common). Regards -- Don Cox [doncox--enterprise--net]

 [44/47] from: joel:neely:fedex at: 18-Dec-2001 1:37


Hi, Carl, [Metaphysicomputational rambling ahead; proceed at your own risk!] Carl Read wrote:
> On 18-Dec-01, Joel Neely wrote: > > Whether a human manufactures the rules, or a piece of AI
<<quoted lines omitted: 9>>
> could be examined by computer without the need to worry about an > infinite number of special cases.
That's been tried before. It was called COBOL. ;-)

More recently, it's been tried again, and called mS Word. =8-0

Seriously, I think the proposal breaks down for two main reasons:

1) It merely displaces the issue -- whatever tool(s) enforce your "formal versions" and "correctness" rules would *STILL* need the rules to be defined, and users would *STILL* be annoyed with poor performance, false positives, and false negatives.

2) It assumes that we know in advance which rules are to be enforced based on possible future uses of the text being created/vetted.

To elaborate (optional reading ;-)...

1) Word tries to vet spelling, grammar, punctuation, and typography "on the fly" as the user is typing. This process

   a) is incredibly annoying/distracting, especially when writing a draft of a document I know I will subsequently revise and tidy up, but am trying to get "on paper" quickly;

   b) merely displaces the recognition problem to mS, because code/rules *still* must be designed to determine when something resembles e.g., a date or phone number closely enough that it can be put in "standard" format or challenged with a "Did you really mean...?" message (see point (1.a) above!); and this is even more annoying/frustrating when the rules are wrong, incomplete, or inadequate;

   c) is frightening, as I don't want a commercial entity taking control of my language, regardless of their own agenda. See

      http://slashdot.org/article.pl?sid=01/10/26/1334257&mode=thread

      for a story titled "Microsoft Edits English" that begins

      "An article in the 23-Oct-2000 issue of the New York Times ... talks about how Microsoft has eliminated words from its thesaurus so as to "not suggest words that may have offensive uses or provide offensive definitions for any words". Entering a word like "idiot" yields no hits in Word 2000 unlike the numerous hits in Word 97."

   d) is problematic due to international/cultural variation; consider the controversy over conversion of all European currencies to the Euro, and the decades-old-and-still-barely-begun efforts to get the US public to use the metric system. It is quite clear to me that the *ONLY* reasonable way to write dates is 2001/12/18, and I can't understand why you haven't already figured that out for yourself (...I'm JOKING!!! ;-)

2) Human language is "living" and dynamic in the same sense as other human activities.

   a) We may not know in advance that we'd need to scan my memos from last year to find all of the email addresses, dates, phone numbers, street addresses, names of people who worked in the department then but have transferred to other jobs, program names and version numbers, hostnames for servers in one of our labs...

   b) New usages, abbreviations, conventions, etc. are being created all the time, because what we have to say, and the frequency with which we have to say it, is constantly changing; rigid standardization stifles expressivity and leaves us in a bland, barren, and plastic-laminated mental landscape -- ya' want fries with that memo?

   c) Humans are excellent at recognizing patterns, even in the presence of noise, many kinds of errors, and considerable variation (even of the never-seen-before kind). When I'm writing to another human being, I can move quickly because I can trust her/him to understant me even if I make a tpyo.

Finally, the discussion of how dates appear in running text is IMHO only a basic exemplar of a much more pervasive issue: whenever we (and especially our programs/systems) interact with human beings in the "real world" the burden should be on *us*and*our*artifacts* to do the adapting to their way of doing/expressing things. That's as true in the design of physical workflow as it is in the design of computational artifacts.
As you can tell from my email address, I work for a big company that employs lots of people in lots of places/cultures to do lots of work that must happen with high speed and low (preferably zero ;-) error rate. However, it is often the case that what works well in some contexts (either physically or computationally) is suboptimal in other settings. Finding the balance -- and perhaps I should really say "keeping the balance, in a constantly changing world" -- between standards enforcement and flexibility for local/personal preferences/needs is the underlying "fractal" challenge of which date formatting is just the tiniest tip of the iceberg. OBTW, let's not forget humor. See the random sig of the moment... -jn- -- Outside of a dog, a book is man's best friend. Inside of a dog, it's too dark to read. -- Groucho Marx FIX?PUNCTUATION?joel?dot?neely?at?fedex?dot?com

 [45/47] from: rotenca:telvia:it at: 18-Dec-2001 20:06


Hi Joel,
> "An article in the 23-Oct-2000 issue of the New York Times > ... talks about how Microsoft has eliminated words from its > thesaurus so as to "not suggest words that may have offensive > uses or provide offensive definitions for any words". Entering > a word like "idiot" yields no hits in Word 2000 unlike the > numerous hits in Word 97."
Perhaps Bill Gates has seen the South Park film.
> c) Humans are excellent at recognizing patterns, even in the > presence of noise, many kinds of errors, and considerable > variation (even of the never-seen-before kind). When I'm > writing to another human being, I can move quickly because > I can trust her/him to understant me even if I make a tpyo.
Humans have at least two levels of redundancy: syntactic and semantic. Humans understand much from context, and they can correct transmission errors by understanding context. It is something that many could call "intuition". Without AI (if it is possible; I am anything but a Hofstadter fan), problems like telephone numbers are insoluble.

---
Ciao
Romano

 [46/47] from: joel:neely:fedex at: 18-Dec-2001 14:42


Hi, Romano, Romano Paolo Tenca wrote:
> > c) Humans are excellent at recognizing patterns, even in the > > presence of noise, many kinds of errors, and considerable
<<quoted lines omitted: 7>>
> am anything but a Hofstadter fan), problems like telephone numbers > are insoluble.
I'm sure we are in agreement. I've tended to state it as "not 100% solvable", but of course I'm always open to a nice way to get 95% without too much cost! -jn-

 [47/47] from: al:bri:xtra at: 19-Dec-2001 9:24


Carl Read wrote:
> Like to write the English dialect Andrew? (:
I'm still learning the English dialect... :-) How about the stock broker dialect?

    Stockbroker-Dialect [
        sell all shares of Microsoft
        buy shares in Rebol - get loan if necessary - to maximum of $1100000000.00
    ]

I think specialist or expert language would be easier.

Andrew Martin
ICQ: 26227169
http://valley.150m.com/

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted