World: r3wp

[Parse] Discussion of PARSE dialect

BrianW
22-Aug-2005
[329]
ah, thanks. Sunanda, that solution won't quite work if a #"*" appears 
without a match. I'll go look at NicomDoc
BrianH
22-Aug-2005
[330x2]
parse/all data [any [to "*" a: skip b: to "*" c: skip d: :a
    (change/part a rejoin ["<strong>" copy/part b c "</strong>"] d)] to end]
You can make it a little more complicated to add more markup types, 
but the basic structure is the same. The trick is the :a before the 
paren - otherwise it won't work, and you can crash older versions 
of REBOL.
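[Editor's sketch, not part of the original thread.] A minimal runnable form of the one-liner above, assuming REBOL 2 and a made-up sample string in data. The words a, b, c and d are the ones from the post; the comments are an interpretation of what each position marks.

```rebol
REBOL []

; Hypothetical sample input; COPY keeps the literal string itself unmodified.
data: copy "a *bold* word and *another* one"

parse/all data [
    any [
        to "*" a: skip b:   ; a: at the opening "*", b: just after it
        to "*" c: skip d:   ; c: at the closing "*", d: just after it
        :a                  ; move the parse position back BEFORE changing the series
        (change/part a rejoin ["<strong>" copy/part b c "</strong>"] d)
    ]
    to end
]

print data
; expected: a <strong>bold</strong> word and <strong>another</strong> one
```

After each change the parse position sits at the start of the inserted text, so the next pass of the loop rescans the replacement before it finds the following "*".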
Tomc
22-Aug-2005
[332x2]
something along the lines of   (untested)
;;; make the word set more restrictive if no space etc
;;; but this is most permissive for your example
word: complement charset "*"
rule: [
	skip to "*" 
	[copy item some word "*" (append output join "" [<tag> item </tag>])]
	| skip
]
BrianW
22-Aug-2005
[334x2]
That works nicely too! I'll look more at NicomDoc later, but BrianH's 
tip makes tests for "*test*" and "*test" pass
I'll have to explore Tomc's solution when I get back from my meeting. 
Thanks, folks
BrianH
22-Aug-2005
[336x5]
markup-chars: charset "*~"
non-markup: complement markup-chars
tag1: ["*" "<strong>" "~" "<i>"]
tag2: ["*" "</strong>" "~" "</i>"]
parse/all data [
    any non-markup
    any [
        ["*" a: skip b: to "*" c: skip d: | "~" a: skip b: to "~" c: skip d: ] :a (
            change/part a rejoin [
                select tag1 copy/part a b
                copy/part b c
                select tag2 copy/part c d
            ] d
        ) any non-markup
    ]
    to end
]
No nesting, but with a little recursion and different start and end 
tags, this can be adapted to handle that too.
If you want to determine whether there have been any replacements, 
change the second any to some and parse will return true only when 
replacements have been made. Be careful to avoid use of the markup 
characters in your replacement text.
Whoops, an error. Change:

        ["*" a: skip b: to "*" c: skip d: | "~" a: skip b: to "~" c: skip d: ] :a (
to:

        [a: "*" b: to "*" c: skip d: | a: "~" b: to "~" c: skip d: ] :a (

Silly me :(
Tomc
22-Aug-2005
[341]
w: complement charset "*"
rule: [	
	to "*" here: "*"
	opt[ 
		copy item some w "*" there:
		(change/part :here join "" [<strong> item </strong>] :there)
	]
]
parse/all str [some rule]
BrianH
22-Aug-2005
[342]
Tomc, that will crash older versions of REBOL, and not work on newer 
versions. You need to reset the parse position to before the change, 
before the paren where you make the change. Otherwise parse will 
be referencing a point off the end of the string at the end of the 
paren, before you can reset it. This used to crash REBOL so bad the 
interpreter disappeared.
Tomc
22-Aug-2005
[343]
BrianH, please supply a str that fails on current versions, so I 
can see what you mean
BrianH
22-Aug-2005
[344]
To fix your example, put a :here after the first there: in your rule.
Tomc
22-Aug-2005
[345]
still haven't found a string that fails, trying all the combos of 
*'s at the beginning, end, middle ...
BrianH
22-Aug-2005
[346]
In your case you might not have a crash, because you are replacing 
a short text with a longer one. Still, it's good to remember that 
bug for future reference. It really tripped me up when I first came 
across it, back when it still used to crash REBOL.
Tomc
22-Aug-2005
[347]
yes, shortening the string you are parsing would pull the rug out 
from under the interpreter
(and I was aware that the string was being lengthened) 

note: setting the parse pointer back to :here will position you before 
the "*"  

you may be better off with :here skip to guarantee progress in the 
case the change fails
BrianH
22-Aug-2005
[348x5]
OK, I tried this: parse "abc" [to "bc" a: "bc" (change/part a "b" 2)]

It returns true on View 1.3 and Core 2.6, but false on View 1.2 and 
Core 2.5.0.
If the change fails it will throw an error. The trick is to put off 
the paren performing the change until you have gone through enough 
rules to ensure that the paren contents will succeed.
Remember, for many platforms, Core 2.5.0 is the current version.
Here's a simplified version of my example that can handle multiple 
instances of multiple markup types and be adapted to different end 
tags (thanks Tomc for the idea!):

markup-chars: charset "*~"
non-markup: complement markup-chars
tag1: ["*" "<strong>" "~" "<i>"]
tag2: ["*" "</strong>" "~" "</i>"]
parse/all data [
    any non-markup
    any [

        ; This next block can be generated if you have many markup types...

        [a: copy b "*" copy c to "*" copy d "*" e: |
         a: copy b "~" copy c to "~" copy d "~" e: ]
        :a (change/part a rejoin [tag1/:b c tag2/:d] e)
        any non-markup
    ]
    to end
]
Tomc: "you may be better off with :here skip to guarantee progress"


Put the skip after the paren and I may agree with you there. Of course 
you would skip the number of chars in the replacement text then.
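[Editor's sketch, not part of the original thread.] One way to "skip the number of chars in the replacement text" after the paren, assuming REBOL 2: change returns the position just past the inserted text, so that position can be saved and handed back to parse. The word e and the sample string are hypothetical; the rest follows the rules posted above.

```rebol
REBOL []

; Hypothetical sample input.
data: copy "x *one* y *two* z"

parse/all data [
    any [
        to "*" a: skip b: to "*" c: skip d:
        :a      ; rewind first, so parse never points past a changed region
        (e: change/part a rejoin ["<strong>" copy/part b c "</strong>"] d)
        :e      ; resume right after the replacement instead of rescanning it
    ]
    to end
]

print data
; expected: x <strong>one</strong> y <strong>two</strong> z
```

Because parsing resumes after the inserted text, markup characters inside the replacement itself are never matched again. Whether that is wanted depends on the markup being translated.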
BrianW
22-Aug-2005
[353x2]
Wow, I'm off getting bored at meetings, come back and you've been 
working hard! Thanks, folks.
Here's what I have right now:

		markup-chars: charset "*_@"
		non-markup: complement markup-chars
		inline-tags: [
			"*" "strong"
			"_" "em"
			"@" "code"
		]

		markup-rule: [
			any non-markup
			any [
				[ a: "*" b: to "*" c: skip d: |
				  a: "_" b: to "_" c: skip d: | 
				  a: "@" b: to "@" c: skip d: ] :a (
					change/part a rejoin [ 
						"<" select inline-tags copy/part a b ">"
						copy/part b c 
						"</" select inline-tags copy/part a b ">"
					] d
				) any non-markup
			]
			to end
		]
		parse text markup-rule
Tomc
22-Aug-2005
[355]
you almost certainly want parse/all
BrianW
22-Aug-2005
[356]
whoops
BrianH
22-Aug-2005
[357]
If you want to guarantee progress with my and your examples (and 
better support multichar markup tags) change the last
  any non-markup
to
  any non-markup | skip
and that would do it.
BrianW
22-Aug-2005
[358]
okay, here's a slightly tweaked version that uses a multichar markup 
tag:

        markup-chars: charset "[*_-:---]"
        non-markup: complement markup-chars
        inline-tags: [
            "*" "strong"
            "_" "em"
            "@" "code"
            "--" "small"
        ]

        markup-rule: [
            any non-markup
            any [
                [ a: "*" b: to "*" c: skip d: |
                  a: "_" b: to "_" c: skip d: | 
                  a: "@" b: to "@" c: skip d: |
                  a: "--" b: to "--" c: skip skip d: ] :a (
                    change/part a rejoin [ 
                        "<" select inline-tags copy/part a b ">"
                        copy/part b c 
                        "</" select inline-tags copy/part a b ">"
                    ] d
                ) any non-markup | skip
            ]
            to end
        ]
        parse/all text markup-rule
BrianH
22-Aug-2005
[359]
Your first charset only needs one -
BrianW
22-Aug-2005
[360]
It passes my simple tests, now I need to throw more interesting tests 
in (multiple tags on the same line, nested tags, whatever)

Thanks BrianH, I'll fix that.
BrianH
22-Aug-2005
[361]
Nested tags of the same type won't work at all unless the start and 
end tags are different, and they won't work here without either recursion 
or an algorithm that does nesting counts. Be careful with that 
because you'd have to update those counts in parens and you can't 
backtrack through parens.
BrianW
22-Aug-2005
[362]
Lucky for me, the rules don't support nested tags of the same type.


* *strong* text* would probably parse as <strong> </strong>strong 
<strong> text</strong>
BrianH
22-Aug-2005
[363]
Note that my last example keeps track of both the start and end tags, 
even though I don't need to with the markup chars I used.
BrianW
22-Aug-2005
[364]
I need to test for *strong and _emphasized_ text.* (for example)
BrianH
22-Aug-2005
[365]
Yours will test for *strong and _emphasized* text._ as well right 
now (for example)
BrianW
22-Aug-2005
[366]
works beautifully, no changes needed. You guys rule.
BrianH
22-Aug-2005
[367]
The generated html might not be pretty though :)
BrianW
22-Aug-2005
[368x2]
Pretty comes after I know it works, if at all. :-)
awesome, it even works for "**bold**" text!
Tomc
22-Aug-2005
[370]
and won't touch ***********************************************
BrianH
22-Aug-2005
[371]
Really? The rules look like they'd translate that to <strong></strong> 
repeated many times. It doesn't?
BrianW
22-Aug-2005
[372x4]
Lemme write a test and see
ah, it turns them into <b></b> pairs
10 - Not OK - Rampant asterisks are usually ignored
Expected <<p>***********************************************</p>>

Got <<p><b></b><b></b><b></b><b></b><b></b><b></b><b></b><b></b><b></b><b></b><b></b><strong></strong>*</p>
Part of me wants to just ignore it for now and get on to other stuff.
BrianH
22-Aug-2005
[376x3]
Well, if you want exceptions, you gotta code them in. In this case, 
before the block of your markup rules, as an alternate.
Like this (at the beginning of the any block):
    ["**" any "*"] |
(Be back, off to a class)
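[Editor's sketch, not part of the original thread.] A cut-down illustration of the suggested exception, assuming REBOL 2 and only the "*" markup type. The sample string and the word e are hypothetical; the first alternate is the ["**" any "*"] rule from the post, placed ahead of the markup rule so that runs of asterisks are consumed untouched.

```rebol
REBOL []

; Hypothetical sample input: a run of asterisks followed by real markup.
data: copy "*** and *bold* text"

parse/all data [
    any [
        ; Exception first: two or more "*" in a row are skipped as-is.
        ["**" any "*"]
        ; Otherwise a single "*" opens a strong span, if a closing "*" exists.
        | [a: "*" b: to "*" c: skip d: :a
           (e: change/part a rejoin ["<strong>" copy/part b c "</strong>"] d) :e]
        ; Anything else, including an unmatched "*", is stepped over.
        | skip
    ]
]

print data
; expected: *** and <strong>bold</strong> text
```

One trade-off of this exception: input such as "**bold**", which the earlier rules converted, is now left alone, because its leading "**" matches the first alternate.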