A question about a regular expression

[1/14] from: lethalman::fyrebird::net at: 17-Oct-2004 10:22

Hello (first, sorry for my poor English),

I have this string:

"<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>"

Now I would like to get Test and Test2. In Perl I use the following regexp:

<a href='.*?'>(.+?)</a>

That gives me Test and Test2, but how can I do this in Rebol? The user will specify the two strings that surround Test and Test2, like:

Before: <a href='*'>
After: </a>

So I will get Test and Test2. Is there any way to do this?

-- Fyrebird Hosting Provider - Technical Department
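For comparison, the Perl pattern above behaves the same way in Python's re module. This minimal sketch uses Python only as a stand-in (nothing in the thread uses it) and shows that the lazy quantifiers keep each match inside a single link:

```python
import re

html = "<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>"

# The lazy quantifiers stop at the first "'>" and the first "</a>", so each
# match stays within one link instead of running on to the last "</a>".
pattern = re.compile(r"<a href='.*?'>(.+?)</a>")

links = pattern.findall(html)  # findall returns the captured group of each match
print(links)  # ['Test', 'Test2']
```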

[2/14] from: carl:cybercraft at: 17-Oct-2004 9:33

On Sunday, 17-October-2004 at 10:22:25 you wrote,

>Hello,
>(first sorry for my poor English)

<<quoted lines omitted: 8>>

>So i will get Test and Test2.
>Is there any way to do this?

Assuming the format of the string stays the same (and I've understood you correctly), you could use something like the following (though it's a bit quick and dirty)...

>> str: "<a href='http://bla'>Test</a><a>href='http://blabla'>Test2</a>"

== {<a href='http://bla'>Test</a><a>href='http://blabla'>Test2</a>}

>> blk: parse/all str "<>"

== ["" "a href='http://bla'" "Test" "/a" "" "a" "href='http://blabla'" "Test2" "/a"]

>> blk/3

== "Test"

>> blk/8

== "Test2"

This also works if the user-supplied text contains spaces:

>> str: "<a href='http://bla'>Test One</a><a>href='http://blabla'>Test2</a>"

== {<a href='http://bla'>Test One</a><a>href='http://blabla'>Test2</a>}

>> blk: parse/all str "<>"

== ["" "a href='http://bla'" "Test One" "/a" "" "a" "href='http://blabla'" "Test2" "/a"]

>> blk/3

== "Test One"

>> blk/8

== "Test2"

A proper parse rule that produces tidier output could be written, of course, but if the above does what you want, you might as well use it.

-- Carl Read
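Note that Carl's test string has "<a>href=" in the second link where the original post has "<a href=", which is why "a" and "href='http://blabla'" come out as separate items above. A rough Python analogue of the same split, run on the original string, is sketched below (re.split stands in for parse/all with "<>"). Because the second anchor is now a single item, the two link texts sit four items apart rather than five; Python also counts from 0 and keeps a trailing empty string that REBOL's output above does not show:

```python
import re

html = "<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>"

# Split on either angle bracket, much like parse/all str "<>" above.
parts = re.split(r"[<>]", html)
print(parts)
# ['', "a href='http://bla'", 'Test', '/a', '', "a href='http://blabla'", 'Test2', '/a', '']

# The link texts only sit at fixed positions while the markup keeps exactly
# this shape; any extra tag shifts every index after it.
print(parts[2], parts[6])  # Test Test2
```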

[3/14] from: lethalman:fyrebird at: 17-Oct-2004 11:03

Carl Read wrote:

>On Sunday, 17-October-2004 at 10:22:25 you wrote,
>>Hello,

<<quoted lines omitted: 48>>

>A proper parse rule to create a tidier output could be written of course, but if the above does what you want then you might as well use it.
>-- Carl Read

Thanks for your reply, but there's a problem... If the user specifies:

Before: <tr><td><a href='http://*'>
After: </a></td></tr>

it won't work. I need something more dynamic.

[4/14] from: SunandaDH:aol at: 17-Oct-2004 5:22

[lethalman--fyrebird--net]:

> Is there any way to do this?

There are lots of ways to do it. The best will depend on what other conditions need to be met (i.e. what else could be in the string being parsed). This works for the original example:

x: "<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>"
foreach item load/markup x [if string? item [print item]]

Test
Test2

Unlike the Perl regexp, this will work if the anchor tag is more complex, e.g.:

<a class="aaa" href="/index" style="color:blue">Test</a>

But it will also find *all* strings in the input, not just the text of anchor tags. That may not be what you want:

x: "<div> <a href='http://bla'><strong>Test</strong></a><a href='http://blabla'>Test2</a></div>"
foreach item load/markup x [if string? item [print item]]

Test
Test2

Sunanda
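A comparable approach in Python would be to let an HTML tokenizer separate tags from text and keep only the text, much as the loop above keeps only the string! items from LOAD/MARKUP. This is a sketch built on the standard html.parser module; the class and function names are hypothetical. The space between <div> and <a> also comes back as a text chunk, which is the "all strings" caveat in practice:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every run of text between tags, roughly like keeping
    only the string values from LOAD/MARKUP's block."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_chunks(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end
    return parser.chunks


simple = "<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>"
nested = ("<div> <a href='http://bla'><strong>Test</strong></a>"
          "<a href='http://blabla'>Test2</a></div>")

print(text_chunks(simple))  # ['Test', 'Test2']
print(text_chunks(nested))  # [' ', 'Test', 'Test2']
```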

[5/14] from: lethalman::fyrebird::net at: 17-Oct-2004 11:28

[SunandaDH--aol--com] wrote:

>[lethalman--fyrebird--net]:
>>Is there any way to do this?

<<quoted lines omitted: 17>>

>Test2
>Sunanda

Nice! I'm still new to Rebol; this is a great feature. I think it should work fine, thanks!

[6/14] from: carl:cybercraft at: 17-Oct-2004 10:45

On Sunday, 17-October-2004 at 11:03:18 you wrote,

>Thanks for your reply, but there's a problem...
>
>If the user specifies:
>Before: <tr><td><a href='http://*'>
>After: </a></td></tr>
>
>It won't work...
>I need something more dynamic

Yes, I misread what you wanted; sorry about that. Sunanda has put you on the right track, though, so I expect you're making good progress now.

-- Carl Read

[7/14] from: lethalman::fyrebird::net at: 17-Oct-2004 12:05

Carl Read wrote:

>On Sunday, 17-October-2004 at 11:03:18 you wrote,
>>Thanks for your reply, but there's a problem...

<<quoted lines omitted: 9>>

>Yes - I mis-read what you wanted. Sorry for that. Sunanda has put you on the right track though, so I expect you're making good progress now.
>-- Carl Read

Hmm, there's still a problem... If the user wants to get the string between "<td>Test: " and "</td>", it won't work...

[8/14] from: gabriele:colellachiara at: 17-Oct-2004 12:51

Hi Lethalman,

On Sunday, October 17, 2004, 12:05:50 PM, you wrote:

L> Mmmm however there's a problem...
L> If the user wants to get the string between "<td>Test: " and "</td>"
L> It won't work...

So basically you need to find the text between two known strings?

start-string: "<td>Test: "
end-string: "</td>"
; TEXT is assumed to hold the HTML being searched
parse/all text [thru start-string copy wanted-string to end-string]
print wanted-string

Regards,
Gabriele.
-- Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/
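The THRU ... COPY ... TO pattern maps onto two plain substring searches. A minimal Python sketch for comparison; the function name and the sample row are hypothetical, and unlike the PARSE version it returns None instead of leaving the variable unset when a marker is missing:

```python
def between(text, start, end):
    """Return the text after the first START and before the next END,
    or None if either marker is missing."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)          # like THRU: continue just past START
    j = text.find(end, i)    # like TO: stop just before END
    if j == -1:
        return None
    return text[i:j]


print(between("<tr><td>Test: 42</td></tr>", "<td>Test: ", "</td>"))  # 42
```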

[9/14] from: carl:cybercraft at: 17-Oct-2004 14:14

On Sunday, 17-October-2004 at 12:05:50 you wrote,

>Mmmm however there's a problem...
>If the user wants to get the string between "<td>Test: " and "</td>"
>It won't work...

No, probably not... If it's complex parsing you want, meaning you want to select the text between certain tags and then select specific parts of that text as well, you'd be best off studying proper REBOL parsing:

http://www.rebol.com/docs/core23/rebolcore-15.html

My approach would be to LOAD/MARKUP your file so it's placed into a block, then use a PARSE rule to extract what you need. There are many examples in those docs. Find one that does some of what you want and build on it.

-- Carl Read
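Carl's two-stage idea (first split the markup into tags and text, then run a rule over that list) can be sketched in Python for comparison. Here a regex split is a crude stand-in for LOAD/MARKUP (it would be fooled by a ">" inside an attribute value) and a simple loop stands in for the PARSE rule; the helper names, the sample page, and the "Title: " prefix are hypothetical. The second stage also shows how text such as "Title: " that sits outside the tags can still be handled:

```python
import re


def tokens(html):
    """Stage 1: flatten HTML into a list of tags and text runs, a rough
    stand-in for the block LOAD/MARKUP returns."""
    return [t for t in re.split(r"(<[^>]*>)", html) if t]


def anchor_titles(html, prefix="Title: "):
    """Stage 2: keep text that directly follows an <a ...> tag and starts
    with PREFIX, returning it with the prefix removed."""
    toks = tokens(html)
    found = []
    for prev, tok in zip(toks, toks[1:]):
        if prev.startswith("<a ") and not tok.startswith("<") and tok.startswith(prefix):
            found.append(tok[len(prefix):])
    return found


page = ("<tr><td><a href='/1'>Title: One</a></td></tr>"
        "<tr><td><a href='/2'>Title: Two</a></td></tr>")
print(tokens(page)[:4])     # ['<tr>', '<td>', "<a href='/1'>", 'Title: One']
print(anchor_titles(page))  # ['One', 'Two']
```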

[10/14] from: lethalman:fyrebird at: 17-Oct-2004 13:02

It's not so easy to do what I want... Example: I want to get the text between

"<td><a href='*'>Title: " and "</a></td>"

I can't do it with PARSE, because it doesn't accept wildcards, and I can't do it with LOAD/MARKUP, because "Title: " is a string outside the HTML tags. So I don't need to select specific elements (as with markup); I simply need to get the text between two strings that may contain wildcards. Is it impossible? Do I need to do it in Perl/Python? I would really like to do this in Rebol...
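For reference, this is roughly what the Perl/Python route would look like: translate the user's Before/After strings into one regular expression, with "*" as a lazy ".*?" (matching as little as possible, like the Perl pattern in the first message) and "?" as any single character. A sketch under those assumptions; the function names and the sample page are hypothetical:

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard string into a regex fragment: '*' becomes a
    lazy '.*?', '?' becomes '.', and everything else is matched literally."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*?")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def extract_between(text, before, after):
    """Return every piece of text sitting between a match of BEFORE and
    the next match of AFTER. DOTALL lets matches span line breaks."""
    rx = re.compile(
        wildcard_to_regex(before) + "(.*?)" + wildcard_to_regex(after),
        re.DOTALL,
    )
    return rx.findall(text)


page = ("<tr><td><a href='/1'>Title: One</a></td></tr>"
        "<tr><td><a href='/2'>Title: Two</a></td></tr>")
print(extract_between(page, "<td><a href='*'>Title: ", "</a></td>"))  # ['One', 'Two']

html = "<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>"
print(extract_between(html, "<a href='*'>", "</a>"))  # ['Test', 'Test2']
```

Using ".*" instead of ".*?" would make "*" greedy, so a single match could swallow several links; the lazy form is what keeps each result to one cell.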

[11/14] from: gabriele:colellachiara at: 17-Oct-2004 13:05

Hi Lethalman,

On Sunday, October 17, 2004, 1:02:28 PM, you wrote:

L> It's not so easy to do what i want...

It almost always is. ;)

L> Example:
L> I want to get the text that's between: "<td><a
L> href='*'>Title: " and "</a></td>"
L> With parsing i can't do it, because it doesn't accept
L> wildcars,

Do you only need "*", or something else too?

Regards,
Gabriele.

[12/14] from: lethalman:fyrebird at: 17-Oct-2004 13:10

Gabriele Santilli wrote:

>Hi Lethalman,
>On Sunday, October 17, 2004, 1:02:28 PM, you wrote:

<<quoted lines omitted: 8>>

>Regards,
> Gabriele.

"?" would be good too... :)

PS: If you want, you can also speak Italian.

[13/14] from: gabriele:colellachiara at: 17-Oct-2004 13:58

Hi Lethalman,

On Sunday, October 17, 2004, 1:10:12 PM, you wrote:

>>You only need "*" or something else too?
>>

L> ? is good too... :)

Ok.

L> PS: if you want, you can also speak Italian

In English, the others can understand us too. ;-)

One way is to use FIND/ANY, which supports wildcards. However, as with most RE engines, it can match too much:

>> find/any/tail "<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>" "<a href='*'>"

== "Test2</a>"

Another way is to convert the strings to PARSE rules and then use them.

>> wildcards: "*?"
>> text: complement charset wildcards
>> result: []
>> wild-rule: [some [copy str any text (append result str) ["*" (append result 'thru) | "?" (append result 'skip)]]]
>> parse/all "<a href='*'>" wild-rule
>> result

== ["<a href='" thru "'>"]

Note that this won't work if your string contains something like "*?" or "**" (it will work for "?*"). You can enhance it to support those if you think you need to. Now you can:

>> parse/all "<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>" [result mark:]

== false

>> mark

== "Test</a><a href='http://blabla'>Test2</a>"

I think this should be enough to give you a start. :)

Regards,
Gabriele.
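Gabriele's converter can be mirrored in Python to make the mechanics explicit: the wildcard string becomes a list of steps, and a small interpreter applies them the way PARSE would. THRU jumps past the first occurrence of the literal that follows it, which is why this rule stops after the first link while the FIND/ANY "*" ran on to the last one. This is a sketch, not a port of PARSE: the Word class, THRU/SKIP markers, and function names are hypothetical, and a trailing "*" is simply taken to match the rest of the text. Because THRU only sets a "search for the next literal" flag, this version also copes with "**" and "*?", which Gabriele notes his rule does not:

```python
class Word:
    """A marker step that prints bare, like a REBOL word (hypothetical helper)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


THRU, SKIP = Word("thru"), Word("skip")


def wild_rule(pattern):
    """Convert a wildcard string into steps: runs of ordinary characters
    stay strings, '*' becomes THRU and '?' becomes SKIP."""
    steps, literal = [], ""
    for ch in pattern:
        if ch in "*?":
            if literal:
                steps.append(literal)
                literal = ""
            steps.append(THRU if ch == "*" else SKIP)
        else:
            literal += ch
    if literal:
        steps.append(literal)
    return steps


def match_rule(steps, text):
    """Apply STEPS from the start of TEXT; return the index just past the
    match (like MARK: above) or None. After THRU, the next literal is
    searched for instead of required at the current position."""
    pos, searching = 0, False
    for step in steps:
        if step is THRU:
            searching = True
        elif step is SKIP:
            if pos >= len(text):
                return None
            pos += 1
        elif searching:
            found = text.find(step, pos)  # first occurrence: the "lazy" choice
            if found == -1:
                return None
            pos, searching = found + len(step), False
        elif text.startswith(step, pos):
            pos += len(step)
        else:
            return None
    return len(text) if searching else pos  # trailing '*' takes the rest


html = "<a href='http://bla'>Test</a><a href='http://blabla'>Test2</a>"
rule = wild_rule("<a href='*'>")
print(rule)                           # ["<a href='", thru, "'>"]
print(html[match_rule(rule, html):])  # Test</a><a href='http://blabla'>Test2</a>
```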

[14/14] from: lethalman::fyrebird::net at: 17-Oct-2004 14:01

Gabriele Santilli wrote:

>Hi Lethalman,
>On Sunday, October 17, 2004, 1:10:12 PM, you wrote:

<<quoted lines omitted: 38>>

>Regards,
> Gabriele.

Yes, I'd already found the FIND/ANY solution; I thought there was a simpler, cleaner way. Thanks to everyone!

Notes

Quoted lines have been omitted from some messages.