Off-topic: regular expressions
[1/3] from: mailinglists:post at: 23-Sep-2000 14:31
Hello,
I'm frustrating myself terribly, trying to extract a URL using regular
expressions from the following string:
bla http://www.yahoo.com/blabla.html?this=insane - bla bla
Now how do I do that? I've tried and failed miserably; this is where I
got: (http://|www).*
That prints out "http://www.yahoo.com/blabla.html?this=insane - bla
bla" - damn these regular expressions! Rebol's 'parse is way better for
this kind of thing; at least I can get 'parse to stop at the first
space!
Thanks in advance!
Regards,
Rachid
[2/3] from: bobr:dprc at: 23-Sep-2000 12:11
If it is a Perl-based regular expression to extract a URL
that you want, I have a line from Kehei.com wiki.pl (the non-Rebol one)
that will find a URL at the beginning of a line, after a space, or
after a * or -. It also stops the match at a space or at the start of a tag (<).
# Handle embedded URLs
s@(^|[\-\*\s])((news|http|ftp|gopher|https)\://([^\s<]+))@$1\<A href\=\"$2" target=\"_top\"\>$2<\/A\>@go;
Note that @ was used as the delimiter, since nearly every other
punctuation character was already used inside the pattern.
s/match/repl/go; becomes
s@match@repl@go;
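For anyone not using Perl, here is a rough Python sketch of the same idea. It reuses the pattern from the substitution above (a URL must start the line or follow whitespace, * or -, and it ends at the first whitespace or <). The helper names extract_urls and linkify are made up for illustration, and the sample string is the one from the original question. It also shows why the (http://|www).* attempt overshoots: .* is greedy and keeps going to the end of the line, whereas [^\s<]+ stops at the first space.

```python
import re

# Same shape as the Perl pattern above: group 1 is the leading context
# (start of string, whitespace, '*' or '-'), group 2 is the URL itself,
# which ends at the first whitespace character or '<'.
URL_RE = re.compile(r'(^|[-*\s])((?:news|http|ftp|gopher|https)://[^\s<]+)')


def extract_urls(text):
    """Return every URL found in text (hypothetical helper)."""
    return [m.group(2) for m in URL_RE.finditer(text)]


def linkify(text):
    """Wrap each URL in an anchor tag, keeping the leading context."""
    return URL_RE.sub(r'\1<A href="\2" target="_top">\2</A>', text)


sample = "bla http://www.yahoo.com/blabla.html?this=insane - bla bla"

# The original attempt: '.*' is greedy, so it runs to the end of the line.
greedy = re.search(r'(http://|www).*', sample).group(0)

print(greedy)
print(extract_urls(sample))
print(linkify(sample))
```

Like the Perl line, this only treats ^ as the start of the whole string, so it is meant to be applied one line at a time.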
A discussion of regular expressions can be found
in the Rebol email archives at
ThreadHead: http://rebol.org/userlist/archive/83/585.html
ThreadTail: http://rebol.org/userlist/archive/86/224.html
;# mailto: [bobr--dprc--net]
At 02:31 PM 9/23/00 +0200, [mailinglists--post--com] wrote:
[3/3] from: mailinglists:post at: 23-Sep-2000 19:53
A BIG THANK YOU!
Really,
Rachid