Off-topic: regular expressions

[1/3] from: mailinglists:post at: 23-Sep-2000 14:31

Hello,

I'm frustrating myself terribly, trying to extract a URL using regular expressions from the following string:

"bla http://www.yahoo.com/blabla.html?this=insane - bla bla"

Now how do I do that? I've tried and failed miserably; this is where I got:

    (http://|www).*

Which prints out "http://www.yahoo.com/blabla.html?this=insane - bla bla" - damn these regular expressions! Rebol's 'parse is way better for this kind of thing; at least I can get 'parse to stop at the first space!

Thanks in advance!

Regards,
Rachid
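
For reference, the over-match described above can be reproduced in a few lines. This is only a minimal sketch in Python's re module (an assumption, since the message does not say which regex engine was used), and the variable names are hypothetical. The `.*` is greedy and runs to the end of the line, whereas `\S*` stops at the first whitespace character.

```python
import re

text = "bla http://www.yahoo.com/blabla.html?this=insane - bla bla"

# The attempted pattern: ".*" is greedy, so it consumes the rest of the line.
greedy = re.search(r"(http://|www).*", text)

# Swapping ".*" for "\S*" (zero or more non-whitespace characters)
# makes the match stop at the first space.
bounded = re.search(r"(http://|www)\S*", text)

print(greedy.group(0))
print(bounded.group(0))
```

The first print shows the whole tail of the string, the second only the URL itself.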

[2/3] from: bobr:dprc at: 23-Sep-2000 12:11

If it is a Perl-based regular expression to extract a URL that you want, I have a line from Kehei.com wiki.pl (the non-Rebol one) that will find a URL at the beginning of a line, after a space, or after a * or -. It also stops the parse at a space or at the start of a tag (<).

    # Handle embedded URLs
    s@(^|[\-\*\s])((news|http|ftp|gopher|https)\://([^\s<]+))@$1\<A href\=\"$2" target=\"_top\"\>$2<\/A\>@go;

Note that @ was used as the delimiter, since nearly every other punctuation character was already used in the innards:

    s/match/repl/go; is s@match@repl@go;

A discussion about doing regular expressions can be found in the Rebol email archives at

ThreadHead: http://rebol.org/userlist/archive/83/585.html
ThreadTail: http://rebol.org/userlist/archive/86/224.html

;# mailto: [bobr--dprc--net]
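
The same idea can be tried outside Perl. What follows is a rough Python rendering of the substitution above, not the wiki.pl code itself: the names `URL_PATTERN`, `extract_urls` and `link_urls` are made up for illustration, and the scheme group is made non-capturing so that group 2 is the whole URL.

```python
import re

# Python rendering of the Perl pattern: a URL must start the string or
# follow whitespace, "*" or "-", and it runs until whitespace or "<".
URL_PATTERN = re.compile(
    r"(^|[-*\s])((?:news|http|ftp|gopher|https)://[^\s<]+)"
)

def extract_urls(text):
    """Return every URL the pattern finds in text."""
    return [m.group(2) for m in URL_PATTERN.finditer(text)]

def link_urls(text):
    """Wrap each URL in an anchor tag, keeping the character before it."""
    return URL_PATTERN.sub(r'\1<A href="\2" target="_top">\2</A>', text)

sample = "bla http://www.yahoo.com/blabla.html?this=insane - bla bla"
print(extract_urls(sample))
print(link_urls(sample))
```

`extract_urls` answers the original question (pull the URL out of the string), while `link_urls` mirrors what the substitution does in the wiki script. Without `re.MULTILINE`, `^` only matches at the very start of the input, so multi-line text would need to be fed in line by line or compiled with that flag.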

[3/3] from: mailinglists:post at: 23-Sep-2000 19:53

A BIG THANK YOU! Really, Rachid