Parse problem

[1/14] from: patrick::philipot::laposte::net at: 7-Oct-2005 14:19

Hi List,

A parse problem. I would like to parse a text to replace any URL in it.

Example

Initial text: Hey go visit http://www.me.org, it's great!

Result: Hey go visit <a href="http://www.me.org">http://www.me.org</a>, it's great!

Not so easy, is it?

-- Bye Pat665

[2/14] from: greggirwin:mindspring at: 7-Oct-2005 7:26

Hi Patrick,

PP> A parse problem. I would like to parse a text to replace any url in it.
PP> Example

I hacked the link parser out of the IOS Conference reblet at one point, but I didn't do any text replacement with it; I just made it a little more general in that you could pass in a callback function. It still has some issues that should be addressed (e.g. it includes trailing commas on URLs), but it might give you something to build on.

-- Gregg

Pardon the formatting; watch for wrap.

REBOL []

link-parser: context [
    white-space: charset reduce [#" " newline tab cr #"<" #">"]
    non-white-space: complement white-space
    to-space: [some non-white-space | end]
    skip-to-next-word: [some non-white-space some white-space]
    link-rule: copy []
    callback: none

    make-action: func [link] [
        compose [
            mark: (link) (either string? link [[to-space end-mark:]] [])
            (to-paren compose [callback copy/part mark end-mark])
            any white-space
        ]
    ]

    make-link-rules: func [schemes] [
        clear link-rule
        foreach scheme schemes [
            repend link-rule [make-action scheme '|]
        ]
        append link-rule 'skip-to-next-word
        use [mark end-mark text offset] [bind link-rule 'mark]
    ]

    set 'parse-links func [
        input [any-string!]
        action [any-function!]
        /with schemes [block!] "Block of scheme patterns to look for"
    ][
        make-link-rules any [
            schemes
            ["https://" "http://" "www." "ftp://" "ftp."]
        ]
        callback: :action
        error? try [parse/all input [any link-rule]]
    ]
]

s: {Check out http://www.rebol.org And if you like that, you'll really like www.rebol.com Then, for more excitement, port your data to ftp://blah-blah-blah }
parse-links s func [url] [print url]

print "^/This pass will only look for FTP links^/"
s: {Check out http://www.rebol.org And if you like that, you'll really like www.rebol.com Then, for more excitement, port your data to ftp://blah-blah-blah }
parse-links/with s func [url] [print url] ["ftp://"]

halt

[3/14] from: pwawood::mango::net::my at: 8-Oct-2005 14:23

Hi Patrick

I'm no expert like Gregg, so I couldn't figure out how to use parse to achieve what you wanted. So I wrote this simple function, which may help. I haven't tested it fully, but I hope it helps.

Regards
Peter

highlight-link: func [
    {Wraps an <a href> tag around http urls in a string.}
    text [string!] "The string containing http urls."
    /local url-end "Used to work out where the url ends"
][
    until [
        text: find text "http:"    ;; find a url
        if text [                  ;; "http:" found
            ;; find char after url
            url-end: copy find/tail text "http:"
            until [
                url-end: next url-end
                any [
                    (url-end = "")
                    (#" " = first url-end)
                    (#"," = first url-end)
                    (#":" = first url-end)
                    (#";" = first url-end)
                ]
            ]
            ;; insert the <a> tag
            either url-end = "" [  ;; url is at end of text
                insert text rejoin [{<a href="} copy text {">}]
                insert text: tail text "</a>"
            ][                     ;; text after url
                url-end: first url-end
                insert text rejoin [{<a href="} copy/part text find text url-end {">}]
                text: find/tail text {">}    ;; skip over insert
                text: find text url-end
                insert text "</a>"
            ]
            text = none
        ]
    ]
    text: head text
]

[4/14] from: Patrick::Philipot::laposte::net at: 8-Oct-2005 20:35

Bonjour Peter,

Thank you for your code. Being a lazy guy, I was hoping to take advantage of block parsing. Block parsing can match a datatype, url! in my case. Example:

>> ablock: [foo dummy http://www.me.com foo]
>> parse ablock [some [set u url! (print u) | skip]]

http://www.me.com

So finally, my solution is:

s: {Go see http://www.me.org, it is fabulous. And http://aaa.mypicture.com with my photos}
bs: parse s none
remove-each w bs [not parse to-block w [url!]]
foreach w bs [
    replace find s w w rejoin [{<a href="} w {">} w {</a>}]
]
print s

Go see <a href="http://www.me.org">http://www.me.org</a>, it is fabulous. And <a href="http://aaa.mypicture.com">http://aaa.mypicture.com</a> with my photos

I'm pretty sure this code is not optimized and that some guru could make a one-liner of it. Anyway, it will do the job the way I like.

-- Ciao Patrick

[5/14] from: pwawood:mango:my at: 9-Oct-2005 12:30

Bonjour Patrick

Thanks for posting the parse solution. It's certainly shorter than my do-it-yourself approach. At the moment, though, it doesn't seem to work properly if a URL appears in the string twice:

>> s: {I love http://www.rebol.com, I hate http://www.rebol.com}

== {I love http://www.rebol.com, I hate http://www.rebol.com}

>> bs: parse s none

== ["I" "love" "http://www.rebol.com" "I" "hate" "http://www.rebol.com"]

>> remove-each w bs [not parse to-block w [url!]]

== ["http://www.rebol.com" "http://www.rebol.com"]

>> foreach w bs [

[    replace find s w w rejoin [{<a href="} w {">} w {</a>}]
[    ]
== {<a href="http://www.rebol.com">http://www.rebol.com</a>">http://www.rebol.com</a>, I hate http://www.rebol.com}

On the second pass, find lands on the copy of the URL inside the href attribute that the first pass just inserted. You need to add something to the foreach loop so that you don't find the first occurrence of the URL when looking for the second. I came up with this, which no doubt can be vastly improved upon:

>> foreach w bs [

[    replace find s w w rejoin [{<a href="} w {">} w {</a>}]
[    s: next find next find s w w
[    ]
== "ttp://www.rebol.com</a>"

>> s: head s

== {I love <a href="http://www.rebol.com">http://www.rebol.com</a>, I hate <a href="http://www.rebol.com">http://www.rebol.com</a>}

Salut
Peter
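Peter's fix works because each search resumes after the link that was just inserted. The same idea can be packaged as a small function. This is a minimal sketch assuming REBOL 2, where change/part returns the position just past the inserted text; the name wrap-all is hypothetical and was not posted in the thread.

```rebol
REBOL [Title: "wrap-all sketch"]

; Hypothetical helper (not from the thread): wrap every occurrence of URL
; in S with an <a href> tag, modifying S in place.
wrap-all: func [
    "Wrap each occurrence of URL in S, resuming each search after the new link."
    s [string!]
    url [string!]
    /local link pos
][
    link: rejoin [{<a href="} url {">} url {</a>}]
    pos: s
    while [pos: find pos url] [
        ; CHANGE/PART returns the position just after the inserted link,
        ; so the next FIND cannot see the copies of URL inside LINK
        pos: change/part pos link length? url
    ]
    s
]
```

Called as wrap-all copy "I love http://www.rebol.com, I hate http://www.rebol.com" "http://www.rebol.com", both occurrences end up wrapped exactly once.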

[6/14] from: pwawood:mango:my at: 9-Oct-2005 13:22

Bonjour encore Patrick

I realised that my simplistic function had a logic error, so it would only have processed the first URL it came across. I've updated it now.

highlight-links: func [
    {Wraps an <a href> tag around http urls in a string.}
    text [string!] "The string containing http urls."
    /local url-end "Used to work out where the url ends"
][
    while [find text "http://"] [
        text: find text "http://"
        ;; find char after url
        url-end: copy find/tail text "http:"
        until [
            url-end: next url-end
            any [
                (url-end = "")
                (#" " = first url-end)
                (#"," = first url-end)
                (#":" = first url-end)
                (#";" = first url-end)
            ]
        ]
        ;; insert the <a> tag
        either url-end = "" [   ;; url is at end of text
            insert text rejoin [{<a href="} copy text {">}]
            insert text: tail text "</a>"
        ][                      ;; text after url
            url-end: first url-end
            insert text rejoin [{<a href="} copy/part text find text url-end {">}]
            text: find/tail text {">}    ;; skip over insert
            text: find text url-end
            insert text "</a>"
        ]
    ]
    text: head text
]

Regards
Peter

[7/14] from: Patrick::Philipot::laposte::net at: 9-Oct-2005 9:12

Hi Peter,

Thank you for pointing out the duplicate problem. I have found an easy way to get rid of it. First, I remove the duplicates with:

bs: unique bs

Then I replace all occurrences with replace/all. So the final draft is:

>> s: {Go see http://www.me.org, it is
{    fabulous. Really http://www.me.org is awesome!
{    And http://aaa.mypicture.com with my photos}

== {Go see http://www.me.org, it is fabulous. Really http://www.me.org is awesome! And http://aaa.mypicture.com with my photos}

>> bs: parse s none

== ["Go" "see" "http://www.me.org" "it" "is" "fabulous." "Really" "http://www.me.org" "is" "awesome!" "And" "http://aaa.mypicture.c...

>> remove-each w bs [not parse to-block w [url!]]

== ["http://www.me.org" "http://www.me.org" "http://aaa.mypicture.com"]

>> bs: unique bs

== ["http://www.me.org" "http://aaa.mypicture.com"]

>> foreach w bs [
[    replace/all find s w w rejoin [{<a href="} w {">} w {</a>}]
[    ]

== {<a href="http://aaa.mypicture.com">http://aaa.mypicture.com</a> with my photos}

>> print s

Go see <a href="http://www.me.org">http://www.me.org</a>, it is fabulous. Really <a href="http://www.me.org">http://www.me.org</a> is awesome! And <a href="http://aaa.mypicture.com">http://aaa.mypicture.com</a> with my photos

Sometimes Rebol is so fun. I wish I had more time to play with it. -- Ciao Patrick

[8/14] from: Patrick:Philipot:laposte at: 9-Oct-2005 10:14

Hi List,

As I was testing my parse function, I found an annoying problem with to-block. Is this a bug? How can I work around it? My function uses the following line:

remove-each w bs [not parse to-block w [url!]]

And it turns out that to-block does not like parens. See:

>> to-block "(right"

** Syntax Error: Missing ) at end-of-script
** Near: (line 1) (right

So there is some kind of evaluation here. On the other hand, this is OK:

>> to-block "(right)"

== [(right)]

This really bothers me; this is when Rebol is not fun at all :(

Ideas and workarounds appreciated.

-- Ciao Patrick

[9/14] from: Patrick::Philipot::laposte::net at: 9-Oct-2005 11:14

Hi List,

Previously on "Parse problem": the problem was how to replace any URL found in a text with a valid HTML link <a href= ...>. You will find the complete answer below. However, a second problem arose with to-block.

>> to-block "(foo"

** Syntax Error: Missing ) at end-of-script
** Near: (line 1) (foo

It seems like a bug to me. My workaround is to catch the error and remove the "faulty" text before the url! check. Here is my code, in case it is of some interest:

; pat665 handling URL links directly in the text
;
;
;
transform-url: func [s [string!] /local bs] [
    ; replace any URL found by a valid HTML link <a href= ...>
    bs: parse s none
    remove-each w bs [error? try [to-block w]]
    remove-each w bs [not parse to-block w [url!]]
    bs: unique bs
    foreach w bs [
        replace/all find s w w rejoin [{<a href="} w {">} w {</a>}]
    ]
    s
]
;
;
;

-- Ciao Patrick
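The two remove-each passes above can also be folded into one. A minimal sketch, assuming REBOL 2: attempt returns none instead of raising the syntax error, and load is used in place of to-block because a single loaded URL comes back as a url! value rather than a block, so one url? test rejects both "(foo" and ordinary words. The name url-strings is hypothetical, not something posted in the thread.

```rebol
REBOL [Title: "url-strings sketch"]

; Hypothetical helper (not from the thread).
url-strings: func [
    "Return the unique pieces of TEXT that load as url! values."
    text [string!]
    /local words
][
    words: parse text none    ; split into strings at whitespace and commas
    ; ATTEMPT gives NONE when LOAD fails, e.g. on "(foo", and url? none is false
    remove-each w words [not url? attempt [load w]]
    unique words
]
```

The block it returns can be fed to the same foreach and replace/all loop used in transform-url.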

[10/14] from: carl:cybercraft at: 9-Oct-2005 14:56

Hi Patrick,

I had a quick attempt at your problem when you first posted it, and have to agree: it's not so easy. ;-) I did expect someone to come up with a good method using PARSE, though, and am surprised nobody has. Anyway, I've had another go tonight and have worked out a method, one that uses a recursive approach.

Essentially, to find the end of a URL, it looks for a space, a newline or the end of the text, and copies from the start of the URL to there. Then, if it didn't reach the end of the text, the copied text is itself parsed. This finds the URLs, but they may of course have commas and so on attached to their ends. It's an easy matter to strip them off, though, giving you the true URL, which you can then modify to replace the original.

So, here's the code. There are three parts to it: a parse rule where the recursion is performed; a function that calls it, strips any extra characters and replaces the original; and the main parsing routine, which calls the function...

str: {Hey go visit http://www.me.org, it's great! Or http://www.you.org or http://www.them.org or even http://www.us.org!}

rule: [
    some [
        to "http://" copy text to " " (parse text rule)
        | to "http://" copy text to "^/" (parse text rule)
        | to "http://" copy text to end
    ]
]

modify: does [
    parse text rule
    ;-- Add characters to the FIND string in the next line to increase
    ;-- the types of characters you need to remove from the end of a URL.
    while [find {.,!)"} last text] [remove back tail text]
    new-text: rejoin [{<a href="} text {">} text </a>]
    change/part s new-text length? text
    s: skip s -1 + length? new-text
]

parse str [
    some [
        to "http://" s: copy text to " " (modify) :s
        | to "http://" s: copy text to "^/" (modify) :s
        | to "http://" s: copy text to end (modify) :s
    ]
    to end
]
print str

Let me know if this behaves for you! :-)

I've just thought of one problem: if there's already a URL inside tags within the text, it'd screw them up. So this isn't a universal solution. And it should really be made into a nice, tidy function.

-- Carl Read.

On Friday, 7-October-2005 at 14:19:52 Patrick Philipot wrote,

[11/14] from: volker::nitsch::gmail::com at: 9-Oct-2005 14:29

On 10/9/05, [Patrick--Philipot--laposte--net] <[Patrick--Philipot--laposte--net]> wrote:

> Hi List,
> As I was testing my parse function, I have found an annoying problem

<<quoted lines omitted: 10>>

> == [(right)]
> This is really bothering me, when Rebol is not fun at all :(

Remember, to-block needs valid REBOL code; otherwise it is helpless. Catch the error:

!> attempt[to-block "(bang"]
== none
!> attempt[to-block "bang"]
== [bang]
!> if u: attempt[to-block "(bang"] [url? first u]
== none
!> if u: attempt[to-block "http://yep"] [url? u/1]
== true
!> attempt[load "http://yep"]
== http://yep
!> url? attempt[load "http://yep"]
== true

Some other ways to do it:

setup: does [
    s: {Go see http://www.me.org, it is fabulous. And http://aaa.mypicture.com with my photos. ftp://aaa.mypicture.com too}
    emit: func [val] [repend "" val]
]
setup

; Simplest in this case:
while [p: find s "http://"] [
    emit copy/part s p
    set [url s] load/next p
    emit [build-tag [a href (url)] url </a>]
]
emit s
print emit ""

; Pure parse, but which chars are allowed?
setup
url-chars: charset [#"a" - #"z" #"0" - #"9" "/.:"]  ; which chars?
parse/all s [
    some [
        s: to http:// p: copy url any url-chars (
            emit copy/part s p
            emit [build-tag [a href (url)] url </a>]
        )
    ]
    (emit s)
]
print emit ""

; Using load/next in parse is complex, but still:
setup
parse/all s [
    some [
        s: to http:// p: (
            emit copy/part s p
            set [url p] load/next p
            emit [build-tag [a href (url)] url </a>]
        ) :p
    ]
    (emit s)
]
print emit ""

; With a kind of multiple 'to, and different fonts for http/ftp, for demonstration:
setup
url-chars: charset [#"a" - #"z" #"0" - #"9" "/.:"]  ; which chars?
parse/all s [
    some [
        s: "http://" any url-chars p: (
            url: copy/part s p
            emit [build-tag [a href (url)] <b> url </b> </a>]
        )
        | s: "ftp://" any url-chars p: (
            url: copy/part s p
            emit [build-tag [a href (url)] <i> url </i> </a>]
        )
        | s: skip (emit s/1)
    ]
]
print emit ""

> Ideas and workaround appreciated.
> --

<<quoted lines omitted: 3>>


-- -Volker

Any problem in computer science can be solved with another layer of indirection. But that usually will create another problem.
David Wheeler

[12/14] from: greggirwin::mindspring::com at: 9-Oct-2005 11:26

Hi Patrick,

>>> to-block "(foo"

PPln> ** Syntax Error: Missing ) at end-of-script
PPln> ** Near: (line 1) (foo
PPln> It seems like a bug to me.

Not a bug. REBOL needs to convert the data to valid REBOL values, which "(foo" isn't. This comes up from time to time, but it's just a distinction we need to acknowledge; there are times when we'll need to use string parsing rather than block parsing, even though it's more effort.

-- Gregg
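Following Gregg's advice, the whole job can be done with string parsing, so the text never has to be valid REBOL and unbalanced parens are harmless. A minimal sketch under these assumptions: a URL starts with "http://" and runs until a space, tab or newline, and trailing characters from ".,;:!?)" are treated as punctuation rather than part of the URL, much as Carl strips them in his version. The name link-urls is hypothetical; it builds a new string instead of changing the input in place, and, like Carl's code, it is not meant for text that already contains tags.

```rebol
REBOL [Title: "link-urls sketch"]

; Hypothetical helper (not from the thread).
link-urls: func [
    "Return a copy of TEXT with each http:// URL wrapped in an <a href> tag."
    text [string!]
    /local url-char out mark start url-end url
][
    url-char: complement charset " ^-^/"   ; assumption: a URL runs until whitespace
    out: copy ""
    mark: text                              ; start of text not yet copied to OUT
    parse/all text [
        any [
            to "http://" start: some url-char url-end: (
                url: copy/part start url-end
                ; assumption: these trailing characters are punctuation, not URL
                while [find ".,;:!?)" last url] [remove back tail url]
                append out copy/part mark start
                append out rejoin [{<a href="} url {">} url {</a>}]
                mark: skip start length? url   ; stripped punctuation is copied later
            )
        ]
    ]
    append out mark                         ; whatever follows the last URL
]
```

With Patrick's original example, link-urls "Hey go visit http://www.me.org, it's great!" keeps the comma outside the link.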

[13/14] from: Izkata::Comcast::net at: 9-Oct-2005 15:09

> Hi Patrick,
>>>> to-block "(foo"

<<quoted lines omitted: 6>>

> to acknowledge; there are times when we'll need to use string parsing, > rather than block parsing, even though it's more effort.

There is a (kinda) workaround (remembered from a "why does a word have spaces?" question years ago):

>> to-word "(foo"

== (foo

>> X: append [] to-word "(foo"

== [(foo]

>> ? X/1

X/1 is a word of value: (foo

When you just 'to-block (or 'load) the string, Rebol sees the ( and thinks it's a paren. But if that doesn't matter, or a paren isn't what you want... Thar ye go!

-Izzy

[14/14] from: inetw3::mindspring::com at: 10-Oct-2005 23:59

to-block mold "(foo"

or...

str: "(foo"
if string? str [to-block mold str]

Notes

Quoted lines have been omitted from some messages.