Parse problem
[1/14] from: patrick::philipot::laposte::net at: 7-Oct-2005 14:19
Hi List
A parse problem. I would like to parse a text to replace any url in it.
Example
Initial text :
Hey go visit http://www.me.org, it's great!
Result :
Hey go visit <a href="http://www.me.org">http://www.me.org</a>, it's great!
Not so easy, is it?
--
Bye
Pat665
[2/14] from: greggirwin:mindspring at: 7-Oct-2005 7:26
Hi Patrick,
PP> A parse problem. I would like to parse a text to replace any url in it.
PP> Example
I hacked the link parser out of the IOS Conference reblet at one
point, but I didn't do any text replacement with it, I just made it a
little more general in that you could pass in a callback function. It
still has some issues that should be addressed (e.g. it includes
trailing commas on URLs), but it might give you something to build on.
-- Gregg
Pardon the formatting; watch for wrap.
REBOL []
link-parser: context [
    white-space: charset reduce [#" " newline tab cr #"<" #">"]
    non-white-space: complement white-space
    to-space: [some non-white-space | end]
    skip-to-next-word: [some non-white-space some white-space]
    link-rule: copy []
    callback: none
    make-action: func [link] [
        compose [
            mark:
            (link) (either string? link [[to-space end-mark:]] [])
            (to-paren compose [callback copy/part mark end-mark])
            any white-space
        ]
    ]
    make-link-rules: func [schemes] [
        clear link-rule
        foreach scheme schemes [
            repend link-rule [make-action scheme '|]
        ]
        append link-rule 'skip-to-next-word
        use [mark end-mark text offset] [bind link-rule 'mark]
    ]
    set 'parse-links func [
        input [any-string!]
        action [any-function!]
        /with
        schemes [block!] "Block of scheme patterns to look for"
    ][
        make-link-rules any [
            schemes
            ["https://" "http://" "www." "ftp://" "ftp."]
        ]
        callback: :action
        error? try [parse/all input [any link-rule]]
    ]
]
s: {Check out http://www.rebol.org
And if you like that, you'll really like www.rebol.com
Then, for more excitement, port your data to ftp://blah-blah-blah
}
parse-links s func [url] [print url]
print "^/This pass will only look for FTP links^/"
s: {Check out http://www.rebol.org
And if you like that, you'll really like www.rebol.com
Then, for more excitement, port your data to ftp://blah-blah-blah
}
parse-links/with s func [url] [print url] ["ftp://"]
halt
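The trailing-comma issue mentioned above could be handled in the callback rather than in the parse rules. A minimal sketch, assuming a hypothetical helper named trim-url (it is not part of the reblet code) and an assumed set of punctuation characters:

```rebol
REBOL []

; Hypothetical helper, not part of the original link parser.
; Returns a copy of URL with trailing punctuation removed.
trim-url: func [url [any-string!] /local punctuation] [
    punctuation: ".,;:!?)"      ; assumed set; adjust as needed
    url: copy url
    while [all [not empty? url  find punctuation last url]] [
        remove back tail url
    ]
    url
]

print trim-url "http://www.me.org,"
```

With the parser above loaded, it could be used as: parse-links s func [url] [print trim-url url]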
[3/14] from: pwawood::mango::net::my at: 8-Oct-2005 14:23
Hi Patrick
I'm no expert like Gregg so I couldn't figure out how to use parse to
achieve what you wanted. So I wrote this simple function that may help.
I haven't tested it fully but hope it helps.
Regards
Peter
highlight-link: func [
    {Wraps an <a href> tag around http urls in a string.}
    text [string!]
    "The string containing http urls."
    /local
    url-end
    "Used to work out where the url ends"
][
    until
    [
        text: find text "http:" ;; find a url
        if text ;; "http:" found
        [
            ;; find char after url
            url-end: copy find/tail text "http:"
            until
            [
                url-end: next url-end
                any
                [
                    (url-end = "")
                    (#" " = first url-end)
                    (#"," = first url-end)
                    (#":" = first url-end)
                    (#";" = first url-end)
                ]
            ]
            ;; insert the <a> tag
            either url-end = "" ;; url is at end of text
            [
                insert text rejoin
                [
                    {<a href="}
                    copy text
                    {">}
                ]
                insert text: tail text "</a>"
            ][ ;; text after url
                url-end: first url-end
                insert text rejoin
                [
                    {<a href="}
                    copy/part text find text url-end
                    {">}
                ]
                text: find/tail text {">} ;; skip over insert
                text: find text url-end
                insert text "</a>"
            ]
        ]
        text = none
    ]
    text: head text
]
[4/14] from: Patrick::Philipot::laposte::net at: 8-Oct-2005 20:35
Bonjour Peter,
Thank you for your code. Being a lazy guy, I was hoping to get
advantage of block parsing. Block parsing is able to match a type, the
type being url! in my case.
Example :
>> ablock: [foo dummy http://www.me.com foo]
>> parse ablock [ some [set u url! (print u) | skip]]
http://www.me.com
So finally, my solution is :
s: {Go see http://www.me.org, it is
fabulous. And http://aaa.mypicture.com with my photos}
bs: parse s none
remove-each w bs [not parse to-block w [url!]]
foreach w bs [
    replace find s w w rejoin [{<a href="} w {">} w {</a>}]
]
print s
Go see <a href="http://www.me.org">http://www.me.org</a>, it is
fabulous. And <a href="http://aaa.mypicture.com">http://aaa.mypicture.com</a> with my photos
I'm pretty sure this code is not optimized and that some guru could
make a one liner of it. Anyway it will do the job the way I like.
--
Ciao
Patrick
[5/14] from: pwawood:mango:my at: 9-Oct-2005 12:30
Bonjour Patrick
Thanks for posting the parse solution. It's certainly shorter than my
do-it-yourself approach.
At the moment it seems as though it doesn't work properly if an url is in
the string twice:
>> s: {I love http://www.rebol.com, I hate http://www.rebol.com}
== {I love http://www.rebol.com, I hate http://www.rebol.com}
>> bs: parse s none
== ["I" "love" "http://www.rebol.com" "I" "hate" "http://www.rebol.com"]
>> remove-each w bs [not parse to-block w [url!]]
== ["http://www.rebol.com" "http://www.rebol.com"]
>> foreach w bs [
[ replace find s w w rejoin [{<a href="} w {">} w {</a>}]
[ ]
== {<a href="http://www.rebol.com">http://www.rebol.com</a>">http://www.rebol.com</a>, I hate http://www.rebol.com}
You need to add something to the foreach loop so that you don't find the
first occurrence of the url when looking for the second.
I came up with this, which no doubt can be vastly improved upon:
>> foreach w bs [
[ replace find s w w rejoin [{<a href="} w {">} w {</a>}]
[ s: next find next find s w w
[ ]
== "ttp://www.rebol.com</a>"
>> s: head s
== {I love <a href="http://www.rebol.com">http://www.rebol.com</a>, I hate <a href="http://www.rebol.com">http://www.rebol.com</a>}
Salut
Peter
[6/14] from: pwawood:mango:my at: 9-Oct-2005 13:22
Bonjour encore Patrick
I realised that my simplistic function had a logic error: it would only
have processed the first url it came across. I've updated it now.
highlight-links: func [
    {Wraps an <a href> tag around http urls in a string.}
    text [string!]
    "The string containing http urls."
    /local
    url-end
    "Used to work out where the url ends"
][
    while [find text "http://"]
    [
        text: find text "http://"
        ;; find char after url
        url-end: copy find/tail text "http:"
        until
        [
            url-end: next url-end
            any
            [
                (url-end = "")
                (#" " = first url-end)
                (#"," = first url-end)
                (#":" = first url-end)
                (#";" = first url-end)
            ]
        ]
        ;; insert the <a> tag
        either url-end = "" ;; url is at end of text
        [
            insert text rejoin
            [
                {<a href="}
                copy text
                {">}
            ]
            insert text: tail text "</a>"
        ][ ;; text after url
            url-end: first url-end
            insert text rejoin
            [
                {<a href="}
                copy/part text find text url-end
                {">}
            ]
            text: find/tail text {">} ;; skip over insert
            text: find text url-end
            insert text "</a>"
        ]
    ]
    text: head text
]
Regards
Peter
[7/14] from: Patrick::Philipot::laposte::net at: 9-Oct-2005 9:12
Hi Peter,
Thank you for pointing out the duplicate problem. I have found an easy
way to get rid of it.
First I remove the duplicates with:
bs: unique bs
Then I replace all occurrences with replace/all.
So the final draft is:
>> s: {Go see http://www.me.org, it is
{ fabulous. Really http://www.me.org is awesome!
{ And http://aaa.mypicture.com with my photos}
== {Go see http://www.me.org, it is
fabulous. Really http://www.me.org is awesome!
And http://aaa.mypicture.com with my photos}
>>
>> bs: parse s none
== ["Go" "see" "http://www.me.org" "it" "is" "fabulous." "Really" "http://www.me.org" "is" "awesome!" "And" "http://aaa.mypicture.c...
>>
>> remove-each w bs [not parse to-block w [url!]]
== ["http://www.me.org" "http://www.me.org" "http://aaa.mypicture.com"]
>>
>> bs: unique bs
== ["http://www.me.org" "http://aaa.mypicture.com"]
>>
>> foreach w bs [
[ replace/all find s w w rejoin [{<a href="} w {">} w {</a>}]
[ ]
== {<a href="http://aaa.mypicture.com">http://aaa.mypicture.com</a> with my photos}
>>
>> print s
Go see <a href="http://www.me.org">http://www.me.org</a>, it is
fabulous. Really <a href="http://www.me.org">http://www.me.org</a> is awesome!
And <a href="http://aaa.mypicture.com">http://aaa.mypicture.com</a> with my photos
>>
Sometimes Rebol is so fun. I wish I had more time to play with it.
--
Ciao
Patrick
[8/14] from: Patrick:Philipot:laposte at: 9-Oct-2005 10:14
Hi List,
As I was testing my parse function, I found an annoying problem
with to-block. Is this a bug? How can I work around it?
My function uses the following line:
remove-each w bs [not parse to-block w [url!]]
And it turns out that to-block does not like parens. See:
>> to-block "(right"
** Syntax Error: Missing ) at end-of-script
** Near: (line 1) (right
So there is some kind of evaluation here. On the other hand, this is ok.
>> to-block "(right)"
== [(right)]
This is really bothering me, when Rebol is not fun at all :(
Ideas and workarounds appreciated.
--
Ciao
Patrick
[9/14] from: Patrick::Philipot::laposte::net at: 9-Oct-2005 11:14
Hi List,
Previously on "Parse problem"
The problem was to find out how to replace any URL found in a text
with a valid HTML link <a href= ...>. You will find the complete answer
below.
However, a second problem arose with to-block.
>> to-block "(foo"
** Syntax Error: Missing ) at end-of-script
** Near: (line 1) (foo
It seems like a bug to me. My workaround is to catch the error and
remove the "faulty" text before evaluation. Here is my code, in case it
is of some interest ...
; pat665 handling URL link directly in the text
;
;
;
transform-url: func [s [string!]] [
    ; replace any URL found by a valid HTML link <a href= ...>
    bs: parse s none
    remove-each w bs [error? try [to-block w]]
    remove-each w bs [not parse to-block w [url!]]
    bs: unique bs
    foreach w bs [
        replace/all find s w w rejoin [{<a href="} w {">} w {</a>}]
    ]
    s
]
;
;
;
--
Ciao
Patrick
[10/14] from: carl:cybercraft at: 9-Oct-2005 14:56
Hi Patrick,
I had a quick attempt at your problem when you first posted it, and have to agree, it's
not so easy. ;-)
I did expect someone to come up with a good method using PARSE though, and am surprised
they haven't. Anyway, I've had another go tonight and have worked out a method - one
that uses a recursive approach.
Essentially, to find the end of an URL, it looks for a space, return character or the
end of the file and copies from the URL start to there. Then, if it didn't find the
end of the file the copied text is itself parsed. This finds the URLs, but they may
of course have commas and so on attached to their ends. It's an easy matter to strip
them off though, giving you the true URL which you can then modify to replace the original.
So, here's the code. There are three parts to it: a parse rule where the recursion is
performed; a function to call it, strip any extra characters and replace the original;
and the main parsing routine, which calls the function...
str: {Hey go visit http://www.me.org, it's great! Or http://www.you.org
or http://www.them.org or even http://www.us.org!}
rule: [
    some [
        to "http://" copy text to " " (parse text rule) |
        to "http://" copy text to "^/" (parse text rule) |
        to "http://" copy text to end
    ]
]
modify: does [
    parse text rule
    ;-- Add characters to the FIND string in the next line to increase
    ;-- the types of characters you need to remove from the end of an URL.
    while [find {.,!)"} last text][remove back tail text]
    new-text: rejoin [{<a href="} text {">} text </a>]
    change/part s new-text length? text
    s: skip s -1 + length? new-text
]
parse str [
    some [
        to "http://" s: copy text to " " (modify) :s |
        to "http://" s: copy text to "^/" (modify) :s |
        to "http://" s: copy text to end (modify) :s
    ]
    to end
]
print str
Let me know if this behaves for you! :-)
I've just thought of one problem - if there's already an URL inside tags within the text,
it'd screw them up. So this isn't a universal solution. And it should really be made
into a nice, tidy function.
-- Carl Read.
On Friday, 7-October-2005 at 14:19:52 Patrick Philipot wrote,
[11/14] from: volker::nitsch::gmail::com at: 9-Oct-2005 14:29
On 10/9/05, [Patrick--Philipot--laposte--net] <[Patrick--Philipot--laposte--net]> wrote:
> Hi List,
> As I was testing my parse function, I have found an annoying problem
<<quoted lines omitted: 10>>
> == [(right)]
> This is really bothering me, when Rebol is not fun at all :(
Remember to-block needs valid rebol-code, else it is helpless.
Catch the error.
!> attempt[to-block "(bang"]
== none
!> attempt[to-block "bang"]
== [bang]
!> if u: attempt[to-block "(bang"] [url? first u]
== none
!> if u: attempt[to-block "http://yep"] [url? u/1]
== true
!> attempt[load "http://yep"]
== http://yep
!> url? attempt[load "http://yep"]
== true
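Folding the checks above into Patrick's filter might look like the sketch below. The name url-string? is hypothetical; the idea is only that ATTEMPT turns a LOAD error into none, which URL? then rejects, so the separate error-catching pass is not needed.

```rebol
REBOL []

; Hypothetical helper: true only when the whole string loads
; as a single url! value. A syntax error such as "(right"
; makes ATTEMPT return none, and URL? none is false.
url-string?: func [w [string!]] [
    url? attempt [load w]
]

; Sample text is an assumption made up for this sketch.
words: parse {Go see http://www.me.org (right now} none
remove-each w words [not url-string? w]
probe words
```

Only the words that load as a url! should survive the filter; the unbalanced "(right" is dropped quietly instead of raising a syntax error.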
Some other versions to do it:
setup: does[
    s: {Go see http://www.me.org, it is
fabulous. And http://aaa.mypicture.com with my photos.
ftp://aaa.mypicture.com too}
    emit: func[val][repend "" val]
]
setup
; Simplest in this case:
while[ p: find s "http://" ][
    emit copy/part s p
    set[url s] load/next p
    emit [build-tag [a href (url)] url </a>]
]
emit s
print emit ""
; Pure parse, but which chars allowed?
setup
url-chars: charset [#"a" - #"z" #"0" - #"9" "/.:"] ; which chars?
parse/all s[
    some[
        s: to http:// p: copy url any url-chars (
            emit copy/part s p
            emit [build-tag [a href (url)] url </a>]
        )
    ] (emit s)
]
print emit ""
; Using load/next in parse is complex, but still:
setup
parse/all s [
    some[
        s: to http:// p: (
            emit copy/part s p
            set[url p] load/next p
            emit [build-tag [a href (url)] url </a>]
        ) :p
    ] (emit s)
]
print emit ""
; With a kind of multiple 'to, different fonts for http/ftp, for demonstration :
setup
url-chars: charset [#"a" - #"z" #"0" - #"9" "/.:"] ; which chars?
parse/all s[
    some[
        s: "http://" any url-chars p: (
            url: copy/part s p
            emit [build-tag [a href (url)] <b> url </b> </a>]
        )
        |
        s: "ftp://" any url-chars p: (
            url: copy/part s p
            emit [build-tag [a href (url)] <i> url </i> </a>]
        )
        | s: skip (emit s/1)
    ]
]
print emit ""
> Ideas and workaround appreciated.
> --
<<quoted lines omitted: 3>>
> To unsubscribe from the list, just send an email to
> lists at rebol.com with unsubscribe as the subject.
--
-Volker
Any problem in computer science can be solved with another layer of
indirection. But that usually will create another problem.
David Wheeler
[12/14] from: greggirwin::mindspring::com at: 9-Oct-2005 11:26
Hi Patrick,
>>> to-block "(foo"
PPln> ** Syntax Error: Missing ) at end-of-script
PPln> ** Near: (line 1) (foo
PPln> It seems like a bug to me.
Not a bug. REBOL needs to convert the data to valid REBOL values,
which (foo isn't.
This comes up from time to time, but it's just a distinction we need
to acknowledge; there are times when we'll need to use string parsing,
rather than block parsing, even though it's more effort.
-- Gregg
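A string-parsing version that never calls to-block might look roughly like the sketch below. The name link-urls, the character set that ends a URL, and the list of trailing punctuation are all assumptions made for illustration, not something taken from the code posted in this thread.

```rebol
REBOL []

; Assumption: a URL ends at whitespace or at an angle bracket.
url-char: complement charset reduce [#" " tab newline cr #"<" #">"]

link-urls: func [
    "Hypothetical helper: wraps each http:// URL in TEXT in an <a> tag, in place."
    text [string!]
    /local start finish url
][
    parse/all text [
        any [
            start: "http://" some url-char finish: (
                url: copy/part start finish
                ; Back off over trailing punctuation such as "," or "!".
                while [find ".,;:!?)" last url] [remove back tail url]
                ; Replace only the URL itself; CHANGE returns the
                ; position just past the inserted link.
                finish: change/part start rejoin [
                    {<a href="} url {">} url "</a>"
                ] length? url
            ) :finish
            | skip
        ]
    ]
    text
]

print link-urls {Hey go visit http://www.me.org, it's great! (see http://www.you.org)}
```

Because the input is only ever treated as a string, an unbalanced paren is just another character. Resuming the parse at the position returned by CHANGE also means the URL text inside the new tag is not matched a second time.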
[13/14] from: Izkata::Comcast::net at: 9-Oct-2005 15:09
> Hi Patrick,
>>>> to-block "(foo"
<<quoted lines omitted: 6>>
> to acknowledge; there are times when we'll need to use string parsing,
> rather than block parsing, even though it's more effort.
There is a (kinda) workaround: (Remembered from a "why does a word have
spaces" question years ago)
>> to-word "(foo"
== (foo
>> X: append [] to-word "(foo"
== [(foo]
>> ? X/1
X/1 is a word of value: (foo
When you just 'to-block (or 'load) the string, Rebol sees the ( and thinks
it's a paren, but if it doesn't matter, or a paren isn't what you want...
Thar ye go!
-Izzy
[14/14] from: inetw3::mindspring::com at: 10-Oct-2005 23:59
to-block mold "(foo"
or...
str: "(foo"
if string? str [to-block mold str]
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted