Parse refactoring puzzle

[1/16] from: SunandaDH:aol at: 30-Jul-2003 2:10

I've just written a function which, though it works, annoys me: I'm sure that if I knew a bit more about parse, I could do it in one parse rather than a parse in a loop.

The problem is we have a string that has some "place-holders" (identified as !!xxx!! where xxx can be anything), e.g.:

    {Welcome !!name!! your address is !!street!! !!town!! !!city!! !!state!! !!country!!}

Code before ours has made substitutions, but may have left some stray place-holders:

    {Welcome John-Paul your address is The Vatican !!town!! Roma !!state!! Italia}

So I now want to remove the remaining place-holders:

    {Welcome John-Paul your address is The Vatican  Roma  Italia}

(We don't need to clear the doubled spaces. We'll let the caller do a trim if that's what they want.)

The function I wrote, and some test data, is below. Can anyone suggest a one-parse approach?

Thanks!
Sunanda.

    ;; =======================================
    clear-place-holders: func [str [string!]
        /local out-str ph
    ][
        out-str: copy str   ;; work on copy so str is unchanged if we fail
        forever [
            ph: none
            parse/all out-str [thru "!!" copy ph to "!!"]
            if none? ph [break]
            replace/all out-str join "!!" [ph "!!"] ""
        ]
        return out-str
    ]

    print clear-place-holders {no place holders in this string}
    print clear-place-holders {!!begin!!text!!end!!}
    print clear-place-holders {text!!middle!! more text}
    print clear-place-holders {!!bad-holder and some text}
    print clear-place-holders {!!good!holder, even if a little weird!!and some text}
    print clear-place-holders {same!!same!!!!same!!!!same!! place-holder!!same!! five!!same!! times}
    print clear-place-holders {leaves null place-holder !!!! in output, but another function might remove it. Either way is fine}
    ;; =======================================

[2/16] from: AJMartin:orcon at: 30-Jul-2003 19:17

Sunanda wrote:

> The function I wrote, and some test data is below. Can anyone suggest a
> one-parse approach?

    [
    Rebol [
        Name: 'Remove_Place_Holders
        Title: "Remove Place Holders"
        File: %"Remove Place Holders.r"
        Author: "AJ Martin"
        Copyright: "Free!"
        Date: 30/July/2003
    ]

    Remove_Place_Holders: function [String [string!]] [Mark Start Stop] [
        Mark: {!!}
        parse/all String [
            any [
                Start: Mark thru Mark End: (
                    End: remove/part Start End
                ) :End
                | skip
            ]
            end
        ]
        String
    ]

    foreach [Test Goal] [
        {Welcome John-Paul your address is The Vatican !!town!! Roma !!state!! Italia}
        {Welcome John-Paul your address is The Vatican  Roma  Italia}
        {no place holders in this string}
        {no place holders in this string}
        {!!begin!!text!!end!!}
        {text}
        {!!good!holder, even if a little weird!!and some text}
        {and some text}
        {leaves null place-holder !!!! in output, but another function might remove it. Either way is fine}
        {leaves null place-holder  in output, but another function might remove it. Either way is fine}
    ] [
        if Goal <> Actual: Remove_Place_Holders copy Test [
            print [
                "Problem!" newline
                "Test:" mold Test newline
                "Actual:" mold Actual newline
                "Goal:" mold Goal
            ]
            halt
        ]
    ]
    print "All OK!"
    halt
    ]

And running it, produces:

    All OK!

Note that 'Remove_Place_Holders modifies its input string; the test code supplies a copy so that we can see what the problems are.

Sunanda wrote:

> Code before ours has made substitutions, but may have left some stray
> place-holders:
> {Welcome John-Paul your address is The Vatican !!town!! Roma !!state!!
> Italia}
>
> So I now want to remove the remaining place-holders:

The substitution code could easily remove placeholders that don't have a value.

I hope that helps!

Andrew J Martin
ICQ: 26227169
http://www.rebol.it/Valley/
http://Valley.150m.com/

[3/16] from: AJMartin:orcon at: 30-Jul-2003 19:22

Oops! Little problem with using the wrong word, which is fixed in this version:

    [
    Rebol [
        Name: 'Remove_Place_Holders
        Title: "Remove Place Holders"
        File: %"Remove Place Holders.r"
        Author: "AJ Martin"
        Copyright: "Free!"
        Date: 30/July/2003
        Version: 1.1.0
    ]

    Remove_Place_Holders: function [String [string!]] [Mark Start Stop] [
        Mark: {!!}
        parse/all String [
            any [
                Start: Mark thru Mark Stop: (
                    Stop: remove/part Start Stop
                ) :Stop
                | skip
            ]
            end
        ]
        String
    ]

    foreach [Test Goal] [
        {Welcome John-Paul your address is The Vatican !!town!! Roma !!state!! Italia}
        {Welcome John-Paul your address is The Vatican  Roma  Italia}
        {no place holders in this string}
        {no place holders in this string}
        {!!begin!!text!!end!!}
        {text}
        {!!good!holder, even if a little weird!!and some text}
        {and some text}
        {leaves null place-holder !!!! in output, but another function might remove it. Either way is fine}
        {leaves null place-holder  in output, but another function might remove it. Either way is fine}
    ] [
        if Goal <> Actual: Remove_Place_Holders copy Test [
            print [
                "Problem!" newline
                "Test:" mold Test newline
                "Actual:" mold Actual newline
                "Goal:" mold Goal
            ]
            halt
        ]
    ]
    print "All OK!"
    halt
    ]

Andrew J Martin
ICQ: 26227169
http://www.rebol.it/Valley/
http://Valley.150m.com/

[4/16] from: g:santilli:tiscalinet:it at: 30-Jul-2003 10:11

Hi Sunanda,

On Wednesday, July 30, 2003, 8:10:29 AM, you wrote:

Sac> So I now want to remove the remaining place-holders:
Sac> {Welcome John-Paul your address is The Vatican  Roma  Italia}
Sac> (We don't need to clear the doubled spaces. We'll let the caller do a trim if
Sac> that's what they want).

QAD,

    clear-place-holders: func [str [string!] /local out-str start finish] [
        out-str: make string! length? str
        parse/all str [
            any [start: to "!!" finish: 2 skip thru "!!" (insert/part tail out-str start finish)]
            start: to end (insert tail out-str start)
        ]
        out-str
    ]

HTH,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/

[5/16] from: AJMartin:orcon at: 30-Jul-2003 21:25

I realised that I could get rid of the '| (the 'any and 'to do the job).

    Remove_Place_Holders: function [String [string!]] [Mark Start Stop] [
        Mark: {!!}
        parse/all String [
            any [
                to Mark Start: Mark thru Mark Stop: (
                    Stop: remove/part Start Stop
                ) :Stop
            ]
            to end
        ]
        String
    ]

Andrew J Martin
ICQ: 26227169
http://www.rebol.it/Valley/
http://Valley.150m.com/

[6/16] from: Christian:Ensel:GMX at: 30-Jul-2003 18:44

Hello Sunanda,

if your problem won't become more complex than stated, then maybe

    >> replace/all "... my !!string!! with !!delimiters!! ..." "!!" ""
    == "... my string with delimiters ..."

will help you get the job done.

Cheers,
Christian

[7/16] from: antonr::iinet::net::au at: 31-Jul-2003 16:14

This version modifies the string in place (in case you have a very long string):

    clear-place-holders: func [str [string!] /local start ] [
        probe parse/all str [
            any [
                ;(print "----")
                to "!!" start:      ; move to first mark, set start to current index
                ;(prin index? start ?? start)
                2 skip thru "!!"    ; skip a character 2 times, move index beyond next mark
                finish:             ; set finish to current index
                ;(prin index? finish ?? finish)
                :start              ; set index back to start
                (remove/part start finish)  ; remove text from start to finish
            ]
            to end
        ]
        str
    ]

    ; test
    foreach [example goal][
        {hello !!name!! anton, your phone is !!phone!! 123456}
        {hello  anton, your phone is  123456}
        {head!!abcd!efgh!!tail}
        {headabcd!efghtail}
        {head! !abcd!efgh!!tail}
        {head! !abcd!efgh!!tail}
    ][
        print ["---^/" example]
        print clear-place-holders copy example
    ]

Anton.

[8/16] from: g:santilli:tiscalinet:it at: 31-Jul-2003 10:06

Hi Anton,

On Thursday, July 31, 2003, 8:14:16 AM, you wrote:

A> This version modifies the string in place
A> (in case you have a very long string):

Notice that repeated REMOVE/PARTs are probably going to be much slower on a big string than copying it all. Though, it's native, so you'll probably only notice it on very long strings...

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/
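One way to check this claim on a given machine is to time both styles on a synthetic template. What follows is only a sketch: the names clear-by-copy, clear-in-place and time-it, the test string and the repeat count are assumptions made for illustration. The two parse rules are the ones posted earlier in the thread (with finish declared local in the in-place version).

```rebol
REBOL []

; Builds a new string, skipping each !!...!! span (rule from message 4).
clear-by-copy: func [str [string!] /local out start finish] [
    out: make string! length? str
    parse/all str [
        any [start: to "!!" finish: 2 skip thru "!!" (insert/part tail out start finish)]
        start: to end (insert tail out start)
    ]
    out
]

; Removes each !!...!! span from the string itself (rule from message 7).
clear-in-place: func [str [string!] /local start finish] [
    parse/all str [
        any [to "!!" start: 2 skip thru "!!" finish: :start (remove/part start finish)]
        to end
    ]
    str
]

; Hypothetical helper: evaluate a block and return the elapsed time.
; (Uses time of day, so a run that crosses midnight gives a wrong value.)
time-it: func [code [block!] /local t0] [
    t0: now/time/precise
    do code
    now/time/precise - t0
]

; Synthetic template; the size and content are arbitrary.
big: make string! 100000
loop 2000 [append big {some text !!name!! more text !!town!! }]

print ["by copy: " time-it [clear-by-copy big]]
print ["in place:" time-it [clear-in-place copy big]]
```

The in-place call is given a copy so that both runs see the same input. Timer resolution varies by platform, so the string may need to be made larger before any difference shows up.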

[9/16] from: SunandaDH:aol at: 31-Jul-2003 14:06

Thanks Andrew, Anton, Christian, and Gabriele,

Thanks for the various improvements to my first-cut code.

Yes, I have very long strings -- the place-holders are embedded in template HTML pages, so several to many K is typical -- so fast solutions are what I need.

Sunanda.

[10/16] from: andrew:martin:colenso:school at: 1-Aug-2003 8:07

Sunanda wrote:

> Yes, I have very long strings -- the place-holders are embedded in
> template HTML pages, so several to many K is typical -- so fast solutions
> are what I need.

If that's a concern, I'd test the various solutions to see which is the fastest. I'd also make the script make only one pass through the template and replace all the place-holders at once, including the ones that don't match.

Andrew J Martin
Attendance Officer & Information Systems Trouble Shooter
Colenso High School
Arnold Street, Napier.
Tel: 64-6-8310180 ext 826
Fax: 64-6-8336759
http://colenso.net/scripts/Wiki.r?AJM
http://www.colenso.school.nz/
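A single pass that both substitutes the known place-holders and drops the unknown ones might look like the following sketch. The function name fill-template and the name/value layout of the values block are assumptions for illustration, not something posted in the thread; the parse rule is adapted from Gabriele's copy-building version in message 4.

```rebol
REBOL []

; Sketch only: fill-template and the layout of values are assumptions.
fill-template: func [
    template [string!]
    values [block!]     ; name/value pairs, e.g. ["name" "John-Paul"]
    /local out start finish name pos
][
    out: make string! length? template
    parse/all template [
        any [
            start: to "!!" finish:
            "!!" copy name to "!!" "!!"
            (
                ; copy the text that came before the place-holder
                insert/part tail out start finish
                ; name is none for the empty place-holder "!!!!"
                if all [name  pos: find/skip values name 2] [
                    insert tail out second pos
                ]
            )
        ]
        ; copy whatever follows the last complete place-holder
        start: to end (insert tail out start)
    ]
    out
]

print fill-template
    {Welcome !!name!! your address is !!street!! !!town!! !!city!!}
    ["name" "John-Paul" "street" "The Vatican" "city" "Roma"]
```

All output is produced in one paren at the end of each iteration, so an unterminated "!!" makes the iteration fail without anything having been copied twice. As with the other versions in the thread, the doubled spaces left by dropped place-holders are not trimmed.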

[11/16] from: joel:neely:fedex at: 31-Jul-2003 16:05

Hi, Sunanda,

This sounds familiar, and I suggest that PARSE is overkill...

[SunandaDH--aol--com] wrote:

> Yes, I have very long strings -- the place-holders are embedded in template
> HTML pages, so several to many K is typical -- so fast solutions are what I
> need.
>

I've solved a similar problem in the past like so:

1) Put the tags and corresponding values in a block, as in:

    data: [
        "!!name!!"    "John Doe"
        "!!age!!"     "23"
        "!!present!!" "new car"
    ]

2) Use a string for the message template, with embedded tags:

    message: {
    Hi, !!name!!,

    Happy birthday! On this day when you turn !!age!!, we'd like
    to celebrate with you. You should be receiving our present,
    a !!present!! in the mail. Every !!age!!-year-old should have
    a !!present!!, don't you think?

    Again, !!name!!, happy !!age!!!

    Your friends
    }

3) Use the following simple function to return a filled-in template based on the tag/value pairs:

    fill-it: func [msg [string!] stuff [block!] /local result] [
        result: copy msg
        foreach [tag value] stuff [
            replace/all result tag value
        ]
        result
    ]

4) Get results as follows:

    >> print fill-it message data

    Hi, John Doe,

    Happy birthday! On this day when you turn 23, we'd like
    to celebrate with you. You should be receiving our present,
    a new car in the mail. Every 23-year-old should have
    a new car, don't you think?

    Again, John Doe, happy 23!

    Your friends

If you start with a default data block

    default_data: [
        "!!name!!"    ""
        "!!age!!"     ""
        "!!present!!" ""
    ]

and clone and fill in the values for each use, then any value you don't replace is the default (which can be an empty string as above to make unused tags disappear).

HTH

-jn-

--
----------------------------------------------------------------------
Joel Neely            joelDOTneelyATfedexDOTcom            901-263-4446

Counting lines of code is to software development as
counting bricks is to urban development.
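The "clone and fill in" step could be sketched as below. The helper make-data is a hypothetical name introduced here, not part of Joel's post; fill-it and default_data are repeated from the message so the example stands alone.

```rebol
REBOL []

; fill-it as posted in the message above.
fill-it: func [msg [string!] stuff [block!] /local result] [
    result: copy msg
    foreach [tag value] stuff [
        replace/all result tag value
    ]
    result
]

default_data: [
    "!!name!!"    ""
    "!!age!!"     ""
    "!!present!!" ""
]

; Hypothetical helper: clone the defaults, then overwrite the value of
; every tag that appears in overrides. Tags not in defaults are ignored.
make-data: func [defaults [block!] overrides [block!] /local data pos] [
    data: copy/deep defaults
    foreach [tag value] overrides [
        if pos: find/skip data tag 2 [change next pos value]
    ]
    data
]

data: make-data default_data ["!!name!!" "John Doe" "!!age!!" "23"]
print fill-it {Hi, !!name!!, happy !!age!!! Enjoy your !!present!!.} data
```

Because !!present!! keeps its empty default, it should simply disappear from the printed message, while default_data itself is left untouched for the next use.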

[12/16] from: SunandaDH:aol at: 1-Aug-2003 2:55

Hi Joel:

> This sounds familiar, and I suggest that PARSE is overkill...

Thanks for the comments!

We're getting into systems design philosophy here, but my particular issue is that I want clear-place-holders to be generic; and, in the application I'm designing, that seems the best approach:

** There are lots of different bits of code before clear-place-holders (each a different cgi program with a different html template). The _only_ reason each of them would need to present a complete list of place-holders would be if clear-place-holders wasn't generic.

** The html-template writer (a human at this stage) can add place-holders for items that are not supported or substituted by the code. (They might just be doodling, thinking, "would !!todays-date!! look good here, or here?" before telling me that we should support the !!todays-date!! place-holder).

All I know is that as the last step before printing the HTML page, I want to remove any stray place-holders that are left.

The parse solutions (mine and others') decouple the clear-place-holders function from anything above it... With the one proviso that it has to trust that higher code has substituted "!" for any "!" that the webpage user may have typed into a field -- just to make sure we don't do place-holder removal when someone's address really happens to be "!!my house!!". (Your solution has the same problem, though it's less likely to be triggered as the webpage user would have to type the text of an actual place-holder).

Sunanda.

[13/16] from: joel:neely:fedex at: 1-Aug-2003 8:21

Hi again, Sunanda,

I think our mutual goal is to separate logic from presentation. As you imply, context is everything...

[SunandaDH--aol--com] wrote:

> We're getting into systems design philosophy here, but my particular
> issue is that I want clear-place-holders to be generic; and. in the
> application I'm designing, that seems the best approach:
>

I understand that the intended "use-case" would shape the requirements of the function; I'm just thinking out loud here (because I find it interesting to see how different solution strategies compare and contrast with each other).

I tend to have a "message" (block of tag/value pairs, object, etc.) that represents only the (unformatted) content under construction, with a step at the end that renders that content appropriately (via the template, in this case). If the processing is multiple stages, each stage can concern itself with only the portions of the content that are its business.

It would, of course, be easy to prefix the entire pipeline with a factory that reads the template and constructs a default message, which is then passed along to the first "real" processing stage. This would also allow multiple factories, so that one default message replaces unused tags with empty strings, while another (for use during the development effort) would replace e.g. !!foo!! with **FOO HERE** to make visible the layout (or accidental processing skips).

I've toyed with the idea (without the time to fully develop it) of having a "template compiler" that would move as much work out of the request-response cycle as possible; the TC would read a template file and create another file containing both the template and REBOL values for (or expressions to construct) the default message. That second file could be LOADed by the request-handling code, with the message being passed down through the pipeline before being used to fill in the template at the end.

> The parse solutions ... decouple the clear-place-holders function from > anything above it....With the one proviso that it has to trust that

<<quoted lines omitted: 4>>

> triggered as the webpage user would have to type the text of an actual > place-holder).

Actually, I tend to favor distinct begin-tag/end-tag markers, such as

    Dear {{{NAME}}},

    Happy birthday number {{{AGE}}} ...

for a couple of reasons:

1) It further minimizes the likelihood of the template accidentally containing a tag (or worse, half of a tag!),

2) It (IMHO) is more visually distinct and unambiguous,

3) It is at least as easy to code the pattern match, but failures can be checked perhaps a bit easier (e.g. after finding all {{{...}}} occurrences, verify that no open or close markers remain in the template text, nor in the tag strings themselves).

Thanks for stimulating the previous interesting discussion!

-jn-
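The check described in point 3 could be sketched roughly as follows. The function name check-tags and its return convention are assumptions made for this illustration; nothing like it was posted in the thread.

```rebol
REBOL []

; Hypothetical checker: returns a block of the tag names found between
; the open and close markers, or none if a stray marker is left over.
check-tags: func [template [string!] /local rest tags tag start finish] [
    rest: copy template     ; work on a copy; matched tags are cut out of it
    tags: copy []
    parse/all rest [
        any [
            to "{{{" start: "{{{" copy tag to "}}}" "}}}" finish:
            (
                if tag [append tags tag]
                remove/part start finish
            )
            :start
        ]
        to end
    ]
    ; a marker still present in the leftover text is an error ...
    if any [find rest "{{{" find rest "}}}"] [return none]
    ; ... and so is an open marker inside a tag name
    foreach tag tags [if find tag "{{{" [return none]]
    tags
]

probe check-tags "Dear {{{NAME}}}, Happy birthday number {{{AGE}}} ..."
probe check-tags "Dear {{{NAME, Happy birthday number {{{AGE}}} ..."
```

The first call should yield the two tag names, while the second should yield none because the unclosed marker ends up inside what would otherwise be read as a tag name.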

[14/16] from: maximo::meteorstudios::com at: 1-Aug-2003 10:37

long mail, sorry. RE: Re: Parse refactoring puzzle

Hi, sunanda,

I have been following this thread silently, but now that steel is public I will give you a little primer on the remark.r tool.

REMARK.R is a complete html markup extension engine. Basically, whenever the engine encounters one of the tag extensions, it recursively parses it and replaces the tag with whatever data is returned by that tag's local build-html function. What I like is that because the tags follow the html <> usage, they look just right in the html code and can easily be spotted.

There are two distinct tag extension mechanisms:

HTAGS: The system allows you to add new tags at will, just by adding files to a directory and then putting the html text you want in those files. Even though the htags are static code on disk, because the engine parses them at run-time, you can add dynamic tags in the static pages layout, and they will also be replaced, recursively.

RTAGS: Dynamic tags which actually call some methods on user-defined prebuilt tag descriptors. For example <date!> replaces itself with the current date (at build time). You can add any number of tags, just by adding an object in the rtags file. You can also supply additional parameters to rtags.

I'll follow with more details, explanations and examples:
---------------------------------------------------------

If you have a htag saved as footer.html which has this content:

    <p3><i>site updated on <date!></i></p>

and you have a src html file which includes the following in it:

    <body>
    <p>
    some very usefull text right here... ;-)
    </p>
    <footer!>
    </body>

then the system will first build the html code for the footer! tag (copying the footer.html content). It will then call the replace-tags on that code block, so that the <date!> will also be replaced. This goes on for each tag extension it finds.

By using rtags, I have even created a tag called <rebol! > which lets you execute rebol code right there and then, and uses the return value as the code! So the <date!> rtag could easily be coded like this too:

    <rebol! now/date>

The only thing which is not yet implemented (but which could be done easily) is to tie it up as a cgi executable OR to allow it to use data files or a database instead of prebuilt src html files.

A workaround would be something like a file with the following contents:

    <rebol! global-values: make object! [name: "mr bright" age: 34 fav_color: blue] "">
    <greeting-page!>

and have a greeting-page.html in your htags directory which looks like so:

    <html>
    <body>
    <p> hello <username!> , </p>
    <p>your favorite color is <favcolor!> <agetext!>.</p>
    </body>
    </html>

As you imagine, <username!> <favcolor!> <agetext!> would be rtags which look up an object called global-values and return its data, with some formatting if you want.

To make it even more flexible, you might want to replace the previous source html by:

    <your-look-up-database-func! "mr. bright" "">
    <greeting-page!>

The look-up-database! would then be a function which fills the global-values with db data.

If your cgi script writes the src-file to disk, calls remark.r and returns the generated file, you could use it for cgi right now. The steel site uses it for each page...

For an example of an advanced rtag, you may notice that the menu on the upper right changes depending on the item you clicked on (the current file is red and does not contain a link, since you are already on the file). The menu <mainmenu!> is inserted in my header htag and is defined elsewhere and is an <rmenu!> tag.

Here is my complete html code for the steel's news page before it goes through remark (I removed most of the contents, though):

    <header!>
    <h3>NEWS!</h3>
    <h4 class="newstitle">2003/07/30 - Site Creation</h4>
    [... NEWS CONTENT DATA HERE!!! ...]
    <BR> <BR> <BR> <BR> <BR>
    <custfooter!>

You can see that I do not have to concentrate on the layout; it's all in my header and footer htags. Since the header contains the menu, one is added to each page in the site.

remark.r will be available next week, with some documentation and an example site I hope. It's part of the retool section on the steel website.

HTH!

-max
-----------
meteor Studios, T.D.
-----------
Never Argue with an idiot. They will bring you down to their level and beat you with experience

[15/16] from: SunandaDH:aol at: 1-Aug-2003 11:44

Re: Parse refactoring puzzle

Hi Joel:

> I understand that the intended "use-case" would shape the requirements
> of the function

Absolutely -- and I am also cutting corners by producing a function that does just what I need rather than a generic industrial-strength one. For example, an industrial-strength clear-place-holders may need options to:

** only clear officially-sanctioned place-holders

** do "automatic" replacements for place-holders like !!date!! or !!user-name!!

** report stray place-holders left over (because, for some applications, that would indicate a serious error)

** behave differently depending on whether it is running test or production (e.g. test pops up a message, production silently writes to an error log)
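As an illustration of the "report stray place-holders" option only, a reporting pass could be sketched like this. The name stray-place-holders is hypothetical; the rule reuses the thru/copy/to pattern from the first message in the thread.

```rebol
REBOL []

; Hypothetical helper: collect the names of any !!xxx!! place-holders
; still present in str, without modifying it.
stray-place-holders: func [str [string!] /local names name] [
    names: copy []
    parse/all str [
        any [thru "!!" copy name to "!!" "!!" (if name [append names name])]
        to end
    ]
    names
]

strays: stray-place-holders {Welcome John-Paul your address is The Vatican !!town!! Roma !!state!! Italia}
if not empty? strays [print ["Stray place-holders:" mold strays]]
```

For the sample string this should report town and state. A caller could then decide whether to log the names, pop up a message, or go ahead and clear them.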

> I tend to have a "message (block of tag/value pairs, object, etc.) that
> represents only the (unformatted) content under construction,

That's a useful approach that doesn't really fit my case. For example, one of my place-holders is:

    !!dynamic-field!!

That gets replaced by an HTML table row with an input field _and_ another !!dynamic-field!! place-holder. So the application uses this place-holder as it trundles through the database to dynamically create input fields. At the end, clear-place-holders removes the !!dynamic-field!! place-holder.

That's more of a dynamic structure than might be easy with a fixed table at the start.

Hi Maxim:
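The self-renewing place-holder can be sketched in a few lines. The helper name add-field and the row markup are invented for this example; only the !!dynamic-field!! name comes from the message.

```rebol
REBOL []

; Hypothetical helper: replace the first !!dynamic-field!! with a table row
; followed by a fresh !!dynamic-field!!, so the next call can add another row.
; The row markup is made up for this example.
add-field: func [page [string!] field-name [string!]] [
    replace page "!!dynamic-field!!" rejoin [
        {<tr><td><input type="text" name="} field-name {"></td></tr>}
        "!!dynamic-field!!"
    ]
]

page: copy {<table>!!dynamic-field!!</table>}
foreach field ["street" "town"] [add-field page field]
print page
```

After the loop the page holds one row per field plus a single trailing !!dynamic-field!!, which is exactly the stray place-holder that a final clear-place-holders call is meant to remove.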

> I have been following this thread silently, but now that steel is
> public I will give you a little primer on the remark.r tool.

I think if I was starting again, I'd definitely look at Steel or REBOL Server Pages for HTML webpage creation. As it is, I've got a half-built set of tools and a load of HTML templates using the !!xx!! schema, and I'm stuck with that until I bite the bullet and refactor.

Thanks both for the ideas and encouragement!

Sunanda.

[16/16] from: tomc:darkwing:uoregon at: 1-Aug-2003 11:38

just a minor detail, but when I make placeholders for use in html I use  , so if the tag gets through untranslated it is not visible in a browser... I also search for  afterwards to detect untranslated or broken tags.

On Fri, 1 Aug 2003 [SunandaDH--aol--com] wrote:
