Mailing List Archive: 49091 messages

Need Some PARSE Help

 [1/3] from: gerardcote:sympatico:ca at: 17-Oct-2005 20:28


Hi everybody,

In an effort to build a friend's interest in REBOL, I recently tried to create a simple data-mining app that could analyze theatre listings for the days and times at which films are shown. (I retrieve the information from the French site http://cinemaquebec.com.)

For the moment my biggest problem comes from the fact that I don't fully understand the way PARSE works when it encounters newline characters. Let me give a simplified example extracted from the site to illustrate my point:

t4: { Fri.: 1:00, 3:00, 7:00 Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}

Here we have one day (Fri.) followed by a colon (:) and then 3 times. Right after that, the cycle repeats with not one but 6 days separated by commas (,), again followed by a colon (:) and 5 more times.

I wrote a block of relatively simple rules that work well against this simple example. Here is the result I get from the parse:

>> parse t4 rules2/expr
which-day: "Fri." 4
Hour: "1" 1
Min: "00" 2
which-hour: " 1:00" 5
Hour: "3" 1
Min: "00" 2
which-hour2: " 3:00" 5
Hour: "7" 1
Min: "00" 2
which-hour2: " 7:00 " 6
which-days: "Fri.: 1:00, 3:00, 7:00 " 23
which-day: "Sat." 4
which-day2: " sun." 5
which-day2: " mon." 5
which-day2: " tue." 5
which-day2: " wed." 5
which-day2: " thu." 5
Hour: "10" 2
Min: "00" 2
which-hour: " 10:00" 6
Hour: "1" 1
Min: "00" 2
which-hour2: " 1:00" 5
Hour: "3" 1
Min: "00" 2
which-hour2: " 3:00" 5
Hour: "9" 1
Min: "00" 2
which-hour2: " 9:00" 5
Hour: "10" 2
Min: "00" 2
which-hour2: " 10:00" 6
which-days2: {Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00} 68
film-hours: { Fri.: 1:00, 3:00, 7:00 Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}
----------------------------------------------------------

== true

Here are my parse rules, for those interested in seeing how I did it (for convenience I also attach them to this message). You'll notice the many PRINTs that help me follow along with PARSE.

rules2: make object! [
    expr: [
        copy film-hours film-hours-rules
        (print ["film-hours: " mold film-hours newline "----------------------------------------------------------" newline])
        to end
    ]
    film-hours-rules: [
        copy which-days days-group
        (print ["which-days: " mold which-days length? which-days])
        any [
            copy which-days2 days-group
            (print ["which-days2: " mold which-days2 length? which-days2])
        ]
    ]
    days-group: [
        copy which-day day
        (print ["which-day: " mold which-day length? which-day])
        any [
            "," copy which-day2 day
            (print ["which-day2: " mold which-day2 length? which-day2])
        ]
        ":" copy which-hour show-hour
        (print ["which-hour: " mold which-hour length? which-hour])
        0 1 "am"
        any [
            "," copy which-hour2 show-hour
            (print ["which-hour2: " mold which-hour2 length? which-hour2])
            0 1 "am"
        ]
    ]
    digit: charset [#"0" - #"9"]
    hour: [digit 0 1 digit]
    minutes: [digit digit]
    show-hour: [
        copy this-hour hour
        (print ["Hour:" mold this-hour length? this-hour])
        ":" copy this-min minutes
        (print ["Min:" mold this-min length? this-min])
    ]
    english-day: ["Fri." | "Sat." | "Sun." | "Mon." | "Tue." | "Wed." | "Thu." | "Every day"]
    french-day: ["Ven." | "Sam." | "Dim." | "Lun." | "Mar." | "Mer." | "Jeu." | "Tous les jours"]
    day: ["Fri." | "Sat." | "Sun." | "Mon." | "Tue." | "Wed." | "Thu." | "Every day"]
]

Now my problem is this: when I submit data broken by a newline, in the form of a new t4 as follows, my rules no longer work:

t4: { Fri.: 1:00, 3:00, 7:00
Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}

The new results now look more like this:

>> parse t4 rules2/expr
which-day: "Fri." 4
Hour: "1" 1
Min: "00" 2
which-hour: " 1:00" 5
Hour: "3" 1
Min: "00" 2
which-hour2: " 3:00" 5
Hour: "7" 1
Min: "00" 2
which-hour2: " 7:00" 5
which-days: "Fri.: 1:00, 3:00, 7:00" 22
film-hours: " Fri.: 1:00, 3:00, 7:00"
----------------------------------------------------------

== true

The second part of the results has been chopped off. Later, when I extend my rules to pick up the title after the last show time, this chopped part gets mixed up with the title of the next film.

Any help is appreciated.

Regards,
Gerard

-- Binary/unsupported file stripped by Ecartis --
-- Type: text/x-rebol
-- File: parse-film-times.r

 [2/3] from: Tom::Conlin::gmail::com at: 17-Oct-2005 21:36


Hi Gerard,

These can usually be fixed by using parse's /all refinement and handling white space yourself. I find I almost always do this when I am doing more than simple string splitting.

Make a rule that accepts white space and include it at all the places you need it.

....
ws: charset [#" " #"^-" #"^/"]
....
english-day: ["Fri." |"Sat." |"Sun." |"Mon." |"Tue." |"Wed." |"Thu." |"Every day" some ws]
....
parse/all t4 rules2/expr
....

Gerard Cote wrote:

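A minimal sketch of the approach suggested above, assuming REBOL 2 string parsing: with /all, PARSE skips no whitespace on its own, so an explicit `ws` rule is placed between tokens. The names `day-rule`, `time-rule`, `schedule` and `sample` are hypothetical and not taken from the attached script, and the day list is shortened for illustration.

```rebol
REBOL [Title: "Explicit whitespace with parse/all (illustrative sketch)"]

ws:    charset [#" " #"^-" #"^/"]       ; space, tab, newline
digit: charset [#"0" - #"9"]

; Hypothetical helper rules, simplified from the ones in the thread.
day-rule:  ["Fri." | "Sat." | "Sun." | "Mon." | "Tue." | "Wed." | "Thu."]
time-rule: [1 2 digit ":" 2 digit opt "am"]

; One group: days separated by commas, a colon, then times separated
; by commas. Whitespace (including newlines) is allowed between tokens.
days-group: [
    any ws day-rule any [any ws "," any ws day-rule]
    any ws ":" any ws
    time-rule any [any ws "," any ws time-rule]
]

schedule: [some days-group any ws]

; A newline separates the two groups, as in the second t4 of the thread.
sample: " Fri.: 1:00, 3:00, 7:00^/Sat., sun., mon.: 10:00am, 1:00"

; Should print true when the whole string, newline included, is matched.
print parse/all sample schedule
```

Because the newline is consumed by `any ws` at the start of the second group, the match does not stop after the first line. String matching in PARSE is case-insensitive unless /case is used, which is why "sun." matches "Sun." here.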
 [3/3] from: gerardcote::sympatico::ca at: 18-Oct-2005 21:26


Thanks, Tom.

I'll try to push this problem as far as possible without having to manage spacing myself, by replacing every "^/" with " " (a single space) using replace/all. This has been done, and now there are other subtleties that I must cope with. At first they looked like PARSE flaws (a reliability problem), but it seems that some of the problems originate from inconsistencies in the input. Nevertheless, even when I apply other optional rules, I can't eliminate their effect, at least for the moment. I'll try harder, and if I can't find it myself I'll ask for help again.

Regards,
Gerard
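The workaround described above (turning every newline into a single space before parsing) can be sketched as follows. This is only an illustration under the assumption of REBOL 2; the `sample` string is a hypothetical, shortened stand-in for the real page data.

```rebol
REBOL [Title: "Flatten newlines before parsing (illustrative sketch)"]

; Hypothetical shortened input with a newline between the two groups.
sample: " Fri.: 1:00, 3:00, 7:00^/Sat., sun.: 10:00am, 1:00"

; replace/all modifies the string in place, so no reassignment is needed.
replace/all sample "^/" " "

; Show the flattened string and confirm that no newline remains.
print mold sample
print either find sample newline ["still has newlines"] ["no newlines left"]
```

Since the string is changed in place, work on a `copy` of the data if the original line structure is needed later, for example to tell where one film's listing ends and the next title begins.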
> Hi Gerard, > these can usually be fixed by using parse's /all refinement and
<<quoted lines omitted: 10>>
> parse/all t4 rules2/expr > ...
Thank you Tom.

I'm sure I will need your strategy for some part of my work, but for the moment I worked around the problem. While I was waiting for an answer, I tried to replace/all every newline with a single space.

Now the next problem seems to be more of a limitation of PARSE than anything else, but I am not well placed to judge that by myself. Let me explain. In chapter 15 of the REBOL/Core manual I saw that I can give alternatives to the "to" word, as in:

parse string [
    "a" | "the"
    to "phone" (print "answer") |
    to "radio" (print "listen") |
    to "tv" (print "watch")
]
answer

At first sight my problem seems similar, but PARSE refuses to recognise my rule:

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted