Mailing List Archive: 49091 messages
 

[REBOL] Re: XML / dialects

From: joel:neely:fedex at: 7-Jan-2002 6:29

Hi again, Petr,

I found a sample...

Petr Krenzelok wrote:
> ... and there are examples in the book of how to create one,
> in some language called OmniMark...
>

Errol Chopping of MIT has an on-line tutorial for OmniMark at

    http://clio.mit.csu.edu.au/omnimark/

In his first chapter there are a couple of small samples to demonstrate OmniMark:

    As an example, suppose a text file called 'timetable.dat' contains the complete timetable for a large university. A tiny fragment of the file is shown below; the actual file is very large and covers several hundred subjects taught in several hundred rooms throughout any academic week.

        EEB121 THE E/C PROFESSION: AN INTRO
        Subject co-ordinator: L. Harrison
        L Mon 1300 - 1350 S15 - 2.05
        T1 Wed 1400 - 1450 C02 - 112
        T2 Wed 1300 - 1350 C02 - 112
        T2 Thu 1300 - 1350 S01 - 102
        T1 Thu 0900 - 0950 S01 - 101
        T3 Thu 1000 - 1050 S01 - 101
        T3 Thu 1400 - 1450 C03 - 403

        EEB322 ISSUES IN CARE & EDUCATION
        Subject co-ordinator: T. Simpson
        L Tue 0900 - 0950 S01 - 102
        T1 Tue 1100 - 1250 C08 - 1.04
        T2 Tue 1400 - 1550 C08 - 1.04

    A list of all the times a particular room (say S01-102) is used might be needed. Finding this information manually is difficult because the whole timetable is sorted by subject, not by room. To find, collect and display the list of times, we need to find all occurrences of the sequence 'S01 - 102' in the file and output the day and time information for these occurrences.

    By inspection we can identify some patterns which can be used to design the search:

    - the room information sequence occurs on a line of text;
    - each line starts with a one- or two-character code;
    - the day and time come before the room;
    - each day name is three letters;
    - the time is four digits, a space, a hyphen, a space and another four digits.
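The patterns listed above map almost one-to-one onto a single regular expression. The following is a minimal Python sketch of that idea. It is neither OmniMark nor the tutorial's code. The helper name `room_times` and the inline copy of the fragment are illustrative assumptions. The fragment is reproduced with single spaces between fields because the real file's column layout is not shown.

```python
import re

# The fragment of 'timetable.dat' quoted above, with single spaces between
# fields (an assumption: the real file's column widths are not shown).
TIMETABLE = """\
EEB121 THE E/C PROFESSION: AN INTRO
Subject co-ordinator: L. Harrison
L Mon 1300 - 1350 S15 - 2.05
T1 Wed 1400 - 1450 C02 - 112
T2 Wed 1300 - 1350 C02 - 112
T2 Thu 1300 - 1350 S01 - 102
T1 Thu 0900 - 0950 S01 - 101
T3 Thu 1000 - 1050 S01 - 101
T3 Thu 1400 - 1450 C03 - 403
EEB322 ISSUES IN CARE & EDUCATION
Subject co-ordinator: T. Simpson
L Tue 0900 - 0950 S01 - 102
T1 Tue 1100 - 1250 C08 - 1.04
T2 Tue 1400 - 1550 C08 - 1.04
"""


def room_times(text, room):
    """Return the 'Day hhmm - hhmm' strings for every line that books `room`.

    `room_times` is a hypothetical helper name used only for this sketch.
    """
    pattern = re.compile(
        r"^\S{1,2}\s+"                           # one- or two-character code at line start
        r"([A-Za-z]{3}\s+\d{4}\s+-\s+\d{4})"     # three-letter day + time range (captured)
        r"\s+" + re.escape(room) + r"[ \t]*$",   # the room, ending the line
        re.MULTILINE,                            # ^ and $ match at each line boundary
    )
    # finditer simply skips text that does not match, so no catch-all rule is needed.
    return [m.group(1) for m in pattern.finditer(text)]


for slot in room_times(TIMETABLE, "S01 - 102"):
    print(slot)
```

Here `re.MULTILINE` plays roughly the role of OmniMark's `line-start`. Lines that do not match are never reported, which is the job the tutorial's `find any` rule does in OmniMark.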

    An OmniMark find rule to locate and capture the day and time information might be:

        [Code Sample: C01T05a.xom]
        001 process
        002    submit file "timetable.dat"
        003
        004 find line-start any{2} white-space+
        005      (letter{3} white-space+
        006       digit{4} white-space+
        007       "-"
        008       white-space+
        009       digit{4}) => dayAndTime
        010      white-space+ "S01 - 102"
        011    output "%x(dayAndTime)%n"
        012
        013 find any
        014

    Here the 'find any' rule (on line 13) consumes all characters not matched by the first find rule, so the only output is that delivered by the statement on line 11; that is, all the days and times used for room S01-102.

I assume that this example aims to give a feel for the notation; it certainly doesn't impress me with power. In Perl, for example, one can write:

    open(my $times, '<', 'timetable.dat') or die "Cannot open timetable.dat: $!";
    while (<$times>) {
        # code, three-letter day (either case), time range, then the room
        if (/^\S{1,2}\s+([A-Za-z]{3}\s+\d{4}\s+-\s+\d{4})\s+S01 - 102/) {
            print "$1\n";
        }
    }
    close($times);

The second example is a bit more interesting...

    As well as parsing, OmniMark allows any SGML or XML document to be translated into any other arbitrary format. A fragment of XML is shown below. It contains a group of people:

        <!DOCTYPE PEOPLE SYSTEM "people.dtd">
        <PEOPLE>
          <NAME>Mary Smith</NAME>
          <CITY PCODE="2795">Bathurst</CITY>
          <COUNTRY>Australia</COUNTRY>
          <NAME>Wally Wallpaper</NAME>
          <CITY PCODE="2222">Hurstville</CITY>
          <COUNTRY>Australia</COUNTRY>
          <NAME>Sam Widge</NAME>
          <CITY PCODE="1234">Bangalore</CITY>
          <COUNTRY>India</COUNTRY>
        </PEOPLE>

    An OmniMark program containing element rules can be written to process this XML. As a trivial example, the following rules output all the people's names and postcodes. Each name and its corresponding postcode are placed on a separate line, with a tab character between the name and the postcode. The output file is thus a tab-delimited file which could easily be imported into a spreadsheet.
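The transformation just described can also be sketched with Python's standard-library SAX parser. Like the OmniMark program, it receives the document as a stream of element events. This is a hedged illustration, not a translation of the tutorial's code:

- the class name `PeopleToTsv` is hypothetical;
- the DOCTYPE line is left out;
- unlike OmniMark, this parser does not validate against people.dtd.

XML names are case-sensitive, so the sketch looks for `NAME`, `CITY` and `PCODE` exactly as they appear in the fragment.

```python
import xml.sax


class PeopleToTsv(xml.sax.ContentHandler):
    """Collect one 'name<TAB>postcode' row per person from the PEOPLE fragment.

    Each handler method plays roughly the role of an OmniMark element rule.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # finished tab-delimited lines
        self._name = None     # text of the most recent NAME element
        self._buffer = None   # character data collected while inside NAME

    def startElement(self, name, attrs):
        if name == "NAME":
            self._buffer = []
        elif name == "CITY":
            # Read the PCODE attribute and ignore the city text itself.
            self.rows.append(f"{self._name}\t{attrs.get('PCODE', '')}")
        # COUNTRY and everything else are ignored, much like 'suppress'.

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "NAME":
            self._name = "".join(self._buffer).strip()
            self._buffer = None


PEOPLE_XML = b"""<PEOPLE>
  <NAME>Mary Smith</NAME>
  <CITY PCODE="2795">Bathurst</CITY>
  <COUNTRY>Australia</COUNTRY>
  <NAME>Wally Wallpaper</NAME>
  <CITY PCODE="2222">Hurstville</CITY>
  <COUNTRY>Australia</COUNTRY>
  <NAME>Sam Widge</NAME>
  <CITY PCODE="1234">Bangalore</CITY>
  <COUNTRY>India</COUNTRY>
</PEOPLE>"""

handler = PeopleToTsv()
xml.sax.parseString(PEOPLE_XML, handler)
print("\n".join(handler.rows))   # tab-delimited: name, then postcode
```

The fragment has no wrapper element around each person. For that reason the sketch remembers the last NAME it saw and pairs it with the PCODE of the next CITY.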

        [Code Sample: C01T06a.xom]
        process
           do xml-parse document
              scan file "people.xml"
              output "%c"
           done

        element people
           output "%c"

        element name
           output "%c"
           output "%t"

        element country
           suppress

        element city
           output "%v(pcode)%n"
           suppress

    With this kind of process, the XML (or SGML) data is streamed into OmniMark and parsed against a DTD. Each element is then fed to the program. As the program sees each element, one of the element rules fires and does the appropriate work with the element's content and/or attributes.

    Even without much previous knowledge of SGML, XML or OmniMark, the program should be reasonably easy to follow: the symbol %c is a reference to the content of each element, and the %v symbol is a reference to the value of an attribute. The statement 'suppress' avoids firing rules for the content of an element.

    Note that the programmer does not need to worry about low-level details like finding angle brackets, element names or attributes in the raw data; OmniMark handles all of this and leaves the programmer with the high-level task of doing something with the information.

Maybe someone would enjoy coding the equivalent of both examples in REBOL...

-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;