AltME groups: search
world-name: r3wp
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
Maxim: 24-Dec-2008 | so I'll go back to the batcave and continue working on remark v2, and some other stuff I've wanted to release for a long time | |
BrianH: 29-Dec-2008 | The real advantage to the TO/THRU enhancement comes when it lets you avoid creating charsets, which are a lot less useful with Unicode. It should be pretty easy to implement. | |
BrianH: 29-Dec-2008 | I think that the proposals are more than Carl was thinking they would be - apparently he had forgotten the previous proposal lists. I don't think that it will be too much of a problem though, as there are not really that many proposals that are likely to be accepted. Some are competing proposals, of which only one would be chosen. Also, there aren't that many proposals overall - they are just thoroughly specified. | |
GiuseppeC: 29-Dec-2008 | Let's see how things evolve. The proposals are very interesting, as they would ease a lot of the work of building parse rules. Everything is silent apart from some blog messages, where I have read for the first time the word "Beta" connected with REBOL3. | |
BrianH: 29-Dec-2008 | My main concern is that Carl's main requirements of the proposal process have been ignored in some cases:
- That the proposals be concisely specified. The Purpose and Importance statements should be one sentence each.
- That there be no discussion of theory.
- That there be no specification of equivalent rules.
- That all discussions happen outside of the wiki.
- That this is a proposals page, not documentation.
BrianH: 29-Dec-2008 | As it is, I hope Carl will read a paper that long when he gets to the point of taking on PARSE. | |
Janko: 31-Jan-2009 | aha, I remember I learned a lot from that green page too.. thanks for the links so far, I will read the pages and hopefully I will find something related to the problems I have | |
Janko: 31-Jan-2009 | or in terms of Brett's examples:
    == true
    >> a: copy "dog cat" parse a [ ANY [ thru "dog" (print 1) | thru "cat" (print 2) ] ]
    1
    2
    == true
    >> a: copy "cat dog" parse a [ ANY [ thru "dog" (print 1) | thru "cat" (print 2) ] ]
    1
    == true
Janko: 31-Jan-2009 | basically similar problem that last time as I see now.. so by looking at that mailing list answers I have 2 solutions ... I use parse 3 times on a string.. or maybe I use Ladislav's parseen which he said solves this.. but I don't yet know how :) | |
Janko: 31-Jan-2009 | THIS WORKS IF IN THIS ORDER
    =heading=
    {comment some comment}
    - line 1
    - line 2
    ------------->
    <h1>heading</h1>
    <p>comment some comment</p>
    <li>line 1</li>
    <li>line 2</li>
THIS DOESN'T WORK
    =heading=
    {comment some comment}
    =heading=
    - line 1
    - line 2
    =heading=
    {comment some comment}
ADDITIONAL (SIMILAR) PROBLEM
    - line 1
    + line 2
    + line 3
    - line 4
    + line 5
    ----------------->
    <li class="a">line 1</li>
    <li class="a">line 2</li>
    ...
Oldes: 31-Jan-2009 |
    >> ? complement
    USAGE:
        COMPLEMENT value
    DESCRIPTION:
        Returns the one's complement value.
        COMPLEMENT is an action value.
    ARGUMENTS:
        value -- (Type: logic number char tuple binary string bitset image)
    >> ? union
    USAGE:
        UNION set1 set2 /case /skip size
    DESCRIPTION:
        Creates a new set that is the union of the two arguments.
        UNION is a native value.
    ARGUMENTS:
        set1 -- first set (Type: series bitset)
        set2 -- second set (Type: series bitset)
    REFINEMENTS:
        /case -- Use case-sensitive comparison
        /skip -- Treat the series as records of fixed size
            size -- (Type: integer)
    >>
Janko: 31-Jan-2009 | uh, that is some advanced parse :) .. I will need a couple of days to think it through | |
Brock: 31-Jan-2009 | I'll try to explain complement. I like to think of a charset as a list of valid chars that can be tested for. However, say you need all characters of the alphabet minus a few. Instead of defining multiple ranges of characters, as in charset [#"A" - #"F" #"H" - #"K" #"M" - #"T" #"V" - #"Z" #"a" - #"z" #"0" - #"9"], which effectively skips the chars G, L and U, you could simply write complement charset "GLU", which excludes these three characters from the charset but includes all others. | |
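The same idea exists outside REBOL. As a rough comparison only, here is a Python sketch (not REBOL; the names `NOT_GLU` and `only_allowed` are made up for illustration) where a negated character class plays the role of `complement`: it names the three excluded characters instead of listing every allowed range.

```python
import re

# Negated class: matches any character except G, L and U,
# much like taking the complement of a three-character charset.
NOT_GLU = re.compile(r"[^GLU]+")


def only_allowed(text):
    """Return True when text is non-empty and contains none of G, L, U."""
    return NOT_GLU.fullmatch(text) is not None


if __name__ == "__main__":
    print(only_allowed("HAPPY"))  # no excluded letters
    print(only_allowed("HELP"))   # contains L
```

Like the REBOL complement, the negated class accepts everything else, including lowercase letters, digits and punctuation, so it is broader than the explicit list of ranges.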
Brock: 31-Jan-2009 | If there's something more specific or a technically better way to state the above, please add your input | |
Janko: 1-Feb-2009 | Very interesting, both versions (Oldes and Steeve) , thanks a lot.. I think I understood most of it now | |
Oldes: 1-Feb-2009 | Is there any better way to change the main parse rules during parse than this one? (just a simple example.. in real life the lexers would be more complicated :)
    d: charset "0123456789"
    lexer1: [copy x 1 skip (probe x if x = "." [lexer: lexer2]) | end skip]
    lexer2: [copy x some d (probe x lexer: lexer1) | end skip]
    lexer: lexer1
    parse "abcd.123efgh" [ some [() lexer]]
Steeve: 1-Feb-2009 | Not really Oldes... but what is your purpose ? isn't that a little obfuscated again ? You said it's just an example, but why can't you use the normal way ? I would like to know...
    parse "..." [ some [ #"." lexer2 | lexer1 ] ]
Maarten: 2-Feb-2009 | This weekend I got an interesting idea: algebraic (and recursive) data types are well known for their ability to implement parsers. And they are a great data modeling tool. E.g.:
    data Bill = Name BankAccount | Company CreditCard
    data CreditCard = CVC2 CCNumber CCExpiryDate
However, the opposite also holds, i.e. you can model a data domain using named parse rules without actions just as easily. Now, what if you combined two dialects: one to define data structures and a separate one to attach actions? E.g.:
    Post: [ message [string!] author [string!] timestamp [date!] ]
    Comments: [ some posts]
    blog [ 1 post comments]
    action 'JSON 'Post [ .... the action to convert the Post to JSON here ...]
    action 'XHTML 'POST [ ..... the action to convert Post to XHTML here...]
    process some-data 'JSON
-> this gives back the data processed as for the JSON actions. It is a bit SAX-like, with the difference that this models classes of actions and separates them from the data instead of scattering some loose actions. And the data modeling still holds.
Maarten: 2-Feb-2009 | Then make actions for data to go to JSON, XML, XHTML, back and forth to a database,.... | |
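To make the split between a data description and separately attached actions concrete, here is a loose Python sketch of the idea. It is not Maarten's dialect: the schema dictionary, the `action` decorator, `validate` and `process` are all hypothetical names chosen for this illustration, and the timestamp is kept as a plain string for simplicity.

```python
import json
from html import escape

# Data description only: field name -> expected type (hypothetical schema).
POST_SCHEMA = {"message": str, "author": str, "timestamp": str}

# Actions live in a separate registry keyed by (output format, type name).
ACTIONS = {}


def action(fmt, type_name):
    """Register a converter for one output format and one data type."""
    def register(func):
        ACTIONS[(fmt, type_name)] = func
        return func
    return register


def validate(record, schema):
    """Check that record has exactly the schema's fields with the right types."""
    return set(record) == set(schema) and all(
        isinstance(record[name], kind) for name, kind in schema.items()
    )


@action("JSON", "Post")
def post_to_json(post):
    return json.dumps(post, sort_keys=True)


@action("XHTML", "Post")
def post_to_xhtml(post):
    return '<div class="post"><p>{}</p><cite>{}</cite></div>'.format(
        escape(post["message"]), escape(post["author"])
    )


def process(record, fmt, type_name="Post", schema=POST_SCHEMA):
    """Validate the record against the schema, then apply the chosen action."""
    if not validate(record, schema):
        raise ValueError("record does not match schema")
    return ACTIONS[(fmt, type_name)](record)
```

Adding a new output format then means registering one more function per type, without touching the schema or the existing converters.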
Maarten: 3-Feb-2009 | Chris: 1) Yes, actually, that would be the idea. 2) I think the data dialect would be a strict subset of parse, forcing you to use set-word/parse-rule pairs. Hence, the set-words are available in the action. | |
Graham: 9-Feb-2009 | They call it a big Mac not a big Mc ... odd | |
Janko: 14-Feb-2009 | hi, it's me again with parse problems... I need this concretely to parse out web-page meta tags.. but I distilled the problem out of it to a minimal example.. | |
Janko: 14-Feb-2009 | doc1: "start A 1 end start B 2 end" how can you get value of 2 out | |
Janko: 14-Feb-2009 | It works with A because it's first, but because it enters the parse with it and then doesn't match, it doesn't test again for B:
    >> parse doc1 [ "start" "A" copy R to "end" (print R) to end ]
    1
    == true
    >> parse doc1 [ "start" "B" copy R to "end" (print R) to end ]
    == false
Janko: 14-Feb-2009 |
    >> doc1: "start A 1 end xyz B 2 end" ;; in this case it must not take 2
    == "start A 1 end xyz B 2 end"
    >> parse doc1 [ "start" thru "B" copy R to "end" (print R) to end ] ;; but it will, that's why I can't use to/thru
    2
    == true
Anton: 14-Feb-2009 | some ["start" ["A" | "B"] copy R to "end" "end"] | |
Janko: 14-Feb-2009 | ups ... my example above is wrong .. just a sec | |
Janko: 14-Feb-2009 | (this is the right example .. I forgot to use thru above so second wouldn't pass anyway... but result is the same)
    >> doc1: "start A 1 end start B 2 end"
    == "start A 1 end start B 2 end"
    >> parse doc1 [ thru "start" "A" copy R to "end" (print R) to end ]
    1
    == true
    >> parse doc1 [ thru "start" "B" copy R to "end" (print R) to end ]
    == false
    >> parse doc1 [ SOME [ thru "start" "B" copy R to "end" (print R) to end ] ]
    == false
Anton: 14-Feb-2009 | Is there anything expected between "start" and "A", for instance ? | |
Anton: 14-Feb-2009 | parse doc1 [some [thru "start" ["A" | "B"] copy R to "end" (?? R) "end"]] | |
Janko: 14-Feb-2009 | hm.. just a sec so I try few things | |
Janko: 14-Feb-2009 | Your solution, I thought it won't work if I reverse order of A and B in the string but it seems it does. I would need to know which one is A and B but I think this can be solved by setting some word ( ) inside [ A | B] ... so basically it seems to work... I think I can apply this way also to my concrete problem which is this | |
Janko: 14-Feb-2009 | ( I need to parse meta tags description and keywords and abstract if they exist -- they can come in any order, there can be one or multiple spaces/newlines/tabs between tag arguments, there can be " or ' used as argument="asdasd" )
    >> doc2: {<head>
    { <title>Dragonicum.com - making the right business connections !</title>
    { <meta name="keywords" content="Company Directory, Join Us, Advanced Search, Trade Leads, Forum, Trade S
    { hows, Advertising, Translation, fair trade, trade portal, business to business, trade leads, trade even
    { ts, china export, china manufacturer" />
    { <meta name="description" content="New international trade portal and company directory for Asia, Europe
    { and North America. Our priority No.1 is to create and maintain a safe, well lit business-to-business m
    { arketplace, by assisting our members in identifying new trustworthy business partners!" />
    { <link rel="stylesheet" href="style/blue_main.css" type="text/css" />}
    == {<head> <title>Dragonicum.com - making the right business connections !</title> <meta name="keywords" content="Company Directory...
    >> T: "" parse doc [ thru "<meta" "name=" skip "keywords" skip "content=" m: skip (m1: first m ) copy T to m1 to end ] print T
    Company Directory, Join Us, Advanced Search, Trade Leads, Forum, Trade Shows, Advertising, Translation, fair trade, trade portal, business to business, trade leads, trade events, china export, china manufacturer
    >> T: "" parse doc [ thru "<meta" "name=" skip "description" skip "content=" m: skip (m1: first m ) copy T to m1 to end ] print T

    >>
( as you see, because keywords are first it works for them, but it doesn't for description; they can be in a different order in another document etc)
Janko: 14-Feb-2009 | maybe your solution for A | B would work.. I will try | |
Janko: 14-Feb-2009 | yes :) thanks a lot! | |
Janko: 14-Feb-2009 |
    >> T: K: D: "" parse doc [ SOME [ thru "<meta" "name=" skip [ "description" (V: 'D) | "keywords" (V: 'K)] skip "content=" m: skip (m1: first m ) copy T to m1 (set V T) ] to end ] ?? K ?? D
    K: {Company Directory, Join Us, Advanced Search, Trade Leads, Forum, Trade Shows, Advertising, Translation, fair trade, trade portal, business to business, trade leads, trade events, china export, china manufacturer}
    D: {New international trade portal and company directory for Asia, Europe and North America. Our priority No.1 is to create and maintain a safe, well lit business-to-business marketplace, by assisting our members in identifying new trustworthy business partners!}
    == {New international trade portal and company directory for Asia, Europe and North America. Our priority No.1 is to create and mai...
    >>
Janko: 14-Feb-2009 | I intended to make a blogpost .. "REBOL parse challenge" and present this problem and ask if people can provide solutions in other languages that would be more elegant ... (on a similar note to the "arc challenge") ... now that it seems an even harder nut to crack I should probably really do it .. does anyone think this would be easy to solve using a conventional language? (I think not) | |
Janko: 14-Feb-2009 | hm.. would this be nicely solvable with a regex? .. I think it would be quite a pain using regular string functions like strpos, substr etc... given the same requirements (one or more spaces/tabs/newlines, " or ', undefined order) | |
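For comparison with the parse rule, a hedged Python sketch of the regex route Janko asks about. It covers the stated requirements (tags in any order, flexible whitespace, either quote character) but assumes, like the parse rule does, that `name` always comes before `content`. The names `META_RE` and `extract_meta` are invented for this illustration.

```python
import re

# Group 1 and group 3 capture the opening quote of each attribute so the
# same character can be required as the closing quote (\1 and \3).
META_RE = re.compile(
    r"<meta\s+name\s*=\s*([\"'])(?P<name>description|keywords|abstract)\1"
    r"\s+content\s*=\s*([\"'])(?P<content>.*?)\3",
    re.IGNORECASE | re.DOTALL,
)


def extract_meta(html):
    """Return {tag name: content} for the meta tags found, in any order."""
    found = {}
    for match in META_RE.finditer(html):
        # Collapse line breaks and runs of spaces inside the content value.
        found[match.group("name").lower()] = " ".join(match.group("content").split())
    return found
```

Because `finditer` walks the document left to right, the order of the tags does not matter, which is the property the `SOME [ thru "<meta" ... ]` rule gets from its alternatives.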
Anton: 14-Feb-2009 | I don't know - I only learn regex when I have to .. then a short time later I forget. | |
Anton: 14-Feb-2009 | What would you build a state machine with, which would generate so much code ? | |
Anton: 14-Feb-2009 | You say "state machines ... require more code". What code ? Obviously, you can build a state machine in any language, but I guess I'm wondering what ... ohh... I'm so tired after all those cheese sandwiches.... | |
Anton: 14-Feb-2009 | Anyway, I think I understand what you're saying. A state machine is big and clunky, expressing everything you don't want to hear about, while parse allows you to express your target more directly, cutting through anything you don't want without having to specify it. | |
Janko: 14-Feb-2009 | I don't know the exact term for this, but I have built many parsers for things like xml, wiki text and some other custom things in various lower level languages using a simple state machine (at least that's what I called it)... To my understanding you can parse anything with something like that, including structured nested data, but of course it takes some more coding than this rebol solution... what I mean by a state machine is a loop that accepts characters or words and has a predefined number of states and code for what to do at each state and when to switch to another state etc.. | |
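As an illustration of the kind of loop Janko describes, here is a small Python sketch of a state machine over words. It handles the `start <label> ... end` strings from earlier in the thread; the state names and the function `collect_sections` are assumptions made for this example, not anything from Janko's own parsers.

```python
def collect_sections(text):
    """Return {label: value} for every 'start <label> ... end' run in text."""
    state = "seek_start"   # current state of the machine
    label = None           # label of the section being collected
    words = []             # words collected for the current section
    found = {}
    for token in text.split():
        if state == "seek_start":
            # Ignore everything until a section opener appears.
            if token == "start":
                state = "expect_label"
        elif state == "expect_label":
            label = token
            words = []
            state = "collect"
        elif state == "collect":
            if token == "end":
                found[label] = " ".join(words)
                state = "seek_start"
            else:
                words.append(token)
    return found
```

Each branch spells out what to ignore and when to switch state, which is the extra code Anton and Janko contrast with the shorter parse rule.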
Anton: 14-Feb-2009 | The first one without any quotes causes a little bit of a problem (solvable). | |
Anton: 14-Feb-2009 | You have to use a variable to store which one was used, then parse until that character is encountered again. | |
Anton: 14-Feb-2009 | Is this a surprise ?
    >> parse "abc" [some ["b" | "c" | "a"]]
    == true
Anton: 14-Feb-2009 | Yes, it takes a little while to become familiar with parse. | |
Janko: 14-Feb-2009 | this does surprise me a little, but I am not sure if this was the problem or something else, because I thought I tried with some and all things | |
Anton: 14-Feb-2009 | It means, basically: SOME: Do this 1 or more times, until fail or end is reached: [Try "b", if that fails, try "c". If that fails, try "a"] <--- Given "a" "b" "c", this rule always succeeds. | |
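Anton's reading of SOME with alternatives can be imitated in a few lines of Python. This is only a sketch of the matching order for literal strings, not of REBOL's parse engine, and `some_alternatives` is a made-up helper name.

```python
def some_alternatives(alternatives, text):
    """Repeat an ordered choice over text; succeed if it matched at least
    once and consumed the whole input."""
    pos = 0
    matched = 0
    while pos < len(text):
        for alt in alternatives:
            # Alternatives are tried strictly in the order given.
            if text.startswith(alt, pos):
                pos += len(alt)
                matched += 1
                break
        else:
            # No alternative matched at this position: the loop stops.
            break
    return matched >= 1 and pos == len(text)
```

With the alternatives `["b", "c", "a"]` every position of "abc" is covered by one of them, so the loop runs three times and the call succeeds, regardless of the order in which the alternatives are listed.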
Janko: 14-Feb-2009 | ( the problem is with things that repeat and I don't know in which order they will appear .. I had this problem with parsing something like simplified wiki text )
    >> a: "start1 1 end start2 2 end start1 3 end"
    == "start1 1 end start2 2 end start1 3 end"
    >> parse a [ SOME [ [ thru "start2" | thru "start1" ] copy T to "end" (print T) ] to end ]
    2
    3
    == true
    >> parse a [ SOME [ [ thru "start1" | thru "start2" ] copy T to "end" (print T) ] to end ]
    1
    3
    == true
( so as not to give the impression that I only have problems with parse: I have used parse to solve many things that would be a headache any other way... these and the problem up there are just cases where I got into trouble)
Anton: 14-Feb-2009 |
    apiece: [copy T to "end" (?? T)]
    parse a [some [thru "start2" apiece | thru "start1" apiece] to end]
Janko: 14-Feb-2009 | This is basically not a problem, as I solve these things with multiple passes and it works more than fast enough for me that way too ... I think this problem would not exist if, in the case of [ .. | .. | .. ], parse checked all options and took the one that is the fewest characters away from the current position (the one that comes up first) .. but this would most probably slow down parse, and you would lose the feature that you define "priority" with [ .. | .. | .. ] now .. so maybe there could be a different | for this | |
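The "take whichever marker comes first in the input" behaviour Janko wishes for is roughly what a regex scan does, since the search moves left to right and tries the alternatives at each position. A Python sketch under that assumption, with the hypothetical helper `sections_in_order` and the marker names from Janko's example:

```python
import re


def sections_in_order(text, markers=("start1", "start2")):
    """Return the text between each marker and the next 'end', in input order."""
    alternation = "|".join(re.escape(marker) for marker in markers)
    pattern = re.compile("(?:%s)(.*?)end" % alternation, re.DOTALL)
    return [match.group(1).strip() for match in pattern.finditer(text)]
```

The result follows the order of the sections in the input, not the order in which the markers are listed, so swapping the markers does not change what is found. The price is the one Janko names: the order of alternatives no longer expresses a priority between sections.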
Janko: 14-Feb-2009 | ( I have to go to eat... will be back .. thanks a lot for before) | |
Janko: 14-Feb-2009 | hm.. really thanks for this example.. I took it as unsolvable, but this is a totally elegant way to solve it .. I will need to think on this a little and do some more examples to digest it :) thanks | |
Oldes: 14-Feb-2009 | If you need to parse complex structures, like a markup language, you should use charsets and not the 'to or 'thru commands... for example, you cannot say that a tag starts with < and ends with > because a tag like this is valid as well: <input value="<>"> The 'to and 'thru commands are useful if you, for example, do data mining and don't care to parse the whole page structure to get just a bit of information from it. | |
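Oldes's example is easy to reproduce with regular expressions. In this Python sketch (pattern names invented here), the naive pattern behaves like `thru ">"` and stops inside the attribute value, while the second pattern steps over quoted strings character class by character class, which is closer to the charset approach he recommends. It is a sketch for single tags, not an HTML parser.

```python
import re

# "Starts with < and ends with the next >": stops at the first > it sees.
NAIVE_TAG = re.compile(r"<[^>]*>")

# Treat quoted attribute values as units, so a > inside quotes is skipped.
QUOTE_AWARE_TAG = re.compile(r"<(?:[^>\"']|\"[^\"]*\"|'[^']*')*>")


def first_tag(pattern, text):
    """Return the first tag matched by pattern, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None
```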
amacleod: 22-Feb-2009 | Is there a way to force parse to enclose results in {} instead of double quotes "" regardless of length? | |
MaxV: 20-Mar-2009 | Hello everybody! I have a problem. I need to extract email addresses from a big text like bla bla [me-:-demo-:-com] bla bla ... <[you-:-example-:-org]> etc. [he-:-italy-:-it] Is it possible to obtain a text with all the addresses without the "<" and ">"? | |
Pekr: 20-Mar-2009 | Here's an absolutely terrible parser - it does NOT follow the RFC; it allows any combination of alpha chars, dots, one @ char, and the same once again up to the next space char ...
    space: #" "
    mailchar: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" ".-"]
    at-char: #"@"
    email: [
        space start: some mailchar at-char some mailchar end: space
        (print copy/part start end)
    ]
    str: "afadfa adfa asdfasdfa fd [asdfas-:-adfadf-:-adfa-adfadfsda-:-com] adfafaf a af"
    parse/all str [any [email | skip]]
btiffin: 20-Mar-2009 | It would be nice if REBOL could LOAD foreign! data. :) Hint hint wink wink. And being here in a public REBOL forum I might get in trouble for suggesting this one. $ grep -o -E '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b' files... | |
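For readers who want to try btiffin's pattern outside grep, the same expression works unchanged in Python. This is a sketch only: the helper name `find_emails` is invented here, and the pattern is a loose approximation that does not follow the RFC (for one thing, it only accepts top-level domains of two to four letters).

```python
import re

# btiffin's grep pattern, used as-is.
EMAIL = re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b")


def find_emails(text):
    """Return all address-like substrings, without surrounding < and >."""
    return EMAIL.findall(text)
```

Since `<` and `>` are not part of any character class in the pattern, addresses written as `<name@host>` come back without the angle brackets, which is what MaxV asked for.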
Geomol: 20-Mar-2009 | Brian, you can probably do that grep with a few CHARSET and PARSE in REBOL. | |
btiffin: 20-Mar-2009 | And actually I think it's wrong anyway ... as it should be. Posting regex in a REBOL forum ... shame on me. ;) | |
swall: 27-Mar-2009 | I'm having trouble parsing the "none" datatype from within blocks. The following example illustrates my problem (hopefully):
    junk: [none [1 2 [3 4]]]
    parse/all junk [none (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]
This produces the following output:
    nothing
    text: [none [1 2 [3 4]]]
    == false
Notice that the block doesn't get parsed. It seems that parse ignores "none" tokens rather than extracting them from the input stream. If I put a number in place of none and parse for "number!", then the block does indeed get parsed. Is this a bug or an oversight? Or am I just confused?
Izkata: 27-Mar-2009 | 'none isn't a datatype - none! is:
    >> parse/all junk [none! (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])]
    nothing
    text: [[1 2 [3 4]]]
    block: [1 2 [3 4]]
    == true
Izkata: 27-Mar-2009 | Ah, forgot to copy that part - I'd done "junk/1: none" to make sure it was a none value | |
Henrik: 29-Mar-2009 | it's just a serialized version of none!, so you can load it as a real none value instead of a word. | |
[unknown: 5]: 29-Mar-2009 | Pavel, this also works with datatypes. For example:
    >> mold/all string!
    == "#[datatype! string!]"
This is useful if you're loading values from a file. This way you're sure to set a value to a string datatype! when desired.
Janko: 15-Apr-2009 | Hi, I have one question .. can you somehow break out of a some loop from rebol code .. for example
    parse [ aa zzz cc ] [ some [ set W word! ( ?? W if equal? W 'zzz [ break ] ) ] ]
... that break doesn't work that way, but is there some way to do this? I need to compare W with a runtime value
Janko: 15-Apr-2009 | I solved it in a way that I can just return out of the whole function (with return) at that point, so it's ok .. at first I had it thought out in a way that I would need to exit the some [ ] loop but continue parsing .. error probably wouldn't work that way either? This is now my code..
    match: match func [ data rules ] [
        parse rules [
            SOME [
                set L lit-word! ( either equal? L reduce first data [ data: next data ] [ return false ] )
                | set W word! ( set :W first data data: next data )
            ]
        ]
    ]
Ammon: 16-Apr-2009 | ; Here's one way to do it...
    >> digit: charset "1234567890"
    == make bitset! #{
    000000000000FF03000000000000000000000000000000000000000000000000
    }
    >> rule: [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip ]
    == [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip]
    >> parse "12b34c56a78" [any rule]
    12
    34
    56
    == true
Dockimbel: 16-Apr-2009 | Another possible way is by setting at runtime a [break] rule :
    branch-rule: [ ]
    parse [ aa zzz cc ] [
        some [ set W word! ( ?? W if equal? W 'zzz [ branch-rule: [ break ] ] ) branch-rule ]
    ]
shadwolf: 16-Apr-2009 | charset creates a "mask" in bitset form to be compared to the current item read from the string | |
shadwolf: 16-Apr-2009 | some digit since digit is a bitset containing the binary image of what you looking for (numbers char from 1 to | |
shadwolf: 16-Apr-2009 | the lame equivalent would be something like foreach a string [ either find? "1234567890" a [ append e a ][probe e clear e ] ] | |
Ammon: 17-Apr-2009 | Essentially what I'm doing with the above code is simply skipping to the end of the parse input when a given rule is matched. This works because a get-word in the parse rules sets the current parse input. The get-word can be any value of the same type as the original parse input. You can't set the parse input to a string! if a block! was provided to parse to start with. | |
Graham: 23-Apr-2009 | I'd like to take an english sentence and tidy it up. I want to automatically apply english grammar to it ... so capitalize the first letter after a period, and remove extraneous spaces eg. a comma after a space. Anyone done anything like this with 'parse? | |
Ammon: 24-Apr-2009 | Not yet but I've been thinking about it for quite a while now... I think I have a pretty good idea what the parse rules should look like but I haven't written any code for it yet. | |
Steeve: 24-Apr-2009 | Good start...
    letter: charset [#"a" - #"z" #"A" - #"Z"]
    dirt: complement letter
    word: [some letter]
    clean: [here: dirt :here (remove here)]
    space: [here: (insert here #" ") skip]
    capital: [here: letter (uppercase/part here 1)]
    sentence: [
        some [ capital opt word break | clean ]
        any [
            [#";" | #","] any clean space word
            | #"." any clean space capital opt word
            | #" " word
            | clean
        ]
    ]
    parse/all text: {test test . test;; test ..test } sentence
    probe text
    >>"Test test. Test; test. Test"
Steeve: 24-Apr-2009 | for #"'" you should add a rule to remove spaces | |
Steeve: 24-Apr-2009 | with that you suppress unwanted spaces. it' s a good day --> "it's a good day" | |
Steeve: 24-Apr-2009 | so don't add ""'" as a vali | |
Graham: 24-Apr-2009 | Also, I think have to add ' to the letter charset because words ending in s can have a trailing ' for possession ... | |
Steeve: 24-Apr-2009 | but what if they have inserted a space after or before ' | |
Steeve: 24-Apr-2009 | hum ok, but you could handle that specific case with a different rule | |
Steeve: 24-Apr-2009 | parse is just amazing for such simple grammar. A simple add and it's doing all you want. | |
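For contrast with the parse grammar, here is a rough Python sketch of the same tidy-up done with a chain of regex substitutions. It is a different technique from Steeve's in-place parse rules, it ignores the apostrophe question discussed in the thread, and it will mangle decimals and abbreviations; `tidy` is a hypothetical helper name.

```python
import re


def tidy(text):
    """Normalise spacing around , ; . and capitalise sentence starts."""
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text.strip())
    # Collapse repeated punctuation such as ';;' or '..' to the first mark.
    text = re.sub(r"([,;.])(?:\s*[,;.])+", r"\1", text)
    # No space before punctuation.
    text = re.sub(r"\s+([,;.])", r"\1", text)
    # Exactly one space after punctuation that is followed by more text.
    text = re.sub(r"([,;.])(?=\S)", r"\1 ", text)
    # Capitalise the first letter of the text and of each sentence.
    return re.sub(
        r"(^|\.\s)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


if __name__ == "__main__":
    print(tidy("test test . test;; test ..test "))
```

On Steeve's sample string this yields the same cleaned sentence his rules produce, although the two approaches will diverge on input with digits or other non-letter characters, which his `clean` rule deletes and this sketch keeps.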
Pekr: 3-May-2009 | Have I found a parse bug?
1)
    >> parse/all {zybc} [ some ["b" break | "y" break | skip] copy result thru "c" (print result)]
    bc
    == true
2)
    >> parse/all {zybc} [ some ["b" break| "y" break | skip] copy result thru "c" (print result)]
    ** Script Error: break| has no value
    ** Near: parse/all "zybc" [some ["b" break| "y" break | skip] copy result thru "c" (print result)]
3)
    >> parse/all {zybc} [ some ["b" break | "y" break| skip] copy result thru "c" (print result)]
    == false
Such stupid bugs are really making the testing process difficult. I wondered for at least 5 minutes why the result of case 3 was wrong, and then I tried adding a space after the second break, and the code was corrected. How is it that the second break| does not report an error? ;-)
shadwolf: 3-May-2009 | 3) is like 2) you put a | to close of the second break. I noticed on rebol 2 strange reactions with find multi-case too | |
Pekr: 3-May-2009 | yes, you might be right doc. But - it is really very difficult to track down for user. It almost looks like scanner bug, but it is not. What actually happens in the case 3) is, that "break|" is being considered a regular word, which just does not have value. Stating that, it also means that 'skip is not part of OR expression. So, 'some block fails on not matching "y" .... | |
Graham: 16-May-2009 | Here's a parse question for the experts. | |
Graham: 16-May-2009 | If I have a document with headings eg. a: b: .. z: and text optionally under each heading ... would it be possible to use parse to collect all the text from each heading if the headings are in any order and some headings with no text are optionally missing? | |
Maxim: 16-May-2009 | now was that a question of the "can you give me the solution" kind? | |
Graham: 16-May-2009 | It's a little complicated because the headers can have spaces in them. | |
Graham: 16-May-2009 | now if you have a rule copy text [ to "a:" | to "b:" .... ] but if b: occurs before a: in the text, then you will include a header in copied text | |
Graham: 16-May-2009 | yes, headers start on a newline and terminate in ":" | |
Graham: 16-May-2009 | No, there can be a ":" in the content | |
Graham: 16-May-2009 | but you know what the headers are ... so that's not a big problem. | |
Maxim: 16-May-2009 | can you give the name of some the headers... or an example.... so far it looks like a really simple rule to me. | |
Maxim: 16-May-2009 | I can assume it starts at a header? | |
Graham: 16-May-2009 | So, I am trying to create an object from a semi structured document where the object elements are in any order or missing. | |
Graham: 16-May-2009 | I guess I can do it without using parse .. just replace all the headers with a mark, that allows me to split off all the sections, and then i can match the sections with all the section headers. | |
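Graham's fallback (locate the known headers, then split the text at those points) can be sketched in Python as follows. The assumptions are the ones stated in the thread: the header names are known in advance, each header starts on a new line and ends with ":", headers may appear in any order or be missing, and the content may itself contain ":". The function name `split_sections` and the sample header names used with it are invented for illustration.

```python
import re


def split_sections(text, headers):
    """Return {header: content} for each known header found in text."""
    # A header only counts at the start of a line and directly before ":".
    alternation = "|".join(re.escape(header) for header in headers)
    pattern = re.compile(r"^(%s):" % alternation, re.MULTILINE)
    matches = list(pattern.finditer(text))
    sections = {}
    for index, match in enumerate(matches):
        # Content runs from this header to the next header, or to the end.
        if index + 1 < len(matches):
            stop = matches[index + 1].start()
        else:
            stop = len(text)
        sections[match.group(1)] = text[match.end():stop].strip()
    return sections
```

Headers that are missing from the input simply do not appear in the result, and a header with no text under it maps to an empty string, so the dictionary can be turned into an object with optional fields.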
Steeve: 17-May-2009 | There is no reason, the content is enclosed in a string before being loaded. If it fails, it's because the whole grammar has changed |