AltME groups: search

world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
Maxim: 24-Dec-2008	so I'll go back to the batcave and continue working on remark v2 and some other stuff... things I've wanted to release for a long time
BrianH: 29-Dec-2008	The real advantage to the TO/THRU enhancement comes when it lets you avoid creating charsets, which are a lot less useful with Unicode. It should be pretty easy to implement.
BrianH: 29-Dec-2008	I think that the proposals are more than Carl was thinking they would be - apparently he had forgotten the previous proposal lists. I don't think that it will be too much of a problem though, as there are not really that many proposals that are likely to be accepted. Some are competing proposals, of which only one would be chosen. Also, there aren't that many proposals overall - they are just thoroughly specified.
GiuseppeC: 29-Dec-2008	Let's see how things evolve. The proposals are very interesting, as they would ease a lot of the work of building parse rules. Everything is silent apart from some blog messages, where I read the word "Beta" connected with REBOL3 for the first time.
BrianH: 29-Dec-2008	My main concern is that Carl's main requirements of the proposal process have been ignored in some cases: - That the proposals be concisely specified. The Purpose and Importance statements should be one sentence each. - That there be no discussion of theory. - That there be no specification of equivalent rules. - That all discussions happen outside of the wiki. - That this is a proposals page, not documentation.
BrianH: 29-Dec-2008	As it is, I hope Carl will read a paper that long when he gets to the point of taking on PARSE.
Janko: 31-Jan-2009	aha, I remember I learned a lot from that green page too.. thanks for the links so far, I will read the pages and hopefully I will find something related to the problems I have
Janko: 31-Jan-2009	or in terms of Brett's examples: == true >> a: copy "dog cat" parse a [ ANY [ thru "dog" (print 1) | thru "cat" (print 2) ] ] 1 2 == true >> a: copy "cat dog" parse a [ ANY [ thru "dog" (print 1) | thru "cat" (print 2) ] ] 1 == true
Janko: 31-Jan-2009	basically a similar problem to last time, as I see now.. so from those mailing-list answers I have 2 solutions ... I use parse 3 times on a string.. or maybe I use Ladislav's parseen, which he said solves this.. but I don't yet know how :)
Janko: 31-Jan-2009	S WORKS IF IN THIS ORDER =heading= {comment some comment} - line 1 - line 2 -------------> <h1>heading</h1> <p>comment some comment</p> <li>line 1<li> <li>line 2</li> THIS DOESN'T WORK =heading= {comment some comment} =heading= - line 1 - line 2 =heading= {comment some comment} ADDITIONAL (SIMILAR) PROBLEM - line 1 + line 2 + line 3 - line 4 + line 5 -----------------> <li class="a">line 1</li> <li class="a">line 2</li> ...
Oldes: 31-Jan-2009	>> ? complement USAGE: COMPLEMENT value DESCRIPTION: Returns the one's complement value. COMPLEMENT is an action value. ARGUMENTS: value -- (Type: logic number char tuple binary string bitset image) >> ? union USAGE: UNION set1 set2 /case /skip size DESCRIPTION: Creates a new set that is the union of the two arguments. UNION is a native value. ARGUMENTS: set1 -- first set (Type: series bitset) set2 -- second set (Type: series bitset) REFINEMENTS: /case -- Use case-sensitive comparison /skip -- Treat the series as records of fixed size size -- (Type: integer) >>
Janko: 31-Jan-2009	uh, that is some advanced parse :) .. I will need a couple of days to think it through
Brock: 31-Jan-2009	I'll try to explain complement. I like to think of a charset as a list of valid chars that can be tested for. However, say you need all characters of the alphabet minus a few. Instead of defining multiple ranges of characters, as in charset [#"A" - #"F" #"H" - #"K" #"M" - #"T" #"V" - #"Z" #"a" - #"z" #"0" - #"9"], which skips the chars G, L and U, you could simply state complement charset "GLU", which excludes those three characters from the charset but includes all others (note that this also includes punctuation, spaces and control characters, so it is not exactly the same set).
Brock: 31-Jan-2009	If there's something more specific or a technically better way to state the above, please add your input
Janko: 1-Feb-2009	Very interesting, both versions (Oldes and Steeve), thanks a lot.. I think I understood most of it now
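To make the bitset idea concrete, here is a small Python sketch (not REBOL). It models an 8-bit charset as a 256-bit integer mask, roughly like a REBOL 2 bitset. The helper names `charset`, `char_range`, `complement` and `matches` are hypothetical and chosen for illustration. The point is that the complement flips every bit, so it also admits punctuation and spaces, not just the letters and digits of the range-based set.

```python
FULL_MASK = (1 << 256) - 1  # one bit per 8-bit character code, like an R2 bitset


def charset(chars):
    """Return a mask with the bit for each character in `chars` set."""
    mask = 0
    for ch in chars:
        mask |= 1 << ord(ch)
    return mask


def char_range(first, last):
    """Return a mask covering the inclusive range first..last."""
    mask = 0
    for code in range(ord(first), ord(last) + 1):
        mask |= 1 << code
    return mask


def complement(mask):
    """Flip every bit within the 256-character universe."""
    return ~mask & FULL_MASK


def matches(mask, ch):
    """True if `ch` is a member of the charset `mask`."""
    return (mask >> ord(ch)) & 1 == 1


# Explicit ranges that skip G, L and U (letters and digits only)
alnum_minus_glu = (
    char_range("A", "F") | char_range("H", "K") | char_range("M", "T")
    | char_range("V", "Z") | char_range("a", "z") | char_range("0", "9")
)
# Complement: every character except G, L and U
not_glu = complement(charset("GLU"))
```

Intersecting `not_glu` with the letters and digits gives back the range-based set, which shows exactly where the two differ.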
Oldes: 1-Feb-2009	Is there a better way to change the main parse rule during parsing, like in this one? (just a simple example.. in real life the lexers would be more complicated :) d: charset "0123456789" lexer1: [copy x 1 skip (probe x if x = "." [lexer: lexer2]) | end skip] lexer2: [copy x some d (probe x lexer: lexer1) | end skip] lexer: lexer1 parse "abcd.123efgh" [ some [() lexer]]
Steeve: 1-Feb-2009	Not really, Oldes... but what is your purpose? Isn't that a little obfuscated again? You said it's just an example, but why can't you use the normal way? I would like to know... parse "..." [ some [ #"." lexer2 | lexer1 ] ]
Maarten: 2-Feb-2009	This weekend I got an interesting idea: algebraic (and recursive) data types are well known for their ability to implement parsers. And they are a great data modeling tool. E.g.: data Bill = Name BankAccount | Company CreditCard data CreditCard = CVC2 CCNumber CCExpiryDate However, the opposite also holds, i.e. you can model a data domain just as easily using named parse rules without actions. Now, what if you combined two dialects: one to define data structures and a separate one to attach actions? E.g. Post: [ message [string!] author [string!] timestamp [date!] ] Comments: [ some posts] blog [ 1 post comments] action 'JSON 'Post [ .... the action to convert the Post to JSON here ...] action 'XHTML 'POST [ ..... the action to convert Post to XHTML here...] process some-data 'JSON -> this gives back the data processed as for the JSON actions. It is a bit SAX-like, with the difference that this models classes of actions and separates them from the data, instead of scattering some loose actions around. And the data modeling still holds.
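Outside PARSE, the mode switch that Oldes implements by reassigning `lexer` is often written as an explicit state variable. The Python sketch below is a loose port under that assumption, not Oldes's code. It emits single characters until it sees ".", then one run of digits, then switches back. Like the `end skip` alternative in the original, it simply stops when the digit mode finds no digits.

```python
DIGITS = frozenset("0123456789")  # same set as d: charset "0123456789"


def lex(text):
    """Split text into tokens, switching modes the way lexer1/lexer2 do."""
    tokens = []
    mode = "chars"  # plays the role of `lexer: lexer1`
    pos = 0
    while pos < len(text):
        if mode == "chars":
            ch = text[pos]
            tokens.append(ch)          # copy x 1 skip
            pos += 1
            if ch == ".":
                mode = "digits"        # lexer: lexer2
        else:
            end = pos
            while end < len(text) and text[end] in DIGITS:
                end += 1
            if end == pos:
                break                  # no digits: the rule fails, like `end skip`
            tokens.append(text[pos:end])  # copy x some d
            pos = end
            mode = "chars"             # lexer: lexer1
    return tokens
```

Steeve's alternative amounts to folding the state test into a single dispatch rule instead of keeping a separate variable.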
Maarten: 2-Feb-2009	Then make actions for data to go to JSON, XML, XHTML, back and forth to a database,....
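Maarten's idea, as read here, is to describe the data shape once and register format-specific actions separately. The following is a rough Python sketch under that reading. The schema format, the `action` decorator and `process` are all hypothetical names. A dict of types stands in for the parse-rule data dialect.

```python
import html
import json
from datetime import date

# The data model: field name -> expected type (stands in for the data dialect)
SCHEMAS = {"Post": {"message": str, "author": str, "timestamp": date}}

# Registry of actions, keyed by (output format, schema name)
ACTIONS = {}


def action(fmt, schema_name):
    """Register a converter for one output format and one schema."""
    def register(fn):
        ACTIONS[(fmt, schema_name)] = fn
        return fn
    return register


def conforms(value, schema):
    """Check that `value` has exactly the schema's fields, with the right types."""
    return (isinstance(value, dict) and set(value) == set(schema)
            and all(isinstance(value[k], t) for k, t in schema.items()))


@action("JSON", "Post")
def post_to_json(post):
    return json.dumps({**post, "timestamp": post["timestamp"].isoformat()})


@action("XHTML", "Post")
def post_to_xhtml(post):
    return ('<div class="post"><p>{}</p><span>{}</span></div>'
            .format(html.escape(post["message"]), html.escape(post["author"])))


def process(value, fmt, schema_name):
    """Validate against the data model, then apply the action for `fmt`."""
    if not conforms(value, SCHEMAS[schema_name]):
        raise ValueError("data does not match schema " + schema_name)
    return ACTIONS[(fmt, schema_name)](value)
```

Adding another output target (XML, a database row) means registering one more action, without touching the data model.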
Maarten: 3-Feb-2009	Chris: 1) Yes, actually, that would be the idea 2) I think the data dialect would be a strict subset of parse, forcing you to use set-word/parse-rule pairs. Hence, the set-words are available in the action.
Graham: 9-Feb-2009	They call it a big Mac not a big Mc ... odd
Janko: 14-Feb-2009	hi, it's me again with parse problems... I need this concretely to parse out web-page meta tags.. but I distilled the problem out of it to a minimal example..
Janko: 14-Feb-2009	doc1: "start A 1 end start B 2 end" how can you get the value 2 out?
Janko: 14-Feb-2009	It works with A because it's first, but with B, parse enters the rule at the first "start", fails to match "B" there, and never tests the second one >> parse doc1 [ "start" "A" copy R to "end" (print R) to end ] 1 == true >> parse doc1 [ "start" "B" copy R to "end" (print R) to end ] == false
Janko: 14-Feb-2009	>> doc1: "start A 1 end xyz B 2 end" ;; in this case it must not take 2 == "start A 1 end xyz B 2 end" >> parse doc1 [ "start" thru "B" copy R to "end" (print R) to end ] ;; but it will, that's why I can't use to/thru 2 == true
Anton: 14-Feb-2009	some ["start" ["A" | "B"] copy R to "end" "end"]
Janko: 14-Feb-2009	oops ... my example above is wrong .. just a sec
Janko: 14-Feb-2009	(this is the right example .. I forgot to use thru above so second wouldn't pass anyway... but result is the same) >> doc1: "start A 1 end start B 2 end" == "start A 1 end start B 2 end" >> parse doc1 [ thru "start" "A" copy R to "end" (print R) to end ] 1 == true >> parse doc1 [ thru "start" "B" copy R to "end" (print R) to end ] == false >> parse doc1 [ SOME [ thru "start" "B" copy R to "end" (print R) to end ] ] == false
Anton: 14-Feb-2009	Is there anything expected between "start" and "A", for instance ?
Anton: 14-Feb-2009	parse doc1 [some [thru "start" ["A" | "B"] copy R to "end" (?? R) "end"]]
Janko: 14-Feb-2009	hm.. just a sec so I try few things
Janko: 14-Feb-2009	Your solution: I thought it wouldn't work if I reversed the order of A and B in the string, but it seems it does. I would need to know which one is A and which is B, but I think this can be solved by setting some word in a paren ( ) inside [ A | B] ... so basically it seems to work... I think I can apply this approach to my concrete problem too, which is this
Janko: 14-Feb-2009	(I need to parse the meta tags description, keywords and abstract if they exist -- they can come in any order, there can be one or more spaces/newlines/tabs between tag arguments, and either " or ' can be used, as in argument="asdasd") >> doc2: {<head> <title>Dragonicum.com - making the right business connections !</title> <meta name="keywords" content="Company Directory, Join Us, Advanced Search, Trade Leads, Forum, Trade Shows, Advertising, Translation, fair trade, trade portal, business to business, trade leads, trade events, china export, china manufacturer" /> <meta name="description" content="New international trade portal and company directory for Asia, Europe and North America. Our priority No.1 is to create and maintain a safe, well lit business-to-business marketplace, by assisting our members in identifying new trustworthy business partners!" /> <link rel="stylesheet" href="style/blue_main.css" type="text/css" />} == {<head> <title>Dragonicum.com - making the right business connections !</title> <meta name="keywords" content="Company Directory... >> T: "" parse doc [ thru "<meta" "name=" skip "keywords" skip "content=" m: skip (m1: first m ) copy T to m1 to end ] print T Company Directory, Join Us, Advanced Search, Trade Leads, Forum, Trade Shows, Advertising, Translation, fair trade, trade portal, business to business, trade leads, trade events, china export, china manufacturer >> T: "" parse doc [ thru "<meta" "name=" skip "description" skip "content=" m: skip (m1: first m ) copy T to m1 to end ] print T >> (as you can see, because keywords come first it works for them, but not for description; they can be in a different order in other documents etc.)
Janko: 14-Feb-2009	maybe your solution for A | B would work.. I will try
Janko: 14-Feb-2009	yes :) thanks a lot!
Janko: 14-Feb-2009	>> T: K: D: "" parse doc [ SOME [ thru "<meta" "name=" skip [ "description" (V: 'D) | "keywords" (V: 'K)] skip "content=" m: skip (m1: first m ) copy T to m1 (set V T) ] to end ] ?? K ?? D K: {Company Directory, Join Us, Advanced Search, Trade Leads, Forum, Trade Shows, Advertising, Translation, fair trade, trade portal, business to business, trade leads, trade events, china export, china manufacturer} D: {New international trade portal and company directory for Asia, Europe and North America. Our priority No.1 is to create and maintain a safe, well lit business-to-business marketplace, by assisting our members in identifying new trustworthy business partners!} == {New international trade portal and company directory for Asia, Europe and North America. Our priority No.1 is to create and mai... >>
Janko: 14-Feb-2009	I intended to make a blog post .. "REBOL parse challenge", present this problem and ask if people can provide more elegant solutions in other languages ... (in a similar vein to the "arc challenge") ... now that it seems an even harder nut to crack, I should probably really do it .. does anyone think this would be easy to solve using a conventional language? (I think not)
Janko: 14-Feb-2009	hm.. would this be nicely solvable with a regex? .. I think it would be quite a pain using regular string functions like strpos, substr etc... with the same requirements (one or more spaces/tabs/newlines, " or ', undefined order)
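On the regex question, a single pattern with backreferences can handle either quote character, arbitrary whitespace and any tag order. This Python sketch assumes, unlike the PARSE version, that `name` comes before `content` inside each tag and that tag names contain no quotes. It is an illustration, not a robust HTML parser, and the function name `meta_tags` is hypothetical.

```python
import re

# Groups 1 and 3 capture the opening quote, so \1 and \3 require the matching close quote.
META = re.compile(
    r"""<meta\s+name\s*=\s*(["'])(?P<name>[^"']*)\1\s+content\s*=\s*(["'])(?P<content>.*?)\3""",
    re.IGNORECASE | re.DOTALL,
)


def meta_tags(page, wanted=("keywords", "description", "abstract")):
    """Return {name: content} for the wanted meta tags, in any order."""
    found = {}
    for m in META.finditer(page):
        name = m.group("name").lower()
        if name in wanted:
            found[name] = m.group("content")
    return found
```

Because each tag is matched independently with `finditer`, the order of the tags in the page does not matter. A missing `abstract` simply leaves that key out.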
Anton: 14-Feb-2009	I don't know - I only learn regex when I have to .. then a short time later I forget.
Anton: 14-Feb-2009	What would you build a state machine with, which would generate so much code ?
Anton: 14-Feb-2009	You say "state machines ... require more code". What code ? Obviously, you can build a state machine in any language, but I guess I'm wondering what ... ohh... I'm so tired after all those cheese sandwiches....
Anton: 14-Feb-2009	Anyway, I think I understand what you're saying. A state machine is big and clunky, expressing everything you don't want to hear about, while parse allows you to express your target more directly, cutting through anything you don't want without having to specify it.
Janko: 14-Feb-2009	I don't know the exact term for this, but I built many parsers for things like xml, wiki text and some other custom formats in various lower-level languages using a simple state machine (at least that's what I called it)... To my understanding you can parse anything with something like that, including structured nested data, but of course it takes more code than this REBOL solution... what I mean by a state machine is a loop that accepts characters or words, has a predefined number of states, and has code for what to do in each state and when to switch to another state etc..
Anton: 14-Feb-2009	The first one without any quotes causes a little bit of a problem (solvable).
Anton: 14-Feb-2009	You have to use a variable to store which one was used, then parse until that character is encountered again.
Anton: 14-Feb-2009	Is this a surprise ? >> parse "abc" [some ["b" | "c" | "a"]] == true
Anton: 14-Feb-2009	Yes, it takes a little while to become familiar with parse.
Janko: 14-Feb-2009	this does surprise me a little, but I am not sure if this was the problem or something else, because I thought I tried with some and all things
Anton: 14-Feb-2009	It means, basically: SOME: Do this 1 or more times, until fail or end is reached: [Try "b", if that fails, try "c". If that fails, try "a"] <--- Given "a" "b" "c", this rule always succeeds.
Janko: 14-Feb-2009	(the problem is with cases where things repeat and I don't know in which order they will appear .. I had this problem when parsing something like simplified wiki text) >> a: "start1 1 end start2 2 end start1 3 end" == "start1 1 end start2 2 end start1 3 end" >> parse a [ SOME [ [ thru "start2" | thru "start1" ] copy T to "end" (print T) ] to end ] 2 3 == true >> parse a [ SOME [ [ thru "start1" | thru "start2" ] copy T to "end" (print T) ] to end ] 1 3 == true (so as not to give the impression that I only have problems with parse: I used parse to solve many things that would be head-hurting any other way... these and the problem up there are just cases where I got into trouble)
Anton: 14-Feb-2009	apiece: [copy T to "end" (?? T)] parse a [some [thru "start2" apiece | thru "start1" apiece] to end]
Janko: 14-Feb-2009	This is basically not a problem, as I solve these things with multiple passes and it works more than fast enough for me that way too ... I think this problem would not exist if, in the case of [ .. | .. | .. ], parse would check all options and take the one that is the fewest characters away from the current position (the one that matches first) .. but this would most probably slow down parse, and you would lose the feature that you define "priority" with [ .. | .. | .. ] now .. so maybe there could be a different | for this
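The behaviour Janko describes can be reproduced outside REBOL. This Python sketch, with a hypothetical `collect` helper, mimics `SOME [[thru m1 | thru m2] copy T to "end"]`. With ordered choice, the first alternative that can match anywhere ahead wins, even if it skips over another marker. The `nearest=True` mode implements the "take the closest match" alternative he suggests. That mode is an assumption about how such an operator might work, not an existing PARSE feature.

```python
def collect(text, markers, nearest=False, terminator="end"):
    """Emulate SOME [[thru m1 | thru m2 ...] copy T to terminator] on a string."""
    results = []
    pos = 0
    while True:
        hits = []
        for marker in markers:
            idx = text.find(marker, pos)
            if idx != -1:
                hits.append((idx, marker))
                if not nearest:
                    break  # ordered choice: the first alternative that matches wins
        if not hits:
            break  # no alternative matches: SOME stops
        idx, marker = min(hits) if nearest else hits[0]
        start = idx + len(marker)             # like THRU: position after the marker
        stop = text.find(terminator, start)   # like COPY T TO "end"
        if stop == -1:
            break
        results.append(text[start:stop].strip())
        pos = stop
    return results
```

With the string from the thread, ordered choice reproduces the `2 3` and `1 3` outputs, while the nearest-match mode finds all three values.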
Janko: 14-Feb-2009	( I have to go to eat... will be back .. thanks a lot for before)
Janko: 14-Feb-2009	hm.. really thanks for this example.. I took it as unsolvable, but this is a totally elegant way to solve it .. I will need to think on this a little and do some more examples to digest it :) thanks
Oldes: 14-Feb-2009	If you need to parse complex structures, like a markup language, you should use charsets and not the 'to or 'thru commands... for example, you cannot say that a tag starts with < and ends with >, because a tag like this is valid as well: <input value="<>"> The 'to and 'thru commands are useful if you, for example, do data mining and don't care to parse the whole page structure just to get a bit of information from it.
amacleod: 22-Feb-2009	Is there a way to force parse to enclose results in {} instead of double quotes "" regardless of length?
MaxV: 20-Mar-2009	Hello everybody! I have a problem. I need to extract email addresses from a big text like bla bla [me-:-demo-:-com] bla bla ... <[you-:-example-:-org]> etc. [he-:-italy-:-it] Is it possible to obtain a text with all the addresses, without the "<" and ">"?
Pekr: 20-Mar-2009	Here's an absolutely terrible parser - it does NOT follow the RFC; it allows any combination of alphanumeric chars, dots and dashes, one @ char, and the same once again up to the next space char ... space: #" " mailchar: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z" ".-"] at-char: #"@" email: [ space start: some mailchar at-char some mailchar end: space (print copy/part start end) ] str: "afadfa adfa asdfasdfa fd [asdfas-:-adfadf-:-adfa-adfadfsda-:-com] adfafaf a af" parse/all str [any [email | skip]]
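Here is a Python port of the same charset idea, which also answers MaxV's question about angle brackets. Because `<` and `>` are not in the address charset, they are never copied. Like Pekr's parser, this is a deliberately loose sketch that does not follow RFC 5322. A sentence period placed directly after an address would be included. The name `extract_emails` is hypothetical.

```python
import string

# Same membership as Pekr's mailchar: digits, letters, "." and "-"
MAILCHARS = frozenset(string.ascii_letters + string.digits + ".-")


def extract_emails(text):
    """Return runs like local@domain, where both sides use only MAILCHARS."""
    found = []
    pos, n = 0, len(text)
    while pos < n:
        if text[pos] not in MAILCHARS:
            pos += 1                      # skip separators such as " ", "<", ">"
            continue
        start = pos
        while pos < n and text[pos] in MAILCHARS:
            pos += 1                      # some mailchar (local part)
        if pos < n and text[pos] == "@":  # at-char
            pos += 1
            domain_start = pos
            while pos < n and text[pos] in MAILCHARS:
                pos += 1                  # some mailchar (domain)
            if pos > domain_start:
                found.append(text[start:pos])
    return found
```

Unlike the PARSE rule above, this version does not require a space on each side, so an address at the very start or end of the text is also found.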
btiffin: 20-Mar-2009	It would be nice if REBOL could LOAD foreign! data. :) Hint hint wink wink. And being here in a public REBOL forum I might get in trouble for suggesting this one. $ grep -o -E '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b' files...
Geomol: 20-Mar-2009	Brian, you can probably do that grep with a few CHARSET and PARSE in REBOL.
btiffin: 20-Mar-2009	And actually I think it's wrong anyway ... as it should be. Posting regex in a REBOL forum ... shame on me. ;)
swall: 27-Mar-2009	I'm having trouble parsing the "none" datatype from within blocks. The following example illustrates my problem (hopefully): junk: [none [1 2 [3 4]]] parse/all junk [none (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])] This produces the following output: nothing text: [none [1 2 [3 4]]] == false Notice that the block doesn't get parsed. It seems that parse ignores "none" tokens rather than extracting them from the input stream. If I put a number in place of none and parse for "number!", then the block does indeed get parsed. Is this a bug or an oversight? Or am I just confused?
Izkata: 27-Mar-2009	'none isn't a datatype - none! is: >> parse/all junk [none! (print ["nothing"]) text: (print ["text:" mold text]) set b block! (print ["block:" mold b])] nothing text: [[1 2 [3 4]]] block: [1 2 [3 4]] == true
Izkata: 27-Mar-2009	Ah, forgot to copy that part - I'd done "junk/1: none" to make sure it was a none value
Henrik: 29-Mar-2009	it's just a serialized version of none!, so you can load it as a real none value instead of a word.
[unknown: 5]: 29-Mar-2009	Pavel, this also works with datatypes. For example: >> mold/all string! == "#[datatype! string!]" This is useful if you're loading values from a file. This way you're sure to set a value to the string! datatype when desired.
Janko: 15-Apr-2009	Hi, I have one question .. can you somehow break out of some loop by rebol code .. for example parse [ aa zzz cc ] [ some [ set W word! ( ?? W if equal? W 'zzz [ break ] ) ] ] ... that break doesn't work that way, but is there some way to do this? I need to compare W with a runtime value
Janko: 15-Apr-2009	I solved it in a way that lets me just return out of the whole function (with return) at that point, so it's ok .. at first I had thought it out in a way where I would need to exit the some [ ] loop but continue parsing .. an error probably wouldn't work that way either? This is now my code.. match: func [ data rules ] [ parse rules [ SOME [ set L lit-word! ( either equal? L reduce first data [ data: next data ] [ return false ] ) | set W word! ( set :W first data data: next data ) ] ] ]
Ammon: 16-Apr-2009	; Here's one way to do it... >> digit: charset "1234567890" == make bitset! #{ 000000000000FF03000000000000000000000000000000000000000000000000 } >> rule: [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip ] == [s: some digit e: (print copy/part s e) | h: #"a" (h: tail h) :h | skip] >> parse "12b34c56a78" [any rule] 12 34 56 == true
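Janko's `match` walks a rule block in step with a data block. A lit-word must equal the next data item, while a plain word captures it. A rough Python analogue could look like the sketch below. The `Lit`/`Var` marker classes are hypothetical and stand in for REBOL's lit-word! and word! types. A dict of bindings replaces setting words, and returning `None` plays the role of his early `return false`. The length check is an added assumption, since the REBOL version does not guard against running out of data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    value: object  # must equal the next data item (a lit-word! in the REBOL rules)


@dataclass(frozen=True)
class Var:
    name: str      # captures the next data item (a word! in the REBOL rules)


def match(data, rules):
    """Return a dict of captured values, or None as soon as a literal mismatches."""
    if len(data) < len(rules):
        return None
    bindings = {}
    for rule, item in zip(rules, data):
        if isinstance(rule, Lit):
            if rule.value != item:
                return None  # early exit, like RETURN FALSE from inside the parse action
        else:
            bindings[rule.name] = item
    return bindings
```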
Dockimbel: 16-Apr-2009	Another possible way is by setting at runtime a [break] rule : branch-rule: [ ] parse [ aa zzz cc ] [ some [ set W word! ( ?? W if equal? W 'zzz [ branch-rule: [ break ] ] ) branch-rule ] ]
shadwolf: 16-Apr-2009	charset creates a "mask" in bitset form to be compared to the current item read from the string
shadwolf: 16-Apr-2009	some digit, since digit is a bitset containing the binary image of what you're looking for (numbers char from 1 to
shadwolf: 16-Apr-2009	the equivalent lame version would be something like e: copy "" foreach a string [ either find "1234567890" a [ append e a ] [ probe e clear e ] ]
Ammon: 17-Apr-2009	Essentially what I'm doing with the above code is simply skipping to the end of the parse input when a given rule is matched. This works because a get-word in the parse rules sets the current parse input. The get-word can be any value of the same type as the original parse input. You can't set the parse input to a string! if a block! was provided to parse to start with.
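Ammon's `(h: tail h) :h` trick jumps the input position to the end so that the enclosing loop finishes. A plain Python scanner can mimic it by assigning the index to the string length. This is a hypothetical illustration (`digit_runs` is not from the thread) that uses his input string. It also corresponds to shadwolf's hand-written loop, with an explicit index in place of FOREACH.

```python
DIGITS = frozenset("0123456789")


def digit_runs(text, stop_char="a"):
    """Collect runs of digits; on stop_char, jump to the end of the input."""
    runs = []
    pos = 0
    while pos < len(text):
        if text[pos] in DIGITS:            # s: some digit e:
            start = pos
            while pos < len(text) and text[pos] in DIGITS:
                pos += 1
            runs.append(text[start:pos])   # print copy/part s e
        elif text[pos] == stop_char:       # h: #"a" (h: tail h) :h
            pos = len(text)
        else:                              # skip
            pos += 1
    return runs
```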
Graham: 23-Apr-2009	I'd like to take an English sentence and tidy it up. I want to automatically apply English grammar to it ... so capitalize the first letter after a period, and remove extraneous spaces, e.g. a space before a comma. Has anyone done anything like this with 'parse?
Ammon: 24-Apr-2009	Not yet but I've been thinking about it for quite a while now... I think I have a pretty good idea what the parse rules should look like but I haven't written any code for it yet.
Steeve: 24-Apr-2009	Good start... letter: charset [#"a" - #"z" #"A" - #"Z"] dirt: complement letter word: [some letter] clean: [here: dirt :here (remove here)] space: [here: (insert here #" ") skip] capital: [here: letter (uppercase/part here 1)] sentence: [ some [ capital opt word break | clean ] any [ [#";" | #","] any clean space word | #"." any clean space capital opt word | #" " word | clean ] ] parse/all text: {test test . test;; test ..test } sentence probe text >>"Test test. Test; test. Test"
Steeve: 24-Apr-2009	for #"'" you should add a rule to remove spaces
Steeve: 24-Apr-2009	with that you suppress unwanted spaces: it' s a good day --> "it's a good day"
Steeve: 24-Apr-2009	so don't add ""'" as a vali
Graham: 24-Apr-2009	Also, I think I have to add ' to the letter charset, because words ending in s can have a trailing ' for possession ...
Steeve: 24-Apr-2009	but what if they have inserted a space after or before '
Steeve: 24-Apr-2009	hmm, ok, but you could handle that specific case with a different rule
Steeve: 24-Apr-2009	parse is just amazing for such simple grammar. A simple add and it's doing all you want.
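For comparison, here is a regex-based Python sketch of the same clean-up rules. It is not a port of Steeve's PARSE grammar, and the name `tidy` is hypothetical. It assumes the only sentence terminators are `.`, `!` and `?`. It will wrongly split decimals such as `3.14`, collapse `...` to `.`, and it does not attempt the apostrophe cases discussed above.

```python
import re


def tidy(text):
    """Normalise spacing around punctuation and capitalise each sentence."""
    text = re.sub(r"\s+", " ", text).strip()            # collapse runs of whitespace
    text = re.sub(r"\s+([,;.!?])", r"\1", text)          # no space before punctuation
    text = re.sub(r"([,;.!?])\1+", r"\1", text)          # ";;" -> ";", ".." -> "."
    text = re.sub(r"([,;.!?])(?=\w)", r"\1 ", text)      # one space after punctuation
    return re.sub(r"(^|[.!?] )([a-z])",                  # capitalise sentence starts
                  lambda m: m.group(1) + m.group(2).upper(), text)
```

On Steeve's test string, it gives the same result as his grammar.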
Pekr: 3-May-2009	Have I found a parse bug? 1) >> parse/all {zybc} [ some ["b" break | "y" break | skip] copy result thru "c" (print result)] bc == true 2) >> parse/all {zybc} [ some ["b" break| "y" break | skip] copy result thru "c" (print result)] Script Error: break| has no value Near: parse/all "zybc" [some ["b" break| "y" break | skip] copy result thru "c" (print result)] 3) >> parse/all {zybc} [ some ["b" break | "y" break| skip] copy result thru "c" (print result)] == false Such stupid bugs really make the testing process difficult. I spent at least 5 minutes wondering why the result of case 3 was wrong, and then I tried adding a space after the second break, and the code was corrected. How come the second break| does not report an error? ;-)
shadwolf: 3-May-2009	3) is like 2): you put the | too close to the second break. I noticed strange reactions with find multi-case on REBOL 2 too
Pekr: 3-May-2009	yes, you might be right doc. But - it is really very difficult to track down for user. It almost looks like scanner bug, but it is not. What actually happens in the case 3) is, that "break\|" is being considered a regular word, which just does not have value. Stating that, it also means that 'skip is not part of OR expression. So, 'some block fails on not matching "y" ....
Graham: 16-May-2009	Here's a parse question for the experts.
Graham: 16-May-2009	If I have a document with headings eg. a: b: .. z: and text optionally under each heading ... would it be possible to use parse to collect all the text from each heading if the headings are in any order and some headings with no text are optionally missing?
Maxim: 16-May-2009	now was that a question of the "can you give me the solution" kind?
Graham: 16-May-2009	It's a little complicated because the headers can have spaces in them.
Graham: 16-May-2009	now if you have a rule copy text [ to "a:" | to "b:" .... ] and b: occurs before a: in the text, then you will include a header in the copied text
Graham: 16-May-2009	yes, headers start on a newline and terminate in ":"
Graham: 16-May-2009	No, there can be a ":" in the content
Graham: 16-May-2009	but you know what the headers are ... so that's not a big problem.
Maxim: 16-May-2009	can you give the names of some of the headers... or an example.... so far it looks like a really simple rule to me.
Maxim: 16-May-2009	I can assume it starts at a header?
Graham: 16-May-2009	So, I am trying to create an object from a semi structured document where the object elements are in any order or missing.
Graham: 16-May-2009	I guess I can do it without using parse .. just replace all the headers with a mark that allows me to split off all the sections, and then I can match the sections with all the section headers.
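Graham's mark-and-split plan is straightforward once the header names are known. The Python sketch below assumes, as he describes, that each header starts a line and ends with ":". The function name and the sample headers are hypothetical. Headers may appear in any order, and missing headers get an empty string. A ":" inside the content is harmless unless it forms a known header at the start of a line.

```python
import re


def split_sections(doc, headers):
    """Map each known header to the text under it; absent headers map to ""."""
    pattern = re.compile(
        r"^(%s):" % "|".join(re.escape(h) for h in headers), re.MULTILINE)
    sections = {h: "" for h in headers}
    hits = list(pattern.finditer(doc))
    for hit, following in zip(hits, hits[1:] + [None]):
        end = following.start() if following else len(doc)
        sections[hit.group(1)] = doc[hit.end():end].strip()
    return sections
```

Because every section ends at the next recognised header rather than at a fixed one, the problem with `copy text [to "a:" | to "b:"]` (swallowing a header that appears earlier) does not arise.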
Steeve: 17-May-2009	There is no reason, the content is enclosed in a string before being loaded. If it fails, it's because the whole grammar has changed