r3wp [groups: 83 posts: 189283]

World: r3wp

[!RebGUI] A lightweight alternative to VID

Steeve
11-Apr-2007
[5827x3]
ladislav, i gain 1 second while initializing the dictionary with this 
train function:

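train: func [
	words [block!]
	/local keys values pos
][
	; unique words as a hash! for fast lookups
	keys: to hash! unique words
	; pre-allocate [word count word count ...] filled with zeros
	; (copy was written as cp in the post, assumed to be an alias for copy)
	values: head insert/dup copy [] 0 2 * length? keys
	; write each unique word into the first slot of its pair
	loop length? keys [
		change values first keys
		values: skip values 2
		keys: next keys
	]
	values: head values
	keys: head keys
	; the count for the word at index N in keys sits at position 2 * N
	foreach word words [
		pos: 2 * index? find keys word
		poke values pos 1 + pick values pos
	]
	; sort the pairs by count (second field), most frequent first
	sort/reverse/skip/compare values 2 2
	values
]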
the first 'sort in your train function consumes a lot of time
anyway, my function is not perfect ...either
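
For reference, what a TRAIN-style function produces is a flat block of word/count pairs sorted by descending count. The following is only a minimal sketch of that data shape in REBOL 2, not either author's code: the name count-words is hypothetical, and it scans a plain block with find/skip instead of using a hash!, so it is shorter but slower on a large corpus.

```rebol
REBOL [Title: "Word-frequency count (sketch)"]

; COUNT-WORDS is a hypothetical name. It returns [word count word count ...]
; sorted by descending count.
count-words: func [
    words [block!]
    /local counts pos
][
    counts: copy []
    foreach word words [
        ; find/skip only looks at the first field of each 2-value record
        either pos: find/skip counts word 2 [
            change next pos 1 + second pos
        ][
            repend counts [word 1]
        ]
    ]
    ; same sort call as in the TRAIN function: records of 2, keyed on field 2
    sort/reverse/skip/compare counts 2 2
]

probe count-words ["the" "cat" "the" "hat" "the" "cat"]
```

The most frequent word ends up at the head of the block, which is all the "best candidate" choice needs.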
Ashley
11-Apr-2007
[5830]
The 'correct function is *very* slow, takes almost 4 seconds to do 
the following:

	>> correct "caustic"


Also, where is the bin file sourced from and are other languages 
supported?
Steeve
11-Apr-2007
[5831x3]
yeah all languages are supported, it's the good thing with this algo
you just have to replace the corpus by another bin file
i mean, all "latin" languages are supported
Ashley
11-Apr-2007
[5834]
And a good collection of bin files would be located at?
Steeve
11-Apr-2007
[5835x4]
hey, they are just plain text books
just take a well-written book
what is your language ?
take a classic
Ashley
11-Apr-2007
[5839]
Queen's English ;)
Steeve
11-Apr-2007
[5840]
use books.google.com to find a good one
Ashley
11-Apr-2007
[5841]
I'm wondering whether, for RebGUI, this approach is actually "better" 
than the current soundex approach? If it means having to pick arbitrary 
literary works for each language then what's the point? I was hoping 
that someone had pre-generated some "standardized" dicts somewhere 
I could grab.
Steeve
11-Apr-2007
[5842x3]
i dunno
soundex is faster
but more complicated
Ashley
11-Apr-2007
[5845]
? It's less than a dozen lines of code!
Steeve
11-Apr-2007
[5846x2]
oh really ?
right
Ashley
11-Apr-2007
[5848]
http://trac.geekisp.com/rebgui/browser/rebgui-edit.r?format=raw
Ladislav
12-Apr-2007
[5849]
steeve: don't forget, that the 1s difference in TRAIN speed does 
not actually matter, since it is just a dictionary preparation. What 
matters is the speed of the CORRECT function
Graham
12-Apr-2007
[5850]
how fast is the correct function?
Ladislav
12-Apr-2007
[5851x2]
the version I just sent to the ML is now almost 4 times faster than 
Cyphre's
on his example it is about 9 words/s
Pekr
12-Apr-2007
[5853]
Graham - Ashley is right - all I did was add the 'table name to the 
behavior tabbing category ....
Ladislav
12-Apr-2007
[5854]
Soundex is a language-specific algorithm, unfortunately
Graham
12-Apr-2007
[5855]
So soundex is specific to English ?
Ashley
12-Apr-2007
[5856x2]
Seems so: http://en.wikipedia.org/wiki/Soundex
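
Why Soundex is tied to English is easiest to see from its letter groups, which bundle consonants that sound alike in English. Below is a minimal sketch of standard American Soundex in REBOL 2. It is not the RebGUI implementation linked earlier, and the names groups and digit-of are chosen here purely for illustration.

```rebol
REBOL [Title: "American Soundex (sketch)"]

; Consonant groups of standard American Soundex and their digits.
groups: [
    "bfpv" #"1"  "cgjkqsxz" #"2"  "dt" #"3"
    "l" #"4"     "mn" #"5"        "r" #"6"
]

; DIGIT-OF (hypothetical helper): the Soundex digit for CH, or NONE.
digit-of: func [ch [char!]] [
    foreach [group digit] groups [
        if find group ch [return digit]
    ]
    none
]

soundex: func [
    word [string!]
    /local code prev digit
][
    if empty? word [return copy ""]
    word: lowercase copy word
    ; the code starts with the first letter, uppercased
    code: uppercase copy/part word 1
    prev: digit-of first word
    foreach ch next word [
        either digit: digit-of ch [
            ; adjacent letters with the same digit are coded once
            if digit <> prev [append code digit]
            prev: digit
        ][
            ; vowels separate equal digits, H and W do not
            if not find "hw" ch [prev: none]
        ]
    ]
    ; pad with zeros and cut to one letter plus three digits
    copy/part append code "000" 4
]

print soundex "Robert"
print soundex "Rupert"
```

Both names map to R163, the textbook example of two spellings sharing one code. A language whose consonants group differently would need a different table.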
Looking at the new algorithm again, a lot of effort goes into weighting 
word use (the values block) which in turn is only used in one place 
... to determine the "best" candidate replacement. Given that a spell-checker 
just needs to return a sorted candidate list and the remainder of 
the algorithm handles this quite well, I'd be quite happy with a 
cut-down version that excludes the whole dict/values logic. That 
way I could use the existing dictionary files I have sourced (sans 
soundex codes).
Ladislav
12-Apr-2007
[5858]
it is easy to cut down - just remove the WORDS and TRAIN functions 
and replace their use by reading the DICT data
Graham
12-Apr-2007
[5859]
So, to create a specialized dictionary, you can train it on eg. a 
medical text ?
Ladislav
12-Apr-2007
[5860x2]
yes, sure, you can use any text you want
I suppose medical text can contain a lot of Latin words, e.g.
Steeve
12-Apr-2007
[5862]
alea jacta est
Graham
12-Apr-2007
[5863x2]
not really ...
might have 100 years ago
Graham
13-Apr-2007
[5865x2]
Wonder how easy it would be to do real time spell checking?
as Firefox does ...
btiffin
13-Apr-2007
[5867]
Isn't that just look up and highlight...quite a bit easier/faster 
than offering corrections?
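
The look-up-and-highlight idea can be sketched in a few lines of REBOL 2: split the text into words, look each one up, and report the offsets of the misses so a GUI could underline them. This is only an illustration under stated assumptions. The name misspelled is hypothetical, a word is taken to be a run of ASCII letters, and the dictionary is a plain block of strings.

```rebol
REBOL [Title: "Lookup-only spell check (sketch)"]

; Assumption: a word is a run of ASCII letters.
letters: charset [#"a" - #"z" #"A" - #"Z"]

; MISSPELLED (hypothetical name) returns [offset word offset word ...]
; for every word of TEXT that is not found in DICT.
misspelled: func [
    text [string!]
    dict [block! hash!]
    /local result start finish word
][
    result: copy []
    parse/all text [
        any [
            start: some letters finish: (
                word: copy/part start finish
                if not find dict word [
                    repend result [index? start word]
                ]
            )
            | skip
        ]
    ]
    result
]

probe misspelled "The quikc brown fox" ["the" "quick" "brown" "fox"]
```

No candidate generation is involved, so the cost per word is a single dictionary lookup. For the sample text only "quikc" should be reported, together with its offset.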
Ashley
13-Apr-2007
[5868]
"how easy it would be to do real time spell checking"

... with a cut-down version of Ladislav's code (mainly limiting edits 
to one char distance) speed is no longer an issue. I'm looking at 
using draw to red-underline all spelling errors in a text face and 
then only popping up the spellcheck box on right-clicking a misspelled 
word.


Problem is synchronizing red underlines with scrolling. R3's rich 
text support would make this trivial [by comparison] to implement.
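
Limiting corrections to a one-character edit distance, as described in this post, could look roughly like the REBOL 2 sketch below. It is not Ladislav's or Ashley's code: the name candidates is hypothetical, the alphabet is assumed to be the 26 lowercase ASCII letters, and the dictionary is a plain block (or hash!) of strings.

```rebol
REBOL [Title: "One-edit candidates (sketch)"]

; CANDIDATES (hypothetical name) returns every DICT word that is one
; deletion, transposition, replacement or insertion away from WORD.
candidates: func [
    word [string!]
    dict [block! hash!]
    /local result alphabet n w check
][
    result: copy []
    alphabet: "abcdefghijklmnopqrstuvwxyz"
    n: length? word
    ; keep an edited string only if it is a known word not yet collected
    check: func [w [string!]] [
        if all [find dict w  not find result w] [append result w]
    ]
    repeat i n [
        ; deletion of the character at position i
        check head remove at copy word i
        ; transposition of positions i and i + 1
        if i < n [
            w: copy word
            poke w i pick word i + 1
            poke w i + 1 pick word i
            check w
        ]
        ; replacement of the character at position i
        foreach c alphabet [
            w: copy word
            poke w i c
            check w
        ]
    ]
    ; insertion of one character at each of the n + 1 positions
    repeat i n + 1 [
        foreach c alphabet [
            check head insert at copy word i c
        ]
    ]
    sort result
]

probe candidates "caustik" ["caustic" "acoustic" "cause"]
```

The number of strings tried grows linearly with word length (roughly 54 per character for this alphabet), which is why stopping at distance one keeps the check fast enough for interactive use. Words two edits away, such as "acoustic" in the sample dictionary, are not offered.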
Steeve
13-Apr-2007
[5869]
R3 will support rich text format ? where is this claimed ?
Ashley
13-Apr-2007
[5870]
http://www.rebol.com/article/0123.html and follow-up references in 
some of Carl's blogs.
Steeve
13-Apr-2007
[5871]
old speech from Carl
Ashley
13-Apr-2007
[5872]
Yea of little faith! ;) We'll know after the Devcon one way or the 
other.
Steeve
13-Apr-2007
[5873x2]
it's not so difficult to synchronize draw effects with scrolling
i've done it many times
Ashley
13-Apr-2007
[5875]
The source code for area is here:

	http://trac.geekisp.com/rebgui/browser/widgets/area.r?format=raw

feel free to propose a solution.
Steeve
13-Apr-2007
[5876]
ah ah