Mailing List Archive: 49091 messages

[REBOL] Re: Frequency of phrases

From: tomc:darkwing:uoregon at: 23-Aug-2002 11:50

On Thu, 22 Aug 2002, Louis A. Turk wrote:
> Hi rebols,
>
> Goal: To find the length and frequency of use of all the unique phrases in
> a text file.
>
> Phrase: A phrase will be defined as a string len characters long and with a
> space at each end. All phrases 100 characters long are to be processed
> first, then all phrases of length len - 1 and so on until len = 5.
>
> The text file: To simplify things, manually place a space at the beginning
> and at the end of the file to be processed. To further simplify things,
> place a space before all punctuation marks.
>
> Achieving this goal is proving to be quite a bit more complicated then I
> thought at first, and will be extremely time consuming if not done properly.
>
> What is the best way to do this?

ask on the list then pick your solution
> Louis

quick and dirty

    rebol[]
    buf: read %<whatever>

    ; normalize: newlines become spaces, sentence punctuation becomes its own token
    replace/all buf "^/" " "
    replace/all buf "." " ."
    replace/all buf "!" " !"
    replace/all buf "?" " ?"
    replace/all buf "  " " "    ; collapse double spaces (assumed; the archive shows " " " ")
    insert buf " "
    append buf " "

    ; index of the first character of the last word
    end: index? next find/reverse find/last buf " " " "
    hsh: make hash! (length? buf)
    cnt: 0
    phr: copy ""
    fub: copy ""

    while [(index? buf) < end] [
        ; fub marks the end of the shortest candidate phrase starting at buf
        fub: find next find next find buf " " " " " "
        phr: trim copy/part buf either fub [fub][fub: back tail buf]
        while [all [(length? phr) < 101  (length? parse phr none) > 2  not tail? fub]] [
            cnt: select hsh phr
            either cnt
                [change next find hsh phr (cnt + 1)]
                [append hsh reduce [:phr 1]]
            ; extend by one word; "find next" (not "next find") keeps fub on a
            ; space, so the shortest phrase is not counted twice
            fub: find next fub " "
            either fub [phr: trim copy/part buf fub] [fub: tail buf]
        ]
        buf: next find buf " "
    ]

    ; sort by count, highest first
    shsh: copy []
    foreach [k v] hsh [append/only shsh reduce [k v]]
    sort/compare shsh func [a b] [a/2 > b/2]
    foreach sh shsh [print sh]

--------------------------------------------
not so sure I would want phrases to span over sentence-ending punctuation,
but that is what you asked for
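
For anyone who would rather keep phrases inside a single sentence, here is a rough sketch of one way to do it, not a drop-in replacement for the script above. The function name count-phrases and its min-len and max-len arguments are made up for illustration. It assumes the text has already had a space placed before each punctuation mark, as the original request describes, so that ".", "!" and "?" arrive as separate tokens. It also leans on parse with a none rule, which as far as I know splits on commas and semicolons as well as whitespace, so those characters drop out of the counted phrases.

```rebol
REBOL [Title: "Phrase counts that stop at sentence ends (sketch)"]

; count-phrases is a hypothetical helper, not part of the original post.
; It counts every run of whole words whose joined length lies between
; min-len and max-len characters and never crosses ".", "!" or "?".
count-phrases: func [
    text [string!] min-len [integer!] max-len [integer!]
    /local words counts pos phrase hit
][
    words: parse text none              ; split into tokens
    counts: make hash! 1000             ; flat list: phrase count phrase count ...
    pos: words
    while [not tail? pos] [
        phrase: copy ""
        foreach w pos [
            if find ["." "!" "?"] w [break]         ; sentence boundary
            if not empty? phrase [append phrase " "]
            append phrase w
            if (length? phrase) > max-len [break]   ; it only gets longer from here
            if (length? phrase) >= min-len [
                either hit: find counts phrase [
                    change next hit (1 + second hit)
                ][
                    append counts reduce [copy phrase 1]
                ]
            ]
        ]
        pos: next pos
    ]
    counts
]

; Usage: same sort-and-print step as the script above.
result: copy []
foreach [k v] count-phrases " the cat sat . the cat ran . " 5 100 [
    append/only result reduce [k v]
]
sort/compare result func [a b] [a/2 > b/2]
foreach r result [print r]
```

Working from a block of words instead of scanning the string for spaces costs some memory, but it makes the sentence check a one-line test on each token.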