Documentation for: make-word-list.r

script: make-word-list.r
title: List words in a string
author: Peter W A Wood
date: 2-Apr-2007
Version: 1.0.0

purpose
make-word-list.r lists all the unique words in a document. It is used inside skimp.r, the simple keyword management program that REBOL.org uses for many of its search indexes.

I would like to thank Sunanda, without whom this script wouldn't have been started, tested, optimised or documented.

1. usage
USAGE:
    make-word-list config content index-name /for-search

ARGUMENTS:
    config -- changes to the default configuration (Type: object or none) (See below)
    content -- the string from which words are to be extracted

REFINEMENTS:
    /for-search -- The character specified as the "not-prefix" is not removed from the front of words

1.1. configuration object
The configuration object used by the make-word-list function provides almost complete control over what makes a word. You only need to supply the entries you want to change, not every entry in the configuration object.

For example, the following configuration object will only recognise words starting with "a":

    my-config: make object! [
        word-start: charset [#"a"]
    ]
    probe make-word-list my-config "aword bword cword dad"

The result would be ["aword" "ad"]. Note that "dad" contributes "ad": the word appears to be taken from the first character that matches word-start.

If you are happy with the default settings, you can supply none instead of a configuration object.
1.2. /for-search refinement
There are many different reasons why you may want to extract the words from a string. Two of the most common are:
In your application, the two uses may need to behave slightly differently, especially with regard to handling the not-prefix. As an example, the default make-word-list function acts differently when given a string that contains tildes: a leading tilde is preserved only with the /for-search refinement.

    >> make-word-list none "I have some ~tildes in t~~~his ~string~"
    == ["have" "i" "in" "some" "string~" "tildes" "t~~~his"]

    >> make-word-list/for-search none "I have some ~tildes in t~~~his ~string~"
    == ["have" "i" "in" "some" "t~~~his" "~string~" "~tildes"]
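
As a rough sketch of how a caller might use the /for-search result, the snippet below splits a search query into wanted and excluded terms. It assumes that make-word-list.r has already been loaded, that "~" is the default not-prefix (as the example above suggests), and that a leading not-prefix marks a term to be excluded from a search. The names query, wanted and unwanted are illustrative only and are not part of the script.

```rebol
; Assumption: make-word-list.r has already been loaded, e.g. with do.
; Assumption: "~" is the default not-prefix and marks a term to exclude.
query: "rebol ~parse tutorial"   ; illustrative query string

wanted: copy []     ; terms that should be present
unwanted: copy []   ; terms that should be absent

foreach word make-word-list/for-search none query [
    either #"~" = first word [
        ; drop the leading not-prefix before storing the term
        append unwanted copy next word
    ][
        append wanted word
    ]
]

probe wanted
probe unwanted
```

Without /for-search the leading tilde would already have been stripped, so the caller could no longer tell the two kinds of term apart.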