Documentation for: make-word-list.r
script: make-word-list.r
title: List words in a string
author: Peter W A Wood
date: 2-Apr-2007
version: 1.0.0
make-word-list.r lists all the unique words in a document.
make-word-list.r is used inside skimp.r, the simple keyword management program that REBOL.org uses for many of its search indexes.
I would like to thank Sunanda, without whom this script wouldn't have been started, tested, optimised or documented.
USAGE:
    make-word-list config content index-name /for-search

ARGUMENTS:
    config -- changes to the default configuration (Type: object or none) (See below)
    content -- the string for which words are to be extracted

REFINEMENTS:
    /for-search -- The character specified as the "not-prefix" is not removed from the front of words
1.1. configuration object
The configuration object used by the make-word-list function provides almost complete control over what makes a word. You only need to supply the entries you want to change, not every entry in the configuration object. For example, the following configuration object will only recognise words starting with "a":
    my-config: make object! [
        word-start: charset [#"a"]
    ]
    probe make-word-list my-config "aword bword cword dad"

The result would be ["aword" "ad"]

If you are happy with the default settings, you can supply none instead of a configuration object.
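To illustrate what "supplying only the changes" can look like, here is a minimal configuration sketch. It is not taken from make-word-list.r: the default-config object and its values are assumptions made for illustration, and treating not-prefix as a configuration entry is also an assumption based on the wording of the /for-search description. Only word-start appears as an entry name in the example above.

```rebol
REBOL [Title: "Partial configuration sketch"]

; Hypothetical defaults. These values are assumptions for illustration,
; not the real defaults used by make-word-list.r.
default-config: make object! [
    word-start: charset [#"a" - #"z" #"A" - #"Z"]
    not-prefix: #"~"
]

; Deriving a new object from the defaults overrides only the
; entries named in the block; everything else is inherited.
my-config: make default-config [
    word-start: charset [#"a"]
]

probe find my-config/word-start #"a"   ; true: "a" may start a word
probe find my-config/word-start #"b"   ; none: "b" may not
probe my-config/not-prefix             ; #"~", inherited unchanged
```

The sketch relies only on standard REBOL behaviour: make applied to an existing object produces a copy with the listed entries replaced.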
1.2. /for-search refinement
There are many different reasons why you may want to extract the words from a string. Two of the most common are:
In your application, the two uses may need to behave slightly differently, especially in how they handle the not-prefix. For example, with the default configuration, make-word-list treats a string that contains tildes differently depending on the refinement: a leading tilde is removed by default but preserved with the /for-search refinement:
    >> make-word-list none "I have some ~tildes in t~~~his ~string~"
    == ["have" "i" "in" "some" "string~" "tildes" "t~~~his"]

    >> make-word-list/for-search none "I have some ~tildes in t~~~his ~string~"
    == ["have" "i" "in" "some" "t~~~his" "~string~" "~tildes"]
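The prefix handling seen in these results can be modelled with a small helper. This is only a sketch of the observable behaviour, not the code used in make-word-list.r: strip-not-prefix and its keep-prefix argument are hypothetical names introduced here, with keep-prefix standing in for the /for-search refinement.

```rebol
REBOL [Title: "Not-prefix sketch"]

; Hypothetical helper, not part of make-word-list.r.
; Removes a single leading not-prefix character unless keep-prefix is true.
strip-not-prefix: func [
    word [string!]
    not-prefix [char!]
    keep-prefix [logic!]
][
    either all [
        not keep-prefix
        not empty? word
        not-prefix = first word
    ][
        copy next word    ; drop only the first character
    ][
        copy word         ; leave the word untouched
    ]
]

probe strip-not-prefix "~tildes" #"~" false    ; "tildes"
probe strip-not-prefix "~string~" #"~" false   ; "string~"
probe strip-not-prefix "t~~~his" #"~" false    ; "t~~~his"
probe strip-not-prefix "~tildes" #"~" true     ; "~tildes"
```

As in the console session, only a tilde at the very front of a word is affected; tildes inside or at the end of a word are kept in both modes.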