Documentation for: split.r
Created by: greggirwin
 on: 6-May-2012
Format: text/editable
Downloaded on: 30-Apr-2025

=== SPLIT

Given an integer as the dlm parameter, SPLIT will break the series
up into pieces of that size.
	
	print mold split "1234567812345678" 4
	;== ["1234" "5678" "1234" "5678"]
	
If the series can't be evenly split, the last value will be shorter.
	
	print mold split "1234567812345678" 3
	;== ["123" "456" "781" "234" "567" "8"]
	print mold split "1234567812345678" 5
	;== ["12345" "67812" "34567" "8"]
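
The fixed-size behavior can be approximated in a few lines of plain
REBOL. This is only a sketch for illustration: SPLIT-BY-SIZE is a
hypothetical helper, not the code in split.r, and it assumes size
is a positive integer.

	; Hypothetical helper, not part of split.r.
	; Assumes size >= 1; a size of zero would loop forever.
	split-by-size: func [series [series!] size [integer!] /local result] [
		result: copy []
		while [not tail? series] [
			; copy/part stops at the tail, so the last piece may be shorter
			append/only result copy/part series size
			series: skip series size
		]
		result
	]
	print mold split-by-size "1234567812345678" 5
	;== ["12345" "67812" "34567" "8"]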
	
Given an integer as dlm, and using the /INTO refinement, it breaks
the series into n pieces, rather than pieces of length n.
	
	print mold split/into [1 2 3 4 5 6] 2
	;== [[1 2 3] [4 5 6]]
	print mold split/into "1234567812345678" 2
	;== ["12345678" "12345678"]
	
If the series can't be evenly split, the last value will be longer.
	
	print mold split/into "1234567812345678" 3
	;== ["12345" "67812" "345678"]
	print mold split/into "1234567812345678" 5
	;== ["123" "456" "781" "234" "5678"]
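
One way to get this result is to compute a base piece size with
integer division and give whatever is left over to the last piece.
The sketch below does that; SPLIT-INTO-N is a hypothetical helper,
not the code in split.r, and it assumes n is between 1 and the
length of the series.

	; Hypothetical helper, not part of split.r.
	; Assumes 1 <= n <= length? series.
	split-into-n: func [series [series!] n [integer!] /local result size] [
		result: copy []
		; base piece size, with any fraction truncated
		size: to integer! (length? series) / n
		; the first n - 1 pieces are exactly size items long
		loop n - 1 [
			append/only result copy/part series size
			series: skip series size
		]
		; the last piece takes everything that remains
		append/only result copy series
		result
	]
	print mold split-into-n "1234567812345678" 3
	;== ["12345" "67812" "345678"]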

If dlm is a block containing only integer values, those values 
determine the size of each piece returned. That is, each piece
can be a different size.
		
	print mold split [1 2 3 4 5 6] [2 1 3]
	;== [[1 2] [3] [4 5 6]]
	print mold split "1234567812345678" [4 4 2 2 1 1 1 1]
	;== ["1234" "5678" "12" "34" "5" "6" "7" "8"]
	; A single integer dlm works on other series types too, such as paren!
	print mold split first [(1 2 3 4 5 6 7 8 9)] 3
	;== [(1 2 3) (4 5 6) (7 8 9)]
	print mold split #{0102030405060708090A} [4 3 1 2]
	;== [#{01020304} #{050607} #{08} #{090A}]

If the total of the dlm sizes is less than the length of the series,
the extra data will be ignored.
	
	print mold split [1 2 3 4 5 6] [2 1]
	;== [[1 2] [3]]
	
If you have extra dlm sizes after the series data is exhausted, you
will get empty values.
	
	print mold split [1 2 3 4 5 6] [2 1 3 5]
	;== [[1 2] [3] [4 5 6] []]
	
If the last dlm size would return more data than the series contains,
it returns all the remaining series data, and no more.
	
	print mold split [1 2 3 4 5 6] [2 1 6]
	;== [[1 2] [3] [4 5 6]]
	
Negative values can be used to skip in the series without returning
that part:
	
	print mold split [1 2 3 4 5 6] [2 -2 2]
	;== [[1 2] [5 6]]
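
The rules for a block of sizes (positive sizes return a piece,
negative sizes skip, and sizes past the end of the data yield empty
values) can be sketched as a single loop. SPLIT-BY-SIZES is a
hypothetical helper written for illustration, not the code in
split.r.

	; Hypothetical helper, not part of split.r.
	split-by-sizes: func [series [series!] sizes [block!] /local result] [
		result: copy []
		foreach size sizes [
			either negative? size [
				; negative size: move ahead without keeping anything
				series: skip series negate size
			][
				; copy/part stops at the tail, so a size that is too
				; large returns only what is left (possibly nothing)
				append/only result copy/part series size
				series: skip series size
			]
		]
		result
	]
	print mold split-by-sizes [1 2 3 4 5 6] [2 -2 2]
	;== [[1 2] [5 6]]
	print mold split-by-sizes [1 2 3 4 5 6] [2 1 3 5]
	;== [[1 2] [3] [4 5 6] []]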

Char or any-string values can be used for simple splitting, much as
you would with parse/all, but with different behavior for strings
that have embedded quotes.
		
	print mold split "abc,de,fghi,jk" #","
	;== ["abc" "de" "fghi" "jk"]
	print mold split "abc<br>de<br>fghi<br>jk" <br>
	;== ["abc" "de" "fghi" "jk"]
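
For comparison, splitting at a single char can be written directly
with PARSE under R2. The sketch below is an illustration only:
SPLIT-AT-CHAR is a hypothetical helper, not the code in split.r,
and it treats quotes as ordinary characters.

	; Hypothetical helper, not part of split.r.
	split-at-char: func [string [string!] delim [char!] /local result piece] [
		result: copy []
		parse/all string [
			; R2 sets piece to none for an empty match, hence the ANY
			any [copy piece to delim (append result any [piece copy ""]) skip]
			; whatever follows the last delimiter is the final piece
			copy piece to end (append result any [piece copy ""])
		]
		result
	]
	print mold split-at-char "abc,de,fghi,jk" #","
	;== ["abc" "de" "fghi" "jk"]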
	
\note
The bitset and parse rule delimiters described after this note are
not supported under R2 yet. Ladislav's PARSE enhancements may be
used to support them in the future.
    
http://www.rebol.org/view-archive-script.r?script=parseen.r&version=2
    
/note

If you want to split at more than one character value, you can use
a charset/bitset. 
	
	print mold split "abc|de/fghi:jk" charset "|/:"
	;== ["abc" "de" "fghi" "jk"]
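
Until bitset delimiters are supported under R2, a similar result
can be had with a small PARSE rule built on the complement of the
charset. This is a sketch, not the code in split.r: SPLIT-ON-ANY
is a hypothetical helper, and unlike a true delimiter split it
returns no empty strings for adjacent delimiters.

	; Hypothetical helper, not part of split.r.
	split-on-any: func [string [string!] delims [bitset!] /local result piece others] [
		result: copy []
		; every character that is not a delimiter
		others: complement delims
		parse/all string [
			any [
				; keep each run of non-delimiter characters
				copy piece some others (append result piece)
				; and step over one delimiter at a time
				| delims
			]
		]
		result
	]
	print mold split-on-any "abc|de/fghi:jk" charset "|/:"
	;== ["abc" "de" "fghi" "jk"]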

And for even more control, you can use simple parse rules.
	
	print mold split "abc^M^Jde^Mfghi^Jjk" [crlf | #"^M" | newline]
	;== ["abc" "de" "fghi" "jk"]
	print mold split "abc     de fghi  jk" [some #" "]
	;== ["abc" "de" "fghi" "jk"]