[REBOL] Re: Bitset!
From: greggirwin:mindspring at: 19-Jun-2004 6:40
Hi Patrick,
PPln> ...is it possible to know what characters are encoded in a bitset?
Below is an older script by Brett Handley, which may work for you as
is, or as a starting point to customize.
-- Gregg
; WATCH FOR WRAP!
REBOL [
Title: "Bitset tools"
Date: 17-Aug-2000
File: %bitsets.r
Author: "Brett Handley"
Email: [brett--codeconscious--com]
Purpose: {
Define functions that relate to bitsets.
}
]
describe-bitset: function [
"Describes a bitset and returns a form suitable for charset."
bitset [bitset!]
/string "Return result as a string not a block. Implies /norange."
/norange "Returns result without using ranges for block results."
/longest "Returns the longest ranges instead of -nice- ranges."
; Instead of having multiple functions, this function uses recursion to
; implement the different functionalities.
][
result test-char
range-start range-end range-spec
nice-ranges tmp-charset
][
; Norange is ignored if result should be a string.
if string [ norange: true ]
; Longest is ignored if we are not going to collect into ranges.
if norange [ longest: false ]
;
; Return all the characters that are included.
;
if norange [
either string [
result: copy {}
][
result: copy []
]
for i 0 255 1 [
if parse/all to-string test-char: to-char i reduce [ bitset ] [
append result test-char
]
]
RETURN result
]
;
; Collect characters into the longest ranges we can.
;
; Outer loop interates over each possible range.
; Inner loop test each character for inclusion in the range.
if longest [
; First get all the characters involved.
result: head describe-bitset/norange bitset
range-start: result
while [not tail? range-start][
range-end: range-start
until [
range-end: next range-end
either any [
tail? range-end
not equal?
(offset? range-start range-end)
(subtract first range-end first range-start)
] [true] [false]
]
; Test is it is a worthwhile range.
if all [
greater? offset? range-start range-end 1
greater? subtract last back range-end first range-start 2
] [
; Replace individual characters with a range.
range-spec: reduce [first range-start to-word "-" first back range-end]
remove/part range-start range-end
range-end: insert range-start range-spec
]
range-start: range-end
]
RETURN head result
]
;
; Segment the given bitset into "nice" ranges.
;
result: make block! 1
nice-ranges: [
#"0" #"9"
#"A" #"Z"
#"a" #"z"
]
; Get our "nice" ranges first.
foreach [rs re] nice-ranges [
append result
describe-bitset/longest
intersect bitset charset compose [(rs) - (re)]
]
; Get everything and dump it at the end.
tmp-charset: charset {}
foreach [rs re] nice-ranges [
tmp-charset: union tmp-charset (charset compose [(rs) - (re)])
]
append result describe-bitset/longest intersect bitset complement tmp-charset
RETURN result
]
describe-object-bitsets: function [
"Reports the character sets used by an object."
obj [object!]
] [result field] [
result: make block! 1
foreach w first :obj [
if equal? type? field: get in obj w bitset! [
append result w
append/only result describe-bitset field
]
]
result
]
; Examples
; Three different ways to describe the digit charset of xml-language.
; describe-bitset xml-language/digit
; describe-bitset/norange xml-language/digit
; describe-bitset/string xml-language/digit
;
; Lets highlight the continuous ranges of the alpha-num charset of xml-language.
; describe-bitset/longest xml-language/alpha-num
;
; How many characters does the digit bitset represent?
; length? describe-bitset/norange xml-language/digit
;
; What is the difference between these two fields of xml-language?
; describe-bitset difference xml-language/data-chars xml-language/data-chars-qt2
;
; Show me all the bitset fields of xml-language.
; foreach [bn bs] describe-object-bitsets xml-language [print [bn mold bs]]
;
; Sanity test for my function.
; equal? ( charset describe-bitset complement charset {} ) ( complement charset {} )