TTF Parser (was: help needed from graphics GURUS ?)
[1/1] from: greggirwin::mindspring::com at: 11-Sep-2002 10:49
Hi Gang,
Here's a crude, quickly hacked, TTF parser that will extract the English
name of the font from the file, add it to a block of font names, and print
it out to the console. TTF files are insanely flexible and can contain all
kinds of platform and language information. This basic piece was easy enough
to get working, and if the font-requester project gets going, it could
certainly be enhanced and improved as we figure out what it needs.
--Gregg
P.S. WATCH FOR LINE WRAP!
REBOL [
Title: "ttf-parser"
File: %ttf-parser.r
Author: "Gregg Irwin"
Email: [greggirwin--acm--org]
Purpose: {Parse TrueType Font files.}
]
null-buff: func [
{Returns a null-filled string buffer of the specified length.}
len [integer!]
][
head insert/dup make string! len #"^@" len
]
buff-to-num: func [buf /big-endian] [
either big-endian [
to integer! to binary! buf
][
to integer! to binary! head reverse buf
]
]
;The following data types are used in the TrueType font file.
;All TrueType fonts use Motorola-style byte ordering (Big Endian):
;BYTE: ;8-bit unsigned integer.
;CHAR: ;8-bit signed integer.
;USHORT: ;16-bit unsigned integer.
;SHORT: ;16-bit signed integer.
;ULONG: ;32-bit unsigned integer.
;LONG: ;32-bit signed integer.
;FIXED: ;32-bit signed fixed-point number (16.16)
;FUNIT Smallest measurable distance in the em space.
;FWORD 16-bit signed integer (SHORT) that describes a quantity in FUnits.
;UFWORD Unsigned 16-bit integer (USHORT) that describes a quantity in
FUnits.
;F2DOT14 16-bit signed fixed number with the low 14 bits of fraction (2.14).
;The TrueType font file begins at byte 0 with the Offset Table.
table-directory: make object! [
version: ;Fixed 0x00010000 for version 1.0.
num-tables: ;USHORT Number of tables.
search-range: ;USHORT (Maximum power of 2 <= numTables) x 16.
entry-selector: ;USHORT Log2(maximum power of 2 <= numTables).
range-shift: ;USHORT NumTables x 16 - searchRange.
none
]
;This is followed at byte 12 by the Table Directory entries.
; Entries in the Table Directory must be sorted in ascending order by tag.
table-directory-entry: make object! [
tag: ;ULONG 4-byte identifier.
checkSum: ;ULONG CheckSum for this table.
offset: ;ULONG Offset from beginning of TrueType font file.
length: ;ULONG Length of this table.
none
]
;The Table Directory makes it possible for a given font to contain only
;those tables it actually needs. As a result there is no standard value
;for numTables.
comment {
Tags are the names given to tables in the TrueType font file. At
present,
all tag names consist of four characters, though this need not be the
case.
Names with less than four letters are allowed if followed by the
necessary
trailing spaces. A list of the currently defined tags follows.
}
;Required Tables
;Tag Name
; required-tables: [
; cmap "character to glyph mapping"
; glyf "glyph data"
; head "font header"
; hhea "horizontal header"
; hmtx "horizontal metrics"
; loca "index to location"
; maxp "maximum profile"
; name "naming table"
; post "PostScript information"
; OS/2 "OS/2 and Windows specific metrics"
; ]
;Optional Tables
;Tag Name
; optional-tables: [
; cvt "Control Value Table"
; EBDT "Embedded bitmap data"
; EBLC "Embedded bitmap location data"
; EBSC "Embedded bitmap scaling data"
; fpgm "font program"
; gasp "grid-fitting and scan conversion procedure (grayscale)"
; hdmx "horizontal device metrics"
; kern "kerning"
; LTSH "Linear threshold table"
; prep "CVT Program"
; PCLT "PCL5"
; VDMX "Vertical Device Metrics table"
; vhea "Vertical Metrics header"
; vmtx "Vertical Metrics"
; ]
comment {
Other tables may be defined for other platforms and for future
expansion.
Note that these tables will not have any effect on the scan converter.
Tags for these tables must be registered with Apple Developer Technical
Support. Tag names consisting of all lower case letters are reserved for
Apple's use. The number 0 is never a valid tag name.
}
; name table
name-table: make object! [
format: ;USHORT Format selector (=0).
num-records: ;USHORT Number of NameRecords that follow n.
offset: ;USHORT Offset to start of string storage (from start of
table).
none
records: copy [] ;The NameRecords.
string-data: none ;(Variable) Storage for the actual string data.
]
name-record: make object! [
platform: ;USHORT Platform ID.
encoding-id: ;USHORT Platform-specific encoding ID.
language-id: ;USHORT Language ID.
name-id: ;USHORT Name ID.
string-length: ;USHORT String length (in bytes).
string-offset: ;USHORT String offset from start of storage area (in
bytes).
none
]
; platform-ids: [
; 0 Apple-Unicode ""
; 1 Macintosh "Script manager code"
; 2 ISO "ISO encoding"
; 3 Microsoft "Microsoft encoding"
; ]
;
; ; ?encoding ids are only used with Microsoft platform?
; encoding-ids: [
; 0 "Undefined character set or indexing scheme"
; 1 "UGL character set with Unicode indexing scheme"
; ]
;
; language-ids: [
; ; lots of stuff here. Not sure I want to tackle it right now.
; ]
;
; name-ids: [
; 0 ;Copyright notice.
; 1 ;Font Family name
; 2 ;Font Subfamily name; for purposes of definition, this is assumed
to address style (italic, oblique) and weight (light, bold, black, etc.)
only. A font with no particular differences in weight or style (e.g. medium
weight, not italic and fsSelection bit 6 set) should have the string
Regular
stored in this position.
; 3 ;Unique font identifier
; 4 ;Full font name; this should simply be a combination of strings 1
and 2. Exception: if string 2 is "Regular," then use only string 1. This is
the font name that Windows will expose to users.
; 5 ;Version string. In n.nn format.
; 6 ;Postscript name for the font.
; 7 ;Trademark; this is used to save any trademark notice/information
for this font. Such information should be based on legal advice. This is
distinctly separate from the copyright.
; ]
;===============================================
;== Set your files here for testing
font-dir: %//windows/fonts/
files: copy []
foreach file read font-dir [
if %.ttf = suffix? file [
append files file
]
]
;===============================================
integer-to-version: func [value [integer!]][
add
to integer! divide value 65536
divide (value and 65535) 10
]
get-ttf-num: func [data offset length] [
buff-to-num/big-endian copy/part at data offset length
]
font-names: copy []
foreach file files [
tables: copy []
data: read/binary join font-dir file
; Main table directory
td: make table-directory []
td/version: get-ttf-num data 0 4
td/num-tables: get-ttf-num data 5 2
td/search-range: get-ttf-num data 7 2
td/entry-selector: get-ttf-num data 9 2
td/range-shift: get-ttf-num data 11 2
; print [
; "version=" integer-to-version td/version
; "num-tables=" td/num-tables
; "search-range=" td/search-range
; "entry-selector=" td/entry-selector
; "range-shift=" td/range-shift
; ]
; table directory entries for each table
repeat i td/num-tables [
tbl-pos: i - 1 * 16 + 13 ; 16 = struct length, 13 = offset from BOF
entry: make table-directory-entry [
tag: to-word to-string copy/part at data tbl-pos + 0 4
checksum: get-ttf-num data tbl-pos + 4 4
offset: get-ttf-num data tbl-pos + 8 4
length: get-ttf-num data tbl-pos + 12 4
]
append tables entry
]
;foreach table tables [print [table/offset table/length table/tag]]
; Do a brute force search for the name table
name-table-data: none
foreach table tables [
if table/tag = 'name [
name-table-data: copy/part at data table/offset + 1 table/length
break
]
]
if name-table-data [
;print length? name-table-data
name-tbl: make name-table [
format: get-ttf-num name-table-data 0 2
num-records: get-ttf-num name-table-data 3 2
offset: get-ttf-num name-table-data 5 2
string-data: to-string copy at name-table-data offset
]
;probe name-tbl
repeat i name-tbl/num-records [
tbl-pos: i - 1 * 24 + 7 ; 24 = struct length, 7 = offset from
table start
entry: make name-record [
platform: get-ttf-num name-table-data tbl-pos + 0 2
encoding-id: get-ttf-num name-table-data tbl-pos + 2 2
language-id: get-ttf-num name-table-data tbl-pos + 4 2
name-id: get-ttf-num name-table-data tbl-pos + 6 2
string-length: get-ttf-num name-table-data tbl-pos + 8 2
string-offset: get-ttf-num name-table-data tbl-pos + 10 2
; This is my extension for testing
string-data: to-string copy/part at name-table-data 1 +
name-tbl/offset + string-offset string-length
]
append name-tbl/records entry
; Use English string for testing here
; name-id 4 = full font name
; language-id 0 = english
if all [entry/name-id = 4 entry/language-id = 0] [
append font-names entry/string-data
print entry/string-data
]
]
]
]
halt