UnicodeEncode filter
This page applies to Harlequin v13.1r0 and later; both Harlequin Core and Harlequin MultiRIP.
The implementation of the UnicodeEncode filter has two required parameters, /From and /To, and two optional parameters, /ByteOrderMark and /Substitute.
From | string or name (required) | The name of the encoding used for the source data written to the encoding filter. The encoding names are in any form that ICU accepts (so are not case sensitive; spaces and hyphens are ignored). The encoding name can also be |
To | string or name (required) | The name of the encoding for the data output by the encoding filter. The encoding names are in any form that ICU accepts (so are not case sensitive; spaces and hyphens are ignored). |
ByteOrderMark | Boolean (optional) | An optional parameter. BOMs appearing after the start of input are never removed from the output; they are always re-encoded (that is, replaced with the substitute character, if specified, when the output encoding cannot represent the BOM). |
Substitute | string (optional) | A string containing a single character in the output encoding. It is used in the output stream for any input character that is invalidly encoded or cannot be represented by the output encoding. If the string is present but empty, any input characters that are invalidly encoded or cannot be represented in the output encoding are dropped from the output. If this entry is not specified and an input character is invalidly encoded or not representable in the output encoding, the filter raises an error. |
The forms that ICU accepts are either IBM-defined encoding names or aliases. Most of the aliases are direct mappings of other encoding names. For example, with the full converter set installed, you could do:
1024 string dup
<<
/From /Unicode
/To (Shift-JIS)
/ByteOrderMark false
/Substitute (x)
>> /UnicodeEncode filter dup
(filename) (r) file 1024 string readline pop
writestring flushfile
% Convert the first line of the file from any Unicode encoding to Shift-JIS,
% substituting "x" for unencodable characters, leaving it on the operand stack.
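As a variation on the example above, the following is a minimal sketch of the empty-string case for /Substitute described in the parameter table. Characters that are invalidly encoded or cannot be represented in Shift-JIS are dropped rather than replaced. It assumes the same placeholder file name and the same installed converter set as the example above.

```postscript
1024 string dup
<<
/From /Unicode
/To (Shift-JIS)
/ByteOrderMark false
/Substitute ()   % present but empty: drop unconvertible characters
>> /UnicodeEncode filter dup
(filename) (r) file 1024 string readline pop
writestring flushfile
% The target string is left on the operand stack, as before.
```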
To convert from UTF-16 to UTF-8 (both Unicode forms), and ensure that there is a Byte Order Mark, you might do:
1024 string dup
<<
/From /UTF-16
/To /UTF-8
/ByteOrderMark true
>> /UnicodeEncode filter dup
(filename) (r) file 1024 string readline pop
writestring flushfile
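When /Substitute is omitted, the filter raises an error on input that it cannot convert. The following is a minimal sketch of trapping that error with the standard stopped operator. The names outbuf and enc are illustrative only, the file name is the same placeholder as above, and the choice of encodings assumes the relevant converters are installed. This page does not state which error is raised, so the sketch does not test for a specific error name.

```postscript
/outbuf 1024 string def          % illustrative name: target buffer
/enc outbuf <<
  /From /UTF-16
  /To (Shift-JIS)
  % no /Substitute entry: unconvertible input raises an error
>> /UnicodeEncode filter def     % illustrative name: the encoding filter

{
  enc (filename) (r) file 1024 string readline pop writestring
  enc flushfile
} stopped {
  % Reached on any error inside the block, including a failure
  % to open the file. Operands may be left on the operand stack.
  (Conversion did not complete\n) print
} if
```

Supplying /Substitute instead, as in the first example, avoids the error at the cost of altering the output, so the choice depends on whether a silent substitution is acceptable for the job.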
Some limitations follow:
- The Harlequin RIP SDK does not necessarily include the Unicode conversion base functions that this filter uses.
- The converters installed depend on the RIP; OEMs can include new converters of their own by creating a new ICU converter package, and including it in the ICU data bundle.
- Global Graphics has never documented the procedure to add new converters.
- The procedure to add new converters differs between ICU 3.0 (in 7.x RIPs) and ICU 3.4 (in some recent RIPs).