Re: Composite fonts for Unicode strings (was: Final request for feedback)

    From David Newall <davidn@davidnewall.com> to comp.lang.postscript on Mon Feb 28 21:39:30 2022
    From Newsgroup: comp.lang.postscript

    Hi Carlos,

    On 26/2/22 11:44, Carlos wrote:
    > A simpler approach is to reencode the UTF-8 string

    What an elegant decoder; and I like the iterator with its clever use of
    an array.
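    Carlos's iterator isn't reproduced here. Judging from /unget below,
    which does "load 0 get" and then reads and writes element 0 of the
    result, the iterator procedure seems to keep its read position in a
    one-element array stored as its own first element, so the state
    travels with the procedure. A minimal sketch of that pattern, assuming
    that layout (makeiter is a hypothetical name, not Carlos's code):

```postscript
%!PS
% Hypothetical sketch: makeiter is an assumed name, not Carlos's code.
% string makeiter proc      where  proc: -- byte true | false
% The returned procedure is [ [i] string {body} exec ]: its first
% element is a one-element array holding the read index, so
% "/name load 0 get" (as /unget does) finds the state.
/makeiter {
  [ [0] 3 -1 roll                       % mark state string
    { % on entry: state string
      1 index 0 get 1 index length lt   % index still inside the string?
      { 1 index 0 get get               % state byte
        exch dup 0 get 1 add 0 exch put % store index+1 back into state
        true }
      { pop pop false } ifelse
    } /exec cvx                         % run the body when called
  ] cvx
} bind def

% usage: read the first byte of a string
/nextch (h\303\251) makeiter def
nextch pop =                            % prints 104 (the byte for "h")
```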

    Invalid sequences should produce U+FFFD (REPLACEMENT CHARACTER). To get
    that, add:

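    /unget {            % /name unget --   push back the last byte read
      load 0 get        % first element of the iterator: its state array
      dup 0 get         % current read index
      dup 0 gt          % only back up if something has been read
      { 1 sub 0 exch put } { pop pop } ifelse
    } def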

    and then make only two changes:

    pop 16#FFFD 0 % invalid sequence

    and

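    6 bitshift nextch not { pop 16#FFFD exit } if   % input ended mid-sequence
    dup 2#11000000 and 2#10000000 ne                % not a continuation byte?
    { /nextch unget pop pop 16#FFFD exit } if       % push it back, drop byte and partial value
    2#00111111 and add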

    It still accepts overlong sequences (such as C0 80 for U+0000), but the
    output it gives for them is consistent with the input.
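    For comparison, a minimal stand-alone sketch of the same error
    handling. It is not Carlos's code: it keeps its position in a
    dictionary instead of an array-based iterator, and the names utf8dict,
    utf8decode, nextbyte and ungetbyte are hypothetical. A bad or missing
    continuation byte yields U+FFFD, and a bad one is pushed back so it
    can start the next sequence, much as /unget does above. Like the
    patched decoder, it still accepts overlong forms (and, in this sketch,
    surrogates and lead bytes F5 to F7).

```postscript
%!PS
% Sketch only; all names here are hypothetical, not from Carlos's post.
/utf8dict 10 dict def
utf8dict begin
  % -- nextbyte byte true | false      reads S at index I, advances I
  /nextbyte {
    I S length lt { S I get /I I 1 add def true } { false } ifelse
  } bind def
  % -- ungetbyte --                    push the last byte back
  /ungetbyte { I 0 gt { /I I 1 sub def } if } bind def
end

% string utf8decode array-of-code-points
/utf8decode {
  utf8dict begin
    /S exch def /I 0 def
    [                                   % code points collect on the stack
      { nextbyte not { exit } if        % end of string: stop
        % lead byte -> initial value and count of continuation bytes
        dup 16#80 lt
        { 0 }
        { dup 16#E0 and 16#C0 eq
          { 16#1F and 1 }
          { dup 16#F0 and 16#E0 eq
            { 16#0F and 2 }
            { dup 16#F8 and 16#F0 eq
              { 16#07 and 3 }
              { pop 16#FFFD 0 }         % invalid lead byte
              ifelse }
            ifelse }
          ifelse }
        ifelse
        % value n: fold in n continuation bytes of 6 bits each
        { 6 bitshift nextbyte not { pop 16#FFFD exit } if
          dup 16#C0 and 16#80 ne
          { ungetbyte pop pop 16#FFFD exit } if
          16#3F and add
        } repeat
      } loop
    ]
  end
} bind def
```

    In Ghostscript, (A\303\251\342\202\254\360\237\230\200) utf8decode ==
    should print [65 233 8364 128512], and (\303A) utf8decode == should
    print [65533 65]: the "A" is rejected as a continuation byte, pushed
    back, and then decoded on its own.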

    Regards,

    David
    --- Synchronet 3.21d-Linux NewsLink 1.2
    From David Newall <davidn@davidnewall.com> to comp.lang.postscript on Mon Feb 28 22:26:53 2022

    Hi Carlos,

    On 26/2/22 11:44, Carlos wrote:
    > It is possible to create a tree of composite fonts, where each byte in
    > a UTF-8 sequence dispatches to the next font, and the last one picks
    > the glyph.

    Thank you for the clearest example of composite fonts that I've ever
    seen. Unfortunately, composite fonts make cshow much less useful (only
    the last byte of each character is pushed on the stack) and don't work
    at all with kshow.

    It's an intriguing idea but I'm not sure where to go with it.

    What I'm currently working on fails beyond 64K glyphs (the Adobe
    PostScript implementation limit on array and dictionary sizes). A
    composite font gets past that limit, but not when you simply transform
    a standard font into a composite font, because the CharStrings
    dictionary is still subject to it.

    Regards,

    David