
Re: Composite fonts for Unicode strings (was: Final request for feedback)

From David Newall <davidn@davidnewall.com> to comp.lang.postscript on Mon Feb 28 21:39:30 2022

From Newsgroup: comp.lang.postscript

Hi Carlos,

On 26/2/22 11:44, Carlos wrote:

> A simpler approach is to reencode the UTF-8 string

What an elegant decoder; and I like the iterator with its clever use of
an array.

Invalid sequences should produce U+FFFD. Add:

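/unget {                % /name unget -
  % Take element 0 of the named procedure, then the counter stored at
  % index 0 of that element, and decrement it unless it is already 0.
  load 0 get dup 0 get dup 0 gt
  { 1 sub 0 exch put } { pop pop } ifelse
} def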

and then only two changes:

pop 16#FFFD 0 % invalid sequence

and

6 bitshift nextch not { pop 16#FFFD exit } if
dup 2#11000000 and 2#10000000 ne
{ /nextch unget pop 16#FFFD exit } if
2#00111111 and add

It still accepts overlong sequences but gives output consistent with the
input.
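
For anyone reading this without the earlier post, here is a rough self-contained sketch of the behaviour described above. It is not Carlos's decoder: utf8decode, src, pos, and the index-based nextch and unget are names made up for illustration, and the iterator keeps a plain index rather than using the array trick. In this sketch both the partial value and the offending byte are dropped before U+FFFD is pushed.

```postscript
% Sketch only; every name here is hypothetical.
/utf8decode {                 % string -> array of code points
  10 dict begin
  /src exch def
  /pos 0 def
  /nextch {                   % - nextch byte true | false
    pos src length lt
    { src pos get /pos pos 1 add def true }
    { false } ifelse
  } def
  /unget {                    % step back one byte, if possible
    pos 0 gt { /pos pos 1 sub def } if
  } def
  [
    {
      nextch not { exit } if  % end of string: leave the outer loop
      % Classify the lead byte, leaving: partial-value continuation-count
      dup 16#80 lt
      { 0 }                                   % 0xxxxxxx (ASCII)
      { dup 16#E0 and 16#C0 eq
        { 16#1F and 1 }                       % 110xxxxx
        { dup 16#F0 and 16#E0 eq
          { 16#0F and 2 }                     % 1110xxxx
          { dup 16#F8 and 16#F0 eq
            { 16#07 and 3 }                   % 11110xxx
            { pop 16#FFFD 0 }                 % invalid lead byte
            ifelse }
          ifelse }
        ifelse }
      ifelse
      {                       % once per expected continuation byte
        6 bitshift nextch not { pop 16#FFFD exit } if     % truncated
        dup 16#C0 and 16#80 ne
        { unget pop pop 16#FFFD exit } if     % not 10xxxxxx: push it back
        16#3F and add
      } repeat
    } loop
  ]
  end
} def

% A, e-acute, a lead byte followed by B, then a truncated 3-byte sequence
(A\303\251\303B\342\202) utf8decode ==    % prints [65 233 65533 66 65533]
```

Because the byte that fails the continuation test is pushed back, the B after the bad lead byte is still decoded as itself. Overlong forms are not rejected (\300\200 decodes to 0), and surrogates and values above U+10FFFF are not checked either.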

Regards,

David
--- Synchronet 3.21d-Linux NewsLink 1.2

From David Newall <davidn@davidnewall.com> to comp.lang.postscript on Mon Feb 28 22:26:53 2022

Hi Carlos,

On 26/2/22 11:44, Carlos wrote:

> It is possible to create a tree of composite fonts, where each byte in
> a UTF-8 sequence dispatches to the next font, and the last one picks
> the glyph.

Thank you for the clearest example of composite fonts that I've ever
seen. Unfortunately, cshow stops being useful with them (only the last
byte of each character is pushed on the stack) and kshow doesn't work
at all.

It's an intriguing idea but I'm not sure where to go with it.

What I'm currently working on fails once it exceeds 64K glyphs (Adobe
PostScript array and dictionary implementation limits). A composite
font gets past that, but not if it is made by simply transforming a
standard font into a composite font (CharStrings limit).

Regards,

David
