From Newsgroup: comp.lang.postscript
Hi Carlos,
On 26/2/22 11:44, Carlos wrote:
> A simpler approach is to reencode the UTF-8 string
What an elegant decoder; and I like the iterator with its clever use of
an array.
Invalid sequences should produce U+FFFD. Add:
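/unget {                  % /itername unget -   step the iterator back one byte
  load 0 get              % the array stored as element 0 of the iterator
  dup 0 get dup 0 gt      % its slot 0 is the position; is it past the start?
  { 1 sub 0 exch put } { pop pop } ifelse
} def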
and then only two changes:
pop 16#FFFD 0 % invalid sequence
and
6 bitshift nextch not { pop 16#FFFD exit } if
dup 2#11000000 and 2#10000000 ne
{ /nextch unget pop 16#FFFD exit } if
2#00111111 and add
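To see unget on its own, here is a minimal sketch. The original nextch is not quoted in this message, so the iterator below is only a stand-in: I am assuming it keeps a state array as element 0 of its procedure body, with the read position in slot 0 of that array, which is the layout unget relies on. The name state and the sample string are mine.

```postscript
% Stand-in iterator (an assumption, not the code from the earlier post).
% state = [ position string ]
/state [ 0 (A\303\251) ] def

% - nextch byte true | false
/nextch {
  //state                   % element 0 of the body: the state array itself
  dup 0 get 1 index 1 get   % st idx str
  2 copy length lt          % st idx str bool   (idx < length?)
  {
    1 index get             % st idx byte
    3 1 roll                % byte st idx
    1 add 0 exch put        % byte              (store idx+1 in slot 0)
    true
  } {
    pop pop pop false       % end of string
  } ifelse
} def

/unget {
  load 0 get dup 0 get dup 0 gt
  { 1 sub 0 exch put } { pop pop } ifelse
} def

nextch pop =                % prints 65
nextch pop =                % prints 195
/nextch unget               % step back one byte
nextch pop =                % prints 195 again
```

The point of stepping back is that the byte which broke the sequence is read again as the start of the next one, so it is not swallowed by the U+FFFD.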
The decoder still accepts overlong sequences, but the output it gives is
consistent with the input.
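If overlong sequences ever need rejecting too, one possible check is to compare the decoded value with the smallest code point that needs that many bytes. This is only a sketch: overlong? is a hypothetical name, and it assumes the decoder can supply the sequence length (1 to 4) alongside the decoded value, which the code above does not show.

```postscript
% cp nbytes overlong? bool
% True when cp could have been encoded in fewer than nbytes bytes.
/overlong? {
  [ 0 16#80 16#800 16#10000 ] % smallest code point for 1, 2, 3, 4 bytes
  exch 1 sub get              % cp min
  lt                          % cp < min ?
} def

16#00 2 overlong? =           % C0 80 -> U+0000 in two bytes: prints true
16#E9 2 overlong? =           % C3 A9 -> U+00E9: prints false
```

A decoder could follow the test with something like { pop 16#FFFD } if to substitute the replacement character.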
Regards,
David