• Speed or space

    From David Newall@davidn@davidnewall.com to comp.lang.postscript on Wed Feb 9 01:32:35 2022
    From Newsgroup: comp.lang.postscript

    Still working on utf8show and have classic space vs. time tradeoff.

    I can store the UnicodeEncoding map in a Dictionary, which is blindingly
    fast, or in a sparse array, which is more compact but slower to access.
    Every glyph painted needs a lookup in the map.

    ---Bytes Used-- --Savings-- -Time(ms)-- S:D ms
    Map Source Dict Sparse Bytes % Dict Sparse Ratio AdobeGlyphList(1) 222472 180798 41686 19% 161 2968 18.45 FreeSerif.gn2(2) 442320 356006 86314 20% 604 2856 4.73 UnifontMedium.gn2(3) 3357832 2173806 1184026 35% 156 949 6.08

    (1) AdobeGlyphList from ghostscript v9.50 on Ubuntu 20.04.3.

    (2) Generated by fontforge from FreeSerif.sfd extracted from
    freefont-src-20120503.tar.gz from ftp.gnu.org/gnu/freefont.

    (3) Generated by fontforge from unifont-14.0.01.ttf extracted
    from unifont-14.0.01.tar.gz from ftp.gnu.org/gnu/unifont/.

    Tests were performed using Ghostscript 9.5 on Ubuntu 20.04.3 with
    Intel(R) Core(TM) i7-1165G7 @ 2.80GHz. Glyphs for UNICODE values 0
    through 150,000 were retrieved from each map five times.

    I don't know why accessing the sparse Adobe map is so much slower than
    the other two. Average retrieval time is under 4ns; for UnifontMedium
    it's under 1.3ns.

    I'm going to implement it one way only. Which should it be?
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From John Reiser@vendor@BitWagon.com to comp.lang.postscript on Wed Feb 9 09:59:17 2022
    From Newsgroup: comp.lang.postscript

    On 2/8/22 06:32, David Newall wrote:
    Still working on utf8show and have classic space vs. time tradeoff.
    [[snip]]
    I'm going to implement it one way only.-a Which should it be?

    Small space is more important than blindingly-fast. Think of containers
    in the cloud, where stingy RAM allocations abound and demand paging
    is horrible or impossible.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Carlos@carlos@cvkm.cz to comp.lang.postscript on Thu Feb 10 15:28:10 2022
    From Newsgroup: comp.lang.postscript

    On Wed, 9 Feb 2022 01:32:35 +1100
    David Newall <davidn@davidnewall.com> wrote:

    Still working on utf8show and have classic space vs. time tradeoff.

    I can store the UnicodeEncoding map in a Dictionary, which is
    blindingly fast, or in a sparse array, which is more compact but
    slower to access. Every glyph painted needs a lookup in the map.

    I don't know what a sparse array is in this context. Is it like the
    input array to unicodefont? In that case maybe it could be optimized,
    by, idk, putting code points in entries 0, 2, 4, 6, ... to speed up
    lookup, and the corresponding arrays with glyph names (or just the main
    glyph name) in entries 1, 3, 5, 7, ..., or something like that.

    ---Bytes Used-- --Savings-- -Time(ms)-- S:D ms
    Map Source Dict Sparse Bytes % Dict Sparse Ratio AdobeGlyphList(1) 222472 180798 41686 19% 161 2968 18.45 FreeSerif.gn2(2) 442320 356006 86314 20% 604 2856 4.73 UnifontMedium.gn2(3) 3357832 2173806 1184026 35% 156 949 6.08

    (1) AdobeGlyphList from ghostscript v9.50 on Ubuntu 20.04.3.

    (2) Generated by fontforge from FreeSerif.sfd extracted from
    freefont-src-20120503.tar.gz from ftp.gnu.org/gnu/freefont.

    (3) Generated by fontforge from unifont-14.0.01.ttf extracted
    from unifont-14.0.01.tar.gz from ftp.gnu.org/gnu/unifont/.

    Tests were performed using Ghostscript 9.5 on Ubuntu 20.04.3 with
    Intel(R) Core(TM) i7-1165G7 @ 2.80GHz. Glyphs for UNICODE values 0
    through 150,000 were retrieved from each map five times.

    I don't know why accessing the sparse Adobe map is so much slower than
    the other two. Average retrieval time is under 4ns; for UnifontMedium
    it's under 1.3ns.

    I'm going to implement it one way only. Which should it be?

    Contrary to John's opinion, I don't think space matters much,
    especially if it's only 3 MB. So I'd choose speed. But then, I don't
    know anything about Postscript printers or other low capacity devices.

    C.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Newall@davidn@davidnewall.com to Carlos on Wed Feb 16 14:29:57 2022
    From Newsgroup: comp.lang.postscript

    On 11/2/22 01:28, Carlos wrote:
    I don't know what a sparse array is in this context.

    Thanks for your feedback and thoughts.

    I deliberately chose a term that suggests the implementation without
    pinning it down so that I could ask about speed vs space without getting
    bogged down over how it was implemented.

    Here's how I implemented it:

    A "sparse" array contains values indexed by integers, and efficiently
    permits missing values. Each element in the array is a sub-array
    containing the index of the first value followed by one or more values.

    For example: [[8 v8 v9 v10] [14 v14 v15] [37 v37]] is a sparse array
    containing values at index 8, 9, 10, 14, 15 and 37.

    The sparseget procedure retrieves an element from a sparse array. If
    the array contains a value at that index, the value and true is pushed
    on the stack, otherwise false is pushed.

    I compared normal versus packed arrays and found that packing saved
    negligible space while substantially increasing access time, so sparse
    arrays are not packed.

    In terms of performance versus a dictionary (for UnicodeEncoding map)
    accessing a sparse array takes around 5 times longer, which sounds awful
    but (on my computer) that's sub-1ns access versus 4ns, so I don't think
    the speed is an issue.

    One inexplicable result is that the sparse array generated from the
    built-in AdobeGlyphList (Ghostscript 9.50 on Ubuntu 20.04 with Intel
    i7-1165G7 @ 2.80Mhz) takes 25 times as long as the equivalent
    dictionary. I don't understand why so much slower compared with maps
    from fonts. Clutching at straws, I initially thought it might be
    because AdobeGlyphList is built-in and maps from (non-builtin) fonts
    aren't, but that can't be the case. For benchmarks, I load the
    dictionary and sparse arrays from external files so, although derived
    from the built-in dictionary, they are unlikely to be stored in common.
    I also tried changing the names (added "xx" to them) and the result
    stands. It's a mystery to me.

    I'm inclined to go for space over speed but am willing to listen to
    reason if people feel strongly that speed is more important.
    --- Synchronet 3.21d-Linux NewsLink 1.2