• Paper: dividing by seven

    From John R Levine@johnl@taugh.com to comp.compilers on Fri Apr 10 12:04:13 2026
    From Newsgroup: comp.compilers

    Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets

    Shigeo Mitsunari, Takashi Hoshino

    Granlund and Montgomery proposed an optimization method for unsigned
    integer division by constants [3]. Their method (called the GM method in
    this paper) was further improved in part by works such as [1] and [7], and
    is now adopted by major compilers including GCC, Clang, Microsoft
    Compiler, and Apple Clang. However, for example, for x/7, the generated
    code is designed for 32-bit CPUs and therefore does not fully exploit
    64-bit capabilities. This paper proposes an optimization method for 32-bit unsigned division by constants targeting 64-bit CPUs. We implemented
    patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X (Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the microbenchmark described later. The LLVM patch has already been merged
    into llvm:main [6], demonstrating the practical applicability of the
    proposed method.

    https://arxiv.org/abs/2604.07902

    Regards,
    John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at to comp.compilers on Mon Apr 13 05:48:46 2026
    From Newsgroup: comp.compilers

    John R Levine <johnl@taugh.com> writes:
    Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets

    Shigeo Mitsunari, Takashi Hoshino

    Granlund and Montgomery proposed an optimization method for unsigned
    integer division by constants [3]. Their method (called the GM method in
    this paper) was further improved in part by works such as [1] and [7], and
    is now adopted by major compilers including GCC, Clang, Microsoft
    Compiler, and Apple Clang. However, for example, for x/7, the generated
    code is designed for 32-bit CPUs and therefore does not fully exploit
    64-bit capabilities. This paper proposes an optimization method for 32-bit >unsigned division by constants targeting 64-bit CPUs. We implemented
    patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X >(Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the >microbenchmark described later. The LLVM patch has already been merged
    into llvm:main [6], demonstrating the practical applicability of the
    proposed method.

    https://arxiv.org/abs/2604.07902

    In a paper [ertl19kps] I looked at full-width division by multiplying
    with the double-width (128 bits on 64-bit targets) reciprocal , and
    that gave a small reduction in latency (over the conventional approach
    of using a single-width multiplication with corrections) in some
    unsigned cases (e.g., n/7). I did not look into half-width division,
    because the programming language I am interested in does not have
    half-width arithmetics, but of course one can apply the same idea in
    this context by multiplying with the full-width reciprocal. It's
    interesting that the benefits are much higher in this context.

    @InProceedings{ertl19kps,
    author = {M. Anton Ertl},
    title = {Integer Division by Multiplying with the
    Double-Width Reciprocal},
    crossref = {kps19},
    pages = {75--84},
    url = {https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf},
    url-slides = {https://www.complang.tuwien.ac.at/papers/ertl19kps-slides.pdf},
    abstract = {Earlier work on integer division by multiplying with
    the reciprocal has focused on multiplying with a
    single-width reciprocal, combined with a correction
    and followed by a shift. The present work explores
    using a double-width reciprocal to allow getting rid
    of the correction and shift.}
    }

    @Proceedings{kps19,
    title = {20. Kolloquium Programmiersprachen und Grundlagen
    der Programmierung (KPS)},
    booktitle = {20. Kolloquium Programmiersprachen und Grundlagen
    der Programmierung (KPS)},
    year = {2019},
    key = {kps19},
    editor = {Martin Pl\"umicke and Fayez Abu Alia},
    url = {https://www.hb.dhbw-stuttgart.de/kps2019/kps2019_Tagungsband.pdf}
    }

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From robin51@robin51@dodo.com.au to comp.compilers on Sun Apr 19 00:19:52 2026
    From Newsgroup: comp.compilers

    Might be useful to look at Hacker's Delight (H. S. Warren, Jr).

    On 11/04/2026 02:04, John R Levine wrote:
    Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets

    Shigeo Mitsunari, Takashi Hoshino
    ...
    https://arxiv.org/abs/2604.07902

    [It's an amazing book that belongs on every compiler developer's shelf. Here's a
    version of the first edition, of dubious legality, at the Internet archive: https://dn710003.ca.archive.org/0/items/random-papers9/Henry%20S.%20Warren%20-%20Hacker%27s%20delight-Addison-Wesley%20%282003%29.pdf
    -John]
    --- Synchronet 3.21f-Linux NewsLink 1.2