From Newsgroup: comp.compilers
John R Levine <
johnl@taugh.com> writes:
Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets
Shigeo Mitsunari, Takashi Hoshino
Granlund and Montgomery proposed an optimization method for unsigned
integer division by constants [3]. Their method (called the GM method in
this paper) was further improved in part by works such as [1] and [7], and
is now adopted by major compilers including GCC, Clang, Microsoft
Compiler, and Apple Clang. However, for example, for x/7, the generated
code is designed for 32-bit CPUs and therefore does not fully exploit
64-bit capabilities. This paper proposes an optimization method for 32-bit >unsigned division by constants targeting 64-bit CPUs. We implemented
patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X >(Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the >microbenchmark described later. The LLVM patch has already been merged
into llvm:main [6], demonstrating the practical applicability of the
proposed method.
https://arxiv.org/abs/2604.07902
In a paper [ertl19kps] I looked at full-width division by multiplying
with the double-width (128 bits on 64-bit targets) reciprocal , and
that gave a small reduction in latency (over the conventional approach
of using a single-width multiplication with corrections) in some
unsigned cases (e.g., n/7). I did not look into half-width division,
because the programming language I am interested in does not have
half-width arithmetics, but of course one can apply the same idea in
this context by multiplying with the full-width reciprocal. It's
interesting that the benefits are much higher in this context.
@InProceedings{ertl19kps,
author = {M. Anton Ertl},
title = {Integer Division by Multiplying with the
Double-Width Reciprocal},
crossref = {kps19},
pages = {75--84},
url = {
https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf},
url-slides = {
https://www.complang.tuwien.ac.at/papers/ertl19kps-slides.pdf},
abstract = {Earlier work on integer division by multiplying with
the reciprocal has focused on multiplying with a
single-width reciprocal, combined with a correction
and followed by a shift. The present work explores
using a double-width reciprocal to allow getting rid
of the correction and shift.}
}
@Proceedings{kps19,
title = {20. Kolloquium Programmiersprachen und Grundlagen
der Programmierung (KPS)},
booktitle = {20. Kolloquium Programmiersprachen und Grundlagen
der Programmierung (KPS)},
year = {2019},
key = {kps19},
editor = {Martin Pl\"umicke and Fayez Abu Alia},
url = {
https://www.hb.dhbw-stuttgart.de/kps2019/kps2019_Tagungsband.pdf}
}
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
--- Synchronet 3.21f-Linux NewsLink 1.2