This package contains a header-only C/C++ library for optimizing integer
division. Integer division is one of the slowest instructions on most CPUs:
on current x64 CPUs, for example, a 64-bit integer division has a latency of
up to 90 clock cycles, whereas a multiplication has a latency of only 3 clock
cycles. libdivide allows you to replace expensive integer division
instructions with a sequence of shift, add and multiply instructions that
computes the same quotient much faster.
On current CPUs you can get a speedup of up to 10x for 64-bit integer division
and a speedup of up to 5x for 32-bit integer division when using libdivide.
libdivide also supports SSE2, AVX2 and AVX512 vector division, which provides
an even larger speedup.
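
To make the idea of trading a division for multiplies and shifts concrete, here
is a minimal sketch in C for unsigned 32-bit values. It is not libdivide's
implementation or API: the names `fastdiv_u32`, `fastdiv_u32_gen` and
`fastdiv_u32_do` are hypothetical, and the sketch assumes a divisor of at least
2. The one real division happens once, when the "magic" constant is computed;
every later quotient uses only multiplications, shifts and an addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; this is not libdivide's API. */
typedef struct {
    uint64_t magic; /* floor((2^64 - 1) / d) + 1, i.e. ceil(2^64 / d) */
} fastdiv_u32;

/* One-time setup: the only place a real division instruction is used. */
static fastdiv_u32 fastdiv_u32_gen(uint32_t d) {
    assert(d >= 2); /* assumption: d == 0 is invalid, d == 1 would wrap magic to 0 */
    fastdiv_u32 f = { UINT64_MAX / d + 1 };
    return f;
}

/* Computes n / d as the top 64 bits of the 96-bit product magic * n,
 * split into two 32x32 -> 64-bit multiplies so no 128-bit type is needed. */
static uint32_t fastdiv_u32_do(uint32_t n, fastdiv_u32 f) {
    uint64_t hi = (f.magic >> 32) * n;         /* high half of magic times n */
    uint64_t lo = (f.magic & 0xFFFFFFFFu) * n; /* low half of magic times n  */
    return (uint32_t)((hi + (lo >> 32)) >> 32);
}
```

In a loop that divides many values by the same divisor, `fastdiv_u32_gen` would
be called once before the loop and `fastdiv_u32_do` once per element. The
approach pays off only when the setup cost is amortized over many divisions.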