Computing the Multiplicative Inverse for Optimizing Integer Division 

One of the interesting problems about reverseengineering code is the rather strange transitions and optimizing compiler can do. Perhaps one of the strangest transformations is the optimization of division by a compiletime constant integer. Because integer division is one of the slowest operations in the computer, the performance is tremendously improved if the division can be done by other means. The simplest solution comes about when the divisor is a power of 2, because then it can be done by a shift (but this is not quite right in general, and doesn't work for signed division without special provision, more on this later). But you can also optimize division by any other integer divisor by computing a multiplier and shift value to perform the actual division.
The trick is to represent the multiplier as a binary fraction. When you multiply a 32bit integer times the binary fraction, you get a 64bit result, whose highorder 32 bits are the quotient and the loworder 32 bits are the remainder. For example, to divide a signed integer by 13, the following code does the job:
Divisor  Dividend  Code 
13 
[esp+4] 
mov eax, 1321528399; 4ec4ec4fH imul DWORD PTR [esp+4] sar edx, 2 mov eax, edx shr eax, 31 add eax, edx 
So where did that strange multiplier come from? Well, the algorithm is given in http://www.hackersdelight.org/HDcode/magic.c, and essentially this code computes the values a and sh such that
While normally there is no advantage to computing this value at runtime, if you are about to do a tight, intense computation where you don't have a compile time constant divisor, it will make the runtime loop run substantially faster (perhaps an order of magnitude to a factor of 50). The file magic.cpp in the downloadable project contains this algorithm, so you can easily incorporate it into your project.
The value a is referred to in this essay as the multiplicative inverse.
In the pattern shown above, there are two components: the division algorithm, and the signcompensation algorithm. The first three lines of code are the division algorithm, and the parameters can be found by using the downloadable program attached, as shown:
Typing the divisor into the top left box computes the multiplier and sar instruction value. Typing the multiplier found it a piece of code, as an 8digit hex number, into the lower left box, and giving the sar value, will reveal the original divisor. Note that these are automatically filled in when the divisor is typed into the top left box.
After the imul instruction, the result of eax * 0x4ec4ec4f is found in edx:eax.
The remaining three instructions deal with keeping the value correct if the dividend is negative. The value will be one too high if the result is negative; therefore, the value is adjusted by 1 if it is negative. This is done by shifting the quotient (edx) right by 31 bits using a logical shift right (shr instruction). The result is 0 if the value is positive and 1 if the value is negative. By doing the add instruction, the quotient is adjusted according to the sign bit, and left in eax.
Division in signed or unsigned arithmetic is available. Note that in some cases, the multiplicative inverse is identical in both signed and unsigned arithmetic; in other cases, it can be quite different.
Clicking the copy button () will place the entire contents of the edit control to the clipboard.
Note that for any given code example, the source value and the result register may be different. In this particular case, the dividend was in [esp+4] and the intended result register is eax. The code generator just refers to the abstract operand dividend. You may substitute your operand source, such as [esp+4], or use the TEXTEQU to define a replacement, e.g.,
dividend TEXTEQU [esp+4]
The basic trick here is to treat the multiplier as the fractional part of a fixedpoint integer value. The highorder 32 bits are the integer part, and the loworder 32 bits are the fraction part. Because we normalize the result so there is only a fraction part, we can work with two 32bit values, but the theoretical 64bit representation is given as
where the highorder value has 32 bits and represents the integer, and the loworder 32 bits represents the fraction, and the bits are interpreted as inverse powers of 2. Unlike a floating point number, the number is not represented with a "hidden" implicit 1=bit. The divisor is normalized to a 4bit normalization, and the shift constant sh derived in the formula above is the additional factor to be used to compensate for the normalization. The inverter program computes the loworder 32bit value of the number expressed in the above formula, becaue the integer part is always treated as being 0.
Division of an integer by a power of 2 is trivial; just shift right. To divide by 2^{n}, shift right n bits. Well, only if the integer is positive. As we saw above, there has to be code to compensate for the sign bit. For example, to divide by 4, you would think that a sar 2 would do the job. But in signed arithmetic, it doesn't.
So the code below will handle this:
cdq
and edx, 3
add eax, edx
sar eax, 2
The dividend is placed in the eax register. It is then signextended into the edx register by the cdq instruction. The value 2^{n}1 is computed as a bit mask, and the loworder bits of the sign extension are masked off. This produces a "rounding" value which is either 0 or a sequence of 1bits equal in length to the shift distance. This value is added to eax and then the shift by n bits to do the division by 2^{n}.
To simplify the discussion, we will show only the loworder 8 bits, where the highorder bit is the sign bit. The values are 2's complement. So if I show 1110 1001, the actual value was 1111 1111 1111 1111 1111 1111 1110 1001. The computation shown here is 23/4, which should be 5. Suppose I do 23/4; the expected result is 5.
However, if we just do a shift right by 2 bits, and the dividend is negative, we get the erroneous computation
We fix this by adding the three instructions shown; one to get the sign value, two more trivial ones to do the rounding, and then the actual shift right, illustrated in detail below.
Instruction 
edx 
eax 

Binary 
Decimal 

Input state 
???? ???? 
1110 1001 
23 
cdq 
1111 1111 
1110 1001 
23 
and edx, 3 
0000 0011 
1110 1001 
23 
add eax, edx 
0000 0011 
1110 1100 
20 
sar eax, 2 
0000 0011 
1111 1011 
5 
For unsigned divide, a logical shift right (shr) of the dividend will suffice. An unsigned divideby8 would be simply
mov eax, dividend shr eax, 3t
The source code for the Multiplicative Inverse Calculator can be downloaded. It is a Visual Studio 2008 project.
Alverson, Robert, “Integer Division Using Reciprocals” (1991) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.1710  
Granlund, Torbjörn and Montgomery, Peter, L., “Division by Invariant Integers Using Multiplication” (1994) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556  
From Hacker’s Delight, (Addison Wesley, 2003) [Note: some browsers will show the code unformatted as a single big block; you will have to reformat it]  
http://www.hackersdelight.org/HDcode/magic.c  
http://www.hackersdelight.org/HDCode/magicu.c  
http://www.hackersdelight.org/divcMore.pdf 
The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.