The best bit counting algorithm as far as I know is the one invented by folks at Stanford University, which is always O(1).
int bitcount(int n)
{
int cnt = 0;
n = n - ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
return (((n + (n >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
or: 2(a+1) - 2a = 2a
2(a+2) - 2a = 2a * 22 - 2a = 2a (4 - 1) = 3*2a
The rest is actually manipulation to count this a[0] + a[1] + ... + a[31]
int bitcount(int n)
{
int cnt = 0;
n = n - ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
return (((n + (n >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
For example, after I compile it to x86 Assembly:
.cfi_startproc
movl 4(%esp), %eax
movl %eax, %edx
sarl %edx
andl $1431655765, %edx
subl %edx, %eax
movl %eax, %edx
sarl $2, %eax
andl $858993459, %edx
andl $858993459, %eax
addl %edx, %eax
movl %eax, %edx
sarl $4, %edx
addl %edx, %eax
andl $252645135, %eax
imull $16843009, %eax, %eax
sarl $24, %eax
ret
.cfi_endproc
But, how does actually the algorithm work?
Ok, let try a simple one. Assume n is a 4-bit number, and the bits can be represented as a set such that n= {a,b,c,d}., where a,b,c.. can only have either 0 or 1. The decimal value of the n is: 8*a + 4*b + 2*c + d. Total number of bit one is: a + b + c +d.
For example, for n=0b1101, a=1, b=1, c=0, d=1 (which in decimal is 8*1 + 4*1 + 2*0 + 1 = 13), and total number of bit one is a+b+c+d = 1 + 1 + 0 + 1 = 3. So far so good?
Now, we know that to count the 1-bits is as simple as: a+b+c+d. But, wait.... n itself is not a+b+c+d, but 8*a + 4*b + 2*c + d. Ok, let's conquer this step-by-step.
If we shift n one bit to the right the LSbit is gone and other numbers just divided by two, so n becomes 4*a + 2*b + c, right? Now substract this to the original n.
n = 8*a + 4*b + 2*c + d
n>>1 = 0 + 4*a + 2*b + c
----------------------------- -
= 0 + 4*a + 2*b + c + d
That's a good direction! Now if (n>>1) is written in the 4-bit nibble it is actually 0 + 4a + 2b + c. If I just want to subtract 4a and c, we have to mask out 2*b. so we mask it with binary 0101 (0x5), so we get only 4a + c:
n = 8*a + 4*b + 2*c + d
(n>>1)&0x5 = 4*a + 0 + c
---------------------------------- -
n -(n>>1)&0x05 = 4*a + 4*b + c + d
Now store this result back to n, so from now on n is 4*a + 4*b + c + d
To get c + d only, we mask n with 0b11 or n & 0x03
if we shift n above once, we get 0 + 2a + 2b + 0, but if shift it again it becomes a + b!
To make sure we only get the lowest two bits (a + b), we mask it to 0x03 or:
n & 0x03 = c + d
(n >> 2)&0x03) = a + b
------------------------ +
Nice! now we have this expression: a + b + c +d .
so, for 4-bit bit counting, we can use the expression:
n = n - (n>>1) & 0x05
bits = (n & 0x3) + (n>>2) & 0x3
Proof: as example above, n=13 (0b1101). so a=1,b=1, c=0, d=1
n = 0b1101 - (0b0110) & 0b0101 = 0b1101 - 0b100 = 13 - 4 = 9 = 0b1001
then the next step:
bits =(0b1001 & 0b0011) + (0b1001>>2) & 0b0011, or bits = 0b0001 + (0b0010) & 0b0011
or bits = 1 + 2 = 3 !!
For 32-bit, it is based on the same idea, except we have to do more.
Say n has set of coefficients {a[0], a[1], ...., a[31]} to represent the number, so n = a[0]*2^31 + a[1]*2^30 + .... + a[15]*2^0
The mask should be 32-bit, so instead of 0x5, we use 0x55555555 = 0b0101010101...0101
n = 2^31*a[0] + ... + 2^1*a[30] + 2^0*a[31]
(n>>1)&0b0101..0101 = 2^30*a[0] + ... + 2^2*a[28] + 2^0*a[30]
-------------------------------------------------------------- -
n -(n>>1)&0x055555555 = 2^31*a[0] - 2^28*a[0] + (2^30*a[1] - 2^26*a[1]) + ... + 4*a[29] - 2*a[30] + a[31]
Let's review binary arithmetics.
23 - 22 = 8 - 4 = 4
216 - 215 = 65536 - 32768 = 32768or: 2(a+1) - 2a = 2a
2(a+2) - 2a = 2a * 22 - 2a = 2a (4 - 1) = 3*2a
So the result is:
a[0] * (2^31 - 2^28) + a[1] * (2^30 - 2^26) + ..... + a[30] (4 -2) + a[31]
= 3*2^28*a[0] + .3*2^26*a[1].. + 2*a[30] + a[31]
n - n(>>1) & 0x055555555 = 3*2^28*a[0] + .3*2^26*a[1].. + 2*a[30] + a[31]
stored this as a new n.
n>>2 = 2^24*a[0] + 2^22*a[1] + ...2*a[28] + a[29]
a[0] * (2^31 - 2^28) + a[1] * (2^30 - 2^26) + ..... + a[30] (4 -2) + a[31]
= 3*2^28*a[0] + .3*2^26*a[1].. + 2*a[30] + a[31]
n - n(>>1) & 0x055555555 = 3*2^28*a[0] + .3*2^26*a[1].. + 2*a[30] + a[31]
stored this as a new n.
n>>2 = 2^24*a[0] + 2^22*a[1] + ...2*a[28] + a[29]
The rest is actually manipulation to count this a[0] + a[1] + ... + a[31]
A variant, but this one is invented by folks at MIT:
int bitcount(unsigned int n)
{
int cnt = 0;
register unsigned int tmp;
tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
return ((tmp + (tmp >>3)) & 0030707070707) % 63;
}
int bitcount(unsigned int n)
{
int cnt = 0;
register unsigned int tmp;
tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
return ((tmp + (tmp >>3)) & 0030707070707) % 63;
}
It uses the similar method. but with different approach (notice the number 01..., 0333... and 00307... are in octal. We could use Hex version but then it is harder to remember)