My Stuff: Algorithm

Showing posts with label Algorithm. Show all posts

Tuesday, January 15, 2019

Fast String Reversal

#include <iostream>
#include <cstring>

template<typename T>
void swap(T &c1, T &c2)
{
    // buffer-less swap
    c1 ^= c2;
    c2 ^= c1;
    c1 ^= c2;

}



void reverse(char *s)
{
    if (s == nullptr || *s == '\0')
        return;

    for(auto rear = s + strlen(s) - 1; rear > s; ++s, --rear)
    {
        swap(*s, *rear);
    }
}



int main()
{
    char s[] = "The quick brown fox jumps over the lazy dog";

    std::cout << "Original string: " << s << std::endl;
    reverse(s);

    std::cout << "Reversed string: " << s << std::endl;
    reverse(s);

    std::cout << "2-Reversed string: " << s << std::endl;
    return 0;
}

Tuesday, May 23, 2017

Suppose we have 16-bit number where we want to reverse some bits between bits i'th and j'th. To illustrate that, say we have a 16-bit value stored in v:

|<-------- L = sizeof(short)*CHAR_BIT --------->|
15 7 1 0
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | x| x| x| x| x| x| x| | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
^ ^
| |
i=12 j=6
|<------>|
L - i - 1

So, we want to reverse bits between bits 6th to 12th.

Assume i is always greater than j (i > j):
Number of bits to reverse: n = (i - j) + 1

Mask to use: (2^n - 1) << j

Or, in the case above:

n = (12 - 6) + 1 = 7
mask = (2^7 - 1) << 6 = 127 << 6 = 0x1FC0 (0001111111000000 in binary)

Then we mask the bits stored in v and shift it back to 0 (to do reverse operation):

r = (v & mask) >> j;

r =
15 7 1 0
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| x| x| x| x| x| x| x|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

We use bit reversal operation as posted previously on r:

r = bit_reversal(r);

After this call, the bits are reversed, but filled in from MSB. We need to put them to the position originally started.

15 7 1 0
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| x| x| x| x| x| x| x| | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|-------> shr 3

<========= sizeof(unsigned int)* CHAR_BIT ======>

m = sizeof(short) * CHAR_BIT - n - j
m = 2*8 - 7 -6 = 16 - 13 = 3

Final result is:

(v & !mask) | ((r >> m) & mask);

Bit Reversal - Explained

The page Bit Twiddling Hacks shows how to reverse bits in an integer (e.g, reverse LSB with MSB bits or reading bits from the opposite direction), but seems it doesn't explain anything why that works.

Here I try to explain it step by step. Let start with a 8-bit char first. Assume each letter in ABCDEFGH represents each bits in the char (e.g, A is for MSB etc.).

x = ABCDEFGH

ABCDEFGH
10101010 (or 0xAA)
-------- &
A0C0E0G0 ---- >> 1 : 0A0C0E0G

ABCDEFGH
01010101 (or 0x55)
-------- &
0B0D0F0H ---- << 1 : B0D0F0H0

0A0C0E0G
B0D0F0H0
-------- |
BADCFEHG ---> stored as a new x, then we do next operation

BADCFEHG
11001100 (or 0xCC)
--------- &
BA00FE00 ---- >> 2: 00BA00FE

BADCFEHG
00110011 (or 0x33)
--------- &
00DC00HG ---- << 2: DC00HG00

00BA00FE
DC00HG00
-------- |
DCBAHGFE

Last, we operate the similar thing on the high and low nibbes:

DCBAHGFE
11110000 (or 0xF0)
-------- &
DCBA0000 ==> >> 4 = 0000DCBA

DCBAHGFE
00001111 (or 0x0F)
-------- &
0000HGFE ==> << 4 = HGFE0000

0000DCBA
HGFE0000
--------|
HGFEDCBA

Finally, we now get HGFEDCBA which is the reversed bits of ABCDEFGH! To summarize, to reverse bits in an octet, we do operations:

x = (x & 0xAA >> 1) | (x & 0x55 << 1);
x = (x & 0xCC >> 2) | (x & 0x33 << 2);
x = (x & 0xF0 >> 4) | (x & 0x0F >> 4);

For operation on 32-bit data, we do the similar one, except we do with 32-bit mask (instead of 10101010 binary or 0xAA, we mask with 0xAAAAAAAA) plus extra two pair of masks (0xF0F0F0F0 and 0x0F0F0F0F, 0xFF00FF00 and 0x00FF00FF) with shift left/right 4, 8 and 16 bit.

So, bit-reversal computation or algorithm for 32-bit integer:

unsigned int reverse_bits(unsigned int x)
{
x = ((x & 0xA0A0A0A0) >> 1) | ((x & 0x05050505) << 1);
x = ((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2);

x = ((x & 0x0F0F0F0F) >> 4) | ((x & 0xF0F0F0F0) << 4);
x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
return (x >> 16) | (x << 16);

}

Sunday, March 11, 2012

Fibonacci in Python

# Fibonacci numbers module

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        #print b,
        a, b = b, a+b

def fib2(n): # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    print b
    return result
    
# make it executable as a script as well as module_exit
if __name__ == "__main__":
    import sys
    if (len(sys.argv) > 1):
        f = fib2(int(sys.argv[1]))
        str = ""
        print "Fibonacci sequence: ", f
    else:
        print sys.argv[0], " "

Friday, March 9, 2012

Counting the number of "1" bits

Another "ridiculous" interview I had was when an interviewer asked me how to count the number of set bits in a 32-bit or 8-bit. I told him that there are various algorithms to do this, but the easiest way but yet good enough in performance is a function like this:

uint32_t count_bits(uint32_t b)
{
    int i;
    int n = 0;
    
    for(i=0; i<sizeof(b)*8; i++)
    {
        n += (b & 1);
        b >>= 1;
    }   
    return n;
}

I also told him that there are many ways to do this (referred him to look at "Beautiful Code" to see some of the beautiful algorithms).

But then the interviewer argued. FIrst, he said the upper border of the for-loop should be to "sizeof(b)*8" (meaning, instead of using "<", I should have used "<="). To bad, I said yes to him without thinking further (Probably got nervous. Later at home, I checked that he was wrong). Second, he said that the fastest is to use look-up table (n[0] = 0, n[1]=1, n[2]=1,...,n[255]=8,...). Although I agreed with his argument, but then he become inconsistent with his original question and there is a big downside of his argument. I said that this wouldn't be a choice in embedded programming where memory resource is limited. Secondly, his original question was to compute ones in a 32-bit number. That means we have to 2^32 (4,294,967,296 or about 4 GBytes) items set manually in an array. That's so ridiculous. Who wants to waste that much space just to calculate the number of bits in an integer? For a byte, although it seems acceptable ("only" 256 bytes of table), but compare that to the above algorithm:

movl $32,  %edx
 xorl %ecx, %ecx

.L2:
 movl %ecx, %ebx
 shrl %ecx
 andl $1,   %ebx
 addl %ebx, %eax
 decl %edx
 jne .L2

It's only 7 x86 mnemonic instructions (about 12-18 bytes) and no memory involvement in the loop!. Yes, it is O(32) for 32-bit, but it's a constant speed. Even further, if the function is called only once, the fully optimized version of the instructions would be inserted inline in the place (without calling the function), so we can CPU time from doing push-pop stack frames. That's another tiny speedup. It's probably fine for table lookup (yes, it's the fastest, O(1)), but it is limited to 8-bit. For 16 or higher, seems the space-time trade-off is too big not to consider. Besides, the above algorithm extendable into polymorphism in Object-Oriented code.

When I searched Google, I found the following (on Stanford's web):

v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count

Which renders to instruction codes like below:

shrl %edx
 andl $1431655765, %edx
 subl %edx, %eax
 movl %eax, %edx
 shrl $2, %eax
 andl $858993459, %edx
 andl $858993459, %eax
 addl %edx, %eax
 movl %eax, %edx
 shrl $4, %edx
 leal (%edx,%eax), %eax
 andl $252645135, %eax
 imull $16843009, %eax, %eax
 shrl $24, %eax

(14 mnemonics). This seems the fastest and most optimized. It took some time for me to understand it.

Seems the interviewer doesn't have enough experience in embedded or microcontroller or doesn't think much about optimization. Nevertheless, I failed an interview again. I start to think that many interview questions are really not to dig the thinking ability of a candidate, but rather to get instant answers meet their mind. I just hope big software companies like IBM, Microsoft, Google or Apple shouldn't fall to this kind of trap, but rather ask logical questions to know if the candidate can think systematically.

What a waste!