Monday, October 13, 2014

Left shift by negative number used to test unique characters in a string

What is the outcome of this?

val=-32;
mask = 1<<val;

I tried to execute it on Python, it failed with error message.

But if I do it in C or C++, I get something else.  Turn out, this depends o the compiler and CPU we use.  As I use Pentium4 and GCC to compile this, the following snippet of the generated assembly language is interesting:


.L4:
        movsbl  (%edx), %ecx
        movl    %esi, %edi
        addl    $1, %edx
        subl    $97, %ecx
        sall    %cl, %edi
        testl   %eax, %edi
        jg      .L5
        orl     %edi, %eax

.L2:

What is "sall" above? It is Intel CPU's mnemonic for "Shift Arithmetic Left for Long Integer".

In the machine code, -32 is transformed to Second-complement of 32:

32        = 0b00100000
~32      = 0b11011111
~32+1 = 0b11100000 = -32

According Wikipedia,  arithmetic shift left maintains the MSBit, so in this case MSB is always 1. Also, Intel CPUs only shifts the first 5-bits of the shifting bits (in this case, 5-bit LSB of 0b11100000 which is 0), so 1<<-32 is actuall 1<<0 = 1.

What about 1<<-33?

33        = 0b00100001
~33      = 0b11011110
~32+1 = 0b11100001 = -33

The first five bits is 1, so 1<<-33 = 1<<1 = 2

What about 1<<-31?

31        = 0b00011111
~31      = 0b11100000
~31+1 = 0b11100001 = -31

The first five bits is 1, so 1<<-31 = 1<<1 = 0x00000010

What about 1<<-7?

7        = 0b00000111
~7      = 0b11111000
~7+1 = 0b11111001 = -7

The first five bits is 1, so 1<<-7 = 1<<25 = 0x02000000

Now, how do we use this to check a uniqueness of characters in a string? 
Alphabet 'A' to 'Z' has 26 characters. Alphabet 'a' to 'z' has also 26 characters.

The difference between the character i in a string is: str[i] - 'a'.  Now this is the interesting part: if we declare a 32-bit variable (which is enough to cover 'A' to 'Z') where each bit corresponds to the occurence of each character in alphabet, we can determine if the character has been used before and tell the string has unique characters or not (e.g, bit 1 to tell if character 'A' has been used or not, etc.)

'A' - 'a' = -32
...
...
'Z' - 'a' = -7
...
...
'a' - 'a' = 0
...
...
'Z' - 'a' = 25

if we shift 1 to -32, it is the same as 1<<0.  If we shift 1 to -33 it is the same as 1<<1, so on.

1<<-32 = 0x01
1<<-31 = 0x02
...
...
1<<-7 = 1<<25 = 0x2000000
1<<0 = 0x01
1<<1 = 0x02
...
...
1<<25 = 0x2000000

so:
 1<<('A'-'a') == 1<<('a'-'a') = 0x01
 1<<('B'-'a') == 1<<('b'-'a') = 0x02
...
...
1<<('Z'-'a') == 1<<('z'-'a') = 0x2000000

So the code is:

for (i=0; i<strlen(s); i++)
{
    val = s[i] - 'a';
    if (flags & (1 << val))
        return 0; /* the string is not unique */
    flags |= 1 << val;
}
return 1;


Notice this algorithm might not work on different machine with different compiler (as shifting by negative number is undefined in standard C).





The Use of c++decl

cdecl and c++decl are two nice tools to check syntax or to generate declaration of C and C++ respectively.

for example, using c++decl (the text in bold is the one we type):

$ cdecl

Type `help' or `?' for help
c++decl>

c++decl> declare foo as pointer to member of class X function (arg1, arg2) returning reference to class Y
class Y &(X::*foo)(arg1, arg2)
c++decl>

c++decl> declare foo as reference to class X
class X &foo
c++decl>

c++decl> cast foo into reference to class X
(class X &)foo
c++decl>


We can also use them to check a declaration syntax, for example:

c++decl> explain int (*a)[5]
declare a as pointer to array 5 of int
c++decl>

c++decl> explain long **foo[10]
declare foo as array 10 of pointer to pointer to long
c++decl>

c++decl> explain int*&(foo);
declare foo as reference to pointer to int
c++decl>

The function can also be used to translate an english declaration to C (or C++) declaration.  For example:

$ c++decl declare "i as array 5 of pointer to function(int,int) returning reference to class X"
class X &(*i[5])(int , int )
$

$ c++decl declare "i as array 5 of pointer to function(reference to class X,constant reference to int) returning reference to class X"
class X &(*i[5])(class X &, int & const )
$

$ c++decl declare "const_ptr as const pointer to const int"
const int * const const_ptr

$ c++decl declare "const_ptr as const pointer to const reference to class X"
class X & const * const const_ptr
$

c++decl> declare myfunc as const pointer to function (const pointer to int, reference to class X) returning void
void (* const myfunc)(int * const , class X &)
c++decl>

c++decl> declare myfunctpr as array of pointer to function (void) returning int
int (*myfunctpr[])(void )
c++decl>

declare myptr as pointer to array 10 of struct MyStruct
struct MyStruct (*myptr)[10]
c++decl>