My Stuff: 03/01/2012

Sunday, March 25, 2012

Undirected Graph

The following small program illustrate how to implement a undirected graph with costs to neighbor nodes.

#ifndef _GRAPH_HPP_
#define _GRAPH_HPP_

#include <iostream>
#include <map>
#include <list>

using namespace std;

/*
 * Undirected graph class
 */
class Graph {
public:
 Graph(unsigned int id=0) { m_id = id; m_Ecount = 0; }
 virtual ~Graph();
 void SetId(unsigned int id) { m_id = id; }
 bool Match(Graph& g);
 bool AddAdjacent(Graph* g, int cost=0);
 bool AddAdjacentOne(Graph *g);
 list GetAdjacents() { return m_Adjacents; }
 
 int GetVCount() { return m_Adjacents.size(); }
 int GetECount() { return m_Ecount; }
 unsigned int GetId() { return m_id; }
 void SetCostTo(Graph *g, int cost);
 int GetCostTo(Graph* g);
 
 private:
 unsigned int m_id;
 int m_Ecount;
 map<Graph*,int> m_cost;
 list<Graph*> m_Adjacents;
};


#endif

graph.pp:

#include "graph.hpp"

using namespace std;

Graph::~Graph()
{
 // TODO:
 m_Adjacents.clear();
}


bool Graph::Match(Graph& g)
{
 cout << "this=" << this << endl;
 cout << "&g =" << &g << endl;
 if ((this == &g) || (g.GetId() == this->GetId()) &&
  (g.GetECount() == this->GetECount()) &&
  (g.GetVCount() == this->GetVCount()) )
  return true;
 return false;
}

bool Graph::AddAdjacentOne(Graph *g)
{
 m_Adjacents.push_back(g);
 //TODO: we need to tell g to add also edge to this object
 m_Ecount++;
 return true;
}


void Graph::SetCostTo(Graph *g, int cost)
{
 if (g == NULL)
  return;
 // Use hash-table to set the cost from this node to g
 m_cost[g] = cost;
}

int Graph::GetCostTo(Graph *g)
{
 if (g == NULL)
  return -1;
 // use hash-table to get the cost from this node to g
 return m_cost[g];
}

bool Graph::AddAdjacent(Graph* g, int cost)
{
 if (!g) return false;

 if (Match(*g))
 {
  cout << "ERROR: Tried to add itself" << endl;
  return false;
 }
 else {
  AddAdjacentOne(g);
  // tell other node to set its adjacent to this node too
  g->AddAdjacentOne(this);
  // add cost from this object to g
  g->SetCostTo(this, cost);
  SetCostTo(g, cost);
  return true;
 }
}



int main()
{
 Graph g[10];
 int i;
 int numOfVertices = 4;

 for(i=0; i<numOfVertices; i++)
  g[i].SetId(i);

 // g0 --- g1
 g[0].AddAdjacent(&g[1]);
 //g1 --- g2
 g[1].AddAdjacent(&g[2]);
 // g2 --- g3
 g[2].AddAdjacent(&g[3]);
 // g3 --- g0
 g[3].AddAdjacent(&g[0]);
 // g0 -- g2
 g[0].AddAdjacent(&g[2], 5);

 list<Graph*>::iterator it;
 list<Graph*> adjacents;

 for(i=0; i<numOfVertices; i++)
 {
  cout << "NODE g[" << i << "]" << endl;
  cout << "\tNumber of Edges in G[" << i << "] = " << g[i].GetECount() << endl;

  adjacents = g[i].GetAdjacents();
  cout << "\tNeighbors:" << endl;
  for ( it=adjacents.begin() ; it != adjacents.end(); it++ )
   cout << "\t\tNode=" << *it << ", g[" << (*it)->GetId() << "], cost="
     << g[i].GetCostTo(*it) << endl;
 }
}

Friday, March 23, 2012

MIVO.TV Network Analysis

While my browser was playing the video streaming (tru flash-plugin) from mivo.tv, a "netstat -t" revealed high traffic as below:

tcp 0 0 192.168.0.29:56448 68.68.29.24.:macromedia-fcs ESTABLISHED

After googling this "macromedia-fcs" I found:

TCP & UDP Port 1935 (Macromedia-fcs)

Technical description for port 1935:

The RTMP (Real Time Messaging Protocol) service associated with the computer port 1935 is a plain protocol utilized by the Adobe Macromedia Flash application. This proprietary service allows for the streaming of data, video and audio using an active Internet connection among a Flash Server and its associated player. This port supports the three variations of the RTMP service.

The communication port 1935 along with the TCP protocol is used for the delivery of encapsulated RTMPT that traverses firewall applications using HTTPS (secured) connections. The design of this protocol was intended to provide a persistent service for the Flash technology along with other programs like Adobe LiveCycle Data Services ES.

The raw TCP RTMP protocol uses one persistent connection for allowing the implementation of real time communication and smooth delivery of multimedia contents.

The protocol running on the port 1935 achieves this quality of transmission without affecting its ability to send bigger chunks of information including the fragmentation of data.

Digging deeper to the HTML code, I found a port in Javascript like this:

<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000"
<codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0"
width="610" height="878" id="MivoTV" align="top">'
<param name="wmode" value="transparent">
<param name="allowScriptAccess" value="always" />
<param name="allowFullScreen" value="true" />
<param name="SeamlessTabbing" value="false"/>
<param name="movie" value="http://mivotv-static.com/swf/MivoTV.swf?r=99123"/>
<param name="quality" value="high" />
<param name="scale" value="noscale" />
<param name="menu" value="true" />
<param name="devicefont" value="false" />
<param name="bgcolor" value="#ffffff" />
<param name="name" value="MivoTV" />
<embed src="http://mivotv-static.com/swf/MivoTV.swf?r=99999" quality="high" scale="noscale" bgcolor="#ffffff" width="610" height="878" name="MivoTV" align="top" allowScriptAccess="always" allowFullScreen="true" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer" wmode="transparent" menu="true" devicefont="false" />

But why it doesn't work if I open the URL directly and put some random number at the end [0..99999]?


 
</object></pre>

Hash template in C++

 

#include <map>
#include <iostream>
#include <cstring>

using namespace std;

struct eqstr {
    bool operator() (const char *s1, const char *s2) const {
   return (::strcmp(s1, s2) < 0 ? true : false);
 }
};


int main()
{
 // template 
    map <const char *, int, eqstr> months;

    months["january"] = 31;
    months["february"] = 28;
    months["march"] = 31;
    months["april"] = 30;
    months["may"] = 31;
    months["june"] = 30;
    months["july"] = 31;
    months["august"] = 31;
    months["september"] = 30;
    months["october"] = 31;
    months["november"] = 30;
    months["december"] = 31;

    cout << "february -> " << months["february"] << endl;
    cout << "september -> " << months["september"] << endl;
    cout << "april     -> " << months["april"] << endl;
    cout << "july      -> " << months["july"] << endl;
    cout << "november  -> " << months["november"] << endl;

 for(map::iterator it=months.begin();
     it != months.end(); it++)
 {
  cout << (*it).first << " => " << (*it).second << endl;
 }
}

Wednesday, March 21, 2012

ELF File Format

The C++ source code below:

#include <cstdio>
#include <iostream>

using namespace std;

class CParent {
public:
 CParent() {
  ref++;
  cout <<  "class CParent: constructor ref=" << ref << endl;
  Function();
 };
 virtual ~CParent() {
  ref--;
  cout << "class CParent: destructor ref=" << ref << endl;
 }
 virtual int Function() {
  cout << "\tA::Function()" << endl;
  return ref;
 }
 static unsigned int ref;
};


class CChild : public CParent {
public:
 CChild() {
  ref++;
  cout << "\tclass CChild: constructor ref=" << ref << endl;
  m_a = new CParent();
 };
 virtual ~CChild() {
  ref--;
  cout << "\tclass CChild: destructor ref=" << ref << endl;
  delete m_a;
 }

 static unsigned int ref;
private:
 CParent *m_a;
};


unsigned int CParent::ref = 0;
unsigned int CChild::ref = 0;

int main()
{
 int i;
 int n = 2;

 CChild *c = new CChild[n];
 for(i=0; i(&c[i]);
  printf("p[%d].Function()=%d\n", i, p->Function());
 }
 cout << "CChild::Ref=" << CChild::ref << endl;
 delete [] c;
}

A in-depth view of the generated ELF file format from the above C++ code is as below:

[buhadram@localhost src]$ readelf -a virt
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x80487d0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          5192 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         8
  Size of section headers:           40 (bytes)
  Number of section headers:         31
  Section header string table index: 28

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048134 000134 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048148 000148 000020 00   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            08048168 000168 000024 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        0804818c 00018c 000040 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          080481cc 0001cc 000160 10   A  6   1  4
  [ 6] .dynstr           STRTAB          0804832c 00032c 000214 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          08048540 000540 00002c 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         0804856c 00056c 000080 00   A  6   3  4
  [ 9] .rel.dyn          REL             080485ec 0005ec 000020 08   A  5   0  4
  [10] .rel.plt          REL             0804860c 00060c 000080 08   A  5  12  4
  [11] .init             PROGBITS        0804868c 00068c 000030 00  AX  0   0  4
  [12] .plt              PROGBITS        080486bc 0006bc 000110 04  AX  0   0  4
  [13] .text             PROGBITS        080487d0 0007d0 0005dc 00  AX  0   0 16
  [14] .fini             PROGBITS        08048dac 000dac 00001c 00  AX  0   0  4
  [15] .rodata           PROGBITS        08048dc8 000dc8 000114 00   A  0   0  8
  [16] .eh_frame_hdr     PROGBITS        08048edc 000edc 000074 00   A  0   0  4
  [17] .eh_frame         PROGBITS        08048f50 000f50 00022c 00   A  0   0  4
  [18] .gcc_except_table PROGBITS        0804917c 00117c 000041 00   A  0   0  1
  [19] .ctors            PROGBITS        0804a1c0 0011c0 00000c 00  WA  0   0  4
  [20] .dtors            PROGBITS        0804a1cc 0011cc 000008 00  WA  0   0  4
  [21] .jcr              PROGBITS        0804a1d4 0011d4 000004 00  WA  0   0  4
  [22] .dynamic          DYNAMIC         0804a1d8 0011d8 0000e0 08  WA  6   0  4
  [23] .got              PROGBITS        0804a2b8 0012b8 000004 04  WA  0   0  4
  [24] .got.plt          PROGBITS        0804a2bc 0012bc 00004c 04  WA  0   0  4
  [25] .data             PROGBITS        0804a308 001308 000004 00  WA  0   0  4
  [26] .bss              NOBITS          0804a320 00130c 000120 00  WA  0   0 32
  [27] .comment          PROGBITS        00000000 00130c 00002c 01  MS  0   0  1
  [28] .shstrtab         STRTAB          00000000 001338 00010e 00      0   0  1
  [29] .symtab           SYMTAB          00000000 001920 000680 10     30  49  4
  [30] .strtab           STRTAB          00000000 001fa0 0005a2 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x011bd 0x011bd R E 0x1000
  LOAD           0x0011c0 0x0804a1c0 0x0804a1c0 0x0014c 0x00280 RW  0x1000
  DYNAMIC        0x0011d8 0x0804a1d8 0x0804a1d8 0x000e0 0x000e0 RW  0x4
  NOTE           0x000148 0x08048148 0x08048148 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x000edc 0x08048edc 0x08048edc 0x00074 0x00074 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame .gcc_except_table 
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss 
   04     .dynamic 
   05     .note.ABI-tag .note.gnu.build-id 
   06     .eh_frame_hdr 
   07     

Dynamic section at offset 0x11d8 contains 23 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libstdc++.so.6]
 0x00000001 (NEEDED)                     Shared library: [libm.so.6]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000c (INIT)                       0x804868c
 0x0000000d (FINI)                       0x8048dac
 0x6ffffef5 (GNU_HASH)                   0x804818c
 0x00000005 (STRTAB)                     0x804832c
 0x00000006 (SYMTAB)                     0x80481cc
 0x0000000a (STRSZ)                      532 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x804a2bc
 0x00000002 (PLTRELSZ)                   128 (bytes)
 0x00000014 (PLTREL)                     REL
 0x00000017 (JMPREL)                     0x804860c
 0x00000011 (REL)                        0x80485ec
 0x00000012 (RELSZ)                      32 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffe (VERNEED)                    0x804856c
 0x6fffffff (VERNEEDNUM)                 3
 0x6ffffff0 (VERSYM)                     0x8048540
 0x00000000 (NULL)                       0x0

Relocation section '.rel.dyn' at offset 0x5ec contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a2b8  00000206 R_386_GLOB_DAT    00000000   __gmon_start__
0804a320  00000f05 R_386_COPY        0804a320   _ZTVN10__cxxabiv117__c
0804a360  00001405 R_386_COPY        0804a360   _ZSt4cout
0804a400  00001005 R_386_COPY        0804a400   _ZTVN10__cxxabiv120__s

Relocation section '.rel.plt' at offset 0x60c contains 16 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a2c8  00000107 R_386_JUMP_SLOT   00000000   __cxa_atexit
0804a2cc  00000207 R_386_JUMP_SLOT   00000000   __gmon_start__
0804a2d0  00000407 R_386_JUMP_SLOT   00000000   _ZdlPv
0804a2d4  00000507 R_386_JUMP_SLOT   00000000   _ZNSt8ios_base4InitC1E
0804a2d8  00000607 R_386_JUMP_SLOT   00000000   __libc_start_main
0804a2dc  00001307 R_386_JUMP_SLOT   0804871c   _ZNSt8ios_base4InitD1E
0804a2e0  00000707 R_386_JUMP_SLOT   00000000   _ZStlsISt11char_traits
0804a2e4  00000807 R_386_JUMP_SLOT   00000000   printf
0804a2e8  00000907 R_386_JUMP_SLOT   00000000   _ZNSolsEj
0804a2ec  00000a07 R_386_JUMP_SLOT   00000000   _Znwj
0804a2f0  00000b07 R_386_JUMP_SLOT   00000000   _Znaj
0804a2f4  00000c07 R_386_JUMP_SLOT   00000000   _ZdaPv
0804a2f8  00000d07 R_386_JUMP_SLOT   00000000   _ZNSolsEPFRSoS_E
0804a2fc  00001207 R_386_JUMP_SLOT   0804879c   _ZSt4endlIcSt11char_tr
0804a300  00001507 R_386_JUMP_SLOT   080487ac   __gxx_personality_v0
0804a304  00000e07 R_386_JUMP_SLOT   00000000   _Unwind_Resume

There are no unwind sections in this file.

Symbol table '.dynsym' contains 22 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND __cxa_atexit@GLIBC_2.1.3 (2)
     2: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
     4: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZdlPv@GLIBCXX_3.4 (3)
     5: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt8ios_base4InitC1Ev@GLIBCXX_3.4 (3)
     6: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (4)
     7: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZStlsISt11char_traitsIcE@GLIBCXX_3.4 (3)
     8: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.0 (4)
     9: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSolsEj@GLIBCXX_3.4 (3)
    10: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Znwj@GLIBCXX_3.4 (3)
    11: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Znaj@GLIBCXX_3.4 (3)
    12: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZdaPv@GLIBCXX_3.4 (3)
    13: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSolsEPFRSoS_E@GLIBCXX_3.4 (3)
    14: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Unwind_Resume@GCC_3.0 (6)
    15: 0804a320    44 OBJECT  WEAK   DEFAULT   26 _ZTVN10__cxxabiv117__clas@CXXABI_1.3 (5)
    16: 0804a400    44 OBJECT  WEAK   DEFAULT   26 _ZTVN10__cxxabiv120__si_c@CXXABI_1.3 (5)
    17: 08048dcc     4 OBJECT  GLOBAL DEFAULT   15 _IO_stdin_used
    18: 0804879c     0 FUNC    GLOBAL DEFAULT  UND _ZSt4endlIcSt11char_trait@GLIBCXX_3.4 (3)
    19: 0804871c     0 FUNC    GLOBAL DEFAULT  UND _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 (3)
    20: 0804a360   140 OBJECT  GLOBAL DEFAULT   26 _ZSt4cout@GLIBCXX_3.4 (3)
    21: 080487ac     0 FUNC    GLOBAL DEFAULT  UND __gxx_personality_v0@CXXABI_1.3 (5)

Symbol table '.symtab' contains 104 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 08048134     0 SECTION LOCAL  DEFAULT    1 
     2: 08048148     0 SECTION LOCAL  DEFAULT    2 
     3: 08048168     0 SECTION LOCAL  DEFAULT    3 
     4: 0804818c     0 SECTION LOCAL  DEFAULT    4 
     5: 080481cc     0 SECTION LOCAL  DEFAULT    5 
     6: 0804832c     0 SECTION LOCAL  DEFAULT    6 
     7: 08048540     0 SECTION LOCAL  DEFAULT    7 
     8: 0804856c     0 SECTION LOCAL  DEFAULT    8 
     9: 080485ec     0 SECTION LOCAL  DEFAULT    9 
    10: 0804860c     0 SECTION LOCAL  DEFAULT   10 
    11: 0804868c     0 SECTION LOCAL  DEFAULT   11 
    12: 080486bc     0 SECTION LOCAL  DEFAULT   12 
    13: 080487d0     0 SECTION LOCAL  DEFAULT   13 
    14: 08048dac     0 SECTION LOCAL  DEFAULT   14 
    15: 08048dc8     0 SECTION LOCAL  DEFAULT   15 
    16: 08048edc     0 SECTION LOCAL  DEFAULT   16 
    17: 08048f50     0 SECTION LOCAL  DEFAULT   17 
    18: 0804917c     0 SECTION LOCAL  DEFAULT   18 
    19: 0804a1c0     0 SECTION LOCAL  DEFAULT   19 
    20: 0804a1cc     0 SECTION LOCAL  DEFAULT   20 
    21: 0804a1d4     0 SECTION LOCAL  DEFAULT   21 
    22: 0804a1d8     0 SECTION LOCAL  DEFAULT   22 
    23: 0804a2b8     0 SECTION LOCAL  DEFAULT   23 
    24: 0804a2bc     0 SECTION LOCAL  DEFAULT   24 
    25: 0804a308     0 SECTION LOCAL  DEFAULT   25 
    26: 0804a320     0 SECTION LOCAL  DEFAULT   26 
    27: 00000000     0 SECTION LOCAL  DEFAULT   27 
    28: 00000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    29: 0804a1c0     0 OBJECT  LOCAL  DEFAULT   19 __CTOR_LIST__
    30: 0804a1cc     0 OBJECT  LOCAL  DEFAULT   20 __DTOR_LIST__
    31: 0804a1d4     0 OBJECT  LOCAL  DEFAULT   21 __JCR_LIST__
    32: 08048800     0 FUNC    LOCAL  DEFAULT   13 __do_global_dtors_aux
    33: 0804a42c     1 OBJECT  LOCAL  DEFAULT   26 completed.5530
    34: 0804a430     4 OBJECT  LOCAL  DEFAULT   26 dtor_idx.5532
    35: 08048860     0 FUNC    LOCAL  DEFAULT   13 frame_dummy
    36: 00000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    37: 0804a1c8     0 OBJECT  LOCAL  DEFAULT   19 __CTOR_END__
    38: 08049178     0 OBJECT  LOCAL  DEFAULT   17 __FRAME_END__
    39: 0804a1d4     0 OBJECT  LOCAL  DEFAULT   21 __JCR_END__
    40: 08048d80     0 FUNC    LOCAL  DEFAULT   13 __do_global_ctors_aux
    41: 00000000     0 FILE    LOCAL  DEFAULT  ABS virt.cpp
    42: 0804a43c     1 OBJECT  LOCAL  DEFAULT   26 _ZStL8__ioinit
    43: 08048a08    64 FUNC    LOCAL  DEFAULT   13 _Z41__static_initializati
    44: 08048a48    28 FUNC    LOCAL  DEFAULT   13 _GLOBAL__I__ZN7CParent3re
    45: 0804a2bc     0 OBJECT  LOCAL  DEFAULT   24 _GLOBAL_OFFSET_TABLE_
    46: 0804a1bd     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_end
    47: 0804a1bd     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_start
    48: 0804a1d8     0 OBJECT  LOCAL  DEFAULT   22 _DYNAMIC
    49: 0804a308     0 NOTYPE  WEAK   DEFAULT   25 data_start
    50: 00000000     0 FUNC    GLOBAL DEFAULT  UND __cxa_atexit@@GLIBC_2.1.3
    51: 08048d70     5 FUNC    GLOBAL DEFAULT   13 __libc_csu_fini
    52: 080487d0     0 FUNC    GLOBAL DEFAULT   13 _start
    53: 08048ebc    12 OBJECT  WEAK   DEFAULT   15 _ZTI6CChild
    54: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    55: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
    56: 08048dc8     4 OBJECT  GLOBAL DEFAULT   15 _fp_hw
    57: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZdlPv@@GLIBCXX_3.4
    58: 0804a434     4 OBJECT  GLOBAL DEFAULT   26 _ZN7CParent3refE
    59: 08048dac     0 FUNC    GLOBAL DEFAULT   14 _fini
    60: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt8ios_base4InitC1Ev@@
    61: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    62: 08048e88    20 OBJECT  WEAK   DEFAULT   15 _ZTV6CChild
    63: 08048b56    49 FUNC    WEAK   DEFAULT   13 _ZN7CParent8FunctionEv
    64: 08048cea    30 FUNC    WEAK   DEFAULT   13 _ZN6CChildD0Ev
    65: 08048b88   171 FUNC    WEAK   DEFAULT   13 _ZN6CChildC2Ev
    66: 0804871c     0 FUNC    GLOBAL DEFAULT  UND _ZNSt8ios_base4InitD1Ev@@
    67: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZStlsISt11char_traitsIcE
    68: 08048dcc     4 OBJECT  GLOBAL DEFAULT   15 _IO_stdin_used
    69: 0804a308     0 NOTYPE  GLOBAL DEFAULT   25 __data_start
    70: 0804a320    44 OBJECT  WEAK   DEFAULT   26 _ZTVN10__cxxabiv117__clas
    71: 0804a360   140 OBJECT  GLOBAL DEFAULT   26 _ZSt4cout@@GLIBCXX_3.4
    72: 08048dd0     0 OBJECT  GLOBAL HIDDEN    15 __dso_handle
    73: 0804a1d0     0 OBJECT  GLOBAL HIDDEN    20 __DTOR_END__
    74: 08048d10    90 FUNC    GLOBAL DEFAULT   13 __libc_csu_init
    75: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.0
    76: 08048a64   100 FUNC    WEAK   DEFAULT   13 _ZN7CParentC2Ev
    77: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSolsEj@@GLIBCXX_3.4
    78: 08048ac8   112 FUNC    WEAK   DEFAULT   13 _ZN7CParentD1Ev
    79: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Znwj@@GLIBCXX_3.4
    80: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Znaj@@GLIBCXX_3.4
    81: 08048a64   100 FUNC    WEAK   DEFAULT   13 _ZN7CParentC1Ev
    82: 08048ed4     8 OBJECT  WEAK   DEFAULT   15 _ZTI7CParent
    83: 08048b38    30 FUNC    WEAK   DEFAULT   13 _ZN7CParentD0Ev
    84: 08048c34   182 FUNC    WEAK   DEFAULT   13 _ZN6CChildD2Ev
    85: 0804a30c     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    86: 0804a400    44 OBJECT  WEAK   DEFAULT   26 _ZTVN10__cxxabiv120__si_c
    87: 0804a438     4 OBJECT  GLOBAL DEFAULT   26 _ZN6CChild3refE
    88: 08048ec8     9 OBJECT  WEAK   DEFAULT   15 _ZTS7CParent
    89: 08048ac8   112 FUNC    WEAK   DEFAULT   13 _ZN7CParentD2Ev
    90: 08048eb4     8 OBJECT  WEAK   DEFAULT   15 _ZTS6CChild
    91: 08048ea0    20 OBJECT  WEAK   DEFAULT   15 _ZTV7CParent
    92: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZdaPv@@GLIBCXX_3.4
    93: 0804a440     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    94: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSolsEPFRSoS_E@@GLIBCXX
    95: 08048b88   171 FUNC    WEAK   DEFAULT   13 _ZN6CChildC1Ev
    96: 0804879c     0 FUNC    GLOBAL DEFAULT  UND _ZSt4endlIcSt11char_trait
    97: 0804a30c     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    98: 08048c34   182 FUNC    WEAK   DEFAULT   13 _ZN6CChildD1Ev
    99: 080487ac     0 FUNC    GLOBAL DEFAULT  UND __gxx_personality_v0@@CXX
   100: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Unwind_Resume@@GCC_3.0
   101: 08048d75     0 FUNC    GLOBAL HIDDEN    13 __i686.get_pc_thunk.bx
   102: 08048884   388 FUNC    GLOBAL DEFAULT   13 main
   103: 0804868c     0 FUNC    GLOBAL DEFAULT   11 _init

Histogram for `.gnu.hash' bucket list length (total of 3 buckets):
 Length  Number     % of total  Coverage
      0  0          (  0.0%)
      1  0          (  0.0%)      0.0%
      2  2          ( 66.7%)     57.1%
      3  1          ( 33.3%)    100.0%

Version symbols section '.gnu.version' contains 22 entries:
 Addr: 0000000008048540  Offset: 0x000540  Link: 5 (.dynsym)
  000:   0 (*local*)       2 (GLIBC_2.1.3)   0 (*local*)       0 (*local*)    
  004:   3 (GLIBCXX_3.4)   3 (GLIBCXX_3.4)   4 (GLIBC_2.0)     3 (GLIBCXX_3.4)
  008:   4 (GLIBC_2.0)     3 (GLIBCXX_3.4)   3 (GLIBCXX_3.4)   3 (GLIBCXX_3.4)
  00c:   3 (GLIBCXX_3.4)   3 (GLIBCXX_3.4)   6 (GCC_3.0)       5 (CXXABI_1.3) 
  010:   5 (CXXABI_1.3)    1 (*global*)      3 (GLIBCXX_3.4)   3 (GLIBCXX_3.4)
  014:   3 (GLIBCXX_3.4)   5 (CXXABI_1.3) 

Version needs section '.gnu.version_r' contains 3 entries:
 Addr: 0x000000000804856c  Offset: 0x00056c  Link: 6 (.dynstr)
  000000: Version: 1  File: libgcc_s.so.1  Cnt: 1
  0x0010:   Name: GCC_3.0  Flags: none  Version: 6
  0x0020: Version: 1  File: libstdc++.so.6  Cnt: 2
  0x0030:   Name: CXXABI_1.3  Flags: none  Version: 5
  0x0040:   Name: GLIBCXX_3.4  Flags: none  Version: 3
  0x0050: Version: 1  File: libc.so.6  Cnt: 2
  0x0060:   Name: GLIBC_2.0  Flags: none  Version: 4
  0x0070:   Name: GLIBC_2.1.3  Flags: none  Version: 2

Notes at offset 0x00000148 with length 0x00000020:
  Owner  Data size Description
  GNU  0x00000010 NT_GNU_ABI_TAG (ABI version tag)

Notes at offset 0x00000168 with length 0x00000024:
  Owner  Data size Description
  GNU  0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
[buhadram@localhost src]$

Friday, March 16, 2012

Google Interview (3)

Question:
How would you reverse the image on an n by n matrix where each pixel is represented by a bit?

Answer:
I: n x n matrix
I[0][0] represents 1 byte at the left-most, top-most corner (8 pixels).

Just image it as columns and rows of pixels. For example, a line of an image is represented on the screen as columns of pixels:

+----+----+----+ ....+-----+-----+---+
| p₁ | p₂ | p₃ |     |p_m-2| p_m-1| p_m|
+----+----+----+ ....+-----+-----+---+

In reversed image, it looks like below:

+---+----+----+ ....+----+----+----+
|p_m |p_m-1|p_m-2|     | p₃ | p₂ | p₁ |
+---+----+----+ ....+----+----+----+

The reverse version would have the column position reversed as well as the bit order in each bit. To do this, we need to read each row of the original image in reverse and also do bit reversal.

We have to be careful when swapping bytes above. On 32-bit computers, we can swap left-right of every 32-bit integer of columns, or on 64-bit PCs we can do 64-bit (long integer) swapping before doing the bit reverse.

For example:

n = m/(sizeof(long integer)
for i=0 to n
    right_idx = m-1-i;
    swapLongInteger(image[i], image[right_idx])
    Reverse8Bit(image[i])
    Reverse8Bit(image[i+1])
Reverse8Bit(image[i+2])
Reverse8Bit(image[i+3])
    Reverse8Bit(image[right_idx])
Reverse8Bit(image[right_idx-1])
Reverse8Bit(image[right_idx-2])
Reverse8Bit(image[right_idx-3])
    i = i + sizeof(long integer)
done

The call to multiple "Reverse8Bit" above can be put in multithreading/can be done using parallelism to speed up computation.

The pseudo-code is as below:

# I : list original image
# R : reversed image
for i in range(0,n):
    for j in range(0,n):
        R[i][j] = Reverse8Bit(I[i][n-1-j])

The Reverse8Bit() function is like below (from http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious):

unsigned int v;     // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{   
  r <<= 1;
  r |= v & 1;
  s--;
}
r <<= s; // shift when v's highest bits are zero

Thursday, March 15, 2012

Google Interview (2)

Question:
How many degrees are there in the angle between the hour and minute hands of a clock when the time is a quarter past three?

Answer:
The stupid and quick answer would be 0 degree.

But, if we rethink again, it's not the case exactly. When the minute hand points to '15' (quarter), the hour hand has moved little bit. How many? Let's analyze!

To make a full-circle (360 degrees rotation), the hour hand has to pass 12 hours, thus for one hour, it takes 360 degrees/12 = 30 degrees. For a quarter hour = 1/4 * 30 = 7.5 degrees.

So the answer is: 7.5 degrees.

A Google Interview

Question:
If a person dials a sequence of numbers on the telephone, what possible words/strings can be formed from the letters associated with those numbers?

Answer:
Let's analyze this.

On a handset, there are keys with number '0' to '9'. Assume we have to key in full phone number (area code + number), it's gonna be 10 digits. Only certain keys have alphabets: 2 [ABC], 3 [DEF], 4 [GHI], 5 [JKL], 6 [MNO], 7 [PQRS], 8 [WXYZ], with total number of sets = 8. The question is to find what is the total number of permutation of these 10-digits out of n set of possible phone numbers

Using permutation formula:

Formula:
	Note: , where _nP_r is the formula for permutations of n objects taken r at a time.

The set of possible numbers then is: [222-222-2222 to 888-888-8888] or 6,666,666,666 possible phone numbers.

We then compute:
n = 10,000,000,000
r = 10

Permutation =6,666,666,666 !/(10!(6,666,666,666 - 10)!) = (6,666,666,656 * 6,666,666,657 * 6,666,666,658 *6,666,666,669 * 6,666,666,660 * 6,666,666,661 * 6,666,666,662 * 6,666,666,663 * 6,666,666,664 * 6,666,666,665 * 6,666,666,666)/3628800 = <just figure out your self!>

If the allowed valid number is only 1 digit, then computation would be easier:

Permutation = 6,666,666,666!/(1!*(6,666,666,666-1)!) = 6,666,666,666 possible words

Fast Hashing of Variable-length text strings

The following Hash algorithm based on a research paper written by Peter Pearson (a scientist at Lawrence Livermore National Lab) has an ability to generate unique hash numbers for strings with lengths no longer than 256 bytes. That means, we can have up to 256 buckets in a hash table, but it guarantees the uniqueness (free of hash-collision), thus no need to have linked-list in buckets.

The beauty of this algorithm is that it is designed to work on small microprocessors or microcontrollers, which sometimes don't have luxury to do complex arithmetics in their Instruction set (or, require more instructions). Instead, the following algorithm uses XOR and table lookup which should be easy to do in small CPUs. On more modern and high-end CPUs, XOR operation translated directly to mnemonic XOR in machine-code (a single instruction) and done even in almost wire-speed.

One of the use I can think of is for creating MAC table for virtual bridging. An Ethernet MAC address takes only 6 octets, so this algorithm can be used to lookup 256-bucket MAC table. The hash-key is MACaddr, while the output is port.

For example:

A MAC table structure might look like below:

+-------------------+---------+------+
|      MACAddr      +  port   |  Age |
+-------------------+---------+------+
| XX:XX:XX:XX:XX:XX |    1    |  6   |
| YY:YY:YY:YY:YY:YY |    2    |  7   |
| ZZ:ZZ:ZZ:ZZ:ZZ:ZZ |    3    |  102 |
| AA:BB:BB:BB:BB:BB |    4    |  33  |
| AA:AA:AA:AA:AA:AA |    5    |  76  |
+-------------------+---------+------+

Learning:
port = PearsonHash(SourceMacAddr);
MACTable[port].addr = SourceMacAddr;
MACTable[port].port = port;

Forwarding:
inspect destination MAC address from incoming ethernet frame

//Flooding
if (DestMacAddr == Broadcast)
Send to all ports, except the original incoming port, with this frame

// a small filtering is done here to prevent loop
if (DestMacAddr != SourceMacAddr)
{
port = PearsonHash(DestMacAddr);
if port is invalid:
flooding
else
DataForward(EthernetFrame, port);
}

Aging:
The following statements can be executed periodically.

for each entry/bucket in the MAC table:
if MACTable[i].age < 1:
Flush(MACTable[i])

A challenge might be how to expand this algorithm to handle, say, 64000 buckets/entries? How to generate the pseudorandom hash table?

#include <stdio.h>
#include <string.h>
#include <stdlib.h>


static unsigned char PseudoRandomHash[256] = {
     1,   87,   49,  12, 176, 178,  102, 166, 121, 193,   6,  84, 249, 230,  44,  163, 
     14, 197,  213, 181, 161,  85,  218,  80,  64, 239,  24, 226, 236, 142,  38,  200, 
    110, 177,  104, 103, 141, 253,  255,  50,  77, 101,  81,  18,  45,  96,  31,  222, 
     25, 107,  190,  70,  86, 237,  240,  34,  72, 242,  20, 214, 244, 227, 149,  235, 
     97, 234,   57,  22,  60, 250,   82,  175, 208,   5, 127, 199, 111,  62, 135,  248, 
    174, 169,  211,  58,  66, 154,  106, 195, 245, 171,  17, 187, 182, 179,   0,  243, 
    132,  56,  148,  75, 128, 133,  158, 100, 130, 126,  91,  13, 153, 246, 216,  219,
    119,  68,  223,  78,  83,  88,  201,  99, 122,  11,  92,  32, 136, 114,  52,   10, 
    138,  30,   48, 183, 156,  35,   61,  26, 143,  74, 251,  94, 129, 162,  63,  152, 
    170,   7,  115, 167, 241, 206,    3, 150,  55,  59, 151, 220,  90,  53,  23,  131, 
    125, 173,   15, 238,  79,  95,   89,  16, 105, 137, 225, 224, 217, 160,  37,  123, 
    118,  73,    2, 157,  46, 116,    9, 145, 134, 228, 207, 212, 202, 215,  69,  229, 
     27, 188,   67, 124, 168, 252,   42,   4,  29, 108,  21, 247,  19, 205,  39,  203, 
    233,  40,  186, 147, 198, 192,  155,  33, 164, 191,  98, 204, 165, 180, 117,   76, 
    140,  36,  210, 172,  41,  54,  159,   8, 185, 232, 113, 196, 231,  47, 146,  120, 
     51,  65,   28, 144, 254, 221,   93, 189, 194, 139, 112,  43,  71, 109, 184,  209}; 
     
#define N   256
     
     
int PearsonHash(const char *str)
{
    int n = strlen(str);
    int i;
    unsigned char *h;
    int result;
    
    if (n > N)
    {
        puts("Pearson Hashing can only be used for string length <256 characters");
        return -1;
    }
    h = (unsigned char *)malloc(N);
    bzero(h, sizeof(char)*N);
    h[0] = 0;
    for(i=1; i<n; i++)
    {
        h[i] = PseudoRandomHash[(h[i-1] ^ str[i])& 0xFF];
        //printf("h[%u]=%0x\n", i, h[i]);
        result = h[i];
    }
    free(h);
    return result;
}

     
int main()
{
    int i;
    int n;
    const char MyStringBase[] = "Ahlan Wa Sahlan Bib, ";
    char *StrTable[N];
    
    for(i=0; i<N; i++)
    {
        printf(" %03u ", PseudoRandomHash[i]);
        if ((i+1)%16 == 0)
            printf("\n");
    }
    for(i=0; i<N; i++)
    {
        StrTable[i] = (char *)malloc(N);
        sprintf(StrTable[i], "%s%u", MyStringBase, i);
    }    
    for(i=0; i<N; i++)
    {
        printf("\n%s: \tHash Index=%u\n", StrTable[i], PearsonHash(StrTable[i]));
        free(StrTable[i]);
    }    
}

Wednesday, March 14, 2012

Print Canonical Addresses of a Site

This 2 lines of Python queries DNS and display the canonical names of a host. It's like a simple DNS lookup (nslookup).

#!/usr/bin/python

import sys
import socket

HOST=''
PORT=80

if (len(sys.argv) < 2):
 print sys.argv[0], " "
 sys.exit(0)

HOST = sys.argv[1]
for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM):
   print res[4][0]

For example, if we execute the script and pass parameter 'www.google.com', we'd get this:

$ ./getdns.py www.google.com
74.125.224.50
74.125.224.51
74.125.224.48
74.125.224.49
74.125.224.52

Python GUI

This small Python script creates a window application and handles menu events as well as updating statusbar at the bottom.

#!/usr/bin/python

from Tkinter import *
import tkMessageBox

class StatusBar(Frame):

    def __init__(self, master):
        Frame.__init__(self, master)
        self.label = Label(self, text="", bd=1, relief=SUNKEN, anchor=W)
        self.label.pack(side=BOTTOM, fill=X)

    def set(self, format, *args):
        self.label.config(text=format % args)
        self.label.update_idletasks()

    def clear(self):
        self.label.config(text="")
        self.label.update_idletasks()

class App:

    def __init__(self, parent):
        self.this = self
        frame = Frame(parent)
        parent.protocol("WM_DELETE_WINDOW", ConfirmQuit)

        parent.geometry("%dx%d%+d%+d" % (600, 400, 100, 50))

        # create a menu
        menu = Menu(parent)
        parent.config(menu=menu)

        filemenu = Menu(menu)
        menu.add_cascade(label="File", menu=filemenu)
        filemenu.add_command(label="New", command=self.newFile)
        filemenu.add_command(label="Open...", command=self.OpenFile)
        filemenu.add_separator()
        filemenu.add_command(label="Exit", command=self.Exit)

        runmenu = Menu(menu)
        menu.add_cascade(label="Run", menu=runmenu)
        runmenu.add_command(label="Run now", command=self.callback)

        helpmenu = Menu(menu)
        menu.add_cascade(label="Help", menu=helpmenu)
        helpmenu.add_command(label="About...", command=self.About)

    def callback(event):
        print "callback was called from event class ",event.__class__
        print "event.__name__=",__name__
        print "event type:", type(event)
        print "event.this.__class__=", event.this.__class__
        print "this: type=", type(event.this), ",class=", event.this.__class__
        print "status_bar", status_bar.__class__
        status_bar.set("%s", "Ahlan bib")

    def newFile(event):
        print "New file", event
        status_bar.set("%s", "Creating a new file")

    def OpenFile(event):
        print "Open File", event
        status_bar.set("%s", "Opening a file")

    def About(event):
        print "(c) 2012, mlutfi", event
        status_bar.clear()

    def say_hi(self):
        print "hi there, everyone!"

    def Exit(event):
        ConfirmQuit()

class BaseWindow:
    def __init__(self, parent):
        Toplevel()


def ConfirmQuit():
    if tkMessageBox.askokcancel("Quit", "Do you really want to quit?"):
        root.destroy()

root = Tk()
root.protocol("WM_DELETE_WINDOW", ConfirmQuit)

status_bar = StatusBar(root)
status_bar.pack(side=BOTTOM, fill=X)

app = App(root)

root.mainloop()

Sunday, March 11, 2012

Fibonacci in Python

# Fibonacci numbers module

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        #print b,
        a, b = b, a+b

def fib2(n): # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    print b
    return result
    
# make it executable as a script as well as module_exit
if __name__ == "__main__":
    import sys
    if (len(sys.argv) > 1):
        f = fib2(int(sys.argv[1]))
        str = ""
        print "Fibonacci sequence: ", f
    else:
        print sys.argv[0], " "

Downloading book "Foundations of Computer Science"

This small python script would download all chapters in the book "Foundations of Computer Science".


#!/usr/bin/python

import sys
from subprocess import call  # for 'call'


urlbase = 'http://infolab.stanford.edu/~ullman/focs/'

others = ['preface.pdf', 'toc.pdf', 'index.pdf']
chapters = range(1,15)
links = []
links.extend(others)

def download(index, f):
    url = urlbase + f
    print "file = ", index+1, url
    call(["wget", url])


for ch in chapters:
    links.append('ch' + '%02d' %(ch) + '.pdf')

out = [download(index,obj) for index,obj in enumerate(links)]

print "out=", out

Regular Expression in Python

The following is a Python snippet to do regular expression. First it reads a file passed as an argument, the it goes to a for-loop for each line and do string regular-expression search

#!/usr/bin/python

import sys # for sys.argv, etc.
import re  # regular expression

if (len(sys.argv) > 1):
    filename = sys.argv[1]
else:
    print sys.argv[0], " "
    sys.exit(0)


with open(filename, 'r') as f:
    #f.read()
    for line in f:
        pattern = re.compile(r"print (.*)")
        match = pattern.search(line)
        if match:
            print "print is used to print '", match.group(1), "'"

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ readfile.py 17,3-9 All "readfile.py" 21L, 386C

Computer Design Fun Party!

To compute expression A = (B-C)*(D-E)

Using accumulator:

Load B ; acc = B; opcode=1 B, oper=2B; mem.traf = 2 B

Sub C ; acc = B – C; 1 byte opcode, 2-bytes operand = 3 bytes

Store A ; A = B -C; 1 byte opcode, 2-bytes operand = 3 bytes

Load D ; acc = D; 1 byte opcode, 2-bytes operand = 3 bytes

Sub E ; acc = D -E; 1 byte opcode, 2-bytes operand = 3 bytes

Mul A ; acc = (B-C)* (D-E) ; 1 byte opcode, 2-bytes operand = 3 bytes

Store A ; A = (B-C) * (D-E) ; 1 byte opcode, 2-bytes operand = 3 bytes

Assume each opcode takes 8-bit, operands are 16-bit and data is move in 16-bit chunks:

Total bytes: 7 * 3 byte = 21 bytes

Memory traffic: 21 + 7 * 2 = 35 bytes

Using load-store:

Ld B, r1 ; r1 = B; opcode=1 B, op = 2 B; mem. traff = 2 bytes

Ld C, r2 ; r2 = C; memory traffic = 2 bytes

Sub r1, r2,r3 ; r3 = r1 * r2 = B – C; size=2 bytes, memory traffic = 0

Ld D, r1 ; r1 = D; memory traffic = 2 bytes

Ld E, r2 ; r2 = E; memory traffic = 2 bytes

Sub r1,r2,r4 ; r4 = r1 * r2 = D-E; memory traffic = 0 bytes

Mul r3, r4, r1 ; r1 = r3 * r4 = (B-C) * (D-E); memory traffic = 0 bytes

St r1,A ; A = r1 = (B-C) * (D-E); memory traffic = 2 bytes

Number of bytes for instructions: 3*7 = 21 bytes

Memory traffic: 21 + 5 *2 = 31 bytes

Using stack:

Push B ; mem. Traf = 2 bytes

Push C ; mem. Traf = 2 bytes

Sub ; B – C; mem. Traf. = 2 bytes

Push D ; mem traf = 2 bytes

Push E ; mem.traf = 2 bytes

Sub ; top of the stack contains D-E, below it is B-C

Mul ; top of the stack now contains (B-C) * (D-E)

Each instruction occupies 3 bytes (1 byte opcode and 2-byte operand), so total instructions use 3*7 = 21 bytes.

Memory traffic: 21 + 2*7 = 35 bytes

Register 0, 1, 2 (R0, R1, R2) are GPRs. R3 is initialized to 1 and make sure it is unchanged.
1. Control sequence that forms the 2’s complement difference (or, R0 ← R0 + (-R1) = R0 + 2^’s complement of R1) of the contents of R0 and R1:

Write Enable				A-bus enables				B-bus enables				F₀F₁		Time
0	1	2	3	0	1	2	3	0	1	2	3	F₀	F₁	T
0	1	0	0	0	1	0	0	0	0	0	0	1	1	0
0	1	0	0	0	1	0	0	0	0	0	1	0	0	1
1	0	0	0	1	1	0	0	1	0	0	0	0	0	2

Explanation:

At T=0:

F₀F₁ = 11 (A’), A-bus enable = 1, B-bus enable = all zeros. Data (say, Y) is placed in A-bus. C-bus contains Y’, Write-enable = pin 1. So Y is written to register 1 (R1 = Y’).

At T = 1:

F₀F₁ = 00 (add), A-bus enable = pin 1 (content of R1 which is X, is in A-bus), B-bus enable= pin 3 (content of R3 which is 1 is in B-bus), write-enable = pin 1 (C-bus contains Y’ + 1). So R1 = Y’ + 1 = 2’s compl. Of X.(or R1 = -Y)

At T = 2

B-bus = 0, hence data from R0, say X, is placed in B-bus. F₀F₁ = 00 (add), A-bus enable = 1 (which contains –Y). C-bus contains R0 + R1 = X - Y . Write-enable = pin 0, so result is stored in R0.

R0 XOR R1 (R0’ & R1 + R0 & R1’) = (R0 + R1)(R0’ + R1’), leaving the result in R0.

Write Enable				A-bus enables				B-bus enables				F₀F₁		Time	Note
0	1	2	3	0	1	2	3	0	1	2	3	F₀	F₁	T	(Movements etc.)
0	0	1	0	1	0	0	0	0	1	0	0	0	0	0	R2 ← A + B
1	0	0	0	1	0	0	0	0	0	0	0	1	1	1	R0 ← A’
0	1	0	0	0	1	0	0	0	0	0	0	1	1	2	R1 ← B’
0	1	0	0	1	0	0	0	0	1	0	0	0	0	3	R1 ← A’ + B’
1	0	0	0	0	1	0	0	0	0	1		0	1	4	R0 ← (A+B)&(A’+B’)

Booth Multiplication Algorithm:

/***********************************************************
Title: Booth Algorithm sample
Desc: To simulate booth multiplication algorithm
Author: Buhadram
Notes:

Ref:
- Computer Architecture: A Quantitative Approach
*************************************************************/
#include 
#include 
#include 
#include 

#define NUMTYPE	   	short int
#define NUMBITS		(8*sizeof(NUMTYPE))
#define MAX(x,y) 		((x) > (y) ? x : y)
#define MIN(x,y) 		((x) < (y) ? x : y)
#define BITMASK(x,mask) 	((x) & mask)
/*********************************
proc: bitmask
desc: return 1 if bit[pos] is set
      pos=0 for LSB, and p=sizeof(x)-1
      for MSB
*********************************/
#define BIT(x,pos)		( (BITMASK(x,1<<(pos))) >> (pos) )
#define ASCII_BIT(x)	((x) + '0')

char *get_input(const char *prompt, char *buf, int length)
{
	char *p;
	int i;
    printf("%s",prompt);
	// fgets is safer to avoid buffer overflow
	p = fgets(buf, length, stdin);
	// remove newline character
	buf[strlen(buf)-1] = 0;
	return p;
}

/********************************************
* Swapbits
* Desc: To swap bit orders, e.g LSB becomes MSB
        Note: b[0] is LSB, b[n-1] is MSB
********************************************/
void swapbits(NUMTYPE *b)
{
    int i;
    int tmp;

	tmp = 0;
	for(i=0; i base)
       base = 10;                    /* can only use 0-9, A-Z        */
    tail = &buf[BUFSIZE - 1];           /* last character position      */
    *tail-- = '\0';

    if (10 == base && N < 0L) {
	   *head++ = '-';
       uarg = -N;
    }
    else uarg = N;

    if (uarg) {
  	   	for (i = 1; uarg; ++i) {
       	   register ldiv_t r;
		   r = ldiv(uarg, base);
           *tail-- = (char)(r.rem + ((9L < r.rem) ? ('A' - 10L) : '0'));
           uarg    = r.quot;
    	}
    }
    else  *tail-- = '0';
	memcpy(head, ++tail, i);
    return str;
}

#endif


/******************************
proc: btol
desc: converts binary string to
	  long integer
******************************/
long btol(const char *bin)
{
    int n,i;
    long d;

	n = strlen(bin);
	d = 0;
	for(i=n-1; i>=0; i--) {
		d += (bin[i]-'0') * (1 << i);
    }
    return d;
}

/**************************************
Proc: Ashift_bits_right
desc: perform arithmetic shift right on
    bits in x, but preserving the sign bit.
***************************************/
void ashift_bits_right(long *x, int numbits)
{
    int neg;

    neg = (*x<0) ? 1 : 0;
    *x >>= numbits;

    if (neg)
       //*x |= (1<>= 1;
		qm = q0;
		q >>= 1;
		if (a0)
		   r |= b1;
		b1 <<= 1;
    }
    return r;
}


int main(int argc, char *argv[])
{
	#define BUF_SIZE	(1024+1)
	char *buf;
	NUMTYPE A, B;
	long res;
	char str[256];

	buf = (char *)malloc(BUF_SIZE+1);
	assert(buf);
    get_input("A = ", buf, BUF_SIZE);
    A = atol(buf);
    printf("A (in Binary) = %s\n", ltoa(A, str, 2));
    get_input("B = ", buf, BUF_SIZE);
    B = atol(buf);
    printf("B (in Binary) = %s\n", ltoa(B, str, 2));
    res = A+B;
    printf("A + B = %ld = 0x%0X\n", res, res);
    res = A - B;
	printf("A - B = %ld = 0x%0X\n", res, res);
    puts("\n\nBOOTH MULTIPLICATION\n");
    res = booth_mult(A,B);
	printf("A * B = %ld = 0x%0X = %sb\n", res, res, ltoa(B,str,2));
    free(buf);
	return 0;
}

Friday, March 9, 2012

Counting the number of "1" bits

Another "ridiculous" interview I had was when an interviewer asked me how to count the number of set bits in a 32-bit or 8-bit. I told him that there are various algorithms to do this, but the easiest way but yet good enough in performance is a function like this:

uint32_t count_bits(uint32_t b)
{
    int i;
    int n = 0;
    
    for(i=0; i<sizeof(b)*8; i++)
    {
        n += (b & 1);
        b >>= 1;
    }   
    return n;
}

I also told him that there are many ways to do this (referred him to look at "Beautiful Code" to see some of the beautiful algorithms).

But then the interviewer argued. FIrst, he said the upper border of the for-loop should be to "sizeof(b)*8" (meaning, instead of using "<", I should have used "<="). To bad, I said yes to him without thinking further (Probably got nervous. Later at home, I checked that he was wrong). Second, he said that the fastest is to use look-up table (n[0] = 0, n[1]=1, n[2]=1,...,n[255]=8,...). Although I agreed with his argument, but then he become inconsistent with his original question and there is a big downside of his argument. I said that this wouldn't be a choice in embedded programming where memory resource is limited. Secondly, his original question was to compute ones in a 32-bit number. That means we have to 2^32 (4,294,967,296 or about 4 GBytes) items set manually in an array. That's so ridiculous. Who wants to waste that much space just to calculate the number of bits in an integer? For a byte, although it seems acceptable ("only" 256 bytes of table), but compare that to the above algorithm:

movl $32,  %edx
 xorl %ecx, %ecx

.L2:
 movl %ecx, %ebx
 shrl %ecx
 andl $1,   %ebx
 addl %ebx, %eax
 decl %edx
 jne .L2

It's only 7 x86 mnemonic instructions (about 12-18 bytes) and no memory involvement in the loop!. Yes, it is O(32) for 32-bit, but it's a constant speed. Even further, if the function is called only once, the fully optimized version of the instructions would be inserted inline in the place (without calling the function), so we can CPU time from doing push-pop stack frames. That's another tiny speedup. It's probably fine for table lookup (yes, it's the fastest, O(1)), but it is limited to 8-bit. For 16 or higher, seems the space-time trade-off is too big not to consider. Besides, the above algorithm extendable into polymorphism in Object-Oriented code.

When I searched Google, I found the following (on Stanford's web):

v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count

Which renders to instruction codes like below:

shrl %edx
 andl $1431655765, %edx
 subl %edx, %eax
 movl %eax, %edx
 shrl $2, %eax
 andl $858993459, %edx
 andl $858993459, %eax
 addl %edx, %eax
 movl %eax, %edx
 shrl $4, %edx
 leal (%edx,%eax), %eax
 andl $252645135, %eax
 imull $16843009, %eax, %eax
 shrl $24, %eax

(14 mnemonics). This seems the fastest and most optimized. It took some time for me to understand it.

Seems the interviewer doesn't have enough experience in embedded or microcontroller or doesn't think much about optimization. Nevertheless, I failed an interview again. I start to think that many interview questions are really not to dig the thinking ability of a candidate, but rather to get instant answers meet their mind. I just hope big software companies like IBM, Microsoft, Google or Apple shouldn't fall to this kind of trap, but rather ask logical questions to know if the candidate can think systematically.

What a waste!

Thursday, March 8, 2012

Parallelizable integer multiplication

At a recent interview with an Amazon, the hiring manager asked me about an algorithm to do integer multiplication without using any multiplication/division symbol in parallel computing. The following algorithm was suggested to do integer multiplication without using any "MULT" instruction:

int mult(int x, int y)
{
 int z;

 if ((x==0) || (y==0))
  return 0;
 if (x==1)
  return y;

 if (y==1)
  return x;

 z = mult(x, y>>1);
 if (y % 2 == 0)
  return z<<1;
 else
  return x + (z<<1);
}

My question came up, is this parallelizable? My argument to the interviewer was that this is not a good approach if he is asking from parallel computation perspective (See Robertazzi's papers on ACM journals to know more detail about parallel computation)

When a recursive call is made, the parameters are pushed into the stack before calling each subsequence recursive call. Involving stack in this manner is hardly parallelizable, because there is coupling of current result based on previous results. One of the principle of parallel computing is to decouple two or more computation as much as we can.

A dig into highly-optimized assembly language of the above code:

mult:
 pushl %ebp
 movl %esp, %ebp
 subl $24, %esp
 movl %ebx, -8(%ebp)
 movl %esi, -4(%ebp)
 movl 12(%ebp), %ebx
 movl 8(%ebp), %esi
 testl %ebx, %ebx
 je .L4
 testl %esi, %esi
 je .L4
 cmpl $1, %esi
 je .L2
 cmpl $1, %ebx
 .p2align 4,,3
 je .L5
 movl %ebx, %eax
 movl %esi, (%esp)
 sarl %eax
 movl %eax, 4(%esp)
 call mult
 andl $1, %ebx
 je .L7
 leal (%esi,%eax,2), %ebx
.L2:
 movl %ebx, %eax
 movl -4(%ebp), %esi
 movl -8(%ebp), %ebx
 leave
 ret
 .p2align 4,,10
 .p2align 3
.L4:
 xorl %ebx, %ebx
 movl -4(%ebp), %esi
 movl %ebx, %eax
 movl -8(%ebp), %ebx
 leave
 ret
 .p2align 4,,10
 .p2align 3
.L7:
 leal (%eax,%eax), %ebx
 movl -4(%ebp), %esi
 movl %ebx, %eax
 movl -8(%ebp), %ebx
 leave
 ret
 .p2align 4,,10
 .p2align 3
.L5:
 movl %esi, %ebx
 movl -4(%ebp), %esi
 movl %ebx, %eax
 movl -8(%ebp), %ebx
 leave
 ret

As can be seen, "SARL" (Shift Arithmetic Left) for the shift-left operation. Some issues after analyzing this code. First, there are multiple register-to-memory flow while doing the recursive:

movl %ebx, %eax
 movl %esi, (%esp)
 sarl %eax
 movl %eax, 4(%esp)

Memory access is expensive and the instructions might cause cache miss thus forcing CPU to reread data from RAM.
Another issue, how do we parallelize this, if there is stack push-pop coupling for the result?

My argument was to decompose the computation into two (or more) portions.

For example:

int pivot = b/2;

for(i=0; i<pivot; i++)
    sum1 += a;

for(i=pivot+1; i<b; i++)
    sum2 += a;
sum = sum1 + sum2;

No, let's check what we can see in the x86 assembly (gcc-generated):

mult:
 pushl %ebp
 movl %esp, %ebp
 subl $8, %esp
 movl 12(%ebp), %eax
 movl %ebx, (%esp)
 movl %esi, 4(%esp)
 movl 8(%ebp), %edx
 testl %eax, %eax
 je .L5
 testl %edx, %edx
 je .L5
 cmpl $1, %edx
 je .L2
 cmpl $1, %eax
 .p2align 4,,3
 je .L6
 movl %eax, %ecx
 xorl %esi, %esi
 shrl $31, %ecx
 xorl %ebx, %ebx
 addl %eax, %ecx
 sarl %ecx
 testl %ecx, %ecx
 jle .L3
 movl %ecx, %esi
 movl %ecx, %ebx
 imull %edx, %esi
.L3:
 xorl %ecx, %ecx
 cmpl %ebx, %eax
 jle .L4
 movl %eax, %ecx
 subl %ebx, %ecx
 imull %edx, %ecx
.L4:
 leal (%ecx,%esi), %eax
.L2:
 movl (%esp), %ebx
 movl 4(%esp), %esi
 leave
 ret
 .p2align 4,,10
 .p2align 3
.L5:
 xorl %eax, %eax
 movl (%esp), %ebx
 movl 4(%esp), %esi
 leave
 ret
 .p2align 4,,10
 .p2align 3
.L6:
 movl %edx, %eax
 jmp .L2

As can be seen, most of operations are in registers, also there is decoupling between 2 partitions (e.g, sum1 does not rely on sum2). The more we partition the calculation (e.g, pivot1 = b/4, and do 4 for-next loops for each partition), the more divisible partition can be accomplished by parallel computer. I was arguing that to optimize computation, hence lowering O(n), we should use divisible loops ("divide-et-empera") instead of recursive approach (seems the interviewer disagreed)

PS: My argument above caused me not to land job at Amazon.