Wednesday, December 28, 2005

Some Optimization Flags on GCC

Yesterday, I was playing around with some optimizations options supplied by GCC (ver 4.0.2) on my AMD64 laptop. I created a very small code to compute sin(x) * cos (x). As we know, Pentium and newer processors support a FP machine code to compute these two functions simulatenously in single mnemonic code: FSINCOS.

the code is as follow:

#include

inline double fsincos(double x)
{
return sin(x)*cos(x);
}

First, I compiled it to assembly source with no optimization enabled, just using defaults as follow:

> gcc -c -S fsincos.c -o fsincos.s

and then open the generated assembly code in fsincos.s. Overthere I saw somewhere it called internal library functions for sin and cos trigonometry functions as I expected. I then recompiled it with full optimization flags enabled:

> gcc -c -S -mtune=athlon64 -mfpmath=sse -msse3 -ffast-math -m64 -O3 fsincos.c -o fsincos64.s

Suprisingly, it still called GNU's "sin" and "cos" math functions!. What? I said (not loud, though). I checked again the gcc man page, nothing special about these things. Hm....let's try to add "387" in the mfpmath, I said in my mind.

So, the command is now as follow:

gcc -c -S -mtune=athlon64 -mfpmath=387,sse -msse3 -ffast-math -m64 -O3 fsincos.c -o fsincos64.s

Voila! now the assembly code called inline FP code "fsincos" in it. Since then, I add an environment variable in the enviroment as follow:

export CFLAGS="-mtune=athlon64 -mfpmath=387,sse -msse3 -ffast-math -m64"
export CPPFLAGS=$CFLAGS

X86_64 Assembly Nightmare

A few days ago I was trying to compile mpg123 application. It is a command-line MP3 player. I begin with command:

make help

It gave some options. First I tried "make linux", still failed to compile with a bunch of errors. I then tried "make linux-3dnow", or "make linux-3dnow-alsa". All of them failed to compile at some assembly files (*.s). I look at one of the assembly-code files, and saw that the codes were designed for 32bit platform, while my laptop was running x86_64 Suse v10.0 (64-bit Linux). I did change "pushl %ebp" etc. to "push %rbp" etc., the compiled went OK, but when I tried to run the program, it crashed with segmentation fault message.

Hmm...I then went to AMD website and download all the documentation. These PDF documents are huge (each PDF file has more than 300 pages). I started reading one of them. Will post any progress later.

Stay tuned!

64-Bit Power Struggle Heats Up

AMD still is the winner, at least for now.

http://www.eweek.com/print_article2/0,1217,a=167278,00.asp

Sunday, December 18, 2005

Fixes on FFTW for 64bit

I just downloaded FFTW library from http://www.fftw.org and tried to compile it with full optimizations and 64bit-enable flag. Many files were compiled OK, but when GCC tried to compile sse2.c, it failed with complaints:

Error: suffix or operands invalid for `push'
Error: suffix or operands invalid for `pop'
When I digged into the code I saw that it was using 32bit register ebx. I then changed it to rbx (64bit version of BX) and recompile it, it works.

Still need to test it rigorously whether the would inadvertent effect with all the optimizations. By the way, here is my optimization flags to GCC compiler:

CFLAGS="-mtune=athlon64 -msse2 -mfpmath=sse -ffast-math -m64 -O3"