Crash Course on Mixing C and Assembly on Linux/x86

Editor’s Note: This article is designed to get you thinking a bit about assembly on i386 machines, and to provide an example of calling a function under the x86 calling convention. It’s not comprehensive enough to serve as a thorough tutorial. Look here, or here for a more comprehensive introduction.

Tinkering with assembly code is a great way to learn how code compiles and runs, and it gives you insight into writing better code. It’s probably easier, and frankly more useful, to insert some carefully crafted assembly code into a C program at just the right place. However, you learn more about the machine and the way your code is stitched together if you call some C code from an assembly program, which is what we’ll do here.

Let’s think a bit about what we need to do to mix C code into assembly. As you probably already know, when you compile a C program, the compiler produces binary code that you can run on the CPU. When you assemble an assembly program, the assembler makes the same type of binary code as well. You can pretty easily mix these two types of code to create a coherent program that you can run. Doing this type of stuff is useful for making highly optimized code, dealing with embedded devices, writing device drivers, or developing low-level things in general. Here, we’re just doing it to tinker. 🙂

Now, here’s the basic plan:

  1. Write some C code that does something useful. I chose a simple calculation of the Fibonacci sequence.
  2. Write some assembly code that calls the C code we wrote.
  3. Use a compiler to make binary code from the C code.
  4. Use an assembler to make binary code from the assembly code.
  5. Use a linker to stitch these two chunks of binary code together into an executable.
  6. Run the executable!

The C code is pretty straightforward.

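//filename: fib.c
//Computes a Fibonacci number iteratively. With this indexing,
//fib_linear(1) is 1, fib_linear(2) is 2, and fib_linear(9) is 55.
int fib_linear(int fib_num) {
    int i, a, b, tmp;
    a = 1;
    b = 1;
    for (i = 0; i < fib_num - 1; i++)
    {
        tmp = a + b;   //next number in the sequence
        a = b;
        b = tmp;
    }
    return b;
}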

The assembly code, shown further down, uses only four instructions: mov, push, call, and int. mov copies a value (here, an integer constant) into a register. push puts a value, such as the contents of a register, onto the stack, a special region of memory each program needs. call changes the flow of execution by calling a function. int raises a software interrupt, which is used at the very end to ask the Linux kernel to exit the program. (In case you forgot, assembly code pretty much shuffles data between registers: http://en.wikipedia.org/wiki/Hardware_register. x86 has, essentially, 4 registers most people use for data, called EAX, EBX, ECX, and EDX.)

The last question here is “how do we call a function from an assembly file?” By the C calling convention normally used on 32-bit x86 Linux (often called cdecl), the result of the function is returned in the EAX register. According to the convention, the function is free to trash EAX, ECX, and EDX, but not EBX. It is common in all architectures to allow some registers to be trashed (caller-save registers) and some not to be trashed (callee-save registers). This allows for greater interoperability between code. In order to call the function, we need to load in the arguments to the function, then call the function. Arguments go onto the stack from right to left, so if you want to call foo(x,y,z), you’d push z, push y, push x, then call foo. When the function finishes, look for the result in EAX. Under this convention the caller is also responsible for removing the arguments from the stack afterwards, for example by adding to ESP. Don’t count on EAX, ECX, and EDX to be unchanged after the function call. It’s just that simple.

So, here’s the actual assembly code you can look at!

;filename: main.asm
extern fib_linear ;extern notifies the assembler that this label exists outside the file
extern printf

section .data
        msg:    db "eax=%X   ebx=%X   ecx=%X   edx=%X", 10, 0 ;format string for printf: newline, then the 0 byte that ends a C string
section .text
        global main

main:
        mov ebx, 9   ;we will be computing the 9th fib number
        mov eax, 0xbeef ;arbitrary, easy to recognize values
        mov ecx, 0xdead
        mov edx, 0xface

        push edx     ;load in 5th arg
        push ecx     ; 4th arg
        push ebx     ; 3rd arg
        push eax     ; 2nd arg
        push msg     ; 1st arg
        call printf ;same as calling "printf("eax= ...", eax, ebx, ecx, edx);"
        add esp, 20  ;caller removes the 5 arguments (5 * 4 bytes)

        push ebx ;load in 1st argument
        call fib_linear ;calls "fib_linear(ebx)"
        add esp, 4   ;remove the single argument

        push edx
        push ecx
        push ebx
        push eax
        push msg
        call printf     ;print out the values of the 4 registers
        add esp, 20

        ;note: exiting through the kernel skips C library cleanup, so buffered
        ;printf output may be lost if stdout is not a terminal
        mov eax, 1      ;system call number for exit
        mov ebx, 0      ;exit status 0
        int 80h         ;ask the kernel to end the program

To test this out, simply run the commands:

nasm -f elf -o main.o main.asm  #assemble our asm file
gcc fib.c main.o -o fib_asm     #compile and link in one step (on a 64-bit system you will likely need extra flags such as -m32)
./fib_asm                       #run the program; you'll see the values of the registers before and after the call

There you go! You’ve just made your first hybrid x86 assembly/C program! I hope this short crash course has got you a bit interested in assembly. If you want a full tutorial you can look here, or just google around a bit. Happy hacking!


5 Responses to Crash Course on Mixing C and Assembly on Linux/x86

  1. Victor says:

    Seems like something got messed up:
    “for(i=0; i< fib_num-1; i++)”

    Pardon me if I’m too bold in assuming you mean “i < fibnum-1". 😉

  2. Victor says:

    Weeeell… that first “less-than” is supposed to be an ampersand in my comment… but not in your code…

    Agh, you see what it looks like in the first code block in your own dang post! 😀

  3. Pingback: Links 17/1/2010: New Pardus, Puredyne GNU/Linux | Boycott Novell

  4. zammi says:

    Well, I’m trying to read what you wrote on your home page.
    I was confused about assembly, and now I’m starting to learn basic assembly.
    Please, can you give me a guide to the basics of assembly? I’m using the BackTrack operating system for education, focusing on assembly, and I’m still a newbie.
    pardon me…
    regards
    zammi

  5. Pingback: Ideas to re-sharpen your skills | fossline
