assembler | Arabesque

This entry is part 4 of 7 in the series Unix as IDE.

There are a lot of tools available for compiling and interpreting code on the Unix platform, and they tend to be used in different ways. However, conceptually many of the steps are the same. Here I’ll discuss compiling C code with gcc from the GNU Compiler Collection, and briefly the use of perl as an example of an interpreter.

GCC

GCC is a very mature GPL-licensed collection of compilers, perhaps best-known for working with C and C++ programs. Its free software license and near ubiquity on free Unix-like systems like GNU/Linux and BSD has made it enduringly popular for these purposes, though more modern alternatives are available in compilers using the LLVM infrastructure, such as Clang.

The frontend binaries for GNU Compiler Collection are best thought of less as a set of complete compilers in their own right, and more as drivers for a set of discrete programming tools, performing parsing, compiling, and linking, among other steps. This means that while you can use GCC with a relatively simple command line to compile straight from C sources to a working binary, you can also inspect in more detail the steps it takes along the way and tweak it accordingly.

I won’t be discussing the use of make files here, though you’ll almost certainly be wanting them for any C project of more than one file; that will be discussed in the next article on build automation tools.

Compiling and assembling object code

You can compile object code from a C source file like so:

$ gcc -c example.c -o example.o

Assuming it’s a valid C program, this will generate an unlinked binary object file called example.o in the current directory, or tell you the reasons it can’t. You can inspect its assembler contents with the objdump tool:

$ objdump -D example.o

Alternatively, you can get gcc to output the appropriate assembly code for the object directly with the -S parameter:

$ gcc -c -S example.c -o example.s

This kind of assembly output can be particularly instructive, or at least interesting, when printed inline with the source code itself, which you can do with:

$ gcc -c -g -Wa,-a,-ad example.c > example.lst

Preprocessor

The C preprocessor cpp is generally used to include header files and define macros, among other things. It’s a normal part of gcc compilation, but you can view the C code it generates by invoking cpp directly:

$ cpp example.c

This will print out the complete code as it would be compiled, with includes and relevant macros applied.

Linking objects

One or more objects can be linked into appropriate binaries like so:

$ gcc example.o -o example

In this example, GCC is not doing much more than abstracting a call to ld, the GNU linker. The command produces an executable binary called example.

Compiling, assembling, and linking

All of the above can be done in one step with:

$ gcc example.c -o example

This is a little simpler, but compiling objects independently turns out to have some practical performance benefits in not recompiling code unnecessarily, which I’ll discuss in the next article.

Including and linking

C files and headers can be explicitly included in a compilation call with the -I parameter:

$ gcc -I/usr/include/somelib.h example.c -o example

Similarly, if the code needs to be dynamically linked against a compiled system library available in common locations like /lib or /usr/lib, such as ncurses, that can be included with the -l parameter:

$ gcc -lncurses example.c -o example

If you have a lot of necessary inclusions and links in your compilation process, it makes sense to put this into environment variables:

$ export CFLAGS=-I/usr/include/somelib.h
$ export CLIBS=-lncurses
$ gcc $CFLAGS $CLIBS example.c -o example

This very common step is another thing that a Makefile is designed to abstract away for you.

Compilation plan

To inspect in more detail what gcc is doing with any call, you can add the -v switch to prompt it to print its compilation plan on the standard error stream:

$ gcc -v -c example.c -o example.o

If you don’t want it to actually generate object files or linked binaries, it’s sometimes tidier to use -### instead:

$ gcc -### -c example.c -o example.o

This is mostly instructive to see what steps the gcc binary is abstracting away for you, but in specific cases it can be useful to identify steps the compiler is taking that you may not necessarily want it to.

More verbose error checking

You can add the -Wall and/or -pedantic options to the gcc call to prompt it to warn you about things that may not necessarily be errors, but could be:

$ gcc -Wall -pedantic -c example.c -o example.o

This is good for including in your Makefile or in your makeprg definition in Vim, as it works well with the quickfix window discussed in the previous article and will enable you to write more readable, compatible, and less error-prone code as it warns you more extensively about errors.

Profiling compilation time

You can pass the flag -time to gcc to generate output showing how long each step is taking:

$ gcc -time -c example.c -o example.o

Optimisation

You can pass generic optimisation options to gcc to make it attempt to build more efficient object files and linked binaries, at the expense of compilation time. I find -O2 is usually a happy medium for code going into production:

gcc -O1
gcc -O2
gcc -O3

Like any other Bash command, all of this can be called from within Vim by:

:!gcc % -o example

Interpreters

The approach to interpreted code on Unix-like systems is very different. In these examples I’ll use Perl, but most of these principles will be applicable to interpreted Python or Ruby code, for example.

Inline

You can run a string of Perl code directly into the interpreter in any one of the following ways, in this case printing the single line “Hello, world.” to the screen, with a linebreak following. The first one is perhaps the tidiest and most standard way to work with Perl; the second uses a heredoc string, and the third a classic Unix shell pipe.

$ perl -e 'print "Hello world.\n";'
$ perl <<<'print "Hello world.\n";'
$ echo 'print "Hello world.\n";' | perl

Of course, it’s more typical to keep the code in a file, which can be run directly:

$ perl hello.pl

In either case, you can check the syntax of the code without actually running it with the -c switch:

$ perl -c hello.pl

But to use the script as a logical binary, so you can invoke it directly without knowing or caring what the script is, you can add a special first line to the file called the “shebang” that does some magic to specify the interpreter through which the file should be run.

#!/usr/bin/env perl
print "Hello, world.\n";

The script then needs to be made executable with a chmod call. It’s also good practice to rename it to remove the extension, since it is now taking the shape of a logic binary:

$ mv hello{.pl,}
$ chmod +x hello

And can thereafter be invoked directly, as if it were a compiled binary:

$ ./hello

This works so transparently that many of the common utilities on modern GNU/Linux systems, such as the adduser frontend to useradd, are actually Perl or even Python scripts.

In the next post, I’ll describe the use of make for defining and automating building projects in a manner comparable to IDEs, with a nod to newer takes on the same idea with Ruby’s rake.

Systems, Tools, and Terminal Science

Tag Archives: assembler

Unix as IDE: Compiling

GCC