Nobody Cares About Your Makefile

GNU Make is a relatively modern version of one of the oldest build tools still in regular use. Make has existed since 1977 and is still the standard tool for building native software in Unix-derived environments. The pattern Make established, using a directed graph to automate builds, is still in use in more recent tools.

make understanding

Make’s basic approach to building software is based on walking directed graphs. Each node of a Makefile’s graph represents a file, and the arrows point to other nodes representing dependencies. Make begins at the nodes named on the command line and follows the arrows to determine which other nodes to build first.
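The walk described above can be sketched in a few lines of C. This is an illustration, not Make's actual implementation: the node struct, MAX_DEPS, and build_order are hypothetical names invented for the example. The function visits each dependency before the node that needs it, so the list it produces is a valid build order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 8  /* arbitrary limit chosen for this sketch */

/* One node of the graph: a file plus the files it depends on. */
struct node {
    const char *name;
    struct node *deps[MAX_DEPS];
    size_t dep_count;
    int visited;  /* set once the node has been queued for building */
};

/* Depth-first walk starting at `target`. Appends file names to `order`
   so that every dependency appears before the node that needs it, and
   returns the new number of entries in `order`. */
size_t build_order(struct node *target, const char **order,
                   size_t count, size_t capacity)
{
    assert(target != NULL && order != NULL);

    if (target->visited)
        return count;  /* already queued through another path */
    target->visited = 1;

    /* Follow the arrows first: dependencies are built before the target. */
    for (size_t i = 0; i < target->dep_count; i++)
        count = build_order(target->deps[i], order, count, capacity);

    assert(count < capacity);
    order[count] = target->name;
    return count + 1;
}
```

For a target with two dependencies, the function lists both dependencies first and the target last. A node reachable through several paths is queued only once.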

Make reads the graph, along with the actions required to build any files it decides need building, from a text file called a Makefile, which has a very simple structure. Lines like

alice: bob gerald

specify that a file named alice depends on files named bob and gerald. Dependency rules can also include shell commands to perform when building alice:

alice: bob gerald
	cat bob gerald > alice

Makefiles are an elegant way to describe the process of building native programs, but on their own they’re insufficient and inefficient for larger projects.

make hit-and-miss

Make grew out of the Unix culture. Make itself is designed to run from a shell, which makes it very easy to automate; Make editing modes and execution plugins exist for every worthwhile Unix text editor and Make itself forms the backbone for a number of Unix IDEs. In turn, Make uses a list of shell commands to execute at each step, which makes it very easy to extend—in fact, Make has no way to compile or package code on its own, relying on a toolchain like GCC for the finer details.

The same culture also popularized the C language, which requires both compilation into intermediate object files and linking into the final program. Because compiling source code can take a while for large projects, Make has features to support incremental recompilation. When Make encounters a node whose file is newer than the files of all dependency nodes, it skips the build stage for that file on the assumption that it’s already up to date. This means that running the same Make command twice will usually do different, but equivalent, things: the first one will build the entire dependency graph from source, and the second one will notice that everything’s up to date and do nothing.
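The up-to-date check can be sketched in C as a comparison of modification times. This is a simplification under stated assumptions: is_out_of_date is a hypothetical name, timestamps are whole seconds of the kind stat() reports in st_mtime, and a target that does not exist is always considered stale.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Decide whether a target must be rebuilt.
   target_exists: 0 if the target file is missing, nonzero otherwise.
   target_mtime:  modification time of the target (ignored if missing).
   dep_mtimes:    modification times of the listed dependencies.
   Returns 1 if the target is stale, 0 if the build step can be skipped. */
int is_out_of_date(int target_exists, time_t target_mtime,
                   const time_t *dep_mtimes, size_t dep_count)
{
    assert(dep_mtimes != NULL || dep_count == 0);

    if (!target_exists)
        return 1;  /* nothing to reuse, so it must be built */

    for (size_t i = 0; i < dep_count; i++) {
        if (dep_mtimes[i] > target_mtime)
            return 1;  /* a dependency changed after the last build */
    }
    return 0;  /* target is at least as new as every listed dependency */
}
```

Only dependencies that appear in the list are compared, so a file the Makefile does not mention can change without triggering a rebuild.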

On the surface, this is a safe optimization: if your source files haven’t changed, and your compile process is deterministic, then you don’t need to rebuild the intermediate or final products. However, specifying the project’s dependency graph correctly is not as easy as it appears.

Consider the following C program, composed of two source files and a header:

logging.h

#ifndef LOGGING
#define LOGGING

void log_message (const char *message);

#endif

logging.c

#include <stdio.h>
#include "logging.h"

void log_message (const char *message) {
    printf ("Log message: %s\n", message);
}

main.c

#include "logging.h"

int main () {
    log_message ("Hello, world.");
    return 0;
}

The final product, an executable named example-1, has a dependency graph shaped like this:

[Figure: example-1 depends on main.o and logging.o. Both objects depend on the header file and one source file.]

Unfortunately, Make’s code-agnostic nature makes it unable to spot the dependencies on logging.h without help, giving this graph:

[Figure: the same graph without the edges from the objects to logging.h. Make's stock rules are unaware of included files.]

Adding a new feature to the “logging” library by modifying the source and header will recompile both logging.o and the final program, but not main.o. Depending on the kind of changes involved, this could be harmless, or it could lead to some fairly strange bugs—bugs that magically disappear if you build from a clean environment.

For a trivial project like this one, it was easy enough to specify the header dependencies manually in the Makefile I used to test this example:

example-1: main.o logging.o
	$(CC) $(LDFLAGS) $^ -o $@

main.o: main.c logging.h

logging.o: logging.c logging.h

In a real project, however, there can be tens or hundreds of header files, and manually maintaining the list of header files in two places (in the source itself, and in the Makefile) is tedious and error-prone. There is no good, general-case solution for this, but there are language-specific tools that can parse a source tree and spit out a list of include files in a format Make can understand. Using these tools is one more piece of complexity in the build and one more obstacle to maintenance, compounded by the Unix philosophy of gluing tools together.

GNU Make’s manual suggests this snippet for dependency analysis of C and C++ projects:

%.d: %.c
	@set -e; rm -f $@; \
	 $(CC) -M $(CPPFLAGS) $< > $@.$$$$; \
	 sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' < $@.$$$$ > $@; \
	 rm -f $@.$$$$

include *.d

…clear as mud.

Various tools have grown up around Make that automate this and other kinds of configuration, from various dependency scanning tools to the GNU Autotools suite, used by almost every recent Unix-and-C project. These tools generate Makefiles that are about as comprehensible as that last snippet and degrade both the portability and the simplicity of the build system.

Make’s own tools for streamlining builds aren’t much better: “pattern” rules, which are made out of a combination of a wildcard matching language and special variables in the Makefile, can bridge gaps in dependency trees and allow Make to infer intermediate steps from the sources available and the target to be built. Patterns can also standardize builds by packaging up common steps; the pattern rule Make provides for compiling C source files takes advantage of this:

%.o: %.c
	$(CC) -c $(CPPFLAGS) $(CFLAGS) $< -o $@

In a large project that makes heavy use of patterns, though, it rapidly becomes impossible to determine which source files will be rebuilt until you actually go to run the build.

make finished

Designing your project to play to Make’s strengths can create simple, readable, fast builds. However, even with the voluminous documentation on how to use Make available on the web, it can be hard to keep your build within Make’s outdated build model in the face of complex projects. If you plan to use Make, find someone who’s already experienced with it before setting out; it’s not simple enough for an inexperienced team to use effectively, and by the time they can use it to its fullest, they’ll be experts and your project will be late.

I’ve started with Make because it was one of the first tools to formalize the idea of a build process. While it hasn’t aged particularly well, and its support for comfort features and chrome is weak compared to modern offerings, Make is still a competent tool and does well at three of the fundamentals of a build system: automatability, standardization, and extensibility.

Edited Nov. 18, 2008: corrected the name of the header in #include directives in the example. Thanks, Plouj!

7 Comments

  • By kit, October 1, 2008 @ 10:01 pm

    it must be said that anything using $@.$$$$ in an example usage is on its way to loss.

  • By Michael S., November 17, 2008 @ 3:24 pm

    I’ve always thought that the tools themselves, e.g. the $(CC) compiler and associated linker, should be able to build the dependency graphs of the files they are depending on for creating the output file(s). It is only the tools themselves who can know what they are depending on. Humans - as makefile creators - are bound to fail. E.g. because they forget to add the file into the dependency tree.

    /Michael

  • By jsled, November 17, 2008 @ 5:37 pm

    @Michael S.: in case you’re unaware, that’s what the “clear as mud” “$(CC) -M […]” block referenced in the article is. It may be a sub optimal way of doing it (the output probably shouldn’t need to be sed-post-processed given probably the only thing anyone ever uses the output for is Make dependency generation) … but it’s not like the author of this article is actually suggesting an alternative.

  • By Owen, November 18, 2008 @ 2:19 am

    @jsled, @Michael S.: Make’s biggest problem is not Make, but the languages it was built to work with. C (and to a lesser extent, Fortran) are not well-suited to static analysis. Figuring out if a given symbol exists in a complete C program involves running the preprocessor - and people have written Turing machines for it, proving that it can compute completely arbitrary things in arbitrary time.

    Most newer languages have some sort of easy-to-detect dependency information. Java’s classfile format makes references to other classes very obvious in the constant pool, and the source format is relatively easy to parse. Ruby and Python both have hookable module systems: PyUnit takes advantage of this to “roll back” any imports that happen during test evaluation, but you could just as easily hook it to log the dependencies that are actually used during a test run.

    Even Objective-C (and gcc, incidentally) has #import, which is a step in the right direction: it describes what the author wants to do (use a given library) rather than how to do it (paste this source file here).

  • By Plouj, November 18, 2008 @ 10:35 am

    Shouldn’t those include lines be #include “logging.h” and not #include “example-1.h”?

  • By Owen, November 18, 2008 @ 11:51 pm

    @Plouj: …crud, you’re right. I even recall reminding myself to fix that when I was writing the post. It’s fixed now.

  • By Rudolf Olah, December 27, 2008 @ 12:55 am

    You should check out SCons if you want a replacement for GNU Make. It’s Python, reads well and does a lot of things magically (with lots of room for configuration when the defaults aren’t good enough).
