Easy C Code Generation with X Macros

Author: Kevin Nygaard

X macros are a C preprocessor technique for reducing boilerplate code, thus increasing code quality and maintainability. A set of source data, kept in a file or a macro, is processed multiple times by different macros, resulting in a lightweight code generator.


1 C Enumerations and Boilerplate Code

The C programming language, lacking introspection and metaprogramming, forces programmers to write boilerplate code for common tasks. This repetitive code is a fertile source of bugs throughout a program's lifetime. Boilerplate magnifies changes, multiplying the number of places to update manually and, with them, the opportunities for mistakes.[1]

Below is a common C pattern involving boilerplate code: assigning numbers to strings. Numbers, being small and simple, make for fast and efficient code; but strings, being easily read and understood by humans, make for more intuitive user interfaces. Thus, boilerplate code translates between the two.

#include <string.h> /* strcmp */
typedef enum { Foo, Bar, Baz, } type;
char *TypeToString(type Type) {
    switch (Type) {
    case Foo: return "Foo";
    case Bar: return "Bar";
    case Baz: return "Baz";
    }
    return "Unknown";
}
type StringToType(char *String) {
    if (!strcmp("Foo", String)) return Foo;
    if (!strcmp("Bar", String)) return Bar;
    if (!strcmp("Baz", String)) return Baz;
    return -1;
}

The code above defines an enumeration named type, which specifies three enumerators (Foo, Bar, and Baz), and defines two functions: one for converting a type to a string (i.e., char *), and the other for converting a string to a type. Converting a value that matches no enumerator returns the string "Unknown", and converting an unrecognized string returns the value -1.

As a program grows and matures, new features often bring new enumerators and enumerations, but boilerplate complicates this work. Consider adding an enumerator: a trivial task that becomes a three-step process. First, add the new enumerator to the list and use it; second, add it to the TypeToString function; third, add it to the StringToType function. The programmer is focused on the latter part of the first step: implementing the feature that uses the enumerator. The remaining steps are imposed by C itself; they are accidental to the end goal, yet they are sources of bugs and are necessary for the program to work.
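To make that risk concrete, here is a sketch in which a hypothetical fourth enumerator, Qux, goes through the first two steps but not the third. Nothing prevents it from compiling, yet converting Qux to a string and back silently fails.

```c
#include <string.h> /* strcmp */

/* Step 1: the hypothetical enumerator Qux is added to the list. */
typedef enum { Foo, Bar, Baz, Qux, } type;

/* Step 2: Qux is added to TypeToString. */
char *TypeToString(type Type) {
    switch (Type) {
    case Foo: return "Foo";
    case Bar: return "Bar";
    case Baz: return "Baz";
    case Qux: return "Qux";
    }
    return "Unknown";
}

/* Step 3 is forgotten: StringToType has no branch for "Qux". */
type StringToType(char *String) {
    if (!strcmp("Foo", String)) return Foo;
    if (!strcmp("Bar", String)) return Bar;
    if (!strcmp("Baz", String)) return Baz;
    return -1; /* "Qux" silently ends up here */
}
```

Calling StringToType(TypeToString(Qux)) yields -1 rather than Qux, and only a test or a user will notice.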

Eliminating these last two steps looks attractive: it removes two sources of bugs, simplifies the codebase, and maximizes the programmer time spent solving real problems. However, consider the overall cost: such a change takes time to design, implement, debug, and maintain. If the costs are higher than the benefits, it isn't worth doing. For example, writing or downloading a code generator would solve the problem, but it takes time to write or learn, and it introduces complexity into the system.[2]

For our problem, we focus on a lightweight solution that exploits a feature of Standard C: the C preprocessor.

2 X Macros

The C preprocessor, although notorious for confusing and unmaintainable code, is nonetheless a workable solution for our example.[3] It runs automatically as part of every C compilation, and its macro system is powerful enough for rudimentary code generation. Together, these properties mean no changes to the build system and no extra tools or libraries. However, keeping the macros simple and maintainable requires self-restraint; otherwise, the resulting complexity outweighs the benefits. The road to hell is paved with good intentions.[4]

We explore two code generation techniques using X macros, macros which process a source of data multiple times. The first technique holds this data in a separate file, and the second technique holds this data in a macro.

2.1 File-based X Macros

Below is an example of the file-based X macro technique. It describes two files, type.c and main.c, indicated by the C comments. type.c defines the enumerator data, and main.c processes this data multiple times in multiple ways. After preprocessing, the resultant code is identical to the code at the beginning of this article.

/* type.c */
X(Foo) X(Bar) X(Baz)

/* main.c */
#include <string.h> /* strcmp */
typedef enum {
    #define X(Name) Name,
    #include "type.c"
    #undef X
} type;
char *TypeToString(type Type) {
    switch (Type) {
    #define X(Name) case Name: return #Name;
    #include "type.c"
    #undef X
    }
    return "Unknown";
}
type StringToType(char *String) {
    #define X(Name) if (!strcmp(#Name, String)) return Name;
    #include "type.c"
    #undef X
    return -1;
}

In type.c, each enumerator is written as a call to X, with the enumerator name as the argument. X is not actually a function but a function-like macro, whose definition is supplied by whichever file includes type.c. (Although the name X is conventional, it can be named anything.)[5]

In main.c, X is first defined to append a comma to the enumerator name. Then the file containing all the X calls, type.c, is included inside the enumeration definition, expanding into a properly formatted enumerator list. Afterwards, X is explicitly undefined so that it can be redefined at the next site. Standard C does not allow a macro to be redefined with a different replacement list; some compilers, such as GCC, only issue a warning, but the #undef keeps the code portable and warning-free. (In many situations, redefining an already defined macro is a bug, which is why compilers flag it.)

The remaining two sites work the same way: X defines a suitable code fragment, and #include injects these fragments at the proper location. Note that when a parameter name is preceded by # in a macro's replacement list, the preprocessor converts the corresponding argument into a string literal (this is called stringizing).
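A tiny sketch of the # operator on its own, using a hypothetical helper named STRINGIZE: the argument is turned into a string literal exactly as written, without being macro-expanded first. Runs of whitespace between its tokens collapse to a single space, and leading or trailing whitespace is dropped.

```c
/* Hypothetical helper: # turns the macro argument into a string literal. */
#define STRINGIZE(Name) #Name

const char *FooLabel = STRINGIZE(Foo);            /* "Foo" */
const char *SpacedLabel = STRINGIZE(  a  +  b );  /* "a + b": inner spaces collapse, outer spaces vanish */
```

This is what lets a single entry such as X(Foo) produce both the identifier Foo and the string "Foo".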

Overall, this file-based technique is clumsy and verbose: constantly redefining macros is error-prone, and requiring the data to live in files is limiting. These downsides grow with more instantiations, resulting in a situation worse than the original problem.

However, while untenable with many small data files, the technique works well with a few large data files. Furthermore, these files can be generated by external programs and shared between multiple consumers (not necessarily C programs) as a simple exchange format. Some examples include register maps, test harnesses, and command-line interfaces.

For our small example, file-based X macros are simply too expensive: adding an enumerator is cheap, but adding an enumeration costs an additional file and duplicate X macros. The next technique, however, removes both of these costs.

2.2 Macro-based X Macros

Below is an example of the macro-based X macro technique. TypeList defines the enumerator data, and the other macros (XEnum, XCase, and XCmp) process this data in different ways. After preprocessing, the resultant code is identical to the code at the beginning of this article.

#include <string.h> /* strcmp */
#define TypeList(X) X(Foo) X(Bar) X(Baz)
#define XEnum(Name) Name,
#define XCase(Name) case Name: return #Name;
#define XCmp(Name)  if (!strcmp(#Name, String)) return Name;

typedef enum { TypeList(XEnum) } type;
char *TypeToString(type Type) {
    switch (Type) { TypeList(XCase) }
    return "Unknown";
}
type StringToType(char *String) {
    TypeList(XCmp)
    return -1;
}

As with the file-based technique, each enumerator is written as a call to X, with the enumerator name as the argument; however, these entries now live in a macro instead of a file, and TypeList takes X as an explicit parameter.[6]

The other macros define code fragments and behave like their file-based counterparts, except that each now has a unique name. Each is passed as an argument to TypeList, which substitutes it for every X; the preprocessor then rescans the result and expands each call (XEnum(Foo), XEnum(Bar), and so on) into its code fragment at that location. (This "deferred" macro expansion is reminiscent of C function pointers.)
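Because the list is just a macro parameter away from any fragment, one list can feed as many generators as needed. As a sketch, the hypothetical XName below builds an array of names indexed by enumerator; the comment traces the two expansion steps described above.

```c
#define TypeList(X) X(Foo) X(Bar) X(Baz)
#define XEnum(Name) Name,
#define XName(Name) #Name, /* hypothetical fragment: one string per entry */

typedef enum { TypeList(XEnum) } type;

/* TypeList(XName)
 *   -> XName(Foo) XName(Bar) XName(Baz)   (XName substituted for X)
 *   -> "Foo", "Bar", "Baz",                (rescanning expands each XName call)
 */
const char *TypeNames[] = { TypeList(XName) };
```

Since the enumerators count up from zero in list order, TypeNames[Bar] is "Bar" without any hand-maintained table.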

Overall, the macro-based technique is elegant and clean: the code generation macros are defined once and are shareable, the associated data sits nearby in the same file, and each instantiation adds little boilerplate. The data macros can even be kept in separate files; however, for data that lives in its own file, the file-based technique's simpler format is superior.
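For contrast, here is a minimal sketch of the parameterless variant mentioned in footnote 6. The list macro (given the hypothetical name TYPE_LIST) refers to X directly, so X must be defined before and undefined after every use, much like the file-based technique.

```c
#include <string.h> /* strcmp */

/* Hypothetical list macro that names X directly instead of taking it as a parameter. */
#define TYPE_LIST X(Foo) X(Bar) X(Baz)

typedef enum {
#define X(Name) Name,
    TYPE_LIST
#undef X
} type;

char *TypeToString(type Type) {
    switch (Type) {
#define X(Name) case Name: return #Name;
    TYPE_LIST
#undef X
    }
    return "Unknown";
}

type StringToType(char *String) {
#define X(Name) if (!strcmp(#Name, String)) return Name;
    TYPE_LIST
#undef X
    return -1;
}
```

It works because TYPE_LIST expands using whatever X is defined at the point of use, but the define/undef pairs are the same clutter that made the file-based version clumsy.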

For our small example, macro-based X macros work well: adding enumerators and enumerations is cheap, and the associated boilerplate is small and maintainable. It is a sufficiently cheap and effective solution to our problem.
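Revisiting the earlier scenario, adding the hypothetical enumerator Qux now touches a single line: TypeList. The enumeration and both conversion functions pick it up automatically, so the forgotten third step can no longer happen.

```c
#include <string.h> /* strcmp */

/* Adding the hypothetical enumerator Qux is a one-line change to the list. */
#define TypeList(X) X(Foo) X(Bar) X(Baz) X(Qux)
#define XEnum(Name) Name,
#define XCase(Name) case Name: return #Name;
#define XCmp(Name)  if (!strcmp(#Name, String)) return Name;

typedef enum { TypeList(XEnum) } type;
char *TypeToString(type Type) {
    switch (Type) { TypeList(XCase) }
    return "Unknown";
}
type StringToType(char *String) {
    TypeList(XCmp)
    return -1;
}
```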

Further reduction of boilerplate is possible, but not recommended. A "magic macro" that does everything is harder to understand and maintain, and it impedes development. In practice, one-off tweaks and adjustments are common: overriding names and handlers, hooking into other interfaces, adding "hidden" enumerators, and so on. In response, these all-in-one macros either grow in complexity (eventually becoming hellish abominations) or are decomposed into simpler forms.
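As an illustration of a one-off tweak that stays in simple form, the sketch below adds a "hidden" enumerator, TypeCount (a hypothetical name), after the list rather than inside it. It gets no string and no lookup entry, but it equals the number of listed entries, which a hypothetical TypeName helper uses as a bounds check.

```c
#define TypeList(X) X(Foo) X(Bar) X(Baz)
#define XEnum(Name) Name,
#define XName(Name) #Name,

/* TypeCount is "hidden": it follows the listed enumerators but is not part of TypeList. */
typedef enum { TypeList(XEnum) TypeCount } type;

const char *TypeNames[] = { TypeList(XName) };

/* Hypothetical helper: TypeCount bounds the table, so out-of-range values are rejected. */
const char *TypeName(int Value) {
    if (Value < 0 || Value >= TypeCount) return "Unknown";
    return TypeNames[Value];
}
```

The tweak lives in one visible place, next to the list, instead of inside a macro that every enumeration must share.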

3 Other Uses and Limitations

X macros are a useful preprocessor technique for cleaning up repetitive code, making it more reliable and easier to maintain: string conversion with enumerations is but one example, and C offers plenty of others. As mentioned earlier, register maps, which associate names, types, and values with addresses, can benefit from X macros. Likewise, programs that associate strings with functions (e.g., commands in an interpreter or keywords in a parser) are also good candidates for X macros.
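As a sketch of the string-to-function case, the hypothetical CommandList below pairs each command name with a handler. Each entry takes two arguments, so the fragment macro takes two parameters as well. The handlers (CmdTwice, CmdNegate), the command struct, and Dispatch are all illustrative names, not part of any library.

```c
#include <string.h> /* strcmp, size_t */

/* Hypothetical handlers: each takes an int and returns an int. */
static int CmdTwice(int Value) { return 2 * Value; }
static int CmdNegate(int Value) { return -Value; }

/* Each entry pairs a command name with its handler. */
#define CommandList(X) \
    X(twice, CmdTwice) \
    X(negate, CmdNegate)

typedef int (*handler)(int);
struct command { const char *Name; handler Handler; };

/* One table row per entry: { "twice", CmdTwice }, ... */
#define XCommand(Name, Handler) { #Name, Handler },
static const struct command Commands[] = { CommandList(XCommand) };

/* Looks up Name and applies its handler to Argument.
   Returns 0 and stores the result on success, or -1 if Name is unknown. */
int Dispatch(const char *Name, int Argument, int *Result) {
    size_t I;
    for (I = 0; I < sizeof Commands / sizeof Commands[0]; I++) {
        if (!strcmp(Commands[I].Name, Name)) {
            *Result = Commands[I].Handler(Argument);
            return 0;
        }
    }
    return -1;
}
```

Adding a command means writing its handler and adding one line to CommandList; the lookup table follows automatically.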

These techniques also apply outside of C, to C++, Objective-C, and any other language that uses the C preprocessor or has one that is similar enough (Verilog comes to mind, though I haven't tested it).

That being said, X macros are not a panacea: they work well in certain situations, but poorly in others. The file-based X macros, although they "solved" our example problem, actually created more problems than they solved. Likewise, we dismissed dedicated code generators for our problem, but sometimes they are precisely what you need. It all depends on the situation.

I hope this brief discussion on X macros has been helpful. Feedback and corrections are welcome via email.

Footnotes

  1. In terms of security, the most dangerous and prevalent bug is the buffer overflow, according to MITRE's Common Weakness Enumeration (CWE) report. Considering just our small example, mistakes in boilerplate could lead to buffer overflows, out-of-bounds reads, and improper input validation: three of the top five software weaknesses.
  2. Chuck Moore, creator of the Forth programming language, noticed that programs naturally drift towards complexity, leading to unmaintainable code. He says there is only one opposing force to complexity: you, by "keeping it simple."
  3. With its arcane syntax and quirky behaviors, the preprocessor is often the scapegoat for unreadable code; however, C is equally culpable in making confusing code. For more examples, see past entries to the International Obfuscated C Code Contest (IOCCC).
  4. This Boost example solves a problem… but does it, really?
  5. A descriptive name, like ENUM, provides better context when searching the codebase, especially when different types of X macros are used.
  6. Omitting the parameter also works, as Randy Meyers presents in his article; however, the X macros must then be redefined constantly, making the approach largely identical to the file-based technique.

Further Resources