Plan 9 C Compilers

Ken Thompson

ken@plan9.bell-labs.com

ABSTRACT

This paper describes the overall structure and function of the Plan 9 C compilers. A more detailed implementation document for any one of the compilers is yet to be written.

1. Introduction

There are many compilers in the series. Six of the compilers (MIPS 3000, SPARC, Intel 386, Power PC, DEC Alpha, and Motorola 68020) are considered active and are used to compile current versions of Plan 9. Several others (Motorola 68000, Intel 960, ARM 7500, AMD 29000) have had only limited use, such as to program peripherals or experimental devices.

2. Structure

The compiler is a single program that produces an object file. Combined in the compiler are the traditional roles of preprocessor, lexical analyzer, parser, code generator, local optimizer, and first half of the assembler. The object files are binary forms of assembly language, similar to what might be passed between the first and second passes of an assembler.

Object files and libraries are combined by a loader program to produce the executable binary. The loader combines the roles of second half of the assembler, global optimizer, and loader. The names of the compilers, loaders, and assemblers are as follows:

SPARC             kc    kl    ka
Power PC          qc    ql    qa
MIPS              vc    vl    va
Motorola 68000    1c    1l    1a
Motorola 68020    2c    2l    2a
ARM 7500          5c    5l    5a
Intel 960         6c    6l    6a
DEC Alpha         7c    7l    7a
Intel 386         8c    8l    8a
AMD 29000         9c    9l    9a

There is a further breakdown in the source of the compilers into object-independent and object-dependent parts. All of the object-independent parts are combined into source files in the directory /sys/src/cmd/cc. The object-dependent parts are collected in a separate directory for each compiler, for example /sys/src/cmd/vc. All of the code, both object-independent and object-dependent, is machine-independent and may be cross-compiled and executed on any of the architectures.

3. The Language

The compiler implements ANSI C with some restrictions and extensions [ANSI90]. Most of the restrictions are due to personal preference, while most of the extensions were to help in the implementation of Plan 9. There are other departures from the standard, particularly in the libraries, that are beyond the scope of this paper.

3.1. Register, volatile, const

The keyword register is recognized syntactically but is semantically ignored. Thus taking the address of a register variable is not diagnosed. The keyword volatile disables all optimizations, in particular registerization, of the corresponding variable. The keyword const generates warnings (if warnings are enabled by the compiler’s -w option) of non-constant use of the variable, but does not affect the generated code.

3.2. The preprocessor

The C preprocessor is probably the biggest departure from the ANSI standard.

The preprocessor built into the Plan 9 compilers does not support #if, although it does handle #ifdef and #include. If it is necessary to be more standard, the source text can first be run through the separate ANSI C preprocessor, cpp.
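As a small illustration, a choice that might otherwise be written with #if can often be recast using #ifdef alone. The sketch below assumes that #else and #endif are handled along with #ifdef; the names DEBUG, Tracelevel, and tracelevel are hypothetical and chosen only for this example.

```c
/* Sketch only: DEBUG, Tracelevel and tracelevel are hypothetical names.
 * Assumes #else and #endif are handled along with #ifdef. */
#define DEBUG

#ifdef DEBUG
enum { Tracelevel = 2 };    /* DEBUG defined: verbose tracing */
#else
enum { Tracelevel = 0 };    /* DEBUG not defined: tracing off */
#endif

/* report the level selected at compile time */
int
tracelevel(void)
{
    return Tracelevel;
}
```

Removing the #define line selects the other branch, so tracelevel() would then return 0.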

3.3. Unnamed substructures

The most important and most heavily used of the extensions is the declaration of an unnamed substructure or subunion. For example:

    typedef
    struct  lock
    {
        int    locked;
    } Lock;

    typedef
    struct  node
    {
        int type;
        union
        {
            double dval;
            float  fval;
            long   lval;
        };
        Lock;
    } Node;

    Lock*   lock;
    Node*   node;

The declaration of Node has an unnamed substructure of type Lock and an unnamed subunion. One use of this feature is to let elements of the subunit be referenced as if they were in the outer structure. Thus node->dval and node->locked are legitimate references.
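For readers working with a standard compiler, a rough analogue can be sketched in C11, which accepts anonymous unions and structures whose members are spelled out inline. C11 does not accept a bare typedef name such as Lock; as a member, so this sketch repeats the member instead. It illustrates the access pattern only and is not the Plan 9 form; the helper nodedemo is hypothetical.

```c
/* C11 sketch: the Plan 9 member "Lock;" is replaced by an
 * inline anonymous struct with the same single field. */
typedef struct Node Node;
struct Node
{
    int type;
    union               /* anonymous union: members visible in Node */
    {
        double dval;
        float  fval;
        long   lval;
    };
    struct              /* anonymous struct standing in for Lock */
    {
        int locked;
    };
};

/* nodedemo is a hypothetical helper: it sets members of the inner
 * units through the outer structure and returns the lock state. */
int
nodedemo(Node *node)
{
    node->dval = 1.5;       /* reaches into the anonymous union */
    node->locked = 1;       /* reaches into the anonymous struct */
    return node->locked;
}
```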

When an outer structure is used in a context that is only legal for an unnamed substructure, the compiler promotes the reference to the unnamed substructure. This is true both for references to structures and for references to pointers to structures. The promotion happens in assignment statements and in argument passing where prototypes have been declared. Thus, continuing with the example,

    lock = node;

would assign a pointer to the unnamed Lock in the Node to the variable lock. Another example,

    extern void lock(Lock*);

    func(...)
    {
        ...
        lock(node);
        ...
    }

will pass a pointer to the Lock substructure.

Finally, in places where context is insufficient to identify the unnamed structure, the type name (it must be a typedef) of the unnamed structure can be used as an identifier. In our example, &node->Lock gives the address of the anonymous Lock structure.
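In a compiler without this extension, the effect of promotion can be approximated by naming the member and taking its address explicitly. The sketch below assumes a named member lk and the hypothetical functions acquire, acquirenode, and lockof; it shows roughly what the promotion does on the programmer's behalf, not how the Plan 9 compilers implement it.

```c
#include <stddef.h>

typedef struct Lock Lock;
struct Lock
{
    int locked;
};

typedef struct Node Node;
struct Node
{
    int  type;
    Lock lk;            /* named here; unnamed in the Plan 9 version */
};

/* hypothetical routine that operates on any Lock */
void
acquire(Lock *l)
{
    l->locked = 1;
}

/* what a promoted call with a Node pointer amounts to:
 * pass the address of the Lock embedded in the Node */
void
acquirenode(Node *node)
{
    acquire(&node->lk);
}

/* the promoted pointer is the Node pointer advanced by the
 * offset of the embedded Lock */
Lock*
lockof(Node *node)
{
    return (Lock*)((char*)node + offsetof(Node, lk));
}
```

Both acquirenode and lockof reach the same embedded Lock; the second form makes the pointer adjustment visible.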

3.4. Structure displays

A structure cast followed by a list of expressions in braces is an expression with the type of the structure and elements assigned from the corresponding list. Structures are now almost first-class citizens of the language. It is common to see code like this:

    r = (Rectangle){point1, (Point){x,y+2}};
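A closely similar construct is available in standard C as the C99 compound literal, so the idiom can be tried with a current compiler. The Point and Rectangle definitions below are assumptions made for this sketch (the paper does not give them), as is the helper mkrect.

```c
/* assumed layouts; the paper does not define Point or Rectangle */
typedef struct Point Point;
struct Point
{
    int x;
    int y;
};

typedef struct Rectangle Rectangle;
struct Rectangle
{
    Point min;
    Point max;
};

/* mkrect is hypothetical: it wraps the assignment shown in the text */
Rectangle
mkrect(Point point1, int x, int y)
{
    Rectangle r;

    r = (Rectangle){point1, (Point){x, y+2}};
    return r;
}
```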

3.5. Initialization indexes

In initializers of arrays, one may place a constant expression in square brackets before an initializer. This causes the next initializer to assign the indicated element. For example:

    enum    errors
    {
        Etoobig,
        Ealarm,
        Egreg
    };

    char* errstrings[] =
    {
        [Ealarm]    "Alarm call",
        [Egreg]     "Panic: out of mbufs",
        [Etoobig]   "Arg list too long",
    };

In the same way, individual structure members may be initialized in any order by preceding the initialization with .tagname. Both forms allow an optional =, to be compatible with a proposed extension to ANSI C.
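Both forms exist in C99 as designated initializers, where the = is required rather than optional. The sketch below uses the = spelling for array elements and for structure members; the Conf structure, its members, and the Nerr count are hypothetical additions for illustration.

```c
enum
{
    Etoobig,
    Ealarm,
    Egreg,
    Nerr                /* hypothetical: number of error codes */
};

char *errstrings[Nerr] =
{
    [Ealarm]  = "Alarm call",
    [Egreg]   = "Panic: out of mbufs",
    [Etoobig] = "Arg list too long",
};

/* Conf is a hypothetical structure used to show member designators */
typedef struct Conf Conf;
struct Conf
{
    int   nproc;
    int   debug;
    char *name;
};

Conf conf =
{
    .name  = "sketch",  /* members may appear in any order */
    .nproc = 4,
    /* debug is not mentioned, so it is initialized to zero */
};
```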

3.6. External register

The declaration extern register will dedicate a register to a variable on a global basis. It can be used only under special circumstances. External register variables must be identically declared in all modules and libraries. The feature is not intended for efficiency, although it can produce efficient code; rather it represents a unique storage class that would be hard to get any other way. On a shared-memory multi-processor, an external register is one-per-processor and neither one-per-procedure (automatic) nor one-per-system (external). It is used for two variables in the Plan 9 kernel, u and m: u is a pointer to the structure representing the currently running process, and m is a pointer to the per-machine data structure.

3.7. Long long

The compilers accept long long as a basic type meaning 64-bit integer. On all of the machines this type is synthesized from 32-bit instructions.
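To give a feel for what synthesis from 32-bit operations involves, the following sketch adds two 64-bit values held as pairs of 32-bit words, propagating the carry by hand. The Vlong layout and the name vadd are assumptions for illustration; the paper does not describe the code the compilers actually emit.

```c
#include <stdint.h>

/* hypothetical representation of a 64-bit integer as two 32-bit halves */
typedef struct Vlong Vlong;
struct Vlong
{
    uint32_t lo;
    uint32_t hi;
};

/* add two 64-bit values using only 32-bit arithmetic */
Vlong
vadd(Vlong a, Vlong b)
{
    Vlong r;

    r.lo = a.lo + b.lo;                     /* wraps modulo 2^32 */
    r.hi = a.hi + b.hi + (r.lo < a.lo);     /* add the carry out of the low word */
    return r;
}
```

The comparison r.lo < a.lo is true exactly when the low-word addition wrapped, which is how the carry is recovered without a 64-bit operation.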

3.8. Pragma

The compilers accept #pragma lib libname and pass the library name to the loader.
