Adding Application Support for a New Architecture in Plan 9
Plan 9 has five classes of architecture-dependent software: headers, kernels, compilers and loaders, the libc system library, and a few application programs. In general, architecture-dependent programs consist of a portable part shared by all architectures and a processor-specific portion for each supported architecture. The portable code is often compiled and stored in a library associated with each architecture. A program is built by compiling the architecture-specific code and loading it with the library. Support for a new architecture is provided by building a compiler for the architecture, using it to compile the portable code into libraries, writing the architecture-specific code, and then loading that code with the libraries.
This document describes the organization of the architecture-dependent code and headers on Plan 9. The first section briefly discusses the layout of the headers and the source code for the kernels, compilers, loaders, and the system library, libc. The second section provides a detailed discussion of the structure of libmach, a library containing almost all architecture-dependent code used by application programs. The final section describes the steps required to add application program support for a new architecture.
Architecture-dependent information for the new processor is stored in the directory tree rooted at /m where m is the name of the new architecture (e.g., mips). The new directory should be initialized with several important subdirectories, notably bin, include, and lib. The directory tree of an existing architecture serves as a good model for the new tree. The architecture-dependent mkfile must be stored in the newly created root directory for the architecture. It is easiest to copy the mkfile for an existing architecture and modify it for the new architecture. When the mkfile is correct, change the OS and CPUS variables in the /sys/src/mkfile.proto to reflect the addition of the new architecture.
Architecture-dependent headers are stored in directory /m/include where m is the name of the architecture (e.g., mips). Two header files are required: u.h and ureg.h. The first defines fundamental data types, bit settings for the floating point status and control registers, and va_list processing which depends on the stack model for the architecture. This file is best built by copying and modifying the u.h file from an architecture with a similar stack model. The ureg.h file contains a structure describing the layout of the saved register set for the architecture; it is defined by the kernel.
Header file /sys/include/a.out.h contains the definitions of the magic numbers used to identify executables for each architecture. When support for a new architecture is added, the magic number for the architecture must be added to this file.
The header format of a bootable executable is defined by each manufacturer. Header file /sys/include/bootexec.h contains structures describing the headers currently supported. If the new architecture uses a common header such as COFF, the header format is probably already defined, but if the bootable header format is non-standard, a structure defining the format must be added to this file.
Although the kernel depends critically on the properties of the underlying hardware, most of the higher-level kernel functions, including process management, paging, pseudo-devices, and some networking code, are independent of processor architecture. The portable kernel code is divided into two parts: that implementing kernel functions and that devoted to the boot process. Code in the first class is stored in directory /sys/src/9/port and the portable boot code is stored in /sys/src/9/boot. Architecture-dependent kernel code is stored in the subdirectories of /sys/src/9 named for each architecture.
The relationship between the kernel code and the boot code is convoluted and subtle. The portable boot code is compiled into a library for each architecture. An architecture-specific main program is loaded with the appropriate library and the resulting executable is compiled into the kernel where it is executed as a user process during the final stages of kernel initialization. The boot process performs authentication, attaches the name space root to the appropriate file system and starts the init process.
The organization of the portable kernel source code differs from that of most other architecture-specific code. Instead of storing the portable code in a library and loading it with the architecture-specific code, the portable code is compiled directly into the directory containing the architecture-specific code and linked with the object files built from the source in that directory.
Compilers and Loaders
The compiler source code conforms to the usual organization: portable code is compiled into a library for each architecture and the architecture-dependent code is loaded with that library. The common compiler code is stored in /sys/src/cmd/cc. The mkfile in this directory compiles the portable source and archives the objects in a library for each architecture. The architecture-specific compiler source is stored in a subdirectory of /sys/src/cmd with the same name as the compiler (e.g., /sys/src/cmd/vc).
There is no portable code shared by the loaders. Each directory of loader source code is self-contained, except for a header file and an instruction name table included from the directory of the associated compiler.
Most C library modules are portable; the source code is stored in directories /sys/src/libc/port and /sys/src/libc/9sys. Architecture-dependent library code is stored in the subdirectory of /sys/src/libc named the same as the target processor. Non-portable functions not only implement architecture-dependent operations but also supply assembly language implementations of functions where speed is critical. Directory /sys/src/libc/9syscall is unusual because it contains architecture-dependent information for all architectures. It holds only a header file defining the names and numbers of system calls and a mkfile. The mkfile executes an rc script that parses the header file, constructs assembler language functions implementing the system call for each architecture, assembles the code, and archives the object files in libc. The assembler language syntax and the system interface differ for each architecture. The rc script in this mkfile must be modified to support a new architecture.
Application programs process two forms of architecture-dependent information: executable images and intermediate object files. Almost all processing is on executable files. System library libmach provides functions that convert architecture-specific data to a portable format so application programs can process this data independent of its underlying representation. Further, when a new architecture is implemented almost all code changes are confined to the library; most affected application programs need only be reloaded. The source code for the library is stored in /sys/src/libmach.
An application program running on one type of processor must be able to interpret architecture-dependent information for all supported processors. For example, a debugger must be able to debug the executables of all architectures, not just the architecture on which it is executing, since /proc may be imported from a different machine.
A small part of the application library provides functions to extract symbol references from object files. The remainder provides the following processing of executable files or memory images:
∙ Header interpretation.
∙ Symbol table interpretation.
∙ Execution context interpretation, such as stack traces and stack frame location.
∙ Instruction interpretation including disassembly and instruction size and follow-set calculations.
∙ Exception and floating point number interpretation.
∙ Architecture-independent read and write access through a relocation map.
Header file /sys/include/mach.h defines the interfaces to the application library. Manual pages mach(2), symbol(2), and object(2) describe the details of the library functions.
Two data structures, called Mach and Machdata, contain architecture-dependent parameters and a jump table of functions. Global variables mach and machdata point to the Mach and Machdata data structures associated with the target architecture. An application determines the target architecture of a file or executable image, sets the global pointers to the data structures associated with that architecture, and subsequently performs all references indirectly through the pointers. As a result, direct references to the tables for each architecture are avoided and the application code intrinsically supports all architectures (though only one at a time).
Object file processing is handled similarly: architecture-dependent functions identify and decode the intermediate files for the processor. The application indirectly invokes a classification function to identify the architecture of the object code and to select the appropriate decoding function. Subsequent calls then use that function to decode each record. Again, the layer of indirection allows the application code to support all architectures without modification.
Splitting the architecture-dependent information between the Mach and Machdata data structures allows applications to choose an appropriate level of service. Even though an application does not directly reference the architecture-specific data structures, it must load the architecture-dependent tables and code for all architectures it supports. The size of this data can be substantial and many applications do not require the full range of architecture-dependent functionality. For example, the size command does not require the disassemblers for every architecture; it only needs to decode the header. The Mach data structure contains a few architecture-specific parameters and a description of the processor register set. The size of the structure varies with the size of the register set but is generally small. The Machdata data structure contains a jump table of architecture-dependent functions; the amount of code and data referenced by this table is usually large.
Libmach Source Code Organization
The libmach library provides four classes of functionality:
Header and Symbol Table Decoding - Files executable.c and sym.c contain code to interpret the header and symbol tables of an executable file or executing image. Function crackhdr decodes the header, reformats the information into an Fhdr data structure, and points global variable mach to the Mach data structure of the target architecture. The symbol table processing uses the data in the Fhdr structure to decode the symbol table. A variety of symbol table access functions then support queries on the reformatted table.
Debugger Support - Files named m.c, where m is the code letter assigned to the architecture, contain the initialized Mach data structure and the definition of the register set for each architecture. Architecture-specific debugger support functions and an initialized Machdata structure are stored in files named mdb.c. Files machdata.c and setmach.c contain debugger support functions shared by multiple architectures.
Architecture-Independent Access - Files map.c, access.c, and swap.c provide accesses through a relocation map to data in an executable file or executing image. Byte-swapping is performed as needed. Global variables mach and machdata must point to the Mach and Machdata data structures of the target architecture.
Object File Interpretation - These files contain functions to identify the target architecture of an intermediate object file and extract references to symbols. File obj.c contains code common to all architectures; file mobj.c contains the architecture-specific source code for the machine with code character m.
The Machdata data structure is primarily a jump table of architecture-dependent debugger support functions. Functions select the Machdata structure for a target architecture based on the value of the type code in the Fhdr structure or the name of the architecture. The jump table provides functions to swap bytes, interpret machine instructions, perform stack traces, find stack frames, format floating point numbers, and decode machine exceptions. Some functions, such as machine exception decoding, are idiosyncratic and must be supplied for each architecture. Others depend on the compiler run-time model and several architectures may share code common to a model. For example, many architectures share the code to process the fixed-frame stack model implemented by several of the compilers. Finally, some functions, such as byte-swapping, provide a general capability and the jump table need only select an implementation appropriate to the architecture.
Adding Application Support for a New Architecture
This section describes the steps required to add application-level support for a new architecture. We assume the kernel, compilers, loaders and system libraries for the new architecture are already in place. This implies that a code-character has been assigned and that the architecture-specific headers have been updated. With the exception of two programs, application-level changes are confined to header files and the source code in /sys/src/libmach.
1. Begin by updating the application library header file in /sys/include/mach.h. Add the following symbolic codes to the enum statement near the beginning of the file:
∙ The processor type code, e.g., MSPARC.
∙ The type of the executable. There are usually two codes needed: one for a bootable executable (i.e., a kernel) and one for an application executable.
∙ The disassembler type code. Add one entry for each supported disassembler for the architecture.
∙ A symbolic code for the object file.
2. In a file name /sys/src/libmach/m.c (where m is the identifier character assigned to the architecture), initialize Reglist and Mach data structures with values defining the register set and various system parameters. The source file for a similar architecture can serve as template. Most of the fields of the Mach data structure are obvious but a few require further explanation.
kbase - This field contains the address of the kernel ublock. The debuggers assume the first entry of the kernel ublock points to the Proc structure for a kernel thread.
ktmask - This field is a bit mask used to calculate the kernel text address from the kernel ublock address. The first page of the kernel text segment is calculated by ANDing the negation of this mask with kbase.
kspoff - This field contains the byte offset in the Proc data structure to the saved kernel stack pointer for a suspended kernel thread. This is the offset to the sched.sp field of a Proc table entry.