Acid: A Debugger Built From A Language

Phil Winterbottom

philw@plan9.bell-labs.com

ABSTRACT

Acid is an unusual source-level symbolic debugger for Plan 9. It is implemented as a language interpreter with specialized primitives that provide debugger support. Programs written in the language manipulate one or more target processes; variables in the language represent the symbols, state, and resources of those processes. This structure allows complex interaction between the debugger and the target program and provides a convenient method of parameterizing differences between machine architectures. Although some effort is required to learn the debugging language, the richness and flexibility of the debugging environment encourages new ways of reasoning about the way programs run and the conditions under which they fail.

1. Introduction

The size and complexity of programs have increased in proportion to processor speed and memory but the interface between debugger and programmer has changed little. Graphical user interfaces have eased some of the tedious aspects of the interaction. A graphical interface is a convenient means for navigating through source and data structures but provides little benefit for process control. The introduction of a new concurrent language, Alef [Win93], emphasized the inadequacies of the existing Plan 9 [Pike90] debugger db, a distant relative of adb, and made it clear that a new debugger was required.

Current debuggers like dbx, sdb, and gdb are limited to answering only the questions their authors envisage. As a result, they supply a plethora of specialized commands, each attempting to anticipate a specific question a user may ask. When a debugging situation arises that is beyond the scope of the command set, the tool is useless. Further, it is often tedious or impossible to reproduce an anomalous state of the program, especially when the state is embedded in the program’s data structures.

Acid applies some ideas found in CAD software used for hardware test and simulation. It is based on the notion that the state and resources of a program are best represented and manipulated by a language. The state and resources, such as memory, registers, variables, type information and source code are represented by variables in the language. Expressions provide a computation mechanism and control statements allow repetitive or selective interpretation based on the result of expression evaluation. The heart of the Acid debugger is an interpreter for a small typeless language whose operators mirror the operations of C and Alef, which in turn correspond well to the basic operations of the machine. The interpreter itself knows nothing of the underlying hardware; it deals with the program state and resources in the abstract. Fundamental routines to control processes, read files, and interface to the system are implemented as builtin functions available to the interpreter. The actual debugger functionality is coded in Acid; commands are implemented as Acid functions.

This language-based approach has several advantages. Most importantly, programs written in Acid, including most of the debugger itself, are inherently portable. Furthermore, Acid avoids the limitations other debuggers impose when debugging parallel programs. Instead of embedding a fixed process model in the debugger, Acid allows the programmer to adapt the debugger to handle an arbitrary process partitioning or program structure. The ability to interact dynamically with an executing process provides clear advantages over debuggers constrained to probe a static image. Finally, the Acid language is a powerful vehicle for expressing assertions about logic, process state, and the contents of data structures. When combined with dynamic interaction it allows a limited form of automated program verification without requiring modification or recompilation of the source code. The language is also an excellent vehicle for preserving a test suite for later regression testing.

The debugger may be customized by its users; standard functions may be modified or extended to suit a particular application or preference. For example, the kernel developers in our group require a command set supporting assembler-level debugging while the application programmers prefer source-level functionality. Although the default library is biased toward assembler-level debugging, it is easily modified to provide a convenient source-level interface. The debugger itself does not change; the user combines primitives and existing Acid functions in different ways to implement the desired interface.

2. Related Work

DUEL [Gol93], an extension to gdb [Stal91], proposes using a high level expression evaluator to solve some of these problems. The evaluator provides iterators to loop over data structures and conditionals to control evaluation of expressions. The author shows that complex state queries can be formulated by combining concise expressions but this only addresses part of the problem. A program is a dynamic entity; questions asked when the program is in a static state are meaningful only after the program has been ‘caught’ in that state. The framework for manipulating the program is still as primitive as the underlying debugger. While DUEL provides a means to probe data structures it entirely neglects the most beneficial aspect of debugging languages: the ability to control processes. Acid is structured around a thread of control that passes between the interpreter and the target program.

The NeD debugger [May92] is a set of extensions to TCL [Ous90] that provide debugging primitives. The resulting language, NeDtcl, is used to implement a portable interface between a conventional debugger, pdb [May90], and a server that executes NeDtcl programs operating on the target program. Execution of the NeDtcl programs implements the debugging primitives that pdb expects. NeD is targeted at multi-process debugging across a network, and proves the flexibility of a language as a means of communication between debugging tools. Whereas NeD provides an interface between a conventional debugger and the process it debugs, Acid is the debugger itself. While NeD has some of the ideas found in Acid it is targeted toward a different purpose. Acid seeks to integrate the manipulation of a program’s resources into the debugger while NeD provides a flexible interconnect between components of the debugging environment. The choice of TCL is appropriate for its use in NeD but is not suitable for Acid. Acid relies on the coupling of the type system with expression evaluation, which are the root of its design, to provide the debugging primitives.

Dalek [Ols90] is an event based language extension to gdb. State transitions in the target program cause events to be queued for processing by the debugging language.

Acid has many of the advantages of same process or local agent debuggers, like Parasight [Aral], without the need for dynamic linking or shared memory. Acid improves on the ideas of these other systems by completely integrating all aspects of the debugging process into the language environment. Of particular importance is the relationship between Acid variables, program symbols, source code, registers and type information. This integration is made possible by the design of the Acid language.

Interpreted languages such as Lisp and Smalltalk are able to provide richer debugging environments through more complete information than their compiled counterparts. Acid is a means to gather and represent similar information about compiled programs through cooperation with the compilation tools and library implementers.

3. Acid the Language

Acid is a small interpreted language targeted to its debugging task. It focuses on representing program state and addressing data rather than expressing complex computations. Program state is addressable from an Acid program. In addition to parsing and executing expressions and providing an architecture-independent interface to the target process, the interpreter supplies a mark-and-scan garbage collector to manage storage.

Every Acid session begins with the loading of the Acid libraries. These libraries contain functions, written in Acid, that provide a standard debugging environment including breakpoint management, stepping by instruction or statement, stack tracing, and access to variables, memory, and registers. The library contains 600 lines of Acid code and provides functionality similar to dbx. Following the loading of the system library, Acid loads user-specified libraries; this load sequence allows the user to augment or override the standard commands to customize the debugging environment. When all libraries are loaded, Acid issues an interactive prompt and begins evaluating expressions entered by the user. The Acid ‘commands’ are actually invocations of builtin primitives or previously defined Acid functions. Acid evaluates each expression as it is entered and prints the result.

4. Types and Variables

Acid variables are of four basic types: integer, string, float, and list. The type of a variable is inferred by the type of the right-hand side of an assignment expression. Many of the operators can be applied to more than one type; for these operators the action of the operator is determined by the type of its operands. For example, the + operator adds integer and float operands, and concatenates string and list operands. Lists are the only complex type in Acid; there are no arrays, structures or pointers. Operators provide head, tail, append and delete operations. Lists can also be indexed like arrays.

Acid has two levels of scope: global and local. Function parameters and variables declared in a function body using the local keyword are created at entry to the function and exist for the lifetime of a function. Global variables are created by assignment and need not be declared. All variables and functions in the program being debugged are entered in the Acid symbol table as global variables during Acid initialization. Conflicting variable names are resolved by prefixing enough ‘$’ characters to make them unique. Syntactically, Acid variables and target program symbols are referenced identically. However, the variables are managed differently in the Acid symbol table and the user must be aware of this distinction. The value of an Acid variable is stored in the symbol table; a reference returns the value. The symbol table entry for a variable or function in the target program contains the address of that symbol in the image of the program. Thus, the value of a program variable is accessed by indirect reference through the Acid variable that has the same name; the value of an Acid variable is the address of the corresponding program variable.

5. Control Flow

The while and loop statements implement looping. The former is similar to the same statement in C. The latter evaluates starting and ending expressions yielding integers and iterates while an incrementing loop index is within the bounds of those expressions.

acid: i = 0; loop 1,5 do print(i=i+1)

0x00000001

0x00000002

0x00000003

0x00000004

0x00000005

acid:

The traditional if-then-else statement implements conditional execution.

6. Addressing

Two indirection operators allow Acid to access values in the program being debugged. The * operator fetches a value from the memory image of an executing process; the @ operator fetches a value from the text file of the process. When either operator appears on the left side of an assignment, the value is written rather than read.

The indirection operator must know the size of the object referenced by a variable. The Plan 9 compilers neglect to include this information in the program symbol table, so Acid cannot derive this information implicitly. Instead Acid variables have formats. The format is a code letter specifying the printing style and the effect of some of the operators on that variable. The indirection operators look at the format code to determine the number of bytes to read or write. The format codes are derived from the format letters used by db. By default, symbol table variables and numeric constants are assigned the format code ’X’ which specifies 32-bit hexadecimal. Printing such a variable yields output of the form 0x00123456. An indirect reference through the variable fetches 32 bits of data at the address indicated by the variable. Other formats specify various data types, /p>

Every Acid session begins with the loading of the Acid libraries. These libraries contain functions, written in Acid, that provide a standard debugging environment including breakpoint management, stepping by instruction or statement, stack tracing, and access to variables, memory, and registers. The library contains 600 lines of Acid code and provides functionality similar to dbx. Following the loading of the system library, Acid loads user-specified libraries; this load sequence allows the user to augment or override the standard commands to customize the debugging environment. When all libraries are loaded, Acid issues an interactive prompt and begins evaluating expressions entered by the user. The Acid ‘commands’ are actually invocations of builtin primitives or previously defined Acid functions. Acid evaluates each expression as it is entered and prints the result.

4. Types and Variables

Acid variables are of four basic types: integer, string, float, and list. The type of a variable is inferred by the type of the right-hand side of an assignment expression. Many of the operators can be applied to more than one type; for these operators the action of the operator is determined by the type of its operands. For example, the + operator adds integer and float operands, and concatenates string and list operands. Lists are the only complex type in Acid; there are no arrays, structures or pointers. Operators provide head, tail, append and delete operations. Lists can also be indexed like arrays.

Acid has two levels of scope: global and local. Function parameters and variables declared in a function body using the local keyword are created at entry to the function and exist for the lifetime of a function. Global variables are created by assignment and need not be declared. All variables and functions in the program being debugged are entered in the Acid symbol table as global variables during Acid initialization. Conflicting variable names are resolved by prefixing enough ‘$’ characters to make them unique. Syntactically, Acid variables and target program symbols are referenced identically. However, the variables are managed differently in the Acid symbol table and the user must be aware of this distinction. The value of an Acid variable is stored in the symbol table; a reference returns the value. The symbol table entry for a variable or function in the target program contains the address of that symbol in the image of the program. Thus, the value of a program variable is accessed by indirect reference through the Acid variable that has the same name; the value of an Acid variable is the address of the corresponding program variable.

5. Control Flow

The while and loop statements implement looping. The former is similar to the same statement in C. The latter evaluates starting and ending expressions yielding integers and iterates while an incrementing loop index is within the bounds of those expressions.

acid: i = 0; loop 1,5 do print(i=i+1)

0x00000001

0x00000002

0x00000003

0x00000004

0x00000005

acid:

The traditional if-then-else statement implements conditional execution.

6. Addressing

Two indirection operators allow Acid to access values in the program being debugged. The * operator fetches a value from the memory image of an executing process; the @ operator fetches a value from the text file of the process. When either operator appears on the left side of an assignment, the value is written rather than read.

The indirection operator must know the size of the object referenced by a variable. The Plan 9 compilers neglect to