The Text Editor sam

Rob Pike

rob@plan9.bell-labs.com

ABSTRACT

Sam is an interactive multi-file text editor intended for bitmap displays. A textual command language supplements the mouse-driven, cut-and-paste interface to make complex or repetitive editing tasks easy to specify. The language is characterized by the composition of regular expressions to describe the structure of the text being modified. The treatment of files as a database, with changes logged as atomic transactions, guides the implementation and makes a general ‘undo’ mechanism straightforward.

Sam is implemented as two processes connected by a low-bandwidth stream, one process handling the display and the other the editing algorithms. Therefore it can run with the display process in a bitmap terminal and the editor on a local host, with both processes on a bitmap-equipped host, or with the display process in the terminal and the editor in a remote host. By suppressing the display process, it can even run without a bitmap terminal.

This paper is reprinted from Software—Practice and Experience, Vol 17, number 11, pp. 813-845, November 1987. The paper has not been updated for the Plan 9 manuals. Although Sam has not changed much since the paper was written, the system around it certainly has. Nonetheless, the description here still stands as the best introduction to the editor.

Introduction

Sam is an interactive text editor that combines cut-and-paste interactive editing with an unusual command language based on the composition of regular expressions. It is written as two programs: one, the ‘host part,’ runs on a UNIX system and implements the command language and provides file access; the other, the ‘terminal part,’ runs asynchronously on a machine with a mouse and bitmap display and supports the display and interactive editing. The host part may even be run in isolation on an ordinary terminal to edit text using the command language, much like a traditional line editor, without assistance from a mouse or display. Most often, the terminal part runs on a Blit1 terminal (actually on a Teletype DMD 5620, the production version of the Blit), whose host connection is an ordinary 9600 bps RS232 link; on the SUN computer the host and display processes run on a single machine, connected by a pipe.

Sam edits uninterpreted ASCII text. It has no facilities for multiple fonts, graphics or tables, unlike MacWrite,2 Bravo,3 Tioga4 or Lara.5 Also unlike them, it has a rich command language. (Throughout this paper, the phrase command language refers to textual commands; commands activated from the mouse form the mouse language.) Sam developed as an editor for use by programmers, and tries to join the style of the UNIX text editor ed6,7 with that of interactive cut-and-paste editors by providing a comfortable mouse-driven interface to a program with a solid command language driven by regular expressions. The command language developed more than the mouse language, and acquired a notation for describing the structure of files more richly than as a sequence of lines, using a dataflow-like syntax for specifying changes.

The interactive style was influenced by jim,1 an early cut-and-paste editor for the Blit, and by mux,8 the Blit window system. Mux merges the original Blit window system, mpx,1 with cut-and-paste editing, forming something like a multiplexed version of jim that edits the output of (and input to) command sessions rather than files.

The first part of this paper describes the command language, then the mouse language, and explains how they interact. That is followed by a description of the implementation, first of the host part, then of the terminal part. A principle that influenced the design of sam is that it should have no explicit limits, such as upper limits on file size or line length. A secondary consideration is that it be efficient. To honor these two goals together requires a method for efficiently manipulating huge strings (files) without breaking them into lines, perhaps while making thousands of changes under control of the command language. Sam’s method is to treat the file as a transaction database, implementing changes as atomic updates. These updates may be unwound easily to ‘undo’ changes. Efficiency is achieved through a collection of caches that minimizes disc traffic and data motion, both within the two parts of the program and between them.
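The idea of logging changes as atomic, reversible transactions can be illustrated with a small sketch. This is not sam's actual data structure; the class LoggedText and its methods transact and undo are hypothetical names chosen for illustration, and the sketch assumes that the edits within one transaction do not overlap.

```python
class LoggedText:
    """Illustrative only: a string whose changes are logged so they can be unwound."""

    def __init__(self, text=""):
        self.text = text
        self.log = []  # one entry per transaction: a list of inverse edits

    def transact(self, edits):
        """Apply non-overlapping (start, end, replacement) edits as one transaction."""
        inverse = []
        # Work from the highest offset down so that earlier offsets stay valid.
        for start, end, new in sorted(edits, reverse=True):
            old = self.text[start:end]
            self.text = self.text[:start] + new + self.text[end:]
            # Record how to restore the old text at this position.
            inverse.append((start, start + len(new), old))
        self.log.append(inverse)

    def undo(self):
        """Unwind the most recent transaction; return False if there is none."""
        if not self.log:
            return False
        # Reverse the edits in the opposite order from which they were applied.
        for start, end, old in reversed(self.log.pop()):
            self.text = self.text[:start] + old + self.text[end:]
        return True


t = LoggedText("Rob and Rob")
t.transact([(0, 3, "Peter"), (8, 11, "Peter")])  # two changes, one transaction
changed = t.text
t.undo()                                         # both changes unwound together
restored = t.text
```

Because each transaction stores enough information to reverse itself, a single undo restores the text regardless of how many individual changes the transaction contained.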

The terminal part of sam is fairly straightforward. More interesting is how the two halves of the editor stay synchronized when either half may initiate a change. This is achieved through a data structure that organizes the communications and is maintained in parallel by both halves.

The last part of the paper chronicles the writing of sam and discusses the lessons that were learned through its development and use.

The paper is long, but is composed largely of two papers of reasonable length: a description of the user interface of sam and a discussion of its implementation. They are combined because the implementation is strongly influenced by the user interface, and vice versa.

The Interface

Sam is a text editor for multiple files. File names may be provided when it is invoked:

sam file1 file2 ...

and there are commands to add new files and discard unneeded ones. Files are not read until necessary to complete some command. Editing operations apply to an internal copy made when the file is read; the UNIX file associated with the copy is changed only by an explicit command. To simplify the discussion, the internal copy is here called a file, while the disc-resident original is called a disc file.

Sam is usually connected to a bitmap display that presents a cut-and-paste editor driven by the mouse. In this mode, the command language is still available: text typed in a special window, called the sam window, is interpreted as commands to be executed in the current file. Cut-and-paste editing may be used in any window — even in the sam window to construct commands. The other mode of operation, invoked by starting sam with the option -d (for ‘no download’), does not use the mouse or bitmap display, but still permits editing using the textual command language, even on an ordinary terminal, interactively or from a script.

The following sections describe first the command language (under sam -d and in the sam window), and then the mouse interface. These two languages are nearly independent, but connect through the current text, described below.

The Command Language

A file consists of its contents, which are an array of characters (that is, a string); the name of the associated disc file; the modified bit that states whether the contents match those of the disc file; and a substring of the contents, called the current text or dot (see Figures 1 and 2). If the current text is a null string, dot falls between characters. The value of dot is the location of the current text; the contents of dot are the characters it contains. Sam imparts to the text no two-dimensional interpretation such as columns or fields; text is always one-dimensional. Even the idea of a ‘line’ of text as understood by most UNIX programs — a sequence of characters terminated by a newline character — is only weakly supported.
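The four components of a file listed above can be pictured as a small record. The following is a minimal sketch, not sam's own representation: the class File, its field names, and the choice to store dot as a pair of character offsets are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Illustrative model of a file as described in the text."""
    contents: str = ""      # the characters of the file, as one string
    name: str = ""          # name of the associated disc file
    modified: bool = False  # True when contents differ from the disc file
    dot_start: int = 0      # dot covers contents[dot_start:dot_end]
    dot_end: int = 0

    def dot_value(self):
        """The value of dot: the location of the current text."""
        return (self.dot_start, self.dot_end)

    def dot_contents(self):
        """The contents of dot: the characters it contains."""
        return self.contents[self.dot_start:self.dot_end]


f = File(contents="Hello, Peter", name="greeting.txt", dot_start=7, dot_end=12)
null_dot = File(contents="Hello, Peter", name="greeting.txt", dot_start=5, dot_end=5)
```

In this model a null dot, such as the one in null_dot, has equal start and end offsets and so falls between two characters; its contents are the empty string.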

The current file is the file to which editing commands refer. The current text is therefore dot in the current file. If a command doesn’t explicitly name a particular file or piece of text, the command is assumed to apply to the current text. For the moment, ignore the presence of multiple files and consider editing a single file.

Figure 1. A typical sam screen, with the editing menu presented. The sam (command language) window is in the middle, with file windows above and below. (The user interface makes it easy to create these abutting windows.) The partially obscured window is a third file window. The uppermost window is that to which typing and mouse operations apply, as indicated by its heavy border. Each window has its current text highlighted in reverse video. The sam window’s current text is the null string on the last visible line, indicated by a vertical bar. See also Figure 2.

Commands have one-letter names. Except for non-editing commands such as writing the file to disc, most commands make some change to the text in dot and leave dot set to the text resulting from the change. For example, the delete command, d, deletes the text in dot, replacing it by the null string and setting dot to the result. The change command, c, replaces dot by text delimited by an arbitrary punctuation character, conventionally a slash. Thus,

c/Peter/

replaces the text in dot by the string Peter. Similarly,

a/Peter/

(append) adds the string after dot, and

i/Peter/

(insert) inserts before dot. All three leave dot set to the new text, Peter.
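The effect of these four commands on the text and on dot can be sketched in a few lines. This is an illustrative model only, not sam's code: the function names and the representation of dot as a (start, end) pair of character offsets are assumptions.

```python
def change(text, dot, new):
    """Model of c: replace the text in dot; dot becomes the new text."""
    start, end = dot
    return text[:start] + new + text[end:], (start, start + len(new))


def delete(text, dot):
    """Model of d: replace dot by the null string; dot becomes that null string."""
    return change(text, dot, "")


def append(text, dot, new):
    """Model of a: add text after dot; dot becomes the new text."""
    _, end = dot
    return text[:end] + new + text[end:], (end, end + len(new))


def insert(text, dot, new):
    """Model of i: add text before dot; dot becomes the new text."""
    start, _ = dot
    return text[:start] + new + text[start:], (start, start + len(new))


text = "Hello, Rob!"
dot = (7, 10)  # dot is the substring "Rob"

changed = change(text, dot, "Peter")    # like c/Peter/
appended = append(text, dot, "Peter")   # like a/Peter/
inserted = insert(text, dot, "Peter")   # like i/Peter/
deleted = delete(text, dot)             # like d
```

Each function returns the new text together with the new dot, which reflects the rule that a command leaves dot set to the text resulting from the change.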

Newlines are part of the syntax of commands: the newline character lexically terminates a command. Within the inserted text, however, newlines are never implicit. But since it is often convenient to insert multiple lines of text, sam has a special syntax for that case:

a

some lines of text

to be inserted in the file,
