This article describes a COBOL compiler and run-time interpreter for teaching COBOL in a computer-aided learning environment. The compiler converts COBOL source code into an intermediate pseudo code that is executed by the run-time interpreter. This package is the first implementation of COBOL written in BASIC for an unexpanded BBC Micro with a 32K memory.
COBOL is a high level computer programming language for implementing commercial applications. This report describes the work carried out to implement and test a COBOL compiler and run time interpreter for teaching COBOL in a computer-aided learning environment. The compiler converts COBOL source code into an intermediate pseudo code that is executed by the run time interpreter. Both the compiler and run time interpreter are written in BASIC and run on an unexpanded BBC Micro model B. This report describes the first implementation of COBOL for an unexpanded BBC Micro with a 32K memory.
A meeting at the University of Pennsylvania Computing Centre, Philadelphia in April, 1959 was held to consider the desirability and feasibility of establishing a common programming language for implementing business applications. This meeting concluded that:
A May, 1959 meeting at the Pentagon formed the Conference on Data Systems Languages (CODASYL), the organization that would produce the common language. In 1960, the US Department of Defence, at the time the largest user of computers, produced an initial specification of the Common Business Oriented Language (COBOL), giving birth to the world’s most popular commercial programming language to date.
COBOL has two features that make it particularly suitable for implementing business applications. First, COBOL is not complicated: programs are written in a form nearer to English than other high level languages. This ease of programming means that it’s easy for relatively inexperienced programmers and business users to learn. Second, records are the basis of all commercial programs be they master records in a payroll or stock records in a factory stock control system, and COBOL has powerful facilities for processing and manipulating record data structures.
Since its introduction in the early 1960s, COBOL has steadily increased in popularity until reaching its peak when 95% of all commercial applications were written in the language. Although the number of commercial applications written in COBOL has decreased to 85%, the popularity of the language is set to remain high with 75% of commercial applications still being written in COBOL by the middle of the next century.
COBOL is in a privileged position because its popularity grew in the early days of commercial programming. Because COBOL is so popular and so widespread, firms request that their computer programs are written in COBOL. Firms are reluctant to use newer languages because there is a ready supply of experienced COBOL programmers and far fewer programmers with experience of other languages. COBOL’s popularity makes it difficult for new languages to break into the commercial market. However, the decrease in the number of commercial applications being written in COBOL shows that other languages are filtering through.
The aims and objectives of this project are:
The problem is to run COBOL on a standard, unexpanded BBC Micro Model B. It is possible to run COBOL on a BBC Micro but current implementations require an Acorn Second Processor which costs approximately £150. On top of this cost is the COBOL package itself which costs between £70 and £150. This high cost puts COBOL out of reach for the majority of BBC Micro users that want to learn and use the language.
This package is aimed at users learning COBOL from scratch or improving a basic knowledge of the language. This documentation contains only a basic description of COBOL and the package should be used with books or worksheets designed for teaching COBOL.
All versions of COBOL have basic elements in common that are compulsory in all programs. COBOL programs are divided into four divisions that are written in the following order:
In full implementations of COBOL, programs must be laid out according to strict formatting rules that specify the number of spaces at the beginning of each line. Because this implementation is for teaching COBOL, the strict formatting has been relaxed to just three simple rules:
These simple formatting rules mean that users will spend less time correcting compilation errors caused by incorrect formatting and more time learning about writing COBOL programs.
The Identification division documents the program with its name, its author and any comments, which are called remarks in COBOL:
The statements marked with an asterisk (*) are compulsory in this implementation.
Because this is a CAL package, all the items in the Identification division are compulsory to help learners document and understand their programs. The Identification division is ignored by the compiler.
The Environment division specifies the computer on which the program is to compile, the computer on which the object code program is to run and lists the peripherals needed to run the program:
In the Configuration section the Source Computer statement specifies the compiling computer and the Object Computer statement specifies the executing computer. In this implementation, the source and object computers are the same, i.e. the BBC Micro. In the File Control section of the Input Output section, the SELECT...ASSIGN statement declares which files use which peripherals. There must be at least one SELECT...ASSIGN statement that selects a disk file.
The Data division describes the format of the data records processed by the program:
The Data division is divided into the File section and the Working Storage section. The File section describes the files that are used for input and output. The Working Storage section contains the internal working data used to execute the statements in the Procedure division. The record used in this section follows the same format as the other records. The Working Storage section is compulsory because all programs require it.
The structure of the data records defined in the Data division depends on the application but they must follow a standard pattern:
The first line of a record is the file description (FD) that specifies the external filename (MASTER) of the record if it’s defined as a disk file. If the record is defined as a keyboard file, a filename must be supplied but it’s ignored by the compiler.
The LABEL RECORDS ARE OMITTED clause has no effect in this implementation but is compulsory to make programs more like standard COBOL.
The remainder of the record is composed of the data fields. A data record can have up to 15 fields, and each field has four parts:
The level number specifies the position of the field in the record. Level numbers are explained in more detail below.
The field name is the name by which the field is referenced in the rest of the program. All field names and file descriptions must be unique within the same program because the compiler would be unable to differentiate fields and file descriptors with the same name.
The PIC keyword precedes a picture definition that specifies the length and data type of the field. Picture definitions are explained in more detail below.
The VALUE IS clause is optional and allows an initial value to be assigned to the field. If the VALUE IS clause is absent, alphabetic and alphanumeric fields will be null and numeric fields will be zero.
Each field has a level number (01, 02, 03, etc.) that specifies the position of the field within the hierarchical record structure. A level number is a two digit integer and must start with zero if it’s less than ten. Fields at the same level in the hierarchy must have the same level number. Sub-fields must have a level number higher than the fields above them in the hierarchy.
Fifteen fields are available in this implementation so the maximum level number is 15 (if steps of one are used). More fields would be available if more memory was available.
Field level numbers determine how data is moved from one field to another. Data is moved between the following records:
in three ways:
MOVE AGE TO NEW_AGE copies the value of AGE to NEW_AGE.
MOVE NAME TO SAME_NAME copies NAME to SAME_NAME.
MOVE NAME TO FULL_NAME concatenates the sub-fields of NAME, including SURNAME, and copies the result to FULL_NAME.
In standard COBOL, undivided fields in the working storage section have level number 77. In this implementation, all field level numbers follow the same pattern in each section. I felt a consistent field numbering system would make it easier to learn COBOL.
Each field is defined by a picture definition that specifies the number of characters the field can hold and the data type of those characters. This implementation of COBOL provides three data types: alphabetic, numeric and alphanumeric.
Alphabetic fields contain only alphabetic characters (the 26 letters of the alphabet). The number of A’s represents the number of alphabetic characters allowed in the field:
Numeric fields contain only the digits 0 to 9, an optional sign, S, and an optional decimal point, V. The number of 9s represents the number of numeric characters allowed either side of the decimal point in the field:
|9(4)||6||0000 to 9999|
|S99999||6||-99999 to +99999|
|S9999V99||5||-9999.99 to +9999.99|
Alphanumeric fields can contain alphabetic and numeric characters. The number of Xs represents the number of alphanumeric characters allowed in the field:
A string of two or more A, X or 9 characters can be replaced by a shorthand notation that specifies the number of characters in brackets following the character. For example, 9(6) is equivalent to 999999; and X(10) is equivalent to XXXXXXXXXX. However, it’s impossible to specify a decimal point using this shorthand when defining a numeric data type.
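The shorthand rule above can be illustrated with a short sketch. It is written in Python for readability, not in the package's BASIC, and the function names `expand_picture` and `picture_type` are hypothetical; the package's own picture parser may work quite differently.

```python
import re


def expand_picture(picture: str) -> str:
    """Expand shorthand such as 9(6) or X(10) into its long form."""
    # Replace each "<char>(<count>)" group with <char> repeated <count> times.
    return re.sub(
        r"([AX9])\((\d+)\)",
        lambda match: match.group(1) * int(match.group(2)),
        picture,
    )


def picture_type(picture: str) -> str:
    """Classify a picture as one of the three data types described above."""
    expanded = expand_picture(picture)
    if re.fullmatch(r"A+", expanded):
        return "alphabetic"
    if re.fullmatch(r"X+", expanded):
        return "alphanumeric"
    # Optional sign S, digits, then an optional implied decimal point V.
    if re.fullmatch(r"S?9+(V9+)?", expanded):
        return "numeric"
    raise ValueError("Bad picture: " + picture)


print(expand_picture("9(6)"))    # 999999
print(expand_picture("X(10)"))   # XXXXXXXXXX
print(picture_type("S9999V99"))  # numeric
```

Expanding the shorthand first means the type check only ever has to deal with the long form.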
The Procedure division contains the instructions of the program that manipulate the data records specified in the Data division:
As with standard COBOL, all arithmetic statements such as Add and Multiply require that the fields they operate on are in the Working Storage section.
The COBOL statements available in the Procedure division are described using the following key:
|<file>||the name of a file|
|<word>||a sequence of characters|
|<literal>||a quoted string|
|<result>||the result of an operation which must be a <var>|
Data operating constructions:
In a full implementation of COBOL, the AT END clause can be followed by other COBOL clauses, such as:
In this implementation, however, the AT END clause must be followed by a GO TO clause:
COBOL is normally run on mainframes and minicomputers. Implementing a version of COBOL for a microcomputer with a small 32K memory seems impossible. Several limitations had to be imposed to fit an implementation of COBOL into 32K, including omitting language features that are not required for novice COBOL users:
Algebraic expressions such as COMPUTE area=pi*r^2. have not been implemented because of memory limitations. This is not too disadvantageous because algebraic expressions, like arrays, are not essential for a CAL package: statements such as MULTIPLY HOURS BY RATE GIVING PAY. are more common in COBOL and more instructive when learning the language. Furthermore, algebraic expressions are not generally used in business applications.
Several other limitations were imposed:
Because the package is for teaching COBOL, only the File and Working Storage sections have been implemented. The Linkage and Report sections were not implemented to save time and memory and because the File and Working Storage sections are the most instructive of the four sections.
The package is divided into three programs: the main menu, the compiler and the run time interpreter. The main menu program enables the user to select the compiler or the run time interpreter or to exit the package. The compiler converts the user’s COBOL program into an intermediate pseudo code which is executed by the run time interpreter.
Although the package is a self contained suite of programs, it relies on some standard items of hardware and software.
The package requires the following hardware:
The package also requires the following software:
The package uses the following convention for storing files:
This convention enables files to be easily recognised on disk and enables the same file name to be used for the pseudo code file as for the source code file.
The compiler expects all source code files to be stored in the “C” directory. The compiler stores the pseudo code file in the “P” directory within which the run time interpreter expects to find all pseudo code files.
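As an illustration of the directory convention described above, the following Python sketch maps a source file name in the “C” directory to the matching pseudo code file name in the “P” directory. The function name `pseudo_code_name` is hypothetical and the validation is an assumption; the package itself is written in BASIC.

```python
def pseudo_code_name(source_name: str) -> str:
    """Map a source file such as C.EXAMPLE to its pseudo code file P.EXAMPLE."""
    directory, separator, leaf = source_name.partition(".")
    # Source files are expected in the "C" directory.
    if directory != "C" or not separator or not leaf:
        raise ValueError("Expected a source file in the C directory: " + source_name)
    # The same leaf name is reused in the "P" directory.
    return "P." + leaf


print(pseudo_code_name("C.EXAMPLE"))  # P.EXAMPLE
```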
For information about disk directories refer to the Disk Filing System (DFS) manual.
When the main menu program has loaded, the user is presented with three options:
Option 1 loads the compiler. Option 2 loads the run time interpreter. Option 3 allows the user to leave the package; all loose ends are tied up and the user is returned to the familiar prompt of the BASIC environment:
To select an option, press the key of the number that corresponds to the menu option (the RETURN key is not required).
If the compiler’s data file (CDATA) or the run time interpreter’s data file (RTIDATA) is not on the disk the menu will display an error and won’t let the user continue. At this point the data files must be put on the disk and the package restarted.
When selected from the menu, the compiler loads and displays the following message on the screen:
while it loads its data file. This is a short process and the user need do nothing except watch. When the compiler has finished loading, the user must configure it by providing the following information:
As a COBOL program is compiled, each line of code is displayed as it is read from the source text file:
CAL COBOL Compiler
compiling...
IDENTIFICATION DIVISION.
PROGRAM_ID EXAMPLE.
AUTHOR JEFFREY MORGAN.
DATE_WRITTEN MARCH_1989.
...
If selected, the pseudo code is displayed after each line of COBOL:
CAL COBOL Compiler
compiling...
IDENTIFICATION DIVISION.
1 33
PROGRAM_ID EXAMPLE.
22 -1
AUTHOR JEFFREY MORGAN.
23 -1 -1
DATE_WRITTEN MARCH_1989.
24 -1
...
After compilation, the file name of the program, the number of compilation errors and the number of bytes of pseudo code are displayed:
-------------------------------------------------------
Compilation of <C.EXAMPLE> complete.
6 compilation error(s).
Pcode is 76(+) bytes long.
-------------------------------------------------------
The plus in brackets reminds the user that although the pseudo code is 76 bytes long, the symbol table and the literals are also saved in the pseudo code file.
When printing the compilation of a COBOL program, the print out is preceded by the following header:
=======================================================
=                 CAL COBOL Compiler                  =
=                                                     =
=           Compilation of file <C.EXAMPLE>           =
=                                                     =
=                Documentation header                 =
=======================================================
The header remarks—here “Documentation header”—are specified by the user.
After compiling a program, the user will be returned to the main menu. Alternatively, pressing the ESCAPE or BREAK keys when using the compiler will return the user to the main menu.
The compiler produces an error message whenever one is required:
Errors 20 to 23 are fatal and the compilation will stop when one of them occurs; otherwise, the compilation will continue until the end of the program.
The compiler does its best to continue when a compilation error occurs. However, as with most compilers, when an error does occur, several further errors may be caused by the first error. For example, if a Bad Picture error occurs, the field won’t become part of the record and when the field is referenced later in the program a No Such Variable error will occur. It’s best to debug each error as it occurs in the program because subsequent errors reported by the compiler often disappear when the first error in a program is fixed.
Error messages are displayed inside chevrons:
>>> SYNTAX ERROR <<<
In the unlikely event of an error occurring in either the compiler or the run time interpreter, the error is displayed in the following format:
Sorry, can't continue.
An error has occurred in the Compiler/Run time interpreter itself.
Disk changed at line 3620.
>_
If this happens, the user should reload the package and start again.
When selected from the menu, the run time interpreter loads and displays the following message on the screen:
while it loads its data file. This is a short process and the user need do nothing except watch. When the run time interpreter has finished loading, the user must configure it by providing the following information:
The run time interpreter displays the result of each Display statement on the screen:
CAL COBOL Run Time Interpreter
running...
PROCESSING DATA
FRED SMITH
NEW PAY = 362.23
JIM PETERS
NEW PAY = 647.57
If selected, the run time interpreter also displays the pseudo code as it is read from the pseudo code file:
running...
PROCESSING DATA
14 1 -1
14 -1
FRED SMITH
14 102 -1
NEW PAY = 362.23
14 2 502 -1
14 -1
JIM PETERS
14 102 -1
NEW PAY = 647.57
14 2 502 -1
14 -1
Whenever a COBOL program requires input, the run time interpreter prompts the user:
The user should enter the required information and press the RETURN key.
When printing the execution of a COBOL program, the print out is preceded by the following header:
=======================================================
=           CAL COBOL Run Time Interpreter            =
=                                                     =
=               Run of file <C.EXAMPLE>               =
=                                                     =
=                Documentation header                 =
=======================================================
The header remarks—here “Documentation header”—are specified by the user.
After running a program, the user has the option of either running another program or returning to the main menu. Alternatively, pressing the ESCAPE or BREAK keys when using the run time interpreter will return the user to the main menu.
The run time interpreter produces an error message whenever one is required:
If an error occurs when running a program, the run time interpreter displays the error, halts the execution of the program and returns the user to the main menu.
The programming problem is to input a stream of characters from a text file on disk containing a COBOL program, group the characters into the words of a line of COBOL, check the syntax of the line, add the information in the line to the symbol table and then convert the line into pseudo code. After the pseudo code and symbol table have been constructed they must be saved in a disk file, known throughout the package and this documentation as a pseudo code file.
The compiler must analyse COBOL programs as fast as possible to avoid lengthy compilation times, and must produce pseudo code that is as simple as possible to leave only the task of executing the program to the run time interpreter which, in turn, must execute the pseudo code as quickly as possible. Both the compiler and the run time interpreter must manage memory efficiently.
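The first step described above, grouping a stream of characters into the words of a line of COBOL, can be sketched as follows. This is a minimal Python illustration and not the package's BASIC code; the function name `split_words` is hypothetical, and the handling of quoted literals and of the terminating full stop is an assumption.

```python
from typing import List


def split_words(line: str) -> List[str]:
    """Group the characters of one COBOL line into words."""
    line = line.strip()
    # Assumption: a line ends with a full stop that is not part of any word.
    if line.endswith("."):
        line = line[:-1]

    words: List[str] = []
    current = ""
    in_quotes = False
    for char in line:
        if char == '"':
            # Spaces inside a quoted literal belong to the literal.
            in_quotes = not in_quotes
            current += char
        elif char == " " and not in_quotes:
            if current:
                words.append(current)
                current = ""
        else:
            current += char
    if current:
        words.append(current)
    return words


print(split_words("MULTIPLY HOURS BY RATE GIVING GROSS_PAY."))
print(split_words('DISPLAY "NEW PAY = ".'))
```

Scanning one character at a time mirrors reading the source file as a character stream, and keeps quoted literals intact as single words.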
Several key decisions were made before implementing the package:
SELECT <file> ASSIGN TO <disk> | <keyboard>
Noise words are not essential for learning COBOL, and novice users might be confused about which words are optional. Noise words can be learned later when the user is more familiar with COBOL.
The data structure that holds the data records is an important part of the package because the compiler uses it to construct the symbol table, an internal representation used to compile the statements in the Procedure division that is also used by the run time interpreter to execute the program.
The following COBOL data record:
would be represented inside the compiler in the following table:
|Field Number||Field Name||Length||Type|
Although each field has a COBOL level number, the compiler numbers each field sequentially starting at one, as shown in the Field Number column. The name of the field as written in the COBOL source code is stored in the Field Name column. The Length column records the number of characters the field can contain. The Type column records which of the three data types can be stored in the field:
Fields that head a set of sub-fields have zero length and a type that begins with the ¦ character. This character precedes the number of the first sub-field followed by the number of the last sub-field. The sub-field numbers are not the COBOL record level numbers but the compiler’s internal sequential numbering shown in the Field Number column. In the table, field 1, MASTER, has type ¦29 indicating that MASTER is the beginning of a set of sub-fields starting with field 2 and ending with field 9. Similarly, NAME has the type ¦34 because it heads sub-fields 3 to 4 and ADDRESS has the type ¦67 because it heads fields 6 to 7.
Whenever a field is defined that has a higher level number than the previous field, the type of the previous field is set to ¦** to mark it as heading a set of sub-fields. When the number of the final sub-field is known, the ¦** is updated to record the numbers of the first and last sub-field.
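The way group fields record their first and last sub-fields can be sketched as follows. This is a Python illustration and not the package's BASIC code. The level numbers passed in are an assumption chosen to reproduce the MASTER, NAME and ADDRESS groups described above, and the sketch uses a stack of open groups in place of the ¦** placeholder that the compiler patches later; the names `build_groups` and `type_marker` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_groups(levels: List[int]) -> Dict[int, Tuple[int, int]]:
    """Return {field number: (first sub-field, last sub-field)} for group fields.

    `levels` holds the COBOL level number of each field in declaration order;
    fields are numbered sequentially from one, as in the compiler's table.
    """
    groups: Dict[int, Tuple[int, int]] = {}
    open_heads: List[Tuple[int, int]] = []  # stack of (level, field number)

    def close(head_number: int, last_number: int) -> None:
        # A field heads a group only if at least one field followed it
        # before a field at the same or a shallower level appeared.
        if last_number > head_number:
            groups[head_number] = (head_number + 1, last_number)

    for number, level in enumerate(levels, start=1):
        # A field at the same or a shallower level ends every deeper group.
        while open_heads and open_heads[-1][0] >= level:
            _, head_number = open_heads.pop()
            close(head_number, number - 1)
        open_heads.append((level, number))

    # The end of the record closes whatever is still open.
    while open_heads:
        _, head_number = open_heads.pop()
        close(head_number, len(levels))
    return groups


def type_marker(first: int, last: int) -> str:
    """Format a group type in the ¦<first><last> style used in the table."""
    return "¦" + str(first) + str(last)


# Assumed levels for MASTER, NAME, SURNAME, FORENAME, ADDRESS, NUMBER, ROAD, AGE, PAY.
levels = [1, 2, 3, 3, 2, 3, 3, 2, 2]
for head, (first, last) in sorted(build_groups(levels).items()):
    print(head, type_marker(first, last))
```

With the assumed levels this yields ¦29 for field 1, ¦34 for field 2 and ¦67 for field 5, matching the types quoted above.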
Each of the six available files has a string that holds the contents of all the fields in a record. Each field is located within the string using a pointer. The picture definitions in the Data division provide the data type and length of each field and enable the compiler to calculate the position of the pointer to the start of each field in the string. For example, the following pointers would be produced for the fields represented in the above table:
If the SURNAME, FORENAME, NUMBER, ROAD, AGE and PAY fields of the record held the values STEVENS, PETER, 21, HIGH ST., 37 and 127.34, respectively, the compiler would construct the following string to represent the record:
A         B         C  D         E F
↓         ↓         ↓  ↓         ↓ ↓
STEVENS,,,PETER,,,,,21,HIGH ST.,,37,127.34
A is the pointer to the start of fields 1, MASTER, 2, NAME, and 3, SURNAME. B is the pointer to the start of field 4, FORENAME. C is the pointer to the start of fields 5, ADDRESS, and 6, NUMBER. D is the pointer to the start of field 7, ROAD. E is the pointer to the start of field 8, AGE and F is the pointer to the start of field 9, PAY. The unused characters in each field are represented here by commas.
The string is 41 characters long because the FORENAME and SURNAME fields hold 10 alphanumeric characters each, the NUMBER field holds 3 alphanumeric characters, the ROAD field holds 10 alphanumeric characters, the AGE field holds 2 numeric characters and the PAY field holds 6 numeric characters.
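The pointer calculation can be checked with a short sketch. It is written in Python, not the package's BASIC, and it assumes zero-based offsets and a hypothetical function name `field_pointers`; the field lengths are those given above, with group fields having zero length.

```python
from typing import Dict, List, Tuple


def field_pointers(fields: List[Tuple[str, int]]) -> Tuple[Dict[str, int], int]:
    """Return each field's start offset in the record string and the total length."""
    pointers: Dict[str, int] = {}
    offset = 0
    for name, length in fields:
        pointers[name] = offset
        # Group fields have zero length, so they share a pointer
        # with their first sub-field.
        offset += length
    return pointers, offset


fields = [
    ("MASTER", 0), ("NAME", 0), ("SURNAME", 10), ("FORENAME", 10),
    ("ADDRESS", 0), ("NUMBER", 3), ("ROAD", 10), ("AGE", 2), ("PAY", 6),
]
pointers, total = field_pointers(fields)
print(pointers)
print(total)  # 41
```

Because group fields add nothing to the running offset, MASTER, NAME and SURNAME share one pointer, as do ADDRESS and NUMBER.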
The Procedure division is the only part of a COBOL program compiled using this implementation that produces pseudo code that is stored. The rest of a COBOL program is not redundant, however; the symbol table is constructed from the Data division, for example. For brevity, only the pseudo code produced when the Procedure division of a program is compiled will be described here; the pseudo code produced for the rest of the program follows the same pattern.
Each element of the pseudo code is described using the following key:
This implementation of COBOL has the following reserved words:
Before generating the pseudo code, each line of COBOL is reduced to remove the reserved words that are not converted into pseudo code. For example, the following line of COBOL:
is reduced to:
The reserved words By and Giving are not converted into pseudo code and are deleted. The following pseudo code is then generated:
12 607 605 602
The first number, 12, is the number of the Multiply reserved word, as listed above. The numbers 607, 605 and 602 are the variable numbers of the three fields in the statement, the last of which is GROSS_PAY, as calculated using the above formula.
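The reduction and code generation steps can be sketched as follows. This Python illustration is not the package's BASIC code. The field names HOURS and RATE are assumptions made for the example, the variable numbers are looked up in a table and not computed, and only the reserved word number quoted in the text (12 for Multiply) is used.

```python
from typing import Dict, List

RESERVED_WORDS: Dict[str, int] = {"MULTIPLY": 12}  # number quoted in the text
NOISE_WORDS = {"BY", "GIVING"}  # reserved words that produce no pseudo code

# Hypothetical symbol table: HOURS and RATE are assumed field names.
VARIABLE_NUMBERS: Dict[str, int] = {"HOURS": 607, "RATE": 605, "GROSS_PAY": 602}


def generate_pseudo_code(words: List[str]) -> List[int]:
    """Reduce a line of COBOL words and convert what remains to numbers."""
    reduced = [word for word in words if word not in NOISE_WORDS]
    code: List[int] = []
    for word in reduced:
        if word in RESERVED_WORDS:
            code.append(RESERVED_WORDS[word])
        else:
            # An unknown name raises KeyError, standing in for the
            # compiler's No Such Variable error.
            code.append(VARIABLE_NUMBERS[word])
    return code


line = "MULTIPLY HOURS BY RATE GIVING GROSS_PAY"
print(generate_pseudo_code(line.split()))  # [12, 607, 605, 602]
```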
The following program fragment—taken from the payroll test program C.PROCESS—shows each line of source code of a Procedure division and the corresponding pseudo code produced by the compiler. A plus sign (+) at the beginning of a line of pseudo code indicates that the pseudo code will be stored. Because the Procedure division statement is not an instruction that will be executed, the pseudo code for that line is not stored.
Four types of file are used by the package:
COBOL source code files are standard pure ASCII text files produced by a word processor or a text editor. They have a simple structure of one character followed by another and terminated by an end of file marker.
Pseudo code files are divided into two sections:
arranged in the following format:
The <FILE n DETAILS> item specifies the structure of the records in a file in the following format:
The <POINTER> specifies the start of the field in the string representation of the record, the <TYPE> specifies the type (for example, AAA, X(20) and S99V9), and the <LENGTH> specifies the number of characters allowed in the field.
The <CONTENTS OF FILE n> item is the initial value of the fields in the record, as specified by the VALUE IS clause in the Data division.
The <FILE NAME> item is the external filename of the file.
The <FILE TYPE> item specifies whether the file is an input file (-1) or an output file (1).
The structure of these files depends on the record defined in the Data division and therefore cannot be described here. The disk filing system handles all end of file markers, etc.
The compiler stores the following information in a data file:
This file has the following structure:
The run time interpreter stores the run time error messages in a data file in the following format:
The package was tested with a payroll application, which is a typical business data processing application. The payroll application was implemented by four programs:
The payroll is an elementary application that works as follows. A master record and a transaction record are loaded into the computer and the new amount to be paid is calculated using the following algorithm:
gross_pay = hours * rate
rate = rate * over_rate
over_pay = over_time * rate
gross_pay = gross_pay + over_pay
gross_pay = gross_pay + bonus
pay = pay + gross_pay
The master record is updated with the new amount to be paid and the payroll processing program writes the updated record to a new file called C.NEW that has the same structure as the master file.
The payroll assumes there is a transaction record for each master record and that the transaction records are sorted in the same order as the master records.
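The payroll calculation can be traced with a short sketch. It is written in Python, not COBOL, purely to check the arithmetic; the function name `process_pay` is hypothetical, and the sample figures are invented for the example and are not taken from the test data.

```python
def process_pay(pay, hours, rate, over_time, over_rate, bonus):
    """Apply the payroll algorithm and return the new amount to be paid."""
    gross_pay = hours * rate
    # The overtime rate replaces the basic rate for the overtime hours.
    rate = rate * over_rate
    over_pay = over_time * rate
    gross_pay = gross_pay + over_pay
    gross_pay = gross_pay + bonus
    return pay + gross_pay


# Invented figures: 40 hours at 5, 4 overtime hours at double rate, bonus 10.
print(process_pay(pay=100, hours=40, rate=5, over_time=4, over_rate=2, bonus=10))  # 350
```

Each assignment corresponds to one step of the algorithm, so it could be written in this implementation as a single arithmetic statement with a GIVING clause.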
The four payroll application programs used to test the package are listed below.
C.MASTER creates the master file from data entered at the keyboard.
C.TRANS creates the transaction file from data entered at the keyboard.
C.PROCESS processes the master and transaction files to produce a new master file.
C.DNEW displays the contents of the new master file to verify that C.PROCESS produces the correct results.
The package has been tested extensively: the test programs produce the expected test results so the package has passed these tests. Although the package is error free when compiling and running the test programs, there may still be errors in the package.
The limited 32K memory is the greatest obstacle to enhancing the package and few improvements can be made without more memory. One enhancement that could be made that does not require more memory is the addition of an indexed sequential filing system, a more natural filing system for COBOL. Using a different disk filing system—such as Acorn’s hierarchical Advanced Disk Filing System (ADFS)—would enable more files to be processed simultaneously.
Expanding the computer by adding extra memory, or by upgrading to a BBC Master or BBC Master Compact, would enable several enhancements, including:
ADD 1 TO COUNT GIVING COUNT. and
IF CHAR GREATER THAN "*" GO TO LABEL; and
With enough memory, however, a full implementation of COBOL could be produced but that would be beyond the scope of the project.