Input / Output

This section discusses SNOBOL4.2 input/output considerations.

FILE NAMES, MODIFIERS and other I/O considerations:

File names can be associated with input/output unit numbers as mentioned in the section on invoking SNOBOL4.2 above. File names (more correctly, path names with optional modifiers) are specified in the command line or as the optional fourth operand of the INPUT or OUTPUT functions. If no fourth parameter is given for the INPUT or OUTPUT functions, the previous file name and modifiers assigned to the I/O unit (either on the command line or in a prior INPUT/OUTPUT function call) are used.

Path name syntax is described in the DOS manual. Path names are used to gain access to files not in the current directory. In SNOBOL4.2, "file modifiers" may optionally be appended to the file name. Each modifier begins with the slash character "/" followed by the modifier name. All of the modifiers follow the path name with no intervening blanks. The modifier names and their functions are described below.

1. The /A modifier specifies that the file is to be considered an ASCII file with records terminated by the carriage-return character (0Dh, which is discarded on input). The end of the file is indicated by the actual end of the file or the end-of-file character (1Ah), whichever comes first. The /B modifier is the opposite of /A and indicates a binary file. Characters are read until the buffer is full, except possibly for the last record read in the file, which may be shorter if the file size is not a multiple of the buffer size. The buffer size is given by the third parameter of the INPUT function. Certain other modifiers apply only to a file read or written in one of the two modes (ASCII /A or binary /B). /A is the default.
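The difference between the two modes can be modeled with a short Python sketch. It is not SNOBOL4.2 code and not the interpreter's implementation; the function names are hypothetical, and line feed handling (the /LF modifier) is deliberately left out.

```python
CR = b"\x0d"        # record terminator in ASCII (/A) mode
EOF_CHAR = b"\x1a"  # DOS end-of-file character


def read_ascii_records(data):
    """Split raw bytes into records as described for /A (illustrative only)."""
    # The file ends at the first 1Ah or at the physical end, whichever is first.
    end = data.find(EOF_CHAR)
    if end != -1:
        data = data[:end]
    records = data.split(CR)  # the carriage return itself is discarded
    # A trailing terminator leaves an empty final piece; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


def read_binary_records(data, buffer_size):
    """Cut raw bytes into buffer-sized records as described for /B."""
    # Only the last record can be shorter than buffer_size.
    return [data[i:i + buffer_size] for i in range(0, len(data), buffer_size)]
```

For a 10-byte file and a buffer size of 4, the binary reader yields records of 4, 4 and 2 bytes, which is the "shortened last record" case mentioned above.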

2. /NBUF (no buffering) disables SNOBOL4.2 buffering of output. This is particularly needed when writing to the console or perhaps a communications port. /BUF enables buffering and is the default, except for devices.

3. /TABX is the default and causes tab characters to be expanded into blanks on input of ASCII (/A) files. The tabs expand into blanks such that the next character after the tab has an offset in the line which is the next multiple of 8. This eliminates the need to preprocess lines when they were written by some editor or other program/utility to eliminate the tab characters. On output, /TABX compresses the output lines (in ASCII /A mode) so that they use tabs where possible. Quotation marks terminate the tab expansion. /NTABX disables this tab expansion/compression. Since /TABX is the default, SNOBOL4.2 is incompatible with SNOBOL4 versions 1.xx on this item. If you have any problems, just append /NTABX to the file name.
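The expansion rule (the character after a tab lands at the next offset that is a multiple of 8) can be sketched in Python as follows. This models input expansion only; the function name is hypothetical, and the rule that quotation marks terminate tab expansion is not modeled here.

```python
def expand_tabs(line, tab_width=8):
    """Replace each tab with blanks up to the next multiple of tab_width."""
    out = []
    col = 0  # zero-based offset of the next character in the expanded line
    for ch in line:
        if ch == "\t":
            pad = tab_width - (col % tab_width)  # always 1..tab_width blanks
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

A tab that follows two characters expands to six blanks, and a tab that already sits at a multiple of 8 expands to a full eight blanks.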

4. The /AP (append) modifier indicates that a file opened for write should be positioned to the end of the file, so that the next write appends to the file. /R is the opposite (replace) and is the default. It causes writing to start at the beginning of the file. For /AP, if the file is ASCII (/A and not /B), then a check is made for the end of file character 1Ah at the end of the file. If it appears there, the next write will replace the end of file character. /AP is ignored for devices, such as the console.
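The /AP behavior for ASCII files can be illustrated with a Python sketch: position to the end of the file and, if the last byte is the end-of-file character 1Ah, overwrite it with the new data. The function name is hypothetical, and ending the record with carriage return plus line feed is an assumption based on the /LF default.

```python
import os

EOF_CHAR = b"\x1a"  # DOS end-of-file character


def append_record(path, record):
    """Append one ASCII record, replacing a trailing 1Ah if present."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size > 0:
            f.seek(size - 1)
            if f.read(1) == EOF_CHAR:
                # Drop the end-of-file character so the next write replaces it.
                f.seek(size - 1)
                f.truncate()
        f.seek(0, os.SEEK_END)
        f.write(record + b"\r\n")  # assumed terminator: CR followed by LF
```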

5. /NPR turns the "?" prompt off when input is from the console. /PR turns prompting on and is the default. These apply only to ASCII files (/A) connected to the console.

6. The /VL (variable length) option provides the ability to read variable length records for ASCII files. The maximum record length is determined by the third parameter to the INPUT() function. The record ends with but does not include the carriage-return character in the file. Use of the /VL option is sometimes preferable to using &TRIM = 1 because less processing is involved and the reads reflect more precisely the content of the file. /VL applies to input on files read in ASCII mode (/A). Note, on binary files (/B), the last record may be less than the buffer size regardless of the /VL setting, when the number of bytes in the file is not an exact multiple of the record length specified in the third parameter of the INPUT function.

7. /LF enables (/NLF disables) line feed character processing. If enabled, on input, line feeds at the start of a record are ignored. On output, a line feed character is inserted after the carriage-return character. /LF is ignored if /CC is specified on output. Also, /LF applies only to files processed in ASCII mode (/A). /LF is the default.

8. /CC enables (/NCC disables) carriage control on output. /CC interprets the first character of the output record as a carriage control character if it is a legal carriage control character; otherwise it assumes a blank was given. Instead of ending the output records with carriage return and line feed characters, /CC causes these to be written at the front of the line. The codes are:

    1     = new page: form feed and carriage return
    0     = double space: 2 line feeds and 1 carriage return
    -     = triple space: 3 line feeds and one carriage return
    blank = single space: carriage return and line feed
    +     = overprint: only carriage return

If /CC is specified, /A is implied and /LF vs /NLF is ignored.
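The carriage control codes amount to a lookup from the first character of the record to a sequence of control characters written in front of the line. The Python sketch below models that lookup. The names are hypothetical, the order of characters within each sequence follows the wording of the list above, and keeping an illegal first character as data is an assumption the text does not settle.

```python
CR, LF, FF = "\r", "\n", "\f"

# Control sequence written at the front of the line for each code.
CONTROL_PREFIX = {
    "1": FF + CR,            # new page
    "0": LF + LF + CR,       # double space
    "-": LF + LF + LF + CR,  # triple space
    " ": CR + LF,            # single space
    "+": CR,                 # overprint
}


def apply_carriage_control(record):
    """Turn the leading control character into its control sequence."""
    first = record[:1]
    if first in CONTROL_PREFIX:
        return CONTROL_PREFIX[first] + record[1:]
    # Not a legal control character: behave as if a blank had been given.
    # Assumption: the whole record, including its first character, is kept.
    return CONTROL_PREFIX[" "] + record
```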

9. /STD: the DOS standard I/O handles can be named by using:

    IN/STD  = 0  standard input (can be redirected using DOS)
    OUT/STD = 1  standard output (can be redirected using DOS)
    ERR/STD = 2  standard error output
    AUX/STD = 3  standard auxiliary device
    PRN/STD = 4  standard printer device

/NBUF is set for standard I/O so that output to the console does not appear in bursts.

10. SUMMARY and RULES
Summary. (*=default):

     /B     Binary                  vs   */A      ASCII
     /NBUF  No buffering            vs   */BUF    Buffer
    */TABX  Tab expand/compress     vs    /NTABX  No tab expand or compress
     /AP    Append to file          vs   */R      Replace file
    */PR    Prompt for input        vs    /NPR    No prompt
     /VL    Variable length read    vs   */FL     Fixed length read
    */LF    Line feeds              vs    /NLF    No line feeds
     /CC    Carriage control        vs   */NCC    No carriage control
     /STD   Interpret name as DOS standard I/O

Rules:

     /B     implies  /NTABX /NLF /NCC
     /TABX  implies  /A
     /VL    implies  /A
     /LF    implies  /A
     /CC    implies  /A
     /STD   implies  /NBUF
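The implication rules can be read as a small table applied to the modifiers found after the path name. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the path itself contains no "/" (DOS paths use backslashes), so every slash starts a modifier. It does not validate modifier names or resolve conflicts between them.

```python
# Modifiers implied by each explicitly given modifier (from the rules above).
IMPLIES = {
    "B": ["NTABX", "NLF", "NCC"],
    "TABX": ["A"],
    "VL": ["A"],
    "LF": ["A"],
    "CC": ["A"],
    "STD": ["NBUF"],
}


def effective_modifiers(spec):
    """Split 'NAME/MOD1/MOD2' and add the modifiers the rules imply."""
    name, *given = spec.split("/")
    result = list(given)
    for mod in given:
        for implied in IMPLIES.get(mod, []):
            if implied not in result:
                result.append(implied)
    return name, result
```

For example, "DATA.BIN/B" yields the name "DATA.BIN" with modifiers B, NTABX, NLF and NCC.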

11. If an I/O unit is not assigned to a file when starting to read or write, it is then assigned to the standard device IN/STD or OUT/STD.

12. Device names such as CON, LPT1, and AUX must NOT have a colon (:) after them, as was the practice for DOS 1.X.

13. The INPUT() and OUTPUT() functions fail if the fourth parameter contains an illegal file name or incorrect modifiers.

14. The third parameter of the OUTPUT function, which is the FORTRAN format, now defaults to the null string (no format). In the green book, it defaults to a specific format. This is an incompatibility between SNOBOL4.2 and SNOBOL4 as described in the green book. This was done to keep SNOBOL4 version 1.xx programs from truncating records and putting blanks in front of them on output when interpreted by SNOBOL4.2.

FORTRAN FORMATTING

Minnesota SNOBOL4.2 now supports FORTRAN output formatting as used in the green book. People have noted that various examples depend on this formatting and it makes SNOBOL4.2 even more compatible with the mainframe versions of SNOBOL. Formatting is often used in conjunction with the /CC (carriage control) file modifier to achieve double spacing, page skips etc.

The details of FORTRAN formatting are covered in many books and FORTRAN manuals. However, only a subset of the full FORTRAN formatting facilities apply to SNOBOL. These are described below.

FORTRAN formatting is used only on output and only if the format (operand 3 of the OUTPUT function) is not null. A format is a string that directs the characters to be written to particular columns and controls spacing. As the characters to be written are processed left to right, corresponding format elements are processed until the entire record to be written is processed. If there is an error in the syntax of the format string, then a write error will occur.

For example, if the format is

    "('number ',6A1,' name ',30A1)"

and is used with the source string

    "123456Doe, John"

then the output record will be

    "number 123456 name Doe, John"

The format string is always enclosed in parentheses (). Inside these parentheses is a list of format codes separated by commas "," or slashes "/". A slash means that at that point a new output record is to be started. Thus, more than one record can be written with one write when using formatting. The following format codes are supported:

aAw
    Indicates that up to "a" multiplied by "w" source characters are to be written out at this point. w should be 1 for compatibility with mainframe SNOBOL4.

wHxxxx
    The "Hollerith" code indicates that the "w" characters after the H are to be written out at this point. This is similar to the literal with quotes, except that a length is given instead of using delimiting quotes.

Tr
    Indicates that the next data to be written should be placed in column "r" of the output record.

wX
    Means that "w" blanks should be written at this point in the output.

'lit'
    Means that the characters between the quotes should be written to the output. To represent a quote in the middle of the string, use two quotes.

aZw
    Means that the next "a" source characters are to be converted to hexadecimal digits and then written to the output record. w should be 2 if both hex digits are to be displayed for each source byte. w should be 1 if only the low order hex digit of each source byte is to be displayed.

a(code,code,...)
    Means that the list of codes specified within parentheses is to be repeated "a" times. If "a" is omitted, the codes are repeated indefinitely. Up to 5 levels of parentheses are supported in formats.

where:

a
    is an optional repetition factor ranging from 1 through 255. If omitted, it is assumed to be one.

w
    is a field width in bytes ranging from 1 through 255.

r
    is a column number in an output record, ranging from 1 through 255.

(code,code,...)
    is a list of format codes separated by commas or slashes. Format codes within this list can be other lists up to 5 levels deep.

separators
    are single commas, or any number of slashes. Each slash causes a new output record to be started.

If an entire format string is processed before the source characters are all consumed, then a new output record is started and the format is repeated from the beginning, unless the format contains a list of codes that has no repetition factor in front of it. In that case a new record is not started (unless the output reaches 255 bytes), and format codes are used starting with the last indefinite code list in the format.
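To make the processing order concrete, here is a minimal Python sketch of a format interpreter for a small subset of the codes: quoted literals, aAw, wX, Tr, commas and slashes. It is a model written for this section, not the SNOBOL4.2 implementation, and all names are hypothetical. It makes several simplifying assumptions: no H or Z codes, no nested or indefinite code lists, no blanks between codes, no 255-byte record limit, Tr only moves forward, and processing stops at the first A code that finds the source exhausted.

```python
import re

TOKEN = re.compile(r"""
    '((?:[^']|'')*)'   # group 1: quoted literal, two quotes stand for one
  | (\d*)A(\d+)        # groups 2, 3: aAw
  | (\d+)X             # group 4: wX
  | T(\d+)             # group 5: Tr
  | (/)                # group 6: slash, start a new record
  | ,                  # comma separator
""", re.VERBOSE)


def parse_format(fmt):
    """Turn a format string into a list of (kind, argument) pairs."""
    if not (fmt.startswith("(") and fmt.endswith(")")):
        raise ValueError("format must be enclosed in parentheses")
    body, pos, codes = fmt[1:-1], 0, []
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if not m:
            raise ValueError("bad format code at offset %d" % pos)
        pos = m.end()
        lit, a, w_a, w_x, col, slash = m.groups()
        if lit is not None:
            codes.append(("lit", lit.replace("''", "'")))
        elif w_a is not None:
            count = int(a or 1) * int(w_a)  # up to a * w source characters
            if count == 0:
                raise ValueError("a and w must be at least 1")
            codes.append(("A", count))
        elif w_x is not None:
            codes.append(("X", int(w_x)))
        elif col is not None:
            codes.append(("T", int(col)))
        elif slash:
            codes.append(("/", None))
    return codes


def format_records(fmt, source):
    """Return the output records produced for one source string."""
    codes = parse_format(fmt)
    if not any(kind == "A" for kind, _ in codes):
        raise ValueError("this sketch needs an A code to consume the source")
    records, line, src = [], "", 0
    while True:
        for kind, arg in codes:
            if kind == "lit":
                line += arg
            elif kind == "X":
                line += " " * arg
            elif kind == "T":
                line = line.ljust(arg - 1)  # pad so the next write is column r
            elif kind == "/":
                records.append(line)
                line = ""
            elif kind == "A":
                if src >= len(source):
                    break  # source exhausted: stop processing codes
                line += source[src:src + arg]
                src += arg
        records.append(line)
        line = ""
        if src >= len(source):
            return records
        # Source remains: start a new record and repeat the format.
```

With the format and source string from the example above, `format_records("('number ',6A1,' name ',30A1)", "123456Doe, John")` returns the single record "number 123456 name Doe, John". With the format "(3A1)" and the source "abcdefg", the format is reused and three records result: "abc", "def" and "g".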

