Introduction
============

This is the guide for p5x, which is ISO standard Pascal with a few extensions.

It is derived from p5c, a free compiler for full ISO level 1 standard Pascal,
including conformant arrays, procedural and functional parameters, arbitrary
set sizes, etc.

It uses GNU C as intermediate code rather than p-code, so it is:

 - fast: gcc creates fast, compact, highly optimised code
 - portable with very little effort: gcc runs on almost every system, and
   all that is needed is to compile the p5c C source code
 - able to link naturally and easily to existing C code libraries
 - able to take advantage of C tools such as gcov code coverage analysis

Why Pascal?   Easy to learn and understand.
              Readable syntax, so programs are easier to write and maintain.
              Strong typing, so more errors are found at compile time rather
              than at run time; bugs are found earlier and programs are more
              reliable.
              Higher productivity.

p5c (and hence p5x) is simple enough for one person to understand,
so it is ideal for hobby compiler projects.
On the other hand, gcc produces some of the most highly optimised code around,
so p5c programs run very quickly.

p5c is closely based on the p5 compiler, which is a freely available and
well documented ISO compliant Pascal compiler.


Installation and Use
====================

The p5c home is at http://sourceforge.net/projects/pascal-p5c
and the latest release can be downloaded from the Files section.

The notes in the readme file describe how to obtain and build the p5c compiler.

With p5c installed, there is a bash shell script called r that compiles and
runs a Pascal program.
e.g., to compile and run myprogram.pas:
    ./r  myprogram

To compile only, there is a similar script called pc.
e.g., to compile but not run myprogram.pas:
    ./pc  myprogram

See the tools section of this document, below.


Learning Pascal
===============

-- There are lots of tutorials and learning material on the internet.

See, for example:
http://physinfo.ulb.ac.be/cit_courseware/pascal/pstart.htm
http://www.taoyue.com/tutorials/pascal/index.html
http://www.tutorialspoint.com/pascal/index.htm
http://en.wikibooks.org/wiki/Pascal_Programming

Note that p5c follows ISO standard Pascal and is slightly different from
the popular Turbo Pascal. Try to favour learning material that makes the
differences clear.

The material in the doc directory of the p5 project accurately describes p5c.
Find it here ---> https://sourceforge.net/projects/pascalp5/files/


-- reference material

The classic reference to iso pascal is
"Pascal User Manual and Report" by Jensen & Wirth.

Pascal Language Reference Manual, from Hewlett-Packard:
http://bitsavers.trailing-edge.com/pdf/hp/9000_hpux/6.x/98615-90053_HP_Pascal_Language_Reference_Apr88.pdf

This is a good technical level description of ISO Pascal (also known as
ANSI Pascal).
Each feature of p5c Pascal has an excellent description in this reference.
Note that this document describes a superset of ISO Pascal, so some of the
features it covers are not present in p5c.
Chapter 1 is the only guide to which features belong to the superset.


The following documents can be found at http://pascal-central.com/docs/

pascal-programming.pdf
   -- good for getting started and building up to large projects. Assumes
      CodeWarrior Pascal, so some features of the language are not the same
      as ISO Pascal.

iso7185
   -- THE pascal standard, for uber geeks

pascal1973.pdf
   -- early version of Jensen & Wirth

pascal reference manual
   -- OpenVMS Pascal reference manual. Contains a good technical level
      description of ISO Pascal. Note that this describes a highly
      professional version of Pascal, of which ISO Pascal is only a small
      subset. It clearly differentiates between standard Pascal and the
      extensions. If you need another description of some feature of p5c
      Pascal, it can almost certainly be found here.


Here's a quick summary of Pascal, taken from the p5 project:



*****************************************************************************

                    THE RULES OF ISO 7185 PASCAL

*****************************************************************************

This gives an overview of the basic rules of ISO 7185 Pascal. See
also the books on the subject. For serious users, I recommend:

Standard Pascal: Users Reference Manual, Doug Cooper
Oh ! Pascal !, Doug Cooper

Both available from Amazon.com.

Note that the following description could be wrong or incomplete.

*****************************************************************************
LEXICAL ELEMENTS
*****************************************************************************

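Pascal source consists of identifiers, keywords, numbers, strings and special
character sequences.

A Pascal identifier must begin with a letter, 'a' to 'z', but may continue
with letters and the digits '0' to '9'. Upper and lower case letters are
equivalent, so Time, TIME and time are the same identifier.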

Examples of identifiers:
   X
   time
   readinteger
   WG2
   AlterHeatSetting
   GInqWsTran
   DeviceDriverIdentificationHeader
   DeviceDriverIdentificationBody


There is no length limit on identifiers, but there may be a practical limit.
If the compiler cannot process a source line longer than N, you cannot have an
identifier longer than N, since identifiers may not cross lines.

Keywords (or reserved words) look just like identifiers, but have a special
meaning wherever they appear, and may never be used as identifiers:

   and      array     begin     case      const     div       do
   downto   else      end       file      for       function  goto
   if       in        label     mod       nil       not       of
   or       packed    procedure program   record    repeat    set
   then     to        type      until     var       while     with

A number can appear in both integer and real form. Integers will appear
as a sequence of digits:

   83
   00004

Are valid integer numbers. For a number to be taken as "real" (or "floating
point") format, it must either have a decimal point, or use scientific
notation:

   1.0
   1e-12
   0.000000001
   -0.1
   5e-3
   87.35E+8

Are all valid reals. At least one digit must exist on either side of a
decimal point.

Strings are made up of a sequence of characters between single quotes:

   'string'

The single quote itself can appear as two single quotes back to back in a
string:

   'isn''t'

Examples of strings:

   'A'
   ';'
   ''''
   'p5c Pascal'
   'THIS IS A STRING'
   'Don''t think this is two strings'


Finally, special character sequences are one of the following:

   +        -         *         /         =         <         >
   [        ]         .         ,         :         ;         ^
   (        )         <>        <=        >=        ..        @
   {        }         (*        *)        (.        .)

Note that in each of these pairs, the two sequences are aliases for the same symbol:

   @  and ^ (or the "up arrow" if allowed in the typeface)
   (. and [
   .) and ]
   (* and {
   *) and }

Spaces and line endings in the source are ignored except that they may act
as "separators". No identifier, keyword, special character sequence or
number may be broken by a separator or other object. No two identifiers,
keywords or numbers may appear in sequence without an intervening separator:

   MyLabel         - Valid
   My Label        - Invalid
   begin farg := 1 - Valid
   beginfarg := 1  - Invalid
   1.0e-12         - Valid
   1.e-122e-3      - Invalid

*****************************************************************************
PROGRAM STRUCTURE
*****************************************************************************

A Pascal program appears as a nested set of "blocks", each of which has the
following form:

block_type name(parameter [, parameter]...);

label x[, y]...;

const x = y;
      [q = r;]...

type x = y;
     [q = r;]...

var  x[,y]...: z;
     [x[,y]...: z;]...

[block]...

begin

   statement[; statement]...

end[. | ;]

Note that:

   [option]    means optional.
   [repeat]... means can appear 0 or more times.
   [x | y]     means one or the other.

Each declaration section may be omitted, but the sections that are present
must appear in the order shown.

There are three types of blocks, program, procedure and function. Every
program must contain a program block, and exactly one program block exists
in the source file.
Each block has two distinct sections, the declaration and statements
sections. The declarations immediately before a statement section are
considered "local" to that section.
The declaration section builds a description of the data used by the coming
statement section in a logical order. For example, constants are usually
used to build type declarations, and type declarations are used to build
variables, and all of these may be used by nested blocks.
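Putting these pieces together, here is a minimal sketch of a complete program with constant, type and variable declarations, one nested function block and a statement part. The names sides, polygon and perimeter are illustrative only. Integers are written with a field width of 1 (":1") because the default width is implementation-defined.

```pascal
program shapes(output);

{ A sketch of the block layout: declarations first, in order,
  then a nested function block, then the statement part. }

const
   sides = 4;                               { a named constant }

type
   polygon = array [1..sides] of integer;   { a type built from the constant }

var
   p: polygon;                              { a variable of that type }
   i: integer;

function perimeter(var q: polygon): integer;   { a nested block }
var
   k, total: integer;                       { local to perimeter }
begin
   total := 0;
   for k := 1 to sides do
      total := total + q[k];
   perimeter := total                       { the function result }
end;

begin { shapes }
   for i := 1 to sides do
      p[i] := i * 10;
   writeln('perimeter = ', perimeter(p):1)
end.
```

This should print "perimeter = 100".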

*****************************************************************************
LABEL DECLARATION
*****************************************************************************

The first declaration, labels, are numeric sequences that denote the target
of any goto's appearing in the block:

   label 99,
         1234;

Are valid labels. Labels "appear" to be numbers, and must be in the range
0 to 9999. The "appearance" of a number means that leading zeros are
ignored, so:

   1
   01

Are the same label (so declaring both in one block is an error).
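A minimal sketch of a label in use: label 99 is declared, then used as the target of a goto that leaves a loop early. The program name and loop are illustrative only.

```pascal
program jumps(output);

{ Label 99 is declared in the label section, and prefixes a
  statement in the same block. Leading zeros are ignored, so
  "goto 099" would reach the same statement. }

label 99;

var
   i: integer;

begin
   i := 0;
   while true do
   begin
      i := i + 1;
      if i = 3 then goto 99          { leave the loop early }
   end;
99:
   writeln('stopped at ', i:1)
end.
```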

*****************************************************************************
CONSTANT DECLARATION
*****************************************************************************

Constant declarations give a name to a fixed value:

   const x = 10;
         q = -1;
         y = 'hi there';
         r = 1.0e-12;
         z = x;

Are all valid constant declarations. Only numbers (integer or real, optionally
signed), strings and previously declared constant identifiers may be used
(there are no set or other structured constants).

*****************************************************************************
TYPES
*****************************************************************************

The type declaration gives names to types, which can then be used to create
variables later:

   type x = array [1..10] of integer;
        i = integer;
        z = x;

Types can be new types, aliases of old types, etc.

Another example of a type declaration (it assumes that a constant limit and
a type angle have been declared earlier):

   type
      natural = 0..maxint;
      count = integer;
      range = integer;
      colour = (red, yellow, green, blue);
      sex = (male, female);
      year = 1900..2099;
      shape = (triangle, rectangle, circle);
      punchedcard = array [1..80] of char;
      charsequence = file of char;
      polar = record
                r : real;
                theta : angle
              end;
      indextype = 1..limit;
      vector = array [indextype] of real;
      person = ^persondetails;
      persondetails = record
                        name, firstname : charsequence;
                        age : natural;
                        married : Boolean;
                        father, child, sibling : person;
                        case s : sex of
                           male : (enlisted, bearded : Boolean);
                           female : (mother, programmer : Boolean)
                      end;
      FileOfInteger = file of integer;



*****************************************************************************
VARIABLE DECLARATION
*****************************************************************************

Variables set aside computer storage for an element of the given type:

   var x, y: integer;
       z:    array [1..10] of char;
       i, j : integer;
       k : 0..9;
       p, q, r : Boolean;
       operator : (plus, minus, times);
       a : array [0..63] of real;
       c : colour;
       f : file of char;
       hue1, hue2 : set of colour;
       p1, p2 : person;
       m, m1, m2 : array [1..10, 1..10] of real;
       coord : polar;
       pooltape : array [1..4] of FileOfInteger;
       date: record
               month : 1..12;
               year : integer
             end;


*****************************************************************************
BLOCK DECLARATION
*****************************************************************************

A block can be declared within a block, and that block can declare blocks
within it, etc. There is no defined limit as to the nesting level.
Because only one program block may exist, by definition all "sub blocks"
must be either procedure or function blocks. Once defined, a block may
be accessed by the block it was declared in. But the "surrounding" block
cannot access blocks that are declared within such blocks:

   program test;

   procedure junk;

   procedure trash;

   begin { trash }

      ...

   end;  { trash }

   begin { junk }

      trash;
      ...

   end;  { junk }

   begin { test }

      junk;
      ...

   end.  { test }

Here test can call junk, but only junk can call trash. Trash is "hidden"
from the view of test.
Similarly, a subblock can access any of the variables or other blocks
that are defined in surrounding blocks:

   program test(output);

   var x: integer;

   procedure q;

   begin

   end;

   procedure y;

   begin

      q;
      x := 1

   end;

   begin

      y;
      writeln(x)

   end.

The variable "x" can be accessed from all blocks declared within the same
block.
It is also possible for a block to call itself, or another block that
calls it. This means that recursion is allowed in Pascal.
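A minimal sketch of recursion: the function factorial (an illustrative name) calls itself until its argument reaches 0. Inside the function, "factorial := ..." sets the result, while "factorial(n - 1)" is a recursive call.

```pascal
program recurse(output);

{ factorial calls itself with a smaller argument until n = 0. }

function factorial(n: integer): integer;
begin
   if n = 0 then
      factorial := 1                       { base case }
   else
      factorial := n * factorial(n - 1)    { recursive call }
end;

begin
   writeln('5! = ', factorial(5):1)
end.
```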

*****************************************************************************
DECLARATION ORDER
*****************************************************************************

Every identifier must be declared before it is used, with only one exception,
pointers, which are discussed later. However, a procedure or function can be
declared with the forward directive before its body is given, which gets
around the problems this rule would otherwise cause (for example, when two
procedures call each other).
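The usual way around the declare-before-use rule for routines that call each other is the forward directive. In this sketch (the names isodd and iseven are illustrative, and only non-negative arguments are handled), isodd is announced with its full heading, and its body is given later. In ISO Pascal the later heading repeats only the name, not the parameter list or result type.

```pascal
program mutual(output);

{ isodd is declared forward so that iseven can call it. }

function isodd(n: integer): boolean; forward;

function iseven(n: integer): boolean;
begin
   if n = 0 then
      iseven := true
   else
      iseven := isodd(n - 1)
end;

{ the body of isodd: parameters and result type are not repeated }
function isodd;
begin
   if n = 0 then
      isodd := false
   else
      isodd := iseven(n - 1)
end;

begin
   if iseven(10) then writeln('10 is even');
   if isodd(7) then writeln('7 is odd')
end.
```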

*****************************************************************************
PREDEFINED TYPES
*****************************************************************************

Several types are predeclared in Pascal. These include integer, boolean, char,
real and text. Predeclared types, just as predeclared functions and procedures,
exist in a conceptual "outer block" around the program, and can be replaced
by other objects in the program.

*****************************************************************************
BASIC TYPES
*****************************************************************************

Types in Pascal can be classed as ordinal, real and structured. The ordinal
and real types are referred to as the "basic" types, because they have no
complex internal structure.
Ordinal types are types whose elements can be numbered, and there are a
finite number of such elements.

*****************************************************************************
INTEGER TYPES
*****************************************************************************

The basic ordinal type is "integer", and typically its range is that of a
single word on the target machine:

   var i: integer;

A predefined constant exists, "maxint", which tells you what the maximum
integral value of an integer is. So:

   type integer = -maxint..maxint;

Would be identical to the predefined type "integer". Specifically, the
results of any operation involving ordinals will only be error free if
they lie within -maxint to +maxint.
Although other ordinal types exist in Pascal, all such types have a mapping
into the type "integer", and are bounded by the same rules. The "ord"
function can be used on any ordinal to find the corresponding integer.
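Since a result outside -maxint..maxint is an error, a careful program can test against maxint before doing arithmetic. This is a minimal sketch; safeadd is a hypothetical helper, not a predefined routine, and it assumes both arguments are non-negative (so that maxint - j cannot itself overflow).

```pascal
program safesum(output);

{ safeadd (hypothetical) only performs i + j when the result is
  known to fit, i.e. when i <= maxint - j. Assumes i, j >= 0. }

var
   s: integer;

function safeadd(i, j: integer; var sum: integer): boolean;
begin
   if i > maxint - j then
      safeadd := false              { i + j would exceed maxint }
   else
   begin
      sum := i + j;
      safeadd := true
   end
end;

begin
   if safeadd(maxint - 1, 1, s) then
      writeln('ok: sum is maxint');
   if not safeadd(maxint, 1, s) then
      writeln('overflow avoided')
end.
```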


*****************************************************************************
ENUMERATED TYPES
*****************************************************************************

Enumerated types allow you to specify an identifier for each and every value
of an ordinal:

   type x = (one, two, three, four);

Introduces four new identifiers, each one having a constant value in sequence
from the number 0. So for the above:

   one   = 0
   two   = 1
   three = 2
   four  = 3

Enumerated types may have no relationship to numbers whatever:

   type y = (red, green, blue);

Or some relationship:

   type day = (mon, tue, wed, thur, fri, sat, sun);

Here the fact that "day"s are numbers (say, isn't that a lyric ?) is useful
because the ordering has real world applications:

   if mon < fri then writeln('yes');

And of course, subranges of enumerated types are quite possible:

   type workday = mon..fri;

Enumerated types are fundamentally different from integer and subrange types
in that they cannot be freely converted to and from integers.
There is only one conversion direction defined, to integer, and that must
be done by a special predefined function, ord:

   var i: integer;
       d: day;

   ...

   i := ord(d); { find integral value of d }
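A small sketch using the day type from above, a workday subrange of it, and the predefined functions ord and succ (succ, which gives the next value, is not described above but is standard). The program name is illustrative.

```pascal
program days(output);

{ ord maps enumeration values to integers counting from 0;
  succ steps to the next value; the ordering works in loops and tests. }

type
   day = (mon, tue, wed, thur, fri, sat, sun);
   workday = mon..fri;              { a subrange of day }

var
   d: day;
   w: workday;

begin
   writeln('ord(wed) = ', ord(wed):1);
   w := succ(mon);                  { tue }
   writeln('ord(succ(mon)) = ', ord(w):1);
   for d := mon to sun do
      if d > fri then
         writeln('weekend day number ', ord(d):1)
end.
```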

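*****************************************************************************
BOOLEAN TYPES
*****************************************************************************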

The only predefined enumerated type is "boolean", which could be declared:

   type boolean = (false, true);

However, because enumerated types cannot be cross converted, this user
created type could not in fact be used in place of the predeclared one.
Booleans are special in that several predefined functions (such as odd, eof
and eoln), and all of the comparison operators ("=", ">", etc.), give boolean
results. In addition, several special operators are defined just for
booleans, such as "and", "or" and "not".

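A short sketch of boolean values in use: a comparison and the predefined function odd each yield a boolean, and the results are combined with and, or and not. The results are reported with explicit strings because the default width used when writing a boolean is implementation-defined. The variable names are illustrative.

```pascal
program logic(output);

{ Comparisons and odd() produce booleans; and, or, not combine them.
  Since boolean = (false, true), ord(true) is 1. }

var
   n: integer;
   big, isodd: boolean;

begin
   n := 7;
   big := n > 5;                    { a comparison gives a boolean }
   isodd := odd(n);                 { so does the predefined function odd }
   if big and isodd then writeln('7 is big and odd');
   if not (big or isodd) then writeln('never printed');
   if ord(true) = 1 then writeln('ord(true) = 1')
end.
```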
*****************************************************************************
CHARACTER TYPES
*****************************************************************************

Character types in Pascal hold the values of the underlying character set,
usually ISO single byte encoded (including ASCII). The Pascal standard
requires only that the digits '0' to '9' be present, in order and numbered
consecutively, and that any letters present be in alphabetical order (but not
necessarily consecutive). However, as a practical matter, most Pascal programs
rely on the letters of the alphabet being present and numbered sequentially
as well (which leaves out EBCDIC, for example).
A character declaration appears as:

   var c: char;

Character values can also be converted to and from integers at will, but only
by using the special functions to do so:

   ord(c); { find integer value of character }

   chr(i); { find character value of integer }
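A common use of ord and chr is converting between digit characters and their numeric values. This sketch relies only on the digits being numbered consecutively, which the standard guarantees; the program and variable names are illustrative.

```pascal
program digits(output);

{ ord(c) - ord('0') gives the value of a digit character,
  and chr(ord('0') + n) gives the character for a digit value n. }

var
   c: char;
   value: integer;

begin
   c := '7';
   value := ord(c) - ord('0');      { '7' -> 7 }
   writeln('digit value: ', value:1);
   c := chr(ord('0') + 3);          { 3 -> '3' }
   writeln('character: ', c)
end.
```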

*****************************************************************************
SUBRANGE TYPES
*****************************************************************************

Subrange types are simply a voluntary programmer restriction of the values
an ordinal type may hold:

   type constrained = -10..50;

(the notation x..y means all values from x to y inclusive.)
It is an error to assign a value outside of the corresponding range to a
variable of that type:

   var x: constrained;

   ...

   x := 100; { invalid! }

But note that there are no restrictions on the USE of such a type:

   writeln('The sum is: ', x+100);

Here, even though the result of x+100 is greater than the upper bound of the
type of x, it is not an error. When used in an expression, a subrange is
directly equivalent to its host type (here, "integer").
Subranges can be declared of any ordinal type:

   type enum = (one, two, three, four, five, six, seven, eight, nine, ten);

   var e: three..seven;

   var c: 'a'..'z';

Etc.
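A complete version of the subrange example above, as a sketch: x holds the largest value its type allows, and x + 100 is computed as an ordinary integer, both when written and when stored into a full integer variable.

```pascal
program ranges(output);

{ A subrange limits what x may hold (assigning 100 to x would be
  an error), but x + 100 is an ordinary integer expression. }

type
   constrained = -10..50;

var
   x: constrained;
   i: integer;

begin
   x := 50;                          { largest legal value }
   writeln('The sum is: ', x + 100:1);
   i := x + 100;                     { fine: i is a full integer }
   writeln('i = ', i:1)
end.
```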

*****************************************************************************
REAL TYPES
*****************************************************************************

Real types, or "floating point", allow approximations of a large range of
numbers to be stored. The tradeoff is that reals have no direct ordinality
(cannot be counted), and so have no direct relationship with integers. Real
types are the only basic type which is not ordinal.

   var r: real;

Integers are considered "promotable" to reals. That is, it is assumed that
an integer can always be represented as a real. However, there may be
a loss of precision when this is done (because the mantissa of a real
may not be as large as an integer).
Reals are never automatically converted to integer, however, and the
programmer must choose between finding the nearest whole number to the real
(the predefined function round) or simply discarding the fraction (trunc).
This choice must be made explicitly.
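A sketch of the two explicit conversions, trunc and round (both standard predefined functions returning integer), plus the automatic promotion of an integer to real. In ISO Pascal, round rounds halves away from zero, and trunc discards the fraction toward zero, so the two differ for negative numbers too. The ":1:1" field widths write a real in fixed-point form with one decimal place.

```pascal
program rounding(output);

{ trunc discards the fraction; round picks the nearest whole number.
  Assigning an integer to a real is allowed (promotion). }

var
   r: real;
   i: integer;

begin
   r := 3.7;
   writeln('trunc(3.7) = ', trunc(r):1, ', round(3.7) = ', round(r):1);
   r := -3.7;
   writeln('trunc(-3.7) = ', trunc(r):1, ', round(-3.7) = ', round(r):1);
   i := 2;
   r := i;                           { integer promoted to real }
   writeln('r = ', r:1:1)
end.
```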

*****************************************************************************
STRUCTURED TYPES
*****************************************************************************

A structured type is a type with a complex internal structure. In fact, the
structured types all have one thing in common: they can hold more than
one basic type object at one time. They are structured because they are
"built up" from basic types, and from other structured types.

*****************************************************************************
PACKING
*****************************************************************************

Structured types can also be "packed", which is indicated by the keyword
"packed" before the type declaration. Packing isn't supposed to change the
function of the program at all. Stripping the "packed" keywords out of a
program will not change the way it works (with the exception of "strings",
below).
Packing means that (if implemented: it's optional) the program should conserve
space by placing the values in as few bits as possible, even if this takes more
code (and time) to perform.
Packing is better understood if you understand the state of computers before
microprocessors (the jurassic age of computers ?). Many mainframe computers
accessed memory a single word at a time, and the word size was not even a neat
multiple of 8 bits (for example, 36-bit machines; the CDC 6000 had 60-bit
words). The machine read or wrote whole words only. There was no byte access,
no even/odd addressing, etc. Because storing small items one per word on such a
machine could be wasteful (especially for characters), programs often packed
many single data items into a single word.
The advent of the minicomputer changed that. DEC's PDP-11, a 16-bit machine,
could address single 8-bit bytes (just as microprocessors do), and when DEC
moved to 32 bits with the VAX, the ability to address single bytes was
maintained.
For this reason, many people refer to such a machine as "automatically packed",
or say that Pascal's packing feature is unnecessary on such machines.

However, quantizing data by 8 bit bytes is not necessarily the optimal packing
method available. For example, a structure of boolean values, which need
only 1 bit per element, would waste 7/8 of the storage allocated if each
element were given a whole byte.


*****************************************************************************
SET TYPES
*****************************************************************************

Set types are perhaps the most radical feature of Pascal. A set type can be
thought of as an array of bits indicating the presence or absence of each
value in the base type:

   var s: set of char;

Would declare a set containing a yes/present or no/not present indicator for
each character in the computer's character set. The base type of a set must
be ordinal.

Another example:

   var c: set of (club, diamond, heart, spade);
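A sketch of the common set operations: set constructors in square brackets, membership with "in", union with "+" and intersection with "*". Note that "in" is a relational operator, so "heart in red * chosen" tests membership in the intersection. The type suit and the variable names are illustrative.

```pascal
program sets(output);

{ Build sets, combine them with + (union) and * (intersection),
  compare them with =, and test membership with in. }

type
   suit = (club, diamond, heart, spade);

var
   red, chosen: set of suit;
   c: char;

begin
   red := [diamond, heart];
   chosen := [club, heart];
   if heart in red * chosen then writeln('heart is red and chosen');
   if red + chosen = [club..heart] then writeln('union is club..heart');
   c := 'e';
   if c in ['a', 'e', 'i', 'o', 'u'] then writeln(c, ' is a vowel')
end.
```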


*****************************************************************************
ARRAY TYPES
*****************************************************************************

The most basic structured type is the array. Pascal is unusual in that both
the upper and lower bounds of arrays are declared (instead of just the upper
bound or length), and that the index type can be any ordinal type:

   var a: array [1..10] of integer;

Would declare an array of 10 integers with indexes from 1 to 10.
You may recognize the index declaration as a subrange, and indeed any
subrange type can be used as an index type:

   type sub = 0..99;

   var a: array [sub] of integer;

Arrays can also be declared as multidimensional:

   var a: array [1..10] of array [1..10] of char;

There is also a shorthand form for array declarations:

   var a: array [1..10, 1..10] of char;

Is equivalent to the last declaration. In general, the nested and shorthand
forms can be mixed freely, so these are equivalent (where size is some
previously declared ordinal type):

   array [Boolean] of array [1..10] of array [size] of real
   array [Boolean] of array [1..10, size] of real
   array [Boolean, 1..10, size] of real
   array [Boolean, 1..10] of array [size] of real


And so are these:

   packed array [1..10, 1..8] of Boolean
   packed array [1..10] of packed array [1..8] of Boolean


A special type of array definition is a "string". Strings are packed arrays
of characters, with integer indexes whose lower bound is 1 and whose upper
bound is greater than 1:

   var s: packed array [1..10] of char;

String types are special in that any two strings with the same number of
components are compatible with each other, including constant strings.
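A sketch bringing the array forms together: an array indexed by a named subrange, a two-dimensional array declared with the shorthand form (and accessed both as grid[i, j] and grid[i][j], which are equivalent), and a string assigned from a string constant of exactly the same length. All names are illustrative.

```pascal
program arrays(output);

{ sub-indexed array, 2-D shorthand array, and a 5-character string. }

type
   sub = 0..4;

var
   a: array [sub] of integer;
   grid: array [1..3, 1..3] of char;
   s: packed array [1..5] of char;
   i, j, total: integer;

begin
   total := 0;
   for i := 0 to 4 do
   begin
      a[i] := i * i;
      total := total + a[i]
   end;
   writeln('sum of squares 0..4 = ', total:1);
   for i := 1 to 3 do
      for j := 1 to 3 do
         if i = j then grid[i, j] := 'x' else grid[i, j] := '.';
   writeln('grid[2][2] = ', grid[2][2]);   { same element as grid[2, 2] }
   s := 'hello';                           { exactly 5 characters }
   writeln(s)
end.
```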

*****************************************************************************
RECORD TYPES
*****************************************************************************

Records give the ability to store completely different component types together
as a unit. As a unit, they can be manipulated, copied and passed as parameters.
It is also possible to create different typed objects that occupy the same
storage space.

   var r: record

             a: integer;
             b: char

          end;

Gives a single variable with two completely different components, which can
be accessed independently, or used as a unit.

   var vr: record

              a: integer;
              case b: boolean of { variant }

                 true: (c: integer; d: char);
                 false: (e: real)

              { end }

           end;

Variant records allow the same "collection of types", but introduce the idea
that not all of the components are in use at the same time, and thus can occupy
the same storage area. In the above definition, a, b, c, d, and e are all
elements of the record, and can be addressed individually. However, there are
three basic "types" of record elements in play:

   1. "base" or normal fixed record elements, such as a.

   2. The "tagfield" element. Such as b.

   3. The "variants", such as c, d, and e.

All the elements before the case variant are normal record elements and are
always present in the record. The tagfield is also always present, but has a
special function with regard to the variant. It must be an ordinal type, and
ALL of its possible values must be accounted for by a corresponding variant.
The tagfield gives both the program and the compiler the chance to tell what
the rest of the record holds (i.e., which case variant is "active"). The
tagfield can also be omitted:

   var vr: record

              a: integer;
              case boolean of { variant }

                 true: (c: integer; d: char);
                 false: (e: real)

              { end }

           end;

In this case, the variant can be anything the program says it is, without
checking.
The variants introduce what essentially is a "sub record" definition that
gives the record elements that are only present if the selecting variant is
"active". A variant can hold any number of such elements.
If the compiler chooses to implement variants, the total size of the resulting
record will be no larger than the fixed record parts plus the size of the
largest variant.
It is possible for the compiler to treat the variant as a normal record,
allocating each record element normally, in which case the variant record
would be no different from a normal record.

More examples of records (string and angle are assumed to be types declared
elsewhere, since neither is predefined in ISO Pascal; shape is the type
declared earlier):

   record
     year : 0..2000;
     month : 1..12;
     day : 1..31
   end

   record
     name, firstname : string;
     age : 0..99;
     case married : Boolean of
        true : (Spousesname : string);
        false : ( )
   end

   record
     x, y : real;
     area : real;
     case shape of
       triangle : (side : real;
                   inclination, angle1, angle2 : angle);
       rectangle : (side1, side2 : real;
                    skew : angle);
       circle : (diameter : real);
   end
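A sketch of a variant record in use. The tagfield kind selects the active variant; the program sets the tag first and only then uses the matching fields, and a case statement on the tag picks the right formula. The names shape, figure and area here are illustrative (this shape is a smaller type than the one declared earlier), and 3.14159 is just an approximation of pi.

```pascal
program variants(output);

{ A tagged variant record: set kind first, then use only the
  fields of the active variant. }

type
   shape = (square, circle);
   figure = record
               case kind: shape of
                  square: (side: real);
                  circle: (radius: real)
            end;

var
   f: figure;

function area(g: figure): real;
begin
   case g.kind of
      square: area := g.side * g.side;
      circle: area := 3.14159 * g.radius * g.radius
   end
end;

begin
   f.kind := square;               { activate the square variant }
   f.side := 2.0;
   writeln('square area = ', area(f):1:2);
   f.kind := circle;               { switch to the circle variant }
   f.radius := 1.0;
   writeln('circle area = ', area(f):1:2)
end.
```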



*****************************************************************************
FILE TYPES
*****************************************************************************

Files are like arrays in that they store a number of components of the same
type. Files are different from arrays in that the number of components
they may store is not limited or fixed beforehand. The number of components
in a file can change during the run of a program.
A file can have any type as a component type, with the exception of other
file types. This rule is strict: the component type may not even be a
structure which contains a file. (Arrays and records of files are allowed,
however, such as pooltape above.)
A typical file declaration is:

   var f: file of integer;

Would declare a file with standard integer components.

A special predefined file type exists:

   var f: text;

Text files are similar to:

   type text = file of char;

but are also divided into lines, each terminated with an end of line, which
can be detected with the predefined function eoln.

There are special procedures and functions that apply to text files only.
See below for more details.


*****************************************************************************
POINTER TYPES
*****************************************************************************

Pointers are indirect references to variables that are created at runtime:

   var ip: ^integer;

Pointers are neither basic nor structured types (they are not structured
because they do not have multiple components). Any type can be pointed to.
In practice, pointers allow you to create a series of unnamed components
which can be arranged in various ways.
The type declaration for pointers is special in that the type specified to
the right of "^" must be a type name, not a full type specification.
Pointer declarations are also special in that a pointer type can be declared
using base types that have not been declared yet:

   type rp = ^rec;
        rec = record

                next: rp;
                val:  integer

             end;

The declaration for rp contains a reference to an undeclared type, rec. This
"forward referencing" of pointers allows recursive definition of pointer
types, essential in list processing.
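A sketch of list processing with the rp and rec types from above, written as a complete program: new allocates each record, records are pushed onto the front of the list, and the list is walked until nil. The program and variable names are illustrative, and the records are not disposed of, for brevity.

```pascal
program lists(output);

{ Build a 3-element linked list with new, then traverse it. }

type
   rp = ^rec;                  { forward reference to rec }
   rec = record
            next: rp;
            val:  integer
         end;

var
   head, p: rp;
   i, total: integer;

begin
   head := nil;
   for i := 1 to 3 do
   begin
      new(p);                  { allocate a new record }
      p^.val := i;
      p^.next := head;         { push onto the front of the list }
      head := p
   end;
   total := 0;
   p := head;
   while p <> nil do
   begin
      write(p^.val:1, ' ');
      total := total + p^.val;
      p := p^.next
   end;
   writeln;
   writeln('total = ', total:1)
end.
```

Because each new record is pushed onto the front, the values come out in reverse order of insertion: "3 2 1".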

*****************************************************************************
TYPE COMPATIBILITY
*****************************************************************************

Type compatibility (the ability to use two different objects in relation to
each other) occurs on three different levels:

1. Two types are identical.

2. Two types are compatible.

3. Two types are assignment compatible.

Two types are identical if the exact same type definition was used to create
the objects in question. This can happen in several different ways. Two
objects can be declared in the same way:

   var a, b: array [1..10] of record a, b: integer end;

Here a and b are the same (unnamed) type. They can also be declared using the
same type name:

   type mytype = record a, b: integer end;

   var a: mytype;
       b: mytype;

Finally, an "alias" can be used to create types:

   type mytype = array [1..10] of integer;
        myother = mytype;

   var a: mytype;
       b: myother;

Even though an alias is used, these objects still have the same type.
Two types are considered compatible if:

1. They are identical types (as described above).

2. Both are ordinal types, and one or both are subranges of an identical
type.

3. Both are sets with compatible base types and the same "packed" status.

4. Both are string types with the same number of components.

Finally, two types are assignment compatible if:

1. The types are compatible, as described above.

2. Neither is a file, or has components of file type.

3. The destination is real, and the source is integer (because integers can
always be promoted to real, as above).

4. The source "fits" within the destination. If the types are subranges of
the same base type, the source must fall within the destination's range:

   var x: 1..10;

   ...

   x := 1; { legal }
   x := 20; { not legal }

5. Both are sets, and the source "fits" within the destination. If the base
types of the sets are subranges, all the source elements must also exist in
the destination:

   var s1: set of 1..10;

   ...

   s1 := [1, 2, 3]; { legal }
   s1 := [1, 15]; { not legal }

*****************************************************************************
EXPRESSIONS
*****************************************************************************

The basic operands in Pascal are:

   nnn        - Integer constant. A string of digits, without sign, whose
                value must not exceed maxint.
   x.xex      - Real constant.
   'string'   - String constant.
   [set]      - Set constant. A set constant consists of zero or more elements
                separated by ",":

                   [1, 2, 3]

                A range of elements can also appear:

                   [1, 2..5, 10]

                The elements of a set must be of the same type, and the
                "apparent" base type of the set is the type of the elements.
                The packed or unpacked status of the set is whatever is
                required for the context where it appears.
   ident      - Identifier. Can be a variable or constant from a const
                declaration.
   func(x, y) - A function call. Each parameter is evaluated, and the
                function called. The result of the function is then used
                in the encompassing expression.

The basic construct built on these operands is a "variable access", where
"a" is any variable access.

   ident    - A variable identifier.
   a[index] - Array access. It is also possible to access any number of
              dimensions by listing multiple indexes separated by ",":

                 [x, y, z, ...]

   a.off    - Record access. The "off" is the field identifier as
              used in the record declaration.

   a^       - Pointer reference. The result is a reference to the
              variable that the pointer points to. If a is a file, the
              result is a reference to the "buffer variable" for the file.

Note that an actual VAR parameter must be a variable access, not a full
expression.
For the rest of the expression operators, here they are in precedence, with
the operators appearing in groups according to priority (highest first).
"a" and "b" are operands.

   (a)      - A subexpression.
   not a    - The boolean "not" of the operand a, which must be boolean.

   a*b      - Multiplication/set intersection. If the operands are real or
              integer, the multiplication is found. If either operand is
              real, the result is real. If the operands are sets, the
              intersection is found, or a new set with elements that exist
              in both sets.
   a/b      - Divide. The operands are real or integer. The result is a real
              representing a divided by b.
   a div b  - Integer divide. The operands must be integer. The result is an
              integer giving a divided by b with no fractional part.
   a mod b  - Integer modulo. The operands must be integer. The result is an
              integer giving the modulo of a divided by b.
   a and b  - Boolean "and". Both operands must be boolean. The result is a
              boolean, giving the "and" of the operands.

   +a       - Identity. The operand is real or integer. The result has the
              same type and value as the operand (essentially a no-op).
   -a       - Negation. The operand is real or integer. The result is the
              same type as the operand, and gives the negation of the
              operand.
   a+b      - Add/set union. If the operands are real or integer, finds the
              sum of the operands. If either operand is real, the result is
              real. If both operands are sets, finds a new set which contains
              the elements of both.
   a-b      - Subtract/set difference. If the operands are real or integer,
              finds a minus b. If either operand is real, the result is
              real. If both operands are sets, finds a new set which contains
              the elements of a that are not also elements of b.
   a or b   - Boolean "or". Both operands must be boolean. The result is
              boolean, giving the boolean "or" of the operands.

   a < b    - Finds if a is less than b, and returns a boolean result.
              The operands can be basic or string types.
   a > b    - Finds if a is greater than b, and returns a boolean result.
              The operands can be basic or string types.
   a <= b   - Finds if a is less than or equal to b, and returns a boolean
              result. The operands can be basic, string, or set types. For
              sets, this tests whether a is a subset of b.
   a >= b   - Finds if a is greater than or equal to b, and returns a boolean
              result. The operands can be basic, string, or set types. For
              sets, this tests whether a is a superset of b.
   a = b    - Finds if a is equal to b, and returns a boolean result.
              The operands can be basic, string, set or pointer types.
   a <> b   - Finds if a is not equal to b, and returns a boolean result.
              The operands can be basic, string, set or pointer types.
   a in b   - Set membership. a is an ordinal, b is a set whose base type
              is compatible with a. Returns true if a is an element of
              the set.
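A short program exercising several of these operators. It is a sketch only; the set type name smallset is hypothetical, and explicit field widths are used in writeln so the output does not depend on the implementation-defined default widths.

```pascal
program operators(output);
type
   smallset = set of 0..15;
var
   a, b: smallset;
   i: integer;
begin
   writeln(17 div 5:1, ' ', 17 mod 5:1);   { integer quotient and remainder }
   writeln(7 / 2:5:2);                      { "/" always gives a real }
   a := [1, 2, 3, 4];
   b := [3, 4, 5];
   for i := 0 to 15 do                      { union }
      if i in a + b then write(i:3);
   writeln;
   for i := 0 to 15 do                      { intersection }
      if i in a * b then write(i:3);
   writeln;
   for i := 0 to 15 do                      { difference }
      if i in a - b then write(i:3);
   writeln;
   if (a * b <= a) and (3 in b) then writeln('subset test ok')
end.
```

Note that "i in a + b" needs no parentheses: "+" binds more tightly than "in", so the union is formed first.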

Note 1:
To compare sets for strict greater than (or less than), you need to combine
two compare operations. So, for example, if sa & sb are compatible sets, this
is not allowed:

     if sa < sb then ...                              { not allowed}

but this is the exact equivalent and is the preferred solution:

     if (sa <= sb) and not (sa >= sb) then ...        {OK}


Note 2:
The logical operators 'and' & 'or' have a higher precedence than the
relational operators (=, <>, >, >=, <, <=) so this expression

                  if a<b and b<c then ... {!!! illegal}

is interpreted as

                  if a< (b and b) <c then ...

which is not legal pascal.  Instead, write

                  if (a<b) and (b<c) then ... {legal}

Similarly, the in operator often needs parentheses, eg

                  if not (a in [1..9]) then ...


*****************************************************************************
PREDEFINED FUNCTIONS
*****************************************************************************

The following predefined functions exist:

   sqr(x)    - Finds the square of x, which can be real or integer. The
               result is the same type as x.
   sqrt(x)   - Finds the square root of x, which can be real or integer. The
               result is always real.
   abs(x)    - Finds the absolute value of x, which can be real or integer.
               The result is the same type as x.
   sin(x)    - Finds the sine of x, which can be real or integer. x is
               expressed in radians. The result is always real.
   cos(x)    - Finds the cosine of x, which can be real or integer. x is
               expressed in radians. The result is always real.
   arctan(x) - Finds the arctangent of x, which can be real or integer. The
               result is always real, and is expressed in radians.
   exp(x)    - Finds the exponential of x, which can be real or integer. The
               result is always real.
   ln(x)     - Finds the natural logarithm of x, which can be real or
               integer. The result is always real.

   ord(x)    - Finds the integer equivalent of any ordinal type x.
   succ(x)   - Finds the next value of any ordinal type x.
   pred(x)   - Finds the previous value of any ordinal type x.
   chr(x)    - Finds the char whose ordinal number is the integer x.

   trunc(x)  - Converts the real x to an integer by discarding its
               fractional part (that is, rounding toward zero).
   round(x)  - Finds the nearest integer to the real x; halves are rounded
               away from zero.

Examples:
  trunc(3.5) yields 3
  trunc(-3.5) yields -3
  round(3.5) yields 4
  round(-3.5) yields -4

Note: for more functions, use these identities:

   log10(x) = ln(x)/ln(10)

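   arcsin(x) = arctan(x/sqrt(1-sqr(x)))    { valid for -1 < x < 1 }
   arccos(x) = arctan(sqrt(1-sqr(x))/x)    { valid for 0 < x <= 1 }

For -1 <= x < 0, add pi to the arccos result (pi = 4*arctan(1)); arccos(0)
is pi/2.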
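The conversions and identities above can be checked with a small program. log10 and arcsin here are user-written functions built from the identities (a sketch, not predefined routines), and arcsin assumes -1 < x < 1.

```pascal
program mathfns(output);
var
   x: real;

function log10(x: real): real;
begin
   log10 := ln(x) / ln(10)
end;

{ valid only for -1 < x < 1 }
function arcsin(x: real): real;
begin
   arcsin := arctan(x / sqrt(1 - sqr(x)))
end;

begin
   writeln(trunc(3.5):1, ' ', trunc(-3.5):1);
   writeln(round(3.5):1, ' ', round(-3.5):1);
   writeln(log10(1000.0):6:3);
   x := 0.5;
   writeln(arcsin(x) * 6:6:3)   { arcsin(0.5) is pi/6, so this is about pi }
end.
```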

*****************************************************************************
STATEMENTS
*****************************************************************************

Pascal uses "structured statements". This means you are given a few standard
control flow methods to build a program with.

*****************************************************************************
ASSIGNMENT
*****************************************************************************

The fundamental statement is the assignment statement:

   v := x;

There is a special operator for assignment, ":=" (or "becomes"). Only a
single variable reference may appear to the left, and any expression may
appear to the right.
The operands must be assignment compatible, as defined above.

Examples:
   x := y + z
   p := (1 <= i) and (i < 100)
   i := sqr(k) - (i * j)
   hue1 := [blue, succ(c)]
   p1^.mother := true


*****************************************************************************
IF STATEMENT
*****************************************************************************

The if statement is the fundamental flow of control structure:

   if cond then statement [else statement]

In Pascal, only boolean type expressions may appear for the condition (not
integers). The if statement specifies a single statement to be executed
if the condition is true, and an optional statement if the condition is
false. You must beware of the "dangling else" problem if you create multiple
nested if statements:

   if a = 1 then if b = 2 then writeln('a = 1, b = 2')
   else writeln('a <> 1');

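Here the else clause is attached to the nearest preceding if that has no
else of its own (the inner "if b = 2"), not to "if a = 1" as the layout
suggests, so 'a <> 1' is printed when a = 1 and b <> 2, which is not what
we want.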
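Wrapping the inner if in begin ... end is the usual way to attach the else to the outer if. This sketch contrasts the two forms; the values of a and b are arbitrary.

```pascal
program danglingelse(output);
var
   a, b: integer;
begin
   a := 1;
   b := 3;
   { begin..end closes the inner if, so this else belongs to "if a = 1" }
   if a = 1 then begin
      if b = 2 then writeln('a = 1, b = 2')
   end
   else writeln('a <> 1');
   { without begin..end, the else belongs to the inner "if b = 2" }
   if a = 1 then
      if b = 2 then writeln('a = 1, b = 2')
      else writeln('a = 1, b <> 2')
end.
```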

*****************************************************************************
WHILE STATEMENT
*****************************************************************************

Just as if is the fundamental flow of control statement, while is the
fundamental loop statement:

   while cond do statement

The while statement repeatedly executes its single statement as long as
the condition is true. The condition is tested before each execution, so
the statement is not executed at all if the condition is false at the start.
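Euclid's greatest common divisor algorithm is a natural fit for while, since the number of iterations is not known in advance and may be zero (when b is already 0). A minimal sketch with arbitrary sample values:

```pascal
program gcddemo(output);
var
   a, b, t: integer;
begin
   a := 84;
   b := 36;
   { Euclid: replace (a, b) by (b, a mod b) until b is zero }
   while b <> 0 do begin
      t := a mod b;
      a := b;
      b := t
   end;
   writeln('gcd = ', a:1)
end.
```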

*****************************************************************************
REPEAT STATEMENT
*****************************************************************************

A repeat statement executes a block of statements one or more times:

   repeat statement [; statement]... until cond

It executes the block of statements repeatedly until the condition becomes
true (that is, for as long as the condition is false). Because the condition
is tested after each pass, the block is always executed at least once.
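Counting the decimal digits of a number shows why the at-least-once guarantee matters: 0 still has one digit, so the body must run even when the loop will stop immediately. The function name numdigits is illustrative.

```pascal
program countdigits(output);

{ number of decimal digits in n, for n >= 0 }
function numdigits(n: integer): integer;
var
   d: integer;
begin
   d := 0;
   repeat
      d := d + 1;
      n := n div 10
   until n = 0;
   numdigits := d
end;

begin
   writeln(numdigits(12345):1, ' ', numdigits(0):1)
end.
```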

*****************************************************************************
FOR STATEMENT
*****************************************************************************

The for statement executes a statement a fixed number of times:

   for i := lower to upper do statement

   for i := upper downto lower do statement

The for statement executes the target statement once for each value of the
"control variable" in the range lower..upper, counting up with "to" and down
with "downto". It does not execute at all if lower > upper.
The control variable in a for is special, and it must obey several rules:

1. It must be ordinal.

2. It must be local to the present block (declared in the present block).

3. It must not be "threatened" in the executed statement. To threaten means
to modify, or give the potential to modify, as in passing as a VAR parameter
to a procedure or function (see below).
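A sketch of both forms, including a for loop whose range is empty. The control variables are declared at program level, the same block as the loops, as rule 2 requires; the names are illustrative.

```pascal
program forloops(output);
var
   i, sum: integer;
   c: char;
begin
   sum := 0;
   for i := 1 to 10 do sum := sum + i;
   writeln(sum:1);
   for c := 'e' downto 'a' do write(c);
   writeln;
   for i := 5 to 1 do writeln('never printed')   { empty range: no iterations }
end.
```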

*****************************************************************************
CASE STATEMENT
*****************************************************************************

The case statement defines an action to be executed on each of the values of
an ordinal:

   case x of

     c1: statement;
     c2: statement;
     ...

   end;

The "selector" is an expression that must result in an ordinal type. Each of
the "case labels" must be type compatible with the selector. The case
statement will execute one, and only one, statement that matches the current
selector value. If the selector matches none of the cases, then an error
results. It is NOT possible to assume that execution simply continues if none
of the cases are matched. A case label MUST match the value of the selector.
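Because ISO Pascal has no default branch in a case statement, a common approach is either to label every possible value of the selector or to test its range first. The sketch below shows both; the enumerated type day is illustrative.

```pascal
program casedemo(output);
type
   day = (mon, tue, wed, thu, fri, sat, sun);
var
   d: day;
   n: integer;
begin
   { every value of day has a label, so no value can go unmatched }
   for d := mon to sun do
      case d of
         mon, tue, wed, thu, fri: write('W');
         sat, sun: write('E')
      end;
   writeln;
   { an integer selector: check the range first }
   n := 42;
   if n in [1..3] then
      case n of
         1: writeln('one');
         2: writeln('two');
         3: writeln('three')
      end
   else
      writeln('out of range')
end.
```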

*****************************************************************************
GOTO STATEMENT
*****************************************************************************

The goto statement directly branches to a given labeled statement:

   goto 123

   ...

   123:

Several requirements exist for gotos:

1. The goto label must have been declared in a label declaration.

2. A goto cannot jump into any one of the structured statements above
(if, while, repeat, for or case statements).

3. If the target of the goto is in another procedure or function,
that target label must be in the "outer level" of the procedure or function.
That means that it may not appear inside any structured statement at all.
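Jumping out of structured statements, as opposed to into them, is allowed, which makes goto useful for leaving several nested loops at once. A minimal sketch, with label 99 declared in the program's label declaration and the search itself chosen arbitrarily:

```pascal
program gotodemo(output);
label 99;
var
   i, j: integer;
begin
   { find the first pair with i * j = 12, then leave both loops }
   for i := 1 to 10 do
      for j := 1 to 10 do
         if i * j = 12 then begin
            writeln(i:1, ' * ', j:1, ' = 12');
            goto 99
         end;
   99: writeln('done')
end.
```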

*****************************************************************************
COMPOUND STATEMENT
*****************************************************************************

A statement block gives the ability to make any number of statements appear
as one:

   begin statement [; statement]... end

All of the above statements control only one statement at a time, with the
exception of repeat. The compound statement allows the inclusion of a whole
substructure to be controlled by those statements.

*****************************************************************************
PROCEDURES AND FUNCTIONS
*****************************************************************************

When you need to use a block of the same statements several times, a
compound block can be turned into a procedure or function and given a name:

   procedure x;

   begin

      ...

   end;

   function y: integer;

   begin

      ...

   end;

Then, the block of statements can be called from anywhere:

   var i: integer;

   x; { calls the procedure }

   i := y; { calls the function }

The difference between a procedure and a function is that a function returns
a result, which can only be a basic or pointer type (not structured). This
makes it possible to use a function in an expression. In a function, the
result is returned by a special form of the assign statement:

   function y: integer;

   begin

      ...
      y := 1 { set function return }

   end;

The assignment is special because only the name of the function appears on
the left hand side of ":=". It does not matter where the function return
assignment appears in the function, and it is even possible to have multiple
assignments to the function, but AT LEAST one such assignment must be executed
before the function ends.
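Putting the pieces together, here is a sketch with one procedure and one function (names are illustrative). Inside factorial, the function name can be assigned to but not read back as a variable (it would denote a call), so the product is built in a local variable and assigned to the function name once at the end.

```pascal
program funcdemo(output);
var
   i: integer;

procedure greet;
begin
   writeln('hello')
end;

{ n! for small n >= 0 }
function factorial(n: integer): integer;
var
   k, f: integer;
begin
   f := 1;
   for k := 2 to n do f := f * k;
   factorial := f      { the function return assignment }
end;

begin
   greet;              { calls the procedure }
   i := factorial(5);  { calls the function }
   writeln(i:1)
end.
```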
If the procedure or function uses parameters, they are declared as:

   procedure x(one: integer; two, three: char);

   begin

      ...

   end;

The declaration of a parameter is special in that only a type name may be
specified, not a full type specification.
Once appearing in the procedure or function header, parameters can be
treated as variables that just happen to have been initialized to the value
passed to the procedure or function. The modification of parameters has no
effect on the original parameters themselves. Any expression that is
assignment compatible with the parameter declaration can be used in place
of the parameter during its call (here i is an integer variable):

   x(i*3, 'a', succ('a'));

If it is desired that the original parameter be modified, then a special form
of parameter declaration is used:

   procedure x(var y: integer);

   begin

      y := 1

   end;

Declaring y as a VAR parameter means that y will stand for the original
parameter, including taking on any values given it:

   var q: integer;

   ...

      x(q);

Would change q to have the value 1. To be passed to a VAR parameter, the
actual parameter must be a variable reference whose type is identical to
that of the parameter declaration.
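A classic use of VAR parameters is exchanging two variables. The sketch also includes a procedure with a value parameter to show that assigning to it leaves the caller's variable unchanged; the names swap and setzero are illustrative.

```pascal
program swapdemo(output);
var
   p, q: integer;

{ a and b stand for the caller's variables }
procedure swap(var a, b: integer);
var
   t: integer;
begin
   t := a;
   a := b;
   b := t
end;

{ n is a copy: assigning to it does not change the caller's variable }
procedure setzero(n: integer);
begin
   n := 0
end;

begin
   p := 1;
   q := 2;
   swap(p, q);
   setzero(p);
   writeln(p:1, ' ', q:1)
end.
```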
Finally, Pascal provides a special mode of parameter known as a procedure or
function parameter which passes a reference to a given procedure or function:

   procedure x(procedure y(x, q: integer));

   ...

   procedure z(function y: integer);

   ...

To declare a procedure or function parameter, you must give its full
parameter list, including a function result if it is a function.
A procedure or function is passed to a procedure or function by just its
name:

   procedure r(a, b: integer);

   begin

      ...

   end;

   begin

      x(r); { pass procedure r to procedure x }

      ...

The parameter list for the procedure or function passed must be "congruent"
with the declared procedure or function parameter declaration. This means
that all its parameters, and all of the parameters of its procedure or
function parameters, etc., must match the declared parameter. Once the
procedure or function has been passed, it is then ok for the procedure or
function that accepts it to use it:

   procedure x(procedure y(x, q: integer));

   begin

      y(1, 2);
      ...

When x is called as x(r), this calls r with parameters 1 and 2.
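A complete sketch of a function parameter: tabulate accepts any function with one integer value parameter and an integer result, so square and twice are both congruent with it. All names are illustrative.

```pascal
program procparam(output);

function square(x: integer): integer;
begin
   square := x * x
end;

function twice(x: integer): integer;
begin
   twice := 2 * x
end;

{ prints f(1), f(2), ..., f(n) on one line }
procedure tabulate(function f(x: integer): integer; n: integer);
var
   i: integer;
begin
   for i := 1 to n do write(f(i):4);
   writeln
end;

begin
   tabulate(square, 5);
   tabulate(twice, 5)
end.
```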
Procedures and functions can be declared in advance of the actual appearance
of the procedure or function block using the forward keyword:

   procedure x(a, b: integer); forward;

   procedure y;

   begin

      x(1, 2)
      ...

   end;

   procedure x;

   begin

      ...

The forward keyword replaces the appearance of the block in the first
appearance of the declaration. In the second appearance, only the name of
the procedure appears, not its header parameters. Then the block appears
as normal.
The advance declaration makes mutual recursion possible, where two
procedures or functions call each other; without it this would not be
possible, because each would need to be declared before the other.
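A sketch of mutual recursion with forward, using two functions that each call the other. It assumes n >= 0; the names iseven and isodd are illustrative.

```pascal
program mutual(output);

function isodd(n: integer): boolean; forward;

function iseven(n: integer): boolean;
begin
   if n = 0 then iseven := true
   else iseven := isodd(n - 1)   { legal: isodd is already declared }
end;

function isodd;                  { parameters and result are not repeated }
begin
   if n = 0 then isodd := false
   else isodd := iseven(n - 1)
end;

begin
   if iseven(10) and isodd(7) then writeln('ok')
end.
```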

*****************************************************************************
PREDEFINED PROCEDURES AND FILE OPERATIONS
*****************************************************************************

A file is not accessed directly (as an array is). Instead, Pascal
automatically declares one component of the file's component type, which is
accessed by special syntax:

   f^

So that:

   f^ := 1;

Assigns to the file "buffer" component, and:

   v := f^;

Reads the file buffer. Unless the file is empty or you are at the end of the
file, the file buffer component will contain the contents of the component
at the file location you are currently reading or writing. Other than that,
the file buffer behaves as an ordinary variable, and can even be passed as
a parameter to routines.
The way to actually read or write through a file is by using the predeclared
procedures:

   get(f);

Loads the buffer variable with the next element in the file, and advances the
file position by one element, and:

   put(f);

Outputs the contents of the buffer variable to the file and advances the file
position by one. These two procedures are really all you need to implement
full reading and writing on a file. It also has the advantage of keeping the
next component in the file as a "lookahead" mechanism.
However, it is much more common to access files via the predefined procedures
read and write:

   read(f, x);

Is equivalent to:

   x := f^; get(f);

And:

   write(f, x);

Is equivalent to:

   f^ := x; put(f);

Read and write are special in that any number of parameters can appear:

   read(f, x, y, z, ...);
   write(f, x, y, z, ...);

The parameters to read must be variable references. The parameters to write
can be expressions of matching type, except for the file parameter (files
must always be VAR references).
Writing to a file is special in that you cannot write to a file unless you
are at the end of the file. That is, you may only append new elements to the
end of the file, not modify existing components of the file.
Files are said to exist in three "states":

  1. Inactive.
  2. Read.
  3. Write.

All files begin life in the inactive state. For a file to be read from, it
must be placed into the read state. For a file to be written, it must be
placed in the write state. The reset and rewrite procedures do this:

  reset(f);

Places the buffer variable at the 1st element of the file (if it exists), and
sets the file mode to "read".

  rewrite(f);

Clears any previous contents of the file, and places the buffer variable at
the start of the file. The file mode is set to "write".
A file can be tested for only one kind of position, that is if it has reached
the end:

  eof(f);

Is a function that returns true if the end of the file has been reached. eof
must be true before the file can be written.
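The sketch below writes a few integers with put and reads them back with get. It assumes, as described in the notes on file header parameters in this guide, that a file listed in the program header but not given on the command line is bound to a temporary file; the file name numbers is illustrative.

```pascal
program fileio(output, numbers);
var
   numbers: file of integer;
   i, sum: integer;
begin
   rewrite(numbers);            { write state, file emptied }
   for i := 1 to 5 do begin
      numbers^ := i * i;        { fill the buffer variable }
      put(numbers)              { append it to the file }
   end;
   reset(numbers);              { read state, buffer holds element 1 }
   sum := 0;
   while not eof(numbers) do begin
      sum := sum + numbers^;    { use the current element }
      get(numbers)              { advance to the next one }
   end;
   writeln('sum of squares = ', sum:1)
end.
```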

*****************************************************************************
PREDEFINED PROCEDURES AND TEXT FILES
*****************************************************************************

As alluded to before, text files are treated specially under Pascal. First,
the ends of lines are treated specially. If the end of a line is reached,
reading a character returns a space. A special function is required to
determine if the end of the line has been reached:

   eoln(f);

Returns true if the current file position is at the end of a line. Pascal
strictly enforces the following structure to text files:

   line 1<eoln>
   line 2<eoln>
   ...
   line N<eoln>
   <eof>

There will always be an eoln terminating each line. If the file being read
does not have an eoln on the last line, it will be added automatically.
Besides the standard read and write calls, two procedures are special to text
files:

   readln(f...);
   writeln(f...);

Readln behaves as a normal read, but after all the items in the list are
read, the rest of the line, up to and including the eoln, is skipped.
Writeln behaves as a normal write, but after all the items in the list are
written, an eoln is appended to the output.
Text files can be treated as simple files of characters, but it is also
possible to read and write other types to a text file. Integers and reals can
be read from a text file, and integers, reals, booleans, and strings can be
written to text files. These types are written or read from the file by
converting them to or from a character based format. The format for integers
on read must be:

   [+/-]digit[digit]...

Examples:

   9
   +56
   -19384

The format for reals on read is:

   [+/-]digit[digit]...[.digit[digit]...][e[+/-]digit[digit]...]

Examples:

   -1
   -356.44
   7e9
   +22.343e-22

All blanks are skipped before reading the number. Since eolns are defined as
blanks, this means that even eoln is skipped to find the number. This can
lead to an interesting situation when a number is read from the console. If
the user presses return without entering a number (on most systems), nothing
will happen until a number is entered, no matter how many times return is
hit!
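A sketch that reads one integer per line from standard input and prints the total. Because readln(input, n) skips blanks and eolns before the number, the loop assumes the input ends with a number followed by a newline and has no trailing blank lines; otherwise the last read would run into the end of the file.

```pascal
program sumlines(input, output);
var
   n, total: integer;
begin
   total := 0;
   while not eof(input) do begin
      readln(input, n);   { skip blanks, read n, then skip past the eoln }
      total := total + n
   end;
   writeln('total = ', total:1)
end.
```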
Write parameters to textfiles are of the format:

   write(x[:field[:fraction]]);

The field states the number of character positions that you expect the object
to occupy. The fraction applies only to reals. The output formats in each
case are:

   integer: The default field for integers is implementation defined, but
   is usually the number of digits in maxint, plus a position for the sign.
   If a field is specified, and is larger than the number of positions
   required to output the number and sign, then blanks are added to the left
   side of the output until the total size equals the field width. If the
   field width is less than the required positions, the field width is
   ignored.

   real: The default field for reals is implementation defined. There are
   two different format modes depending on whether the fraction parameter
   appears.
   If there is no fraction, the format is:

      -0.0000000e+000

   Starting from the left, the sign is either a "-" sign if the number is
   negative, or blank if the number is positive or zero. Then the first digit
   of the number, then the decimal point, then the fraction of the number,
   then either 'e' or 'E' (the case is implementation defined), then the sign
   of the exponent, then the digits of the exponent. The number of digits in
   the exponent are implementation defined, as are the number of digits in
   a fraction if no field width is defined. If the field width appears, and
   it is larger than the total number of required positions in the number
   (all the characters in the above format without the fraction digits),
   then the fraction is expanded until the entire number fills the specified
   field, using right hand zeros if required. Otherwise, the minimum required
   positions are always printed.
   If a fraction appears (which means the field must also appear), the format
   used is:

      [-]00...00.000..00

   The whole number portion of the number is always printed in full,
   regardless of the field size, preceded by "-" if the number is negative.
   Then a decimal point appears, followed by the number of fractional digits
   specified in the fraction parameter (the value is rounded to that many
   digits). If the field is greater than the number of required positions
   and specified fraction digits, then leading spaces are added until the
   total size equals the field width. The minimum positions and the
   specified fractional digits are always printed.
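The effect of field and fraction widths can be seen with a few writeln calls. The brackets are there only to make leading blanks visible, and default-width cases are left out since those are implementation defined.

```pascal
program formats(output);
begin
   writeln('[', 42:6, ']');          { padded on the left to 6 positions }
   writeln('[', 42:1, ']');          { field too small: minimum width used }
   writeln('[', 3.14159:8:2, ']');   { fixed point, 2 fraction digits }
   writeln('[', -2.5:1:3, ']');      { field too small for the fixed format }
   writeln('[', 'abc':5, ']')        { strings are padded on the left too }
end.
```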

*****************************************************************************
HEADER FILES
*****************************************************************************

The header files feature was originally designed to be the interface of
Pascal to the external file system, and as such is implementation-defined
by nature.

The header files appear as a simple list of identifiers in the program
header:

   program test(input, output, source, object);

Each header file should be declared again in the variables section
of the program block:

   program test(intlist);

   var intlist: file of integer;

Two files are special, and should not be redeclared. These are input and
output. They represent the main input and main output of the program, and
are present in all Pascal programs.
In addition, they are the default files in special forms of these
procedures and functions:

   This form      is equivalent to      This form
   --------------------------------------------------------------
   write(...)                           write(output, ...)
   writeln(...)                         writeln(output, ...)
   writeln                              writeln(output)
   read(...)                            read(input, ...)
   readln(...)                          readln(input, ...)
   readln                               readln(input)
   eof                                  eof(input)
   eoln                                 eoln(input)
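A sketch using only the short forms from the table: it copies standard input to standard output line by line, character by character. The program name copytext is illustrative.

```pascal
program copytext(input, output);
var
   ch: char;
begin
   while not eof do begin        { eof(input) }
      while not eoln do begin    { eoln(input) }
         read(ch);               { read(input, ch) }
         write(ch)               { write(output, ch) }
      end;
      readln;                    { skip past the eoln on input }
      writeln                    { and end the line on output }
   end
end.
```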


*****************************************************************************
PACKING PROCEDURES
*****************************************************************************

Because a packed array and an unpacked array are not compatible, even when
their index and component types are the same, two procedures allow a packed
array to be copied to an unpacked array and vice versa:

   unpack(PackedArray, UnpackedArray, Index);

Unpacks the packed array and places the contents into the unpacked array.
The index gives the starting index of the unpacked array where the data
is to be placed. Interestingly, the two arrays need not have the same index
type or even be the same size! The unpacked array must simply have enough
elements after the specified starting index to hold the number of elements
in the packed array.

   pack(UnpackedArray, Index, PackedArray);

Packs part of the unpacked array into the packed array. The index again gives
the starting position to copy data from in the unpacked array. Again, the
arrays need not be of the same index type or size. The unpacked array simply
needs enough elements after the index to provide all the values in the packed
array.
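A sketch showing both procedures with arrays of different sizes. It uses a 5-character packed array, which can be assigned a string constant of the same length; all names are illustrative.

```pascal
program packdemo(output);
var
   buf: array [1..10] of char;
   word: packed array [1..5] of char;
   i: integer;
begin
   for i := 1 to 10 do buf[i] := '.';
   word := 'hello';           { same length string constant }
   unpack(word, buf, 3);      { buf[3..7] receives word[1..5] }
   for i := 1 to 10 do write(buf[i]);
   writeln;
   buf[3] := 'j';
   pack(buf, 3, word);        { word[1..5] receives buf[3..7] }
   writeln(word)
end.
```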

*****************************************************************************
DYNAMIC ALLOCATION
*****************************************************************************

In Pascal, a pointer variable can only point to variables of its declared
base type. The variables pointed to are anonymous, and are created and
destroyed by the programmer at will. A pointer variable is undefined until
it is assigned, and it is an error to access the variable it points to
unless that variable has been created:

   var p: ^integer;

   ...

      new(p); { create a new integer variable }
      p^ := 1; { place value }

Would create a new variable. Variables can also be destroyed:

      dispose(p);

Would release the storage allocated to the variable. It is an error (a very
serious one) to access the contents of a variable that has been disposed.
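Pointers are typically used to build linked structures. The sketch below builds a three-node list, then prints each node and disposes of it. The type and variable names are illustrative.

```pascal
program listdemo(output);
type
   link = ^node;
   node = record
             value: integer;
             next: link
          end;
var
   head, p: link;
   i: integer;
begin
   { build 3 -> 2 -> 1 by adding each new node at the front }
   head := nil;
   for i := 1 to 3 do begin
      new(p);
      p^.value := i;
      p^.next := head;
      head := p
   end;
   { print the list, releasing each node after it is used }
   while head <> nil do begin
      p := head;
      write(p^.value:2);
      head := p^.next;
      dispose(p)          { p^ must not be used after this }
   end;
   writeln
end.
```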
A special syntax exists for the allocation of variant records:

   type rec = record

                 a: integer;
                 case b: boolean of

                    true: (c: integer);
                    false: (d: char)

              end;

   var p: ^rec;

   ...

   new(p, true);
   p^.b := true; { new does not set the tagfield }

   ...

   dispose(p, true);

For each of new and dispose, each of the tagfields we want to discriminate
are parameters to the procedure. The appearance of the tagfield values allow
the compiler to allocate a variable with only the amount of space required
for the record with that variant. This can allow considerable storage savings
if used correctly.
The appearance of a discriminant in a new procedure does not also
automatically SET the value of the tagfield. You must do that yourself. For
the entire life of the variable, you must not set the tagfield to any other
value than the value used in the new procedure, nor access any of the
variants in the record that are not active.
The dispose statement should be called with the exact same tagfield values
and number.
Note that not all the tagfields in a record need appear, just the ones, in
order, whose variants we wish to allocate as fixed.
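A runnable version of the variant allocation described above, as a sketch. The record type shape is illustrative; the tagfield is set explicitly after new, and dispose uses the same tagfield value.

```pascal
program variantnew(output);
type
   shape = record
              case circle: boolean of
                 true:  (radius: real);
                 false: (width, height: real)
           end;
var
   p: ^shape;
begin
   new(p, true);           { room for the true variant only }
   p^.circle := true;      { new does not set the tagfield }
   p^.radius := 2.0;
   writeln(3.14159 * sqr(p^.radius):6:2);
   dispose(p, true)        { same tagfield value as in new }
end.
```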

*****************************************************************************
*****************************************************************************



Implementation Dependent Features
=================================


-- Compiler Options

Options are as in p5 (and p4 before that, and many other pascal compilers),
eg {$X+,Y-,Z+}, with '$' as the first character in a comment, then a sequence
of characters.  There is an example near the top of pcom.pas.
(*$X+*) comments can be used instead of {$X+} comments.

More than one option can be in the same comment if they are separated by a
comma, eg

   {$l+,w+,t-}

There should be no spaces between the options.
Upper case letters may be used instead of lower case letters.
p5c issues a warning whenever an unrecognised option is encountered.

More details can be found in the p5 documentation.

Listings are turned on with the 'l' character followed by a '+' in
a comment like this:
   {$l+}

and turned off with a '-' as in this comment:
   {$l-}

Debugging can be turned on and off with the d character, eg
{$d+} {debug on}
{$d-} {debug off}

Debugging performs lots of runtime checks on the code for various
programming errors that otherwise can be hard to find.
eg
   a: array[1..9] of real;
   ....
   i := 10;
   if a[i] > 0.0 then  { error i too big for a }
      ...

With debugging enabled, unexpected behavior in your program is reported exactly
where it happens.  With debugging disabled, there is no checking so the code
will be faster, but should an error occur, the behaviour of your code will be
undefined.  In plain English, that means it could blow up in your face.


The debug checks include arithmetic, file, memory, and range checks.
They are described in more detail below.


Variant and tag checking is controlled with the {$v+} & {$v-} options.
These checks can find problems when a variant record is likely to contain the
wrong data, as in this example:

   var   r : record
               case t: boolean of
                 true:  (size: integer);
                 false: (value: real)
             end;
   ....

   r.t := true;  { r contains size }

   ....

   if r.value > 100 then    { !!! error - r holds size, not value }


There is also a warning option. To turn warnings off, use a comment
like this:
     {$w-}
and to turn them back on again, use this:
     {$w+}

There is an option that embeds pascal line numbers in the generated c code:
     {$n+}
You probably don't need to use this option often, but it might be useful if
you need to browse the generated c code and relate it to the corresponding
pascal code.

Also, if you use the gnu c tools (as the pascal analyser, pan, does), the {$n+}
option makes the generated c code refer to the pascal line numbers.

Line numbers are normally turned off, so if you want them you must explicitly
turn them on.

The p5x pascal tools pan & rv turn line numbers on automatically, so you
don't need to do this yourself.

There is also the {$z+} option which favours increasing code size over
memory size in evaluating set expressions.  The default is {$z-} (off).
It is an advanced option and is described in the section on set
implementation later in this document.

The other options from p5 do nothing or have been removed.


-- File Header Parameters

File variables listed in the program header are bound to external files.
They must also be declared as file variables at the program level (ie not in a
procedure or function).

For example, if my program's header looked like this

    program myprog( input, output, file1, file2 );
    var file1, file2 : text;
    ... etc ...

and myprog was called with parameters something like this

    myprog data res

then variable file1 in myprog refers to the file called data, and file2
refers to the file called res.

The files are bound in the same order as they appear in the header.
input and output don't count because they are already bound to standard input
and standard output.

If the program is called with too few parameters, the extra files in the
program header are assigned temporary files.
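As a sketch of header file parameters, the hypothetical program copytext below
copies the text file named by the first argument to the file named by the
second, one character at a time.  With the r script it might be run as
./r copytext data res

```pascal
program copytext(input, output, src, dst);
{ hypothetical example: src is bound to the first command line argument
  and dst to the second, in the order they appear in this header }
var
   src, dst : text;     { header files must be declared at program level }
   ch       : char;
   lines    : integer;
begin
   reset(src);          { open the first file for reading }
   rewrite(dst);        { create (or empty) the second file }
   lines := 0;
   while not eof(src) do begin
      { copy one line, character by character }
      while not eoln(src) do begin
         read(src, ch);
         write(dst, ch)
      end;
      readln(src);      { skip the end of line in src ... }
      writeln(dst);     { ... and write one to dst }
      lines := lines + 1
   end;
   writeln('copied ', lines:1, ' lines')
end.
```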

There is a test program called tfile.pas which illustrates header file
parameters: it reads a message from the first file argument, and writes
a response to the second.
Run it with a sequence of commands like this:

      echo 'message 1' > file1
      ./r tfile file1 file2

The program reads and displays the contents of file1, and writes a
message to file2.  You can show what it wrote with a command like this:

      cat file2

There is also a benchmark program, dhry.pas, that uses /proc/uptime as a file
header parameter and then uses this to provide timing information.
Compile and run with:
            ./r dhry /proc/uptime
This assumes the /proc filesystem is available (eg linux, cygwin);
other systems may need suitable modifications.

If you have the /proc filesystem, and need to pass command line parameters to
your program, write a program with an external text file and try this:

         myprog /proc/self/cmdline


-- files are closed when they go out of scope

Files declared in a procedure or function are closed when that procedure or
function returns.  This is true whether it returns normally to its caller,
or executes a goto statement to an outer procedure or function.

Files declared in dynamic memory (ie in a pointer via new) are closed when
the memory is disposed.


-- Short Circuit boolean conditions

In p5c and p5x, all boolean expressions are short circuited, ie boolean
expressions are always evaluated left to right and only as far as necessary
to determine the result.

Consider, for example, this statement:

 var a: array[1..last] of real;
 ...
 while (i <= last) and (a[i] <> c) do
 ....

If i > last, the left subexpression is false, so the whole expression must be
false, whatever the value of the right subexpression.  So it is not necessary
to evaluate the right subexpression.
Besides saving a few instructions of execution time, this means a[i] is
accessed only if (i <= last), so avoiding the danger of accessing the array
when i is larger than the last index.

In other words, the rhs of the and operator is evaluated if and only if
the lhs is true.

Conversely, to evaluate the rhs if and only if the lhs is false, use the or
operator.  eg with an expression like this:

 while (p = nil) or (p^.size=0) do
 ....

Note that there could be trouble in expressions where the rhs of a boolean
expression is a function that has side effects.  For example, if doSomething(x)
is a function, then in this statement

    if (weather = sunny) and doSomething(x) then ...

doSomething is never called unless the weather is sunny.  If you need
doSomething to be called every time, it should not be part of a boolean
expression.


The pascal standard does not say whether boolean expressions should be
short circuit or fully evaluated.
(And where boolean expressions are fully evaluated, there is no guarantee
whether the left or right hand sides are evaluated first.)

Pascal compilers that evaluate both sides of a boolean expression can be
expected to produce a run time error for the first two examples above (when
run time checks are enabled).

So code that relies on short circuit evaluation is not portable.
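As a sketch of the difference, the program below writes the search loop from
the start of this section two ways.  findSC depends on short circuit
evaluation, as provided by p5c and p5x; findPortable uses a boolean flag so
that a[i] is never read with i out of range, whichever way a compiler
evaluates boolean expressions.  The program, the array contents and both
function names are hypothetical, made up for this illustration.

```pascal
program search(output);
{ hypothetical example: two ways to search an array }
const
   last = 5;
var
   a : array[1..last] of real;
   k : integer;

{ relies on short circuit evaluation: a[i] is never read when i > last }
function findSC(c: real): integer;
var
   i : integer;
begin
   i := 1;
   while (i <= last) and (a[i] <> c) do
      i := i + 1;
   findSC := i           { last+1 means not found }
end;

{ portable: safe even if both sides of 'and' are always evaluated }
function findPortable(c: real): integer;
var
   i     : integer;
   found : boolean;
begin
   i := 1;
   found := false;
   while (i <= last) and not found do
      if a[i] = c then
         found := true
      else
         i := i + 1;
   findPortable := i     { last+1 means not found }
end;

begin
   for k := 1 to last do
      a[k] := k * 1.5;   { 1.5, 3.0, 4.5, 6.0, 7.5 }
   writeln(findSC(3.0):1, ' ', findPortable(3.0):1);   { 2 2 }
   writeln(findSC(9.9):1, ' ', findPortable(9.9):1)    { 6 6 }
end.
```

The flag version costs one extra variable, but its correctness no longer
depends on how the compiler evaluates boolean expressions.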


-- Mod & Div

p5c and p5x correctly implement the mod operator according to iso pascal.
The mod operator in pascal is not exactly equivalent to the remainder operator
(such as % in c).  The difference is when negative numbers are used, eg
   (-15) mod 4 is +1, ie the result of the mod operator is always >= 0 and < 4.
On the other hand, for the remainder operator (written rem here), (-15) rem 4
is -3, ie -(15 rem 4).

Why does this make sense?  Think about, say, 5 mod 4, which is +1.  Now add 4
to 5, or subtract 4 from 5, and take the same mod again: 9 mod 4 = 1, and
1 mod 4 = 1.
In fact we can add or subtract 4 to 5 as many times as we want; the result
after taking the mod is always +1.  This is still true if we subtract 4 enough
times to get a negative number, like -15,
 ie 5 - (5x4) = -15, and (-15) mod 4 is +1.

By the rules of pascal, -22 mod 3 is interpreted as -(22 mod 3), ie
with the mod done first.  If you need to evaluate (-22) mod 3, use brackets.
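A small sketch of the difference between mod and a truncating remainder.  The
function remainder is a hypothetical helper, not a p5x built in; it relies on
div truncating towards zero, so a - (a div b)*b has the sign of a, like the %
operator in c.

```pascal
program modrem(output);
{ hypothetical example comparing mod with a truncating remainder }

{ remainder with the sign of a, like c's %; b must not be zero }
function remainder(a, b: integer): integer;
begin
   remainder := a - (a div b) * b
end;

begin
   writeln((-15) mod 4:1);         { 1 }
   writeln(remainder(-15, 4):1);   { -3 }
   writeln(-22 mod 3:1);           { -1, ie -(22 mod 3) }
   writeln((-22) mod 3:1)          { 2 }
end.
```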


-- sets (in particular, size of sets)
p5c and p5x allow sets to contain negative numbers, and have arbitrary size
subject to any limits imposed by your computer's hardware and/or gcc.

A set will use only as many bytes as it needs, subject to some rounding at
the ends,
  eg the set s1 declared as follows will occupy exactly 2 bytes:

            s1: set of 0..15;


A set of integer is assumed to be too big and is truncated to a default size
of [-255..+255].
The default size is only used when the true set size cannot be determined at
compile time, for example in an expression like:

    if [a1,a2,a3] = [b1..b2] then ....

The p5x compiler will issue a warning whenever it cannot determine the size
of a set expression.

A suitable workaround, should it be needed, is to use set intersection to make
the set size known:

    if [a1,a2,a3]*[-10000..10000] = [b1..b2]*[-10000..10000] then ....

where in this case the elements of the sets [a1,a2,a3] and [b1..b2] are known
to lie within the bounds -10000..10000.  The set limits need to be constants
(so the compiler knows what they are).  These constants don't need to be the
same on each side of the comparison, and are necessary only when the compiler
cannot determine the size of a set expression.
Note:
 - in this example, 2 large temporary sets will be constructed - this is
   likely to be a problem only in systems where memory is severely constrained.
 - [first..last] is a constant limit if first and last are defined as
   constants.
 - p5c can now see an expression like [first..first+100] as a constant
   (though strictly it is an expression, because first+100 is not a simple
   constant on its own).  This ability is limited to expressions containing
   integer constants and the integer arithmetic operators (+, -, *, div, mod)
   along with parentheses.

There are some notes about set size in the implementation section of this
document.


-- conformant arrays
Conformant arrays are fully implemented in p5c.
Conformant arrays allow arrays of any size (but known index type and
component type) to be passed to a procedure or function.

here's an example:

{ return the dot product of 2 vectors a.b }
function dotproduct( a,b: array[lo..hi:integer] of real): real;
var
   i: integer;
   x: real;
begin
   x := 0.0;
   for i := lo to hi do
      x := x + a[i]*b[i];
   dotproduct := x;
end;

procedure test1;
var
  vector1, vector2: array[1..20] of real;
begin
 ...
 if dotproduct(vector1,vector2) = 0 then
    writeln( 'vectors are orthogonal' );
 ...
end;

procedure test2;
var
  myvector: array[0..9] of real;
  magnitude: real;
begin
 ...
  magnitude := sqrt(dotproduct(myvector,myvector));
  writeln( 'magnitude of myvector is ', magnitude:1:3 );
 ...
end;

Notes:
1. in the function dotproduct the parameters a and b are conformant arrays.
   (a and b are called formal parameters in the literature)

2.  the index range of the conformant array is lo..hi.
    lo and hi are called the bound identifiers and apart from using their
    value, their use is quite limited:
       - they are not constants, so can't be used to define new types, eg
         var i : lo..hi;   { ILLEGAL }
       - they are not variables, so are not assignable, eg
         lo := 1; { ILLEGAL }
       - the bounds must be identifiers (that is they must have names), so
         this is not allowed:
         a,b: array[1..hi:integer] of real ;   { ILLEGAL, low bound cannot be a constant}

3. dotproduct is called by test1 and test2 with arrays of different sizes.
   (the arrays in the test procedures are called actual parameters in the
   literature)
   The arrays in the test procedures must have index types compatible with the
   index type of a and b in the dotproduct function, ie integer in this case.
   The component types of the arrays in the test procedures must match the
   component type of a and b, ie real.

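4. a and b are in a group, ie they share the same type definition, so in each
   call the two arrays passed to a and b must have the same type as each other
   (both are array[1..20] of real in test1, and both are array[0..9] of real
   in test2).  If we needed a and b to have different sizes, then a and b
   would need to be declared as separate conformant array parameters, each
   with its own bound identifiers.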
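Another sketch, assuming only iso conformant arrays: maxIndex (a hypothetical
function) returns the index of the largest element, and is called with two
arrays whose bounds differ, one with a negative lower bound.  Passing the
array as a var parameter avoids copying it.

```pascal
program conform(output);
{ hypothetical example of a conformant array parameter }
var
   v1 : array[1..5] of real;
   v2 : array[-2..2] of real;
   i  : integer;

{ index of the largest element of any array of real with an integer index }
function maxIndex(var a: array[lo..hi: integer] of real): integer;
var
   i, m : integer;
begin
   m := lo;                     { bound identifiers can be used as values }
   for i := lo + 1 to hi do
      if a[i] > a[m] then
         m := i;
   maxIndex := m
end;

begin
   for i := 1 to 5 do
      v1[i] := i;               { largest is v1[5] }
   for i := -2 to 2 do
      v2[i] := -sqr(i);         { -4 -1 0 -1 -4, largest is v2[0] }
   writeln(maxIndex(v1):1);     { 5 }
   writeln(maxIndex(v2):1)      { 0 }
end.
```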

Multi dimensional arrays can be defined like this:
   procedure matrixMultiply( var a,b,c: array[lo1..hi1:integer;
                                              lo2..hi2:integer] of real );
   this is exactly equivalent to
   procedure matrixMultiply( var a,b,c: array[lo1..hi1:integer] of
                                        array[lo2..hi2:integer] of real );

Conformant arrays can even go inside function and procedure parameters:

procedure p(function ff(aa:array[lb..ub:char] of real):boolean);

When calling p, make sure the function you provide exactly matches ff.



-- packing

The packed keyword is recognised as expected, but only records are stored
packed.  All other types are stored the same, whether declared packed
or not.
The standard transfer functions pack() and unpack() are implemented in p5c.


-- page procedure

This procedure outputs a form feed to the text file.
The existing line is terminated with a writeln if necessary.


-- deviations from the pascal standard

There are just a few of these, mainly to remove restrictions imposed by the
pascal standard. (Let's face it, sometimes it's easier to implement a feature
than it is to block it because the standard says so, then test that the block
is correct.)  Some are inherited from p5 (and p4 before that).

 - mod -ve numbers
   iso pascal makes it an error for the right operand of mod to be negative.
   In p5c and p5x, for b<0, (a mod b) = -((-a) mod (-b)), and b < (a mod b) <= 0
   for example (-5) mod (-3) is -2

 - write format field width can be negative
   a negative field width causes the output to be left justified,
   ie any padding is added to the right.
   This feature is provided for free by gcc (and any standard c compiler).
   NB: the precision field width of a real number must be >= 0

 - string index
   The pascal standard implies that the upper string index must be strictly
   greater than one.  In p5x (and many other pascal compilers), it is possible
   to have a string of length one.

 - conformant array passing
   To make compiler writing easier, the pascal standard allows some
   restrictions on passing conformant arrays to procedures with
   conformant array parameters.
   In p5c, the hard work is passed on to gcc, so conformant arrays can
   always be passed to procedures with conformant array parameters with
   no restrictions.

 - calculating exponentials of the form exp( ln(x) * y )
   this and the similar exp( y * ln(x) ) are a way to evaluate x ** y
   (sometimes expressed x^y). p5c and p5x recognise these expressions as a
   special case and optimise them to the (almost) equivalent but far more
   accurate c function pow(x,y).
   Even if ln(x) has the smallest possible theoretical error, when
   used as an argument to the exp function, this error becomes amplified.
   Using the pow() function avoids this, so is more accurate.

 - write(pointer)
   write the value of a pointer to a file.
   The output format is system dependent - in fact it is determined by gcc.

   for example:
      new(p);
      writeln( 'pointer is', p:12 );

   this might result in something like:
       pointer is   0x15b6010

 - ord(pointer)
   create an integer value from a pointer.
   this feature is inherited from the original p4 & p5 compilers.
   Its results are not defined or predictable, ie it's allowed, but sometimes
   it might work, in other cases it might not.  If you're thinking of using
   this so you can write out the value of a pointer, p5x allows you to write
   the value of pointers directly and correctly.  Consider doing that instead.

 - external directive
   p5x can access external functions and procedures with the external
   directive (just like nearly all other pascal compilers).
   For example, suppose you have a C source file with the function myProc
   defined as follows:

     cfile.c:
     void myProc( int a )
     {
       ....
     }

   Your pascal source could then call myProc() via a declaration like this:

     procedure myProc( arg1 : integer ); external; { case of myProc matters }

   Notes:
     - myProc must be a C function, or appear like a C function.
       This means that external functions can't use names that clash with C
       reserved keywords, eg

          procedure switch; external;   {won't work, switch is reserved by C }

       A function called Switch will work, because it uses an upper case
       letter.
       Additionally, p5x appends _n (where n is 1,2,3, etc) to your pascal
       variable names when it generates c code; this guarantees that your
       pascal names never clash with c reserved words.  A side effect is that
       an external function called a_1 in code like this will generate an error:

       var a : integer;            {translated to a_1 in gcc}
       procedure a_1; external;    {name clash}

     - don't forget to link your C object file with your pascal object file, eg
             p5x myprog.pas myprog.c > myprog.lst
             cc -I. myprog.c cfile.c -o myprog -lm

     - The names of the arguments (arg1 in the above example) are arbitrary.
     - the case of the function or procedure name in the external declaration
       must exactly match the case of the external C function.
       So myProc() in the above example will not link to a function called
       MyProc() or myproc(), etc.
     - Pascal code that calls myproc() does not need to match the case of the
       declaration (this is what you would expect).
       So, you can write code like
          myproc(13);   { OK, not part of the declaration }
          MYPROC(23);
          etc

       In other words, the case in the external declaration matters, but the
       case doesn't matter when calling the function (or procedure).
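The mod rule for a negative right operand, the first item in the list above,
can be mimicked in plain iso pascal, where mod is only defined for a positive
right operand.  The sketch below assumes that rule; floorMod is a hypothetical
name (this behaviour is often called a floored modulo), and it reproduces the
example (-5) mod (-3) = -2.

```pascal
program negmod(output);
{ hypothetical example: the p5x result of a mod b, using only iso mod }

function floorMod(a, b: integer): integer;
begin
   if b > 0 then
      floorMod := a mod b             { ordinary iso mod }
   else
      floorMod := -((-a) mod (-b))    { b < 0; b = 0 is still an error }
end;

begin
   writeln(floorMod(-5, -3):1);   { -2 }
   writeln(floorMod(5, -3):1);    { -1 }
   writeln(floorMod(-5, 3):1)     { 1 }
end.
```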


embedding C code in generated code stream
=========================================

p5c and p5x enable your own C code to be embedded into the emitted
code stream that is generated by p5x (or p5c).  Simply add your own code
inside special comments like this:

      {@@ your ... code ... here @@}

The sample program dirDemo.pas uses this feature to access system level
functions to get a directory listing.

For another example, to make a function inline, do this:

      procedure myProc(arg1 : type1)  {@@ inline @@};

'inline' is a special gcc attribute that is applied to c functions to make
them inlined, and the {@@ inline @@} comment above simply adds the inline
keyword to the emitted c code.
Consult your gcc documentation for other attributes that you might like to use.
There are examples of using this feature in the tp5c.pas test file and in the
clib.inc.pas c library file.

Notes:

- you can use (*@@ ... @@*) comments instead of {@@ ... @@} comments

- there is no space allowed between the comment and @@ markers, ie '{ @@' will
  be treated as a normal comment.

- any end of comment (ie '}' or '*)' ) between the {@@ and @@} is assumed to
  be part of your c code and does not terminate the comment.

- the code between the {@@ and @@} markers is emitted as soon as it is
  parsed by the compiler.  The c code emitted by the compiler that corresponds
  to your pascal code might be built up, stored internally and emitted
  later.  Check the final c code is what you expect it to be.  In the above
  example, p5x parses the procedure declaration before it sees the 'inline'
  attribute.  On the other hand, p5x doesn't emit the c code for the procedure
  declaration until after the 'inline' attribute.

- if you use the c preprocessor to preprocess your pascal code, it will strip
  out c style comments before the code is compiled.  The comments won't make
  it into the final c code.  Also #if, #define type preprocessor lines could
  be processed when the pascal code is compiled.



p5x Extensions
==============


Underscores are allowed in identifiers
--------------------------------------
  eg
   function my_function( x: real ): real;

It is an error if an identifier begins with an underscore because all identifiers
starting with an underscore are reserved for system variables.
So this, for example, is not allowed:
        _bad : integer;   { error, name starts with '_' }

Support for underscores is very common, so code that uses this feature is likely
to work on other pascal compilers.

Note, however, that some pascal compilers do not allow an identifier to end
with an underscore, or do not allow adjacent underscores.
p5x forbids only an initial underscore.


dollar char is allowed in identifiers
-------------------------------------
eg
   my$var: real;

This feature is not common, so identifiers with dollar characters may fail
on other pascal compilers.

Note that identifiers must not begin with a dollar because hexadecimal numbers
start with a dollar.
So this, for example, is not allowed:
        $bad : integer;   { error, this is a hex number }


// style comments
-----------------
Comments starting with // are available; the comment ends at the end of the line,
 eg
    var
// this is a comment
      x : real;
      i : integer; // this is also a comment
      c : char;

This feature is available on many other pascal compilers, but not all.
Delphi, free pascal and gnu pascal all accept this style of comments.
Code containing these comments might work on other compilers.


mixed declaration order
-----------------------
The order of declarations is relaxed, and there can be multiple sections of
declarations, eg

  program x(output);

  const
    c1 = 1.0e-5;

  type colour = (red, orange, yellow, green, blue, indigo, violet);
  const firstColour = red;
        lastColour = violet;

  const
     msgLen = 10;
  type
     messageString = packed array[1..msgLen] of char;
  var
     messages : array[1..6] of messageString;

  var
    v1 : real;
  const
    tol = 1.0e-5;
  function solve( var x: real; function f(x:real):real ): boolean;
  ...

  begin{x}
  ...
  end. {x}

This allows related declarations to be grouped together.
Everything must be defined before it is used - as expected.
Also, all pointers must be resolved before the end of any type declaration,
as shown in this example:

type
   prec1 = ^rec1;
   rec1 = record   { this is OK - rec1 is resolved here }
             s    : packed array[1..8] of char;
             link : ^rec2;
          end;
{ !!! ERROR - rec2 is not resolved before the end of the type declaration }

var
   pr : prec1;

type
   rec2 = record  { rec2 is resolved here, but too late }
             s2    : packed array[1..8] of char;
             link2 : prec1;
          end;

Mixed declaration order is very common amongst pascal compilers, so using this
feature is unlikely to create portability problems.


constant expressions
--------------------
Constant expressions for ordinal types can be used wherever a constant is
expected in the source code. Ordinal types are integers, booleans, chars and
enumerated types, but not reals or strings or pointers.

Constant expressions can include any of these operators or functions:
   +, -, or
   *, mod, div, and
   abs(), sqr()
   ord(), odd(), chr(),
   pred(), succ()
   not, ()

This is useful when declaring types, constants and arrays, eg

const
   first = 0;
   last = 20;
   size = last-first+1;

If you need the maximum of 2 integer constants, a & b, try this:

   const
     a = something;
     b = otherthing;
     maxab = (a+b + abs(a-b)) div 2;

and the minimum is

   minab = (a+b - abs(a-b)) div 2;
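A sketch pulling these together, assuming p5x constant expressions: the
constants feed both an array index range and a subrange type.  The names
first, last, table and slot are made up for the example.

```pascal
program constexpr(output);
{ hypothetical example of p5x constant expressions }
const
   first = -3;
   last  = 12;
   size  = last - first + 1;                          { 16 }
   maxab = (first + last + abs(first - last)) div 2;  { 12, the larger }
   minab = (first + last - abs(first - last)) div 2;  { -3, the smaller }
type
   table = array[first..last] of integer;
   slot  = minab..maxab;
var
   t : table;
   i : slot;
begin
   for i := minab to maxab do
      t[i] := i;
   writeln(size:1, ' ', maxab:1, ' ', minab:1)       { 16 12 -3 }
end.
```

The max/min formulas always divide an even number by 2, because a+b and
a-b have the same parity, so div 2 is exact.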


Case statements expect constants for the case values, so expressions can be
used here as well, eg

   case e of
     base .. base+offset:  do_something;
     base+offset+1 .. top: do_anotherthing;
     otherwise warning(bad_value);
   end; {case}

A compile time error is issued when the constant expression overflows or
underflows, eg

  const rubbish = succ(maxint);  {!!! ERROR, this overflows}

The ability to use constant expressions is very common amongst pascal
compilers, but different compilers allow different operators and functions in
the expressions.


command line parameters
-----------------------
The standard procedure

        argv( i, s )

copies the ith command line parameter to the string variable s.

The command line parameter is padded with blanks or truncated to fit the
string variable s if necessary.

The standard function argc returns the number of command line arguments
available.

parameter 0 is the name of the program, so the maximum value of the parameter
number in argv is argc-1.
Any attempt to access parameters outside the range 0..argc-1 is an error.

For example a small segment of code to write out the command line parameters
of a program might look like this:

   var str : packed array[1..20] of char;
       i   : integer;
   ...
   writeln( 'there are ', argc:1, ' command line parameters' );
   argv( 0, str );
   writeln( 'program name is ', str );
   if argc > 1 then begin
     writeln( 'the command line parameters are:' );
     for i := 1 to argc-1 do begin
       argv(i, str);
       writeln( 'argv[', i:1, '] is ''', str, '''' );
     end; {for}
   end; {if}


case statement
--------------
Case statements accept a range for case values, as in the example:

   case e of
     0:        writeln('empty');
     1..12:    writeln('less than a dozen');
   end;

The keyword 'otherwise' is allowed in a case statement to handle case values
not handled by the case constants, eg

   myChar := 'q';
   case myChar of
     'a','e','i','o','u': writeln('myChar is a vowel');
     '0'..'9'           : writeln('myChar is a digit');
     otherwise
       otherLetter := true;
       writeln( 'myChar is not a vowel or a digit' );
   end;

Any number of statements may appear between the otherwise and end reserved
words, there is no need for a separate begin..end.

If the case selector (myChar in the above example) does not correspond to
one of the case constants, and there is no otherwise part of the case
statement then a runtime error will occur.  Note that one purpose of runtime
errors is to highlight programming errors, so when the case statement has an
otherwise part that does very little, these programming errors may remain
unnoticed.  (In case you're wondering - another purpose of runtime errors is
to prevent the program behaving unpredictably and damaging its environment.)

Case statements with ranges and an otherwise part are very widely supported
by pascal compilers, so using these features is likely to be portable.


succ() & pred() take an optional second argument
------------------------------------------------
If succ(e) returns the value that follows e, succ(e,n) returns the nth value after e.
Similarly, pred(e,n) returns the nth value before e.

eg
   type
      shape = (triangle, rectangle, circle);

   var
      colour : (white, red, orange, yellow, green, blue);
   ...

   succ(triangle, 0) { yields triangle}
   succ(yellow)      { yields green}
   succ(yellow, 2)   { yields blue}
   pred(red, -1)     { yields orange}
   pred(triangle, 0) { yields triangle}
   pred(green)       { yields yellow}
   pred(blue, 2)     { yields yellow}

  These are as defined for iso 10206 extended pascal.

  Notes:
  -  the first argument must be an ordinal value (just like the old pred/succ)
  -  the second argument must be an integer.  It may be negative or zero, and
         pred( e, -n ) = succ( e, n )
  -  the result may not exist (eg succ(green,2) for the colours above).  If run time
     checking is enabled, then a fatal error is produced.
  -  these variants of succ and pred can be used as an inverse ord function.
     In the above example, ord(orange) is 2.  To go the other way, use succ
     (or pred) like this:

         succ(white, 2) { colour that corresponds to the integer 2 }

     If you can't guarantee that white always will be the first colour, use this:

         succ(white, 2-ord(white))
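The sketch below wraps the inverse ord trick in a function, and for
comparison shows an iso 7185 version that steps forward with the one argument
succ, for compilers without the two argument form.  toColour and toColourISO
are hypothetical names; both assume n lies in 0..5.

```pascal
program inverseord(output);
{ hypothetical example: integer to enumerated value }
type
   colour = (white, red, orange, yellow, green, blue);
var
   n : integer;

{ p5x: uses the two argument succ from iso 10206 }
function toColour(n: integer): colour;
begin
   toColour := succ(white, n - ord(white))
end;

{ iso 7185: step forward one value at a time }
function toColourISO(n: integer): colour;
var
   c : colour;
   i : integer;
begin
   c := white;
   for i := 1 to n - ord(white) do
      c := succ(c);
   toColourISO := c
end;

begin
   for n := 0 to 5 do
      if toColour(n) <> toColourISO(n) then
         writeln('mismatch at ', n:1);
   writeln(ord(toColour(2)):1)    { 2, ie orange }
end.
```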


predefined constants
--------------------
in addition to the standard predefined constant maxint, the following
are available:

  maxreal   the largest real number
  minreal   the smallest positive real number (that has full precision)
  epsreal   (1.0 + epsreal) is the smallest real number greater than 1.0
  maxchar   the largest value of type char

  these are as defined for iso 10206 extended pascal.

  Note that many systems allow numbers smaller than minreal, but with a loss
  of precision.  Eg, for 64-bit ieee systems, minreal is 2.225074e-308, and
  a smaller number such as 0.005074e-308 is still valid, but has fewer
  significant digits.
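One way to see what epsreal means is to compute it: start at 1.0 and keep
halving while adding half the candidate to 1.0 still gives a result bigger
than 1.0.  This is a sketch that assumes real arithmetic is done in plain ieee
double precision, as gcc normally does on x86-64; code that keeps extra
precision in registers (eg older 32-bit x87 code) may end up with a smaller
value.

```pascal
program findeps(output);
{ hypothetical example: compute the gap between 1.0 and the next real,
  assuming intermediate results are rounded to real (double) precision }
var
   e : real;
begin
   e := 1.0;
   { invariant: 1.0 + e > 1.0 }
   while 1.0 + e / 2.0 > 1.0 do
      e := e / 2.0;
   writeln('computed: ', e);
   writeln('epsreal:  ', epsreal);
   if e = epsreal then
      writeln('they agree')
   else
      writeln('they differ')
end.
```

On 64-bit ieee systems both lines are expected to show roughly 2.2e-16.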


file Assign()
-------------
The assign(f,s) procedure connects a named file to a file variable, where f is
a file variable and s is a pascal string variable or constant.
After calling assign, when calling either reset(f) or rewrite(f) the external
file is opened.

If assign() is not used, reset(f) and rewrite(f) open temporary files that
are not accessible to the world outside your pascal program.

For example to read data from the text file 'mydata':

            var f: text;
            begin
              assign( f, 'mydata' );
              reset(f);
              ....

Any trailing blanks are stripped from the name string.  This means that
trailing blanks in file names obtained from the command line using argv() do
not need to be processed.

If the string is empty (or all blanks), the file variable is not assigned
to any file, and reset or rewrite will open a temporary file (just like
standard pascal does).

Many pascal compilers offer this procedure, so code that uses it is very likely
to be portable to other pascal compilers.
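A minimal sketch that combines argv with assign, printing the text file named
by the first command line argument.  The program name showfile and the 80
character name buffer are assumptions for this example; with the r script it
might be run as ./r showfile mydata

```pascal
program showfile(output);
{ hypothetical example: print the file named on the command line }
var
   f    : text;                       { not in the header, so not bound }
   name : packed array[1..80] of char;
   ch   : char;
begin
   if argc < 2 then
      writeln('usage: showfile filename')
   else begin
      argv(1, name);       { padded with blanks to 80 characters }
      assign(f, name);     { assign strips the trailing blanks }
      reset(f);            { opens the named file, not a temporary one }
      while not eof(f) do begin
         while not eoln(f) do begin
            read(f, ch);
            write(ch)
         end;
         readln(f);
         writeln
      end
   end
end.
```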


file extend
-----------
It is possible to append new data to an existing file with the extend
procedure, eg

           var f: text;
           ...
           rewrite(f);
           writeln(f, 'first line');
           reset(f);
           ...
           extend(f);
           writeln(f, 'another line');

It is an error to call extend with a file that does not exist.

This feature is defined in iso 10206 extended pascal.  Many other versions of
pascal have a similar procedure, although sometimes it is called append().


the exponent operator
---------------------
This raises a real number to a real or integer power.
eg, for x raised to the power of 2.3,
    r := x**2.3;

The left factor (ie x in this example) is real, or is converted to real, and the
result is always real.

If the power (ie the factor on the right) is an integer, more efficient code
is produced, but the result may be less accurate for very large powers (say
100 million or so).

A negative number raised to a non integer power produces a fatal error unless
debugging is disabled - in which case the results are not defined.

By the rules of pascal, -2**2 is interpreted as -(2**2), ie the ** is evaluated
before the negation.  If you want (-2)**2, use parentheses.

Two or more exponents in a row must have parentheses, eg
   r := (x**2)**3; { OK, this uses parentheses }
   r := x**(2**3); { OK, this uses parentheses }
   r := x**2**3;   { error }

Note that exponents were left out of the original pascal because they
encouraged lazy programming.  For example it is easy to evaluate a polynomial
like this:

    x := 12*x**3 + 8*x**2 + 4*x + 1;

but it can be coded much more efficiently as

    x := (((12*x) + 8)*x + 4)*x + 1;

Many pascal compilers provide a similar ** operator, but vary slightly in the
way they handle integer powers.
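Horner's rule, used for the polynomial above, generalises to any number of
coefficients.  This sketch assumes only iso conformant arrays and stores the
coefficients lowest power first; poly and coef are hypothetical names.

```pascal
program horner(output);
{ hypothetical example: polynomial evaluation without exponents }
var
   coef : array[0..3] of real;

{ c[lo] + c[lo+1]*x + ... + c[hi]*x**(hi-lo), by horner's rule:
  one multiply and one add per coefficient }
function poly(var c: array[lo..hi: integer] of real; x: real): real;
var
   i : integer;
   s : real;
begin
   s := 0.0;
   for i := hi downto lo do
      s := s * x + c[i];
   poly := s
end;

begin
   { 12*x**3 + 8*x**2 + 4*x + 1, lowest power first }
   coef[0] := 1; coef[1] := 4; coef[2] := 8; coef[3] := 12;
   writeln(poly(coef, 2.0):1:1)    { 96 + 32 + 8 + 1 = 137.0 }
end.
```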


relaxed constant string assignment & compare
--------------------------------------------
a constant string does not need to be as long as the string it is assigned to
or compared with.
The string is padded with as many null characters as necessary.

for example,

    type str = packed array[1..9] of char;
    var
        myStr : str;
    begin
      myStr := '123456789';  {standard pascal}
      myStr := '123';        // p5x extension
      if myStr > '122' then
         ...

Similarly, a parameter may be assigned a shorter constant string:

    procedure myProc(s:str);
    ...
    myProc('abc');  // works for parameters as well


Notes:
- many pascal versions provide similar extensions, but may pad with spaces
- it is an error to attempt assignment of longer strings
- only constant strings may be shorter, you cannot have, say,

    myShortStr := myLongString;  // assign string variables
- the string library functions in string.inc.pas and the code in
  tstring.pas show how strings can be used flexibly and efficiently in p5x.



external variables
------------------
the external directive can be used with variables in much the same way as
it can be used with functions and procedures.

eg the following accesses a real variable called myvar in an
   external c source file:

  var myvar: real; external;         // case must match
  ...
  writeln( 'myvar is ', MYVAR:1:2 ); // pascal rules - any case is OK

Notes:
 - the case of myvar in the declaration must exactly match the case of myvar
   in the external file.
 - when accessing myvar in other parts of the code, normal pascal rules apply
   and the case is not important (as shown in the above example).
 - the external c file must be linked with your pascal program.  For example
   if you had a c file called ext.c with external definitions in it, you could
   link it with your pascal program with something like this:

          p5x myprog.pas myprog.c > myprog.lst
          cc -I . -o myprog myprog.c ext.c -lm


function and procedure inlining
-------------------------------
the inline directive allows functions and procedures to be inlined, for
example

   function getChar(pos:integer): char; inline;
   begin
      getChar := buffer[pos];
   end;

So now all calls to getChar are (probably) replaced by the code inside
getChar, and we can expect more efficient code - not least by saving the
overhead of the function calls.

Notes:
 - forward declared functions can be inlined, but only after they have been
   defined.  The inline directive must appear where the function is defined,
   like this:

        procedure forwardProc(var x: real); forward; // declared forward here
        ...
        procedure someProc;
        begin
           forwardProc(r);           // cannot be inlined here
        end;
        ...
        procedure forwardProc; inline;               // declared inline here

 - any problems calling inlined procedures recursively are dealt with by gcc

 - the inline directive does not force inlining, it just provides a hint to the compiler.
   gcc inlines simple functions anyway, whether or not they are declared with
   the inline directive.
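
In c terms the directive plays much the same role as c's own inline keyword,
which is also only a hint.  The sketch below is a c function in the spirit of
the getChar example; it is an illustration, not necessarily the code p5x
emits, and buffer is a hypothetical global.

```c
/* hypothetical global buffer, standing in for the pascal one */
static char buffer[] = "pascal";

/* "inline" asks gcc to substitute the body at each call site,
   but gcc is free to ignore the request */
static inline char getChar(int pos)
{
    return buffer[pos];   /* c arrays start at index 0 */
}
```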


hexadecimal numbers
-------------------
Hexadecimal numbers start with a $, eg

    const mask = $000f;

Hexadecimal constants can be used anywhere a constant can.
This is a very common feature, so code that uses hex numbers like this is
likely to work on other pascal compilers.


bit operations
--------------
  the normal bit operations are provided by predeclared functions:
    bitand(a,b)   -- return bitwise and of a and b
    bitor(a,b)    -- return bitwise or of a and b
    bitxor(a,b)   -- return bitwise xor of a and b
    bitnot(a)     -- return bitwise complement of a
    rshift(a,b)   -- return bitwise right signed shift of a by b places. The
                     newly vacant bits on the left are filled with the sign
                     bit, ie ones for negative numbers, zeroes for positive
                     numbers or zero.
    rshiftu(a,b)  -- return bitwise right unsigned shift of a by b places. The
                     newly vacant bits on the left are always filled with
                     zeroes.

 A right shift by a negative amount is interpreted as a left shift, and the
 newly vacant bits on the right are filled with zeroes.

  as an example, for
           a := $ff00; b := $f0f0;
  then
       bitand(a,b) returns $f000
       bitor(a,b)  returns $fff0
       bitxor(a,b) returns $0ff0
       bitnot(a)   returns $ffff00ff  (assuming 32 bit integers)

       rshift(a,4)   returns $0ff0
       rshift(a,-4)  returns $ff000
       rshiftu(a,4)  returns $0ff0
       rshiftu(a,-4) returns $ff000

  So, if a is a positive number, then rshift & rshiftu produce the same
  result. Also, they produce the same result for a left shift, ie when b
  is negative.

  On the other hand, if a is negative, and b is positive the most
  significant bits of the result will be 0 for rshiftu and 1 for rshift.
  eg, assuming 32 bit integers
      rshift( $ffffffff,4 )  returns $ffffffff
      rshiftu( $ffffffff,4 ) returns $0fffffff
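
The shift rules above can be modelled in c, which may help when checking
edge cases.  This is a sketch assuming 32 bit integers and gcc's behaviour of
filling with the sign bit when a negative signed value is shifted right (c
itself leaves this implementation defined).  model_rshift and model_rshiftu
are hypothetical names, not the p5x run time routines.

```c
#include <stdint.h>

/* model of rshift: right shift filling with the sign bit;
   a negative count is a left shift filling with zeroes.
   Counts are assumed to be in -31..31. */
static int32_t model_rshift(int32_t a, int b)
{
    if (b < 0)
        return (int32_t)((uint32_t)a << -b);  /* shift as unsigned to avoid signed overflow */
    return a >> b;                            /* gcc: arithmetic shift for negative a */
}

/* model of rshiftu: right shift always filling with zeroes */
static int32_t model_rshiftu(int32_t a, int b)
{
    if (b < 0)
        return (int32_t)((uint32_t)a << -b);
    return (int32_t)((uint32_t)a >> b);       /* unsigned shift fills with zeroes */
}
```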

  Most pascal compilers that allow bitwise arithmetic on integers allow
  the 'and' and 'or' to operate on integers, and may also introduce new
  operators such as xor and shl.  This can create precedence problems, which
  do not occur when the bit operations are implemented as functions.
  eg what does this mean:
     a or b xor c
  is it
     (a or b) xor c
  or maybe
     a or (b xor c) ?

  To create portable programs, you may need to write your own bitxxx functions
  that use the corresponding bit operators.
  eg
     { my bitand function }
     function bitand(a,b: integer): integer;
     begin
        bitand := a and b;
     end; {bitand}

  If you're thinking of using the bit operators to extract bit fields from an
  integer, you might like to consider using a packed record instead.
  eg, consider assigning digits of a bcd encoded integer:

  var bcdValue: integer;
  ...
  tensDigit:= 3;
  bcdValue := bitor( bitand(bitnot($000000f0), bcdValue), {clear tens digit}
                     rshift(tensDigit, -4) );             {assign new value}

  or, alternatively:

  var bcdValue: packed record
                   units    : 0..9;  { bits 0..3, ie ls 4 bits }
                   tens     : 0..9;  { bits 4..7 }
                   hundreds : 0..9;  { bits 8..11 }
                   thousands: 0..9;  { bits 12..15 }
                end;
  ...
  tensDigit:= 3;
  bcdValue.tens := tensDigit;
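
For comparison, here is the mask based version written in c.  It is a sketch
only: setTensDigit is a hypothetical helper, and it assumes the tens digit
occupies bits 4..7 as in the comments above.

```c
/* replace the tens digit of a bcd value, as the
   bitor/bitand/bitnot/rshift expression above does */
static unsigned setTensDigit(unsigned bcd, unsigned digit)
{
    return (bcd & ~0x000000f0u)   /* clear tens digit */
         | (digit << 4);          /* assign new value, ie rshift(digit, -4) */
}
```

The packed record avoids writing the masks and shifts by hand; the compiler
works them out from the field declarations.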


GetTimeStamp()
--------------
There is a predefined type called timeStamp that is defined as follows:

type timeStamp = packed record
                    year: integer;
                    dateValid,
                    timeValid: boolean;
                    month: 1..12;
                    day:  1..31;
                    hour: 0..23;
                    minute: 0..59;
                    second: 0..60;  {leap second!}
                    day_of_week: 1..7;
                    isdst: boolean;
                    dstValid: boolean;
                 end;

There is also a predefined procedure called getTimeStamp that finds the
current date and time.  eg

        var t: timeStamp;

        begin
            getTimeStamp( t );
            writeln( 'the year is ', t.year );
            ....

This is almost identical to the timeStamp type described in the extended
pascal standard (iso 10206).  Differences include:
- dateValid and timeValid are both true or both false (in fact they are
  two names for the same flag)
- if time and date are invalid, the other fields are undefined
- the seconds field accounts for leap seconds
- day_of_week is an extra field, 1 = Sunday, ... 7 = Saturday
- dstValid is true if the daylight saving information is known, otherwise
  it is false.
- isdst is a daylight saving flag, true if current time is daylight saving
  time.  Otherwise it is false.
  If dstValid is false, isdst will also be false.
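
To show how these fields relate to the standard c library, here is a sketch
that fills a c mirror of the record from time() and localtime().  TimeStamp,
fillTimeStamp and getTimeStampSketch are hypothetical names, and this is not
necessarily how p5x implements getTimeStamp.

```c
#include <stdbool.h>
#include <time.h>

/* hypothetical c mirror of the p5x timeStamp record */
typedef struct {
    int  year;
    bool dateValid, timeValid;
    int  month, day, hour, minute, second;
    int  day_of_week;                    /* 1 = Sunday ... 7 = Saturday */
    bool isdst, dstValid;
} TimeStamp;

static void fillTimeStamp(const struct tm *tm, TimeStamp *t)
{
    t->year        = tm->tm_year + 1900; /* tm_year counts from 1900 */
    t->month       = tm->tm_mon + 1;     /* tm_mon is 0..11 */
    t->day         = tm->tm_mday;
    t->hour        = tm->tm_hour;
    t->minute      = tm->tm_min;
    t->second      = tm->tm_sec;         /* 0..60, allowing a leap second */
    t->day_of_week = tm->tm_wday + 1;    /* tm_wday is 0 = Sunday */
    t->dstValid    = tm->tm_isdst >= 0;  /* negative means unknown */
    t->isdst       = tm->tm_isdst > 0;   /* false whenever unknown */
    t->dateValid   = t->timeValid = true;
}

static void getTimeStampSketch(TimeStamp *t)
{
    time_t now = time(NULL);
    struct tm *tm = (now == (time_t)-1) ? NULL : localtime(&now);

    if (tm == NULL) {                    /* other fields left undefined */
        t->dateValid = t->timeValid = false;
        t->dstValid = t->isdst = false;
        return;
    }
    fillTimeStamp(tm, t);
}
```

Note how day_of_week is tm_wday + 1, since c counts Sunday as 0, and how an
unknown (negative) tm_isdst gives both dstValid and isdst false.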



halt
----
This procedure does what it says - it halts the program with exit code 0.
Optionally, it may take an integer argument for a specified error code.

    halt;    // exit code 0
    halt(99); // exit code 99

Use it when something has gone wrong, and there is no hope of recovery,
for example:

   if eof(myFile) then begin
      writeln('unexpected end of input, quitting');
      halt;   // what else can we do?
   end;


The c preprocessor
==================

Although the c preprocessor is not part of pascal, it is possible to use it
with p5c.  This allows features such as

 - include files
 - macro definitions
 - conditional compilation

It is important to remember that the c preprocessor processes your file before
it is compiled.  It simply manipulates the text in your file without knowing
anything about the program code.

An include file will typically contain code that you need to share across many
separate programs - a collection of constants, perhaps, or some special
functions that you need to use over and over.

Say you have a library that can print data in some special format and you need
to use it in many programs.

Put your code in a separate file, say myLib.inc.pas, and include it in each of
your programs with a line like this

# include "myLib.inc.pas"

Note:
- the '#' char must be in the left margin.
  This is true for all c preprocessor directives.
- the file myLib.inc.pas must be in the current directory.
  It is possible to have it in another directory if you include the file name
  in <angle brackets> instead of "quotes", and if you also add a -I option to
  your build command line.  Consult your gcc documentation for specific details.


Macro definitions are useful for small repeated sections of text, with or
without parameters.

The file cdefs.inc.pas contains macro definitions for your system's real
numbers, eg the number of significant digits a real number has is defined
like this:

#define REAL_DIGITS 15

As an example, here's a small program that uses this file to test your
system's real numbers:

    program testReal(output);

    #include "cdefs.inc.pas"

    begin
       {field width is real digits + sign + '.' + exp width}
       writeln('the biggest real is ', REAL_MAX:REAL_DIGITS+7);
    end.

Notes:
- the c preprocessor definitions, eg REAL_DIGITS, are known to the c preprocessor
  only, they are not pascal variables.  Unlike in pascal, the case matters,
  so Real_Digits cannot be used instead of REAL_DIGITS, and the '_' is
  allowed.
- the file cdefs.inc.pas is auto generated at the same time as p5x to match
  whatever values your system provides.

Here's an example of a macro with parameters:

   program testMac(output);
   #define MAX(a,b)  (((a) + (b) + abs((a) - (b))) div 2)
   begin
      writeln(' max(3,5) is ', MAX(3,5) );
   end.

The macro uses div because pascal's / always gives a real result, so as
written it is intended for integer arguments.

Similarly, we could have written a MIN macro by replacing the addition with a
subtraction, as follows:

   #define MIN(a,b)  (((a) + (b) - abs((a) - (b))) div 2)

So this can be a very useful feature, but watch out.  Consider this macro to
generate the cube of a number:

#define CUBE(z)  z*z*z

Now let's suppose we want this:

   n := 2;
   writeln( '(2n+1) cubed is ', CUBE(2*n+1) );

We expect the answer 5 cubed = 125, but get 13.
Why?  Looking at the list file created by p5c, we see that the c preprocessor
has generated the statement

   writeln( '(2n+1) cubed is ', 2*n+1*2*n+1*2*n+1 );

and since multiply has a higher precedence than +, it is evaluated like this:

   writeln( '(2n+1) cubed is ', (2*n) + (1*2*n) + (1*2*n) + 1 );

So we should always use parentheses in macros, around each parameter and
around the whole macro body.  Our revised CUBE macro is now:

#define CUBE(z)  ((z)*(z)*(z))
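
The c preprocessor does exactly the same text substitution in a c program, so
the pitfall is easy to reproduce there.  In this sketch BAD_CUBE is the
original macro renamed, and badCube and goodCube are hypothetical helpers;
with n = 2 the unparenthesised version gives 13, while the parenthesised one
gives 5 cubed = 125.

```c
/* unparenthesised macro: the argument text is pasted in as-is */
#define BAD_CUBE(z)  z*z*z
/* parenthesised macro: each use of the argument is evaluated as a unit */
#define CUBE(z)      ((z)*(z)*(z))

static int badCube(int n)  { return BAD_CUBE(2*n+1); }  /* 2*n+1*2*n+1*2*n+1 */
static int goodCube(int n) { return CUBE(2*n+1); }      /* ((2*n+1)*...) */
```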

There is yet another pitfall to be wary of.  Consider a function f that has
side effects - say it prompts the user to enter a number.  Let's now suppose
we want to print the cube of this number.  We might be tempted to write:

   function f: integer;
   var i:integer;
   begin
       write('please enter a number ');
       read(i);
       f := i;
   end;
       ...
       write( CUBE(f) );  {get a number from user, cube it}

What happens? The user is prompted 3 times for the number!  With macros, it is
not always so obvious when something is not behaving as we expect.

We can write instead:

    num := f;             { get a number }
    write( CUBE(num) );   { cube it }
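
The repeated evaluation can be demonstrated, and counted, in c.  Here f is a
hypothetical stand-in that returns 2 instead of prompting the user, and calls
counts how often it runs: CUBE(f()) calls it three times, while assigning to
a variable first calls it once.

```c
#define CUBE(z)  ((z)*(z)*(z))

static int calls = 0;    /* counts how often f is evaluated */

/* stands in for the pascal f, which would prompt the user */
static int f(void)
{
    calls++;
    return 2;
}
```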


Here's an example of conditional compilation.  It includes or excludes some
part of the program according to whether a particular feature has been enabled.

#define HAS_FEATURE

program p(...)
 {... code ...}


#ifdef HAS_FEATURE
 { code to implement feature }
#else
 writeln( 'feature not implemented' );
#endif
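
The same mechanism in a small c sketch (featureCompiled is a hypothetical
name).  #ifdef only asks whether the symbol is defined, so it works both for
an empty definition, as here, and for the value 1 that a -D option on the
command line gives.

```c
/* compiling with -DHAS_FEATURE would have the same effect as this line */
#define HAS_FEATURE

/* returns 1 if the feature code was compiled in, 0 otherwise */
static int featureCompiled(void)
{
#ifdef HAS_FEATURE
    return 1;   /* code to implement feature */
#else
    return 0;   /* feature not implemented */
#endif
}
```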

There's a small example in the test file tp5c.pas which optionally
compiles code for interactive testing of standard input when TEST_STDIN is
defined as a preprocessor symbol, eg by running it with the command:

        ./r -DTEST_STDIN tp5c

Normally, the code is omitted so that it can run unattended.


The c preprocessor provides other features:

cpp strips out // style comments and /* ... */ comments, so with the c
preprocessor, you can use comments like this in pascal:

         writeln( 'hello world' );   // comment til end of line

or this
        /** provide answer to life, the universe and everything */
        function answer : integer;
        begin
           answer := 42;
        end;


Also, there is a line number macro called __LINE__ that you can use like this:

          if somethingBad then
             writeln( 'problem found at line ', __LINE__ );


Note that you can't use all the features of cpp, for example __DATE__
expands to a c string in "double quotes", which is illegal pascal code.

See the cpp documentation for gcc or just about any other c language
description for more details.


If you use the r build script, you can force the use of the preprocessor by
starting one of the first 10 lines of your pascal program with a '#' char.



Debug Checks
============

As mentioned earlier, with the debug option turned on the p5x compiler
generates extra code to self check your program as it runs.
This will locate most range errors, memory errors, etc without you needing
to figure it out later.

Except for memory debugging, the options can be turned on or off as required.
For example to check a particular fragment of code:

    {$d+}
    { generate an error if i is outside the array bounds }
    if a[i] > 20 then writeln( 'a[i] is big' );
    {$d-}

    { no checks here }

Memory debugging needs to check the whole program, not small fragments of
code.  So it is global, ie it is turned on for the whole program, or it is
off for the whole program.
To turn it off, put a {$d-} comment before the program heading.
If there is a {$d+} or nothing before the program heading then memory
debugging is enabled. eg in this example, memory debug is enabled, but all
other debugging is disabled:

{$d+}
program myprog(input, output);
{$d-}
  ...

When memory debug is enabled, the p5x compiler gathers information about
memory as it is allocated and then checks it as pointers are used.
Finally, at the end of the program, it reports any memory that has not been
disposed (ie memory leaks).


These errors issue a warning:
-----------------------------

- not disposing memory,
Memory allocated with new but then not disposed causes a memory leak.
This becomes important only when your program is likely to run out of memory.


Other errors are programming errors so are fatal:
-------------------------------------------------

- accessing an invalid pointer
  The pointer is not pointing to memory that has been allocated with new().
  This can be caused by a pointer being used before it is initialised
  with new, or being corrupted, say as part of a variant record.

- accessing the nil pointer
  pointer could be nil because no value was ever assigned to it.

- attempting to dispose memory twice

- attempting to use disposed memory
  eg
     dispose(p);
     dispose(p^.next); {pointer p has been disposed, so cannot be used again}

- using a file that is not yet defined
  eg
      program test(output);
      var f:text;
      begin
        reset(f);  { f is not defined since nothing has been written to it }

  Use rewrite(f) to create a file before reset(f);

- writing to a file opened for reading

- reading from a file opened for writing

- file errors when reading char, integer or real
  this can be caused when the program is trying to read data that is in the
  wrong format, or trying to read past the end of the file, so there is
  not enough data.  A hardware error could also cause this problem.
  For an example of data in the wrong format consider reading an integer
  from a text file containing "abc".  This causes a fatal error since an
  integer cannot be read.

- accessing an array outside its bounds
  eg
       a : array[1..9] of integer;
       ...
       z := a[10];  { error, there is no a[10] }

- result of sqr(n), trunc, round > maxint
  eg
     x := 20.0 * maxint; { OK, if x is real }
     i := trunc(x);      { error, since trunc(x) > maxint }

- result of chr() larger than last character

- bounds overrun or underrun errors for pack()
  the pack function copies components of an unpacked array to a packed array.
  It copies enough components to exactly fill the packed array, starting from
  a given index in the unpacked array. If this index is too large, there will
  be too few components in the unpacked array to fill the packed array.
  eg
      a  : array[1..9] of integer;
      pa : packed array [1..4] of integer;
      ...
      pack( a, 7, pa); { error, copies a[7], a[8], a[9] & a[10], ...
                         ... but there is no a[10] }

- bounds overrun or underrun errors for unpack()
  see comments for pack()

- range errors in assignment
  eg
       var percent: 1..100;
       ....
         percent := 0;  { 0 is outside the range of values of variable percent }


- range errors in parameter substitution
  see comments for range error in assignment

- out of bounds errors for read integer or char from a file
  see comments for range error in assignment


- case statement with no case constant matching the case expression
  for example, a statement like this will cause a fatal error:

    {$d+}   {-- turn debugging on if it isn't already on}

    case 13 of       {error, there is no case for 13}
      1: write( 1 );
      2: write( 2 );
      3: write( 3 );
    end;

  debug must be turned on at the point that the case keyword is seen, eg

       {$d+} case {$d-} e of  {will detect a case error}
       ...
       end;

  whereas in this case statement, debug is turned off where the
   case keyword is, so there is no debug checking:

       {$d-} case {$d+} e of  {will not detect a case error}
       ...
       end;

- no return value assigned to a function
  eg
    {$d+}
    function f : integer;
    begin
      if someCondition then
         f := 99;
    end; {error: if someCondition is false then f has no return value}

  this option is enabled for the function if debug is enabled at the point of
  the function keyword.  For example, this function is not checked for
  unassigned return value:

    {$d-} function {$d+} notest: integer;
    begin
      writeln( 'unassigned return value not detected because debug disabled' );
    end; { notest }

- division by zero (including mod)

- pred & succ produce a value outside the declared range of the type
  eg
       type colour = (red, yellow, green);
       ...
       next := succ(green)  { error, there is no value that follows green }

- sqrt of a number less than zero

- natural logarithm of a number less than or equal to zero

- exp(x) exceeds the largest representable real number (typically about 1.8e308)

- integer overflow
  This happens when the result of any integer add, subtract or multiply cannot
  be represented by an integer.
  ie, the result is bigger than maxint, or smaller than the most negative
  integer (-maxint, or more likely -maxint-1).
  eg
        var i,j: integer;
        ....
          i := maxint - 2;
          j := i+3;            { this is bigger than maxint }


- record variant mismatches tag
  this is controlled by the {$v+} option, see comments above

- a negative number raised to a non-integer power
  eg
    x := (-1.2)**0.8; {error, only integer powers of negative numbers are allowed}

- attempts to access command line parameters with index outside the range
    0 .. argc-1
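
As an illustration of the integer overflow check listed above, gcc (version 5
and later) provides the __builtin_add_overflow builtin, which reports whether
an addition fits in the result type.  The sketch below shows one way a checked
addition could look in c; whether p5x generates this exact code is an
assumption, and checkedAdd is a hypothetical name.

```c
#include <limits.h>    /* INT_MAX, the c counterpart of maxint */
#include <stdbool.h>

/* Adds a and b, storing the result in *sum.
   Returns false, leaving *sum unchanged, if the result does not fit in an int. */
static bool checkedAdd(int a, int b, int *sum)
{
    int result;
    if (__builtin_add_overflow(a, b, &result))
        return false;   /* a debug build would report a fatal error here */
    *sum = result;
    return true;
}
```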


Tools
=====

There is a bash shell script called pc that compiles a pascal program.

To compile tp5c.pas, for example, type something like this

 ./pc tp5c

It is set to assume that the p5x compiler and p5x.h are in the directory
defined by the P5CDIR shell variable.

To get a quick help message, run the script with no arguments,

   ./pc


If you need to compile with optimisation turned on, you can add an optimise
option, eg

   ./pc -O1 tp5c

Other optimisation options are available, depending on your version of the gcc compiler.

This script can be called with -cpp as an option,

 ./pc -cpp tp5c

This runs the c preprocessor (cpp) on the pascal source file.

Alternatively, the pc script will run the c preprocessor if it finds one of
the first 10 lines of the pascal source file starts with a '#' character.

You can define preprocessor symbols on the command line with the -D<something>
option. There are 2 forms, for example:
        ./pc -DFOO tp5c
        ./pc -DSIZE=100 tp5c

The first form defines a preprocessor symbol FOO with a default value of 1.
You can use this form for code like this:
#if defined FOO
   ... code that depends on FOO
#endif


The second form defines a preprocessor symbol SIZE with a value of 100.

Defining a value on the command line also forces use of the c preprocessor.

Any other options on the command line are passed directly to the gnu c
compiler.  See your gcc documentation for details. For example, the -v option
shows the phases of the gcc compilation:

       ./pc -v tp5c

The script has logic that determines whether tp5c.pas needs to be compiled or
not, so it can skip recompilation altogether when nothing has changed.
Recompilation is necessary on any of the following conditions:
          - the executable file does not exist
          - the source file has been updated
          - the compile options have changed
          - any of the include files in the source have changed


There is another bash shell script called r that similarly compiles and
then runs the pascal program.

To run tp5c.pas, for example, type something like this

 ./r tp5c

To get a quick help message, run the script with no arguments,

   ./r


p5x Utility Programs
====================

pan - the pascal analyser
-------------------------

pan uses gcc's deep code analysis to find bugs and potential bugs in your
code, eg uninitialised variables, overflow, array bounds errors.
It also reports unused variables, functions etc.
It can't identify all errors, but can still quickly spot otherwise hard to
find bugs.

call it like this
   pan -N myprog

where N is a number between 0 and 3 and specifies the level of detail
(and accuracy) in the output report.  'myprog' is your pascal source code,
without the extension (ie .pas or .p).

level 0 looks for uninitialised variables only.
level 1 does the same and also looks for unused variables, parameters and
   functions, etc. It uses a deeper level of gcc's code analysis, so can find
   errors missed by level 0, but now the line numbers are less accurate.
level 2 looks for even more errors and potential errors.
level 3 gives an even more detailed report, but sometimes it has a higher
   chance of reporting false positives (ie code that is merely
   suspicious but otherwise correct).

Level 1 is the default level.
If one of the first 10 lines of your pascal code starts with a '#', the code
is passed through the c preprocessor before being compiled.
pan also accepts an option to run the c pre-processor over the code before
doing the analysis, eg

   pan -cpp myprog

Notes:

-  pan automatically uses the {$n+} option to enable the correct
   (or approximate) line numbers. You don't need to do this.

- all global variables are initialised to 0 by gcc, so pan will never report
  an uninitialised global variable.

- array components that are not initialised are reported like this:

   ‘a[+1]’ is used uninitialized ...

  c arrays always start at index 0, so a[+1] means index 1 relative to
  the first index. For example, suppose we have an array

     var ac: array['a'..'z'] of integer;

   and pan reports something like this

     ‘ac[+1]’ is used uninitialized

   then the +1 means that ac['b'] has no initial value, since ac['a'] in
   pascal corresponds to ac[0] in gcc.

- details of the inner workings of this can be found in the documentation for
  your version of gcc.  Look for the section on warning options.


ppr - the pascal printer
------------------------

ppr is a simple script that uses the a2ps program to format a pascal source
file and send it to the local printer.
The settings, like landscape or portrait mode, can be changed by editing the
ppr script.  Consult your a2ps documentation or type 'man a2ps' or 'info a2ps'
for details of available settings.


rv - the pascal memory checker
------------------------------

rv is a shell script that finds uninitialised data and memory that is not
disposed in your program.

rv is called and behaves almost the same as the r build and run script.
The difference is it runs your program in a virtual machine (called valgrind)
that checks every memory reference.  Any references that use uninitialised
data are reported.


coverage analysis
-----------------

How do you know how good your test code is?
One indicator is code coverage - ideally your tests should exercise every line
of code in your program.  In practice this is very difficult, so >90% coverage
is regarded as good.

The tcov.sh shell script is a template to help you analyse code coverage.  It
cannot know how to run your tests, so you need to edit this script to run
them.  Everything else should already be there.

There are four steps to find code coverage:

1 clear the counters.  This is really optional, if the counters are not
  cleared they just count up from the previous run.

2 build an instrumented version of your program.

3 run the tests.  This is the part of tcov.sh that needs to be modified for
  the program under test.

4 show the results.  tcov.sh looks for a program called lcov to turn the
  results into an html page to display with the konqueror web browser.  If
  lcov does not exist the results are displayed as text.

  You may want to modify this part of the script to use your favourite
  display tools rather than lcov and konqueror.  gcovr may be a good
  alternative, see http://gcovr.com/


p5c Implementation Notes
========================

See Pemberton for outline of program structure
See all the docs in p5 project.
you might need a c reference.
    eg "The C book", available at http://publications.gbdirect.co.uk/c_book/
See gcc docs.   gcc has important extensions that are needed by p5c and p5x


Translation to C:

c and pascal are similar in many respects, but a few differences do exist.
Pascal has strong typing so is able to find more errors at compilation time
rather than at run time.

Both have a similar approach to data & code, so much Pascal data and code
translates directly to c, eg
    integer       --> int,
    record        --> struct
    while         --> while
    function f(x) --> f(x)

There are, however, enough differences to prevent simple direct compilation:


-- arrays are different.

At first sight they appear the same since they have the
same syntax: array1[i] is the ith component of the array called array1.
In c, however, array1 is not an array value as a pascal user would understand
it: in most expressions an array name is converted to a pointer to its first
component.
This c code
   array1 = array2
does not copy the contents of the array.  If array1 and array2 are declared as
arrays it is illegal, and if they are pointers it copies the pointer.
So in the pointer case, array1 & array2 now both refer to the same array, and
in c the statement
   array1[1] = 5
also sets array2[1] to 5.

p5c and p5x work around this by wrapping all arrays inside structs, so
a pascal array declaration:

type MyArray = array[ 1..9 ] of integer;

is compiled to a c declaration like this:

typedef struct {
   int component[9];
} MyArray;

Now, if arrays a and b are of the same type (or are strings), the pascal
 assignment
 a:= b;
compiles directly into the corresponding c assignment
 a = b;
with the results the pascal programmer expects.
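
A small c sketch of the struct wrapper in action: assigning the struct copies
every component, so changing the copy leaves the original alone.
copyAndModify is a hypothetical helper, and mapping pascal index 1 to
component[0] is an assumption about the generated code.

```c
/* the c form of: type MyArray = array[ 1..9 ] of integer; */
typedef struct {
    int component[9];
} MyArray;

static void copyAndModify(MyArray *a, const MyArray *b)
{
    *a = *b;                  /* a := b copies all 9 components */
    a->component[0] = 5;      /* a[1] := 5 (assumed index mapping), b is unchanged */
}
```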


-- goto is different.
c allows local goto only and p5x compiles a local pascal goto into a c goto.

pascal also allows interprocedural gotos, and these are implemented using
c's setjmp and longjmp functions.
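
The idea can be shown with plain c: setjmp marks the label in the outer
routine, and longjmp from a routine called inside it (here recursively) jumps
straight back there, abandoning the calls in between.  The names label99,
inner, outer and reachedLabel are hypothetical, and real p5x output will
differ in detail.

```c
#include <setjmp.h>

static jmp_buf label99;       /* the jump buffer for pascal label 99 */
static int reachedLabel = 0;

static void inner(int n)
{
    if (n > 2)
        longjmp(label99, 1);  /* the interprocedural "goto 99" */
    inner(n + 1);
}

static int outer(void)
{
    if (setjmp(label99) == 0) {
        inner(0);             /* normal path: never returns here */
        return 0;
    }
    /* label 99: control arrives here via longjmp */
    reachedLabel = 1;
    return 99;
}
```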


p5x and p5c use the symbol table produced from compiling the declarations
to create the equivalent c declarations.
In the code generation phase, p5x accesses this table to create
c statements from the pascal statements.

compilation of declarations and compilation of code will now be considered
 separately:


Compiling Declarations:
=======================

labels are compiled into jump buffers.
If a label is used in a subprocedure a jumpbuffer is set up at the start of
the procedure.  Otherwise the jump buffer is left unused, and gcc is free to
optimise it out.

constant identifiers are kept internal to the compiler, so no corresponding c
code is emitted.

type declarations
a c type declaration reverses the order of a pascal declaration, eg
   i : ^integer;     ===> int *i;

This is done by procedure genCType

r : record .... end;  ==> struct { ... } r;

but note arrays:
in c, an array name usually behaves like a pointer to its first component, so
if arrayA & arrayB are arrays,

  in pascal arrayA := arrayB copies all the components of arrayB into arrayA,
  and assigning a new value to one of the elements in arrayA has no effect on
  arrayB.

  In c, on the other hand, the statement arrayA = arrayB is illegal for true
  arrays, and if arrayA and arrayB are pointers it makes arrayA point to the
  same components as arrayB.  Then a change to one of the components of arrayA
  will be seen in arrayB as well.

To overcome this, p5x wraps pascal arrays inside a c struct.

eg

arrayA : array[0..9] of integer;

is translated into the c code

struct {
   int component[10];
} arrayA;

so that all components of arrayA can be copied when assigned.


sets:
sets are declared in the same way as arrays are.


pointer declarations are different:

In pascal, pointers are resolved at the end of the type declaration block to
enable recursive data structures.
In c, recursive data structures are implemented by forward referencing
structs.

So in pascal, we might have for example:

pr = ^rec1;
rec1 = record
         i : integer;
         link2 : ^rec2;   { this is a forward reference to rec2  }
       end;

rec2 = record
         i : integer;
         link1 : ^rec1;
       end;

In c, these records become

struct rec2;        /* this is a forward reference to rec2  */
typedef struct {
    int i;
    struct rec2 *link2;
} rec1;

typedef rec1 *pr;

typedef struct rec2 {
    int i;
    rec1 *link1;
} rec2;

Note how the order of the declarations needs to change.
The declaration of pr needs to move down so it is declared after rec1.
Also, the declaration of rec2 is split into 2 parts so that rec1 can forward
reference it.


pascal recursive declarations like self = ^self are not possible in c, so
void * is used instead.
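
For example, a pointer that points at itself can still be represented with
void *.  This is a sketch; followsToItself is a hypothetical helper.

```c
/* pascal:  type self = ^self;
   c cannot declare a pointer to its own typedef, so a generic pointer is used */
typedef void *self;

/* make p point at itself, then follow the pointer once */
static int followsToItself(void)
{
    self p = &p;               /* p^ is p itself */
    self q = *(self *)p;       /* read the pointer stored where p points */
    return q == p;
}
```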

This means that type declarations need to be ordered differently for the 2
languages. In fact the pascal type declaration order is ignored and the
c order of the types is determined from scratch.

This reordering is handled by the procedure declareType, which also detects
and handles pointer recursion.

example

  pascal:

   type
       p1 = ^someType;

       r1 = record
              link2 : ^r2;
              info  : someOtherType;
             end;

       colour = (red, orange, green);

       r2 = record
              link1 : ^r1;
              see   : colour;
            end;

       sometype =  ...

In c, on the other hand sometype must be declared before the pointer
to it, and record r2 must be forward declared so r1 can access it.
Also colour needs to be declared earlier than r2, so if r2 is moved the colour
may need to be moved as well.

The equivalent c code might be like this:

     /* typedef declaration for sometype goes here */
     typedef sometype *p1;
     typedef struct r2 r2;          /* forward declaration of r2 */

     typedef struct r1 {
             r2 *link2;
             someOtherType info;
             } r1;

     typedef int colour;

     struct r2 {
             r1 *link1;
             colour see;
             };



identifier names need to be unique, so the declaration level is appended to
the name.  eg for an integer declared at the program level,

 i: integer;

might become

  int i_1;

This prevents pascal identifiers clashing with c reserved words, and lets c
refer to an identifier at the correct level when the same name is declared at
several levels (which could otherwise go wrong in some pathological cases).


procedures and functions

These are compiled directly.
gcc allows nested functions, so nested pascal functions are simply compiled
into nested c functions.
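
Here is what such a nested function looks like in gnu c.  This is a sketch,
not p5x output; sumOfSquares and addSquare are hypothetical names, and nested
functions are a gcc extension that other c compilers such as clang reject.

```c
/* gcc only: nested functions are a GNU C extension */
static int sumOfSquares(int n)
{
    int total = 0;               /* local variable of the outer function */

    /* like a pascal local procedure, the nested function can see total */
    void addSquare(int k)
    {
        total += k * k;
    }

    for (int k = 1; k <= n; k++)
        addSquare(k);
    return total;
}
```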



Code Compilation
================

This is mostly straightforward.

pascal p5 and hence p5c and p5x are implemented by a recursive descent
compiler.  See Pemberton for a complete explanation.

This means the compiler has almost no memory of the code it has just
processed, so code needs to be generated in the same sequence as it is
encountered.

p5 (but not p5c or p5x) produces code for an abstract stack machine.
So for example, the p5 compiler takes the following steps when compiling
the expression
       'string1' < 'string2'

- encounter 'string1', push it onto the stack.  At this point, the compiler
  doesn't know what the remainder of the expression will be, and has lost
  information of the previous code.  The compiler will be in the same state as
  if 'string1' were a procedure parameter.

- encounter the '<' relational operator. This is simple enough to remember.

- encounter 'string2', and push it onto the stack

- recall the '<' operator, and compare the strings on the top two stack
  positions.  Replace them with the result of this comparison, ie with true
  or false.

p5c and p5x, in contrast, compile this into c code, so need to produce
something like this:
   (strcmp( "string1", "string2" ) < 0)

Note that p5c needs to emit the strcmp before "string1".  This is not
the order in which it finds them in the code, so it needs to remember
the "string1" until it knows what to do with it.   What if 'string1' was deep
inside an array of records of records of arrays of packed arrays of char
 (or worse)?

For this and other reasons, p5c builds expressions into trees so it can change
  the order in which it emits things.

So for the above example, when p5c finds the 'string1' < 'string2' expression,
it does this:
- encounter 'string1', make a node for 'string1'
- encounter <, make a node for < string operator, put node for 'string1' on
  left child
- encounter 'string2', make a node for 'string2', put onto right child of the
  < node

Now when the expression has been completely parsed, p5c can use the tree to
generate code. It starts at the root of the tree, ie the string < operator
node, and knows that it needs to emit something like this:

- emit "(strcmp("
- emit left child, ie "string1"
- emit ", "
- emit right child, ie "string2"
- emit ") < 0)"

As previously mentioned, the arguments of the strcmp() function could be
arbitrarily complex, and this is easily handled by traversing the left and
right children of the tree.
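A minimal sketch of the tree and the emitter, written in c rather than in the pascal that p5c itself is written in.  The node layout, NodeKind, append() and emit() are hypothetical; p5c's own tree nodes carry far more information.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds for this example only. */
typedef enum { STR_CONST, STR_LESS } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;                  /* STR_CONST: the string's contents */
    const struct Node *left, *right;   /* STR_LESS: the two operands */
} Node;

/* Append s to the nul-terminated text in out (capacity cap),
   truncating if necessary. */
static void append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "%s", s);
}

/* Emit c code for the tree rooted at n.  The operator node is visited
   first, so "(strcmp(" comes out before the left operand even though the
   parser met 'string1' first. */
static void emit(const Node *n, char *out, size_t cap)
{
    switch (n->kind) {
    case STR_CONST:
        append(out, cap, "\"");
        append(out, cap, n->text);
        append(out, cap, "\"");
        break;
    case STR_LESS:
        append(out, cap, "(strcmp(");
        emit(n->left, out, cap);
        append(out, cap, ", ");
        emit(n->right, out, cap);
        append(out, cap, ") < 0)");
        break;
    }
}
```

Because emit() visits the operator node before its children, the order of the emitted text is decided by the tree, not by the order in which the parser met the operands.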

Also note that this could be part of a larger expression, such as
       ('string1' < 'string2') and ( .... )
in which case there would be a much larger tree, of which the above is
just a subtree, in this case forming the left child of an and-operator node.


For a further example, consider writeln( n:w ), where w is an arbitrary
expression for the total width of the output.  n itself could also be an
arbitrarily complex expression.

The equivalent c code is
      printf( "%*i", w, n );
So the expression for w needs to be emitted before the expression for n. This
is the reverse of the order they appear in the pascal code, so the
expressions need to be remembered in order to generate valid c code.
This is easily accomplished if expression trees are used.


For loops need careful consideration:
    - the final value expression must be evaluated only once
    - the loop variable is assigned only if the loop is entered
    - the loop variable must not overflow, even when the final value is
      maxint
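A sketch of c code that respects all three rules for `for i := first to last do`, here with a body that just sums i.  The function name sum_range() is hypothetical and this is not p5c's actual output.

```c
#include <limits.h>   /* INT_MAX, for trying the overflow case */

/* Hypothetical c for
       for i := first to last do total := total + i
   (a sketch, not p5c's output).  It follows the three rules above. */
static long long sum_range(int first, int last)
{
    long long total = 0;
    int i;
    int final_value = last;        /* end expression evaluated once only */

    if (first <= final_value) {    /* i is assigned only if the loop runs */
        i = first;
        for (;;) {
            total += i;            /* loop body */
            if (i == final_value)  /* stop before i++ could pass maxint */
                break;
            i++;
        }
    }
    return total;
}
```

A naive `for (i = first; i <= last; i++)` would overflow i when last is INT_MAX, which is undefined behaviour in c.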


sets
----

Sets are not part of the c language, so they need to be specially
implemented.  Some simple optimisations are done, eg p5c may keep separate
lists of the constant and variable elements of a set.

Sets are implemented with a c struct containing an array of elements:

     struct { uint8_t element[n]; }  mySet;

where n is the number of bytes (ie uint8_t) needed to hold the set
elements.

A set union is a bitwise or operation on the bytes, and set intersection
is a bitwise and operation.  Set difference is slightly more complicated:
it is a bitwise and with a bitwise complement, eg setA - setB becomes
        setA & ~setB
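A minimal sketch of these operations, assuming a fixed set of 0..31 so that both operands have the same 4 bytes.  Set, SET_BYTES and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: a set of 0..31 held in 4 bytes, element v in
   bit (v mod 8) of byte (v div 8).  Real set sizes come from the
   declared bounds; SET_BYTES is an assumption for this sketch. */
enum { SET_BYTES = 4 };

typedef struct { uint8_t element[SET_BYTES]; } Set;

static void set_union(Set *r, const Set *a, const Set *b)         /* a + b */
{
    for (size_t i = 0; i < SET_BYTES; i++)
        r->element[i] = a->element[i] | b->element[i];
}

static void set_intersection(Set *r, const Set *a, const Set *b)  /* a * b */
{
    for (size_t i = 0; i < SET_BYTES; i++)
        r->element[i] = a->element[i] & b->element[i];
}

static void set_difference(Set *r, const Set *a, const Set *b)    /* a - b */
{
    for (size_t i = 0; i < SET_BYTES; i++)
        r->element[i] = a->element[i] & (uint8_t)~b->element[i];
}

static void set_add(Set *s, int v)          /* include v, 0 <= v <= 31 */
{
    s->element[v / 8] |= (uint8_t)(1u << (v % 8));
}

static int set_in(const Set *s, int v)      /* v in s, 0 <= v <= 31 */
{
    return (s->element[v / 8] >> (v % 8)) & 1;
}
```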

p5c and p5x construct expression trees for sets before generating c code.
This enables analysis of set expressions for possible efficiencies and to
determine how to implement the set operations.

Many pascal compilers make all sets the same size, say 256 bits.  This is
simple to implement, but it is wasteful for small sets and not big enough
for large sets.  With fixed-size sets, a small set like
   colour = set of (red, orange, yellow, green, blue, indigo, violet)
takes 256 bits, even though only 7 are needed.
Also, with fixed set sizes something like this is not possible
   primes = set of 0..50000;

To avoid both these problems, p5c and p5x do not compromise: each set is
made just big enough, in whole bytes, to hold its base type.
This, however, creates different problems to be solved.

Firstly, alignment problems are avoided by always storing element value v in
bit (v mod 8) of its byte, so sets with different bounds can be combined byte
by byte without having to shift bits within the sets.

For example this set:

  myset : set of 30..40;

will occupy 3 bytes, with the first byte holding the element values 24..31,
even though only values 30 & 31 are used.  The second byte will hold values
32..39, and the third byte will hold the values 40..47, with only the value 40
being used.
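The byte and bit for an element can be computed as below.  This is a sketch, not p5c code: the helper names are hypothetical, and mod8()/div8() floor rather than truncate so that negative elements follow the same layout, since c's / and % truncate toward zero.

```c
/* Hypothetical helpers for the byte layout described above. */
static int mod8(int v) { return ((v % 8) + 8) % 8; }   /* 0..7, also for v < 0 */
static int div8(int v) { return (v - mod8(v)) / 8; }   /* floor(v / 8) */

/* Number of bytes for a set of low..high. */
static int bytes_needed(int low, int high)
{
    return div8(high) - div8(low) + 1;
}

/* Byte (counting from 0) that holds element v of a set of low..high. */
static int byte_index(int v, int low)
{
    return div8(v) - div8(low);
}

/* Bit within that byte: always v mod 8, whatever the set's bounds. */
static int bit_index(int v)
{
    return mod8(v);
}
```

For myset above, bytes_needed(30, 40) is 3 and element 40 is bit 0 of byte 2.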

A bigger question is what size does a set need to be to evaluate a
given expression?

For a statement like
             s1 := setExpr;

the answer is straightforward: setExpr must be evaluated in a set of the same
size as s1.  As the tree for setExpr is traversed to generate code, the
required set size is passed down to each child node.
For example, in the statement

             s1 := s2 + s3;

where s1, s2 and s3 are sets of different sizes, we now know that the
size of the expression (s2 + s3) must be the same as the size of s1,
whatever the sizes of s2 and s3 are.
The expression (s2 + s3) forms a simple tree with the '+' at the root,
and s2 and s3 forming the left and right branches respectively.
So code generation starts at the root with the size of s1, and recursively
passes this known size to each of its child branches.  When control
returns to the root node, it has 2 sets of exactly the correct size that
it can combine with a bitwise or to produce the result.
This recursion is handled in the procedure genCCode().
The procedure constructSet() generates a set with given bounds.
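The mod 8 alignment pays off when an operand is fitted into the required bounds: whole bytes are copied, with no bit shifting.  The sketch below is hypothetical (it is neither constructSet() nor genCCode()), assumes the target set needs at most 64 bytes, and does no range checking.

```c
#include <stdint.h>

/* Hypothetical sketch.  Element v is bit (v mod 8) of absolute byte
   floor(v / 8); a set stores the bytes from its low bound's byte to
   its high bound's byte. */
static int mod8(int v) { return ((v % 8) + 8) % 8; }
static int div8(int v) { return (v - mod8(v)) / 8; }

/* Copy src (bounds src_low..src_high) into dst (bounds dst_low..dst_high).
   Whole bytes are copied without shifting; dst bytes that src does not
   cover are cleared.  No range checking is done here. */
static void fit_set(uint8_t *dst, int dst_low, int dst_high,
                    const uint8_t *src, int src_low, int src_high)
{
    int first = div8(dst_low), last = div8(dst_high);
    int src_first = div8(src_low), src_last = div8(src_high);

    for (int b = first; b <= last; b++)
        dst[b - first] = (b >= src_first && b <= src_last)
                             ? src[b - src_first] : 0;
}

/* s1 := s2 + s3: fit both operands to s1's bounds, then or the bytes.
   Assumes s1 needs at most 64 bytes. */
static void assign_union(uint8_t *s1, int low1, int high1,
                         const uint8_t *s2, int low2, int high2,
                         const uint8_t *s3, int low3, int high3)
{
    uint8_t tmp[64];
    int n = div8(high1) - div8(low1) + 1;

    fit_set(s1, low1, high1, s2, low2, high2);
    fit_set(tmp, low1, high1, s3, low3, high3);
    for (int i = 0; i < n; i++)
        s1[i] |= tmp[i];
}
```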


The set "in" operator doesn't need to construct a set to evaluate its
expression.
Consider, for example, a statement like this:
   if e in [ x, y, z ] then ...

It is more efficient to check if e matches one of x,y,z directly than
it is to put x, y, & z into a set then check if e is in the set.
This also sidesteps the issue of how big a set is needed to hold x,y & z.

Similarly, where s1 and s2 are sets, a statement like

  if e in s1 + s2 then ...

is translated into if (e in s1) or (e in s2) then ...

Also e in s1 * s2 is translated as (e in s1) and (e in s2)

and e in s1 - s2 is translated as (e in s1) and not(e in s2)

Since a set expression is represented by a tree, arbitrarily complex set
expressions on the right hand side of the "in" operator can be compiled
recursively without needing to construct intermediate sets (whose size we
might not know).
See the function isMember() for implementation details.
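The recursion can be sketched as a small c interpreter over a set-expression tree.  SetNode and is_member() are hypothetical names; p5c emits equivalent c code at compile time rather than walking a tree at run time, but the recursion has the same shape.

```c
#include <stddef.h>

/* Hypothetical set-expression tree for this sketch only. */
typedef enum { ELEMS, RANGE, UNION, INTERSECT, DIFF } SetKind;

typedef struct SetNode {
    SetKind kind;
    const int *elem; size_t count;        /* ELEMS: [x, y, z] */
    int low, high;                        /* RANGE: [low..high] */
    const struct SetNode *left, *right;   /* UNION, INTERSECT, DIFF */
} SetNode;

/* Evaluate "e in expr" without building any intermediate set. */
static int is_member(int e, const SetNode *n)
{
    switch (n->kind) {
    case ELEMS:                       /* e = x or e = y or e = z */
        for (size_t i = 0; i < n->count; i++)
            if (e == n->elem[i])
                return 1;
        return 0;
    case RANGE:
        return n->low <= e && e <= n->high;
    case UNION:                       /* (e in s1) or (e in s2) */
        return is_member(e, n->left) || is_member(e, n->right);
    case INTERSECT:                   /* (e in s1) and (e in s2) */
        return is_member(e, n->left) && is_member(e, n->right);
    case DIFF:                        /* (e in s1) and not (e in s2) */
        return is_member(e, n->left) && !is_member(e, n->right);
    }
    return 0;
}
```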


We can also use expression trees to analyse set comparisons.
As the tree is being constructed, each node combines the sizes of its
children to form a new size for its result.

The lower bound of a '+' node (set union) is the minimum of the lower bounds
of each of its children.  Similarly, the resulting upper bound is the maximum
of the upper bounds of its children.
For example, the resulting bounds of a subexpression like
    [-12..3] + [8..80]
are -12 and 80 for the lower and upper bounds respectively as this
diagram shows:

 left side    -->         [-12................3]
 right side   -->                  [8................................80]
 left + right -->         [-12.......................................80]

The results are the opposite for a '*' node (set intersection), ie
it's the bigger of the two lower bounds and the smaller of the two upper
bounds.
eg the resulting bounds of
    [ 2..20] * [10..30]
are [10..20], as shown here:

 left side    -->         [2..................20]
 right side   -->                 [10.....................30]
 left * right -->                 [10.........20]

Finally, set subtraction doesn't change bounds, ie the bounds of s1 - s2
are exactly the same as the bounds of s1.
So the resulting bounds for a '-' node are exactly the bounds of the
left child, as shown here:

 left side    -->         [-12................3]
 right side   -->                  [8................................80]
 left - right -->         [-12................3]


Note that if we say the empty set has a lower bound of maxint and an
upper bound of -maxint, these results work as expected for empty sets.

See the procedure findResBounds() for the implementation details.
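The bound rules can be sketched in c.  This is not findResBounds() itself, which is written in pascal; the Bounds type and function names are hypothetical, and INT_MAX stands in for maxint.

```c
#include <limits.h>

/* Hypothetical sketch of the bound rules.  The empty set is
   low = maxint, high = -maxint, so no special case is needed for it. */
typedef struct { int low, high; } Bounds;

static const Bounds EMPTY = { INT_MAX, -INT_MAX };

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

static Bounds union_bounds(Bounds l, Bounds r)          /* l + r */
{
    Bounds b = { min_int(l.low, r.low), max_int(l.high, r.high) };
    return b;
}

static Bounds intersection_bounds(Bounds l, Bounds r)   /* l * r */
{
    Bounds b = { max_int(l.low, r.low), min_int(l.high, r.high) };
    return b;
}

static Bounds difference_bounds(Bounds l, Bounds r)     /* l - r */
{
    (void)r;             /* removing elements never widens the bounds */
    return l;
}
```

A result whose low bound exceeds its high bound, such as 8..3 for [-12..3] * [8..80], describes a set that is known to be empty.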

Suppose we have constructed a tree for a set comparison, where we have
left and right set expressions and a compare operator.  We also know the
bounds that each of the set expressions needs in order to determine its
value.  These two bounds are not necessarily the same, and not necessarily
the set bounds needed to make the comparison.  What happens next depends on
the compare operator.

For equality, we have
      setSubExpression1 = setSubExpression2
To compare the two sub-expressions, the left and right hand sides must have
the same bounds.
The lower bound of both the left and right sides must be whichever
is the minimum of the two.  Similarly, the upper bounds of each side must
be whichever is the maximum of the two.
A simple example should make this clear:

s1 : set of 0..20;
s2 : set of 10..50;


if s1 + [24] = s2 + [15] then
   writeln( 'equal' );

While the tree for the set expression s1 + [24] is being created, the
bounds of the left hand side of the = operator are determined to be [0..24],
as described above.
Similarly, the bounds of the set expression s2 + [15] are [10..50].
Each side of the = operator must now be constructed in a set with bounds
[0..50] so that the sets can be compared.

For set inequality, ie <>,  the bound calculations are similar.

For set inclusion, we might have
      setSubExpression1 <= setSubExpression2
Now the bounds of the whole expression are determined solely by the bounds of
setSubExpression1.  Any elements of setSubExpression2 that lie outside these
bounds do not contribute to the result.
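The choice of comparison bounds can be sketched as follows.  compare_bounds(), Bounds and CmpOp are hypothetical names.  The >= case is not described above; the sketch assumes it is handled as b <= a.

```c
/* Hypothetical sketch: the bounds in which both sides of a set
   comparison are built. */
typedef struct { int low, high; } Bounds;
typedef enum { CMP_EQ, CMP_NE, CMP_LE, CMP_GE } CmpOp;

static Bounds compare_bounds(CmpOp op, Bounds left, Bounds right)
{
    Bounds b;

    switch (op) {
    case CMP_LE:                 /* left <= right: left's bounds decide */
        return left;
    case CMP_GE:                 /* left >= right, ie right <= left */
        return right;
    case CMP_EQ:
    case CMP_NE:
    default:                     /* cover both sides */
        b.low  = left.low  < right.low  ? left.low  : right.low;
        b.high = left.high > right.high ? left.high : right.high;
        return b;
    }
}
```

With the example above, compare_bounds(CMP_EQ, [0..24], [10..50]) gives [0..50].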

So the size of most set expressions can be determined at compile time, but
there is still a problem with expressions like
   if [i,j,k] = [m..n] then ...
where all the variables have values that are unknown at compile time.

In principle the bounds of an expression like [x..y] are -maxint..maxint, but a
set that large cannot be implemented in practice.  At this point, the compiler
issues a warning and uses a default set size of [-255..255].

If this is not suitable, there are a number of options:

- Recalling the rules for the size of set intersection expressions, we find
  that a workaround is to use set intersection to give the compiler
  the information it needs to determine the set sizes, eg
        if [i,j,k]*[0..3000] = [m..n]*[0..3000] then ...
  where in this case 0 and 3000 are known to be the bounds of the set
  expressions.  This is the recommended option.

- To change the default set size in p5x from 255, change the value of the
  constant setMax in pcom.pas and recompile.

- Alternatively, assign the expressions to a set of suitable size, eg
     s1 := [i,j,k];
     s2 := [m..n];
     if s1 = s2 then ...   { now the compiler can use the correct size sets }

If the set size implied by these options is impossibly large, you may need to
consider the {$Z+} compiler option described below.


We have already seen that the p5c implementation needs to solve two
difficulties: aligning the elements, and determining the size of sets inside
set expressions.  There is another problem to be solved: how do we determine
whether a set's bounds are exceeded?

This example is simple enough:
   mySet : set of 1..10;
   ...
   mySet := [0,5];  {!!! ILLEGAL: 0 is out of bounds}

Is it sufficient to check every element in a set expression and issue an error
if any are out of bounds?  Unfortunately not.  This expression is similar, but
it is now not obvious whether it is illegal or not:

   mySet := [0,5] - [i];

It is legal if i = 0, since 0 is then removed from the result, and illegal
otherwise.

We don't need to know exactly which elements the set expression contains,
just whether any of them lie outside the bounds of mySet.

For now, assume a function, isNonEmpty(expression, range), which checks if
there are elements of the set expression in the range.

Recall that in p5c, we can analyse set expressions because they are
represented by trees, and let's say the expression in the above example is
represented by the tree T.

We can now see that our set expression has an error iff

   isNonEmpty(T, -maxint..0) or isNonEmpty(T, 11..maxint)

Also,
        s0 := s1 + s2;

is in error if either s1 or s2 has an element that is out of bounds.
In other words, set expressions that are unions can be split up and examined
individually.  So our isNonEmpty function doesn't need to consider the set +
operator when it is at the top level.

When there are set + operators at a lower level in the expression tree, we
can use the rule:

   (s1+s2)*s3 = s1*s3 + s2*s3

We can repeat this rule as often as necessary to move all + operators to the
top level of the set expression.

It is beginning to look like our isNonEmpty function needs to examine terms
like s1*s2 to see if they produce any elements inside the specified range.

But what about the set - operator?  Recall that s1 - s2 is implemented as

     s1 * ~s2, where ~s2 is the bitwise complement of s2.

Note that this complement extends over the whole range that the isNonEmpty
function needs to consider.

So if the bounds of s2 are m..n, and isNonEmpty is considering the range
r0..r1, then s1 - s2 becomes

    s1*( sa + sb + sc), where
    sa = [r0 .. m-1]
    sb = ~s2, restricted to m..n
    sc = [n+1 .. r1]

We can now see that the plus operators in this expression can be moved to the
top of the expression tree, and that any set expression can be rewritten as
a union of terms, ie

    sa*sb*sc...*sz  + ssa*ssb*...

Our isNonEmpty function needs to determine whether any of these terms produce
an element in the specified range.  There are two factors that save us some
work:
 - these terms are set intersections, so the low limit of each term is
          max(low bounds of each set in the term)
   and similarly the high limit is min(high bounds of each set in the term).
   In many cases, the low limit will be greater than the high limit, so the
   term is empty and there is nothing to do.
 - many of the sets in a term are of the form [a..b], so apart from
   contributing to the low & high limits of the term, they need no further
   consideration.  ([a..b] * s = s inside these limits)
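A sketch of isNonEmpty() for a single term, under stated assumptions: every factor is either a range [a..b] or a stored set with known bounds (possibly complemented within those bounds, like sb above), and all names are hypothetical.  Only when a term contains a stored set does it fall back to testing individual values, and then only within the narrowed limits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of isNonEmpty() for one term f1*f2*...*fn of the
   union, over the range r0..r1.  A factor is a range [low..high] or a
   stored set with bounds low..high (element v in bit v mod 8); a
   complemented stored set means ~s restricted to low..high. */
typedef struct {
    int is_range;           /* 1: [low..high], 0: stored set */
    int complemented;       /* stored set only: use ~s */
    int low, high;          /* bounds of the factor */
    const uint8_t *bytes;   /* stored set only */
} Factor;

static int mod8(int v) { return ((v % 8) + 8) % 8; }
static int div8(int v) { return (v - mod8(v)) / 8; }

/* Membership of v in a stored-set factor; v must lie in low..high. */
static int stored_has(const Factor *f, int v)
{
    int bit = (f->bytes[div8(v) - div8(f->low)] >> mod8(v)) & 1;
    return f->complemented ? !bit : bit;
}

static int term_non_empty(const Factor *f, size_t n, int r0, int r1)
{
    int lo = r0, hi = r1, stored = 0;

    /* Every factor narrows the limits: max of lows, min of highs. */
    for (size_t i = 0; i < n; i++) {
        if (f[i].low > lo)  lo = f[i].low;
        if (f[i].high < hi) hi = f[i].high;
        if (!f[i].is_range) stored = 1;
    }
    if (lo > hi)
        return 0;          /* limits cross: the term is empty */
    if (!stored)
        return 1;          /* only ranges: nothing further to check */

    /* A stored set bounds lo..hi to its own size, so this loop is short. */
    for (long long v = lo; v <= hi; v++) {
        int all = 1;
        for (size_t i = 0; i < n && all; i++)
            if (!f[i].is_range && !stored_has(&f[i], (int)v))
                all = 0;
        if (all)
            return 1;
    }
    return 0;
}
```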

Some of this analysis can be done at compile time, some needs to be done at
run time, but after all this we can now catch set assignment errors.

Let's see this in action with a worked example.

Assume we have a set s1, declared like this

      s1 : set of 1..10;

and we have a statement like this

      s1 := [i..j] - [k];             ---- (1)

We need to know if the rhs expression generates any elements that are outside
the bounds of s1, ie are any of the elements of the rhs less than 1 or greater
than 10?

The relevant expression tree, T is '[i..j] - [k]', and the ranges of interest
are [-maxint..0] and [11..maxint], so there is an error iff

     isNonEmpty(T, -maxint..0) or isNonEmpty(T, 11..maxint)

Considering the second test (with the range 11..maxint), we need to
manipulate tree T as follows:

     [i..j] - [k]  =  [i..j] * /[k], where

     /[k] is the complement of [k].  This is [-maxint..k-1, k+1..maxint], but
     here we are considering only the range 11..maxint, so, assuming k >= 11,

               /[k] = [11..k-1, k+1..maxint]

We now have

     [i..j] - [k]  =  [i..j] * ([11..k-1] + [k+1..maxint])
                   =  [i..j] * [11..k-1] + [i..j] * [k+1..maxint]   --(2)

So if the range i..j intersects the range 11..k-1, or if i..j intersects
k+1..maxint, then the expression is non-empty.  That is, the region above the
upper bound of s1 is occupied and there is an error in statement (1) above.
For example, if i is 6 and j and k are both 11, statement (1) is

    s1 := [6..11] - [11];

which is OK.

In equation (2), we have

                      [6..11]*[11..10] + [6..11]*[12..maxint]

Both terms of the expression are empty, so statement (1) is OK.

On the other hand, if both j and k are now 12, statement (1) is

    s1 := [6..12] - [12];

which is in error because element 11 cannot legally be assigned to s1.

In equation (2), we have

                      [6..12]*[11..11] + [6..12]*[13..maxint]

which reduces to [11] + [], so statement (1) is in error.

Returning to equation (2) above, we consider the term on each side of the
plus operator separately.  Since each term is just the intersection of 2
ranges, the result is another range whose lower limit is max(lower limit
of each range) and whose upper limit is min(upper limit of each range).

The result is
                      [max(i,11) .. min(j,k-1)] + [max(i,k+1)..j]

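So statement (1) above puts an element greater than 10 into s1 iff either of
these terms is non-empty, ie iff
          max(i,11) <= min(j,k-1) or max(i,k+1) <= min(j,maxint)
If k <= 10, [k] is inside the bounds of s1, /[k] is just [11..maxint] and the
test reduces to max(i,11) <= j.  A similar test over the range -maxint..0
catches elements less than 1.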

In the p5c world, we'll refer to this technique as 'analytic comparison'.
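The worked example can be turned into a c test like the one below.  It is a hypothetical sketch, not p5c's generated code: the function names are invented, INT_MAX stands in for maxint, long long arithmetic keeps k-1 and k+1 from overflowing, and clamping k+1 to the range under test also covers the k <= 10 case.

```c
#include <limits.h>

/* Hypothetical run-time check for
       s1 := [i..j] - [k]        where s1 : set of 1..10  */
static long long lmax(long long a, long long b) { return a > b ? a : b; }
static long long lmin(long long a, long long b) { return a < b ? a : b; }

/* Is [i..j] - [k] non-empty inside r0..r1?  Inside that range the
   complement of [k] is [r0..k-1] + [k+1..r1], so the difference splits
   into two range intersections. */
static int diff_non_empty(long long i, long long j, long long k,
                          long long r0, long long r1)
{
    int below_k = lmax(i, r0) <= lmin(j, lmin(k - 1, r1));
    int above_k = lmax(i, lmax(k + 1, r0)) <= lmin(j, r1);
    return below_k || above_k;
}

/* 1 if the assignment would put an element outside 1..10 into s1. */
static int assignment_out_of_bounds(int i, int j, int k)
{
    return diff_non_empty(i, j, k, -INT_MAX, 0)      /* below 1  */
        || diff_non_empty(i, j, k, 11, INT_MAX);    /* above 10 */
}
```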

Notes:

- complicated set expressions can use up a lot of code space.
  If this matters, you may need to put {$d-} and {$d+} option comments
  around set expressions to disable debugging.

- since many subexpressions are evaluated more than once, it is important to
  consider side effects.  We need to store all subexpressions in temporary
  variables so that any side effects happen only once, and to avoid
  re-evaluating them.

We can use analytic comparison to enhance some features of set evaluation
mentioned earlier where the set size is unknown or impossibly large.

The benefits of evaluating set expressions analytically go beyond debugging.

We saw earlier that in a statement such as

          if [i..j] - [k] <= s then writeln('less than or equal to');

it is impossible to know how to build a set of the correct size to evaluate
the condition in the if statement.  Previously, the way forward was to imply a
set size by, for example, changing the condition to

          if [c1..c2] * ([i..j] - [k]) <= s then ...

where c1 & c2 are compile time constants.

But what if [c1..c2] is still impossibly large?

Note that if a and b are set expressions, then a <= b is equivalent to

            a - b = []

We can now use our isNonEmpty() function to check the set expression a-b
and determine the result of the expression.  This is without creating the
large set(s), but possibly at the cost of creating a large amount of code.

Note also that the set expression a=b is equivalent to

           (a-b) + (b-a) = []

so again we can evaluate this expression without knowing the set size.
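For the simple case where both sides are ranges, these identities give comparisons that never build a set.  A hypothetical sketch, with INT_MAX standing in for maxint and long long keeping m-1 and n+1 from overflowing:

```c
#include <limits.h>

/* Hypothetical sketch: compare the ranges [i..j] and [m..n] using
       a <= b  iff  a - b = []
       a =  b  iff  (a - b) + (b - a) = []
   No set is built, so the bounds may lie anywhere in -maxint..maxint. */
static long long lmax(long long a, long long b) { return a > b ? a : b; }
static long long lmin(long long a, long long b) { return a < b ? a : b; }

/* Is [i..j] - [m..n] non-empty?  The complement of [m..n] is
   [-maxint..m-1] + [n+1..maxint], giving two range intersections. */
static int range_diff_non_empty(long long i, long long j,
                                long long m, long long n)
{
    return lmax(i, -INT_MAX) <= lmin(j, m - 1)
        || lmax(i, n + 1)    <= lmin(j, INT_MAX);
}

static int range_le(int i, int j, int m, int n)   /* [i..j] <= [m..n] */
{
    return !range_diff_non_empty(i, j, m, n);
}

static int range_eq(int i, int j, int m, int n)   /* [i..j] = [m..n] */
{
    return !range_diff_non_empty(i, j, m, n)
        && !range_diff_non_empty(m, n, i, j);
}
```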

Finally, the set expression may have a known size, but one too extreme to
use.
For example, this expression:

   if [ i..13, maxint ] <> [3..j, k] then ...

can use analytic comparison to compare the set expressions without
constructing sets that reach up to maxint.

If you do need to compare set expressions with extreme ranges, there is
the {$Z+} compiler option available.

This option instructs p5c to evaluate the expression analytically without
constructing the component sets if and only if one of these conditions applies:

- the set expression size is unknown, or

- the set expression size is greater than the default limits set by p5c
  (normally -255..+255)

This is an advanced feature, and may involve a complex tradeoff between the
memory needed to evaluate a set comparison on the one hand, and the volume of
code needed to make the same comparison on the other.

The volume of code is caused by every term in the set expression needing to
be combined with every other term.  The way to reduce it is by keeping the set
expressions as simple as possible.

For example the expression

                 if s1 * s2 - s3 + s4 = [i3, 8..15] then ...

could be simplified by creating and using a temporary set:

                 tmpSet := s1 * s2 - s3 + s4;
                 if tmpSet = [i3, 8..15] then ...



conformant arrays
-----------------

In pascal, a conformant array is declared something like this...

procedure p( a: array[lo..hi:integer] of myType );

... and is compiled into code like this

void p_2( int lo, int hi, void *a_2c )
{
    struct { myType_1 component[hi+1-lo]; } a_2;
    a_2 = *(typeof(a_2)*)a_2c;      /* copy the value parameter */
    /* body of p */
}

Note that this uses gcc's ability to declare arrays whose sizes are run time
expressions, even inside a struct.

Apart from slight differences in run time bounds checking, accessing a
conformant array is the same as accessing a normal array.
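The index arithmetic can be sketched in standard c99, using a variable-length array parameter instead of the gcc struct extension above.  The function names are hypothetical; the point is that pascal index x maps to c index x - lo, and that x is checked against the actual bounds lo..hi passed in.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: a conformant array
   a: array[lo..hi: integer] of real passed together with its bounds. */
static double element(int lo, int hi, const double a[hi + 1 - lo], int x)
{
    if (x < lo || x > hi) {          /* run time bounds check */
        fprintf(stderr, "index %d is outside %d..%d\n", x, lo, hi);
        exit(EXIT_FAILURE);
    }
    return a[x - lo];                /* pascal index x is c index x - lo */
}

/* function sum(a: array[lo..hi: integer] of real): real */
static double sum(int lo, int hi, const double a[hi + 1 - lo])
{
    double total = 0.0;
    for (int x = lo; x <= hi; x++)
        total += element(lo, hi, a, x);
    return total;
}
```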


files
-----

pascal files are translated into the following c struct:

struct {
       FILE *f;
       int flags;
       char *name;
       T buffer;      /* T is the file's component type, eg char for text */
};

The pascal file is accessed through the c FILE pointer f.
The name is used only for files declared in the program heading, and points
to the corresponding argv[] in the c argument list.
buffer is the file buffer variable, say f^.
The flags are used to implement lazy input.
Lazy input is used in pascal so input from files is not read until it is used.
Note that for any pascal file, say text file f, that is in read mode, the
buffer variable must appear to contain the next item to be read.
In particular, the program must wait at startup (conceptually) for the first
character to be entered so it can initialise input^.  This is true for any
file that might be connected to an interactive device.

To avoid this wait to enter a character at startup, pascal just pretends that
it has already read the first character and put it into input^.  The input is
not read from the file until it is used by the program.

So for lazy input, the next item is not read from a file until it is needed,
ie as late as possible.

The flags member of the file structure has the following values:

-2    the file is a text file in write mode, and
      the current line has not been terminated with an eoln char
-1    the file is in write mode
 0    the file has not yet read the next item into buffer, see note below.
 1    the file has read the next item into the buffer
 2    the file is a text file at end of line and buffer contains space char

Additionally, if the file is a text file and flags is zero, the buffer has the
value of flags for the previous file position.  This enables correct handling
of unterminated text files.  So if eof is detected, and the previous char was
not eoln, then an eoln can be returned.  On the other hand, it is an error if
the previous char was eoln.
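Lazy input can be sketched in c for a text file using the flag values 0, 1 and 2 above.  TextFile, the text_ functions and the at_eof field are hypothetical, and the handling of an unterminated last line (buffer holding the previous flags value) is left out for brevity.

```c
#include <stdio.h>

/* Hypothetical sketch of lazy input for a pascal text file. */
typedef struct {
    FILE *f;
    int flags;      /* 0: f^ not read yet, 1: f^ holds a char, 2: at eoln */
    char buffer;    /* the buffer variable f^ */
    int at_eof;
} TextFile;

/* Read the next character into f^, but only if it is not there yet. */
static void fill(TextFile *t)
{
    if (t->flags != 0)
        return;
    int c = getc(t->f);
    if (c == EOF) {
        t->at_eof = 1;
        t->buffer = ' ';
        t->flags = 1;
    } else if (c == '\n') {
        t->buffer = ' ';            /* eoln: f^ appears to be a space */
        t->flags = 2;
    } else {
        t->buffer = (char)c;
        t->flags = 1;
    }
}

static char text_peek(TextFile *t) { fill(t); return t->buffer; }   /* f^ */
static int  text_eoln(TextFile *t) { fill(t); return t->flags == 2; }
static int  text_eof(TextFile *t)  { fill(t); return t->at_eof; }
static void text_get(TextFile *t)  { fill(t); t->flags = 0; }       /* get(f) */
```

Setting up a TextFile reads nothing; the first character is only read when something asks for f^, eoln or eof.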

A pascal file variable contains information about its attached file and the
state of that file.

This has the following consequences:
 - you cannot make a new copy of a file variable, ie if f1 and f2 are both
   file variables, you cannot legally write
          f1 := f2
   You can, however, pass a file variable to a procedure (or function) as a
   var parameter since this is passing the variable itself, not making a new
   copy.
 - file variables must be able to represent "no file attached"
 - when a record variant contains files, the file variables must not share memory
   with other variables since the description of the file must not be overwritten
   and lost.
 - all files should be closed when their file variables go out of scope.

These consequences are handled as follows:

 - new copy: all variable types know if they contain a file, and all
   operations that make a new copy of such a variable are forbidden.
 - no file attached: all variables that contain a file are initialised to
   zero.  This indicates that initially no file is attached to the variable.
 - record variants: when a record variant contains files, this variant is not
   declared as part of a union.  This avoids sharing memory with other
   variables, so file variables are not destroyed and the description of the
   file is kept intact.


Cleanup is more complicated and several scenarios need to be considered:
 - normal return from a function
 - goto out of a function
 - fatal error
 - dispose dynamic memory containing a file variable

For every block (ie procedure/function) that contains at least one file,
p5c creates a function called _Pcleanup that closes every file declared in
that block.  If that file has a name, it is external and is correctly
terminated with a newline if necessary.
When the procedure or function returns, _Pcleanup is called to close all the
files that are declared in this procedure or function.

There is a global list of all cleanup functions which is updated on every
block entry and exit. The head of this list is _Phead.

Now, when a goto statement leaves a block, all the cleanup functions in this
list are called up to the destination level, and _Phead is updated.

Similarly, a function called _Pexit is defined to call all the active cleanup
functions.  It is registered with the c library's atexit() function so that it
runs whenever the program terminates.
When a fatal error occurs, a message is printed and the program terminates.
This triggers _Pexit, which adds a new line to all external files if necessary
and closes all open files.
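The cleanup chain can be sketched as a linked list of block records.  This is a hypothetical structure, not p5c's generated code: cleanup_head plays the role of _Phead, each record's close_files stands for a block's _Pcleanup, and exit_cleanup() plays the role of _Pexit.  Identifying goto targets by nesting level is a simplification made for this sketch.

```c
#include <stdlib.h>

/* Hypothetical sketch of the cleanup chain. */
typedef struct Cleanup {
    void (*close_files)(void);   /* the block's _Pcleanup equivalent */
    int level;                   /* nesting level of the block */
    struct Cleanup *next;        /* next active (outer) block */
} Cleanup;

static Cleanup *cleanup_head = NULL;

/* Block entry: link this block's record onto the front of the list. */
static void block_enter(Cleanup *c, void (*close_files)(void), int level)
{
    c->close_files = close_files;
    c->level = level;
    c->next = cleanup_head;
    cleanup_head = c;
}

/* Normal return: unlink the innermost block and close its files. */
static void block_exit(void)
{
    Cleanup *c = cleanup_head;
    cleanup_head = c->next;
    c->close_files();
}

/* goto out of blocks: clean up every block deeper than the target. */
static void unwind_to(int level)
{
    while (cleanup_head != NULL && cleanup_head->level > level)
        block_exit();
}

/* For atexit(): close the files of every block still active. */
static void exit_cleanup(void)
{
    while (cleanup_head != NULL)
        block_exit();
}

/* Demo stand-in for a block's _Pcleanup: just counts calls. */
static int files_closed = 0;
static void close_demo_files(void) { files_closed++; }
```

The generated program would call atexit(exit_cleanup) once at startup, so the remaining files are closed when the program calls exit() or returns from main.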

Dynamic memory may also include files, and these are closed when the dynamic
memory is disposed.  This is the only case where files in dynamic memory are
closed, so if you don't want to leak file handles (and memory) dispose of all
dynamic memory before it goes out of scope.

The close file statements are generated by the procedure genCloseFiles(),
which takes 2 parameters: a type definition and a procedure parameter that
prints the variable name.  genCloseFiles() recursively scans the type
definition and outputs a close file statement for each file declaration that
it finds.  Note that:
-  it needs to account for arrays of files by generating a
   loop to close each member of the array.
-  it calls itself recursively when it finds a record that contains files.  In
   this case the procedure parameter that prints the variable name might print
   more than a simple identifier: it could print an arbitrarily complicated
   variable access (say a[i0].r.m).
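A sketch of the recursive scan in c.  It passes the variable access as a string rather than as a procedure parameter, and the Type layout, _Pclose() and the printed index syntax are all hypothetical placeholders rather than what p5c generates.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type description for this sketch only. */
typedef enum { T_FILE, T_ARRAY, T_RECORD, T_OTHER } TypeKind;

typedef struct Type {
    TypeKind kind;
    int low, high;                          /* T_ARRAY: index bounds */
    const struct Type *elem;                /* T_ARRAY: component type */
    int nfields;                            /* T_RECORD: field count */
    const char *const *field_names;         /* T_RECORD */
    const struct Type *const *field_types;  /* T_RECORD */
} Type;

/* Append printf-style text to out (capacity cap). */
static void emitf(char *out, size_t cap, const char *fmt, ...)
{
    size_t len = strlen(out);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(out + len, cap - len, fmt, ap);
    va_end(ap);
}

static int contains_file(const Type *t)
{
    if (t->kind == T_FILE)
        return 1;
    if (t->kind == T_ARRAY)
        return contains_file(t->elem);
    if (t->kind == T_RECORD)
        for (int k = 0; k < t->nfields; k++)
            if (contains_file(t->field_types[k]))
                return 1;
    return 0;
}

/* Emit close statements for every file inside variable name of type t. */
static void gen_close_files(const Type *t, const char *name, int depth,
                            char *out, size_t cap)
{
    char sub[256];

    if (!contains_file(t))
        return;
    switch (t->kind) {
    case T_FILE:
        emitf(out, cap, "_Pclose(&%s);\n", name);
        break;
    case T_ARRAY:                        /* a loop over the components */
        emitf(out, cap, "for (int i%d = %d; i%d <= %d; i%d++) {\n",
              depth, t->low, depth, t->high, depth);
        snprintf(sub, sizeof sub, "%s[i%d - %d]", name, depth, t->low);
        gen_close_files(t->elem, sub, depth + 1, out, cap);
        emitf(out, cap, "}\n");
        break;
    case T_RECORD:                       /* recurse into each field */
        for (int k = 0; k < t->nfields; k++) {
            snprintf(sub, sizeof sub, "%s.%s", name, t->field_names[k]);
            gen_close_files(t->field_types[k], sub, depth, out, cap);
        }
        break;
    case T_OTHER:
        break;
    }
}
```

For a record variable v with fields x: integer and fs: array[1..2] of text, only fs produces output: a loop that closes v.fs[i0 - 1].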


FAQs
====

Q.      I have ansi c, but not gcc, can I still use p5c?

A.      No.  p5c uses gcc extensions such as nested functions
        and statement expressions.


Q.      Can I compile pcom.pas with gnu pascal?

A.      Yes, here are the linux commands to do that:

        gpc --executable-file-name -g --pointer-checking --stack-checking \
            --standard-pascal pcom.pas
        # we now have pcom, a pascal-to-c compiler generated by gpc

        gpc-run ./pcom --gpc-rts=-nprd:pcom.pas --gpc-rts=-nprc:pcom.c < /dev/null > pcom.lst
        # now we have pcom.c, generated by the above compiler

        cc -I . -o p5c pcom.c -lm 2> pcom.err



Acknowledgements
================
p5c is a hobby project and could never have been created from scratch.
It was possible only because others have already done most of the work and
made it available for free.

original P4 creators at ETH

Steve Pemberton for describing the p4 compiler.

Scott Moore, who brought the P4 compiler up to full ISO standard and created P5

p2c, which showed that pascal can be translated into c code

gcc creators & those who ported gcc to just about every system there is.
