INTERNAL DOCUMENTATION OF THE PARSER OF PASCALIGO (Pascal-like LIGO)

This document describes the source code in the directory
ligo/src/parser/pascaligo and some maintenance workflows.

The directory contains the following:

  Doc
    The directory containing this documentation.

  Tests
    The directory containing tests.

  dune
    The Dune file for building the Pascaligo parser.

  pascaligo.ml
    A source needed for building the parser with Dune.

  check_dot_git_is_dir.sh
     A shell initially made to distinguish a git worktree from the
     working directory (currently broken).

  Stubs
    A directory containing Tezos_utils.ml, which is ignored by Dune,
    but linked from the parent directory ligo/src/parser/pascaligo
    when building with the Christian's Makefile for OCaml
    projects. (See http://github.com/rinderknecht/OCaml-build) Ignore
    them.

  .LexerMain.tag
  .Lexer.ml.tag
  .ParserMain.tag
  .Parser.mly.tag
  .links
     As ligo/src/parser/pascaligo/Stubs/Tezos_utils.ml, these files
     are only used by Christian's build system. Ignore them.

  LexerMain.ml
  ParserMain.ml
    Source for two entry points enabling Christian's build system to
    build only a standalone lexer or a standalone parser. Do not
    change, unless you change EvalOpt and use Christian's build system.

  ParserLog.mli
  ParserLog.ml
    Source for printing the AST. Used by ParserMain.ml, pascaligo.ml
    and the translator from this AST to the one needed by the
    type-checker (see directory ligo/src/simplify).

  Utils.mli
  Utils.ml
    Some utility types and functions.

  AST.mli
  AST.ml
    The abstract syntax tree of Pascaligo.

  LexToken.mli
  LexToken.mll
    The second-level lexer of Pascaligo, scanning the (lexical)
    tokens, and used to instantiate the first-level lexer
    (Lexer.mll). If you write a new flavour of LIGO, this lexer is
    likely to be modified, also if you plan to add new lexemes (beware
    that if you add a new token constructor to the type [LexToken.t],
    you may have to change the signature [Lexer.S] so you an instantiate
    the first-level lexer.

  Parser.mly
    The Menhir specification of the grammar of Pascaligo and the
    semantic actions building the AST. The syntax is actually a mix of
    two sub-flavours: one in which compound structures, like blocks,
    records, lists etc., are opened by a keyword denoting the kind of
    structure, like "block", "record", "list" etc., and are closed by
    the key word "end", and one in which those structures are opened
    by a keyword followed by a symbol, like "{", "[" etc. and closed
    by a symbol, lik "}", "]" etc. For instance,
       "record x : t; y : u end" versus "record {x : t; y : u}".
    In the future, these two styles should be separated and, in the
    meantime, it is advise to keep to one style per LIGO contract, for
    readability's sake. A first maintenance task would be to separate
    this file in two, so each parses only one style, and share the
    common parts of the grammar.
      However you change this file, the grammar must remain without LR
    conflicts, without resorting %prec or %assoc annotations.


PASCALIGO

Generalities

Pascaligo is an imperative language for writing smart contracts on the
Tezos blockchain. As such, it is compiled to Michelson, the native
language of the Tezos blockchain. Its design is inspired by Pascal,
OCaml and Michelson.

  An imperative language is a language in which the value bound to a
variable can change over time, as opposed to a constant. That change,
called _side-effect_, is often leveraged through loops, enabling data
to be modified and accumulated repeatedly. For example, here is how
the integer value associated to the variable "x" is incremented by an
instruction, called _assignment_:

x := x + 1;

A loop computing the sum of all integers from 1 to 10 would be written
as follows:

y := 0;
for x := 1 to 10
  begin
    y := y + x
  end

(Note that this is useless in practice, as a closed-form formula
exists for that computation.)

  In Pascaligo, expressions and instructions are
distinguished. Expressions are evaluated and yield values, whilst
instructions are evaluated but do not yield values. Instructions are
meant to perform side-effects, like changing the value of a variable,
whereas expressions are purely computational, like calculating a an
arithmetic means. Instructions and expressions can be compounded, and
instructions can evaluate expressions as a means to perform
side-effects.

  Pascaligo is strongly and statically typed, which means that the
composition of data and functions is contrained so the compiler can
check that no such composition can fail at run-time, e.g., because of
a meaningless expression. Pascaligo requires that variables are
declared together with their type and an initial value.

Declarations of values come in two kinds: either constants or
variables. The former are assigned only once at their declaration, and
the latter can be reassigned. The syntax is slightly different for
both. For example, the variables "x" and "y" above could have been
declared as follows:

var x : nat := 0;
var y : nat := 0;

It is possible to specify that the value of a variable will not change
(the name "variable" is misleading in that context), that is, they
remain constant:

const ten : nat = 10;
const eleven : nat = ten + 1;

Similarly, function declarations have their parameters and return
value annotated with their types. For instance,

function sum (const n : nat; const m : nat) : nat is
  begin
  end with n + m;

declarares the function "sum" that takes as argument two constant
natural numbers and return their sum. The expression whose value is
the result of calling the function is given after the keyword "with".

A another example would be

function factorial (const n : nat) : nat is
  var m : nat := 0;
  var f : nat := 1;
  begin
    if n <= 0 then f := 1
    else
      for m := 1 to n
        begin
          f := f * m
        end
   end with f

Like Pascal, Pascaligo offers procedures, as well as functions. The
difference follows the divide between expressions and instructions:
function calls are expressions, procedure calls are instructions.

A special case of functions are entry points, which are functions that
can be called when interacting with the contract after it has been
originated on the chain. The only difference with function is that
they are introduced by a keyword "entrypoint", instead of "function",
and they (currently) must have a special parameter for the storage,
and the return type must be a pair made of a new storage and a list of
operations. For example,

entrypoint contribute (storage store : store; ...)
  : store * list (operation) is
  var operations : list (operation) := nil
  begin
  ...
  end with (store, operations)

where "storage" is a keyword.

  Pascaligo features predefined types, like integers, natural numbers,
mutez, strings, maps, lists etc. and constructs to combine those into
structured types. Amongst those constructs are the records, which
group and map names (_fields_) to values of potentially different
types. For instance,

type point is
  record
    x : int;
    y : int
  end

defines a record type "store" with three fields, each made of a name
and a type. Values of record types are made by assigning a value to
each field (in any order). Like so:

const origin : point =
  record
    x = 0;
    y = 0
  end

At this point it is perhaps useful to remark that there are actually
two flavours of Pascaligo recognised by the same parser: they should
be separated in the future, and, for now, it is best to not mix both
styles. Those style differ in the manner compound constructs are
delimited

For example, the type "point" above could have been alternatively
defined as follows:

type point is
  record [
    x : int;
    y : int
  ]

and the value as

const origin : point = record [x = 0; y = 0];

When updating the contents of a record, Pascaligo offers some
syntactic support. Instead of writing all the assignments, most of
which are left unchanged, there is the record patch, which corresponds
to a functional update in OCaml. For example,

var p : point := origin;

patch p with record y = 10 end;

will update only the field "y" of "p". Of course, this example is not
impressive, but imagine that one has to update a small number of
fields in a large record. An alternative syntax is

patch p with record [y = 10];

  Another way to combine types are disjunctive types, which are a
generalisation of enumerated types found in OCaml. They can be
interpreted as being a disjoint partition of value sets, each being
disinguished by a unique tag, or _data constructor_. For example,

type u = unit
type t = A | B of u | C of int * string

See OCaml.

As in OCaml, their values can be matched against patterns:

match v with
  A -> "A"
| B Unit -> "B"
| C (_, s) -> s

Of course, we also find type aliases, which simply rename a type. More
importantly, Pascaligo has predefined types that cannot be defined by
the contract programmer. Indeed, user-defined types are monomorphic
and non-recursive, in other words, they are not parameterised and
cannot be defined in terms of themselves. This limitation precludes
defining lists of values, as a list is an inductive data type: a list
is either empty or a pair made of an item (the first item) and another
list (the remaining items). That is why Pascaligo features a native
polymorphic list type, with the condition that all list values must
instantiate the type of the items it contains. Another useful data
abstraction, native to Pascaligo, is the map, which relates values of
a given type to values of another given type. The last type predefined
by Pascaligo is the set.

Pascaligo is inspired by Pascal and OCaml. Semi-colons are separators,
but they can be used as terminators as well.

For example

type store is
  record
    goal     : nat;
    deadline : timestamp;
    backers  : map (address, nat);
    funded   : bool;
  end

can alternatively be written

type store is
  record
    goal     : nat;
    deadline : timestamp;
    backers  : map (address, nat);
    funded   : bool
  end

Only non-recursive types are user-definable in LIGO.

A predefined recursive and polymorphic type is the list (or
stack). The syntax is the same as in OCaml, with some extra syntax for
convenience. For example, the empty list can be written in three
different ways (only one is recommended by contract):

list end
list []
nil

A non-empty list starts with the keyword "list" and comes in two
flavours (only one is recommended by contract):

list 1; 2; 3 end
list [1; 2; 3]

To push (cons) and element in a list, the infix operator is "::", as
in OCaml:

1::2::l

All user-definable values in Pascaligo are monomorphic and must be
annotated with a their types, except in arithmetic or boolean
expressions, or at their declaration (since their type is given in the
left-hand side). In particular, empty lists need to be annotated, like
so

1 :: (nil : list (int))

But

var l : list (int) := list [];

works, as the type is available.