diff --git a/src/proto/alpha/docs/language.md b/src/proto/alpha/docs/language.md index 3ca71af18..ebdcfe69f 100644 --- a/src/proto/alpha/docs/language.md +++ b/src/proto/alpha/docs/language.md @@ -1194,15 +1194,23 @@ These are macros are simply more convenient syntax for various common operations > SET_CDR => CAR; PAIR + * `SET_C[AD]+R`: + A syntactic sugar for setting fields in nested pairs. + + > SET_CA(\rest=[AD]+)R ; C / S => + { DUP ; DIP { CAR ; SET_C(\rest)R } ; CDR ; SWAP ; PAIR } ; C / S + > SET_CD(\rest=[AD]+)R ; C / S => + { DUP ; DIP { CDR ; SET_C(\rest)R } ; CAR ; PAIR } ; C / S IX - Concrete syntax ---------------------- The concrete language is very close to the formal notation of the specification. Its structure is extremely simple: an expression in the -language can only be one of the three following constructs. +language can only be one of the four following constructs. - 1. A constant (integer or string). + 1. An integer. + 1. A character string. 2. The application of a primitive to a sequence of expressions. 3. A sequence of expressions. @@ -1213,109 +1221,54 @@ There are two kinds of constants: 1. Integers or naturals in decimal (no prefix), hexadecimal (0x prefix), octal (0o prefix) or binary (0b prefix). 2. Strings with usual escapes `\n`, `\t`, `\b`, `\r`, `\\`, `\"`. - Strings are encoding agnostic sequences of bytes. Non printable - characters can be escaped by 3 digits decimal codes `\ddd` or - 2 digit hexadecimal codes `\xHH`. + The encoding of a Michelson source file must be UTF-8, + and non-ASCII characters can only appear in strings and comments. + No line break can appear in a string. ### Primitive applications -In the specification, primitive applications always luckily fit on a -single line. In this case, the concrete syntax is exactly the formal -notation. However, it is sometimes necessary to break lines in a real -program, which can be done as follows. +A primitive application is a name followed by arguments -As in Python or Haskell, the concrete syntax of the language is -indentation sensitive. The elements of a syntactical block, such as -all the elements of a sequence, or all the parameters of a primitive, -must be written with the exact same left margin in the program source -code. This is unlike in C-like languages, where blocks are delimited -with braces and the margin is ignored by the compiled. + prim arg1 arg2 -The simplest form requires to break the line after the primitive name -and after every argument. Argument must be indented by at least one -more space than the primitive, and all arguments must sit on the exact -same column. +When a primitive application is the argument to another primitive +application, it must be wrapped with parentheses. - PRIM - arg1 - arg2 - ... - -If an argument of a primitive application is a primitive application -itself, its arguments must be pushed even further on the right, to -lift any ambiguity, as in the following example. - - PRIM1 - PRIM2 - arg1_prim2 - arg2_prim2 - arg2_prim1 - -It is possible to put successive arguments on a single line using -a semicolon as a separator: - - PRIM - arg1; arg2 - arg3; arg4 - -It is also possible to add arguments on the same line as the primitive -as a lighter way to write simple expressions. An other representation -of the first example is: - - PRIM arg1 arg2 ... - -It is possible to mix both notations as in: - - PRIM arg1 arg2 - arg3 - arg4 - -Or even: - - PRIM arg1 arg2 - arg3; arg4 - -Both equivalent to: - - PRIM - arg1 - arg2 - arg3 - arg4 - -Trailing semicolons are ignored: - - PRIM - arg1; - arg2 - -Calling a primitive with a compound argument on a single line is -allowed by wrapping with parentheses. Another notation for the second -example is: - - PRIM1 (PRIM2 arg1_prim2 arg2_prim2) arg2_prim1 + prim (prim1 arg11 arg12) (prim2 arg21 arg22) ### Sequences Successive expression can be grouped as a single sequence expression -using braces delimiters and semicolon separators. +using curly braces as delimiters and semicolon as separators. { expr1 ; expr2 ; expr3 ; expr4 } -A sequence block can be split on several lines. In this situation, the -whole block, including the closing brace, must be indented with -respect to the first instruction. +A sequence can be passed as argument to a primitive. - { expr1 ; expr2 - expr3 ; expr4 } + prim arg1 arg2 { arg3_expr1 ; arg3_expr2 } -Blocks can be passed as argument to a primitive. +Primitive applications right inside a sequence cannot be wrapped. + { (prim arg1 arg2) } # is not ok - PRIM arg1 arg2 - { arg3_expr1 ; arg3_expr2 - arg3_expr3 ; arg3_expr4 } +### Indentation +To remove ambiguities for human readers, the parser enforces some +indentation rules. + + - For sequences: + - All expressions in a sequence must be aligned on the same column. + - An exception is made when consecutive expressions fit on the same + line, as long as the first of them is correctly aligned. + - All expressions in a sequence must be indented to the right of the + opening curly brace by at least one column. + - The closing curly brace cannot be on the left of the opening one. + - For primitive applications: + - All arguments in an application must be aligned on the same column. + - An exception is made when consecutive arguments fit on the same + line, as long as the first of them is correctly aligned. + - All arguments in a sequence must be indented to the right of the + primitive name by at least one column. ### Conventions @@ -1353,11 +1306,8 @@ All domain specific constants are strings with specific formats: To prevent errors, control flow primitives that take instructions as parameters require sequences in the concrete syntax. - IF { instr1_true ; instr2_true ; ... } { instr1_false ; instr2_false ; ... } - - IF - { instr1_true ; instr2_true ; ... } - { instr1_false ; instr2_false ; ... } + IF { instr1_true ; instr2_true ; ... } + { instr1_false ; instr2_false ; ... } ### Main program structure @@ -1373,9 +1323,12 @@ A hash sign (`#`) anywhere outside of a string literal will make the rest of the line (and itself) completely ignored, as in the following example. - PUSH nat 1 # pushes 1 - PUSH nat 2 # pushes 2 - ADD # computes 2 + 1 + { PUSH nat 1 ; # pushes 1 + PUSH nat 2 ; # pushes 2 + ADD } # computes 2 + 1 + +Comments that span on multiple lines or that stop before the end of +the line can also be written, using C-like delimiters (`/* ... */`). X - Examples ------------- @@ -1419,34 +1372,36 @@ Hence, the global data of the contract has the following type 'g = pair - pair timestamp tez - pair (contract unit unit) (contract unit unit) + (pair timestamp tez) + (pair (contract unit unit) (contract unit unit)) Following the contract calling convention, the code is a lambda of type lambda - pair unit 'g - pair unit 'g + (pair unit 'g) + (pair unit 'g) written as lambda - pair unit - pair - pair timestamp tez - pair (contract unit unit) (contract unit unit) - pair unit - pair - pair timestamp tez - pair (contract unit unit) (contract unit unit) + (pair + unit + (pair + (pair timestamp tez) + (pair (contract unit unit) (contract unit unit)))) + (pair + unit + (pair + (pair timestamp tez) + (pair (contract unit unit) (contract unit unit)))) The complete source `reservoir.tz` is: parameter timestamp ; storage - pair - (pair timestamp tez) # T N - (pair (contract unit unit) (contract unit unit)) ; # A B + (pair + (pair timestamp tez) # T N + (pair (contract unit unit) (contract unit unit))) ; # A B return unit ; code { DUP ; CDAAR ; # T @@ -1499,15 +1454,15 @@ The complete source `scrutable_reservoir.tz` is: parameter timestamp ; storage - pair - string # S - pair - timestamp # T - pair - (pair tez tez) ; # P N - pair - (contract unit unit) # X - pair (contract unit unit) (contract unit unit) ; # A B + (pair + string # S + (pair + timestamp # T + (pair + (pair tez tez) ; # P N + (pair + (contract unit unit) # X + (pair (contract unit unit) (contract unit unit)))))) ; # A B return unit ; code { DUP ; CDAR # S @@ -1569,19 +1524,18 @@ peas, and the accounts of the buyer `B`, the seller `S` and the warehouse `W`. These parameters as grouped in the global storage as follows: Pair - Pair (Pair Q (Pair T Z)) - Pair - (Pair K C) - (Pair (Pair B S) W) + (Pair (Pair Q (Pair T Z))) + (Pair + (Pair K C) + (Pair (Pair B S) W)) of type pair - pair nat (pair timestamp timestamp) - pair - pair tez tez - pair (pair account account) account - + (pair nat (pair timestamp timestamp)) + (pair + (pair tez tez) + (pair (pair account account) account)) The 24 hours after timestamp `Z` are for the buyer and seller to store their collateral `(Q * C)`. For this, the contract takes a string as @@ -1614,22 +1568,22 @@ Hence, the global storage is a pair, with the counters on the left, and the constant parameters on the right, initially as follows. Pair - Pair 0 (Pair 0_00 0_00) - Pair - Pair (Pair Q (Pair T Z)) - Pair - (Pair K C) - (Pair (Pair B S) W) + (Pair 0 (Pair 0_00 0_00)) + (Pair + (Pair (Pair Q (Pair T Z))) + (Pair + (Pair K C) + (Pair (Pair B S) W))) of type pair - pair nat (pair tez tez) - pair - pair nat (pair timestamp timestamp) - pair - pair tez tez - pair (pair account account) account + (pair nat (pair tez tez)) + (pair + (pair nat (pair timestamp timestamp)) + (pair + (pair tez tez) + (pair (pair account account) account))) The parameter of the transaction will be either a transfer from the buyer or the seller or a delivery notification from the warehouse of @@ -1657,15 +1611,15 @@ The complete source `forward.tz` is: parameter (or string nat) ; return unit ; storage - pair - pair nat (pair tez tez) # counter from_buyer from_seller - pair - pair nat (pair timestamp timestamp) # Q T Z - pair - pair tez tez # K C - pair - pair (contract unit unit) (contract unit unit) # B S - (contract unit unit); # W + (pair + (pair nat (pair tez tez)) # counter from_buyer from_seller + (pair + (pair nat (pair timestamp timestamp)) # Q T Z + (pair + (pair tez tez) # K C + (pair + (pair (contract unit unit) (contract unit unit)) # B S + (contract unit unit))))) ; # W code { DUP ; CDDADDR ; # Z PUSH nat 86400 ; SWAP ; ADD ; # one day in second