Tutorial: data_encoding

This commit is contained in:
Milo Davis 2017-10-30 16:04:39 +01:00 committed by Benjamin Canou
parent 14597b8f7e
commit 7fb3b3ca85

172
docs/data_encoding.org Normal file
View File

@ -0,0 +1,172 @@
* The data_encoding library
Throughout the Tezos protocol, data is serialized so that it can be used via RPC,
written to disk, or placed in a block. This serialization/deserialization is handled
via the [[../src/minutils/data_encoding.mli][data_encoding library]]
by providing a set primitive encodings and a variety of combinators.
** Examples/Tutorial
*** Encoding an integer
Integers are defined as other concrete data types with a generic encoding type =type 'a encoding=.
This means that it is an encoding to/from type =int=. There are a variety of ways to encode an integer,
depending on what binary serialization you want to achieve:
- =Data_encoding.int8=
- =Data_encoding.uint8=
- =Data_encoding.int16=
- =Data_encoding.uint16=
- =Data_encoding.int31=
- =Data_encoding.int32=
- =Data_encoding.int64=
For example, an encoding that represents a 31 bit integer has type
=Data_encoding.int31 = int Data_encoding.encoding=.
#+BEGIN_SRC ocaml
let int31_encoding = Data_encoding.int31
#+END_SRC
*** Encoding an object
Encoding a single integer is fairly uninteresting. The Data_encoding library provides a number of
combinators that can be used to build more complicated objects. Consider the type that represents an
interval from the first number to the second:
#+BEGIN_SRC ocaml
type interval = int64 * int64
#+END_SRC
We can define an encoding for this type as:
#+BEGIN_SRC ocaml
let interval_encoding =
Data_encoding.(obj2 (req "min" int64) (req "max" int64))
#+END_SRC
In the example above we construct a new value =interval_encoding= by combining
two int64 integers using the =obj2= constructor.
The library provides diffrent constructors, i.e. for objects
that have no data (=Data_encoding.empty=), constructors for object up to 10 fields,
contructors for tuples, list, etc.
These are serialized to binary by converting each internal object to binary and
placing them in the order of the original object and to JSON as a JSON object with field names.
*** Lists, arrays, and options
List, Arrays and options types can by built on top of ground data types.
#+BEGIN_SRC ocaml
type interval_list = interval list
type interval_array = interval array
type interval_option = interval option
#+END_SRC
And the encoders for these types as
#+BEGIN_SRC ocaml
let interval_list_encoding = Data_encoding.list interval_encoding
let interval_array_encoding = Data_encoding.array interval_encoding
let interval_option_encoding = Data_encoding.option interval_encoding
#+END_SRC
*** Union types
The Tezos codebase makes heavy use of variant types. Consider the following
variant type:
#+BEGIN_SRC ocaml
type variant = B of bool
| S of string
#+END_SRC
Encoding for this types can be expressed as:
#+BEGIN_SRC ocaml
let variant_encoding =
Data_encoding.(union ~tag_size:`Uint8
[ case
bool
(function B b -> Some b | _ -> None)
(fun b -> B b) ;
case
string
(function S s -> Some s | _ -> None)
(fun s -> S s) ])
#+END_SRC
This variant encoding is a bit more complicated. Let's look at the parts of the type:
- We include an optimization hint to the binary encoding to inform it of the number of elements we expect in the tag.
In most cases, we can use =`Uint8=, which allows you to have up to 256 possible cases (default).
- We provide a function to wrap the datatype. The encoding works by repeatedly trying to
decode the datatype using these functions until one returns =Some payload=. This payload
is then encoded using the data_encoding specified.
- We specify a function from the encoded type to the actual datatype.
Since the library does not provide an exhaustivity check on these constructors,
the user must be careful when constructucting unin types to avoid unfortunate runtime failures.
** How the Data_encoding module works
This section is 100% optional. You do not need to understand this section to use the library.
The library uses GADTs to provide type-safe serialization/deserialization. From there,
a runtime representation of JSON objects is parsed into the typesafe version.
First we define an untyped JSON AST:
#+BEGIN_SRC ocaml
type json =
[ `O of (string * json) list
| `Bool of bool
| `Float of float
| `A of json list
| `Null
| `String of string ]
#+END_SRC
This is then parsed into a typed AST ( we eliminate several cases for clarity):
#+BEGIN_SRC ocaml
type 'a desc =
| Null : unit desc
| Empty : unit desc
| Bool : bool desc
| Int64 : Int64.t desc
| Float : float desc
| Bytes : Kind.length -> MBytes.t desc
| String : Kind.length -> string desc
| String_enum : Kind.length * (string * 'a) list -> 'a desc
| Array : 'a t -> 'a array desc
| List : 'a t -> 'a list desc
| Obj : 'a field -> 'a desc
| Objs : Kind.t * 'a t * 'b t -> ('a * 'b) desc
| Tup : 'a t -> 'a desc
| Union : Kind.t * tag_size * 'a case list -> 'a desc
| Mu : Kind.enum * string * ('a t -> 'a t) -> 'a desc
| Conv :
{ proj : ('a -> 'b) ;
inj : ('b -> 'a) ;
encoding : 'b t ;
schema : Json_schema.schema option } -> 'a desc
| Describe :
{ title : string option ;
description : string option ;
encoding : 'a t } -> 'a desc
| Def : { name : string ;
encoding : 'a t } -> 'a desc
#+END_SRC
- The first set of constructures define all ground types.
- The constructors for =Bytes=, =String= and =String_enum= includes a length fields in order to provide safe binary serialization.
- The constructors for =Array= and =List= are used by the combinators we saw earlier.
- The =Obj= and =Objs= constructors create JSON objects.
These are wrapped in the =Conv= constructor to remove nesting that results when these constructors are used naively.
- The =Mu= constructor is used to create self-referential definitions.
- The =Conv= constructor allows you to clean up a nested definition or compute another type from an existing one.
- The =Describe= and =Def= constructors are used to add documentation
The library also provides various wrappers and convenience functions to make constructing these objects easier.
Reading the documentation in the [[../src/minutils/data_encoding.mli][mli file]] should orient
you on how to use these functions and their purposes.