* The data_encoding library
Throughout the Tezos protocol, data is serialized so that it can be used via RPC,
written to disk, or placed in a block. This serialization/deserialization is handled
by the [[../src/minutils/data_encoding.mli][data_encoding library]],
which provides a set of primitive encodings and a variety of combinators.

** Examples/Tutorial
*** Encoding an integer

Integers, like other concrete data types, are encoded with a value of the generic type ='a encoding=.
For integers, this means an encoding to and from an integer type such as =int=. There are several ways to encode an integer,
depending on the binary serialization you want to achieve:
- =Data_encoding.int8=
- =Data_encoding.uint8=
- =Data_encoding.int16=
- =Data_encoding.uint16=
- =Data_encoding.int31=
- =Data_encoding.int32=
- =Data_encoding.int64=

For example, =Data_encoding.int31=, the encoding for 31-bit integers, has type
=int Data_encoding.encoding=.

#+BEGIN_SRC ocaml
let int31_encoding = Data_encoding.int31
#+END_SRC

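To see one of these encodings in action, a value can be pushed through the JSON backend and read back.
The following is a minimal sketch, assuming the =Data_encoding.Json.construct= and =Data_encoding.Json.destruct=
functions declared in the mli file are available in your version of the library. The name =roundtrip_json= is a
hypothetical helper introduced here for illustration only.

#+BEGIN_SRC ocaml
(* Hypothetical helper: serialize a value to JSON with an encoding,
   then deserialize it again with the same encoding. *)
let roundtrip_json encoding value =
  let json = Data_encoding.Json.construct encoding value in
  Data_encoding.Json.destruct encoding json

let () =
  (* In-range values survive the round trip unchanged. *)
  assert (roundtrip_json Data_encoding.int31 42 = 42) ;
  assert (roundtrip_json Data_encoding.uint8 255 = 255)
#+END_SRC
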
*** Encoding an object
Encoding a single integer is fairly uninteresting. The Data_encoding library provides a number of
combinators that can be used to build more complicated objects. Consider the type that represents an
interval from the first number to the second:

#+BEGIN_SRC ocaml
type interval = int64 * int64
#+END_SRC

We can define an encoding for this type as:

#+BEGIN_SRC ocaml
let interval_encoding =
  Data_encoding.(obj2 (req "min" int64) (req "max" int64))
#+END_SRC

In the example above we construct a new value =interval_encoding= by combining
two =int64= encodings using the =obj2= combinator.

The library provides a range of such constructors: one for objects
that have no data (=Data_encoding.empty=), constructors for objects with up to 10 fields,
constructors for tuples, lists, etc.

Objects are serialized to binary by converting each field to binary and
placing the results in the order of the original object, and to JSON as a JSON object keyed by field names.

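The JSON shape of an object encoding can be checked directly. This is a minimal sketch, assuming
=Data_encoding.Json.construct= and =Data_encoding.Json.destruct= as declared in the mli file; the
=interval_encoding= definition is repeated so that the block stands on its own.

#+BEGIN_SRC ocaml
type interval = int64 * int64

let interval_encoding =
  Data_encoding.(obj2 (req "min" int64) (req "max" int64))

let () =
  let interval : interval = (1L, 10L) in
  let json = Data_encoding.Json.construct interval_encoding interval in
  (* The object encoding produces a JSON object keyed by the field names. *)
  (match json with
   | `O fields ->
       assert (List.mem_assoc "min" fields) ;
       assert (List.mem_assoc "max" fields)
   | _ -> assert false) ;
  (* Reading the JSON back yields the original pair. *)
  assert (Data_encoding.Json.destruct interval_encoding json = interval)
#+END_SRC
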
*** Lists, arrays, and options
Lists, arrays, and option types can be built on top of ground data types.

#+BEGIN_SRC ocaml
type interval_list = interval list

type interval_array = interval array

type interval_option = interval option
#+END_SRC

The encodings for these types are:

#+BEGIN_SRC ocaml
let interval_list_encoding = Data_encoding.list interval_encoding
let interval_array_encoding = Data_encoding.array interval_encoding
let interval_option_encoding = Data_encoding.option interval_encoding
#+END_SRC

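These container encodings compose with the element encoding without further work. A minimal sketch of a
JSON round trip, assuming =Data_encoding.Json.construct= and =Data_encoding.Json.destruct= as declared in
the mli file; =roundtrip_json= is a hypothetical helper, and the earlier definitions are repeated so the
block is self-contained.

#+BEGIN_SRC ocaml
let interval_encoding =
  Data_encoding.(obj2 (req "min" int64) (req "max" int64))

let interval_list_encoding = Data_encoding.list interval_encoding
let interval_option_encoding = Data_encoding.option interval_encoding

(* Hypothetical helper: write a value to JSON and read it back. *)
let roundtrip_json encoding value =
  Data_encoding.Json.destruct encoding
    (Data_encoding.Json.construct encoding value)

let () =
  let intervals = [ (0L, 1L) ; (5L, 9L) ] in
  assert (roundtrip_json interval_list_encoding intervals = intervals) ;
  assert (roundtrip_json interval_list_encoding [] = []) ;
  assert (roundtrip_json interval_option_encoding None = None) ;
  assert (roundtrip_json interval_option_encoding (Some (2L, 3L)) = Some (2L, 3L))
#+END_SRC
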
*** Union types
The Tezos codebase makes heavy use of variant types. Consider the following
variant type:

#+BEGIN_SRC ocaml
type variant = B of bool
             | S of string
#+END_SRC

An encoding for this type can be expressed as:

#+BEGIN_SRC ocaml
let variant_encoding =
  Data_encoding.(union ~tag_size:`Uint8
                   [ case
                       bool
                       (function B b -> Some b | _ -> None)
                       (fun b -> B b) ;
                     case
                       string
                       (function S s -> Some s | _ -> None)
                       (fun s -> S s) ])
#+END_SRC

This variant encoding is a bit more complicated. Let's look at the parts of the definition:
- We include an optimization hint to the binary encoding to inform it of the number of elements we expect in the tag.
  In most cases, we can use =`Uint8= (the default), which allows up to 256 possible cases.
- For each case, we provide a function that projects the datatype onto that case's payload. Encoding works by
  trying these functions in turn until one returns =Some payload=. This payload
  is then encoded using the encoding specified for the case.
- For each case, we also specify a function from the decoded payload back to the actual datatype.

Since the library does not provide an exhaustivity check on these constructors,
the user must be careful when constructing union types to avoid unfortunate runtime failures.

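The consequence of a missing case can be illustrated without the library. The sketch below is a simplified
stand-in, not the library's implementation: the record type =case=, the function =find_case=, and the extra
constructor =I= are hypothetical and model only the step of trying each projection in order, with payloads
rendered as strings.

#+BEGIN_SRC ocaml
(* The variant from above, extended with a hypothetical third constructor. *)
type variant = B of bool | S of string | I of int

(* A simplified model of a union case: a tag, plus a projection that
   returns [Some rendered_payload] when the value belongs to this case. *)
type 'a case = { tag : int ; proj : 'a -> string option }

(* Try each case in order and return the first one that matches. *)
let rec find_case cases value =
  match cases with
  | [] -> None
  | { tag ; proj } :: rest ->
      (match proj value with
       | Some payload -> Some (tag, payload)
       | None -> find_case rest value)

(* The case for [I] is deliberately missing. The compiler does not
   complain, because each projection ends with a catch-all branch. *)
let cases =
  [ { tag = 0 ; proj = (function B b -> Some (string_of_bool b) | _ -> None) } ;
    { tag = 1 ; proj = (function S s -> Some s | _ -> None) } ]

let () =
  assert (find_case cases (B true) = Some (0, "true")) ;
  assert (find_case cases (S "hi") = Some (1, "hi")) ;
  (* No case matches [I 3]; this is discovered only at runtime. *)
  assert (find_case cases (I 3) = None)
#+END_SRC

Keeping the case list next to the type definition, and reviewing both together whenever a constructor is
added, is one way to reduce the risk of such a gap.
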
** How the Data_encoding module works

This section is entirely optional. You do not need to understand it to use the library.

The library uses GADTs to provide type-safe serialization/deserialization. From there,
a runtime representation of JSON objects is parsed into the type-safe version.

First we define an untyped JSON AST:

#+BEGIN_SRC ocaml
type json =
  [ `O of (string * json) list
  | `Bool of bool
  | `Float of float
  | `A of json list
  | `Null
  | `String of string ]
#+END_SRC

This is then parsed into a typed AST (several cases are omitted for clarity):

#+BEGIN_SRC ocaml
type 'a desc =
  | Null : unit desc
  | Empty : unit desc
  | Bool : bool desc
  | Int64 : Int64.t desc
  | Float : float desc
  | Bytes : Kind.length -> MBytes.t desc
  | String : Kind.length -> string desc
  | String_enum : Kind.length * (string * 'a) list -> 'a desc
  | Array : 'a t -> 'a array desc
  | List : 'a t -> 'a list desc
  | Obj : 'a field -> 'a desc
  | Objs : Kind.t * 'a t * 'b t -> ('a * 'b) desc
  | Tup : 'a t -> 'a desc
  | Union : Kind.t * tag_size * 'a case list -> 'a desc
  | Mu : Kind.enum * string * ('a t -> 'a t) -> 'a desc
  | Conv :
      { proj : ('a -> 'b) ;
        inj : ('b -> 'a) ;
        encoding : 'b t ;
        schema : Json_schema.schema option } -> 'a desc
  | Describe :
      { title : string option ;
        description : string option ;
        encoding : 'a t } -> 'a desc
  | Def : { name : string ;
            encoding : 'a t } -> 'a desc
#+END_SRC

- The first set of constructors defines all ground types.
- The constructors for =Bytes=, =String= and =String_enum= include a length field in order to provide safe binary serialization.
- The constructors for =Array= and =List= are used by the combinators we saw earlier.
- The =Obj= and =Objs= constructors create JSON objects.
  These are wrapped in the =Conv= constructor to remove the nesting that results when these constructors are used naively.
- The =Mu= constructor is used to create self-referential definitions.
- The =Conv= constructor allows you to clean up a nested definition or compute another type from an existing one.
- The =Describe= and =Def= constructors are used to add documentation.

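The way a GADT ties each description to an OCaml type can be shown with a much smaller definition. The
following is a reduced, self-contained sketch of the idea and not the library's actual code: the =Obj2=
constructor and the functions =construct= and =destruct= are simplified stand-ins written for this example.

#+BEGIN_SRC ocaml
(* Untyped JSON, as in the AST above. *)
type json =
  [ `O of (string * json) list
  | `Bool of bool
  | `Float of float
  | `A of json list
  | `Null
  | `String of string ]

(* A reduced typed description: the type parameter records which OCaml
   type each description reads and writes. *)
type 'a desc =
  | Bool : bool desc
  | Float : float desc
  | String : string desc
  | List : 'a desc -> 'a list desc
  | Obj2 : (string * 'a desc) * (string * 'b desc) -> ('a * 'b) desc

(* Typed value -> untyped JSON. *)
let rec construct : type a. a desc -> a -> json =
 fun desc value ->
  match desc with
  | Bool -> `Bool value
  | Float -> `Float value
  | String -> `String value
  | List d -> `A (List.map (construct d) value)
  | Obj2 ((n1, d1), (n2, d2)) ->
      let (v1, v2) = value in
      `O [ (n1, construct d1 v1) ; (n2, construct d2 v2) ]

(* Untyped JSON -> typed value; raises if the JSON has the wrong shape. *)
let rec destruct : type a. a desc -> json -> a =
 fun desc json ->
  match desc, json with
  | Bool, `Bool b -> b
  | Float, `Float f -> f
  | String, `String s -> s
  | List d, `A items -> List.map (destruct d) items
  | Obj2 ((n1, d1), (n2, d2)), `O fields ->
      (destruct d1 (List.assoc n1 fields), destruct d2 (List.assoc n2 fields))
  | _ -> failwith "unexpected JSON shape"

(* Usage: the description's type, [(float * float) desc], fixes the
   type of values that can be written and read. *)
let interval = Obj2 (("min", Float), ("max", Float))

let () =
  let json = construct interval (1., 10.) in
  assert (json = `O [ ("min", `Float 1.) ; ("max", `Float 10.) ]) ;
  assert (destruct interval json = (1., 10.))
#+END_SRC

Because the return type of =destruct= depends on the description passed in, a mismatch between a description
and the value it is used with is rejected at compile time rather than at runtime.
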
The library also provides various wrappers and convenience functions to make constructing these objects easier.
Reading the documentation in the [[../src/minutils/data_encoding.mli][mli file]] should orient
you on how to use these functions and their purposes.
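
As one illustration of these convenience functions, =conv= builds an encoding for a new type from an
existing encoding plus a pair of conversion functions. This is a minimal sketch, assuming =Data_encoding.conv=
takes the projection, the injection, and the underlying encoding in that order, and that
=Data_encoding.Json.construct= and =Data_encoding.Json.destruct= are available as declared in the mli file.
The record type =range= and the helpers =to_pair= and =of_pair= are hypothetical names chosen for this example.

#+BEGIN_SRC ocaml
(* Hypothetical record type that we want to serialize. *)
type range = { min : int64 ; max : int64 }

(* Record -> tuple, used when writing. *)
let to_pair { min ; max } = (min, max)

(* Tuple -> record, used when reading. *)
let of_pair (min, max) = { min ; max }

(* Reuse the two-field object encoding and convert at the boundary. *)
let range_encoding =
  Data_encoding.(
    conv to_pair of_pair (obj2 (req "min" int64) (req "max" int64)))

let () =
  let r = { min = 1L ; max = 10L } in
  let json = Data_encoding.Json.construct range_encoding r in
  assert (Data_encoding.Json.destruct range_encoding json = r)
#+END_SRC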