Tutorial: data_encoding
docs/data_encoding.org
* The data_encoding library
Throughout the Tezos protocol, data is serialized so that it can be used via RPC,
written to disk, or placed in a block. This serialization/deserialization is handled
by the [[../src/minutils/data_encoding.mli][data_encoding library]],
which provides a set of primitive encodings and a variety of combinators.

** Examples/Tutorial
*** Encoding an integer
Every encoding is a value of the generic type ='a encoding=, where ='a= is the OCaml type being encoded.
Integers, like other concrete data types, have their own encodings: for example, an encoding of
type =int encoding= is an encoding to/from type =int=. There are a variety of ways to encode an integer,
depending on what binary serialization you want to achieve:
- =Data_encoding.int8=
- =Data_encoding.uint8=
- =Data_encoding.int16=
- =Data_encoding.uint16=
- =Data_encoding.int31=
- =Data_encoding.int32=
- =Data_encoding.int64=

For example, an encoding that represents a 31 bit integer has type
=Data_encoding.int31 : int Data_encoding.encoding=.
(The =int32= and =int64= encodings work on the OCaml types =int32= and =int64= rather than =int=.)

#+BEGIN_SRC ocaml
let int31_encoding = Data_encoding.int31
#+END_SRC

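The practical difference between these encodings is how many bytes the binary form occupies.
The sketch below makes that concrete using only the OCaml standard library. It does not call
=Data_encoding=: the =write_*= helper names are hypothetical, and the fixed-width big-endian
layout is an assumption made for this illustration.

#+BEGIN_SRC ocaml
(* Stdlib-only illustration; it does not call Data_encoding.
   Assumption: fixed-width, big-endian layout. *)

(* Write [n] as an unsigned 8-bit integer: 1 byte. *)
let write_uint8 (n : int) : Bytes.t =
  let b = Bytes.create 1 in
  Bytes.set_uint8 b 0 n ;
  b

(* Write [n] as an unsigned 16-bit integer: 2 bytes. *)
let write_uint16 (n : int) : Bytes.t =
  let b = Bytes.create 2 in
  Bytes.set_uint16_be b 0 n ;
  b

(* Write [n] as a signed 32-bit integer: 4 bytes. *)
let write_int32 (n : int32) : Bytes.t =
  let b = Bytes.create 4 in
  Bytes.set_int32_be b 0 n ;
  b

(* Write [n] as a signed 64-bit integer: 8 bytes. *)
let write_int64 (n : int64) : Bytes.t =
  let b = Bytes.create 8 in
  Bytes.set_int64_be b 0 n ;
  b

(* Print the size of the same number under each width. *)
let () =
  List.iter
    (fun (name, bytes) ->
       Printf.printf "%-6s -> %d byte(s)\n" name (Bytes.length bytes))
    [ ("uint8", write_uint8 200) ;
      ("uint16", write_uint16 200) ;
      ("int32", write_int32 200l) ;
      ("int64", write_int64 200L) ]
#+END_SRC

Choosing the narrowest encoding that fits the expected range keeps the serialized data small.
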
*** Encoding an object
Encoding a single integer is fairly uninteresting. The Data_encoding library provides a number of
combinators that can be used to build more complicated objects. Consider the type that represents an
interval from the first number to the second:

#+BEGIN_SRC ocaml
type interval = int64 * int64
#+END_SRC

We can define an encoding for this type as:

#+BEGIN_SRC ocaml
let interval_encoding =
  Data_encoding.(obj2 (req "min" int64) (req "max" int64))
#+END_SRC

In the example above we construct a new value =interval_encoding= by combining
two =int64= encodings using the =obj2= constructor.

The library provides different constructors, e.g. for objects
that have no data (=Data_encoding.empty=), for objects with up to 10 fields,
for tuples, lists, etc.

These are serialized to binary by converting each field to binary and
placing the results in the order of the original object, and to JSON as a JSON object with field names.

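To illustrate the two serializations described above, here is a hand-written, stdlib-only sketch
for the =interval= type. It is not the code the library runs: the function names are hypothetical,
and both the big-endian layout and the choice to render 64-bit numbers as JSON strings are
assumptions of the sketch.

#+BEGIN_SRC ocaml
type interval = int64 * int64

(* Binary form: each field is written in declaration order.
   Assumption: 8 bytes per int64, big-endian. *)
let interval_to_bytes ((min, max) : interval) : Bytes.t =
  let b = Bytes.create 16 in
  Bytes.set_int64_be b 0 min ;
  Bytes.set_int64_be b 8 max ;
  b

(* Reading back uses the same order and offsets. *)
let interval_of_bytes (b : Bytes.t) : interval =
  (Bytes.get_int64_be b 0, Bytes.get_int64_be b 8)

(* JSON form: an object keyed by the field names given to [req].
   Assumption: numbers are rendered as strings to avoid precision loss. *)
let interval_to_json ((min, max) : interval) : string =
  Printf.sprintf "{ \"min\": \"%Ld\", \"max\": \"%Ld\" }" min max

let () =
  let i = (1L, 10L) in
  Printf.printf "binary: %d bytes\n" (Bytes.length (interval_to_bytes i)) ;
  Printf.printf "json:   %s\n" (interval_to_json i)
#+END_SRC

The point of the combinators is that a single =interval_encoding= value replaces all three
hand-written functions and keeps them consistent with each other.
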
*** Lists, arrays, and options
List, array, and option types can be built on top of ground data types.

#+BEGIN_SRC ocaml
type interval_list = interval list

type interval_array = interval array

type interval_option = interval option
#+END_SRC

The encodings for these types are defined as:

#+BEGIN_SRC ocaml
let interval_list_encoding = Data_encoding.list interval_encoding
let interval_array_encoding = Data_encoding.array interval_encoding
let interval_option_encoding = Data_encoding.option interval_encoding
#+END_SRC

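The idea behind =list=, =array= and =option= is that a combinator takes the encoding of the element
type and returns an encoding of the container type. The following stdlib-only sketch shows that
pattern with plain functions. The ='a writer= type, all =*_w= names, and the layout (a one-byte
presence flag for options, a 32-bit element count for lists) are assumptions for illustration and
are not claimed to match the library's binary format.

#+BEGIN_SRC ocaml
(* A writer appends the binary form of a value to a buffer. *)
type 'a writer = Buffer.t -> 'a -> unit

(* Ground writer: 8 bytes, big-endian (assumed layout). *)
let int64_w : int64 writer = fun buf n -> Buffer.add_int64_be buf n

(* Combinator: write both components of a pair, in order. *)
let pair_w (a : 'a writer) (b : 'b writer) : ('a * 'b) writer =
  fun buf (x, y) -> a buf x ; b buf y

(* Combinator: one flag byte, then the payload if present (assumed layout). *)
let option_w (w : 'a writer) : 'a option writer =
  fun buf -> function
    | None -> Buffer.add_uint8 buf 0
    | Some x -> Buffer.add_uint8 buf 1 ; w buf x

(* Combinator: 32-bit element count, then each element (assumed layout). *)
let list_w (w : 'a writer) : 'a list writer =
  fun buf l ->
    Buffer.add_int32_be buf (Int32.of_int (List.length l)) ;
    List.iter (w buf) l

(* Run a writer on a fresh buffer and return the bytes as a string. *)
let to_string (w : 'a writer) (v : 'a) : string =
  let buf = Buffer.create 16 in
  w buf v ;
  Buffer.contents buf

(* Built the same way as the encodings above: container over element. *)
let interval_w : (int64 * int64) writer = pair_w int64_w int64_w
let interval_list_w = list_w interval_w
let interval_option_w = option_w interval_w

let () =
  Printf.printf "list of 2 intervals: %d bytes\n"
    (String.length (to_string interval_list_w [ (1L, 2L) ; (3L, 4L) ]))
#+END_SRC
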
*** Union types
The Tezos codebase makes heavy use of variant types. Consider the following
variant type:

#+BEGIN_SRC ocaml
type variant = B of bool
             | S of string
#+END_SRC

An encoding for this type can be expressed as:

#+BEGIN_SRC ocaml
let variant_encoding =
  Data_encoding.(union ~tag_size:`Uint8
                   [ case
                       bool
                       (function B b -> Some b | _ -> None)
                       (fun b -> B b) ;
                     case
                       string
                       (function S s -> Some s | _ -> None)
                       (fun s -> S s) ])
#+END_SRC

This variant encoding is a bit more complicated. Let's look at the parts of the definition:
- We include an optimization hint to the binary encoding to inform it of the number of cases the tag must distinguish.
  In most cases, we can use =`Uint8= (the default), which allows up to 256 possible cases.
- For each case, we provide a function that unwraps the datatype. The encoding works by repeatedly trying to
  match the value using these functions until one returns =Some payload=. This payload
  is then encoded using the encoding specified for that case.
- We specify a function from the payload type back to the actual datatype.

Since the library does not provide an exhaustivity check on these cases,
the user must be careful when constructing union types to avoid unfortunate runtime failures.

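The case-by-case matching and the missing exhaustivity check can be illustrated with a small
stdlib-only model. This is a sketch, not the library's implementation: the ='t case= type, the
=tag= and =show= fields, and the ="tag:payload"= output format are all hypothetical, and only the
encoding direction is modelled (so there is no injection function).

#+BEGIN_SRC ocaml
type variant = B of bool | S of string

(* A case pairs a tag with a projection and a payload printer.
   The payload type 'a is hidden inside the constructor so that cases
   with different payload types can live in the same list. *)
type 't case =
  | Case : { tag : int ;
             proj : 't -> 'a option ;
             show : 'a -> string } -> 't case

(* Try each case in order until a projection returns [Some payload]. *)
let encode (cases : 't case list) (v : 't) : string =
  let rec try_cases (cs : 't case list) : string =
    match cs with
    | [] -> failwith "no case matched"
    | Case { tag ; proj ; show } :: rest ->
        (match proj v with
         | Some payload -> Printf.sprintf "%d:%s" tag (show payload)
         | None -> try_cases rest)
  in
  try_cases cases

let full_cases : variant case list =
  [ Case { tag = 0 ;
           proj = (function B b -> Some b | _ -> None) ;
           show = string_of_bool } ;
    Case { tag = 1 ;
           proj = (function S s -> Some s | _ -> None) ;
           show = (fun s -> s) } ]

(* The [S] case is forgotten here; the compiler does not complain. *)
let partial_cases : variant case list =
  [ Case { tag = 0 ;
           proj = (function B b -> Some b | _ -> None) ;
           show = string_of_bool } ]

let () =
  print_endline (encode full_cases (B true)) ;
  print_endline (encode full_cases (S "hello")) ;
  (* The missing case only shows up when an [S] value is encoded. *)
  match encode partial_cases (S "hello") with
  | s -> print_endline s
  | exception Failure msg -> print_endline ("runtime failure: " ^ msg)
#+END_SRC

Because the list of cases is ordinary data, nothing ties it to the constructors of =variant=,
which is why a forgotten case is only detected at runtime.
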
** How the Data_encoding module works
This section is 100% optional. You do not need to understand this section to use the library.

The library uses GADTs to provide type-safe serialization/deserialization. With them,
a runtime representation of JSON objects is parsed into a type-safe version.

First we define an untyped JSON AST:

#+BEGIN_SRC ocaml
type json =
  [ `O of (string * json) list
  | `Bool of bool
  | `Float of float
  | `A of json list
  | `Null
  | `String of string ]
#+END_SRC

This is then parsed into a typed AST (we eliminate several cases for clarity):

#+BEGIN_SRC ocaml
type 'a desc =
  | Null : unit desc
  | Empty : unit desc
  | Bool : bool desc
  | Int64 : Int64.t desc
  | Float : float desc
  | Bytes : Kind.length -> MBytes.t desc
  | String : Kind.length -> string desc
  | String_enum : Kind.length * (string * 'a) list -> 'a desc
  | Array : 'a t -> 'a array desc
  | List : 'a t -> 'a list desc
  | Obj : 'a field -> 'a desc
  | Objs : Kind.t * 'a t * 'b t -> ('a * 'b) desc
  | Tup : 'a t -> 'a desc
  | Union : Kind.t * tag_size * 'a case list -> 'a desc
  | Mu : Kind.enum * string * ('a t -> 'a t) -> 'a desc
  | Conv :
      { proj : ('a -> 'b) ;
        inj : ('b -> 'a) ;
        encoding : 'b t ;
        schema : Json_schema.schema option } -> 'a desc
  | Describe :
      { title : string option ;
        description : string option ;
        encoding : 'a t } -> 'a desc
  | Def : { name : string ;
            encoding : 'a t } -> 'a desc
#+END_SRC

- The first set of constructors defines all ground types.
- The constructors for =Bytes=, =String= and =String_enum= include a length field in order to provide safe binary serialization.
- The constructors for =Array= and =List= are used by the combinators we saw earlier.
- The =Obj= and =Objs= constructors create JSON objects.
  These are wrapped in the =Conv= constructor to remove the nesting that results when these constructors are used naively.
- The =Mu= constructor is used to create self-referential definitions.
- The =Conv= constructor allows you to clean up a nested definition or compute another type from an existing one.
- The =Describe= and =Def= constructors are used to add documentation.

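To see how a GADT ties a description to the OCaml type it describes, here is a deliberately
reduced, stdlib-only model. It is a sketch under stated assumptions, not the library's code:
the ='a enc= type, the =Obj2= constructor, the tuple-based =Conv=, the =construct= and =destruct=
functions and the =point= record are all hypothetical names chosen for this example.

#+BEGIN_SRC ocaml
type json =
  [ `O of (string * json) list
  | `Bool of bool
  | `Float of float
  | `A of json list
  | `Null
  | `String of string ]

(* Reduced typed description: the index 'a is the OCaml type described. *)
type 'a enc =
  | Bool : bool enc
  | Float : float enc
  | String : string enc
  | List : 'a enc -> 'a list enc
  | Obj2 : (string * 'a enc) * (string * 'b enc) -> ('a * 'b) enc
  | Conv : ('a -> 'b) * ('b -> 'a) * 'b enc -> 'a enc

(* Typed value -> untyped JSON. Each branch knows the concrete type of [v]. *)
let rec construct : type a. a enc -> a -> json = fun enc v ->
  match enc with
  | Bool -> `Bool v
  | Float -> `Float v
  | String -> `String v
  | List e -> `A (List.map (construct e) v)
  | Obj2 ((na, ea), (nb, eb)) ->
      let (a, b) = v in
      `O [ (na, construct ea a) ; (nb, construct eb b) ]
  | Conv (proj, _, e) -> construct e (proj v)

(* Untyped JSON -> typed value. Fails at runtime on a shape mismatch. *)
let rec destruct : type a. a enc -> json -> a = fun enc j ->
  match enc, j with
  | Bool, `Bool b -> b
  | Float, `Float f -> f
  | String, `String s -> s
  | List e, `A l -> List.map (destruct e) l
  | Obj2 ((na, ea), (nb, eb)), `O fields ->
      (destruct ea (List.assoc na fields),
       destruct eb (List.assoc nb fields))
  | Conv (_, inj, e), j -> inj (destruct e j)
  | _ -> failwith "unexpected JSON"

(* [Conv] turns the nested pair produced by [Obj2] into a record. *)
type point = { x : float ; y : float }

let point_enc : point enc =
  Conv ((fun { x ; y } -> (x, y)),
        (fun (x, y) -> { x ; y }),
        Obj2 (("x", Float), ("y", Float)))

let () =
  let p = { x = 1.0 ; y = 2.0 } in
  assert (destruct point_enc (construct point_enc p) = p)
#+END_SRC

The type annotation =type a. a enc -> a -> json= is what lets each branch treat =v= at its
concrete type, so an ill-typed pairing of description and value is rejected at compile time.
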
The library also provides various wrappers and convenience functions to make constructing these objects easier.
Reading the documentation in the [[../src/minutils/data_encoding.mli][mli file]] should orient
you on how to use these functions and their purposes.