Proto/Env: remove Bytes.unsafe_of_string
parent c277b5b56d
commit 863869eb40
@@ -301,125 +301,6 @@ val equal: t -> t -> bool
 always-correct {!Bytes.to_string} and {!Bytes.of_string} instead.
 *)
 
-val unsafe_to_string : bytes -> string
-(** Unsafely convert a byte sequence into a string.
-
-To reason about the use of [unsafe_to_string], it is convenient to
-consider an "ownership" discipline. A piece of code that
-manipulates some data "owns" it; there are several disjoint ownership
-modes, including:
-- Unique ownership: the data may be accessed and mutated
-- Shared ownership: the data has several owners, that may only
-access it, not mutate it.
-
-Unique ownership is linear: passing the data to another piece of
-code means giving up ownership (we cannot write the
-data again). A unique owner may decide to make the data shared
-(giving up mutation rights on it), but shared data may not become
-uniquely-owned again.
-
-[unsafe_to_string s] can only be used when the caller owns the byte
-sequence [s] -- either uniquely or as shared immutable data. The
-caller gives up ownership of [s], and gains ownership of the
-returned string.
-
-There are two valid use-cases that respect this ownership
-discipline:
-
-1. Creating a string by initializing and mutating a byte sequence
-that is never changed after initialization is performed.
-
-{[
-let string_init len f : string =
-let s = Bytes.create len in
-for i = 0 to len - 1 do Bytes.set s i (f i) done;
-Bytes.unsafe_to_string s
-]}
-
-This function is safe because the byte sequence [s] will never be
-accessed or mutated after [unsafe_to_string] is called. The
-[string_init] code gives up ownership of [s], and returns the
-ownership of the resulting string to its caller.
-
-Note that it would be unsafe if [s] was passed as an additional
-parameter to the function [f] as it could escape this way and be
-mutated in the future -- [string_init] would give up ownership of
-[s] to pass it to [f], and could not call [unsafe_to_string]
-safely.
-
-We have provided the {!String.init}, {!String.map} and
-{!String.mapi} functions to cover most cases of building
-new strings. You should prefer those over [to_string] or
-[unsafe_to_string] whenever applicable.
-
-2. Temporarily giving ownership of a byte sequence to a function
-that expects a uniquely owned string and returns ownership back, so
-that we can mutate the sequence again after the call ended.
-
-{[
-let bytes_length (s : bytes) =
-String.length (Bytes.unsafe_to_string s)
-]}
-
-In this use-case, we do not promise that [s] will never be mutated
-after the call to [bytes_length s]. The {!String.length} function
-temporarily borrows unique ownership of the byte sequence
-(and sees it as a [string]), but returns this ownership back to
-the caller, which may assume that [s] is still a valid byte
-sequence after the call. Note that this is only correct because we
-know that {!String.length} does not capture its argument -- it could
-escape by a side-channel such as a memoization combinator.
-
-The caller may not mutate [s] while the string is borrowed (it has
-temporarily given up ownership). This affects concurrent programs,
-but also higher-order functions: if [String.length] returned
-a closure to be called later, [s] should not be mutated until this
-closure is fully applied and returns ownership.
-*)
-
-val unsafe_of_string : string -> bytes
-(** Unsafely convert a shared string to a byte sequence that should
-not be mutated.
-
-The same ownership discipline that makes [unsafe_to_string]
-correct applies to [unsafe_of_string]: you may use it if you were
-the owner of the [string] value, and you will own the return
-[bytes] in the same mode.
-
-In practice, unique ownership of string values is extremely
-difficult to reason about correctly. You should always assume
-strings are shared, never uniquely owned.
-
-For example, string literals are implicitly shared by the
-compiler, so you never uniquely own them.
-
-{[
-let incorrect = Bytes.unsafe_of_string "hello"
-let s = Bytes.of_string "hello"
-]}
-
-The first declaration is incorrect, because the string literal
-["hello"] could be shared by the compiler with other parts of the
-program, and mutating [incorrect] is a bug. You must always use
-the second version, which performs a copy and is thus correct.
-
-Assuming unique ownership of strings that are not string
-literals, but are (partly) built from string literals, is also
-incorrect. For example, mutating [unsafe_of_string ("foo" ^ s)]
-could mutate the shared string ["foo"] -- assuming a rope-like
-representation of strings. More generally, functions operating on
-strings will assume shared ownership, they do not preserve unique
-ownership. It is thus incorrect to assume unique ownership of the
-result of [unsafe_of_string].
-
-The only case we have reasonable confidence is safe is if the
-produced [bytes] is shared -- used as an immutable byte
-sequence. This is possibly useful for incremental migration of
-low-level programs that manipulate immutable sequences of bytes
-(for example {!Marshal.from_bytes}) and previously used the
-[string] type for this purpose.
-*)
-
 (** Functions reading and writing bytes *)
 
 val get_char: t -> int -> char
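Note on the hunk above: the retained comment directs callers to the copying conversions {!Bytes.to_string} and {!Bytes.of_string}, and the removed text recommends {!String.init} for building new strings. The snippet below is a minimal sketch of what those safe replacements look like. It assumes the OCaml standard library's `Bytes` and `String` modules rather than the protocol environment's own interface, which is not shown in full here. The names `original`, `copy`, and `alphabet` are hypothetical and chosen only for illustration.

```ocaml
(* Sketch only: uses the OCaml standard library, not the protocol
   environment signatures touched by this commit. *)

(* Bytes.of_string copies its argument, so mutating the result cannot
   affect the (possibly shared) source string. *)
let original = "hello"
let copy = Bytes.of_string original
let () = Bytes.set copy 0 'j'

(* Bytes.to_string also copies, so later writes to [copy] would not be
   visible through [frozen]. *)
let frozen = Bytes.to_string copy

(* String.init builds a string directly, covering the [string_init]
   use-case from the removed comment without any unsafe conversion. *)
let alphabet = String.init 26 (fun i -> Char.chr (Char.code 'a' + i))
```

The cost relative to the unsafe variants is one extra allocation and copy per conversion, which is the trade-off the retained comment describes as "always-correct".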