module type B = sig val msg : string end module Blinux : B = struct let msg = "linux msg" end module Bmacos : B = struct let msg = "macos msg" end
OCaml: Language
Very rough as of 5/25/22. The goal is to provide a very concise overview of the language, in particular those parts/features of the language that affect the build discipline, such as modules, separate compilation, top-level module aliases and namespacing, PPXs, etc. |
Why OCaml? Aside from practical considerations, OCaml is directly based on the Mother of All (Higher-order,Strictly typed,Functional,Algebraic) Programming Languages: ML. It wasn’t the first language to include most of these features, but it was the first language to include type inference. (source: 10 MOST(LY DEAD) INFLUENTIAL PROGRAMMING LANGUAGES)
History
ML began as the Meta Language of Edinburgh LCF.
Ch. 2.5: Goal-directed programming (tactics)
Also used by NuPrl: Implementing Mathematics with The Nuprl Proof Development System
The Module System
What is a module? Simple question with a surprisingly complicated answer.
Compare: what is an integer? Easy answer: an integer is a token ("value") of type Int. Why "token" instead of "value"? The latter suggests evaluation, which is a computation notion that does not apply to types and tokens. A type is a universal; a token is a particular, end of story. "x is an Int" just means that x is a particular (token) subsumed under the general concept (type) Int.
module is a keyword in the OCaml language, but it is not a
type constant; there is no module type in OCaml.
|
"Module" is not a first-class object. That is, modules are not values of a Module type; there is no type Module. In fact a "module" is neither a value nor a type. It’s a pragmatic term used to refer to a pairing of a struct and a sig. Or rather, its a generic term used to refer to a whole class of module types, just as "list" is often used to refer to Int List, String List, etc.
Signatures and Structures
Compatible sigfile - sigfile satisfied by structfile, irrespective of location and principle name.
Matching sigfile - a compatible sigfile with same principle name, in same directory
I.e. compatibility is structural, matching is nominal and structural.
Compatible structfile - a structfile satisfying the sig, irrespective of location and principle name.
Matching structfile - compatible structfile with same principle name in same dir as sigfile.
From an algebraic perspective, the type constructed by sig…end
expresses an algebraic signature. So it makes sense to call it a
"signature type" or just "signature" with the caveat that "signature"
never means "token of type signature". The tokens of such a type are
structs, not signatures.
From the perspective of OCaml’s module system, the type constructed by
sig…end
expresses a module type, whose tokens are structs.
A module is thus the binding of a module (signature) type and a struct.
In the literature "module" is often used to refer to the struct component of a module. |
TODO: explicate "struct satisfies signature"
A signature can be bound to any struct that satisfies it. In OCaml such type assignments are static, fixed at compile time.
Developers may want to select a struct for binding to a signature, based on configuration constraints. A common case is selection of platform-dependent code, where we have one struct for linux and one for macos. Static typing means that this selection and type assignment must occur at compile time.
The problem with this is that clients would have to use the platform-specific name. The modules here are named Blinux and Bmacos, not B.
Recall that the module names are independent of sig and struct
names. We could have a sig named ASig, a struct named AStruct, and
bind them to produce a module named A: module A : ASig = AStruct
|
OCaml uses different namespaces for module (signature) type
names and module names, so the same name, e.g. A, can be used for both
a sig and a module, e.g. module A : A = struct…end : the first A
names a module and the second names a module (signature) type.
|
To get a module named B, binding signature B to a struct selected based on platform, we need something like the following:
module B : B = if platform == "linux" then Blinux else Bmacos
Unfortunately that’s syntactically malformed so it won’t compile. The
syntax on the right-hand side of the equality sign =
in a module
definition cannot be an if-then expression.
This won’t work either:
if platform == "linux" then module B : B = Blinux else module B : B = Bmacos
But this does work:
if platform == "linux" then let module B : B = Blinux in begin print_endline B.msg end else let module B : B = Bmacos in begin print_endline B.msg end
The problem with this is that the selected module B is only accessible
within the local begin…end
, so we might as well just write:
if platform == "linux" then print_endline Blinux.msg else print_endline Bmacos.msg
Summary: this kind of selection cannot be expressed within the language. But it can be expressed using filesystem modules; see XXX for details.
Principal Signature
Module Bindings
Module Types as Dependent Types
Example:
module M : sig val x : int end = struct let x = 42 end
This means that M
is a token constructed by struct let x = 42 end
,
whose type is constructed by sig val x : int end
.
A type constructed in this manner is a kind of dependent type. Under
standard usage, "dependent type" means a type parameterized by a value
of some (first-order) type such as Int
; for example, the type of
lists of strings of length four. The type constructor for such a type
would be parameterized by the type constant String
and the integer
value 4
.
The type constructor sig … end
is parameterized by a set of
"fields", which are type declarations like val x:int
. (TODO: list
the allowable fields).
the type constructed by sig … end should not be treated as a
signature or signature type. "signature" is the type constructor, not the type.
|
The types constructed by sig…end
may be named by using module type
; for example:
module type MSig = sig ... end
This declares that MSig
names the type constructed by sig…end
.
Compare: let f = λx.x+1
, which declares that f
names the function
described by the lambda expression λx.x+1
.
Module as family of types. Type ctor is sig…end
; token ctor is
struct …end
.
Structs do not have intrinsic types, so there is no "natural" type that goes with a struct; the type of a struct must be assigned by binding it to a signature. Modules are thus always in a sense ad-hoc.
However, every struct does have a principal signature…
"x is an Int" means x is a value whose type is Int.
"xs is an Int List" means x is a value whose type is Int List.
"A is a module" does not mean that A is a value whose type is Module.
What does it mean? That A names a value whose type was
constructed by sig … end
. Its type may or may not be named.
Compare: lambda expression as definite description of a function. The function and its type remain unnamed.
syntactic quirks
semicolons
"For compatibility with toplevel phrases (chapter 12), optional ;;
are allowed after and before each definition in a structure. These ;;
have no semantic meanings. Similarly, an expr preceded by ;;
is
allowed as a component of a structure. It is equivalent to let _ =
expr
, i.e. expr is evaluated for its side-effects but is not bound to
any identifier. If expr is the first component of a structure, the
preceding ;;
can be omitted." — 11.2
Structures
misc
module MyModule : sig val x: int end
module MyModule = struct let x = 1 end
module type MSig = sig ... end
module M : MSig = struct ... end
"In OCaml the namespaces for modules and module types are distinct, so it’s perfectly valid to have a module named ListStack and and module type named ListStack. "
Alternative syntaxs:
module ListStackAlias : LIST_STACK = ListStack
(* equivalently *)
module ListStackAlias = (ListStack : LIST_STACK)
module M : sig val x : int end = struct let x = 42 end
(* equivalently *)
module M = (struct let x = 42 end : sig val x : int end)
from Cornell book
It uses "type annotation" where we use "type assignment".
module MyModule = struct (1)
let inc x = x + 1
type primary_color = Red | Green | Blue
exception Oops
end
module MyModule : (2)
sig
val inc : int -> int
type primary_color = Red | Green | Blue
exception Oops
end
1 | defines |
2 | declares |
"Module" is not a first-class object. That is, modules are not values of a Module type; there is no type Module. In fact a "module" is neither a value nor a type. It’s a pragmatic term used to refer to a pairing of a struct and a sig.
Module as family of types. Type ctor is sig…end
; token ctor is
struct …end
.
Structs do not have intrinsic types, so there is no "natural" sig that goes with a module. Modules are always in a sense ad-hoc.
"x is an Int" means x is a value whose type is Int.
"xs is an Int List" means x is a value whose type is Int List.
"A is a module" does not mean that A is a value whose type is Module.
What does it mean? That A names a value whose type was
constructed by sig … end
. Its type remains unnamed.
Compare: lambda expression as definite description of a function. The function and its type remain unnamed.
Namespacing
Module aliasing
Compiler Extension Hooks (PPX system)
Programming languages and proof assistants based on OCaml
OCaml is popular among researchers and programming language designers as an implementation language. Here are some products that leverage the OCaml toolchain:
-
F* (pronounced F star) is ""a general-purpose functional programming language with effects aimed at program verification. It puts together the automation of an SMT-backed deductive verification tool with the expressive power of a proof assistant based on dependent types." F* is written entirely in F*, and bootstraps via OCaml.
-
λProlog "is a logic programming language based on higher-order intuitionistic logic in the style of Church’s Simple Theory of Types."
-
ELPI - Embeddable λProlog Interpreter "designed to be embedded into larger applications written in OCaml as an extension language."
-
Makam "is a metalanguage: a language for implementing languages. It supports concise declarative definitions, aimed at allowing rapid prototyping and experimentation with new programming language research ideas. The design of Makam is based on higher-order logic programming and is a refinement of the λProlog language. Makam is implemented from scratch in OCaml."
-
Teyjus "is an efficient implementation of the higher-order logic programming language Lambda Prolog. " (moribund?)
-
Abella, an interactive theorem prover for λProlog programs
-
-
HOL Light - interactive theorem prover
-
Hack - "an object-oriented programming language for building reliable websites at epic scale"
-
Haxe - a "high-level strictly-typed programming language with a fast optimizing cross-compiler."
-
Eff - a functional programming language based on algebraic effect handlers.
-
Links - "Allows web programs to be written in a single programming language…"
-
Compilers, Typecheckers, and Parsers - OCamlverse