OCaml: Language

Very rough as of 5/25/22. The goal is to provide a very concise overview of the language, in particular those parts/features of the language that affect the build discipline, such as modules, separate compilation, top-level module aliases and namespacing, PPXs, etc.

Why OCaml? Aside from practical considerations, OCaml is directly based on the Mother of All (Higher-order,Strictly typed,Functional,Algebraic) Programming Languages: ML. It wasn’t the first language to include most of these features, but it was the first language to include type inference. (source: 10 MOST(LY DEAD) INFLUENTIAL PROGRAMMING LANGUAGES)

History

ML began as the Meta Language of Edinburgh LCF.

Ch. 2.5: Goal-directed programming (tactics)

The Module System

What is a module? Simple question with a surprisingly complicated answer.

Compare: what is an integer? Easy answer: an integer is a token ("value") of type Int. Why "token" instead of "value"? The latter suggests evaluation, which is a computation notion that does not apply to types and tokens. A type is a universal; a token is a particular, end of story. "x is an Int" just means that x is a particular (token) subsumed under the general concept (type) Int.

module is a keyword in the OCaml language, but it is not a type constant; there is no module type in OCaml.

"Module" is not a first-class object. That is, modules are not values of a Module type; there is no type Module. In fact a "module" is neither a value nor a type. It’s a pragmatic term used to refer to a pairing of a struct and a sig. Or rather, its a generic term used to refer to a whole class of module types, just as "list" is often used to refer to Int List, String List, etc.

Signatures and Structures

Compatible sigfile - sigfile satisfied by structfile, irrespective of location and principle name.

Matching sigfile - a compatible sigfile with same principle name, in same directory

I.e. compatibility is structural, matching is nominal and structural.

Compatible structfile - a structfile satisfying the sig, irrespective of location and principle name.

Matching structfile - compatible structfile with same principle name in same dir as sigfile.

From an algebraic perspective, the type constructed by sig…​end expresses an algebraic signature. So it makes sense to call it a "signature type" or just "signature" with the caveat that "signature" never means "token of type signature". The tokens of such a type are structs, not signatures.

From the perspective of OCaml’s module system, the type constructed by sig…​end expresses a module type, whose tokens are structs.

A module is thus the binding of a module (signature) type and a struct.

In the literature "module" is often used to refer to the struct component of a module.

TODO: explicate "struct satisfies signature"

A signature can be bound to any struct that satisfies it. In OCaml such type assignments are static, fixed at compile time.

Developers may want to select a struct for binding to a signature, based on configuration constraints. A common case is selection of platform-dependent code, where we have one struct for linux and one for macos. Static typing means that this selection and type assignment must occur at compile time.

module type B = sig val msg : string end

module  Blinux : B = struct let msg = "linux msg" end
module  Bmacos : B = struct let msg = "macos msg" end

The problem with this is that clients would have to use the platform-specific name. The modules here are named Blinux and Bmacos, not B.

Recall that the module names are independent of sig and struct names. We could have a sig named ASig, a struct named AStruct, and bind them to produce a module named A: module A : ASig = AStruct
OCaml uses different namespaces for module (signature) type names and module names, so the same name, e.g. A, can be used for both a sig and a module, e.g. module A : A = struct…​end: the first A names a module and the second names a module (signature) type.

To get a module named B, binding signature B to a struct selected based on platform, we need something like the following:

module B : B = if platform == "linux" then Blinux else Bmacos

Unfortunately that’s syntactically malformed so it won’t compile. The syntax on the right-hand side of the equality sign = in a module definition cannot be an if-then expression.

This won’t work either:

if platform == "linux" then module B : B = Blinux else module B : B = Bmacos

But this does work:

if platform == "linux"
then let module B : B = Blinux in
     begin print_endline B.msg end
else let module B : B = Bmacos in
     begin print_endline B.msg end

The problem with this is that the selected module B is only accessible within the local begin…​end, so we might as well just write:

if platform == "linux"
then print_endline Blinux.msg
else print_endline Bmacos.msg

Summary: this kind of selection cannot be expressed within the language. But it can be expressed using filesystem modules; see XXX for details.

Principal Signature

Module Bindings

Module Types as Dependent Types

Example:

module M : sig val x : int end = struct let x = 42 end

This means that M is a token constructed by struct let x = 42 end, whose type is constructed by sig val x : int end.

A type constructed in this manner is a kind of dependent type. Under standard usage, "dependent type" means a type parameterized by a value of some (first-order) type such as Int; for example, the type of lists of strings of length four. The type constructor for such a type would be parameterized by the type constant String and the integer value 4.

The type constructor sig …​ end is parameterized by a set of "fields", which are type declarations like val x:int. (TODO: list the allowable fields).

the type constructed by sig …​ end should not be treated as a signature or signature type. "signature" is the type constructor, not the type.

The types constructed by sig…​end may be named by using module type; for example:

module type MSig = sig ... end

This declares that MSig names the type constructed by sig…​end. Compare: let f = λx.x+1, which declares that f names the function described by the lambda expression λx.x+1.

Module as family of types. Type ctor is sig…​end; token ctor is struct …​end.

Structs do not have intrinsic types, so there is no "natural" type that goes with a struct; the type of a struct must be assigned by binding it to a signature. Modules are thus always in a sense ad-hoc.

However, every struct does have a principal signature…​

"x is an Int" means x is a value whose type is Int.

"xs is an Int List" means x is a value whose type is Int List.

"A is a module" does not mean that A is a value whose type is Module.

What does it mean? That A names a value whose type was constructed by sig …​ end. Its type may or may not be named.

Compare: lambda expression as definite description of a function. The function and its type remain unnamed.

syntactic quirks

semicolons

"For compatibility with toplevel phrases (chapter 12), optional ;; are allowed after and before each definition in a structure. These ;; have no semantic meanings. Similarly, an expr preceded by ;; is allowed as a component of a structure. It is equivalent to let _ = expr, i.e. expr is evaluated for its side-effects but is not bound to any identifier. If expr is the first component of a structure, the preceding ;; can be omitted." — 11.2 Structures

misc

module MyModule : sig val x: int end

module MyModule = struct let x = 1 end

module type MSig = sig ... end

module M : MSig = struct ... end

"In OCaml the namespaces for modules and module types are distinct, so it’s perfectly valid to have a module named ListStack and and module type named ListStack. "

Alternative syntaxs:

module ListStackAlias : LIST_STACK = ListStack
(* equivalently *)
module ListStackAlias = (ListStack : LIST_STACK)

module M : sig val x : int end = struct let x = 42 end
(* equivalently *)
module M = (struct let x = 42 end : sig val x : int end)

It uses "type annotation" where we use "type assignment".

module MyModule = struct (1)
  let inc x = x + 1
  type primary_color = Red | Green | Blue
  exception Oops
  end

module MyModule :  (2)
  sig
    val inc : int -> int
    type primary_color = Red | Green | Blue
    exception Oops
  end
1 defines
2 declares

"Module" is not a first-class object. That is, modules are not values of a Module type; there is no type Module. In fact a "module" is neither a value nor a type. It’s a pragmatic term used to refer to a pairing of a struct and a sig.

Module as family of types. Type ctor is sig…​end; token ctor is struct …​end.

Structs do not have intrinsic types, so there is no "natural" sig that goes with a module. Modules are always in a sense ad-hoc.

"x is an Int" means x is a value whose type is Int.

"xs is an Int List" means x is a value whose type is Int List.

"A is a module" does not mean that A is a value whose type is Module.

What does it mean? That A names a value whose type was constructed by sig …​ end. Its type remains unnamed.

Compare: lambda expression as definite description of a function. The function and its type remain unnamed.

Namespacing

Module aliasing

Compiler Extension Hooks (PPX system)

Programming languages and proof assistants based on OCaml

OCaml is popular among researchers and programming language designers as an implementation language. Here are some products that leverage the OCaml toolchain:

  • The Coq Proof Assistant

  • F* (pronounced F star) is ""a general-purpose functional programming language with effects aimed at program verification. It puts together the automation of an SMT-backed deductive verification tool with the expressive power of a proof assistant based on dependent types." F* is written entirely in F*, and bootstraps via OCaml.

  • λProlog "is a logic programming language based on higher-order intuitionistic logic in the style of Church’s Simple Theory of Types."

    • ELPI - Embeddable λProlog Interpreter "designed to be embedded into larger applications written in OCaml as an extension language."

    • Makam "is a metalanguage: a language for implementing languages. It supports concise declarative definitions, aimed at allowing rapid prototyping and experimentation with new programming language research ideas. The design of Makam is based on higher-order logic programming and is a refinement of the λProlog language. Makam is implemented from scratch in OCaml."

    • Teyjus "is an efficient implementation of the higher-order logic programming language Lambda Prolog. " (moribund?)

    • Abella, an interactive theorem prover for λProlog programs

  • HOL Light - interactive theorem prover

  • Hack - "an object-oriented programming language for building reliable websites at epic scale"

  • Haxe - a "high-level strictly-typed programming language with a fast optimizing cross-compiler."

  • WebAssembly Reference Interpreter

  • Eff - a functional programming language based on algebraic effect handlers.

  • Links - "Allows web programs to be written in a single programming language…​"

  • Compilers, Typecheckers, and Parsers - OCamlverse