Dependency Models

Concepts and Terminology
Build Action Dependencies
Module Dependencies
- Principal API
- Principal SPI
Cross-module Optimization
Fragile dependencies
References

Status: initial draft

Dependency management for OCaml builds is rather complex. OCaml builds involve multiple dependency graphs that, while related, must be managed separately.

Concepts and Terminology

API v. SPI
provider v. consumer
direct v. indirect deps
compile-time v. link-time deps
sig deps v. struct deps
build actions: compile, archive, link

API: an OCaml signature, composed of fields

API provider: an OCaml struct

SPI: a library composed of (compiled) signatures

SPI provider: a collection of (compiled) modules and sigs?

NB: SPI provider is a build concept. The OCaml language does not define either "library" or "archive"; it has no first-class concept corresponding to "SPI provider".

An API provider (i.e. an OCaml struct) must provide at least the fields listed in the API;
An API consumer (i.e. an OCaml struct?) may reference at most the fields listed in the API;
An SPI provider (i.e. a library) must provide at least the modules listed in the SPI;
An SPI consumer (i.e. an OCaml struct?) may reference at most the modules listed in the SPI.
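
The API provider and consumer rules above can be illustrated in plain OCaml. This is a minimal sketch; the names COUNTER_API, Counter_impl, and Counter are hypothetical and chosen only for illustration.

```ocaml
(* The API: a signature composed of fields. *)
module type COUNTER_API = sig
  type t
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end

(* An API provider: a structure. It must define at least the fields
   of the API; it may define more (here, [debug]). *)
module Counter_impl = struct
  type t = int
  let zero = 0
  let incr n = n + 1
  let to_int n = n
  let debug n = Printf.sprintf "counter=%d" n
end

(* Sealing the provider with the API hides everything not listed. *)
module Counter : COUNTER_API = Counter_impl

(* An API consumer: it may reference at most the fields of the API.
   A reference to [Counter.debug] here would be rejected by the compiler. *)
let two = Counter.(to_int (incr (incr zero)))
```

The SPI rules have the same shape, with modules in place of fields and a library in place of a structure, but they are enforced by the build system rather than by the language.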

Build Action Dependencies

Bazel makes a clear distinction between build targets and build actions. A build target is expressed as a rule instantiation, which lists all direct target dependencies. Each Bazel rule may have one or more build actions, each of which has a list of inputs and outputs. A rule implementation may select a subset of target dependencies to use as action inputs.

For example, a module build target (expressed by rule ocaml_module) will list direct module dependencies, but it will not include dependent modules as inputs to the compile action if cross-module optimization is disabled (by passing the -opaque compile flag).

On the other hand, dependent modules will always be passed on as outputs of such a build target; this makes them available as inputs to any link actions (which are the build actions of rule ocaml_binary).

Compilation

Signatures

Structures

Modules

Archiving

Linking Executables

Module Dependencies

Remember that OCaml has two type systems, one for modules and one for everything else. When referring to modules, type means module type.

A module is a binding: a pairing of a signature and a structure, where the latter satisfies the former. It follows that modules have two dependency graphs.

The meaning of "module A depends on module B" is surprisingly complicated.

Compiler perspective: The structure component of module A directly depends on the signature of module B, but not on the structural component of module B. The compilation model is similar to that of C, where compilation of a file that depends on a library requires the header file(s) of the library, but not its implementation. (An exception to this protocol involves cross-module optimization, explained below.)
Linker perspective: The compiled structure component of module A depends on the compiled structure component of module B, but not its signature. Again this is analogous to the model of C linkage, which requires dependent libraries but not their headers.

It follows that a dependency of (Bazel) target A on (Bazel) target B does not imply dependency of the build actions for A on the build actions of B! For example, the compile action for A need not depend on the compile action for B; that is, the result of compiling B may not be an input to the compile action of A, although it may be an input to any link action for which A is an input. The OBazl ruleset handles these dependencies automatically.

Build system perspective: Different build systems may handle module dependencies in different ways. The OCaml compiler requires that signatures be compiled before structures, which makes structures dependent on signatures for compilation. For dependency resolution this is reversed: the depending (structure component of the) module depends on the signature component of its dependency, and in fact does not even depend on the structure component (that dependency only takes effect for the link action). So a build system has various ways to interpret "A depends on B". It could record a compile-dependency of struct A on sig B and a separate link-dependency of struct A on struct B, for example.
Ordering: This is where another distinction between dependency graph types is in order. Build languages allow the expression of dependencies; in OBazl we call those "target dependencies". But the build program must distinguish between such target dependencies and the dependencies of build actions, which are not necessarily the same thing.

For example, given "A depends on B", the target depgraph (for A) will contain both struct B and sig B (with the former dependent on the latter). But the depgraph of the compile action for A will include sig B but not struct B, and the depgraph of any link action that includes A will include struct B but not sig B.
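
The difference between the target depgraph and the action depgraphs can be sketched as a toy model. The types and function names below are hypothetical and are not OBazl's data structures; the cross_module_opt parameter is an assumption standing in for the cross-module optimization exception mentioned earlier.

```ocaml
(* A toy model; these types are illustrative only. *)
type artifact = Sig of string | Struct of string

(* Target dependencies of A, given "A depends on B":
   both components of B. *)
let target_deps_of_a = [ Sig "B"; Struct "B" ]

(* Compile action: signatures only, unless cross-module optimization
   is enabled, in which case compiled structures are inputs as well. *)
let compile_inputs ~cross_module_opt deps =
  List.filter
    (function Sig _ -> true | Struct _ -> cross_module_opt)
    deps

(* Link action: compiled structures only. *)
let link_inputs deps =
  List.filter (function Struct _ -> true | Sig _ -> false) deps
```

With cross-module optimization disabled, compile_inputs keeps only Sig "B", while link_inputs keeps only Struct "B"; both are selected from the same list of target dependencies.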

"Module A depends on module B" does not necessarily mean that the signature of module A depends on the signature of module B. That may be the case, but it is not entailed by the module dependency.

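A small sketch of the point about signatures, using nested modules in a single file to stand in for separate .ml/.mli files (an assumption made only to keep the example self-contained):

```ocaml
module B = struct
  type t = int
  let x : t = 41
end

(* A depends on B: the structure of A refers to B.x.
   The signature of A does not mention B at all. *)
module A : sig
  val n : int
end = struct
  let n = B.x + 1
end

(* By contrast, the signature of A' itself depends on the
   signature of B, because it mentions the type B.t. *)
module A' : sig
  val get : unit -> B.t
end = struct
  let get () = B.x
end
```

Both A and A' depend on B, but only in the second case does the signature carry the dependency.
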
Principal API

Every structure has a principal API, which is expressed by its principal (module) type (a/k/a principal signature).

Every module has a public API, expressed by its signature component. The public API of a module is a subset of the principal API of its structure component.

The principal API of a structfile may be extracted from its source code using the -i switch of the compiler. [TODO: cross-ref]
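
As an illustration of the subset relation, here is a minimal sketch with hypothetical module names Impl and M. The comment shows the principal signature the compiler would infer for the body of Impl if it were a file of its own.

```ocaml
(* The structure defines two values, so its principal signature
   lists both:
     val helper : int -> int
     val double : int -> int *)
module Impl = struct
  let helper x = x + 1
  let double x = 2 * helper x
end

(* The public API is a subset of the principal API:
   [helper] is hidden by the signature. *)
module M : sig
  val double : int -> int
end = Impl
```

Clients of M can call M.double but cannot refer to M.helper, even though the structure defines it.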

Principal SPI

Every structure (implementation component of a module) has a Service Programming Interface (SPI). The SPI is composed of all the modules directly referenced by the code of the structure.

Each SPI contains only direct dependencies. But since each module has an SPI, we can form the transitive closure of all SPIs, which gives us the (ordered) list of all module dependencies needed to compile.

If we think of a module as a service provider, then the transitive closure of a module’s SPI represents the collection of services that must be provided to the compiler (by the environment, in practice the build system) in order for the structure (module) to compile and function.
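
The transitive closure described above can be sketched as a depth-first walk over the direct SPIs. The module names and the spi table are hypothetical, and the sketch assumes the dependency graph has no cycles.

```ocaml
(* Hypothetical direct SPIs: each module paired with the modules
   it references directly. *)
let spi =
  [ ("A", [ "B"; "C" ]); ("B", [ "D" ]); ("C", [ "D" ]); ("D", []) ]

(* Depth-first post-order walk. [acc] holds the modules already
   finished, most recent first. Assumes no dependency cycles. *)
let closure root =
  let rec visit acc m =
    if List.mem m acc then acc
    else
      let deps = Option.value (List.assoc_opt m spi) ~default:[] in
      m :: List.fold_left visit acc deps
  in
  (* Drop the root itself, then reverse so that every dependency
     comes before the modules that depend on it. *)
  List.rev (List.tl (visit [] root))
```

The result lists each module once, in an order in which every module appears after all of its own dependencies, which is the order a build would need.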

SPIs are conceptual; unlike APIs, which are encoded as .mli/.cmi files, SPIs have no formal representation in either the language or any build systems that I know of. But they are expressed in build languages as dependency lists.

minimal SPI: the least set of dependencies sufficient for compilation
principal SPI: one dep for each explicit ref in the source, without duplicates

Building a module involves (symmetrically) satisfying both the API and the SPI.

To build a module, we bind its signature to a structure that satisfies the signature.

To compile a structure, we need to "bind its SPI" (so to speak) to a "structure" of modules (dependencies). In practice what this means is we need to make available to the compiler whatever modules it needs to resolve symbols in the structfile. But structurally it’s just like binding a structure to a signature, where the structure makes available whatever is needed to define the symbols in the signature.

So by analogy we will call a collection of modules satisfying a structure’s SPI a "depstruct" (???)
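
One way to picture this analogy inside the language is a functor, where the SPI becomes an explicit parameter. This is only an illustration of the idea, not how the compiler or a build system resolves dependencies; LOGGER, Make, Plain_logger, and Greeter are hypothetical names.

```ocaml
(* The SPI of the structure below: what it requires from its
   environment. *)
module type LOGGER = sig
  val log : string -> string
end

(* The structure, with its SPI made explicit as a functor parameter
   and its API given as the result signature. *)
module Make (Logger : LOGGER) : sig
  val greet : string -> string
end = struct
  let greet name = Logger.log ("hello, " ^ name)
end

(* A "depstruct": a concrete module satisfying the SPI. *)
module Plain_logger = struct
  let log msg = "[log] " ^ msg
end

(* Binding the SPI to the depstruct yields a usable module. *)
module Greeter = Make (Plain_logger)
```

In an ordinary build the same binding happens implicitly: the build system places the compiled dependencies on the search path, and the compiler resolves the references by name.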

Cross-module Optimization

Fragile dependencies

The problem: if direct and indirect dependencies are treated the same, then a source file could reference symbols in indirect deps. Changes in those deps may then break the compilation. If the depgraph is large, finding and fixing the problem could be difficult.

Policy: restrict dependencies to direct deps only. So source may not refer to a symbol defined in an indirect dep.

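That is the purpose of the -H option, introduced in OCaml 5.2: a directory passed with -H is searched for the compiled interfaces the type checker needs, but the modules in it cannot be referenced directly from source code.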

Fragile deps may also be introduced indirectly, so to speak. Example:

(* A.ml : imported hidden *)
type t = Toto

(* B.mli: imported visible *)
val f : A.t -> unit

(* C.ml: being type-checked *)
let () = B.f Toto

(src: https://github.com/ocaml/ocaml/pull/12246)

In this example C has a direct dependency on B and an indirect dependency on A, so we should have ocamlc -I path/to/b -H path/to/a.

But the expression B.f Toto (in the source of C) introduces a fragile dependency of C on A. If the definition of type t changes, compilation of C will fail. Since the dependency graph could be deep, this could be hard to diagnose. Adding A as a direct dep of C will not help.

Nice to have: a warning about the fragile dependency.

References

Add -Ihidden in addition to -I for avoiding transitive dependencies in the initial scope #31