Dependencies

Last updated May 4, 2022

TODO: how OBazl manages transitive deps.

Dependency classes

  • direct and indirect

  • module v. interface

  • configurable ("dynamic") deps

  • runtime deps (data v. code)

  • PPX co-dependencies

  • local v. external deps

  • OPAM pkg deps (special case)

transitivity

Trickier than it seems, because there are several different kinds of depgraph.

  • target depgraph

  • action depgraph

  • module (implementation) depgraph

  • interface depgraph

Depgraphs are fully transitive wrt build actions: when you build a target, its complete depgraph gets built.

OTOH, the transitivity involved in constructing build commands is only partial. Interface depgraphs are treated differently from implementation depgraphs.

Each ocaml_module target propagates its module depgraph to its clients. That includes its own interface dep, but excludes the interface deps of its module deps.

For example, suppose A.cmo depends on B.cmo, and A.cmi depends on B.cmi. A.cmo’s depgraph will include A.cmi and B.cmo as direct dependencies. And since B.cmo depends directly on B.cmi, the latter will be included indirectly in A.cmo’s depgraph. But A.cmi’s own depgraph will not be included; only A.cmi itself will be included in A.cmo’s depgraph.

Now suppose A.cmi depends on C.cmi in addition to B.cmi, but neither A.cmo nor B.cmo depends on C.cmo. [Can this happen?] Then A.cmo depends on A.cmi, but not on C.cmi; iow, it only uses the API declared by A.cmi. Building A.cmi will cause C.cmi to be built, but the result need not be added to the depgraph of A.cmo, because the compiler will not need to find C.cmo in order to compile A.cmo. It only needs to find A.cmi.

To build A, we need B.cmi and B.cmo on the search path. Since A.cmo depends on B.cmo, and B.cmo depends on B.cmi, … B.cmi must be listed as an input to the A build action, so it must be provided by the B.cmo depgraph. But it is not provided by the A.cmi dep, even though A.cmi depends on B.cmi.
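The propagation rule described above can be modeled in a few lines of Python. This is only a sketch of the rule as stated in this section, not OBazl's implementation; the tables `MODULE_DEPS` and `INTERFACE_DEPS` and the function `module_depgraph` are hypothetical names introduced for illustration.

```python
# Hypothetical model of the example above:
#   A.cmo -> B.cmo, and A.cmi -> B.cmi and C.cmi.
MODULE_DEPS = {"A": ["B"], "B": []}                    # .cmo -> .cmo edges
INTERFACE_DEPS = {"A": ["B", "C"], "B": [], "C": []}   # .cmi -> .cmi edges


def module_depgraph(name):
    """Files a client of <name>.cmo receives, dependencies first.

    Only MODULE_DEPS is followed. Each module contributes its own .cmi,
    but INTERFACE_DEPS is deliberately never traversed.
    """
    ordered = []

    def visit(mod):
        for dep in MODULE_DEPS.get(mod, []):
            visit(dep)
        for path in (mod + ".cmi", mod + ".cmo"):
            if path not in ordered:
                ordered.append(path)

    visit(name)
    return ordered


print(module_depgraph("A"))
```

In this model B.cmi reaches A's build action through B.cmo, while C.cmi never appears, because it is reachable only through A.cmi.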

discovery v. normalization v. optimization

Separation of concerns: dep mgmt v. build

Build tasks involve explicit build commands, but they also always involve a critical bit of information that is often hidden or only implicit, namely the graph of the target’s dependencies. Dependency management is a well-known pain point for OCaml builds; the entire dependency graph of a build target must be made available to the compiler, and for archive and executable targets, must be listed explicitly on the command line in dependency order. By and large, managing dependencies by hand is infeasible for all but the most simple projects.

The strategy for managing dependencies adopted by Bazel (and thus OBazl) is starkly different from that of most other build systems.

[FIXME: three points:

  • finding and listing deps as input to the build

  • transparency of actual depgraphs (inspection using query "deps(…​), aquery, --output_groups=closure, etc.)

  • how OBazl normalizes and propagates depgraphs, advantages compared to dune, e.g. ppx_codeps]

Many build systems, Dune included, conflate dependency discovery and the build process. For example, Makefiles for building OCaml projects usually run ocamldep to generate .depends files listing dependencies, and build targets depend on these dynamically generated files. Dune build stanzas list direct dependencies, but indirect dependencies are discovered (using ocamldep) and added to the build dependency graph as part of the build process.

By contrast, Bazel enforces a strict separation between dependency discovery and the build process. All dependencies must be explicitly enumerated for Bazel before the build process begins; discovering and adding a dependency in the course of the build process is disallowed. This is a necessary feature of any hermetic (replicable) build process: if you want to design a replicable experiment, you start by fixing the initial conditions. Build systems that allow dynamic discovery and injection of dependencies cannot guarantee hermeticity. [WARNING: this is not accurate, to be revised]

The downside of having to explicitly enumerate the entire dependency graph for a project is that you have to explicitly enumerate the entire dependency graph for the project. But this is obviously a task that can and should be addressed by a build tool, just as it is for systems that do this discovery during the build process. The only difference is that for Bazel we run the dependency discovery tool before the build process commences, and we record its results and pass them as input to the build process.

Version 2 of OBazl includes tooling that can largely if not entirely automate the enumeration of dependencies. Currently there are some cases where it is difficult to discover all dependencies; for example, targets that involve lots of indirection, -open arguments and include directives in source files. A goal of the OBazl project is to perfect this tool so that it can always emit complete and correct dependency graphs.

Another notable feature of OBazl with respect to dependencies is that we get correct ordering for free, so to speak. Dependency ordering for compiler inputs, and for most build tools, is expressed syntactically, as list ordering (which is in part why managing deps in such systems is difficult). But OBazl maintains dependencies as a graph structure, so ordering is expressed as hierarchy. The only way to express a dependency of A on B is to list B explicitly in the deps attribute of A; there is no way to express it as the list B A, as one must do on the compiler command line. In particular, listing ["A", "B"] in a `deps` attribute does not express a dependency of B on A. In fact, it could be the case that A depends on B (either directly or indirectly), so when we serialize the graph derived from this list we will get B A. It follows that dependencies can be listed in any order; you can list them in alphabetical order if you wish.
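As a rough illustration of why listing order does not matter, the following Python sketch serializes a small graph with the standard library's `graphlib` module (Python 3.9+). It is a model of the idea only, not what OBazl does internally; the target names `app`, `A`, and `B` and the function `link_order` are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps a target to the targets it depends on.
# "app" lists its deps alphabetically, even though A depends on B.
deps = {
    "app": ["A", "B"],
    "A": ["B"],
    "B": [],
}


def link_order(graph):
    """Return the targets with every dependency before its dependents."""
    return list(TopologicalSorter(graph).static_order())


print(link_order(deps))
```

Whether `app` lists its deps as `["A", "B"]` or `["B", "A"]`, the serialized order places B before A, because the order comes from the graph edges rather than from the position of items in a list.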

The critical feature here is that Bazel provides out-of-the-box support for merging dependency graphs. If your dependencies are expressed as ordered lists, and you have multiple dependencies, then you have the task of merging the ordered lists in such a way that dependency order is maintained, which is non-trivial, since the same item may occur in different contexts in more than one list. [TODO: simple example]. Bazel provides a depset facility that handles such merging automatically and efficiently. OBazl rules use depsets to manage all dependencies.
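To make the merging problem concrete, here is a toy Python model of a depset that is traversed in postorder (transitive sets first, then direct items, with duplicates dropped). It is a simplified sketch under that assumption and is not Bazel's implementation; the class `Depset` and the target names are illustrative only.

```python
class Depset:
    """Toy depset: direct items plus nested (transitive) depsets."""

    def __init__(self, direct=(), transitive=()):
        self.direct = list(direct)
        self.transitive = list(transitive)

    def to_list(self):
        """Flatten in postorder, keeping the first occurrence of each item."""
        seen = set()
        out = []

        def walk(node):
            # Visit nested depsets left to right before the direct items.
            for child in node.transitive:
                walk(child)
            for item in node.direct:
                if item not in seen:
                    seen.add(item)
                    out.append(item)

        walk(self)
        return out


# B and C both depend on D; A depends on B and C.
d = Depset(["D"])
b = Depset(["B"], [d])
c = Depset(["C"], [d])
a = Depset(["A"], [b, c])

print(a.to_list())
```

D is shared by two branches but appears once, ahead of both B and C. Merging the flat lists `[D, B]` and `[D, C]` by hand would require the same bookkeeping; keeping the structure as a graph until the last moment makes it mechanical. A real implementation would also avoid re-walking shared nodes, which this toy version does not attempt.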

deps that require special handling by the build engine: runtime data deps; runtime code deps (plugins); ppx-codeps

Configurable ("dynamic") deps

Solves the same problem as Dune’s (select … from …) ("alternative dependencies").

OCaml Dependencies

All direct dependencies must be explicitly listed. OBazl will not analyse implicit dependencies. However, it will automatically handle indirect dependencies.

PPX Dependencies

PPX Codependencies

Sometimes PPX processing injects dependencies that are needed to compile the result of a PPX transformation. These are often called "runtime" dependencies, but OBazl calls them ppx_codependencies, since they are not in fact runtime dependencies. Runtime dependencies of a module or executable are needed when that module or executable is executed. These dependencies do not fit that description.

Dune

Dune dependencies are expressed as libraries or packages. In the latter case they are package-manager entities. Bazel deps may be individual modules, libraries, archives, etc., but they are always build targets, not package-manager entities.

I.e. with Dune one says "this build depends on that package", which really means that it depends on compiled entities contained in the package. With Bazel one says "this target depends on that target". That does not always imply something compiled by Bazel; targets may deliver strings, for example.

But in the case of local or immediate deps, there is an ambiguity.

Example: src/lib/syncable_ledger. The dunefile lists a library with name "syncable_ledger". The directory contains a file named "syncable_ledger.ml". So if we depend on "syncable_ledger", what is the dep, exactly?

The problem is that we use (by convention) target label ":foo" for "foo.ml". But if the lib name is also "foo" then we want to use ":foo" for the library, not the module.

Option: use ":foo_cm" for individual modules/files. The disadvantage of this is in label aesthetics.

Label concepts and aesthetics: the idea is that we treat each pkg as a conceptual unit, and the directory name as the concept name. So we get labels like "foo/bar:bar", which abbreviates to "foo/bar". The package may have additional targets, e.g. "foo/bar:baz", but the core concept of the package is captured by the name "foo/bar".

Normally (?) a package will correspond to a library/archive containing multiple modules. The tricky bit, with Dune, is that the library name may match a module name. This requires some renaming and module aliasing.

Another option: use a naming convention for libs and archives, e.g. "foo_lib" and "foo_archive". But this prevents naming as above, i.e. we would have foo/bar:bar_lib instead of just foo/bar.

Rule of thumb: use same name for directory and for library/archive target, which should be the concept name. For modules in the lib/archive (i.e. source files in the dir), use :_Filename (only visible within the package) or :Filename (if the target is used outside the package).
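The label abbreviation mentioned above ("foo/bar:bar" shortens to "foo/bar") can be expressed as a small Python helper. This is only an illustration of the naming convention; the function `abbreviate` is hypothetical and is not part of Bazel or OBazl.

```python
def abbreviate(label):
    """Drop the target name when it equals the last segment of the package path."""
    package, sep, target = label.partition(":")
    if sep and package.rsplit("/", 1)[-1] == target:
        return package
    return label


for label in ("foo/bar:bar", "foo/bar:baz", "foo/bar:_Filename"):
    print(label, "->", abbreviate(label))
```

Only the target that carries the package's concept name gets the short form; module targets such as `:_Filename` keep their explicit names.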

ocamldep lists deps as module names. It doesn’t have any idea of library or package.

CC dependencies

Support for C/C++ deps is still under development. What’s there works but the interface will likely change since there are still some issues to be worked out, such as how best to support different link modes and options for each CC dependency.

A "CC dependency" is usually a C/C++ library, but CC dependencies need not necessarily be produced from C/C++ source code. Rather, the term refers to the standard file format for object files, archives, etc., which is historically closely associated with C. Many other languages (including OCaml, Rust, Go, etc.) are capable of producing such files; OBazl uses 'CC' terminology (e.g. CC library, cc_deps) to refer to such files no matter what language was used to produce them.

CC Libraries

A library is a collection of code units. CC libraries come in several flavors:

  • an unpackaged collection of object files (with .o extension)

  • a static 'library': a collection of code units (object files) packaged as an archive file, whose extension (by convention) is .a.

  • a dynamic shared 'library': code units (object files) packaged as a dynamic shared object (DSO) file. Yes, the terminology conflates two distinct concepts; OBazl generally treats 'dynamic' and 'shared' as synonyms. On Linux, these are .so files; on MacOS, they are .dylib files (but MacOS also supports .so files).

CC Linkmode

Rules producing CC libs commonly produce both a static archive (.a file) and one or more shared libraries (.so or .dylib files). In OBazl rules, link mode determines which type of library is used for linking. Possible values:

  • static: statically link to .a file.

  • dynamic: depending on the OS, link to .dylib file (MacOS) or .so file (Linux or MacOS).

  • shared: synonym for 'dynamic'

  • default: 'static' on Linux, 'dynamic' on MacOS.

NOTE The 'default' linkmode is configurable. The default value for linkmode 'default' is as noted above. To override the default for all rules, pass command-line option --@ocaml//linkmode; for example, to set default value for linkmode 'default' to 'dynamic' pass --@ocaml//linkmode:dynamic.
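The linkmode rules listed above can be summarized as a small decision function. The Python below is a sketch of the documented behavior only, not OBazl's rule code; the function names, the OS strings "linux" and "macos", and the `default_override` parameter (standing in for the --@ocaml//linkmode setting) are assumptions made for the example.

```python
def resolve_linkmode(mode, os_name, default_override=None):
    """Reduce a linkmode value to either 'static' or 'dynamic'."""
    if os_name not in ("linux", "macos"):
        raise ValueError("unsupported OS in this sketch: " + os_name)
    if mode == "default":
        if default_override is not None:
            mode = default_override          # stands in for --@ocaml//linkmode
        else:
            mode = "static" if os_name == "linux" else "dynamic"
    if mode == "shared":                     # 'shared' is a synonym for 'dynamic'
        mode = "dynamic"
    if mode not in ("static", "dynamic"):
        raise ValueError("unknown linkmode: " + mode)
    return mode


def library_extensions(mode, os_name, default_override=None):
    """File extensions a linker would look for under the resolved linkmode."""
    resolved = resolve_linkmode(mode, os_name, default_override)
    if resolved == "static":
        return [".a"]
    return [".dylib", ".so"] if os_name == "macos" else [".so"]


print(library_extensions("default", "linux"))
print(library_extensions("default", "macos"))
print(library_extensions("default", "linux", default_override="dynamic"))
```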

Dynamic Loading ('Plugins')

[TODO: document dynamic loading of CC deps; compare OCaml *.cmxs files. Distinction between linking and loading.]

Static Binaries

Executable binaries can be linked in several ways:

  • Dynamic - all deps are shared libs

  • Partially static - non-system libs may be statically linked, but system libs are dynamically linked

  • Fully static - all dependencies, including system libraries, are statically linked, resulting in a complete standalone executable.

WARNING MacOS does not support statically linked executables. See Statically linked binaries on Mac OS X

Dependency Cycles

Legacy packages may include circular or mutual dependencies. Bazel disallows such dependencies.

Example: libfqfft/evaluation_domain/domains and libfqfft/polynomial_arithmetic are mutually dependent.

In this case listing each in the deps attribute of the other will fail, since Bazel will detect the dependency cycle.
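The kind of check that rejects such a graph can be sketched with Python's standard `graphlib` module (Python 3.9+). This illustrates cycle detection in general and is not Bazel's own algorithm; the function `find_cycle` is a hypothetical helper, and the graph simply restates the example above.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Each package lists the other as a dependency.
deps = {
    "libfqfft/evaluation_domain/domains": ["libfqfft/polynomial_arithmetic"],
    "libfqfft/polynomial_arithmetic": ["libfqfft/evaluation_domain/domains"],
}


def find_cycle(graph):
    """Return the nodes of one dependency cycle, or None if the graph is acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # args[1] holds the cycle; its first and last nodes are the same.
        return err.args[1]
    return None


print(find_cycle(deps))
```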

The "proper" way to address this is to refactor the packages to eliminate the cycles. But we cannot do that with legacy code we do not control.

The workaround seems to be to use 'include_prefix' …​ This will make compilation work, at the cost of removing at least one of the dependencies, so a change in one will not force a recompile of the other. [Why does this work?]

MacOS

Linking dynamic libs to an executable on MacOS is an open issue. The workaround for now is to link to static libs.

Demo: demos/interop/cc_deps

Resources: