Dependencies
Last updated May 4, 2022
TODO: how OBazl manages transitive deps.
Dependency classes
- direct and indirect
- module v. interface
- configurable ("dynamic") deps
- runtime deps (data v. code)
- PPX co-dependencies
- local v. external deps
- OPAM pkg deps (special case)
transitivity
Trickier than it seems, because there are several different kinds of depgraph.
- target depgraph
- action depgraph
- module (implementation) depgraph
- interface depgraph
Depgraphs are fully transitive wrt build actions: when you build a target, its complete depgraph gets built.
OTOH, the transitivity involved in constructing build commands is only partial. Interface depgraphs are treated differently from implementation depgraphs.
Each `ocaml_module` target propagates its module depgraph to its clients. That includes its own interface dep, but excludes the interface deps of its module deps.
For example, suppose A.cmo depends on B.cmo, and A.cmi depends on B.cmi. A.cmo’s depgraph will include A.cmi and B.cmo as direct dependencies. And since B.cmo depends directly on B.cmi, the latter will be included indirectly in A.cmo’s depgraph. But A.cmi’s depgraph will not be included; only A.cmi will be included in A.cmo’s depgraph.
Now suppose A.cmi depends on C.cmi in addition to B.cmi, but neither A.cmo nor B.cmo depends on C.cmo. [Can this happen?] Then A.cmo depends on A.cmi, but not C.cmi; iow, it only uses the API declared by A.cmi. Building A.cmi will cause C.cmi to be built, but the result need not be added to the depgraph of A.cmo, because the compiler will not need to find C.cmo in order to compile A.cmo. It only needs to find A.cmi.
To build A, we need B.cmi and B.cmo on the search path. Since A.cmo depends on B.cmo, and B.cmo depends on B.cmi, … B.cmi must be listed as an input to the A build action, so it must be provided by the B.cmo depgraph. But it is not provided by the A.cmi dep, even though A.cmi depends on B.cmi.
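The propagation rule in the A/B/C example can be sketched as a toy model. This is only an illustration under stated assumptions, not OBazl's implementation: the function name `propagated` and the dictionaries are hypothetical, and each module is assumed to hand its clients its own .cmi and .cmo plus whatever its module deps propagate, without following the deps of its interface.

```python
# Toy model of the example above (hypothetical names, not OBazl code).
# Module (implementation) deps: A.cmo -> B.cmo
module_deps = {"A": ["B"], "B": []}
# Interface deps: A.cmi -> B.cmi and C.cmi. Deliberately NOT followed below.
interface_deps = {"A": ["B", "C"], "B": [], "C": []}


def propagated(module, module_deps):
    """Files a module target passes on to its clients.

    Assumption: the result is the module's own .cmi and .cmo plus everything
    its module deps propagate. The deps of the module's interface are ignored.
    """
    result = []
    for dep in module_deps.get(module, []):
        for item in propagated(dep, module_deps):
            if item not in result:
                result.append(item)
    for item in (module + ".cmi", module + ".cmo"):
        if item not in result:
            result.append(item)
    return result


closure_a = propagated("A", module_deps)
print(closure_a)
```

In this model B.cmi reaches A only by way of B.cmo, and C.cmi never appears even though A.cmi depends on it.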
discovery v. normalization v. optimization
Separation of concerns: dep mgmt v. build
Build tasks involve explicit build commands, but they also always involve a critical bit of information that is often hidden or only implicit, namely the graph of the target’s dependencies. Dependency management is a well-known pain point for OCaml builds; the entire dependency graph of a build target must be made available to the compiler, and for archive and executable targets, must be listed explicitly on the command line in dependency order. By and large, managing dependencies by hand is infeasible for all but the most simple projects.
The strategy for managing dependencies adopted by Bazel (and thus OBazl) is starkly different from that of most other build systems.
[FIXME: three points:
- finding and listing deps as input to the build
- transparency of actual depgraphs (inspection using `query "deps(…)"`, `aquery`, `--output_groups=closure`, etc.)
- how OBazl normalizes and propagates depgraphs, advantages compared to dune, e.g. ppx_codeps]
Many build systems, Dune included, conflate dependency discovery and the build process. For example, Makefiles for building OCaml projects usually run `ocamldep` to generate `.depends` files listing dependencies, and build targets depend on these dynamically generated files. Dune build stanzas list direct dependencies, but indirect dependencies are discovered (using `ocamldep`) and added to the build dependency graph as part of the build process.
By contrast, Bazel enforces a strict separation between dependency discovery and the build process. All dependencies must be explicitly enumerated for Bazel before the build process begins; discovering and adding a dependency in the course of the build process is disallowed. This is a necessary feature of any hermetic (replicable) build process: if you want to design a replicable experiment, you start by fixing the initial conditions. Build systems that allow dynamic discovery and injection of dependencies cannot guarantee hermeticity. [WARNING: this is not accurate, to be revised]
The downside of having to explicitly enumerate the entire dependency graph for a project is that you have to explicitly enumerate the entire dependency graph for the project. But this is obviously a task that can and should be addressed by a build tool, just as it is for systems that do this discovery during the build process. The only difference is that for Bazel we run the dependency discovery tool before the build process commences, and we record its results and pass them as input to the build process.
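As a rough illustration of "record the results, then pass them to the build", the sketch below parses previously recorded dependency output into a table of direct deps. It assumes text in the shape printed by `ocamldep -modules` (one `file: Module ...` line per source file); the function name and the sample text are hypothetical, and this is not the OBazl tooling.

```python
def parse_ocamldep_modules(text):
    """Turn recorded 'file: Mod1 Mod2' lines into {file: [modules]}.

    Assumption: input looks like the output of `ocamldep -modules`.
    Lines without a colon are skipped.
    """
    deps = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        source, _, modules = line.partition(":")
        deps[source.strip()] = modules.split()
    return deps


# Hypothetical recorded output, captured before the build starts.
recorded = "a.ml: B Printf\nb.ml:\n"
direct_deps = parse_ocamldep_modules(recorded)
print(direct_deps)
```

The point of the separation is that a table like `direct_deps` is fixed before any build action runs, rather than being extended while the build is in progress.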
Version 2 of OBazl includes tooling that can largely if not entirely automate the enumeration of dependencies. Currently there are some cases where it is difficult to discover all dependencies; for example, targets that involve lots of indirection, `-open` arguments, and `include` directives in source files. A goal of the OBazl project is to perfect this tool so that it can always emit complete and correct dependency graphs.
Another notable feature of OBazl with respect to dependencies is that we get correct ordering for free, so to speak. Dependency ordering for compiler inputs, and for most build tools, is expressed syntactically, as list ordering (which is in part why managing deps in such systems is difficult). But OBazl maintains dependencies as a graph structure, so ordering is expressed as hierarchy. The only way to express a dependency of A on B is to list B explicitly in the `deps` attribute of A; there is no way to express it as the list `B A`, as one must do on the compiler command line. In particular, listing ["A", "B"] in a `deps` attribute does not express a dependency of B on A. In fact, it could be the case that A depends on B (either directly or indirectly), so when we serialize the graph derived from this list we will get `B A`. It follows that dependencies can be listed in any order; you can list them in alphabetical order if you wish.
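A minimal sketch of why list order in `deps` carries no meaning: the order falls out of the graph when it is serialized. It uses Python's standard `graphlib` as a stand-in for the build engine, and the target names T, A and B are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical targets: T lists ["A", "B"] in its deps, and A depends on B.
# The order inside each list is irrelevant; only the edges matter.
deps = {
    "T": ["A", "B"],
    "A": ["B"],
    "B": [],
}

# static_order() yields every node after all of its dependencies.
serialized = list(TopologicalSorter(deps).static_order())
print(serialized)
```

Listing T's deps as ["B", "A"] instead produces the same serialization, since the edges are unchanged.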
The critical feature here is that Bazel provides out-of-the-box support for merging dependency graphs. If your dependencies are expressed as ordered lists, and you have multiple dependencies, then you have the task of merging the ordered lists in such a way that dependency order is maintained, which is non-trivial, since the same item may occur in different contexts in more than one list. [TODO: simple example]. Bazel provides a `depset` facility that handles such merging automatically and efficiently. OBazl rules use depsets to manage all dependencies.
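To see why merging ordered lists is harder than it looks, consider two hypothetical lists, each already in dependency order: one says C comes before A, the other says B comes before C. The sketch below contrasts naive concatenation with an order-preserving merge. It only illustrates the problem; it is not how Bazel's depset is implemented, and all names are made up.

```python
from graphlib import TopologicalSorter


def naive_merge(lists):
    """Concatenate and drop repeats; may violate dependency order."""
    merged = []
    for items in lists:
        for item in items:
            if item not in merged:
                merged.append(item)
    return merged


def ordered_merge(lists):
    """Merge by treating each adjacent pair as an 'earlier before later' edge."""
    sorter = TopologicalSorter()
    for items in lists:
        for item in items:
            sorter.add(item)
        for earlier, later in zip(items, items[1:]):
            sorter.add(later, earlier)  # 'later' must come after 'earlier'
    return list(sorter.static_order())


list_1 = ["C", "A"]  # C must precede A
list_2 = ["B", "C"]  # B must precede C

naive = naive_merge([list_1, list_2])
merged = ordered_merge([list_1, list_2])
print(naive, merged)
```

The naive result puts B last even though the second list requires B before C; the graph-based merge respects both lists.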
deps that require special handling by the build engine: runtime data deps; runtime code deps (plugins); ppx-codeps
Configurable ("dynamic") deps
Solves the same problem as Dune’s (select … from …) ("alternative dependencies").
OCaml Dependencies
All direct dependencies must be explicitly listed. OBazl will not analyse implicit dependencies. However, it will automatically handle indirect dependencies.
PPX Dependencies
PPX Codependencies
Sometimes PPX processing injects dependencies that are needed to compile the result of a PPX transformation. These are often called "runtime" dependencies, but OBazl calls them ppx_codependencies, since they are not in fact runtime dependencies. Runtime dependencies of a module or executable are needed when that module or executable is executed. These dependencies do not fit that description.
Dune
Dune dependencies are expressed as libraries or packages. In the latter case they are package-manager entities. Bazel deps may be individual modules, libraries, archives, etc. but they are always build targets, not package-manager entities.
I.e. with dune one says "this build depends on that package", which really means that it depends on compiled entities contained in the package. With bazel one says "this target depends on that target". That does not always imply something compiled by bazel; targets may deliver strings, for example.
But in the case of local or immediate deps, there is an ambiguity.
Example: src/lib/syncable_ledger. The dunefile lists a library with name "syncable_ledger". The directory contains a file named "syncable_ledger.ml". So if we depend on "syncable_ledger", what is the dep, exactly?
The problem is that we use (by convention) target label ":foo" for "foo.ml". But if the lib name is also "foo" then we want to use ":foo" for the library, not the module.
Option: use ":foo_cm" for individual modules/files. The disadvantage of this is in label aesthetics.
Label concepts and aesthetics: the idea is that we treat each pkg as a conceptual unit, and the directory name as the concept name. So we get labels like "foo/bar:bar", which abbreviates to "foo/bar". The package may have additional targets, e.g. "foo/bar:baz", but the core concept of the package is captured by the name "foo/bar".
Normally (?) a package will correspond to a library/archive containing multiple modules. The tricky bit, with dune, is that the library name may match a module name. This requires some renaming and module aliasing.
Another option: use a naming convention for libs and archives, e.g. "foo_lib" and "foo_archive". But this prevents naming as above, i.e. we would have foo/bar:bar_lib instead of just foo/bar.
Rule of thumb: use same name for directory and for library/archive target, which should be the concept name. For modules in the lib/archive (i.e. source files in the dir), use :_Filename (only visible within the package) or :Filename (if the target is used outside the package).
ocamldep lists deps as module names. It doesn’t have any idea of library or package.
CC dependencies
Support for C/C++ deps is still under development. What’s there works but the interface will likely change since there are still some issues to be worked out, such as how best to support different link modes and options for each CC dependency.
A "CC dependency" is usually a C/C++ library, but CC dependencies need not necessarily be produced from C/C++ source code. Rather the term refers to the standard file format for object files, archives, etc., which is historically closely associated with C. Many other languages (including OCaml, Rust, Go, etc.) are capable of producing such files; OBazl uses 'CC' terminology (e.g. `CC library`, `cc_deps`) to refer to such files no matter what language was used to produce them.
CC Libraries
A library is a collection of code units. CC libraries come in several flavors:
- an unpackaged collection of object files (with `.o` extension)
- a static 'library': a collection of code units (object files) packaged as an archive file, whose extension (by convention) is `.a`.
- a dynamic shared 'library': code units (object files) packaged as a dynamic shared object (DSO) file. Yes, the terminology conflates two distinct concepts; OBazl generally treats 'dynamic' and 'shared' as synonyms. On Linux, these are `.so` files; on MacOS, they are `.dylib` files (but MacOS also supports `.so` files).
CC Linkmode
Rules producing CC libs commonly produce both a static archive (`.a` file) and one or more shared libraries (`.so` or `.dylib` files). In OBazl rules, link mode determines which type of library is used for linking. Possible values:
- `static`: statically link to `.a` file.
- `dynamic`: depending on the OS, link to `.dylib` file (MacOS) or `.so` file (Linux or MacOS).
- `shared`: synonym for 'dynamic'
- `default`: 'static' on Linux, 'dynamic' on MacOS.
NOTE The 'default' linkmode is configurable. The default value for linkmode 'default' is as noted above. To override the default for all rules, pass command-line option `--@ocaml//linkmode`; for example, to set default value for linkmode 'default' to 'dynamic' pass `--@ocaml//linkmode:dynamic`.
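The linkmode rules above can be written out as a small decision function. This is a sketch of the documented behavior only, not OBazl's rule code: the function names, the `os_name` strings ("linux", "macos") and the `default_override` parameter (standing in for the command-line override) are all hypothetical.

```python
def resolve_linkmode(linkmode, os_name, default_override=None):
    """Map a linkmode value to 'static' or 'dynamic'.

    Assumptions: os_name is "linux" or "macos"; default_override models the
    command-line override of what linkmode 'default' means.
    """
    if linkmode == "shared":  # synonym for 'dynamic'
        return "dynamic"
    if linkmode == "default":
        if default_override is not None:
            return resolve_linkmode(default_override, os_name)
        return "static" if os_name == "linux" else "dynamic"
    if linkmode in ("static", "dynamic"):
        return linkmode
    raise ValueError("unknown linkmode: " + repr(linkmode))


def candidate_extensions(linkmode, os_name, default_override=None):
    """File extensions the chosen link mode would look for."""
    mode = resolve_linkmode(linkmode, os_name, default_override)
    if mode == "static":
        return [".a"]
    return [".dylib", ".so"] if os_name == "macos" else [".so"]


print(resolve_linkmode("default", "linux"))
print(candidate_extensions("default", "macos"))
```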
Dynamic Loading ('Plugins')
[TODO: document dynamic loading of CC deps; compare OCaml *.cmxs files. Distinction between linking and loading.]
Static Binaries
Executable binaries can be linked in several ways:
- Dynamic - all deps are shared libs
- Partially static - non-system libs may be statically linked, but system libs are dynamically linked
- Fully static - all dependencies, including system libraries, are statically linked, resulting in a complete standalone executable.
WARNING MacOS does not support statically linked executables. See Statically linked binaries on Mac OS X
Dependency Cycles
Legacy packages may include circular or mutual dependencies. Bazel disallows such dependencies.
Example: libfqfft/evaluation_domain/domains and libfqfft/polynomial_arithmetic are mutually dependent.
In this case listing each in the deps attribute of the other will fail, since Bazel will detect the dependency cycle.
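What "detect the dependency cycle" amounts to can be shown with a few lines of Python. The sketch uses the standard `graphlib` module rather than anything from Bazel, and the graph below simply restates the two mutually dependent packages from the example; the helper name `find_cycle` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each package lists the other in its deps, as in the example above.
deps = {
    "libfqfft/evaluation_domain/domains": ["libfqfft/polynomial_arithmetic"],
    "libfqfft/polynomial_arithmetic": ["libfqfft/evaluation_domain/domains"],
}


def find_cycle(graph):
    """Return the nodes of one dependency cycle, or None if the graph is acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # graphlib reports the cycle as the second element of err.args;
        # its first and last nodes are the same.
        return err.args[1]
    return None


cycle = find_cycle(deps)
print(cycle)
```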
The "proper" way to address this is to refactor the packages to eliminate the cycles. But we cannot do that with legacy code we do not control.
The workaround seems to be to use 'include_prefix' … This will make compilation work, at the cost of removing at least one of the dependencies, so a change in one will not force a recompile of the other. [Why does this work?]
MacOS
Linking dynamic libs to an executable on MacOS is an open issue. The workaround for now is to link to static libs.
Demo: demos/interop/cc_deps
Resources:
- Dynamic Libraries, RPATH, and Mac OS. Blog article from 2008, still useful.