Namespacing
Last updated May 4, 2022
Overview
The Problem
OCaml supports namespaces within the language. But unlike many other
languages that support articulated names like A.B.C, it does not map
such "module paths" to filesystem paths.
OCaml uses a flat namespace for module source file names. If you use the same module file name in different file system locations within a project you will get a name clash. Every module source file must be uniquely named.
The Solution
There is only one way to address the problem: give your source files names that are likely to be unique. The easy way to do this is to decide on a prefix string that is likely to be unique. That prefix might start with your project name, or it might replicate the filesystem path locating your source file, replacing each filesystem separator (usually '/') with '_' or some other character.
The downside of this solution is that it creates another problem. Now
instead of having e.g. a/b.ml, we need a/a__b.ml or the like.
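The renaming this implies can be sketched in a few lines of OCaml. The helper name prefixed_path is hypothetical, and deriving the prefix from the directory part of the path is only one of the choices described above; real tools pick their own prefix.

```ocaml
(* Hypothetical helper: build a prefixed file name from a source path.
   The directory part becomes the prefix ('/' is replaced by '_') and is
   joined to the base name with a double underscore. *)
let prefixed_path (path : string) : string =
  let dir = Filename.dirname path in
  let base = Filename.basename path in
  let prefix = String.map (fun c -> if c = '/' then '_' else c) dir in
  dir ^ "/" ^ prefix ^ "__" ^ base

let () =
  List.iter
    (fun p -> Printf.printf "%s -> %s\n" p (prefixed_path p))
    [ "a/b.ml"; "src/stdune/escape.ml" ]
```

The longer the path, the longer the file name, which is exactly the inconvenience the paragraph above complains about.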
A better solution is to adopt a hybrid strategy that uses both filename prefixing and OCaml aliasing to implement a form of namespacing. That is the strategy pursued by both Dune and OBazl.
The "top-level module aliases" facility provides a mechanism that tools can use to emulate hierarchical filesystem-based namespacing.
Special considerations: -no-alias-deps, -opaque, and -linkall.
Building Blocks
Type-Level Module Aliases
OCaml has a sophisticated module system that is partially tied to the file system.
Each OCaml "compilation unit" determines a module, whose name is the
file name, capitalized and truncated to remove the extension. Thus
foo.ml determines module Foo.
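The naming rule, and the clash it produces, can be modelled as follows. This is an illustrative sketch, not compiler code: the function names are hypothetical, and the two paths are the Dune files discussed in the case study later on this page.

```ocaml
(* Module name derived from a source file: drop the directory and the
   extension, then capitalize the first letter. *)
let module_name_of_path (path : string) : string =
  path |> Filename.basename |> Filename.remove_extension
  |> String.capitalize_ascii

(* Two distinct files clash when they yield the same module name,
   no matter which directories they live in. *)
let clashes (p1 : string) (p2 : string) : bool =
  p1 <> p2 && module_name_of_path p1 = module_name_of_path p2

let () =
  Printf.printf "%s\n" (module_name_of_path "foo.ml");
  Printf.printf "%b\n"
    (clashes "src/dune_lang/escape.ml" "src/stdune/escape.ml")
```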
The OBazl Demos repo contains some basic examples, using simple makefiles, demonstrating the interplay of module aliasing equations, namespaces, and the file system.
File names including double underscores, such as foo__bar.ml, receive
special treatment, but only when the compiler prints module paths:
where a matching alias exists, the compiler displays the double
underscore as a dot, in this case showing Foo.Bar.
[T]he compiler uses the following heuristic when printing paths: given a path Lib__fooBar, if Lib.FooBar exists and is an alias for Lib__fooBar, then the compiler will always display Lib.FooBar instead of Lib__fooBar. This way the long Mylib__ names stay hidden and all the user sees is the nicer dot names. This is how the OCaml standard library is compiled.
Translated into English, this seems to mean that, for example, if
lib.ml contains module FooBar = Lib__fooBar, then Lib.FooBar
corresponds to Lib__fooBar.
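The correspondence can be tried out in a single file. In the sketch below, nested modules stand in for the separate compilation units lib__fooBar.ml and lib.ml; the function describe is a placeholder invented for the example.

```ocaml
(* Stand-in for the compilation unit lib__fooBar.ml. *)
module Lib__fooBar = struct
  let describe () = "defined in lib__fooBar.ml"
end

(* Stand-in for lib.ml: one aliasing equation per exposed module. *)
module Lib = struct
  module FooBar = Lib__fooBar
end

let () =
  (* Both paths name the same module, so both calls run the same code. *)
  print_endline (Lib.FooBar.describe ());
  print_endline (Lib__fooBar.describe ())
```

Note that the prefixed name remains directly usable; the alias adds a second path to the module rather than hiding the first.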
This use of double underscores is a convention, not a rule.
Aliasing may use any legal module name. In particular module A = A is legal.
Module names are opaque! OCaml will not interpret a module name like A__B as the name of a module B in a namespace A.
References
- Better namespaces through module aliases (blogpost, 2014)
Resolver Modules
Automatic Renaming
Namespace Models
OBazl uses the term namespace to refer to a collection of modules
and/or interfaces named with a namespace prefix (such as Stdlib__),
together with a resolver module containing an aliasing equation for
each submodule (e.g. module List = Stdlib__List). The name of the
resolver module serves as the name of the namespace; the "entry point"
to the namespace, as it were.
Dune calls them "wrapped" libraries. They are very commonly used, if
only because the Dune library stanza builds them by default. OBazl
provides support for two distinct namespace models, top-down and
bottom-up.
Dune uses variations on the term "wrapped library" where OBazl uses namespacing terminology.
What OBazl calls a "namespace resolver module" (or just "resolver") is sometimes called a "mapper" or "wrapper".
A namespace may be packaged as an OCaml archive, but this is not a
requirement; under OBazl, bottom-up namespaces may be aggregated using
ocaml_archive or ocaml_library, or may not be aggregated at all.
'Namespace' as used by OBazl is not a first-class concept in the OCaml language. There is no Namespace type or keyword.
NS Resolver Modules
Both the top-down and the bottom-up models depend essentially on a module containing the module aliasing equations that determine submodules; OBazl calls such modules ns resolver modules, or resolvers for short. They are expressed using the rule ocaml_ns_resolver.
Top-down
Top-down namespaces are defined at the aggregate level. This is the
strategy pursued by Dune: membership in a namespace is expressed by
the list of modules in a library stanza. Dune automatically renames
the modules, adding the namespace prefix (e.g. Foo__) to the module
names, and generates a resolver ("wrapper") module containing the
required module aliasing equations; then everything is packaged up in
an archive file.
To support top-down namespaces, OBazl provides rules ocaml_ns_archive
and ocaml_ns_library; members of the namespace are listed in the
submodules attribute. OBazl does the same thing Dune does: it
generates the resolver and renames the submodules. Both Dune and OBazl
support user-defined resolver modules for top-down namespaces. Dune
always packages namespaces in archives, but in OBazl they may also be
organized as OBazl libraries, or they may not be aggregated at all.
(See Aggregators for more on the distinction OBazl draws between
libraries and archives.)
Top-down namespaces are easy to define, but they are limited. Targets
that use a namespaced module must depend on the namespace aggregate
(ocaml_ns_library or ocaml_ns_archive) containing the module; they
cannot depend directly on submodules. If changes are made to a
submodule, all targets that depend on its namespace aggregate will be
rebuilt, whether or not they depend on the module that was changed.
Namespace Aggregators
Each of these rules (ocaml_ns_library and ocaml_ns_archive) has a
submodules attribute which contains a list of the labels of modules to
be included in the namespace.
For example, if the name of an ocaml_ns_library rule is foo, and it
contains submodule :bar, then the ns module will be Foo.cmx, and the
bar submodule will be renamed to Foo__Bar.cmx. To produce Foo.cmx,
OBazl will generate Foo.ml, containing aliasing equations like
module Bar = Foo__Bar.
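The shape of the generated resolver source can be sketched as a small text generator. This is not OBazl's implementation (which lives in its Bazel rules); the function name resolver_source and its signature are assumptions made for illustration.

```ocaml
(* Hypothetical generator: given a namespace name and its submodule names,
   produce the text of a resolver module, one aliasing equation per line. *)
let resolver_source ~(ns : string) (submodules : string list) : string =
  let ns = String.capitalize_ascii ns in
  submodules
  |> List.map (fun m ->
         let m = String.capitalize_ascii m in
         Printf.sprintf "module %s = %s__%s" m ns m)
  |> String.concat "\n"

(* For namespace foo with submodules bar and baz, this prints the
   contents one would expect to find in Foo.ml. *)
let () = print_endline (resolver_source ~ns:"foo" [ "bar"; "baz" ])
```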
How it works
This approach involves a circularity: in order to generate and compile
Foo.cmx, the ocaml_ns_library rule must depend on the submodules; but
the submodules in turn must depend on the ns resolver module (Foo.cmx
in this case). OBazl can get around this, though, since in fact the ns
resolver module only depends on the module names, not the compiled
modules. This is achieved using the -no-alias-deps option.
That solves half of the problem; the other problem to be resolved is that each submodule must depend on the resolver module. A submodule cannot depend on the ns aggregator rule that contains it, on pain of circularity; yet it must depend on the resolver, and the aggregator rule contains the information needed to generate the resolver module source code.
We get around this circularity by subterfuge. We use a combination of hidden label-typed build settings attributes and user-defined transitions to pass configuration information down the dependency chain, so that the bottom node in the chain depends on the top node - but only for configuration data. In other words, we split the circular dependency into a module dependency tree going from aggregator to submodule to resolver, and a configuration dependency going the other way around.
Rules involved in top-down namespacing (ocaml_ns_library,
ocaml_ns_archive, ocaml_module and ocaml_signature) have a hidden
attribute, _ns_resolver, that expresses a dependency on a single
ocaml_ns_resolver target. They also have a hidden _ns_submodules
attribute. Both of these are label-typed build settings.
The ocaml_ns_resolver target, in turn, depends on some other label
attributes. The transition functions set these attributes at build
time; in effect, they allow us to give this resolver target "reverse
dependencies": the attributes that control its build are set by
targets that depend on it. Submodules depend on these two deps, but
since the parameters controlling them are set dynamically, at build
time, the object depended on will be customized for the submodule that
depends on it.
More specifically: for all rules the hidden _ns_resolver attribute has
default value @rules_ocaml//cfg/ns. That target is a 'label_setting'
whose value is [the label of] an ocaml_ns_resolver target. This makes
each rule (target) depend on the same ns resolver module. The build
parameters for that module are set dynamically using transition
functions. In particular, the hidden _ns_submodules attribute has
default value @ocaml//ns:submodules, which is a string_list_flag; it
too is set by transition functions at build time.
The result is that building an ocaml_ns_library or ocaml_ns_archive
target causes transition functions to propagate the list of submodule
names (as strings) to both the submodule dependency targets and the
hidden ns resolver target. The ocaml_module (and ocaml_signature)
implementations check this list to see if they are included as
submodules; if so, they rename the source file, prefixing the
namespace name, before compiling. The ocaml_ns_resolver target uses
the list to generate a structfile with the namespace name, containing
the module aliasing equations that define the namespace membership.
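The membership check performed on the submodule side can be modelled as below. This is a sketch of the decision only, under the assumption that names are compared after capitalization; compiled_name is a hypothetical function, not part of OBazl.

```ocaml
(* Hypothetical model of the check a module rule performs: if the module is
   in the propagated submodule list, it is compiled under the prefixed
   name; otherwise its name is left alone. *)
let compiled_name ~(ns : string) ~(submodules : string list) (m : string) :
    string =
  let m = String.capitalize_ascii m in
  if List.mem m (List.map String.capitalize_ascii submodules) then
    String.capitalize_ascii ns ^ "__" ^ m
  else m

let () =
  let submodules = [ "bar"; "baz" ] in
  List.iter
    (fun m ->
      Printf.printf "%s -> %s\n" m (compiled_name ~ns:"foo" ~submodules m))
    [ "bar"; "qux" ]
```

The same list drives both sides: the resolver gets an aliasing equation for every name that this check renames.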
For example, when we build an ocaml_ns_library target, the transition
functions will set the value of _ns_resolver to the desired namespace,
and _ns_submodules to the list of submodules for the namespace. These
settings will be set before Bazel proceeds to build the submodules.
When the time comes to build a submodule, Bazel will see that it
depends on the ns resolver, so it will first build the latter. The
build rule for it uses the values set by the transition functions, so
the result is a resolver built from exactly the information needed to
compile the submodule.
Bottom-up
Top-down namespaces have one major shortcoming, as noted above:
clients can only depend on the aggregates; they may not depend
directly on submodules. Bottom-up namespaces eliminate this
shortcoming. Targets may depend directly on namespaced modules;
furthermore, bottom-up namespaces need not be organized as library or
archive aggregates at all. They are determined by explicitly defining
an ocaml_ns_resolver specifying the namespace prefix and listing its
submodules. The submodules (which may include interfaces) indicate
their membership in a namespace directly, by passing an
ocaml_ns_resolver target label via the ns attribute of ocaml_module
and ocaml_signature.
A less serious shortcoming of top-down namespaces is the use of transition functions with hidden label-typed attributes, which adds overhead (and considerable complexity, if you ever need to debug them). Bottom-up namespaces use neither hidden attributes nor transition functions.
Top-down namespaces select their submodules; the submodules in a bottom-up namespace elect membership.
Bottom-up namespaces are much more powerful and flexible than top-down namespaces. Targets can depend directly on namespaced submodules; this can be used to optimize builds. When a bottom-up submodule is changed only targets that depend on it are rebuilt. And since aggregation and namespacing are orthogonal, namespaced submodules can be aggregated ad libitum. For example, if a set of targets depends on a subset of three submodules in a namespace that contains ten submodules, this subset can be aggregated as a library or archive. Multiple aggregates can contain submodules from the same namespace. Aggregates can even contain submodules from multiple namespaces. The OBazl rules will ensure that the resolver module is always included in the dependency graphs of submodules, and OBazl’s dependency manager will always normalize the graphs to remove duplicates while retaining dependency order.
Another way to look at it: in most languages that explicitly support some form of namespacing, namespaces are closed, in the sense that the only way to access an element in the namespace is by going through the namespace, so to speak. OBazl’s bottom-up namespaces are open: we can access the submodules in a bottom-up namespace without reference to the namespace name.
Which is to say that such "namespaces", being based on OCaml’s module
aliasing mechanism, are only pseudo-namespaces. The OCaml language
does not know anything about such namespaces; it only knows how to
resolve module aliases. For example, a reference A.B might be aliased
to A__B (i.e. a__b.ml, automatically renamed from b.ml), but module
names are opaque; OCaml will not interpret A__B as "submodule B in
namespace A". So we can access that (sub)module directly, without
"going through" module A. In fact we can include it in any namespace
we like; for example, we can put it in namespace Foo by putting the
following aliasing equation in the resolver module foo.ml:
module B = A__B. We can also expose it under a different name:
module Bar = A__B would expose it as Foo.Bar.
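A single-file sketch makes the "open namespace" point concrete. Nested modules stand in for the files a__b.ml, a.ml and foo.ml; the value origin is a placeholder invented for the example.

```ocaml
(* Stand-in for a__b.ml, the prefixed form of b.ml. *)
module A__B = struct
  let origin = "a__b.ml"
end

(* Resolver for namespace A. *)
module A = struct
  module B = A__B
end

(* Resolver for an unrelated namespace Foo that exposes the same module
   twice: once under its own name and once renamed. *)
module Foo = struct
  module B = A__B
  module Bar = A__B
end

let () =
  (* All four paths reach the same module. *)
  List.iter print_endline
    [ A.B.origin; Foo.B.origin; Foo.Bar.origin; A__B.origin ]
```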
As an example: just about everything in the OCaml compiler sources
depends on the standard library, which is packaged as an archive
stdlib.cma built by target //stdlib. If those dependencies are
expressed as dependencies on //stdlib, then a change in any stdlib
submodule will trigger a rebuild of almost everything. But if they are
expressed as direct submodule dependencies, e.g. //stdlib:Stdlib.List,
then the rebuild triggered by a change to one submodule will include
only those targets that genuinely depend on it, directly or
indirectly. (Example: parsing/BUILD.bazel)
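The difference in rebuild granularity can be captured in a toy model. The sketch below is deliberately simplified: it considers only direct dependencies, it is not how Bazel computes invalidation, and the type and function names are hypothetical.

```ocaml
(* What a client target depends on: a whole aggregate (listed by its
   members) or a single submodule. *)
type dep =
  | Aggregate of string list
  | Submodule of string

(* Simplified model: only direct dependencies are considered. A client is
   rebuilt when the changed submodule is something it depends on. *)
let needs_rebuild ~(changed : string) (d : dep) : bool =
  match d with
  | Aggregate members -> List.mem changed members
  | Submodule m -> String.equal m changed

let () =
  let stdlib = Aggregate [ "Stdlib.List"; "Stdlib.Buffer" ] in
  Printf.printf "via aggregate: %b\n"
    (needs_rebuild ~changed:"Stdlib.Buffer" stdlib);
  Printf.printf "via submodule: %b\n"
    (needs_rebuild ~changed:"Stdlib.Buffer" (Submodule "Stdlib.List"))
```

A client that only uses Stdlib.List is rebuilt after a change to Stdlib.Buffer when it depends on the aggregate, but not when it depends on the submodule directly.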
- bottom-up ns does not automatically entail an aggregate. Aggregates containing namespaced modules must be explicitly defined, and they may contain a subset of the submodules in an ns, or submodules from multiple namespaces. IOW, aggregation and namespacing are orthogonal.
- clients cannot depend on a namespace; they can only depend on aggregates or singletons (modules, sigs).
- a change to a submodule in a ns will cause a recompile of any aggregate that contains it, and of anything that depends on the aggregate. But targets that depend on a submodule directly will not be affected by changes to other submodules in the ns. With a top-down ns, by contrast, targets can only depend on the ns aggregates, so any change in any submodule will force a recompile of all clients.
- changing one submodule does not entail a rebuild of any sibling submodules.
- the user may provide a custom resolver module, which can be any module that contains the module aliasing equations needed to support the ns. Submodules then just list this module’s label in their ns attribute. This is what happens with the Stdlib modules of the compiler.
- supports direct dependency on individual submodules in the namespace. We cannot depend on a dotted module path, but we can depend on a module in a namespace, and we can use a naming convention to make it look like a dotted path. For example, the bazelized version of the OCaml compiler uses dotted names for the Stdlib; so the target name to compile the buffer.ml module of the stdlib is Stdlib.Buffer; to build it: bazel build //stdlib/Stdlib.Buffer. NB this is just a convention.
- normalized/optimized build files can be queried to show optimized dep graphs, i.e. no spurious dependencies. If you depend on a top-down ocaml_ns_library, the dep graph will show a dependency on all submodules in the ns lib. With bottom-up namespacing and optimized build files no spurious deps will be shown.
- OTOH, if you depend on the ns_resolver of a bottom-up namespace, the dep graph will not include the submodules, since the submodules depend on the resolver, not the other way around. So there are trade-offs.
  - FIXME: is there a way to write a query that will show the submodules too? probably. can this be done by an aspect?
Troubleshooting
Case studies
Multiple submodules with same name
Case A
This situation arose during OBazl development. To develop a tool we
wanted to borrow some code from Dune for parsing Dune files. The Dune
code contains src/dune_lang/escape.ml and src/stdune/escape.ml (and
their interface files). If both were included in ns libraries then
name clashes could emerge. This is because namespace aliasing always
starts with the original module (file) name. So in this case we had
two namespaces both of whose resolvers contained aliasing equations
for 'Escape'.
The compile for dune_lang/template.ml, which depends on Escape, was
failing with Unbound value for Escape.escape. The problem was not that
OCaml could not resolve the reference to Escape, but that it resolved
it to stdune/escape.ml instead of the intended dune_lang/escape.ml,
which does not define escape.
The reason was that template.ml began with open Stdune, so the ns
resolver for that namespace was used to look up Escape, yielding a
reference to stdune/escape.ml.
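The shadowing effect described here can be reproduced in one file. Everything in the sketch is a stand-in: nested modules play the role of separate source files, and which Escape module defines which function is an assumption made for the example, not a statement about the Dune sources.

```ocaml
(* Stand-in for the Escape module that template.ml intends to use. That it
   defines [escape] is an assumption made for this sketch. *)
module Escape = struct
  let escape (s : string) : string = "\\" ^ s
end

(* Stand-in for the other Escape, reachable through the Stdune resolver.
   It deliberately has no [escape] function. *)
module Stdune__Escape = struct
  let quote (s : string) : string = "\"" ^ s ^ "\""
end

(* A resolver that contains an aliasing equation for Escape. *)
module Stdune = struct
  module Escape = Stdune__Escape
end

module Template = struct
  open Stdune

  (* After the open, Escape means Stdune__Escape. Writing Escape.escape
     here would fail with "Unbound value Escape.escape". *)
  let render (s : string) : string = Escape.quote s
end

let () =
  (* Outside Template, Escape still refers to the first module. *)
  print_endline (Escape.escape "x");
  print_endline (Template.render "x")
```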
But if template.ml starts by opening Stdune, then how else could a
reference to Escape be resolved? This turned out to be my error: I had
included both escape.ml files in their respective package namespace
libraries, without bothering to closely inspect the 'main' ns modules
(stdune/stdune.ml and dune_lang/dune_lang.ml). These did not include
aliasing equations for Escape. So the reference to it within
dune_lang/template.ml would be resolved without using any namespace
(i.e. aliasing) lookups.
To make this work in OBazl use the following technique:
WARNING the following is obsolete (our namespacing strategy has changed)
- Exclude the non-namespaced files from the ns-env. One way to do this is to use the exclude parameter of the glob function; for example:
  ns_env(aliases = glob(["*.ml"], exclude = ["escape.ml"]))
- Do not list the non-namespaced module in the submodules dictionary of the ocaml_ns_library rule.
- Do not use a prefix attribute on the ocaml_module rule instances used to build the non-namespaced modules.
- If the non-namespaced module depends on a namespaced module, you must '-open' the namespace containing the latter. Use the prefix of your ns_env() as the module name. For example:
  opts = ["-open", "Demos_Obazl_Stdune__00_ns_env"]
Version 2 supports an open attribute for rules ocaml_module and ocaml_signature.
Currently this must be done manually, but will eventually be automated.
Case B
Same problem involving module Glob, found in src/dune_engine and
other_libs/dune_glob.
The error message:
File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml", line 1:
Error: The implementation bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml
does not match the interface bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.cmi:
...
In module Glob:
Values do not match:
val of_glob :
Demos_Obazl_Dune_engine__Glob.t -> (string -> bool) t/2
is not included in
val of_glob : Demos_Obazl_Dune_glob__Glob.t -> t/1
File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.mli", line 49, characters 2-27:
Expected declaration
File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml", line 133, characters 6-13:
Actual declaration
File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml", line 116, characters 2-24:
Definition of type t/1
File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml", lines 3-8, characters 0-22:
Definition of type t/2
Target //obazl/dune_engine:_Predicate_lang failed to build
In short: the problem arose because of the way OBazl handles
dependencies. It retains transitive deps and strictly preserves
ordering. In this case, the way we listed dependencies resulted in the
insertion of dune_glob/glob.cmo between predicate_lang.mli and
dune_engine/glob.cmo, so it and predicate_lang.ml used different Glob
modules.
Long story short: sometimes this can happen if a structfile and its sigfile have different deps. Still not sure what causes this problem, but the workaround was to move the dep on //obazl/dune_glob from _Glob to _Glob.cmi.
B Same name for ns main module and ns submodule
Demo set035/case03: ocaml_ns_module.name = color, contains submodule:
"//namespaces/obazl/set030/case01:color": "Color",
Only way around this is to change the main ns name?
Tips
- Count your underscores! It’s easy to write Foo_Bar_Baz when you should write Foo__Bar_Baz, in which case you may get an 'Unbound module' error.
- If you use a main module, you probably need to exclude it from the ns_env. Otherwise it will be aliased. e.g. from dune_glob:
  ns_env(aliases = glob( ["*.ml"], exclude = ["dune_glob.ml"] ) + ["lexer.mll"])
inconsistent assumptions over interface
File "namespaces/obazl/set300/case370/foo-bar/test.ml", line 1:
Error: Files namespaces/obazl/set300/case370/foo-bar/test.cmo
and bazel-out/darwin-fastbuild/bin/namespaces/obazl/set300/case370/foo-bar/_obazl_/Demos_Namespaces_Obazl_Set300_Case370_Foo_bar__Red.cmo
make inconsistent assumptions over interface Demos_Namespaces_Obazl_Set300_Case370_Foo_bar__Red