Namespacing Last updated May 4, 2022

Overview

The Problem

OCaml supports namespaces within the language. But unlike many other languages that support articulated names like A.B.C, it does not map such "module paths" to filesystem paths.

OCaml uses a flat namespace for module source file names. If you use the same module file name in different file system locations within a project you will get a name clash. Every module source file must be uniquely named.

The Solution

There is only one way to address the problem: give your source files names that are likely to be unique. The easy way to do this is to decide on a prefix string likely to be unique. That might start with your project name, or it might replicate the filesystem path locating your source file, by replacing filesystem separator (usually '/') with '_' or some other character.

The downside of this solution is that it creates another problem. Now instead of having e.g. a/b.ml, we need a/a__b.ml or the like.

A better solution is to adopt a hybrid strategy that uses both filename prefixing and OCaml aliasing to implement a form of namespacing. That is the strategy pursued by both Dune and OBazl.

The "top-level module aliases" facility provides a mechanism that tools can use to emulate hierarchical filesystem-based namespacing.

Special considerations: -no-alias-deps, -opaque, and -linkall.

Building Blocks

Type-Level Module Aliases

OCaml has a sophisticated module system that is partially tied to the file system.

Each OCaml "compilation unit" determines a module, whose name is the file name, capitalized and truncated to remove the extension. Thus foo.ml determines module Foo.

The OBazl Demos repo contains some basic examples, using simple makefiles, demonstrating the interplay of module aliasing equations, namespaces, and the file system.

File names including double underscores, such as foo__bar.ml, receive special treatment. The compiler will treat the double underscore as a dot, in this case yielding Foo.bar.

[T]he compiler uses the following heuristic when printing paths: given a path Lib__fooBar, if Lib.FooBar exists and is an alias for Lib__fooBar, then the compiler will always display Lib.FooBar instead of Lib__fooBar. This way the long Mylib__ names stay hidden and all the user sees is the nicer dot names. This is how the OCaml standard library is compiled.

Translated into English, this seems to mean that, for example. if lib.ml contains module FooBar = Lib__fooBar, then Lib.FooBar corresponds to Lib__fooBar.

This use of double underscores is a convention, not a rule. Aliasing may use any legal module name. In particular module A = A is legal.
Module names are opaque! OCaml will not interpet a module name like A__B as the name of a module B in a namespace A.

Resolver Modules

Automatic Renaming

Namespace Models

OBazl uses the term namespace to refer to a collection of modules and/or interfaces named with a namespace prefix (such as Stdlib__), together with resolver module containing an aliasing equation for each submodule (e.g. module List = Stdlib__List). The name of the resolver module serves as the name of the namespace; the "entry point" to the namespace, as it were.

Dune calls them "wrapped" libraries. They are very commonly used, if only because the Dune library stanza builds them by default. OBazl provides support for two distinct namespace models, top-down and bottom-up.

Dune uses variations on the term "wrapped library" where OBazl uses namespacing terminology.
What OBazl calls a "namespace resolver module" (or just "resolver") is sometimes called a "mapper" or "wrapper".

A namespace may be packaged as an OCaml archive, but this is not a requirement; under OBazl, bottom-up namespaces may be aggregated using ocaml_archive or ocaml_library, or may not be aggregated at all.

'Namespace' as used by OBazl is not a first-class concept in the OCaml language. There is no Namespace type or keyword.

NS Resolver Modules

Both the top-down and the bottom-up models depend essentially on a module containing the module aliasing equations that determine submodules; OBazl calls such modules ns resolver modules or resolvers for short. They are express using rule ocaml-ns-resolver.

Top-down

Top-down namespaces are defined at the aggregate level. This is the strategy pursued by Dune: membership in a namespace is expressed by the list of modules in a library stanza. Dune automatically renames the modules, adding the namespace prefix (e.g. Foo__) to the module names, and generates a resolver ("wrapper") module containing the required module aliasing equations ; then everything is packaged up in an archive file.

To support top-down namespaces, OBazl provides rules ocaml_ns_archive and ocaml_ns_library; members of the namespace are listed in the submodules attribute. OBazl does the same thing Dune does: generates the resolver and renames the submodules. Both Dune and OBazl support user-defined resolver modules for top-down namespaces. Dune always packages namespaces in archives, but in OBazl they may also be organized as OBazl libraries, or they may not be aggregated at all. (See Aggregators for more on the distinction OBazl draws between libraries and archives.)

Top-down namespaces are easy to define, but they are limited. Targets that use a namespaced module must depend on the namespace aggregate (ocaml_ns_library or ocaml_ns_archive) containing the module; they cannot depend directly on submodules. If changes are made to a submodule, all targets that depend on its namespace aggregate will be rebuilt, whether or not they depend on the module that was changed.

Namespace Aggregators

Each of these rules has a submodules attribute which contains a list of the labels of modules to be included in the namespace.

For example, if the name of an ocaml_ns_library rule is foo, and it contains submodule :bar, then the ns module will be Foo.cmx, and the bar submodule will be renamed to Foo__Bar.cmx. To produce Foo.cmx, OBazl will generate Foo.ml, containing aliasing equations like module Bar = Foo__Bar.

How it works

This approach involves a circularity: in order to generate and compile Foo.cmx, the ocaml_ns_library rule must depend on the submodules; but the submodules in turn must depend on the ns resolver module (Foo.cmx in this case). OBazl can get around this, though, since in fact the ns resolver module only depends on the module names, not the compiled modules. This is achieved using the -no-alias-deps option.

That solves half of the problem; the other problem to be resolved is that each submodule must depend on the resolver module. A submodule cannot depend on the ns aggregator rule that contains it, on pain of circularity; yet it must depend on the resolver, and the aggregator rule contains the information needed to generate the resolver module source code.

We get around this circularity by subterfuge. We use a combination of hidden label-typed build settings attributes and user-defined transitions to pass configuration information down the dependency chain, so that the bottom node in the chain depends on the top node - but only for configuration data. In other words, we split the circular dependency into a module dependency tree going from aggregator to submodule to resolver, and a configuration dependency going the other way around.

Rules involved in top-down namespacing (ocaml_ns_library, ocaml_ns_archive, ocaml_module and ocaml_signature) have a hidden attribute, _ns_resolver, that expresses a dependency on a single ocaml_ns_resolver target. They also have a hidden _ns_submodules attribute. Both of these are label-typed build settings.

The ocaml_ns_resolver target, in turn, depends on some other label attributes. The transition functions set these attributes at build time; in effect, they allow us to give this resolver target "reverse dependencies": the attributes that control its build are set by targets that depend on it. Submodules depend on these two deps, but since the parameters controlling them are set dynamically, at build time, the object depended on will be customized for the submodule that depends on it.

More specifically: for all rules the hidden _ns_resolver attribute has default value @rules_ocaml//cfg/ns. That target is a 'label_setting' whose value is [the label of] an ocaml_ns_resolver target. This makes each rule (target) depend on the same ns resolver module. The build parameters for that module are set dynamically using transition functions. In particular, the hidden _ns_submodules attribute has default value @ocaml//ns:submodules, which is a string_list_flag; it too is set by transition functions at build time.

The result is that building an ocaml_ns_library or ocaml_ns_archive target causes transition functions to propagate the list of submodule names (as strings) to both the submodule dependency targets and the hidden ns resolver target. The ocaml_module (and ocaml_signature) implementations check this list to see if they are included as submodules; if so, they rename the source file, prefixing the namespace name, before compiling. The ocaml_ns_resolver target uses the list to generate a structfile with the namespace name, containing the module aliasing equations that define the namespace membership.

For example, when we build an ocaml_ns_library target, the transition functions will set the value of _ns_resolver to the desired namespace, and _ns_submodules to the list of submodules for the namespace. These settings will be set before bazel proceeds to build the submodules. When the time comes to build a submodule, Bazel will see that it depends on the ns resolver, so it will first build the latter. The build rule for it uses the values set by the transition functions, so the result is a resolver that depends on the information needed to make it work to compile the submodule.

Bottom-up

Top-down namespaces have one major shortcoming, as noted above: clients can only depend on the aggregates; they may not depend directly on submodules. Bottom-up namespaces eliminate this shortcoming. Targets may depend directly on namespaced modules; furthermore, bottom-up namespaces need not be organized as library or archive aggregates at all. They are determined by explicitly defining an ocaml_ns_resolver specifying the namespace prefix and listing its submodules. The submodules (which may include interfaces) indicate their membership in a namespace directly, by passing an ocaml_ns_resolver target label via the ns attribute of ocaml_module and ocaml_signature.

A less serious shortcoming of top-down namespaces is the use of transition functions with hidden label-typed attributes, which adds overhead (and considerable complexity, if you ever need to debug them). Bottom-up namespaces use neither hidden attributes nor transition functions.

Top-down namespaces select their submodules; the submodules in a bottom-up namespace elect membership.

Bottom-up namespaces are much more powerful and flexible than top-down namespaces. Targets can depend directly on namespaced submodules; this can be used to optimize builds. When a bottom-up submodule is changed only targets that depend on it are rebuilt. And since aggregation and namespacing are orthogonal, namespaced submodules can be aggregated ad libitum. For example, if a set of targets depends on a subset of three submodules in a namespace that contains ten submodules, this subset can be aggregated as a library or archive. Multiple aggregates can contain submodules from the same namespace. Aggregates can even contain submodules from multiple namespaces. The OBazl rules will ensure that the resolver module is always included in the dependency graphs of submodules, and OBazl’s dependency manager will always normalize the graphs to remove duplicates while retaining dependency order.

Another way to look at it: in most languages that explicitly support some form of namespacing, namespaces are closed, in the sense that the only way to access an element in the namespace is by going through the namespace, so to speak. OBazl’s bottom-up namespaces are open: we can access the submodules in a bottom-up namespace without reference to the namespace name.

Which is to say that such "namespaces", being based on OCaml’s module aliasing mechanism, are only pseudo-namespaces. The OCaml language does not know anything about such namespaces; it only knows how to resolve module aliases. For example, a reference A.B might be aliased to A__B (i.e. a__b.ml, automatically renamed from b.ml), but module names are opaque; OCaml will not interpret A__B as "submodule B in namespace A ". So we can access that (sub)module directly, without "going through" module A. In fact we can include it in any namespace we like; for example, we can put it in namespace Foo by putting the following aliasing equation in the resolver module foo.ml: module B = A__B. We can also expose it under a different name: module Bar = A__B would expose it as Foo.Bar.

As an example: just about everything in the OCaml compiler sources depends on the standard library, which is packaged as an archive stdlib.cma built by target //stdlib. If those dependencies are expressed as dependencies on //stdlib, then a change in any stdlib submodule will trigger a rebuild of almost everything. But if they are expressed as direct submodule dependencies, e.g. //stdlib:Stdlib.List, then the rebuild triggered by a change to one submodule will include only those targets that genuinely depend on it, directly or indirectly. (Example: parsing/BUILD.bazel)


  • bottom-up ns does not automatically entail an aggregate. Aggregates containing namespaced modules must be explicitly defined, and they may contain a subset of the submodules in an ns, or submodules from multiple namespaces. IOW, aggregation and namespacing are orthogonal.

  • clients cannot depend on a namespace; they can only depend on aggregates or singletons (modules, sigs).

  • a change to a submodule in a ns will cause a recompile of any aggregate that contains it, and of anything that depends on the aggregate. but targets that depend on a submod directly will not be affected by changes to other submods in the ns. Whereas with a top-down ns, targets can only depend on the ns-aggregates, so any change in any submodule will force a recompile of all cllients.

  • changing one submodule does not entail a rebuild of any sibling submodules.

  • the user may provide a custom resolver module, which can be any module that contains the module aliasing equations needed to support the ns. submodules then just list this module’s label in their ns attribute. This is what happens with the Stdlib modules of the compiler.

  • supports direct dependency on individual submodules in the namespace. We cannot depend on a dotted module path, but we can depend on a module in a namespace, and we can use a naming convention to me it look like a dotted path. For example, the bazelized version of the OCaml compiler uses dotted names for the Stdlib; so the target name to compile the buffer.ml module of the stdlib is Stdlib.Buffer; to build it: bazel build //stdlib/Stdlib.Buffer. NB this is just a convention.

  • normalized/optimized build files can be queried to show optimized dep graphs i.e. no spurious dependencies. I.e. if you depend on a a top-down ocaml_ns_library, the dep graph will show a dependency on all submodules in the ns lib. With bottom-up namespacing and optimized build files no spurious deps will be shown.

  • OTOH, if you depend on the ns_resolver of a bottom-up namespace, the dep graph will not include the submodules, since the submodules depend on the resolver, not the other way around. So there are trade-offs.

    • FIXME: is there a way to write a query that will show the submodules too? probably. can this be done by an aspect?

Troubleshooting

Case studies

Multiple submodules with same name

Case A

This situation arose during OBazl development. To develop a tool we wanted to borrow some code from Dune for parsing Dune files. The Dune code contains src/dune_lang/escape.ml and src/stdune/escape.ml (and their interface files). If both were included in ns libraries then name clashes could emerge. This is because namespace aliasing always starts with the original module (file) name. So in this case we had two namespaces both of whose resolvers contained aliasing equations for 'Escape'.

The compile for dune_lang/template.ml, which depends on Escape, was failing with Unbound value for Escape.escape. The problem was not that OCaml could not resolve the reference to Escape, but that it resolved it to stdune/escape.ml instead of the intended dune_lang/escape.ml, which does not define escape.

The reason was that template.ml began with open Stdune, so the ns resolver for that namespace was used to look up Escape, yielding a reference to stdune/escape.ml.

But if template.ml starts by opening Stdune, then how else could a reference to Escape be resolved? This turned out to by my error: I had included both escape.ml files in their respective package namespace libraries, without bothering to closely inspect the 'main' ns modules (stdune/stdune.ml and dune_lang/dune_lang.ml). These did not include aliasing equations for Escape. So the reference to it within dune_lang/template.ml would be resolved without using any namespace (i.e. aliasign) lookups.

To make this work in OBazl use the following technique:

WARNING the following is obsolete (our namespacing strategy has changed)

  • Exclude the non-namespaced files from the ns-env. One way to do this is to use the exclude parameter of the glob function; for example:

    ns_env(aliases = glob(["*.ml"], exclude = ["escape.ml"]))
  • Do not list the non-namespaced module in the submodules dictionary of the ocaml_ns_library rule.

  • Do not use a prefix attribute on the ocaml_module rule instances used to build the non-namespaced modules.

  • If the non-namespaced module depends on a namespaced module, you must '-open' the namespace containing the latter. Use the prefix of your ns_env() as the module name. For example:

    opts = ["-open", "Demos_Obazl_Stdune__00_ns_env"]
Version 2 supports an open attribute for rules ocaml_module and ocaml_signature.
Currently this must be done manually, but will eventually be automated.
Case B

Same problem involving module Glob, found in src/dune_engine and other_libs/dune_glob.

The error message:

File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml", line 1:
Error: The implementation bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml
       does not match the interface bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.cmi:
       ...
       In module Glob:
       Values do not match:
         val of_glob :
           Demos_Obazl_Dune_engine__Glob.t -> (string -> bool) t/2
       is not included in
         val of_glob : Demos_Obazl_Dune_glob__Glob.t -> t/1
       File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.mli", line 49, characters 2-27:
         Expected declaration
       File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml", line 133, characters 6-13:
         Actual declaration
       File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml", line 116, characters 2-24:
         Definition of type t/1
       File "bazel-out/darwin-fastbuild/bin/obazl/dune_engine/_obazl_/Demos_Obazl_Dune_engine__Predicate_lang.ml", lines 3-8, characters 0-22:
         Definition of type t/2
Target //obazl/dune_engine:_Predicate_lang failed to build

In short: the problem arose because of the way OBazl handles dependencies. It retains transitive deps and strictly preserves ordering. In this case, the way we listed dependencies resulted in the insertion of dune_glob/glob.cmo between predicate_lang.mli and dune_engine/glob.cmo, so it and predicate_lang.ml used different Glob modules.

Long story short: sometimes this can happen if a structfile and its sigfile have different deps. Still not sure what causes this problem, but the workaround was to move the dep on //obazl/dune_glob from _Glob to _Glob.cmi.

B Same name for ns main module and ns submodule

Demo set035/case03: ocaml_ns_module.name = color, contains submodule:

"//namespaces/obazl/set030/case01:color": "Color",

Only way around this is to change the main ns name?


Tips

  • Count your underscores! It’s easy to write Foo_Bar_Baz when you should write Foo__Bar_Baz, in which case you may get an 'Unbound module' warning.

  • If you use a main module, you probably need to exclude it from the ns_env. Otherwise it will be aliased. e.g. from dune_glob:

ns_env(aliases = glob( ["*.ml"], exclude = ["dune_glob.ml"] ) + ["lexer.mll"])

inconsistent assumptions over interface

File "namespaces/obazl/set300/case370/foo-bar/test.ml", line 1:
Error: Files namespaces/obazl/set300/case370/foo-bar/test.cmo
       and bazel-out/darwin-fastbuild/bin/namespaces/obazl/set300/case370/foo-bar/_obazl_/Demos_Namespaces_Obazl_Set300_Case370_Foo_bar__Red.cmo
       make inconsistent assumptions over interface Demos_Namespaces_Obazl_Set300_Case370_Foo_bar__Red