Unpacking Module Bundlers Part 1: What is a module?
At 4/19/2024
This is the first of three posts in a series about JavaScript client-side modules and code packaging.
The JavaScript community has this incredible superpower where standards of practice emerge not from any formal process but simply from use, and converge on a remarkably consistent, simple, and well-defined API (or small set of competing APIs). There are a lot of reasons for this: natural selection at work, the widespread practice of learning mostly by sharing code recipes (which causes effective, simple-to-use code patterns to replicate quickly), and of course 800-pound-gorilla projects like node/npm, AMD, and gulp. Call it the bright side of hipsterism, if you will.
It’s not a perfect world, though — ask any ten JS devs what their major points of pain are and all of them will name client-side modules and code packaging:
It’s one of the areas, like Streams, where the community has done a solid job of defining a de-facto standard API for something and sticking to it, and a rather inconsistent job of documenting what problems the de-facto standard is solving and how it’s solving them.
It’s easy to find boilerplate and recipes for things like setting RequireJS up to make your client-side code modular and even maybe concatenate and minify it for you, but what actually is a module? How do the different systems compare? When should you use AMD or CommonJS or an ES6 polyfill? Just what exactly is Browserify actually doing? I’d like to explore confusing questions like this in this series.
What’s the problem?
In short: We need to organize and reuse our JavaScript code. But we’re faced with incompatible module syntaxes and various module loaders and tools, and choosing the right one is mysterious.
It’s inevitably necessary in any nontrivial software project to split code into a tree of interdependent segments. And we need to be able to reuse those segments.
But say segment foo relies on segment bar, and bar in turn absolutely needs the segment baz to get its job done. How do we manage that? Does foo then need to directly ensure the presence of baz (and anything baz needs, and so on up the whole chain)? That’s crazy-making and untenable, and it’s the reason it’s not enough just to split your code into separate files and include the right script tags on each page.
But what’s the bigger picture?
The real-life problem is that, as our own Lyza Danger Gardner points out, it’s a lost cause to try and cover every specific tool, interaction, and failure mode of shipping JavaScript to the browser.
Instead, I want to help you understand the context and the pieces involved in modules and packaging. The goal of this first of three articles is to give you a bigger picture of what modules—the building-block segments of our code—are at an abstract level so that we can start putting them together and making them work.
Then what’s a module?
Let’s step back from any particular language, library or standard and think about modules in the abstract for a moment. What, exactly, is a module? In the strict computer science sense, a module is a way of associating a value (most usually a collection of named subroutines) with a name of some type — and that’s it.
To make this actually useful, however, one also needs a mechanism for specifying module dependencies, that is, a way to say “This piece of code needs the values provided by this list of module names in order to be understood”.
These two parts, taken together with some under-the-hood code responsible for connecting a module name to its associated value, are what we mean when we talk about module systems.
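To make those three pieces concrete, here is a minimal sketch of a toy module system. It is not any real loader: the names registry, defineModule, and requireModule are made up for illustration, and it deliberately skips caching, cycle detection, and file loading.

```javascript
// Toy module system (hypothetical names; not AMD, CommonJS, or any real loader).

// Part 1: associate a name with something that produces a value.
const registry = new Map();

// Part 2: let each module declare the names of the modules it needs.
function defineModule(name, dependencyNames, factory) {
  registry.set(name, { dependencyNames, factory });
}

// Part 3: the under-the-hood code that connects a name to its value.
function requireModule(name) {
  const entry = registry.get(name);
  if (!entry) {
    throw new Error('Unknown module: ' + name);
  }
  // Resolve direct dependencies first; each of those resolves its own in turn.
  const dependencies = entry.dependencyNames.map((depName) => requireModule(depName));
  return entry.factory(...dependencies);
}

// foo only names bar; it never has to know that bar needs baz.
defineModule('baz', [], () => ({
  shout: (text) => text.toUpperCase(),
}));
defineModule('bar', ['baz'], (baz) => ({
  greet: (who) => baz.shout('hello ' + who),
}));
defineModule('foo', ['bar'], (bar) => ({
  run: () => bar.greet('world'),
}));

const foo = requireModule('foo');
console.log(foo.run());
```

The point of the sketch is that foo declares only its direct dependency, and the module system walks the rest of the chain on its behalf.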
For a concrete example, in the world of JavaScript vis-a-vis node.js, a module is a single JavaScript file, and within each module, that module’s dependencies are specified by calling the require function and assigning the value associated with each module name to a local variable. Like so:
/* module-c.js */
var moduleA = require('module-a');
var moduleB = require('module-b');
Most languages (Java, Python, even C…after a fashion) include some type of module system, but not JavaScript (yet — more on this in an upcoming post), so we have had to come up with our own.
There are a number of different, competing specifications for defining modules, called module definition schemes. Examples are AMD, UMD, CommonJS, and node.js.
Fortunately, although there are numerous module schemes, the community has settled on a very simple formal definition of what a module itself actually is that is common across all schemes.
JavaScript Modules, the Formal Definition
As encountered in the wild, JavaScript modules all have the following characteristics:
1. A module is a JavaScript value
A module is a JavaScript value. That’s it. Valid examples:
3
'Hello world'
{ readSync: function() {...},
write: function() {...},
/* ... */
}
Read those examples carefully: you really can, in all of the major module definition schemes, create a module whose value is just the integer 3, or anything else that can be returned from a function in JavaScript. Usually, though, it’s a function or Object.
As we’ll see to be a trend, this all actually isn’t quite true for ES6 modules — but isn’t quite false, either. More on this in a future post.
2. A module is returned by a factory function
A factory function is a bit of code that gets called and returns the value of the module. Usually, this takes the form of a function you supply to the module system that will get called when the value of your module is needed.
That said, it may or may not look like you’re writing a function when you create your module. In AMD, it does (this is the function passed to define), while in node.js, your file (module) is effectively wrapped in a function, and whatever ends up in module.exports once that function has run becomes the value of the module.
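As a rough sketch of that node.js-style wrapping, the snippet below takes module source as a string, wraps it in a function, runs it, and reads back module.exports. The helper loadFromSource is hypothetical and not a node.js API, and the real wrapper also passes require, __filename, and __dirname, which are left out here to keep the idea visible.

```javascript
// Hypothetical helper illustrating the "file as factory function" idea.
// Simplified: node's real wrapper takes more parameters than these two.
function loadFromSource(source) {
  const moduleRecord = { exports: {} };
  // Wrap the file's text in a function whose parameters are named
  // "exports" and "module", just as the file's code expects.
  const wrapper = new Function('exports', 'module', source);
  wrapper(moduleRecord.exports, moduleRecord);
  // Whatever the file left in module.exports is the module's value.
  return moduleRecord.exports;
}

// A module whose value is just the integer 3.
const three = loadFromSource('module.exports = 3;');

// A module whose value is an object with one function on it.
const greeter = loadFromSource(
  'exports.greet = function (name) { return "Hello " + name; };'
);

console.log(three, greeter.greet('world'));
```

Seen this way, the file itself plays the role of the factory function, even though you never typed the word function around it.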
This value, the one returned by the factory function, is the value that users of your module get when they request your module by name from the module system (e.g. in node, this looks like var foo = require('foo-module');).
Each time the module name associated with this module gets requested, the module system is expected to return the module’s value. Generally the factory function is invoked only once per runtime environment (that is, once per web page load), the first time the module is needed, and the module system caches the resulting value for any later requests.
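The once-per-environment behavior can be sketched in a few lines. This is an illustration under assumptions rather than how any particular loader is written: createModuleSystem and the 'counter' module are hypothetical names invented for the example.

```javascript
// Hypothetical mini module system that caches each module's value.
function createModuleSystem() {
  const factories = new Map(); // name -> factory function
  const cache = new Map();     // name -> value the factory returned

  return {
    define(name, factory) {
      factories.set(name, factory);
    },
    require(name) {
      if (!cache.has(name)) {
        const factory = factories.get(name);
        if (!factory) {
          throw new Error('Unknown module: ' + name);
        }
        // First request: invoke the factory and remember the result.
        cache.set(name, factory());
      }
      // Later requests: hand back the very same value.
      return cache.get(name);
    },
  };
}

let factoryCalls = 0;
const modules = createModuleSystem();
modules.define('counter', () => {
  factoryCalls += 1;
  return { createdByCall: factoryCalls };
});

const first = modules.require('counter');
const second = modules.require('counter');
console.log(first === second, factoryCalls);
```

Because both requests receive the same object, any state a module keeps is shared by everything that asks for it, which is worth remembering when a module's value is mutable.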
3. A module is associated with a string name
Every module needs to have a string-valued name, e.g.
'foo'
'jquery'
'underscore'
'bar/baz/quux'
'./myLocalModule'
Watch out! This is not a filename, though it sometimes looks like one. It’s just an identifier to associate a value with.
Some module loading schemes do apply some level of structure to this, e.g. AMD allows relative pseudo-file-paths (to be explained in a future post, don’t worry).
Modules: putting the pieces together
A JavaScript module is a JavaScript value, returned by a factory function or file, associated with a string-valued name…
…and that’s it. All nontrivial browser-side module loading schemes share this definition of a module (but as usual, we’ll see that ES6 modules are the exception), which means the main difference is the syntax for how modules get defined and resolved. This is why the UMD module definition scheme can work, which will also be explained shortly.
Recall that a module is not necessarily associated with a file on a filesystem or web server. Many module systems do make this association in some way, but it’s not part of the fundamental concept of a module. The distinction matters, because one of the important questions about any given module system is how it maps modules to files.
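To illustrate that the name-to-file mapping is a separate decision layered on top of the module concept, here is one possible mapping, sketched with made-up names. The paths table, the file locations in it, and resolveToFile are all assumptions for the example; real module systems each have their own rules.

```javascript
// Hypothetical configuration: some names point at explicitly chosen files.
const paths = {
  jquery: 'vendor/jquery.min.js',
  underscore: 'vendor/underscore.js',
};

// Hypothetical resolver: turns a module name into a file location.
// Names in the table win; everything else falls back to a convention.
function resolveToFile(name, baseDir = 'js') {
  if (Object.prototype.hasOwnProperty.call(paths, name)) {
    return paths[name];
  }
  return baseDir + '/' + name + '.js';
}

console.log(resolveToFile('jquery'));
console.log(resolveToFile('bar/baz/quux'));
```

Swap in a different table or a different fallback convention and the same module names point at entirely different files, while the modules themselves are unchanged.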
What’s Next:
Code module handling is one of those simple-in-theory, fiddly-in-practice concepts, and nowhere is this more true than in the browser, but hopefully this post has given you a compass by which to navigate some of these subtleties.
In the next part of this series, we’ll explore the landscape of JavaScript module schemes and learn a lot more about putting these ideas into practice, so stay tuned!
Cloud Four team members Tyler Sticka, Erik Jung and Lyza Gardner contributed to this post.