Jupyter notebooks (and notebook programming in general) are great; if you’re a data scientist or someone who uses code in some kind of research/iterative setting, they’re amazing for keeping a paper trail of reproducible experiments, or for making sure your notes or hypotheses are illustrated by real code and real outputs.
Clojure is an amazing language for data science, thanks to its awesome take on data structure manipulation as well as its Java interop, which opens up a huge set of performant libraries.
A Clojure kernel?
The current ecosystem for Clojure kernels is, however, pretty limited. The main project on this front is Clojupyter which, though it works, wasn’t enough for me for the following reasons:
- When I first tried it I had some serious stability concerns (and wanted to use it for a professional purpose).
- The codebase isn’t readable/maintainable enough for my taste: it uses a large number of dependencies, has everything in one namespace, and doesn’t do enough to contain mutability to where it’s absolutely necessary.
- It lacked advanced Jupyter capabilities (like the possibility to plot or display tables), and the state of the codebase described in the previous point meant that it would be a pain to contribute and add them.
- Its approach was to provide a project that would compile to a kernel, and which one would thus need to tweak to suit one’s own context.
All this led me to write my own implementation, CLJ-Jupyter, where I tried to solve some of these problems. I’ve tried to build this kernel in a sane way, splitting it into namespaces and keeping my code tidy for easy refactoring (I’ve also implemented a basic reloaded workflow for easier development). In addition, it uses as few dependencies as possible. This is especially important if you plan on using this in conjunction with an existing project: fewer dependencies mean you’re less likely to encounter conflicts when adding your own libraries.
As it stands, my kernel is able to evaluate basic code and shouldn’t have issues handling exceptions. I can’t say I’ve battle-tested it much though!
My primary purpose in building this was to learn more about Clojure as well as Jupyter. Here’s what I’ve learned:
Be mindful of adding dependencies
I’ve learned to be mindful of how I use dependencies, no matter how popular they are. They sometimes lead you to write code that’s more complex than if you had avoided them. For example, I began by implementing the kernel using Component, which I absolutely adore, but which was overkill for a project of this size. I did reuse many of its concepts and ideas though!
Java is your friend
Clojure comes packed with the full Java standard library, which means you have at your disposal an enormous number of battle-tested algorithms and tools. A typical example was the implementation of the signing function, which I could build on the standard library rather than on an extra dependency. Additionally, I wouldn’t always go for Clojure wrappers around Java libraries. The ZMQ library, which is used in the Jupyter messaging protocol, has an API mostly made of static methods, and so is quite pleasant to use within a Clojure codebase!
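To give an idea of what this looks like, here is a minimal sketch of message signing through plain Java interop. It is not the code from CLJ-Jupyter: the function name `sign` and its arguments are made up for illustration. It assumes the usual Jupyter scheme, `hmac-sha256`: the serialized header, parent header, metadata and content are fed in that order to an HMAC keyed with the key from the kernel’s connection file, and the result is hex-encoded. It also assumes that an empty key means signing is disabled.

```clojure
(import '(javax.crypto Mac)
        '(javax.crypto.spec SecretKeySpec)
        '(java.nio.charset StandardCharsets))

(defn sign
  "Hypothetical helper: returns the lowercase hex HMAC-SHA256 of the
  strings in `parts`, fed in order, keyed with `secret`.
  Assumption: an empty secret means signing is off, so return \"\"."
  [^String secret parts]
  (if (empty? secret)
    ""
    (let [algo "HmacSHA256"
          mac  (doto (Mac/getInstance algo)
                 (.init (SecretKeySpec.
                          (.getBytes secret StandardCharsets/UTF_8)
                          algo)))]
      ;; Updating the Mac part by part is equivalent to signing the
      ;; concatenation of all parts.
      (doseq [^String part parts]
        (.update mac (.getBytes part StandardCharsets/UTF_8)))
      ;; .doFinal returns a byte array; format each byte as two hex digits.
      (apply str (map #(format "%02x" %) (.doFinal mac))))))
```

Calling `(sign secret [header parent-header metadata content])` with the four JSON strings of a message returns a 64-character hex string, which can be compared against the signature frame of an incoming message or attached to an outgoing one. No dependency beyond the JDK is involved.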
Build quickly, structure after
Unless you’re very good or have given a lot of thought to your project, you’re likely to be missing knowledge which you’ll only discover as you go. I’m not a socket or Jupyter expert, which means I regularly had to rebuild (almost) everything from scratch, after finding I had assumed too much about how the different pieces fit together.
Coding is learning
More than anything, this has shown me how starting a full project and working all the way to a state where I’m satisfied is a great way to enjoy high-quality learning. By high-quality I mean learning that forces me to think about structure and flow of data, rather than simply elements of syntax and idiomaticity. I’m also pretty sure that in a month I’ll feel like rebuilding it again from scratch in a better way!
My current priorities are:
- Implementing some kind of plotting.
- Building a lein plugin that uses this project to turn any codebase into a Jupyter kernel.