Integrating Jupyter with Girder

Introduction

Girder is a free and open source web-based data management platform developed by Kitware as part of the Resonant data and analytics ecosystem. One of Girder’s core capabilities is data organization and dissemination. It provides a filesystem-like structure that allows files to be stored in a hierarchy. For a more detailed introduction, check out the Girder documentation.

Jupyter Notebook is a popular open source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. It has a wide range of applications and is heavily used by the scientific community. The Jupyter Notebook web application provides a graphical interface for creating, opening, renaming, and deleting files in a virtual filesystem. This virtual filesystem is backed by a server component called a contents manager. The default contents manager stores files on the filesystem local to the machine where the server is running, by default under the directory from which the server was started.
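To make the role of a contents manager more concrete, here is a minimal sketch of the idea: a toy in-memory store that answers the kind of get and save requests the file browser issues. The class name `InMemoryContents` and its simplified models are illustrative assumptions; this is not Jupyter's actual contents manager base class, which defines more methods and richer models.

```python
from datetime import datetime, timezone


class InMemoryContents:
    """Toy stand-in for a contents manager: paths map to stored models."""

    def __init__(self):
        self._files = {}  # path -> model dict

    def save(self, model, path):
        """Store a file or notebook model under the given path."""
        path = path.strip("/")
        stored = {
            "name": path.rsplit("/", 1)[-1],
            "path": path,
            "type": model["type"],  # "file" or "notebook"
            "content": model.get("content"),
            "last_modified": datetime.now(timezone.utc),
        }
        self._files[path] = stored
        return stored

    def get(self, path):
        """Return a stored model, or a directory listing for a folder path."""
        path = path.strip("/")
        if path in self._files:
            return self._files[path]

        # Directories are implicit: derive them from the stored paths.
        prefix = path + "/" if path else ""
        children = {}
        for stored_path, model in self._files.items():
            if not stored_path.startswith(prefix):
                continue
            head, _, rest = stored_path[len(prefix):].partition("/")
            if rest:
                children[head] = {"name": head, "path": prefix + head,
                                  "type": "directory", "content": None}
            else:
                children[head] = model
        if path and not children:
            raise KeyError(path)
        return {"name": path.rsplit("/", 1)[-1], "path": path,
                "type": "directory", "content": list(children.values())}
```

A contents manager backed by a remote service follows the same pattern, except that `get` and `save` translate into requests to that service instead of dictionary lookups.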

We have developed girder_jupyter, a Python package that implements a Jupyter contents manager backed by Girder. This allows notebooks and files to be stored on a Girder server from within Jupyter.

One of the motivations behind the development of this extension is to facilitate the use of a Jupyter server that is launched by JupyterHub and runs in a Docker container. The user may not have an account on the machine running the container, so the default filesystem-based contents manager can’t easily be used. In this scenario, a JupyterHub authenticator allows users to authenticate using their Girder credentials. JupyterHub then spawns a Jupyter server instance running in a Docker container configured with girder_jupyter and the appropriate Girder token. The user is then able to create content that will be stored in Girder.

Getting Started

The following steps are needed to set up the contents manager in JupyterLab. This is intended as a simple example for experimenting with the contents manager, and it doesn’t go into the details of using it in a more complex multi-user environment.

1. Create a new Python virtual environment (optional)

2. Install JupyterLab

3. Install the contents manager package

4. Generate a default Jupyter configuration (note this step is not necessary if you have already run Jupyter before)
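Steps 1 through 4 can be sketched as a short Python script. This is only an illustration under stated assumptions: the environment directory name `girder-jupyter-env` is hypothetical, and we assume the contents manager is published on PyPI as `girder-jupyter`; check the package's README for the authoritative install instructions. The `jupyter notebook --generate-config` command writes the `jupyter_notebook_config.py` file used in the next step.

```python
import subprocess
import sys
import venv
from pathlib import Path


def setup_commands(env_dir):
    """Return the commands for steps 2-4, using the environment's executables."""
    bin_dir = Path(env_dir) / ("Scripts" if sys.platform == "win32" else "bin")
    python = str(bin_dir / "python")
    jupyter = str(bin_dir / "jupyter")
    return [
        # Step 2: install JupyterLab
        [python, "-m", "pip", "install", "jupyterlab"],
        # Step 3: install the contents manager (assumed PyPI name)
        [python, "-m", "pip", "install", "girder-jupyter"],
        # Step 4: generate a default Jupyter configuration
        [jupyter, "notebook", "--generate-config"],
    ]


def run_setup(env_dir="girder-jupyter-env"):
    """Create the virtual environment (step 1), then run steps 2-4 in it."""
    venv.create(env_dir, with_pip=True)
    for command in setup_commands(env_dir):
        subprocess.run(command, check=True)
```

Calling `run_setup()` performs the installation; `setup_commands()` is kept separate so the commands can be inspected, or typed into a terminal by hand, before anything is executed.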

5. Now add the contents manager settings to the new configuration file (~/.jupyter/jupyter_notebook_config.py)
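As a sketch, the settings typically look like the fragment below. The class path and the `api_key` and `api_url` option names are given as we understand them from the girder_jupyter README, so treat them as assumptions and verify them against the version you installed. The placeholders are left for you to fill in.

```python
# Fragment for ~/.jupyter/jupyter_notebook_config.py.
# `c` is the configuration object Jupyter provides when it loads this file.

# Use the Girder-backed contents manager instead of the local filesystem one
# (class path assumed from the girder_jupyter README).
c.NotebookApp.contents_manager_class = 'girder_jupyter.contents.manager.GirderContentsManager'

# Credentials and endpoint for the Girder server (option names assumed).
c.GirderContentsManager.api_key = '<api key>'
c.GirderContentsManager.api_url = '<api url>'
```

Because the API key grants access to your Girder account, keep the configuration file readable only by you.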

In this configuration, <api key> is an API key associated with your account on the Girder server and <api url> is the URL of your Girder server’s API.

A public instance of Girder is available at https://data.kitware.com; you just need to register an account. The <api url> for this instance is https://data.kitware.com/api/v1.

6. Now you can run JupyterLab

The application will load and the content browser will show your folders in Girder. In this case we see the Public and Private folders.

You can now upload files and create files or folders from within JupyterLab, and they will be stored in Girder. The folder listing in Girder’s interface shows the same files and folders that appear in the JupyterLab file browser.
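The same check can also be made programmatically. The following is a minimal sketch, assuming the separate `girder-client` Python package is installed; the helper names `connect` and `list_user_folders` are our own for illustration, and the API URL and key are the same values used in the Jupyter configuration.

```python
def list_user_folders(client):
    """Return the sorted names of the authenticated user's top-level folders."""
    me = client.get("user/me")
    folders = client.listFolder(me["_id"], parentFolderType="user")
    return sorted(folder["name"] for folder in folders)


def connect(api_url, api_key):
    """Create an authenticated Girder client (requires the girder-client package)."""
    import girder_client  # imported lazily so list_user_folders has no hard dependency

    client = girder_client.GirderClient(apiUrl=api_url)
    client.authenticate(apiKey=api_key)
    return client
```

For example, `list_user_folders(connect('<api url>', '<api key>'))` should return the folder names that appear at the top level of the file browser.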

Conclusion

The Girder contents manager package provides a simple integration point that allows notebooks to store assets in Girder. It can be used in a single-user environment to provide a server-based storage mechanism, or combined with JupyterHub to provide a multi-user environment where an approach based on the local filesystem is not possible.

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Small Business Innovation Research (SBIR) program under Award Number DE-SC0017193.

 
