. The pre-installed common environment means that users don't have to deal with tricky software installation and dependency management issues. We developed a tool called clonenotebooks (based on nbviewer) for browsing Jupyter Notebooks in repositories such as GitHub, and cloning them over to NERSC with a pointer to an execution environment.
Exploring parameter spaces is important in tomographic analysis, where one may need to experiment with different combinations of parameters to determine the best fit for the data at hand.
Papermill provides a mechanism to execute parameterized Jupyter Notebooks. A notebook cell tagged with the “parameters” tag defines default values, and Papermill overrides any variable defined in that cell with the parameters supplied at execution time. This enables interactive exploration of parameter spaces: each parameter set can be applied to the same Notebook using Papermill. Running these executions on HPC resources makes it possible to explore the space quickly and identify a good parameter fit. The final parameter set can then be applied to additional data sets. HPC parallelism is achieved through Dask, in a similar manner to the earlier use cases.
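A parameter sweep of this kind can be sketched in a few lines of Python. The example below is illustrative only and is not the workflow's actual code: the notebook name reconstruct.ipynb, the output directory, and the parameter names algorithm and rotation_center are hypothetical placeholders. It assumes the template notebook has a cell tagged “parameters” that defines defaults for those variables. The only Papermill call used is execute_notebook, which is imported lazily so the grid helper can be run on its own.

```python
import itertools
from pathlib import Path


def parameter_grid(**axes):
    """Return every combination of the given parameter values as a list of dicts."""
    names = list(axes)
    return [dict(zip(names, values)) for values in itertools.product(*axes.values())]


def run_sweep(template, out_dir, grid):
    """Execute `template` once per parameter set, writing one output notebook each."""
    # Imported here so parameter_grid() is usable without Papermill installed.
    import papermill as pm

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, params in enumerate(grid):
        out_path = out_dir / f"run_{i:03d}.ipynb"
        # Papermill injects these values after the cell tagged "parameters",
        # overriding the defaults defined there.
        pm.execute_notebook(str(template), str(out_path), parameters=params)
        outputs.append(out_path)
    return outputs


# Hypothetical reconstruction parameters; names and values are placeholders.
grid = parameter_grid(
    algorithm=["gridrec", "fbp"],
    rotation_center=[1020.0, 1024.0, 1028.0],
)

# Assumed usage, with a hypothetical template notebook:
# run_sweep("reconstruct.ipynb", "sweep_outputs", grid)
```

Each output notebook records the injected parameters alongside its results, so the runs can be compared side by side before the chosen parameter set is applied to further data sets.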
Conclusion
The patterns illustrated in the above use cases are flexible and can be applied to other science domains as well. Common themes around parallel execution, reproducible environments, and interactive visualization come up repeatedly in our work enabling Jupyter for science on HPC systems.