Table 1. Examples of open-source molecular simulation codes and related supporting utilities developed within the chemical engineering molecular modeling community. Website links are to home pages of the codes or to code repositories. In addition, several other open-source codes emerging from the chemical engineering community are highlighted.
In this regard, the molecular simulation community in chemical engineering is particularly noted for sharing methods and capabilities by making software developed within the community freely available under open-source licenses, as described in a recent review article30. Table 1 provides examples of open-source molecular simulation tools developed within the ChEC, divided into simulation codes and other utilities. As with GEMC, many of the algorithms developed within the ChEC are implemented primarily in MC, so it is not surprising that most of the open-source simulation engines developed within the ChEC (see Table 1) are for performing MC simulations. The need for community-developed simulation engines, whether MD or MC, stems from the fact that such codes have become increasingly difficult for a single individual or research group to develop, extend, and maintain. This is due not only to an ever-growing set of features and algorithms, but also to changes in the computing hardware used in research environments. We have been through the era of vector architectures (e.g., Cray, Hitachi), parallel vector computers (a small number of coupled vector processors, such as the Cray Y-MP), massively parallel distributed-memory computers (MPPs, such as the Intel Paragon, in which a large number of the same commodity central processing units (CPUs) used in deskside computers are linked together and communicate over a network), multicore processors (such as the Intel Xeon line, which has grown from 6 cores to more than 50), both standalone and as part of an MPP, and, more recently, the inclusion of massively multicore graphics processing units (GPUs, which have migrated from the gaming industry into scientific computing and data manipulation).
A modern supercomputer typically consists of nodes connected via an interconnect (from vendors such as Mellanox and Intel)¹, where each node houses multiple commodity multicore CPUs and GPUs. This is the dominant architecture among the supercomputers on the TOP500 list of the fastest computers in the world31; at the time of writing, the top 5 supercomputers each have between 1.5 million and more than 10.5 million total computing cores. Designing and maintaining simulation codes that perform efficiently on these rapidly evolving architectures is therefore a significant challenge. Beyond community-developed simulation engines, we have also seen the rise of other community-developed utilities that support simulation, e.g., general analysis packages, as well as software that makes it easier to accurately and reproducibly initialize configurations, apply force fields to molecules, and create input files for a variety of simulation engines.

¹ These interconnects range from standard Ethernet connections to more specialized, proprietary high-performance interconnects from various vendors. At the time of writing, the TOP500 list includes numerous systems with proprietary interconnects, such as Mellanox InfiniBand (Mellanox is now owned by Nvidia), Intel Omni-Path, Cray Aries, and Fujitsu Tofu, along with standard Ethernet connections ranging from 10G to 100G.
In the remainder of this Perspective, as an example of ChEC open-source software, we focus our discussion on the Molecular Simulation Design Framework (MoSDeF), to which all the authors are contributors. MoSDeF is a set of Python tools to facilitate the initialization and parameterization of systems, with the goal of enabling transparent and reproducible molecular simulation workflows that, at the same time, are user-friendly and extensible.
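To make the three steps named above concrete (initializing a configuration, parameterizing it with a force field, and writing an engine input file), the following is a toy, standard-library-only Python sketch under stated assumptions. It is not the MoSDeF API: the names `initialize_lattice`, `parameterize`, `write_input`, and `TOY_FORCE_FIELD`, as well as the plain-text output format, are hypothetical, and the Lennard-Jones values are illustrative placeholders rather than parameters from a published force field. The point it illustrates is that each step is a deterministic function of explicit inputs, so the same inputs always yield the same input file.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical Lennard-Jones table: element -> (sigma in nm, epsilon in kJ/mol).
# Illustrative placeholder values only; a real workflow would load a published,
# versioned force-field file rather than hard-coding numbers.
TOY_FORCE_FIELD = {
    "Ar": (0.340, 0.996),
    "Kr": (0.365, 1.413),
}


@dataclass
class Atom:
    element: str
    position: tuple[float, float, float]  # nm
    sigma: float | None = None    # filled in by parameterize()
    epsilon: float | None = None  # filled in by parameterize()


def initialize_lattice(element: str, n_per_side: int, box_length: float) -> list[Atom]:
    """Place n_per_side**3 atoms on a simple cubic lattice (deterministic, hence reproducible)."""
    spacing = box_length / n_per_side
    atoms = []
    for i in range(n_per_side):
        for j in range(n_per_side):
            for k in range(n_per_side):
                xyz = ((i + 0.5) * spacing, (j + 0.5) * spacing, (k + 0.5) * spacing)
                atoms.append(Atom(element, xyz))
    return atoms


def parameterize(atoms: list[Atom], force_field: dict[str, tuple[float, float]]) -> None:
    """Assign LJ parameters from the table; fail loudly instead of guessing for unknown elements."""
    for atom in atoms:
        if atom.element not in force_field:
            raise KeyError(f"no parameters for element {atom.element!r}")
        atom.sigma, atom.epsilon = force_field[atom.element]


def write_input(atoms: list[Atom], box_length: float) -> str:
    """Emit a hypothetical plain-text input format: header lines, then one line per atom."""
    lines = [f"box {box_length:.4f}", f"natoms {len(atoms)}"]
    for atom in atoms:
        if atom.sigma is None or atom.epsilon is None:
            raise ValueError("atoms must be parameterized before writing input")
        x, y, z = atom.position
        lines.append(f"{atom.element} {x:.4f} {y:.4f} {z:.4f} {atom.sigma:.3f} {atom.epsilon:.3f}")
    return "\n".join(lines)


if __name__ == "__main__":
    atoms = initialize_lattice("Ar", n_per_side=3, box_length=1.8)
    parameterize(atoms, TOY_FORCE_FIELD)
    print(write_input(atoms, box_length=1.8))
```

Keeping parameterization as a separate step that refuses to write unparameterized atoms mirrors, in miniature, the separation of concerns between building a system and applying a force field; in MoSDeF itself these roles are played by its own packages, whose actual interfaces should be taken from their documentation.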