STORMM: Alpha v0.2.0
A quick update: I have been honored and excited to contribute to the release of STORMM alpha v0.2.0. STORMM is a molecular mechanics package, currently developed by Psivant Therapeutics, that uses GPU compute to run multiple molecular mechanics simulations in parallel.
You can read all about this update on the release page, and learn about the toolchain and how to get started with it on the STORMM website, which I also helped develop and deploy.
My role in this project, once I was done learning and interning, was to take on the engineering aspects of STORMM. This post collects some reflections and lessons learned as we optimized STORMM, lowered the barrier to entry, made it more accessible to open-source communities, and deployed the next version.
Before we begin, I would like to take a moment to thank Dave Cerutti and Woody Sherman for spearheading the project and giving me an opportunity to contribute, as well as my colleagues at Psivant and the broader open-source contributors for their valuable insights.
The Barrier to Entry
A major goal of the second alpha release of STORMM was to generate more interest in the software from the broader open-source community. This required a comprehensive overhaul of the build system and packaging process. We focused on making STORMM easier to build, deploy, and use across different platforms.
Part of this effort involved containerization, which means we've provided ready-to-use Dockerfiles. These files allow users to easily set up both CPU-only and CPU+GPU versions of STORMM without having to manually install all the dependencies. For more advanced users, we also created pipelines to deploy custom Singularity images, giving you full control over your environment. This significantly lowers the barrier to entry by abstracting away complex installation steps and dependency management.
We also put a lot of work into documentation. Many open-source projects fail to provide enough documentation to get started. Each program in STORMM now has a manual of all its input commands that can be navigated from the command line, and the website acts as a single source of truth. The website also hosts Doxygen-generated pages, which can be read alongside the STORMM source code to learn more about how everything works.
Space Complexity
Dynamic Hardware Detection
External Package Management
A significant aspect of improving STORMM's build process was tackling the complex and often painful world of C/C++ dependency management. Unlike languages that ship with a well-established package manager, C++ has no single, universally adopted one, so every project ends up solving the problem for itself. We needed a reliable way to handle external libraries, ensuring they could be easily integrated and built consistently across all supported platforms.
My work centered on creating a robust pipeline that automates this process. The new system allows developers to add new C/C++ libraries with minimal effort. This involves a modular approach to our build system, which can now fetch, compile, and link external dependencies without requiring manual configuration by the user.
A key aspect of package management in STORMM is that external packages are sandboxed inside our build directory, and no system directories are touched. You do not need root privileges to build STORMM or the external packages it requires, and removing the build directory removes all STORMM-related files and packages from the system. External packages are also linked dynamically in the background, so that any C++ compiler can resolve the paths to them when they are referenced in STORMM source code. Third-party libraries and packages are all downloaded and compiled in [{stormm_build_dir}/third_party].
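To make the sandboxing idea concrete, here is a minimal Python sketch of it. This is not STORMM's actual build code: the function names and the src/build/install subdirectory layout are hypothetical, chosen only for illustration. The one detail taken from the description above is that everything lives under the build directory's third_party folder.

```python
import shutil
import tempfile
from pathlib import Path


def third_party_dir(build_dir: Path) -> Path:
    """Sandbox root for external packages, inside the build directory."""
    return build_dir / "third_party"


def package_paths(build_dir: Path, name: str) -> dict[str, Path]:
    """Hypothetical per-package layout: source, build, and install trees."""
    root = third_party_dir(build_dir) / name
    return {key: root / key for key in ("src", "build", "install")}


def is_sandboxed(build_dir: Path, path: Path) -> bool:
    """True if `path` resolves to a location inside `build_dir`."""
    return path.resolve().is_relative_to(build_dir.resolve())


def prepare_package(build_dir: Path, name: str) -> dict[str, Path]:
    """Create the package's directories, refusing anything outside the sandbox."""
    paths = package_paths(build_dir, name)
    for path in paths.values():
        if not is_sandboxed(build_dir, path):
            raise ValueError(f"{path} escapes the build directory")
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


if __name__ == "__main__":
    # A temporary directory stands in for the real build directory.
    build = Path(tempfile.mkdtemp(prefix="stormm_build_"))
    for name in ("pocketfft", "netcdf"):
        print(name, "->", prepare_package(build, name)["install"])
    # Cleaning up is a single step: delete the build directory.
    shutil.rmtree(build)
```

Because every path is derived from the build directory and checked before anything is created, no step needs root privileges, and deleting that one directory removes everything the build produced.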
To validate this new system, we successfully integrated two critical third-party libraries: PocketFFT and NetCDF. PocketFFT is a highly optimized Fast Fourier Transform library, which will be instrumental for future performance-critical computations. NetCDF is a widely used format for storing scientific data, and its integration allows STORMM to handle and export data in a standardized, interoperable format that is familiar to the scientific computing community. This pipeline paves the way for a more adaptable STORMM, where the community can easily add their own tools and packages.
Documentation
One of the most crucial undertakings for STORMM v0.2.0 was creating a comprehensive documentation website. With a codebase exceeding 300,000 lines, we knew that welcoming new contributors and users required more than just in-code comments. A centralized, searchable, and well-structured resource was non-negotiable.
The goal was to create a "single source of truth" for all things STORMM. The website serves multiple audiences:
Users: They need to get started quickly. We created a guided user manual, tutorials for common use cases, and clear installation instructions for our containerized environments.
Developers: They need to understand the codebase and contribution process. We developed a detailed developer guide that explains the system architecture, code standards, and how to use the new developer tools to create new keywords and control blocks. The website is also where we host a detailed API reference, automatically generated from the source code, to help with navigation.
Community: The website provides a home for our project. It links to our release notes, our GitHub repository, and provides a clear outline of our project's mission and how to get involved.
Any changes to the documentation files are automatically built and deployed to the live site, ensuring that our documentation always stays up to date with the latest code. This focus on documentation not only lowers the barrier to entry but also fosters a more engaged and empowered open-source community around STORMM.