About STEP
The Software Tools Ecosystem Project (STEP) is a US Department of Energy initiative to provide the ongoing support and enhancement needed for critical High Performance Computing (HPC) software tools to remain effective, efficient, and relevant in the rapidly evolving field of HPC. We define Tools to mean the collection of software packages that can be applied to monitor, analyze, and diagnose performance and behavior of computational science applications and systems.
Tools are closely bound to architectures and system software in ways that other types of software, such as libraries and scientific applications, are not. For example, a tool that tracks how an application uses computing resources must be able to measure low-level architectural events and metrics and relate them to program progress and source code. As a result, tools must closely track changes to applications, architectures, and their software stacks. To complicate matters, the need for tools is most acute for understanding code performance on systems that push the boundaries of technology and scale, but these systems’ novelty makes them extremely difficult for tool developers to support when first deployed. The advent of exascale systems, increases in architectural diversity, and the additional complexity driven by heterogeneity all make effective tools essential for scientists and engineers; without them, scientific discovery is impeded.
Community Vision
While stewardship and advancement of software is a pervasive need, HPC tools face several urgent, domain-specific challenges that further complicate sustainability:
- Exploding hardware complexity: The rapid pace of increasing hardware complexity and heterogeneity greatly expands tools’ targets and forces HPC tool developers to respond in a reactive manner.
- Exploding use cases: New and emerging application paradigms, including AI/ML, edge, and embedded instrumentation, are shifting the usages that tools need to support. Additionally, there are new opportunities for tools in traditional HPC areas, such as feedback-driven dynamic resource management.
- The coordination challenge: Tools are uniquely and closely tied to design decisions across different layers of the execution stack, including hardware, system software, middleware, and applications.
- The management challenge: Building a sustainable tools ecosystem will require plans for organization, operating budgets, community standards, technology tracking, and workforce development, with particular attention to promoting inclusive and equitable research (PIER).
History
The sunset of the Exascale Computing Project (ECP) is an important transition point for the HPC Tools ecosystem. STEP is funded under the Next Generation Scientific Software Technologies (NGSST) program, a new ASCR cross-cutting activity to understand the capabilities and opportunities of the HPC Tools community and to find ways to sustain and improve the Tools ecosystem in the wake of ECP’s conclusion.
Phase I of STEP began in April 2023. During the last eight months of 2023, STEP held a series of three Town Hall meetings to understand the opportunities and challenges faced by the HPC software tools community. Each Town Hall was a multi-day, face-to-face event with between 30 and 50 experts representing the HPC tool developer, vendor, HPC facility, and application team communities. These disparate communities have significant interdependencies but have not regularly interacted as a group. The Town Halls produced a series of recommendations collated into reports; these reports are available on our Resources webpage.
Current Project Status
STEP formally transitioned to Phase II in January 2024 with an initial portfolio of five existing software packages that have already proven critical to the HPC community:
- PAPI (software package lead: Heike Jagode, University of Tennessee, Heike’s contact info)
- HPCToolkit (software package lead: John Mellor-Crummey, Rice University, John’s contact info)
- TAU (software package lead: Sameer Shende, University of Oregon, Sameer’s contact info)
- Dyninst (software package lead: Bart Miller, University of Wisconsin, Bart’s contact info)
- Darshan (software package lead: Shane Snyder, Argonne National Laboratory, Shane’s contact info)
STEP’s support for these projects emphasizes the ongoing sustainment and enhancement necessary for them to remain effective, efficient, and relevant in the rapidly evolving field of HPC.
STEP is seeking opportunities not only to expand this portfolio of software tools but also to pursue cross-cutting codesign and community activities that strengthen the tools community as a whole.
Contact Info
(Also refer to https://ascr-step.org/leadership )
- Terry Jones, STEP PI, <[email protected]>
- Phil Carns, STEP Deputy PI, <[email protected]>
- Mike Jantz, Diversity/Equity/Inclusion Point of Contact, <[email protected]>