The latest HPC and AI software development tools for 2024

HPC users have a wide range of software tools to choose from. This article focuses on freely available or open-source tools that scientists can use to improve the performance or portability of their software. Although the HPC community has access to many categories of tools, the exascale projects in the US and Europe concentrate on developing open-source software, or software that makes it easier to use a wide range of hardware resources.

The trend towards open source, or at least towards collaboratively produced and freely available HPC tools, helps to harness the expertise of a fragmented software ecosystem and so accelerates exascale development. This is particularly relevant for the first exascale applications, as these science codes will be the first to run at this scale.

Hardware diversity is the main driver of the push for portability, and new tools can make it easier for scientists to target the wide variety of potential exascale HPC architectures. LLVM, RAJA, Kokkos and SYCL are all examples of software tools currently used by the US Department of Energy (DOE) National Laboratories in the Exascale Computing Project (ECP). Although these tools address different layers of the HPC software stack, they share a common goal: giving scientists access to a wide range of resources and helping them increase the portability of their applications.

LEGaTO is an example of a European-funded software framework for exascale computing. The toolset has recently been released to the public and was designed to accelerate the adoption of heterogeneous resources, with a focus on specific application areas such as machine learning, healthcare and Internet of Things (IoT) applications for smart cities.

The current iteration of the EU-funded DEEP projects, ‘DEEP-SEA’, started on 1 April 2021 and will help to underpin the European Processor Initiative (EPI), which is developing hardware for exascale systems.

DEEP-SEA will deliver the programming environment for future European exascale systems, adapting all levels of the software stack to support highly heterogeneous compute and memory configurations. Although the project is still at an early stage, its goal is to allow code to be optimised across existing and future architectures and systems. The software stack includes low-level drivers, computation and communication libraries, resource management, and programming abstractions with associated run-time systems and tools.

Software tools

Arm Developer¹ provides a suite of software tools for porting and optimising HPC applications for Arm processors, including those that implement the Arm Scalable Vector Extension (SVE). These tools are grouped by application area: biosciences; chemistry and materials; computational fluid dynamics (CFD); high-energy physics; weather and climate; benchmarks and mini-apps; and visualisation.

Intel² provides several software tools that help developers optimise HPC applications, including frameworks for AI and data analytics running on Intel architecture. These include open-source HPC platform software through OpenHPC, Intel Parallel Studio XE, the Intel Distribution for Python and the Intel oneAPI Toolkits.

LEGaTO³ (Low Energy Toolset for Heterogeneous Computing) is a programming framework designed to support heterogeneous systems. The toolset lets scientists make use of CPU, GPU and FPGA resources, with its own run-time system offloading specific tasks to the most suitable acceleration technology.

After three years of research, the various elements of the European-funded project's toolset have been integrated to make it easier to port future use cases to the energy-efficient LEGaTO hardware/software platform.

The LLVM Project⁴ is a collection of modular and reusable compiler and toolchain technologies written in C++. The LLVM Core libraries provide a modern source- and target-independent optimiser, along with code generation support for many popular CPUs. These libraries are built around a well-specified code representation known as the LLVM intermediate representation (LLVM IR).

The LLVM code representation can be used in three forms: as an in-memory compiler IR, as on-disk bitcode, and as a human-readable assembly language. This makes LLVM IR a powerful representation for efficient compiler transformations and analysis, while also providing a natural way to debug and visualise those transformations.
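To make the human-readable form concrete, here is a tiny C++ function with, in its comments, a hand-simplified sketch of the textual IR that Clang produces for it. This assumes a Clang toolchain, and the exact output (attributes, metadata, value numbering) varies between Clang versions and optimisation levels, so the IR shown is illustrative rather than verbatim. The same module can be converted to on-disk bitcode with `llvm-as` and back to text with `llvm-dis`.

```cpp
// add.cpp: a minimal function for illustrating LLVM IR.
//
// `clang++ -O1 -S -emit-llvm add.cpp` writes textual IR to add.ll.
// Hand-simplified, the function looks roughly like this
// (attributes and metadata omitted; names and details vary by version):
//
//   define i32 @add(i32 %a, i32 %b) {
//     %sum = add nsw i32 %a, %b   ; nsw = "no signed wrap" (signed int overflow is UB)
//     ret i32 %sum
//   }
//
// extern "C" keeps the IR symbol as @add rather than a mangled C++ name.
extern "C" int add(int a, int b) {
    return a + b;
}
```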

The Kokkos C++ Performance Portability EcoSystem⁵ is a production-level solution for writing modern C++ applications in a hardware-agnostic way. It is part of the US Department of Energy's ECP. The EcoSystem consists of multiple libraries that address the primary concerns of developing and maintaining applications in a portable way. Its three main components are the Kokkos Core Programming Model, the Kokkos Kernels Math Libraries and the Kokkos Profiling and Debugging Tools.

The Nvidia HPC Software Development Kit (SDK)⁶ includes compilers, libraries and software tools essential to maximising developer productivity and the performance and portability of HPC applications. Its C, C++ and Fortran compilers support GPU acceleration of HPC modelling and simulation applications written in standard C++ and Fortran, or with OpenACC directives and CUDA. GPU-accelerated math libraries maximise performance on common HPC algorithms, and optimised communications libraries enable standards-based multi-GPU and scalable systems programming.
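GPU acceleration with "standard C++" refers to the parallel algorithms in the C++17 standard library. A minimal sketch, assuming a C++17 compiler and standard library: a dot product written with `std::transform_reduce` and the `std::execution::par_unseq` policy, with no vendor-specific API. With the Nvidia HPC SDK, building with `nvc++ -stdpar=gpu` can offload such calls to a GPU; with other toolchains the same source runs on the CPU, and whether it runs in parallel depends on the standard-library backend. The function name `dot` is illustrative.

```cpp
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Dot product expressed as a standard parallel algorithm: multiply element
// pairs (transform), then sum the products (reduce). The execution policy
// tells the implementation it may parallelise and vectorise the work.
// Assumes y has at least as many elements as x.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    return std::transform_reduce(std::execution::par_unseq,
                                 x.begin(), x.end(), y.begin(),
                                 0.0,                   // initial value
                                 std::plus<>{},         // reduction
                                 std::multiplies<>{});  // transform
}
```

Because the algorithm, not the programmer, decides how to split the work, the same call can target multicore CPUs or GPUs without source changes.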

OpenCL (Open Computing Language)⁷ is an open, royalty-free standard for cross-platform, parallel programming of diverse accelerators found in supercomputers, cloud servers, personal computers, mobile devices and embedded platforms. OpenCL improves the speed and responsiveness of a wide spectrum of applications in numerous market categories, including professional creative tools, scientific and medical software, vision processing and neural network training and inferencing.

Together with the OpenCL 3.0 specification, the Khronos OpenCL working group has released an initial Khronos OpenCL SDK that developers can use to start writing OpenCL code. The SDK is open-sourced on the Khronos GitHub under the Apache 2.0 licence and will be continuously updated and expanded. This initial release includes a new OpenCL guide, headers including vendor extensions, some small sample programs that illustrate how to use the SDK build system (with CI), and an ICD Loader that will soon support installable development layers.

OpenMP⁸ is a specification for a set of compiler directives, library routines and environment variables that can be used to specify high-level parallelism in Fortran and C/C++ programs. OpenMP allows users to create, manage, debug and analyse parallel programs while supporting portability. The directives extend the C, C++ and Fortran base languages with single program multiple data (SPMD) constructs, tasking constructs, device constructs, worksharing constructs and synchronisation constructs, and they provide support for sharing, mapping and privatising data.
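As a minimal sketch of the directive style, assuming a C++ compiler with OpenMP support (for example GCC or Clang with `-fopenmp`), the hypothetical function below scales one vector, adds it to another and sums the result, using a worksharing loop and a `reduction` clause. If OpenMP is not enabled, the compiler ignores the pragma and the loop runs serially with the same result, which shows how the directives layer parallelism onto an otherwise ordinary program.

```cpp
#include <vector>

// Computes y[i] += a * x[i] and returns the sum of the updated y.
// Hypothetical helper for illustration; assumes y.size() >= x.size().
double axpy_and_sum(double a, const std::vector<double>& x,
                    std::vector<double>& y) {
    const long n = static_cast<long>(x.size());
    double total = 0.0;

    // Worksharing construct: loop iterations are divided among the threads.
    // reduction(+ : total) gives each thread a private partial sum and
    // combines the partial sums when the loop finishes.
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        y[i] += a * x[i];
        total += y[i];
    }
    return total;
}
```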

OpenHPC⁹ is a Linux Foundation Collaborative Project whose mission is to provide a reference collection of open-source HPC software components and best practices, lowering barriers to the deployment, advancement and use of modern HPC methods and tools.

OpenHPC v2.0 was the most recent major update, adding support for two new major OS distribution versions: CentOS 8 and openSUSE Leap 15. Because the OpenHPC 2.x series targets these new distribution versions, it is not intended to be backwards compatible with the previous OpenHPC 1.3.x series. OpenHPC v2.3 is the current update, intended primarily to enable resource manager support for the newer hwloc included in RHEL 8.4.

RAJA¹⁰ is a software library of C++ abstractions, developed at Lawrence Livermore National Laboratory (LLNL), that enables architecture and programming-model portability for HPC applications. RAJA has two main goals: to enable application portability with manageable disruption to existing algorithms and programming styles, and to achieve performance comparable to that of common programming models (for example, OpenMP and CUDA).

RAJA is part of a portability suite that also includes CHAI, Umpire and CAMP. All of these tools are developed by LLNL and are freely available on GitHub.
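RAJA's central idea is that a loop body is written once as a C++ lambda, and an execution policy passed as a template argument selects the back end. In RAJA itself this looks like `RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, N), [=](int i) { ... })`. The sketch below reproduces only that pattern in plain C++ so that it compiles without RAJA: `sketch::forall`, its policy tags and the `scale` kernel are hypothetical stand-ins, not the RAJA API, and the OpenMP back end falls back to serial execution when OpenMP is disabled.

```cpp
#include <vector>

namespace sketch {

// Hypothetical execution-policy tags. The names echo RAJA's policies, but
// this is a stand-alone illustration, not the RAJA API.
struct seq_exec {};
struct omp_parallel_for_exec {};

// Sequential back end: a plain loop.
template <typename Body>
void forall_impl(seq_exec, long begin, long end, Body body) {
    for (long i = begin; i < end; ++i) body(i);
}

// OpenMP back end: the same loop with a worksharing directive. Without
// OpenMP enabled at compile time, the pragma is ignored and it runs serially.
template <typename Body>
void forall_impl(omp_parallel_for_exec, long begin, long end, Body body) {
    #pragma omp parallel for
    for (long i = begin; i < end; ++i) body(i);
}

// Entry point: the policy is a template argument, so the loop body is
// written once and the back end is chosen at compile time.
template <typename Policy, typename Body>
void forall(long begin, long end, Body body) {
    forall_impl(Policy{}, begin, end, body);
}

}  // namespace sketch

// Changing this alias is the only edit needed to switch back ends.
using exec_policy = sketch::omp_parallel_for_exec;

// Application kernel: multiply every element by a factor.
void scale(std::vector<double>& v, double factor) {
    double* data = v.data();
    sketch::forall<exec_policy>(0, static_cast<long>(v.size()),
                                [=](long i) { data[i] *= factor; });
}
```

Keeping the policy separate from the loop body is what lets an application move between CPU and GPU back ends with limited disruption to existing code.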

SYCL¹¹ (pronounced ‘sickle’) is a royalty-free, cross-platform abstraction layer that enables code for heterogeneous processors to be written using standard ISO C++, with the host and kernel code for an application contained in the same source file.

First introduced in 2014, SYCL is a C++-based heterogeneous parallel programming framework for accelerating HPC, machine learning, embedded computing and compute-intensive desktop applications on a wide range of processor architectures, including CPUs, GPUs, FPGAs and tensor accelerators.