
Optimising life science code for HPC

Life science HPC

Credit: TanyaJoy/Shutterstock

The life sciences are generating unprecedented amounts of data, driven by advances in genomics, proteomics, and biomedical imaging, and by instruments such as high-throughput sequencing machines that produce ever larger volumes of data.

High-performance computing (HPC) environments can provide the scalable resources necessary to process and analyse these vast datasets, but this performance comes at a cost. Achieving optimal performance in HPC environments requires careful code optimisation tailored to the unique demands of life science applications and the HPC infrastructure.

Code optimisation can be done in-house if existing teams have the required expertise. Alternatively, an organisation can use free or commercial tools for assistance, or engage companies that specialise in optimising HPC codes. In each case, the goal is the same: better performance and efficiency for a real-world set of codes on a given HPC architecture.

Expert support

Optimising life science code to exploit HPC systems fully is a multifaceted challenge that involves technical proficiency and strategic planning. Alongside traditional optimisation approaches, combining cloud computing with the Outcome-as-a-Service (OaaS) deployment model can significantly change how researchers achieve computational efficiency. In May, Viridien launched its AI Cloud solution, designed to meet the needs of data-intensive industries, including life sciences, that seek to optimise and accelerate their resource-heavy HPC and AI workloads.

Viridien’s AI Cloud combines the latest high-performance architecture, including NVIDIA H100 Tensor Core GPUs fully configured and optimised for HPC and AI, with a pre-installed software environment that can be tailored for each client. 

Viridien experts manage the complexities of cloud computing and infrastructure, allowing researchers to work with the cloud provider to identify code bottlenecks, improve decision-making, and unlock further business value by combining AI Cloud with Viridien’s results-based OaaS deployment model.

To learn more about how OaaS cloud computing can enhance the performance of your real-world life sciences applications, read the white paper from Viridien.

A helping hand

In November, Codee, provider of software developer tools for automated code review and testing, announced a strategic partnership with Do IT Now to address the growing need for efficient and reliable software solutions that can handle the complexities of modern HPC environments. The partnership aims to help organisations improve code quality, ensure correctness, modernise legacy code, enforce coding guidelines, ensure portability, optimise performance, and accelerate software delivery.

Do IT Now users gain access to Codee, which performs automated analysis on every line of code, identifying issues and discovering opportunities for modernisation and optimisation without the need to execute the code. The software then generates detailed reports tailored to different profiles, such as developers and managers, explaining how to leverage these opportunities. By integrating seamlessly into existing CI/CD workflows, Codee enables software development teams to produce fast, maintainable, correct code while accelerating software development and delivery.

Steps for in-house code optimisation

Understanding the structure and behaviour of the application is the first step in optimisation. Profiling tools such as gprof, VTune, and HPCToolkit help identify performance bottlenecks, such as sections of code that consume excessive computational resources or memory. These insights allow researchers to focus on areas that will deliver the most significant performance gains. Profiling also reveals inefficiencies in data handling, algorithm execution, or I/O operations that may otherwise go unnoticed.
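As a minimal, hedged sketch of the profiling workflow, the C program below contains a deliberate hotspot, with the usual gprof steps (compile with -pg, run to produce gmon.out, then inspect the profile) noted in comments. The file and function names are illustrative, not from any particular application.

```c
/* hotspot_demo.c - a toy program whose hotspot gprof will surface.
 *
 * Typical gprof workflow (GCC):
 *   gcc -O2 -pg hotspot_demo.c -o hotspot_demo   # instrument for profiling
 *   ./hotspot_demo                               # run: writes gmon.out
 *   gprof hotspot_demo gmon.out                  # flat profile + call graph
 */
#include <stdio.h>

/* Deliberately expensive function: should dominate the flat profile. */
static double slow_sum(int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            s += (double)i * j;
    return s;
}

int main(void) {
    printf("%f\n", slow_sum(5000));
    return 0;
}
```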

Algorithm and data structure choices have a profound impact on performance. Researchers must evaluate whether their algorithms are computationally efficient and scalable. For example, replacing an O(n²) algorithm with an O(n log n) alternative can yield dramatic improvements. Parallel algorithms, such as those employing domain decomposition, are often essential in HPC environments to exploit the distributed nature of the infrastructure. Furthermore, optimising data structures for better memory locality reduces cache misses and improves execution speed, which is particularly important in systems where memory access latency can significantly impact performance.
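To make the complexity argument concrete, here is a small hedged sketch in C: finding duplicate record IDs by comparing every pair is O(n²), whereas sorting first and scanning adjacent elements is O(n log n). The data and function names are purely illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Comparator for qsort over long integers. */
static int cmp_long(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* O(n log n): sort once, then duplicates sit next to each other.
 * The naive alternative - comparing every pair in a double loop -
 * performs n*(n-1)/2 comparisons and becomes unusable for large n. */
static size_t count_duplicates(long *ids, size_t n) {
    qsort(ids, n, sizeof ids[0], cmp_long);
    size_t dups = 0;
    for (size_t i = 1; i < n; i++)
        if (ids[i] == ids[i - 1])
            dups++;
    return dups;
}

int main(void) {
    long ids[] = {42, 7, 42, 13, 7, 99};
    printf("%zu duplicates\n", count_duplicates(ids, sizeof ids / sizeof ids[0]));
    return 0;
}
```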

Parallelisation is central to HPC optimisation. Life science workloads, such as molecular dynamics simulations or genome assembly, often require substantial parallel computation to meet performance demands. Researchers can use parallel computing frameworks like MPI (for distributed-memory systems) or OpenMP (for shared-memory systems) to distribute computations across multiple processors. Effective parallelisation also requires careful load balancing to ensure that all processors are utilised efficiently, as uneven workload distribution can lead to performance degradation. Scalability testing helps determine how well an application performs as the number of cores or nodes increases, highlighting any limitations in the current parallelisation strategy.
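As a hedged illustration of shared-memory parallelisation with attention to load balance, the OpenMP sketch below (compile with, e.g., gcc -O2 -fopenmp) distributes loop iterations dynamically, which helps when per-item cost varies, as with variable-length sequences; the workload function is hypothetical.

```c
#include <math.h>
#include <stdio.h>
#include <omp.h>

/* Stand-in for per-item work whose cost varies with the input. */
static double process_item(int i) {
    double s = 0.0;
    for (int k = 0; k < (i % 1000) * 1000; k++)
        s += sin((double)k);
    return s;
}

int main(void) {
    const int n = 10000;
    double total = 0.0;

    /* schedule(dynamic) hands out chunks on demand, so threads that
     * finish cheap items early pick up more work instead of idling. */
    #pragma omp parallel for schedule(dynamic, 16) reduction(+ : total)
    for (int i = 0; i < n; i++)
        total += process_item(i);

    printf("total = %f (max threads: %d)\n", total, omp_get_max_threads());
    return 0;
}
```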

Efficient handling of input/output (I/O) operations is another critical factor. Life science applications often involve large datasets, such as genomic sequences or imaging data, which can overwhelm I/O systems if handled incorrectly. Optimising I/O includes batching smaller reads and writes to reduce overhead, using file formats that support compression to minimise data size, and implementing parallel I/O libraries like MPI-IO or parallel HDF5. These practices ensure that data movement does not become a bottleneck in the workflow.
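The sketch below shows one common parallel I/O pattern with MPI-IO: each rank writes its contiguous slice of a shared file with a single collective call, rather than issuing many small independent writes. The file name and block size are illustrative assumptions.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank owns one contiguous block of doubles. */
    const int block = 1 << 20;
    double *buf = malloc(block * sizeof *buf);
    for (int i = 0; i < block; i++)
        buf[i] = rank + i * 1e-6;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: the MPI library can merge the ranks' requests
     * into large, well-aligned operations on the parallel file system. */
    MPI_Offset offset = (MPI_Offset)rank * block * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, block, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```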

Resource optimisation is another crucial consideration. Modern CPUs include SIMD (Single Instruction, Multiple Data) instructions, which can be exploited through vectorised code or libraries that take advantage of these capabilities. Similarly, GPU acceleration has become increasingly important for computationally intensive tasks such as molecular modelling or deep learning. Frameworks like CUDA, OpenCL, or TensorFlow allow researchers to harness the power of GPUs effectively. Memory usage also requires attention; excessive allocation and deallocation create overhead, which can be mitigated by using memory pools or reusing buffers.
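To illustrate the vectorisation point, here is a hedged sketch of a loop written so the compiler can use SIMD instructions: restrict tells it the arrays do not overlap, and #pragma omp simd asks it to vectorise explicitly. The kernel and array names are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Scale-and-add kernel written to be vectorisable: restrict promises
 * the compiler that the arrays do not alias, so it can emit SIMD
 * instructions (compile with e.g. gcc -O3 -fopenmp-simd -march=native). */
static void saxpy(size_t n, float a,
                  const float *restrict x, float *restrict y) {
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    const size_t n = 1 << 20;
    float *x = malloc(n * sizeof *x), *y = malloc(n * sizeof *y);
    for (size_t i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(n, 3.0f, x, y);
    printf("y[0] = %f\n", y[0]); /* expect 5.0 */

    free(x);
    free(y);
    return 0;
}
```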

Containerisation is a powerful tool for ensuring that optimised code runs consistently across different environments. Tools like Docker and Singularity package code and its dependencies, enabling seamless execution across local and cloud-based HPC systems. Pre-optimised container images for widely used bioinformatics tools, such as GROMACS or BLAST, save time and effort by providing ready-to-deploy solutions.

Code optimisation is an ongoing process. Benchmarking applications regularly helps measure performance and identify new bottlenecks as data sizes and algorithms evolve. Monitoring execution time and cost efficiency is especially important in pay-as-you-go cloud models, ensuring that performance gains translate into financial savings.
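As a minimal sketch of such regular benchmarking, assuming a POSIX system, the C harness below times a region of interest with a monotonic wall clock; the placeholder workload stands in for whatever routine is under test.

```c
#include <stdio.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    double t0 = now_sec();

    /* Placeholder workload under test; volatile prevents the compiler
     * from optimising the loop away entirely. */
    volatile double s = 0.0;
    for (long i = 0; i < 100000000L; i++)
        s += i * 0.5;

    double t1 = now_sec();
    printf("elapsed: %.3f s\n", t1 - t0);
    return 0;
}
```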

Cloud providers also offer scalability and flexibility that traditional HPC setups may not match. OaaS platforms can use serverless computing or containerised workflows to ensure resources are used efficiently.

This on-demand model can be particularly advantageous for smaller research groups or organisations without access to large, dedicated HPC clusters, and it often delivers significant long-term savings by reducing the need for in-house expertise and infrastructure maintenance.

