
Cloud services help researchers mitigate the risk of adopting HPC

Cloud HPC

Credit: Sashkin/Shutterstock

The growing demand for advanced computing resources, such as high-performance computing (HPC) and large-scale AI workloads, has driven organisations to explore new ways to adopt and integrate these technologies.

HPC is essential for solving complex computational problems across industries. Much of the growth in demand is driven by the adoption of AI, increasingly large data sets, and the growing complexity of experiments.

However, the traditional deployment of HPC infrastructure involves significant costs, complex system integrations, and long implementation times, making it inaccessible for many businesses.

Hyperion Research recently announced the results of its AI in the Cloud study, entitled Cloud-based AI Activity for HPC: Widespread but Primarily Exploratory. The survey, conducted in July 2024, collected input from 105 respondents who indicated current or planned use, within the next 12-18 months, of AI on public cloud-based resources to support HPC or compute-intensive activities.

Hyperion's lead analyst on the report, Tom Sorensen, stated: “Advanced computing users and organisations conducting compute-intensive workloads are currently and increasingly leveraging public cloud resources for AI endeavours.”

“Cloud service providers (CSPs) have the advantage of circumventing long on-premises buying cycles, more streamlined installation into existing workloads, and a more composable way of designing compute infrastructure compared with on-premises counterparts. This agility within the cloud allows for CSPs to offer users more varied and up-to-date solutions while on-premises resources must follow a different, often lengthier path to utilisation.”

Earlier this month, IBM announced that it was increasing the capacity of its HPC cloud offering by adding NVIDIA H100 Tensor Core GPU instances to its cloud services. Rohit Badlaney, General Manager at IBM Cloud, noted: “To help clients embrace generative AI, IBM is extending its high-performance computing (HPC) offerings, giving enterprises more power and versatility to carry out research, innovation and business transformation.”

Talking about the NVIDIA H100, Badlaney said: “It has the potential to give IBM Cloud customers a range of processing capabilities while also addressing the cost of enterprise-wide AI tuning and inferencing. Businesses can start small, training small-scale models, fine-tuning models, or deploying applications like chatbots, natural language search, and using forecasting tools using NVIDIA L40S and L4 Tensor Core GPUs.”

A barrier to entry and innovation

The initial investment required to use HPC is a significant barrier to entry, so cloud services have emerged as a solution that provides organisations with the flexibility and scalability needed to deploy HPC without the associated capital expenditures and technical barriers.

One of the primary concerns for researchers using cloud computing is the cost, which can spiral over time if left unchecked. Another hidden cost lies in the complexity of managing cloud services. Researchers may not initially understand how to provision and manage their cloud infrastructure efficiently. This can lead to overprovisioning, where more computational resources are allocated than necessary, or underprovisioning, which can cause delays and inefficiencies.
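The two provisioning failure modes described above can be put into rough numbers. The following is a minimal sketch, not a billing model: the function names, the GPU counts, and the hourly rate are all hypothetical, and the slowdown estimate assumes a job that scales perfectly across GPUs, which real workloads rarely do.

```python
def idle_cost(allocated_gpus: int, needed_gpus: int,
              hours: float, rate_per_gpu_hour: float) -> float:
    """Money spent on GPUs that were allocated but not needed (overprovisioning).

    All inputs are illustrative; rate_per_gpu_hour is a hypothetical price.
    """
    idle_gpus = max(allocated_gpus - needed_gpus, 0)
    return idle_gpus * hours * rate_per_gpu_hour


def slowdown_factor(allocated_gpus: int, needed_gpus: int) -> float:
    """How much longer a job takes when underprovisioned.

    Assumes ideal linear scaling, so this is an optimistic simplification.
    """
    if allocated_gpus <= 0:
        raise ValueError("allocated_gpus must be positive")
    return max(needed_gpus / allocated_gpus, 1.0)


# Hypothetical example: a job that needs 4 GPUs for 10 hours at 2.0 per GPU-hour.
wasted = idle_cost(allocated_gpus=8, needed_gpus=4, hours=10.0, rate_per_gpu_hour=2.0)
delay = slowdown_factor(allocated_gpus=2, needed_gpus=4)
print(f"Overprovisioned: {wasted:.2f} spent on idle GPUs")
print(f"Underprovisioned: job takes {delay:.1f}x as long")
```

In practice, actual usage figures from a provider's monitoring tools would replace the made-up inputs here.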

The outcome-as-a-service (OaaS) deployment model significantly reduces the risks of adopting advanced computing resources. This model allows companies to focus on achieving specific results rather than managing infrastructure and operational processes, thereby minimising the hurdles traditionally linked to advanced computing adoption.

Cloud HPC enables complex research

Chris Thorpe, an ARISE Fellow at the European Bioinformatics Institute, works on cutting-edge research in immunology and computational biology. He focuses on using AI and deep learning models such as AlphaFold to understand T cells and their interactions with pathogens.

Thorpe’s research aims to contribute towards developing safer, more effective treatments, such as personalised cancer therapies and advanced vaccines, which work in a targeted way with the immune system, thus reducing the side effects seen with current, less targeted therapies.

Thorpe has been working to overcome the memory limits of desktop-class GPUs by using Viridien’s Cloud, which provides access to powerful computing resources, specifically NVIDIA H100 Tensor Core GPUs. This allows him to work with much larger datasets and more complex models than desktop-class hardware could accommodate.

“We need to ensure we are featuring co-evolution by feeding the system with relevant multiple sequence alignment (MSA) data,” says Thorpe. “However, the size of the MSA can cause issues with the GPU running out of memory.”

Thorpe aims to refine the use of highly customised multiple sequence alignments (MSAs), a key component in AlphaFold’s predictions. His goal is to make predictions more efficient and reliable on commodity hardware. By optimising these systems, the methodology developed by Thorpe and his collaborators could become more accessible to immunology labs worldwide, many of which lack the resources for high-powered computational tools. This would democratise the technology, broadening its impact on immunotherapy and vaccine development.

“Thanks to Viridien’s optimised HPC infrastructure and the H100 processor speed, which is more than five-fold greater than those I used during prototype phases, I’ve reduced prediction time from 10-15 minutes to just 2-3 minutes, enabling faster, more accurate insights,” says Thorpe. “Looking ahead, I believe we can develop custom MSAs to a point where predictions are still accurate yet will run reliably on commodity hardware. In turn, this will democratise methodology for immunology labs, most of which don’t have access to high-end GPUs.”
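The prediction times Thorpe quotes imply a range of speedups rather than a single figure. As a rough illustration using only the numbers in the quote (10-15 minutes before, 2-3 minutes after), the sketch below computes the smallest and largest speedup consistent with those ranges. The function name is hypothetical, and the result describes end-to-end prediction time, not raw processor speed.

```python
def speedup_range(before: tuple[float, float],
                  after: tuple[float, float]) -> tuple[float, float]:
    """Return the (lowest, highest) speedup consistent with two time ranges.

    The lowest speedup pairs the fastest 'before' time with the slowest
    'after' time; the highest pairs the slowest 'before' with the fastest 'after'.
    """
    before_lo, before_hi = before
    after_lo, after_hi = after
    return before_lo / after_hi, before_hi / after_lo


# Times in minutes, taken from the quote above.
low, high = speedup_range(before=(10, 15), after=(2, 3))
print(f"Implied speedup: {low:.1f}x to {high:.1f}x")  # Implied speedup: 3.3x to 7.5x
```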

As organisations continue to explore advanced computing resources like HPC, the cloud-based OaaS model presents a compelling option to mitigate risk and maximise value. By shifting the focus from managing infrastructure to achieving specific outcomes, OaaS enables businesses to leverage HPC and AI computing power without the financial, operational, and technological risks associated with traditional HPC deployments.

To learn more about OaaS HPC for life sciences, read the latest white paper from Viridien
