Life science researchers are tasked with increasingly complex simulations. This growing complexity demands considerably more computational power, which is why some scientists are turning to high-performance computing (HPC) to accelerate their simulations.
The resulting expansion in the number of potential users and research topics is forcing HPC providers to rethink the way they offer HPC to life science researchers. At the same time, researchers are facing unprecedented growth in the data they are handling, forcing many of them to explore new avenues to accelerate life science research.
Jorge Balcells, director of technical services at Verne Global, highlighted the increased demand for HPC: ‘The trend that we are seeing in our customers is simply an exponential increase in the granularity of data that is being processed. That exponential increase directly correlates with an increase in the demand for power.’
The increase in power requirements when moving from small- to large-scale computing is fairly obvious, but supporting a cluster at existing premises can be a difficult challenge, as Balcells explains: ‘You are seeing a big push towards cloud-based HPC; it is an evolution, but a forced evolution because the resources that you need to power a data centre are scarce in large metropolitan areas.’
There are other options beyond cloud-based HPC. Larger facilities may have the resources to set up their own cluster, although this requires considerable technical knowledge. Another option would be to co-locate with other like-minded facilities or collaborators. Again, this requires considerable resources, but these can be split between multiple research centres, limiting the initial investment.
In 2014, the data centre provider Infinity secured a five-year framework agreement with Janet, the UK’s national research and education network, provided by Jisc. The creation of a Jisc data centre to support the requirements of academic research was the first example of its kind in the UK. The initial Jisc partners are University College London (UCL), King’s College London, the Sanger Institute, the Francis Crick Institute, the London School of Economics and Political Science (LSE) and Queen Mary University of London (QMUL).
One of the largest concerns for genomics and the wider life science industries is the amount of storage needed to support research over the long term. One biomedical research facility, The Scripps Research Institute (TSRI), recently deployed DDN’s end-to-end data management solutions, including its SFA7700X file storage system with DDN’s WOS object storage archive, to support fast analysis and cost-effective retention of research data produced by cryo-electron microscopy (cryo-EM).
TSRI uses advanced microscopes, next-generation digital cameras and sophisticated software pipelines to shed light on new treatments for Alzheimer’s, Parkinson’s, Lou Gehrig’s and Huntington’s diseases, while identifying new ways to combat HIV, Ebola and Zika. DDN’s storage enables TSRI to harness the roughly 30TB of data generated each week by cryo-EM, while scaling an active archive for widespread collaboration and content distribution.
According to Jean-Christophe Ducom, HPC manager, information technology services at TSRI, cryo-EM has become the institute’s biggest producer of data, yielding four times more output than its genomics workflows. ‘DDN helps us give scientists what they want—unlimited storage capacity and easy access to data that holds the secret to life-saving discoveries.’
If setting up a cluster seems like a step too far, then there are options for cloud-based HPC that allow researchers to send large-scale batch jobs to remote computing facilities. However, as Balcells highlights, not every data centre is the same, and the cost of power can be considerably different from one location to another.
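For readers unfamiliar with what ‘sending a batch job’ actually involves, the sketch below shows one common pattern: a job script handed to a remote, Slurm-managed cluster over SSH. The hostname, resource requests and simulation command are all placeholders rather than a reference to any particular provider’s setup.

import subprocess

# Hypothetical remote Slurm cluster; the hostname is a placeholder.
REMOTE_HOST = "hpc.example.org"

# A minimal Slurm batch script: resources are requested up front,
# then the simulation command itself is launched with srun.
BATCH_SCRIPT = """#!/bin/bash
#SBATCH --job-name=md-simulation
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32
#SBATCH --time=24:00:00
srun ./run_simulation --input protein.pdb
"""

def submit_remote_job() -> str:
    """Pipe the script over SSH and hand it straight to sbatch,
    which reads a job definition from stdin when no file is given."""
    result = subprocess.run(
        ["ssh", REMOTE_HOST, "sbatch"],
        input=BATCH_SCRIPT,
        capture_output=True,
        text=True,
        check=True,
    )
    # sbatch replies with a line such as "Submitted batch job 123456".
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit_remote_job())

The scheduler queues the job alongside work from other users and returns an identifier that the researcher can use to track progress and collect results once the run completes.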
Verne Global’s data centre is located in Iceland and uses only renewable energy from geothermal and hydroelectric power plants located in the country. However, the placement of this data centre is not purely for ecological reasons.
‘We chose the site after a lot of due diligence globally. When you are building a data centre facility, you are building a 20-year investment. It is a significant capital investment, so it is not just the cost of power today but the cost of power over the long term,’ said Balcells.
‘We chose Iceland not only because it is very strategically located between North America and Europe, but also because the sources of power are 100 per cent renewable. We offer 5, 10 or even 15-year contracts on power. Our customers, before they even move in, know the price of power 15 years from now, and you simply cannot do that with a commodity-based market,’ said Balcells.
In addition to offering a carefully selected location, Verne also provides services to disaggregate data, further helping users to reduce the cost of cloud computing.
‘Disaggregation has been happening for a long time – disaggregation at a geographical level. What we see now is the disaggregation of data’ said Balcells.
This concept focuses on selecting the most important data or jobs within a given workload and reducing the protection of data that is not as important. ‘A lot of our customers are moving towards what we call the disaggregation of data. They are looking at their data, at the amount and type of workloads they have, and they disaggregate it.’
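As a rough illustration of that idea, the Python sketch below (with made-up dataset names, importance scores and tier definitions) maps each dataset in a workload to a protection tier, so that critical results get multiple copies on fast storage while bulky, reproducible data falls through to a cheaper single-copy archive.

from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    importance: int   # 1 (low) to 10 (critical); the scoring scheme is illustrative
    size_tb: float

# Illustrative protection tiers: more copies on faster storage
# for critical data, a cheaper single-copy archive for the rest.
TIERS = {
    "critical": {"replicas": 3, "storage": "high-performance file system"},
    "standard": {"replicas": 2, "storage": "object store"},
    "archive":  {"replicas": 1, "storage": "cold archive"},
}

def assign_tier(ds: Dataset) -> str:
    """Map a dataset to a protection tier based on its importance score."""
    if ds.importance >= 8:
        return "critical"
    if ds.importance >= 4:
        return "standard"
    return "archive"

# Hypothetical workload: raw instrument output matters less
# once the derived results have been produced and verified.
workload = [
    Dataset("patient-variant-calls", importance=9, size_tb=2.0),
    Dataset("intermediate-alignments", importance=5, size_tb=40.0),
    Dataset("raw-sequencer-output", importance=2, size_tb=120.0),
]

for ds in workload:
    tier = assign_tier(ds)
    print(f"{ds.name}: {tier} "
          f"({TIERS[tier]['replicas']} copies, {TIERS[tier]['storage']})")

The saving comes from the asymmetry in the example workload: the data that needs the strongest protection is usually a small fraction of the total volume.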
If an organisation is using the same workflows every day, then there are more specific services available. One option, designed specifically for genomics workflows, is a company called Bluebee. Created in 2011 by Delft University of Technology (TU Delft) and Imperial College London, Bluebee combines cloud computing and genomics to deliver a genetic analytics service for research and clinical labs.
Bluebee securely connects sequencers to its supercomputing clusters that run in private cloud data centres. The Bluebee Service Connector takes care of encryption, authentication and data transfer. Once the analysis is complete, results can be viewed online, and data can be retrieved in an automated way for use with any interpretation software.
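Bluebee’s own interfaces are not documented here, so the Python sketch below is purely hypothetical: it shows the general pattern just described (encrypt locally, authenticate, transfer, then retrieve results automatically) against an invented endpoint, not Bluebee’s actual Service Connector API.

import requests
from cryptography.fernet import Fernet

# Hypothetical placeholders; these illustrate the encrypt/authenticate/
# transfer pattern and do not refer to any real service.
API_BASE = "https://example-genomics-cloud.invalid/api"
AUTH_TOKEN = "replace-with-real-credentials"

def upload_run(fastq_path: str, key: bytes) -> str:
    """Encrypt a sequencing run locally, then upload it over an
    authenticated HTTPS connection. Returns a job identifier."""
    with open(fastq_path, "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())
    resp = requests.post(
        f"{API_BASE}/runs",
        headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
        data=ciphertext,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]

def fetch_results(job_id: str) -> bytes:
    """Retrieve finished analysis results for downstream interpretation."""
    resp = requests.get(
        f"{API_BASE}/runs/{job_id}/results",
        headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.content

if __name__ == "__main__":
    key = Fernet.generate_key()  # in practice, key management sits with the connector
    job_id = upload_run("sample_R1.fastq.gz", key)
    results = fetch_results(job_id)

Encrypting before upload means the data centre only ever holds ciphertext, which is one reason such connectors handle encryption on the client side.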
While there are many options available to help scientists, no other tool can match HPC in its potential to accelerate life science research. It is simply a case of choosing the right technology and the right deployment to match the requirements of an organisation with the investment required.