The Technical University of Denmark (DTU) has selected an SGI Altix UV 1000 high-performance computing (HPC) system to help researchers mine huge amounts of genomic data from microbes and humans to identify novel genes and proteins that give living organisms their distinct properties. The new system is being configured with Intel Xeon E7 series processors and 8TB of shared memory. It will be physically linked to the existing supercomputer installation at DTU’s Centre for Biological Sequence Analysis (CBS), and will have access to the centre’s high-performance disk system, which is approaching a capacity of 1,000TB. The computer has been named Anakyklosis (Greek for recycling), reflecting its impact on a bio-sustainable future.
Metagenomics systems biology is one of the six cornerstones of the new 100 million Euro Novo Nordisk Foundation Centre for Biosustainability at DTU. The work will focus on a treasure hunt for novel enzymes relevant to the biotechnology industry and for biochemical pathways that can be used to engineer novel so-called ‘cell factories’. These cell factories will be optimised to produce chemicals from cheap, non-fossil carbon sources, reducing the world’s dependence on oil.
‘In the past, our researchers believed the data in the human genome was daunting, representing massive amounts of data. However, now with metagenomics, the science of investigating entire bacterial communities such as the ones we have in our stomachs and those that exist in the deep seas, it generates far greater amounts of data. Therefore, we need even larger computer resources to make sense of it,’ said senior scientist Nikolaj Blom at the Novo Nordisk Foundation Centre for Biosustainability at DTU.
‘Our current and future data sets are many times larger than the size of the human genome, and our legacy computer solution was limited in how quickly it could process this data simultaneously due to memory bottlenecks,’ added associate professor Thomas Sicheritz-Pontén, who will lead the metagenomics effort in the new Novo Nordisk Foundation centre. ‘The new SGI Altix UV supercomputer system will be able to hold the equivalent of approximately 2,500 human genomes in working memory at the same time. This allows us to process data sets that are impractical to work with today, in particular when we need to integrate them with many other data sets.’
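As a rough sanity check on the 2,500-genome figure, the arithmetic can be sketched as follows. The sketch assumes decimal terabytes and a simple even split of memory across genomes, neither of which is stated in the announcement, and the names used are illustrative only.

```python
# Back-of-the-envelope check of the "2,500 human genomes in 8TB" figure.
# Assumptions (not from the announcement): decimal units (1 TB = 10**12 bytes)
# and an even split of the shared memory across all genomes.

SHARED_MEMORY_BYTES = 8 * 10**12   # 8TB of shared memory, as stated
GENOMES_IN_MEMORY = 2_500          # approximate figure quoted


def bytes_per_genome(total_bytes: int, genome_count: int) -> float:
    """Return the memory budget available per genome, in bytes."""
    return total_bytes / genome_count


if __name__ == "__main__":
    budget = bytes_per_genome(SHARED_MEMORY_BYTES, GENOMES_IN_MEMORY)
    print(f"{budget / 10**9:.1f} GB per genome")
```

Under these assumptions the budget comes to about 3.2GB per genome. Stored at one byte per base, that is in line with the roughly three billion base pairs of a single copy of the human genome.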