Following the publication of a hefty report on the science and engineering research undertaken on SuperMUC, together with an international award for its performance – both in June this year – Robert Roe visited the facility to find out what makes SuperMUC ideal for academic research projects.
As we stood on the roof of the Leibniz-Rechenzentrum (LRZ) in the heat of a Bavarian summer’s day, Dr Palm Ludger proudly pointed to the heat exchangers there. Located in the heart of the Garching Research Centre, just north of Munich, the SuperMUC system uses simple ‘free-cooling’ equipment because, even here, the air temperature rarely exceeds 35 degrees Celsius. Ludger remarked that the cost of cooling SuperMUC is only 10 per cent of the overall cost of the energy spent powering the system, compared to nearly 50 per cent in some data centres.
A research chemist who now doubles as public relations representative for the Leibniz-Rechenzentrum, Dr Ludger takes an obvious pride in the system. But it is not just the technical capacity of the machine and its desirable energy efficiency characteristics that are important to him; it is that the machine – and the LRZ itself – are dedicated to making it easier for scientists and engineers to optimise and run their software and thus make the transition to high-performance computing without tears.
‘SuperMUC was designed as a general purpose machine,’ Ludger explained. ‘It does not contain any kind of accelerators: compute nodes consist of Intel x86 Sandy Bridge processors, connected by Infiniband interconnects.’
This simplifies program development and testing, as an application can be developed on any x86-based cluster or desktop and then ported to SuperMUC relatively easily.
The attractiveness of the LRZ approach to academic scientists is demonstrated by the wide variety of disciplines represented among the main areas of research studied using SuperMUC. They range from astrophysics and plasma physics, through earth and environmental sciences, life and material sciences, engineering and computational fluid dynamics, to high energy physics.
Due to the demand from certain areas of research, the LRZ has created dedicated application labs for astrophysics, and earth and life sciences. In these labs, application experts from the LRZ work closely with researchers to optimise and scale the code to be used on SuperMUC, leading to a greater sustained performance on the system. But the research has wider implications than either computer science or academic research, for when applied in the wider world, some of the results from SuperMUC may save lives in situations ranging from volcanic eruptions to the treatment of respiratory disease.
SeisSol, a program designed to simulate earthquakes inside the Merapi volcano on Java Island, Indonesia, could lead to more accurate predictions of volcanic eruptions and thus allow timely evacuation of local people. For their work on optimising the SeisSol code for high-performance computing on SuperMUC, developer groups at TUM and LMU were awarded the Prace ISC prize for sustained petascale performance at ISC’14 in June. Using all 147,456 compute cores of the SuperMUC system, the research team achieved a sustained system performance of 1.09 petaflops, running the application for more than 3 hours to simulate the Earth’s vibrations.
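To put the SeisSol figure in context, the short Python sketch below works out the sustained rate per core from the numbers quoted above. The comparison against peak rests on an assumption: it takes the system's peak as roughly 3 petaflops, inferred from the statement that the planned expansion will double SuperMUC to around 6 petaflops. The variable names are illustrative only, and the result is a rough indication rather than a measured efficiency.

```python
# Figures quoted in the article
sustained_flops = 1.09e15   # 1.09 petaflops sustained by SeisSol
cores_used = 147_456        # cores used in the run

# Assumption (not stated directly): peak of about 3 petaflops,
# inferred from the expansion "doubling" the system to around 6 petaflops
assumed_peak_flops = 3.0e15

# Sustained rate per core, in gigaflops
per_core_gflops = sustained_flops / cores_used / 1e9

# Share of the assumed peak that the application sustained
fraction_of_peak = sustained_flops / assumed_peak_flops

print(f"Sustained per core: {per_core_gflops:.2f} gigaflops")
print(f"Fraction of assumed peak: {fraction_of_peak:.0%}")
```

Under these assumptions the application sustains a little over 7 gigaflops per core, or about a third of the assumed peak.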
The virtual lung project aims to develop scientists’ understanding of the lungs by simulating the flow of air through the airways. The goal is better care for patients undergoing mechanical ventilation, which can overstrain lung tissue because of the unnatural loading. The multi-purpose finite element (FE) software BACI, written in C++, was used to create the simulations. The software package includes state-of-the-art solution techniques for nonlinear and linear equations as well as for coupling of several physical fields.
More remote from immediately human concerns is the ‘star formation in extreme conditions’ project. This explores starbursts – the rapid formation of stars under extreme events – and their relation to galaxy formation. As a result of the detail and the complexity of the model that the team was able to run on SuperMUC, the researchers found that the bursts of star formation triggered by galaxy interactions are stronger and last longer than had been previously thought. They now expect to go on to study yet more complex interactions, including the role of feedback from the supermassive black holes that form the nuclei of active galaxies and how they may quench star formation activity. This project took over 9 million core hours on 4,096 cores.
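The core-hour figure is easier to grasp as elapsed time. The Python sketch below converts it, assuming (purely for illustration) that all 4,096 cores were busy continuously for the whole allocation. In practice such a budget is normally spread over many separate jobs, so this is a lower bound on the calendar time rather than a description of how the project ran.

```python
# Figures quoted in the article ("over 9 million", so treat this as a minimum)
core_hours = 9_000_000
cores = 4_096

# Assumption: every core is in use for the entire run
wall_clock_hours = core_hours / cores
wall_clock_days = wall_clock_hours / 24

print(f"Wall-clock time: {wall_clock_hours:.0f} hours "
      f"(about {wall_clock_days:.0f} days)")
```

On that assumption the project amounts to roughly three months of uninterrupted computing on 4,096 cores.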
The SuperMUC system consists of 155,656 processor cores in 9,400 compute nodes with more than 300TB of RAM. The system has 4PB of NAS-based permanent disk storage. It also has 10PB of temporary disk storage which uses the General Parallel File System (GPFS), the high-performance clustered file system developed by IBM. This in turn is complemented by 30PB of archived tape storage.
The system is configured into 18 thin node islands and one fat node island. Each island contains at least 8,192 cores and is connected via an Infiniband network (FDR10 for the thin nodes, QDR for the fat nodes).
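The island layout also helps reconcile two core counts that appear in this article: 155,656 cores in the full system and 147,456 cores in the SeisSol run. The Python sketch below checks the arithmetic. It assumes, as the article does not state explicitly, that the SeisSol run used exactly the 18 thin node islands; the variable names are illustrative.

```python
# Figures quoted in the article
total_cores = 155_656     # whole SuperMUC system
seissol_cores = 147_456   # cores used in the SeisSol run
thin_islands = 18

# Assumption: the SeisSol run occupied the 18 thin node islands and nothing else
cores_per_thin_island, remainder = divmod(seissol_cores, thin_islands)

# Whatever is left over would then belong to the single fat node island
fat_island_cores = total_cores - seissol_cores

print(f"Cores per thin island: {cores_per_thin_island} (remainder {remainder})")
print(f"Cores left for the fat node island: {fat_island_cores}")
```

The division comes out exactly, at 8,192 cores per thin island, which leaves 8,200 cores for the fat node island. That is consistent with the island sizes described above, though it remains an inference from the published totals.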
There is also a planned expansion to the system due to be completed in 2015 that will double the power of the SuperMUC system to around 6 petaflops peak performance.
After the tour, Professor Arndt Bode, Director of the LRZ, explained in an interview that the planned upgrade will consist of: ‘70,000 cores that again will have this hierarchical structure. It will be attached to the same Infiniband interconnection system as the phase one system. It will also use the same storage. Now, this is 70,000 cores as compared to 150,000 in the old system and half the number of cores to achieve the same calculation performance.’ Ludger added that this will take up roughly half the space of the phase one SuperMUC system – a consequence of the increased server density and processor performance that comes with another generation of compute hardware.
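Bode's rounded figures imply a per-core improvement that can be worked out directly. The Python sketch below does so, assuming that ‘the same calculation performance’ means equal aggregate peak performance for the two phases; the numbers are the approximate ones quoted in the interview, not exact specifications.

```python
# Rounded figures from the interview
phase_one_cores = 150_000
phase_two_cores = 70_000

# Assumption: both phases deliver the same aggregate performance,
# so the per-core gain is simply the ratio of the core counts
per_core_gain = phase_one_cores / phase_two_cores

print(f"Implied per-core performance gain: about {per_core_gain:.1f}x")
```

The ratio is slightly above two, which matches the description of needing about half the number of cores.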
The fact that the HPC centre is run efficiently and designed to withstand basic hardware failures is a factor in the recent success in academic research at SuperMUC – highlighted by the recent award from ISC. However, the primary reason for this success is the parallelisation of code to run specifically on the SuperMUC system. By partnering with experts from Intel and IBM, the LRZ is able to collaborate to solve the potential bottlenecks relating to a particular application, such as SeisSol, and achieve a greater level of performance than would typically be possible from such a large cluster.
Bode said: ‘The sustained Petaflops performance obtained in this earthquake simulation could only be achieved through the optimal cooperation of specialists from geophysics, computer science, and from the supercomputing centre as put into practice at LRZ’s Extreme Scaling workshops and through LRZ’s partnership initiatives.’
Ludger emphasised the point: ‘This is a real, everyday research application that has achieved 1 petaflop and I think that we can be very proud of that.’ Bode remarked: ‘This example shows that our system is running in a very stable mode because the results that we have obtained – during night time where nobody observes the system or comes in to repair something because it really runs for hours without the interference of any maintenance.’
Ludger concluded: ‘That’s why we call it a dark centre: because you can switch off the lights and run everything as normal.’ The ability of SuperMUC to run in a stable mode and as a ‘dark centre’ is due to the reliability of the hardware as well as the fact that the code is optimised so that the job does not crash. Hardware failure is a common occurrence in systems that have hundreds of thousands of cores and is seen as a serious obstacle on the road to exascale computing.
The work of the LRZ highlights how attention-grabbing peak performance figures are not always a reliable metric for judging the performance of an HPC system that is used primarily for academic research. Ease of use, the ability to port code from smaller or different systems, and sustained performance when running a specific application are much more useful to scientists than Linpack benchmarks.