The Jaguar supercomputer, a Cray XK6 located at the Oak Ridge Leadership Computing Facility (OLCF) in the US, was already among the world’s fastest systems before developers began a project to add graphics processing units (GPUs) to increase its speed. Investigators who use OLCF’s supercomputer, which was renamed Titan upon completion of the project, span the spectrum of scientific inquiry from alternative energy, astrophysics and climate to new materials, nuclear physics and combustion.
A critical decision in the project was the choice of debugger. Across thousands of cores, the traditional method of inserting printf statements to locate problem code becomes intractable. Allinea DDT was therefore selected to let developers quickly pinpoint failures: it offers a single view of every process in a parallel job, along with the exact line of code each process is executing.
‘Allinea DDT is tightly integrated into the Cray programming environment. We worked with Allinea Software to ensure that,’ commented Joshua S. Ladd, Tools Project technical officer during the OLCF3 Project. ‘All you really need to do is load the Allinea DDT module and type “ddt” on the command line to fire up the GUI, and you’re ready to go. And the GUI is just point and click with a mouse.’ He added that he was able to launch the debugger on more than 130,000 cores in under 30 seconds.
The debugger was also used on an open source implementation of the Message Passing Interface (MPI) middleware. The work was at a very large scale: half a million lines of code running on 100,000 to 225,000 cores. Debugging is also difficult when code contains errors but still runs. To address this problem, Allinea Software is collaborating with the team behind VisIt, open source software used to visualise large scientific data sets. Visual inspection lets researchers look at a picture of the data, click on individual cells, and inspect the process that generated them.
‘So let’s say the output is a video of a star exploding,’ said Ladd. ‘As that star explodes, if there are all kinds of weird asymmetries, you probably have some bug in your math. With a visualised debugging tool, if it doesn’t look like you expected, you go through the process to determine if you’ve got a bug in your code, or if you’ve discovered something new.’