Rogue Wave Software has announced that its TotalView debugger has debugged a parallel job running on 786,432 processor cores. The tests were conducted on Sequoia, the IBM Blue Gene/Q supercomputer at Lawrence Livermore National Laboratory (LLNL).
Rogue Wave’s scalability initiative, a partnership with LLNL and its Tri-Lab partners (Los Alamos National Laboratory and Sandia National Laboratories), takes a multi-architecture approach: it targets the Blue Gene/Q platform as well as x86-based architectures such as the Cray XE. Extreme-scale testing lets TotalView engineers identify bottlenecks and prioritise their work on optimising and tuning the debugging engine for scalability. In the most recent testing session, TotalView scaled across 786,432 cores with no sign of reaching a scalability limit, suggesting that it could have used more of Sequoia's 1.5 million cores had additional compute nodes been available.
Rogue Wave conducted this test using a hybrid MPI + OpenMP code that solves a system of linear equations. The application uses MPI for distributed-memory, multi-process parallelism and OpenMP for shared-memory, thread-based parallelism. It was selected because it shares important characteristics with many applications run on extreme-scale systems such as Sequoia. Testing against workloads that are representative of large-scale systems is another key part of meeting scalability requirements.