As a major gateway for scientific discovery, the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory partners with the world’s best computational scientists to advance research in a diverse span of scientific domains, ranging from chemistry, applied mathematics and materials science to engineering, physics and life sciences. Our vision is to serve as the forefront computational centre for extending science frontiers by solving problems that require the most innovative approaches and the largest-scale systems.
The ALCF is a US Department of Energy (DOE) national leadership-class computing facility sponsored by DOE’s Office of Science. Our facility houses the powerful IBM Blue Gene/P supercomputer named Intrepid, which debuted in June 2008 as the world’s fastest computer for open science and third fastest overall. Intrepid offers a peak speed of 557 Teraflops (TF) and a Linpack speed of 450 TF. It has 40,960 nodes, each with four processors or cores for a total of 163,840 cores and 80 terabytes of memory.
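The figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not an official ALCF calculation: the function names are hypothetical, and the conversion of 80 terabytes into gigabytes assumes binary units (1 TB = 1,024 GB), which the text does not state.

```python
# Figures quoted in the text for Intrepid.
NODES = 40_960
CORES_PER_NODE = 4
PEAK_TF = 557.0      # peak speed, teraflops
LINPACK_TF = 450.0   # Linpack speed, teraflops
MEMORY_TB = 80       # total memory, terabytes


def total_cores(nodes=NODES, cores_per_node=CORES_PER_NODE):
    """Total core count: nodes multiplied by cores per node."""
    return nodes * cores_per_node


def linpack_efficiency(linpack_tf=LINPACK_TF, peak_tf=PEAK_TF):
    """Fraction of peak speed achieved on the Linpack benchmark."""
    return linpack_tf / peak_tf


def memory_per_node_gb(memory_tb=MEMORY_TB, nodes=NODES, gb_per_tb=1024):
    """Memory per node in GB (assumption: 1 TB = 1,024 GB)."""
    return memory_tb * gb_per_tb / nodes


if __name__ == "__main__":
    print(f"Total cores:        {total_cores():,}")
    print(f"Linpack efficiency: {linpack_efficiency():.1%}")
    print(f"Memory per node:    {memory_per_node_gb():.1f} GB")
```

Under these assumptions the core count matches the 163,840 stated in the text, and the Linpack result corresponds to roughly four-fifths of peak speed.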
Delivering more science per watt
With Intrepid, we’ve designed a green system and supporting data centre that can be managed efficiently from the ground up. Blue Gene/P features a low-power, system-on-a-chip (SOC) architecture and a communications fabric that enable science applications to scale efficiently to the highest performance. By increasing the system’s parallelism and using more power-efficient voltages and clock speeds, Intrepid permits scientists to explore the universe with computation using but a trickle of electricity compared to alternative architectures. The Blue Gene/P uses about one-third as much electricity as a machine of comparable size built with more conventional parts. Among general-purpose, homogeneous-architecture supercomputers, the Blue Gene/P is the most power efficient. And by using the Chicago area’s cold winters to chill our water for free, we save millions of dollars a year in electrical power compared to other similarly sized supercomputer centres.
The supercomputer’s data systems consist of 640 I/O nodes that connect to 16 storage area networks (SANs), which control 7,680 disk drives with a total capacity of 7.6 petabytes of raw storage and a maximum aggregate transfer speed of 88 gigabytes per second. We use two parallel file systems – PVFS and GPFS – to manage the storage. An HPSS automated tape storage system provides archival storage.
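To put the storage figures in proportion, the sketch below derives a few per-component averages from the numbers in the text. It is illustrative only: it assumes decimal units (1 PB = 1,000 TB) and an even spread of drives, I/O nodes and bandwidth across the system, none of which the text confirms. The compute-node count of 40,960 is the figure quoted for Intrepid, and the function name is hypothetical.

```python
# Figures quoted in the text for Intrepid's data systems.
IO_NODES = 640
SANS = 16
DISK_DRIVES = 7_680
RAW_CAPACITY_PB = 7.6
AGGREGATE_GB_PER_S = 88.0
COMPUTE_NODES = 40_960  # quoted for Intrepid itself


def storage_averages():
    """Return simple averages, assuming an even distribution (an assumption)."""
    return {
        # Assumption: decimal units, 1 PB = 1,000 TB.
        "tb_per_drive": RAW_CAPACITY_PB * 1000 / DISK_DRIVES,
        "drives_per_san": DISK_DRIVES // SANS,
        "io_nodes_per_san": IO_NODES // SANS,
        "compute_nodes_per_io_node": COMPUTE_NODES // IO_NODES,
        "gb_per_s_per_io_node": AGGREGATE_GB_PER_S / IO_NODES,
    }


if __name__ == "__main__":
    for name, value in storage_averages().items():
        print(f"{name}: {value:.4g}")
```

On these assumptions each drive holds about one terabyte, and each I/O node serves 64 compute nodes.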
Data analytics and visualisation are handled through one of the world’s largest installations of Nvidia Quadro Plex S4 external graphics processing units (GPUs). Nicknamed Eureka, our visualisation supercomputer is also very power efficient and allows researchers to explore and visualise the torrents of data they produce with Intrepid at the ALCF. The installation provides more than 111 TF and more than 3.2 terabytes of RAM (about four per cent of Intrepid’s RAM).
These three systems – the computational engine, the data storage, and the analytics system – are all key components in the scientist’s workflow. We are working to improve the software environment so that scientists can be more productive and energy-efficient, and thereby achieve more ‘science per watt’. We believe that the ALCF is one of the most productive and power-efficient computational science centres in the world.
2009 research projects
DOE selects major ALCF projects through the Innovative and Novel Computational Impact on Theory and Experiment (Incite) programme. This programme seeks computationally-intensive, large-scale research projects from industry, academia, and government research facilities that can make high-impact scientific advances through large allocations of computer time, resources and data storage.
In 2009, based on their potential for breakthroughs in science and engineering research, 28 projects were awarded 400 million hours of computing time at the ALCF. The awards are part of an overall group of 66 scientific projects competitively selected through the DOE Incite programme.
Some of this year’s intriguing research endeavours at the ALCF include investigating the circulation of water in the sea for storing CO2, and using computer simulations to conduct cerebral blood flow experiments that study the role of blood flow in understanding, diagnosing, and treating cardiovascular disease.
Other projects cover broad research in:
- Energy, including advanced systems for fusion energy and nuclear power, and improving combustion to increase efficiency;
- Biology, such as studying the causes of Parkinson’s disease and simulating electrical activity in the heart;
- Climate change, including improving climate models, studying the effects of turbulence in oceans, and simulating clouds; and
- Astrophysics, such as modelling supernova explosions and simulating black holes.
User support
The ALCF offers scientists more than open access to the Blue Gene/P’s capabilities. Our staff provide ongoing, in-depth expertise and assistance in using the system and optimising user applications. Both the ALCF’s Catalyst and Applications Performance Engineering and Data Analytics (APEDA) teams support the users’ projects.
Our Catalyst team establishes strategic collaborations with the ALCF’s leading project partners to maximise benefits from the use of ALCF resources. The team provides full project lifecycle assistance, value-added services and support in conjunction with ALCF hardware and software resources, tailored services for unique requirements of a given research initiative and close contact with research teams through ongoing interactions with an assigned ALCF project coordinator.
Our APEDA team helps ALCF users achieve the best performance in their applications. Team members work closely with users in porting, tuning and parallelising their applications on the Blue Gene/P. They also address I/O and data analytics issues that inhibit performance.
Looking ahead
I believe that a number of key issues and trends in high performance computing will impact the delivery of breakthrough science and engineering in the future.
For instance, modelling and simulation are playing a greater role across all scientific disciplines – from understanding the molecular processes in cells to designing next-generation batteries for hybrid vehicles. For many disciplines, computation is not only the fastest and most cost-effective tool for discovery, it is the only one. For example, understanding the evolution of our galaxy, the future climate of the planet or the spread of influenza through populations is difficult without supercomputing.
We’re also working with the international community to improve the software supporting science on the next generation of extreme-scale platforms. The International Exascale Software Project has launched a series of workshops to explore how the community can better integrate and develop the open source software components that run the world’s fastest computers. But the future of software is only half of the story. In partnership with Lawrence Livermore National Laboratory and IBM, we’re designing the next-generation supercomputer platform, which will be many times more powerful, yet more power-efficient, than the current generation of systems.
Finally, while we have had great success with building a world-class computational facility, the ALCF can’t deliver scientific discoveries without top computational scientists. Bringing these scientists on board is crucial to addressing scientific problems quickly and comprehensively. For example, by improving code, they boosted performance by 15 per cent in one research project. We therefore rely on the nation’s continuing investment in science and technology educational programmes, which are instrumental to our success in recruiting and retaining the most highly qualified computational scientists.
As we lay the foundation for the ALCF’s progression to exascale computing, we will relocate to a new Theory and Computing Sciences building on Argonne’s site. Scheduled to be completed later this year, the 200,000 sq ft building has been designed to accommodate our next-generation supercomputers. Through this state-of-the-art facility and key partnerships, we’ll advance computer science and key application fields both now and in the future.
The ALCF is a leadership-class computing facility that enables the research and development community to make innovative and high-impact science and engineering breakthroughs. Argonne operates the ALCF for the DOE Office of Science, as part of the larger DOE Leadership Computing Facility strategy that is organised by the DOE Office of Advanced Scientific Computing Research. DOE leads the world in providing the most capable civilian supercomputers for science.