The US Department of Energy (DoE) yesterday announced that Intel and Cray are to build the third machine in its ‘Coral’ programme to lay the foundations for Exascale computing. In fact, the contract, worth $200 million, is to build two machines at the US Argonne National Laboratory: a 180 petaflops system, to be called Aurora, and a secondary system of about 8 petaflops, called Theta.
The joint Collaboration of Oak Ridge, Argonne, and Lawrence Livermore (Coral) was established in early 2014 to coordinate investments in supercomputing technologies, streamline procurement and reduce costs, with the aim of developing supercomputers that will be five to seven times more powerful when fully deployed than today’s fastest systems. However, in yesterday’s announcement, DoE Undersecretary Franklin Orr said he expected that Aurora would be 18 times as powerful as Argonne’s current Mira system, while using only 2.7 times as much energy. It is expected to be operational in 2018.
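The figures quoted by Orr can be turned into a rough efficiency comparison. The short Python sketch below is only illustrative arithmetic on the ratios reported above (18 times the performance for 2.7 times the energy, and a 180 petaflops target); the function and variable names are hypothetical, and it assumes both ratios are measured against Mira on the same basis.

```python
def efficiency_gain(performance_ratio: float, energy_ratio: float) -> float:
    """Return the relative gain in performance per unit of energy."""
    return performance_ratio / energy_ratio


# Ratios as reported in the announcement (Aurora relative to Mira).
aurora_vs_mira = efficiency_gain(performance_ratio=18.0, energy_ratio=2.7)

# If Aurora's 180 petaflops is 18 times Mira, this is Mira's implied performance.
implied_mira_petaflops = 180.0 / 18.0

print(f"Performance per unit energy: about {aurora_vs_mira:.1f}x Mira")
print(f"Implied Mira performance: about {implied_mira_petaflops:.0f} petaflops")
```

On these assumptions, Aurora would deliver somewhere between six and seven times as much computation per unit of energy as Mira.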
The award to Intel and its subcontractor Cray comes five months after the announcement of the first two machines in the programme, which are to be built at a cost of $325 million at the Department of Energy’s Oak Ridge and Lawrence Livermore National Laboratories. Both these systems will make use of IBM Power Architecture, Nvidia’s Volta GPU and Mellanox’s interconnect technologies.
Technical details are much sketchier in the Argonne announcement. It is not clear what Intel processor will be employed and the Shasta architecture that Cray will deploy is embryonic. According to a blog post by Cray’s CTO Steve Scott: ‘We don’t normally talk about a new system architecture this far in advance.’ Shasta is intended to be the successor to both the Cray XC line of supercomputers (previously code-named ‘Cascade’) and the CS line of more commodity-based clusters.
Scott was keen to focus on the wider commercial applications of Shasta, describing it as ‘the most flexible system we’ve ever designed. Yes, this will be the infrastructure that takes us all the way to exascale computing, but it’s really designed to provide cost-effective solutions in standard data-centers, starting at quite modest system sizes.’
His comments echo those made five months ago, when both IBM and Nvidia stressed that the full significance of that procurement decision lay not in the niche application of supercomputing but in opening up the world of enterprise computing to new technologies that will help master the swelling volume of data that commercial companies have to cope with – both in engineering and in business intelligence. Both Sumit Gupta, general manager of accelerated computing at Nvidia, and David Turek, vice president of technical computing OpenPower at IBM, stressed the importance of the design chosen for Oak Ridge and Livermore not just for scaling up to ever faster and more powerful machines, but also for ‘scaling down’, so to speak. Turek maintained that he had always been slightly sceptical of the line of argument that Exascale would inevitably deliver cheap petascale computing. The Coral project was designed to be a one-node construct and economies of scale ‘in both directions’ were built in from the outset, he said. ‘We didn’t want to have to say to customers: “You have to buy a rack of this stuff”.’
Argonne’s second system, to be called ‘Theta’, is a more conventional Cray XC series supercomputer to be delivered in 2016. Cray also has options to provide next-generation, high-performance parallel storage systems for both the Theta and Aurora systems.
Cray is acting as subcontractor to Intel Federal, a wholly-owned subsidiary of Intel, which will be responsible for the delivery of the two systems. Intel Federal was set up in August 2011 ‘to provide strategic focus in order to better address new opportunities in working with the US government. Initially Intel Federal will focus on the High Performance Computing segment, including work on Exascale computing with the US Department of Energy and other agencies.’
In some ways, yesterday’s announcement is history repeating itself. Nearly 20 years ago, in late 1996, Intel built and installed ASCI Red at the US Sandia National Laboratories. It was the first computer built under the then US Accelerated Strategic Computing Initiative (ASCI), created to help maintain the United States nuclear arsenal after the 1992 moratorium on nuclear testing. Based on the Intel Paragon computer, it achieved a reputation for reliability and was the first supercomputer to achieve a performance of more than one teraflops.
That Intel would be in pole position for the Argonne machine was widely expected. Once IBM, Nvidia and Mellanox had come together to form the consortium based on the Power processor and then their technology was chosen for both Oak Ridge and Lawrence Livermore, there really was no other candidate. Intel was the only company capable of making the processors and overseeing the creation of the machine itself.
Procurement contracts are a long-standing and highly successful method by which the US Government provides ‘hidden’ subsidies for its high-tech industry, as discussed in Scientific Computing World in July 2014. It is a reflection of just how challenging it will be to get to Exascale that even a nation as advanced as the USA can field only two consortia capable of developing the technology to get there.