As the HPC industry prepares for the next milestone in supercomputing, much work remains to ready hardware and software for the challenge of reaching exascale.
Beyond the milestone itself, the push for exascale could deliver far-reaching benefits: new computing architectures, more robust programming models, and software that can scale beyond today’s largest supercomputers.
Both the European and US exascale efforts rely on co-design to reach this milestone. While this makes development more complex, it also means that software and hardware designers work closely together to ensure that future designs work in harmony.
John Goodacre, professor of computer architectures at the University of Manchester, commented on the co-design process and what the EuroEXA project hopes to achieve using this approach. ‘The biggest challenge with co-design in any project is working with the differences in development schedules between hardware and software,’ said Goodacre.
The EuroEXA project represents the culmination of several EU-funded research projects (ExaNoDe, ExaNeSt, EcoScale and EuroServer) that have laid the groundwork for the design of an exascale computing architecture based on Arm processors and FPGA acceleration. The project also incorporates advanced memory technologies, immersion cooling, and debugging tools from Allinea.
The project is using this co-design approach to build a balanced architecture for both compute- and data-intensive applications. It relies on cost-efficient modular integration, enabled by novel inter-die links, around a processing unit that integrates FPGAs for data-flow acceleration.
The team also aims to deliver a homogenised software platform providing heterogeneous acceleration with scalable shared-memory access, together with a unique hybrid, geographically-addressed switching and topology interconnect within the rack.
As the architecture for the EuroEXA project has been fixed around a CPU/FPGA combination, Goodacre commented that much of the co-design work focuses on balancing resources and ensuring software is ready to make use of this new architecture.
‘Previous collaborations to EuroEXA have looked from a hardware perspective at applications at the kernel/miniapp level, and at system level – so the EuroEXA co-design is about sizing and balancing the resources, rather than changing the overall architecture, which evolved holistically along with the apps in the previous years,’ said Goodacre.
‘The way we therefore coordinate this in EuroEXA is to have two pillars to the project, with the application pillar providing early visibility of the required resource balance, and the technology pillar negotiating what is possible within the constraints of technology, budget and schedule.’
Ultimately, exascale computing projects are focused on delivering and supporting an exascale-class supercomputer, but their benefits have the potential to drive future developments far beyond the small number of exascale systems that will actually be built. Projects such as EuroEXA and the Exascale Computing Project in the US could have far-reaching benefits for smaller-scale HPC systems.
As Goodacre notes, while the goal of the project is to provide the design for a future exascale system, the team behind it hopes to deliver a much more significant legacy for future HPC users.
‘It is hoped that the legacy is not just a milestone on the way to exascale,’ stressed Goodacre. ‘The tasks being undertaken include significant work to increase compute and thermal density – required for exascale, but this will be very useful to many markets that have power or environmental constraints.’
One particular area the EuroEXA project hopes to address is the memory bottleneck facing many HPC users. The hope is that, by shifting away from the von Neumann architecture, the technologies developed for the project could help the industry break free of this model.
While the von Neumann architecture has been enormously important to the development of computing, it carries an inherent bottleneck: program instructions and data share the same bus, which limits the rate at which the CPU can move data to and from memory.
Goodacre continued: ‘Likewise, the work we’re doing in shifting the mindset from a von Neumann model – “what’s the next operation, read the data, store the data” and communicate through the IO unit – is also undergoing significant evolution. Although we’re using FPGA, the goals here are to move forward on a dataflow paradigm for compute, in which we can address the memory bottleneck issues while significantly increasing compute efficiency by allowing data to flow between operands without needing to be written to, and then read back from, memory.’
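To make the contrast concrete, here is a minimal sketch in plain C (an illustration only, not EuroEXA’s toolchain or APIs; the function and variable names are hypothetical). The first routine stages a two-step computation the conventional way, with the intermediate result making a round trip through memory; the second fuses the steps in dataflow style, so the intermediate value never leaves the datapath. On an FPGA, the fused stages would become concurrently running pipeline hardware connected by on-chip FIFOs rather than a single loop.

#include <stdio.h>
#include <stddef.h>

/* Conventional staging: each stage reads its input from memory and writes
   its result back, so the intermediate array 'tmp' makes a full round trip
   through the memory system between the two loops. */
static void scale_then_offset_staged(const float *in, float *tmp, float *out,
                                     float a, float b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tmp[i] = a * in[i];      /* intermediate written out to memory */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[i] + b;     /* ...and read back in again */
}

/* Dataflow-style fusion: the two operations are chained, so the intermediate
   value stays in a register and never touches external memory. */
static void scale_then_offset_fused(const float *in, float *out,
                                    float a, float b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float t = a * in[i];     /* intermediate stays on the datapath */
        out[i] = t + b;
    }
}

int main(void)
{
    float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float tmp[4], staged[4], fused[4];

    scale_then_offset_staged(in, tmp, staged, 2.0f, 1.0f, 4);
    scale_then_offset_fused(in, fused, 2.0f, 1.0f, 4);

    for (size_t i = 0; i < 4; i++)
        printf("%.1f %.1f\n", staged[i], fused[i]);
    return 0;
}

The fused version performs the same arithmetic with roughly half the memory traffic; a dataflow architecture generalises the idea, streaming data through long chains of operations without touching external memory in between.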
Another benefit that could come from the push for exascale computing is the democratisation of HPC technology, as advances made for the largest machines filter down to smaller HPC systems. Just as the power consumption and efficiency savings Goodacre mentioned previously can be applied to smaller systems, so can developments in memory architecture, storage and cooling efficiency.
Not every technology developed for exascale will trickle down into the wider HPC market, but those that do could make HPC more efficient, and potentially cheaper, for the entire industry.
‘Cost is a major barrier to petascale today, and something that’s seldom even discussed for exascale,’ commented Goodacre. ‘EuroEXA also has activities in supporting what is being called “silicon-modules”. As you are starting to see in the market, this provides a way to build a big processor by bringing a number of smaller components together. Not only does this help with reducing cost by increasing silicon yield, it also simplifies development and ultimately enables “IP” reuse at the silicon level, enabling vendors to share the cost of mask sets over a larger market.
‘HPC is a small market, and getting the compute density they need, we believe, would be cost-prohibitive without being able to share aspects of the design with other markets.’
Goodacre also mentioned the memory system being developed as part of the project: ‘EuroEXA also goes a step further in this regard, with a new memory system architecture designed to enable multiple silicon modules to interact generically, and not only in the manner defined by a single device.’
He also noted several areas where petascale HPC can benefit from the work done to reach exascale: maturing the tools and methodologies for dataflow programming, technology to enable silicon-module reuse, compute-density advances (silent cooling), a scalable system architecture model that goes beyond clustering, and application frameworks and OS libraries that leverage the new platform. With all these potential benefits, it is clear that the pursuit of exascale can produce far more than just an exascale supercomputer. However, the road to exascale is long and filled with uncertainty around the development and deployment of new technologies that may not be a commercial success.
‘EuroEXA is just a step on the path towards exascale. The fragmentation and uncertainty of “winning” future funding calls is the biggest challenge to delivering exascale. Taking turns in which projects get funded just extends the overall time to deliver anything,’ stressed Goodacre.
‘Exascale needs ground-breaking innovations to be fundamental to the end results, and this is in conflict with the ongoing business needs of existing suppliers, in terms of risk management. So, whether there will be enough open funding to complete the innovations to a point where the commercialisation risks are small enough to enable adoption is one of my outstanding questions.’