Managing the information derived from data-intensive science and engineering simulations run on supercomputers can be difficult, simply because of the volume of data produced. Sandia National Laboratories is using the Data Analytics Supercomputer (DAS), a high-performance computing system from LexisNexis, to handle the large amounts of data generated by the complex simulations run at the facility.
'Traditional supercomputing technology allows us to run complex physics applications and visualise detailed simulations,' said Dr Richard Murphy, senior member of technical staff at Sandia National Laboratories. 'However, these systems are not ideal for the informatics challenge of sorting through petabytes of data to find correlations and generate hypotheses. Our tests show that the DAS is a strong platform for helping us address these challenges.'
Speaking to Scientific Computing World, Murphy explained that greater numbers of simulations are being run in increasingly sophisticated ways, and that scientists want to be able to ask additional questions of the data.
With climate modelling, for example, a researcher will run an ensemble of simulations with slightly different starting points and then try to determine how to get the best value from the results. An initial simulation of a hurricane model estimating the economic impact along a particular path, for instance, will generate a number of follow-up questions. Murphy points out that the questions that follow on from a simulation are often of more interest than the answer from the simulation itself.
A further example is running simulations to determine stress points in an engineering application. An ensemble of simulations would be run to locate all of the stress points, but analysts then want to interrogate the data to find out why those stress points occurred where and when they did.
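To make this ensemble-then-interrogate workflow concrete, the following is a minimal Python sketch under stated assumptions: the toy_stress_model function, its parameters and the stress threshold are hypothetical stand-ins for the far larger physics codes Sandia actually runs, but the overall shape is the same, namely perturbing the starting conditions across many runs and then querying the collected output after the fact.

```python
import random

def toy_stress_model(load, seed, steps=100):
    """Toy stand-in for a stress simulation: returns (time_step, position,
    stress) records for one run. Purely illustrative, not a Sandia code."""
    rng = random.Random(seed)
    records = []
    for t in range(steps):
        position = rng.uniform(0.0, 1.0)                  # where on the part
        stress = load * (1.0 + 0.1 * rng.gauss(0, 1)) * (1.0 + t / steps)
        records.append((t, position, stress))
    return records

# Run an ensemble: the same model with slightly different starting points.
ensemble = {}
for run_id in range(20):
    load = 100.0 * (1.0 + 0.01 * run_id)                  # perturbed starting condition
    ensemble[run_id] = toy_stress_model(load, seed=run_id)

# Interrogate the collected output after the fact: which runs exceeded a
# stress threshold, and where and when did that happen?
THRESHOLD = 215.0
for run_id, records in ensemble.items():
    exceedances = [(t, pos, s) for t, pos, s in records if s > THRESHOLD]
    if exceedances:
        t, pos, s = max(exceedances, key=lambda rec: rec[2])
        print(f"run {run_id}: peak stress {s:.1f} at position {pos:.2f}, step {t}")
```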
To allow researchers to do this, advances must be made in numerous areas. 'You're using a supercomputer, which is the biggest computer on the planet, to run the simulation itself,' said Murphy. 'One of the interesting properties of this kind of analysis is that the data doesn't fit into the memory of the machine – in general it's much larger [than the memory can handle].'
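One common response to data that outstrips memory is to stream it: process the output in fixed-size chunks and keep only running aggregates resident. The Python sketch below shows the idea; the file name and column layout are hypothetical, and it says nothing about how the DAS itself is implemented.

```python
import csv

def streaming_max_stress(path, chunk_size=100_000):
    """Scan a potentially huge CSV of simulation output one chunk at a time,
    tracking the record with the highest stress without ever holding the
    whole file in memory."""
    best = None
    with open(path, newline="") as f:
        chunk = []
        for row in csv.DictReader(f):
            chunk.append(row)
            if len(chunk) >= chunk_size:
                best = _reduce(chunk, best)
                chunk.clear()          # discard the chunk, keep only the aggregate
        if chunk:
            best = _reduce(chunk, best)
    return best

def _reduce(chunk, current_best):
    """Fold one chunk into the running aggregate."""
    chunk_best = max(chunk, key=lambda r: float(r["stress"]))
    if current_best is None or float(chunk_best["stress"]) > float(current_best["stress"]):
        return chunk_best
    return current_best

# Hypothetical usage: print(streaming_max_stress("ensemble_output.csv"))
```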
The DAS looks for specific patterns and non-obvious relationships to better identify possible outcomes from the Sandia simulations. By managing the massive data sets generated in these operations, the DAS enables traditional systems to extract relevant data more quickly and to process scientific calculations in their core memory at high speed.
One of the areas where Sandia will use the LexisNexis DAS technology is in cutting large data sets down to the aspects of interest for further analysis. 'One of the key things we're looking at is how would we construct a machine for this class of problems,' said Murphy. He continued that a computer built for these types of data-intensive applications would be very different from the computer used to simulate a traditional physics problem.
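The 'cut the data set down' step can be pictured as a filter-and-project pass over the raw output: keep only the rows and columns relevant to the question at hand and write a much smaller file for follow-up analysis. The Python sketch below is illustrative only; the file names, column names and threshold are assumptions rather than anything Sandia or LexisNexis has described.

```python
import csv

KEEP_COLUMNS = ["run_id", "time_step", "position", "stress"]   # hypothetical column names
STRESS_THRESHOLD = 200.0                                       # illustrative cut-off

with open("ensemble_output.csv", newline="") as src, \
     open("stress_hotspots.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=KEEP_COLUMNS)
    writer.writeheader()
    for row in reader:
        # Keep only records near a stress hotspot, and only the columns
        # needed for the follow-up analysis; everything else is dropped.
        if float(row["stress"]) > STRESS_THRESHOLD:
            writer.writerow({k: row[k] for k in KEEP_COLUMNS})
```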
'LexisNexis has a piece of technology that they developed in a commercial setting that does look promising for this particular class of problems,' Murphy commented. 'You really don't want to end up in the situation where you're building everything yourself from scratch.' This is largely a matter of cost: it is beneficial for supercomputing facilities to use whatever hardware the wider computing community has already developed.
Sandia National Laboratories is currently using the DAS to determine its potential for integration into a system of large-scale informatics computing platforms. In addition, Sandia has partnered with LexisNexis on a proposal for an exascale machine, a system capable of a million trillion (10^18) calculations per second. The project is part of the Ubiquitous High Performance Computing (UHPC) programme run by the Defense Advanced Research Projects Agency (DARPA) in the US. According to Murphy, technology similar to LexisNexis' DAS will be a key part of future exascale computers: 'LexisNexis technology has the capability to do automatic management of resources, which is critical when talking about systems that may have millions of compute nodes.'