Researchers at Queen Mary, University of London, and STFC Daresbury Laboratory (DL) are drawing on NAG's expertise in HPC software engineering in their use of HECToR, the UK's national academic supercomputer, to investigate radiation damage in nuclear reactor vessels and radioactive waste containers.
With NAG's help, they have been able to build models of larger systems, thereby enabling the exploration of more realistic problems that were hitherto beyond their reach.
Queen Mary researchers Kostya Trachenko and Eva Zarkadoula, together with Ilian Todorov at DL, are running the DL_POLY_4 molecular dynamics code on HECToR to model the disruption of atomic structure in metals caused by high-energy atoms. Disruption at the atomic scale can cause the crystal structure of the solid to become amorphous (that is, lacking long-range order), which in turn damages the material and weakens large-scale components fabricated from it, such as nuclear vessels.
Very large systems are required if the models of atomic disruption are to be as realistic as possible; specifically, the researchers wanted to perform simulation runs using hundreds of millions of atoms. In general, the performance of the DL_POLY_4 code scales well as the number of compute processors is increased (HECToR is a 12,000-core Cray XT4), but periodically writing out the atomic configurations for subsequent analysis had become a bottleneck, degrading the code's overall performance and limiting the system sizes that could be investigated.
The researchers turned for help to the HPC experts at NAG, who provide the Computational Science and Engineering support service for HECToR, and to the DL_POLY_4 authors at DL. Ian Bush at NAG and Ilian Todorov at DL have been collaborating for four years on input and output (I/O) strategies, and together they devised a novel method for writing the configuration files. The method first sorts the atoms into the appropriate order and then uses a subset of the compute processors to perform the output in parallel, balancing the communication and computation workloads of the configuration output routines, which are tuned to take advantage of the structure of HECToR's file system. A similar strategy was implemented for reading the input files in parallel. Incorporating these new methods into DL_POLY_4 removed the I/O bottlenecks, restored the code's good scaling and improved its performance twenty-fold.
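The general idea of "sort, then write through a subset of processors" can be illustrated with a minimal, single-process Python sketch. It is not DL_POLY_4's implementation: ranks are simulated by Python lists rather than MPI processes, the shared file is stood in for by a `bytearray`, and the fixed-size binary record layout `RECORD`, the every-`stride`-th-rank writer policy and all function names are hypothetical. Each writer owns a contiguous block of global atom indices, receives those atoms from whichever ranks hold them, sorts them, and then issues a single contiguous write at a known offset.

```python
import random
import struct
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical fixed-size binary record: global index, x, y, z.
RECORD = struct.Struct("<qddd")


@dataclass
class Atom:
    index: int  # global atom index (0-based)
    x: float
    y: float
    z: float


def pick_writers(n_ranks: int, stride: int) -> List[int]:
    """Hypothetical policy: every `stride`-th rank acts as an I/O writer."""
    return list(range(0, n_ranks, stride))


def gather_to_writers(per_rank: List[List[Atom]], n_atoms: int,
                      writers: List[int]) -> Tuple[Dict[int, List[Atom]], int]:
    """Communication phase: route each atom to the writer owning its index block."""
    block = -(-n_atoms // len(writers))  # ceiling division: indices per writer
    buckets: Dict[int, List[Atom]] = {w: [] for w in writers}
    for atoms in per_rank:  # with MPI this would be an all-to-all exchange
        for a in atoms:
            buckets[writers[a.index // block]].append(a)
    for w in writers:  # sort each writer's block into global-index order
        buckets[w].sort(key=lambda a: a.index)
    return buckets, block


def write_blocks(buckets: Dict[int, List[Atom]], n_atoms: int) -> bytes:
    """Output phase: each writer fills one contiguous byte range of the 'file'."""
    out = bytearray(n_atoms * RECORD.size)  # stands in for the shared file
    for atoms in buckets.values():  # writers would run concurrently in practice
        if not atoms:
            continue  # a writer may own no indices when blocks do not divide evenly
        data = b"".join(RECORD.pack(a.index, a.x, a.y, a.z) for a in atoms)
        start = atoms[0].index * RECORD.size  # sorted block => known offset
        out[start:start + len(data)] = data  # one contiguous write per writer
    return bytes(out)


if __name__ == "__main__":
    n_atoms, n_ranks = 10, 4
    rng = random.Random(0)
    atoms = [Atom(i, rng.random(), rng.random(), rng.random()) for i in range(n_atoms)]
    rng.shuffle(atoms)  # atoms migrate, so each rank holds an unordered subset
    per_rank = [atoms[r::n_ranks] for r in range(n_ranks)]
    writers = pick_writers(n_ranks, stride=2)  # ranks 0 and 2 do the writing
    buckets, _ = gather_to_writers(per_rank, n_atoms, writers)
    blob = write_blocks(buckets, n_atoms)
    print([RECORD.unpack_from(blob, i * RECORD.size)[0] for i in range(n_atoms)])
```

Restricting output to a few writers trades extra communication (the gather) for fewer, larger, contiguous writes, which parallel file systems generally handle far better than many small interleaved writes from every processor; choosing the writer count and block layout to suit the file system is where the balancing described above comes in.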
These improvements enabled the researchers to realise their goal of simulating very large systems on HECToR, thereby capturing more scientific detail. Building on this success, the researchers are now aiming at still larger systems that will better model multiple collision events.