Physicists at CERN are using a novel system to make production runs that integrate their existing pool of distributed computers with dynamic resources in 'science clouds'. The integration was achieved via two mechanisms: the Nimbus Context Broker, developed by computer scientists at the US Department of Energy's Argonne National Laboratory and the University of Chicago, and a portable software environment developed at CERN.
Scientists working on A Large Ion Collider Experiment, also known as the ALICE collaboration, are conducting heavy ion simulations at CERN. They have been developing and debugging compute jobs on a collection of internationally distributed resources, managed by a scheduler called AliEn.
Since researchers can always use additional resources, the question arose: How can one integrate a cloud's dynamically provisioned resources into an existing infrastructure such as the ALICE pool of computers, and still ensure that the various AliEn services have the same deployment-specific information? Artem Harutyunyan, sponsored by the Google Summer of Code to work on the Nimbus project, made this question the focus of his investigation. The first challenge was to develop a virtual machine that would support ALICE production computations.
'Fortunately, the CernVM project had developed a way to provide virtual machines that can be used as a base supporting the production environment for all four experiments at the Large Hadron Collider at CERN – including ALICE,' said Harutyunyan, a graduate student at the State Engineering University of Armenia and a member of the Yerevan Physics Institute ALICE group. 'Otherwise, developing an environment for production physics runs would be a complex and demanding task.'
The CernVM technology was originally developed to supply portable development environments that scientists could run on their laptops and desktops. It now supports a variety of virtual image formats, including the Xen images used by Amazon EC2 and by the Science Clouds. The challenge for Harutyunyan was to find a way to deploy these images so that they would dynamically and securely register with the AliEn scheduler and thus join the ALICE resource pool.
Here the Nimbus Context Broker came into play. The broker allows a user to securely provide context-specific information to a virtual machine deployed on remote resources. It places minimal compatibility requirements on the cloud provider and can orchestrate information exchange across many providers.
'Commercial cloud providers such as EC2 allow users to deploy groups of unconnected virtual machines, whereas scientists typically need a ready-to-use cluster whose nodes share a common configuration and security context. The Nimbus Context Broker bridges that gap,' said Kate Keahey, a computer scientist at Argonne and head of the Nimbus project.
Integration of the Nimbus Context Broker with the CernVM technology has proved a success. The new system dynamically deploys a virtual machine on the Nimbus cloud at the University of Chicago, which then joins the ALICE computer pool so that jobs can be scheduled on it. Moreover, with the addition of a queue sensor that deploys and terminates virtual machines based on demand, the researchers can experiment with ways to balance the cost of the additional resources against the need for them as evidenced by jobs in a queue.
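The demand-based policy behind such a queue sensor can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the `jobs_per_vm` capacity, and the `max_vms` cost cap are hypothetical, and the actual calls to deploy or terminate virtual machines are left out.

```python
import math


def target_vm_count(pending_jobs, jobs_per_vm=4, max_vms=10):
    """Return how many cloud VMs the current demand justifies.

    pending_jobs: jobs waiting or running that need cloud slots (assumed input).
    jobs_per_vm:  hypothetical number of jobs one VM can handle.
    max_vms:      hypothetical cap that bounds the cost of extra resources.
    """
    if pending_jobs <= 0:
        return 0
    needed = math.ceil(pending_jobs / jobs_per_vm)
    return min(needed, max_vms)


def scaling_action(pending_jobs, running_vms, jobs_per_vm=4, max_vms=10):
    """Positive result: VMs to deploy; negative: VMs to terminate; zero: no change."""
    return target_vm_count(pending_jobs, jobs_per_vm, max_vms) - running_vms
```

Raising `max_vms` favors shorter queues at higher cost, while lowering it caps spending and lets jobs wait longer, which is the trade-off the researchers can explore.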
According to Keahey, one of the project's most exciting achievements was that cloud computing was integrated into the existing mechanisms rather than replacing them. 'We didn’t need to change the users’ perception of the system,' Keahey said.