The discovery of a new element is so infrequent these days that such an event warrants international news coverage. The opposite is the case for the discovery of new proteins, ligands, materials or other molecules; in fact, databases such as the Worldwide Protein Data Bank (www.rcsb.org) have been set up to help the experts keep up with frequent new discoveries.
These discoveries are coming at an ever faster pace, thanks in large part to more powerful computers and software. Computational chemistry software now plays a major role in many areas, including drug discovery and the search for materials with new properties. Not only are laboratory efforts expensive and time consuming, but software can also do things that are beyond what is possible in the laboratory. So says Professor Kim Baldridge, who heads the Theoretical Chemistry and Computational Grid Applications Section of the Organic Chemistry Institute at the University of Zurich, and who is well placed to discuss this topic: she holds a PhD in theoretical chemistry and spent years as Director of Integrative Computational Sciences at the San Diego Supercomputer Center.
Explaining what’s in nature
Baldridge explains that in the lab you can see what is happening, but not necessarily know why it happens. Molecular modelling software working at the quantum mechanical level fills in the gaps that you cannot determine in the lab, or that remain fuzzy there. It’s possible to study details of molecular structure (where the atoms are located and how they are bonded to one another), dynamics and energetics (for example, how fast a molecule flips from one form to another), reaction mechanisms and barriers to reaction, as well as many other aspects such as fluorescent and spectroscopic properties (what happens when you excite the molecules).
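The link between a computed energy barrier and ‘how fast a molecule flips from one form to another’ is commonly made through transition-state theory. The sketch below is a minimal illustration, not what any particular package does internally: it converts a free-energy barrier into a first-order rate constant with the Eyring equation, k = (k_B T / h) exp(−ΔG‡ / RT), assuming a transmission coefficient of 1. The barrier heights are illustrative values, not results from the work described here.

```python
import math

# CODATA exact SI values
K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
N_A = 6.02214076e23   # Avogadro constant, 1/mol
R = K_B * N_A         # molar gas constant, J/(mol*K)


def eyring_rate(barrier_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """First-order rate constant (1/s) for crossing a free-energy barrier.

    Eyring equation, with the transmission coefficient assumed to be 1.
    """
    barrier_j_per_mol = barrier_kj_per_mol * 1000.0
    prefactor = K_B * temperature_k / H  # about 6e12 1/s at room temperature
    return prefactor * math.exp(-barrier_j_per_mol / (R * temperature_k))


def half_life(rate_per_s: float) -> float:
    """Half-life (s) of a first-order process with the given rate constant."""
    return math.log(2) / rate_per_s


# Illustrative barriers only: each extra ~5.7 kJ/mol slows the process about tenfold at 298 K.
for barrier in (40.0, 60.0, 80.0, 100.0):
    k = eyring_rate(barrier)
    print(f"{barrier:5.1f} kJ/mol -> k = {k:.3g} 1/s, half-life = {half_life(k):.3g} s")
```

The steep exponential dependence is also why a small error in a computed barrier turns into a large error in a predicted rate.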
In drug discovery, for example, researchers often want to know how to enhance a compound’s properties or turn them on or off. With investigation, they may be able to do so by changing the molecular structure in a clever way or by adding other substances in the proper amount. However, there are often far too many possible routes to that goal to test them all in the laboratory.
With software, researchers can examine a judicious spectrum of these almost countless possibilities in a reasonable amount of time. The result is that the researcher can go to a chemist and say, ‘I bet you could make this structure and it would have these properties…’
Baldridge relates one study she worked on concerning a compound found in mushrooms with anti-tumour properties. In its natural form, it turned out to be too aggressive: it was toxic to many more cells than just tumour cells. Using computational experiments, researchers were able to suggest to chemists how to make the natural compound less reactive yet more selective.
Teasing this kind of information out of the laboratory would be very difficult and expensive.
Changing the sacred benzene ring
Baldridge and her experimental counterpart, Professor Jay Siegel, Director of the Organic Chemistry Institute at the University of Zurich, also point to a project that gained international attention and of which they are quite proud.
It deals with the simple benzene ring. Even in high school chemistry you learn that the electrons are distributed evenly around the symmetric six-membered ring. Using computational simulations, Baldridge showed that it would be possible to create a modified benzene ring in which the electrons concentrate in three alternating bonds of the six around the ring, thereby changing its special aromatic properties and associated reactivity; such a bond-localised structure is known as a cyclohexatriene.
This discovery has inspired chemists to apply similar designs to other known molecules in order to change their structure, properties and reactivity.
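One simple way to quantify the difference between an evenly delocalised benzene ring and a cyclohexatriene-like ring is the bond-length alternation: the difference between the average lengths of alternate ring bonds. The sketch below computes it from six bond lengths listed in ring order. Benzene’s C–C bonds are all about 1.39 Å; the ‘localised’ lengths are hypothetical numbers chosen for illustration, not the published geometry.

```python
from statistics import mean


def bond_length_alternation(ring_bonds: list[float]) -> float:
    """Bond-length alternation (same units as the input) for a six-membered ring.

    ring_bonds must list the six ring bond lengths in order around the ring,
    so bonds 1, 3, 5 and bonds 2, 4, 6 form the two alternating sets.
    """
    if len(ring_bonds) != 6:
        raise ValueError("expected six bond lengths in ring order")
    return abs(mean(ring_bonds[0::2]) - mean(ring_bonds[1::2]))


benzene = [1.39] * 6                               # delocalised: all ring bonds equal
localised = [1.35, 1.45, 1.35, 1.45, 1.35, 1.45]   # hypothetical alternating geometry

print(f"benzene:   {bond_length_alternation(benzene):.3f} Å")
print(f"localised: {bond_length_alternation(localised):.3f} Å")
```

A value near zero indicates even delocalisation; a larger value indicates the alternating short/long pattern of a cyclohexatriene.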
Conversely, experimentalists in a lab might create a new molecule and then turn to modelling software to help confirm that what they created is indeed what they thought they created.
This kind of work can also be viewed from a more philosophical perspective. Siegel makes an interesting observation: ever since the days of the alchemists, we have believed that mankind can make new matter. In fact, one special feature of chemistry is that it defines its own object of study through chemical synthesis.
Whereas physicists cannot make a new star to study, there are worlds of molecules we can imagine and create at will. Until recently this was unique to chemistry, but other sciences are moving in this direction: physicists now use supercolliders to create new particles of matter, and molecular genetics is turning to synthetic biology.
An issue to consider, though, adds Baldridge, is that real-life molecules of interest are often beyond what we can study computationally. There is thus a very large number of scientific chemistry codes, each making a trade-off between accuracy and the size of the system it can handle.
Molecular mechanics and dynamics modelling
Alongside quantum mechanical codes, molecular modelling relies on a class of codes called molecular mechanics, which are based on simple classical expressions for the different types of motion in a molecule, for example the stretching of a bond or the bending of an angle. Using these, one can examine the motion, forces and interactions among atoms.
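To make the ‘simple expressions’ concrete: many force fields model bond stretching and angle bending as harmonic springs, E = k_b (r − r₀)² and E = k_θ (θ − θ₀)² (some force fields include a factor of ½). The sketch below sums these two terms for a water-like molecule. The force constants and reference values are hypothetical round numbers, not parameters from any real force field, and real force fields add torsion, van der Waals and electrostatic terms.

```python
import math

Vec = tuple[float, float, float]


def bond_length(a: Vec, b: Vec) -> float:
    return math.dist(a, b)


def bond_angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in radians, with atom b at the vertex."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    cos_theta = sum(x * y for x, y in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp against rounding error


def mm_energy(coords, bonds, angles) -> float:
    """Harmonic bond-stretch plus angle-bend energy.

    bonds:  (i, j, r0, kb) tuples
    angles: (i, j, k, theta0, ktheta) tuples, with j the central atom
    """
    energy = 0.0
    for i, j, r0, kb in bonds:
        energy += kb * (bond_length(coords[i], coords[j]) - r0) ** 2
    for i, j, k, theta0, ktheta in angles:
        energy += ktheta * (bond_angle(coords[i], coords[j], coords[k]) - theta0) ** 2
    return energy


# Hypothetical parameters (energies in kcal/mol, lengths in Å, angles in radians).
R0, KB = 0.96, 500.0
THETA0, KTHETA = math.radians(104.5), 50.0
bonds = [(0, 1, R0, KB), (0, 2, R0, KB)]
angles = [(1, 0, 2, THETA0, KTHETA)]

# Oxygen at the origin, hydrogens placed exactly at the reference geometry...
half = THETA0 / 2
ideal = [(0.0, 0.0, 0.0),
         (R0 * math.sin(half), R0 * math.cos(half), 0.0),
         (-R0 * math.sin(half), R0 * math.cos(half), 0.0)]
# ...then one O-H bond stretched by 0.05 Å along its own direction (angle unchanged).
s = (R0 + 0.05) / R0
stretched = [ideal[0], tuple(s * x for x in ideal[1]), ideal[2]]

print(mm_energy(ideal, bonds, angles))      # essentially zero at the reference geometry
print(mm_energy(stretched, bonds, angles))  # about KB * 0.05**2 = 1.25
```

Because each term is a cheap closed-form expression, this kind of energy can be evaluated for very large systems, which is the trade-off against quantum mechanical accuracy.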
Molecular mechanics software on its own, however, is typically inadequate for researchers such as Dr Kerstin Möhle, also of the University of Zurich, who want to look not at static structures and properties but at the dynamical motion of large molecular structures. For this they use molecular dynamics software to examine processes such as protein folding or docking, or to see whether small molecules can bind to proteins.
The processes these simulations target play out on a very different time scale, from milliseconds to hours of real time. Even if such studies were possible with molecular mechanics software alone, they would take literally decades.
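Molecular dynamics codes advance Newton’s equations of motion in small time steps, typically a femtosecond or two, which is why long time scales are so costly: at 2 fs per step, one millisecond of simulated time needs 5 × 10¹¹ steps. The sketch below shows velocity Verlet, a widely used integrator, applied to a single particle on a harmonic spring in reduced units. This toy system is an assumption for illustration; a real MD code evaluates force-field forces on every atom at each step.

```python
import math
from typing import Callable


def velocity_verlet(x: float, v: float, force: Callable[[float], float],
                    mass: float, dt: float, n_steps: int) -> tuple[float, float]:
    """Integrate one degree of freedom with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        x += v * dt + 0.5 * (f / mass) * dt * dt   # position update
        f_new = force(x)                           # force at the new position
        v += 0.5 * (f + f_new) / mass * dt         # velocity from the averaged force
        f = f_new
    return x, v


# Toy system in reduced units: spring constant 1 and mass 1, so the angular frequency is 1.
K, M = 1.0, 1.0


def spring(pos: float) -> float:
    return -K * pos


x0, v0 = 1.0, 0.0
x, v = velocity_verlet(x0, v0, spring, M, dt=0.01, n_steps=1000)  # 10 time units

energy0 = 0.5 * M * v0**2 + 0.5 * K * x0**2
energy = 0.5 * M * v**2 + 0.5 * K * x**2
print(f"x = {x:.5f} (exact {math.cos(10.0):.5f}), energy drift = {energy - energy0:.2e}")
```

Velocity Verlet is popular in MD because it keeps the total energy close to its initial value over long runs, even with a fixed step size.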
Another insight comes from Dr Olaf Zimmermann, who works in the Simulation Laboratory for Biology within the Computational Science Division at the Jülich Supercomputing Centre. He notes that there is increasing competition between the physics-based methods just described and knowledge-based methods.
Physics-based methods use first principles to study how a system works, and with more powerful CPUs researchers can now start to tackle real-world problems. Knowledge-based methods, which he notes are still the most effective in use today, take a database approach: you start with a known molecule and look up its nearest neighbours in the database as a starting point for investigation, rather than working from first principles.
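A minimal sketch of the knowledge-based starting point: represent each molecule by a fingerprint (here simply a set of integer feature identifiers) and rank database entries by Tanimoto similarity to the query. The molecule names and feature numbers are hypothetical; real workflows derive fingerprints from chemical structures with a cheminformatics toolkit and search far larger databases.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features present."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbours(query: frozenset[int], database: dict[str, frozenset[int]],
                       k: int = 3) -> list[tuple[str, float]]:
    """Return the k database entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return scored[:k]


# Hypothetical fingerprints: each integer stands for a substructure feature.
database = {
    "compound_A": frozenset({1, 2, 3, 4, 5}),
    "compound_B": frozenset({1, 2, 3, 8}),
    "compound_C": frozenset({6, 7, 8, 9}),
    "compound_D": frozenset({2, 3, 4, 5, 6}),
}
query = frozenset({1, 2, 3, 4, 6})

for name, score in nearest_neighbours(query, database, k=2):
    print(f"{name}: {score:.2f}")
```

Here compound_A and compound_D tie at 4/6 shared features, so both would be candidate starting points for further investigation.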
Multitudes of codes
There are hundreds of software packages, both commercial and academic, which address every phase of molecular modelling and drug discovery. Many academic programs are highly specialised for a particular task; commercial software, in contrast, is often far broader in scope.
In some cases, though, there’s not much difference between the capabilities of what you buy and what you get for free. Baldridge has tried both. In graduate school, she worked on the core code of a quantum chemistry program that approximately solves the Schrödinger equation, a program based on the methodology for which John Pople shared the 1998 Nobel Prize in Chemistry. The freely available version of the code is called GAMESS (General Atomic and Molecular Electronic Structure System), and it is maintained by Mark Gordon’s Quantum Theory Group at Ames Laboratory/Iowa State University. Other similar large-scale codes, however, have been commercialised, and in general they follow a different policy for distribution and development. Baldridge adds that these quantum chemistry codes share a basic level of functionality that is essentially the same, but each also offers speciality features that its developers have worked on over the years.
If you don’t want to worry about installation or development yourself and professional user support is important to you, then opt for a commercial product.
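For reference, the quantum chemistry codes described above, GAMESS among them, set out to solve the time-independent Schrödinger equation approximately. Within the Born–Oppenheimer approximation, with the nuclei held fixed, the electronic problem for electrons i, j and nuclei A of charge Z_A reads as follows in atomic units, where r_{iA} is an electron–nucleus distance and r_{ij} an electron–electron distance; the nuclear repulsion energy is added as a constant for each geometry. Methods such as Hartree–Fock and its correlated extensions differ mainly in how they approximate the wavefunction Ψ.

```latex
\hat{H}_{\mathrm{el}}\,\Psi = E\,\Psi, \qquad
\hat{H}_{\mathrm{el}} = -\frac{1}{2}\sum_{i} \nabla_i^{2}
  - \sum_{i}\sum_{A} \frac{Z_A}{r_{iA}}
  + \sum_{i<j} \frac{1}{r_{ij}}
```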
Molecular dynamics (MD) simulations are very time consuming; even simulations of a modest size, such as 5,000 atoms, require hours or days.
One way to address this issue is parallel computing. While some MD software has been adapted to run on parallel machines, the NAMD (NAnoscale Molecular Dynamics) package, developed jointly by the Theoretical and Computational Biophysics Group and the Parallel Programming Laboratory at the University of Illinois at Urbana-Champaign, was written from the ground up for parallel operation.
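Why do a few thousand atoms already take hours or days, and how does parallelism help? If every pair of atoms interacted, N atoms would give N(N − 1)/2 pair terms per time step; for 5,000 atoms that is 12,497,500 terms, recomputed at every one of many thousands or millions of steps. Production codes reduce this with distance cut-offs and specialised long-range methods, and NAMD uses its own, more elaborate decomposition, so the sketch below only illustrates the underlying idea: split the pair work into independent shares whose partial energies add up to the serial total. A Lennard-Jones pair potential in reduced units and a small cubic lattice of atoms are assumed; the shares are computed in a loop here, whereas a parallel code would hand each to a separate processor.

```python
import math
from itertools import product


def lj_pair_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy in reduced units."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def n_pairs(n_atoms: int) -> int:
    return n_atoms * (n_atoms - 1) // 2


def partial_energy(coords, rows) -> float:
    """Sum over pairs (i, j) with j > i, for the atoms i in `rows` (one worker's share)."""
    n = len(coords)
    return sum(lj_pair_energy(math.dist(coords[i], coords[j]))
               for i in rows for j in range(i + 1, n))


def split_rows(n_atoms: int, n_workers: int) -> list[range]:
    """Deal rows out round-robin so long (early) and short (late) rows are mixed evenly."""
    return [range(w, n_atoms, n_workers) for w in range(n_workers)]


# A 4 x 4 x 4 cubic lattice with spacing 1.2 (64 atoms), reduced units.
coords = [(1.2 * x, 1.2 * y, 1.2 * z) for x, y, z in product(range(4), repeat=3)]
serial = partial_energy(coords, range(len(coords)))
shares = [partial_energy(coords, rows) for rows in split_rows(len(coords), 4)]

print(f"pairs for 5,000 atoms: {n_pairs(5000):,}")
print(f"serial energy {serial:.6f}, sum of 4 shares {sum(shares):.6f}")
```

The round-robin split is one simple way to balance the work, since row i contributes N − 1 − i pairs; it is a design choice for this sketch, not the scheme any specific package uses.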
One application for NAMD addresses the fact that many biological processes, ranging from the production of biofuels to cleaning up toxic organic waste, are controlled by proteins in the cell membrane. Large-scale gating motions, occurring on a relatively slow time scale, are essential for the function of many important membrane proteins such as transporters and channels. Voltage-activated ion channels are, in effect, electric switches that are turned ‘on’ by a change in the membrane potential. Malfunction of these channels can lead to cardiac arrhythmia and neurological pathologies.
Researchers at Argonne National Laboratory and Oak Ridge National Laboratory (ORNL) are modelling the molecular function of a voltage-gated potassium ion channel to understand how membrane-associated protein machines are able to carry out their functions. The simulated system contains more than 350,000 atoms in total.
The simulations were run using NAMD on the Cray XT (Jaguar) at Oak Ridge National Laboratory and the IBM Blue Gene/P at the Argonne Leadership Computing Facility. The results of these simulations open up the possibility of better-designed therapeutic drugs as well as the construction of artificial biomimetic nanoswitches.
Commercial packages are also increasingly moving towards parallel operation. One package mentioned often by the researchers interviewed for this article is MOE (Molecular Operating Environment), a drug-discovery software platform from the Chemical Computing Group (CCG). It integrates visualisation, simulations and methodology development.
CCG’s informatics platform, PSILO, is used for macromolecular structure registration, version control and web-based searching. A standard part of MOE is the MOE/smp distributed computing technology, with which multiple cooperating computers can perform large-scale calculations. A heterogeneous collection of computers, including laptops, workstations and multi-processor clusters, even when running different operating systems, can be harnessed together in one MOE session.
Another frequently mentioned company is Schrödinger, which develops chemical simulation software with products ranging from general molecular modelling programs to a full-featured suite of drug design software. Its Maestro program provides a unified interface for all of these products. For model generation, it supports many common file formats for structural input, as well as a building tool for constructing molecular models of any type.
It also includes many viewing options to handle everything from small molecules to large biomolecular complexes, and its rendering and stereographic capabilities allow researchers to view complex molecular systems as 3D objects.
Schrödinger and Cycle Computing recently announced that they will offer cloud computing solutions for running Schrödinger’s chemical simulation and molecular modelling software on elastic resources. Users would then have timely access to computational resources as needed, without a prohibitive upfront capital investment in large computer clusters or the burden of administering and maintaining them.
Be careful when analysing results
No matter what the software, Baldridge warns that great care must be taken in its use, especially when the tools are treated as black boxes by users who have little knowledge of the underlying equations or how they are being applied.
‘Scientific rigour is becoming diluted; sometimes it is almost a missing component. It is very easy to be swayed by computational answers that are not physically correct, and so keeping your intuition strong and a close eye on experiments are important components to any computational modelling.’