IBM Research Europe and Thieme Chemistry have announced the first results of their collaboration, which were evaluated by seven eminent synthetic chemistry experts and their research groups from China, Germany, Switzerland, New Zealand, and the USA.
Professor Dame Margaret Brimble from the University of Auckland, New Zealand, comments: ‘This innovative IBM/Thieme Chemistry platform provides an efficient tool for synthetic chemistry researchers to provide validation for their own retrosynthetic plans whilst also being presented with alternative solutions. It enables a rigorous assessment for the retrosynthetic design phase of a given synthesis which no doubt will pay dividends when the selected synthetic plan is implemented.’
The partnership between IBM Research Europe and Thieme Chemistry builds on the synergies between high-quality data and state-of-the-art machine learning models for organic chemistry synthesis predictions. RXN for Chemistry, a cloud platform using artificial intelligence (AI), has recently been trained with high-quality, human-curated datasets from Thieme’s Science of Synthesis and Synfacts.
Organic compounds can react with each other in hundreds of thousands of different ways. Experiential knowledge is key for organic chemists to avoid spending countless hours in the laboratory on trial and error. To improve synthesis planning, IBM Research and Thieme Chemistry have combined the expert human-curated datasets from Science of Synthesis, Thieme’s full-text resource for methods in synthetic organic chemistry, and the reviewed content from the journal Synfacts with the artificial intelligence model called Molecular Transformer in RXN for Chemistry by IBM.
The Molecular Transformer, a neural machine translation model, was created to reliably predict the outcome of chemical reactions and was later enhanced to include retrosynthetic analysis, i.e. to first determine the chemicals needed to create a given target molecule. The model has proven to be very successful at learning the information on chemical reactivity present in datasets of chemical reactions. It is, however, limited by the content and correctness of these datasets.
Increased prediction accuracy
Science of Synthesis and Synfacts cover a wide area of reaction space. Typically, models trained on commercially available patent datasets perform poorly on many such reactions. Science of Synthesis and Synfacts have a higher quality of chemical records, reflected by a larger percentage of usable records. This consistency in Thieme’s dataset facilitates the learning process of the AI models, resulting in more consistent predictions. Results show that Thieme-trained models on the RXN for Chemistry platform increase prediction accuracy by a factor of three for forward predictions and by a factor of nine for retrosynthesis.
The collaborative work between Thieme and IBM Research Europe shows the impact high-quality chemical reaction data can have on future AI chemical synthesis tools. Integrating high-quality, curated data from Science of Synthesis and Synfacts provides a unique opportunity to boost the performance of RXN for Chemistry to unprecedented levels, as it unlocks the entire knowledge contained in hundreds of thousands of chemical reaction records.
Professor Richmond Sarpong from the University of California, Berkeley, USA, states: ‘A sustainable future for synthesis will include minimising the number of unproductive strategies that are pursued by running only those reactions that lead to a productive end. This is only possible through the marrying of computer designed and human-designed efforts, which makes this collaboration with IBM and Thieme Chemistry exciting.’
Also involved in testing the retrained models were Professor Alois Fürstner (MPI Mülheim, Germany), Professor Karl Gademann and Professor Cristina Nevado (University of Zurich, Switzerland), Professor Ang Li (Shanghai Institute of Organic Chemistry, China), Professor Dirk Trauner (New York University, USA) and their research groups.