Navigating the integration of machine learning (ML) and artificial intelligence (AI) in pharmaceutical R&D involves overcoming data management, quality, and expertise challenges to unlock their full potential in drug discovery, reveals Richard Lee, Director, Core Technology and Capabilities, ACD/Labs.
Pharmaceutical companies are under constant pressure to innovate and bring new drugs to market efficiently and cost-effectively. However, the drug discovery and development process is complex, with challenges that can significantly slow progress. One of the most promising avenues to address these challenges is the integration of machine learning (ML) and artificial intelligence (AI) into research and development (R&D). Despite this potential, implementing these technologies is not without its hurdles.
Bridging the gap between raw data and ML/AI applications
One of the most significant challenges pharmaceutical companies face when implementing ML and AI in R&D is managing the sheer volume and diversity of data generated by modern scientific instrumentation. Drug discovery involves complex datasets from techniques such as liquid chromatography, mass spectrometry, and nuclear magnetic resonance (NMR) spectroscopy. This data must be efficiently captured, organized, and interpreted before it can be used in ML/AI models.
Data heterogeneity, assembly, and quality
A key issue within this challenge is data heterogeneity. Data generated by different instruments and experiments often arrive in proprietary formats that vary from vendor to vendor. Integrating these disparate datasets into a coherent format usable by ML/AI models requires significant preprocessing. This preprocessing can involve normalization, standardization, and the translation of data into a common format, which is not only time-consuming but also prone to errors if not handled meticulously.
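As a rough illustration of that preprocessing step, the Python sketch below maps two made-up vendor exports onto one common peak record. The field names (`rt_s`, `counts`, `RT (min)`, `Intensity`) and the `Peak` schema are assumptions chosen for the example and do not correspond to any real instrument format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """Vendor-neutral record for one chromatographic peak (hypothetical schema)."""
    retention_time_min: float
    relative_intensity: float  # scaled so the largest peak in a run is 1.0


def from_vendor_a(rows):
    # Hypothetical vendor A export: retention time in seconds, raw counts.
    return [(r["rt_s"] / 60.0, float(r["counts"])) for r in rows]


def from_vendor_b(rows):
    # Hypothetical vendor B export: retention time in minutes, values as strings.
    return [(float(r["RT (min)"]), float(r["Intensity"])) for r in rows]


def normalize(pairs):
    """Scale intensities to the base peak and sort peaks by retention time."""
    if not pairs:
        return []
    base = max(intensity for _, intensity in pairs)
    if base <= 0:
        raise ValueError("run has no positive intensity to normalize against")
    return [Peak(rt, intensity / base) for rt, intensity in sorted(pairs)]


run_a = normalize(from_vendor_a([
    {"rt_s": 90, "counts": 500},
    {"rt_s": 30, "counts": 2000},
]))
run_b = normalize(from_vendor_b([
    {"RT (min)": "0.5", "Intensity": "40"},
    {"RT (min)": "1.5", "Intensity": "10"},
]))
```

After conversion, the two runs are directly comparable even though they started in different units, value types, and row orders, which is the property a downstream model relies on.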
The integration and assembly of data represents a significant challenge within the pharmaceutical industry. Analytical data, in isolation, is insufficient to provide the full context of a chemical experiment. It is frequently the aggregation of analytical data, coupled with comprehensive experimental details, that is necessary to present a complete and coherent understanding of a chemical study.
Moreover, data quality is another critical concern. ML and AI models are only as good as the data used to build them. Data collected from experiments can have missing metadata or contain outliers, either of which can skew the results of ML/AI models if not addressed. Ensuring data quality through rigorous validation, cleaning, and curation is essential to building reliable models. However, this task is often resource-intensive and requires expertise in both the scientific domain and data science.
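Two of the checks mentioned above, missing metadata and outliers, can be sketched in a few lines of Python. The required field names and the sample values are invented for illustration. The outlier test uses the modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5; this is one reasonable choice among several, not a prescribed method.

```python
import statistics

# Hypothetical set of metadata fields every record is expected to carry.
REQUIRED_METADATA = ("sample_id", "instrument", "solvent")


def missing_metadata(record):
    """Return the required metadata fields that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not record.get(key)]


def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (MAD-based) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be flagged by this test.
        return [False] * len(values)
    return [abs(0.6745 * (v - median) / mad) > threshold for v in values]


records = [
    {"sample_id": "S1", "instrument": "LC-MS", "solvent": "MeOH"},
    {"sample_id": "S2", "instrument": ""},
]
yields = [78.1, 79.4, 77.8, 80.2, 12.0, 78.9]  # made-up replicate yields (%)

problems = {r["sample_id"]: missing_metadata(r) for r in records}
outlier_flags = flag_outliers(yields)
```

Flagged values are candidates for review rather than automatic deletion; deciding whether a low yield is an instrument fault or a genuine result still takes domain knowledge.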
Data accessibility and integration
Once data is standardized and cleaned, another challenge arises in making it accessible and integrable across different systems. Pharmaceutical companies often operate with legacy systems and siloed data repositories, making it difficult to create a unified data environment. The integration of structured data from various sources, such as experimental data, is essential for training comprehensive ML models.
Lowering the barrier to ML/AI integration
Even when pharmaceutical companies have structured data, the next challenge lies in the expertise required to develop and implement ML/AI models. Many organizations lack the specialized skills needed to create ML/AI models. This skills gap can be a significant barrier to adopting advanced technologies, as it necessitates either building a specialized team or relying on external consultants, both of which can be costly and time-consuming.
ACD/Labs and ML/AI in pharmaceutical R&D
The challenges pharmaceutical companies face in implementing ML and AI are significant, but not insurmountable. ACD/Labs provides technologies that lay the foundation for data to be accessible by ML/AI applications. The Spectrus platform allows an organization to standardize and assemble analytical data with chemical context. This can be done through automation services, including data marshalling, format standardization, data processing, and data assembly. The Spectrus platform can also integrate with other informatics systems in IT ecosystems through its extensive APIs.
Moreover, ACD/Labs has long provided ML-based predictive modules for analytical data, such as NMR spectral and physicochemical property prediction. These modules are considered the gold standard in the chemical informatics industry. Recently, ACD/Labs’ Katalyst D2D, its high-throughput chemistry application, has been integrated with the open-source ML module Experimental Design via Bayesian Optimization (EDBO) to enhance and accelerate screening experiments. EDBO is a powerful algorithm that optimizes chemical reactions by iteratively suggesting new conditions based on prior experimental results. By embedding this ML capability directly into Katalyst, ACD/Labs lowers the barrier to entry for pharmaceutical companies, enabling them to leverage AI-driven optimization without needing deep expertise in machine learning.
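To make the idea of iteratively suggesting new conditions concrete, here is a toy Bayesian optimization loop in plain Python. It is not EDBO's code or API, and it does not reflect how Katalyst D2D calls EDBO. It is a minimal sketch that assumes a single scaled reaction variable, a Gaussian-process surrogate with a fixed squared-exponential kernel, and expected improvement as the selection rule. The `run_experiment` function is a made-up stand-in for a real reaction.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel on scalars (fixed, assumed length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * wi for ki, wi in zip(k, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 0.0))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    if std == 0.0:
        return 0.0
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mean - best) * cdf + std * pdf


def suggest(xs, ys, candidates):
    """Pick the untested candidate with the highest expected improvement."""
    offset = sum(ys) / len(ys)  # centre observations for the zero-mean GP
    centred = [y - offset for y in ys]
    best = max(centred)

    def score(c):
        mean, std = gp_posterior(xs, centred, c)
        return expected_improvement(mean, std, best)

    return max((c for c in candidates if c not in xs), key=score)


def run_experiment(x):
    # Stand-in for a real reaction: hypothetical yield peaking at x = 0.7.
    return 100.0 * math.exp(-((x - 0.7) ** 2) / 0.02)


candidates = [i / 20 for i in range(21)]  # scaled condition, e.g. temperature
xs = [0.0, 0.5, 1.0]                      # initial screening points
ys = [run_experiment(x) for x in xs]
for _ in range(6):                        # six suggest-run-update rounds
    x_next = suggest(xs, ys, candidates)
    xs.append(x_next)
    ys.append(run_experiment(x_next))

best_x = xs[ys.index(max(ys))]
```

Each round refits the surrogate to all results so far, then chooses the condition that best balances a high predicted yield against high uncertainty. Real reaction optimization adds categorical variables such as ligand or solvent, batch suggestions, and learned kernel parameters, which is the kind of work a dedicated package takes on.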
In support of other ML/AI frameworks and platforms, ACD/Labs has adopted a collaborative approach by partnering with leading ML/AI companies such as Atinary. Atinary specializes in AI-driven experimental design and optimization, and its collaboration with ACD/Labs brings additional AI methodologies into ACD/Labs’ software suite. This partnership enables ACD/Labs to offer pharmaceutical companies more comprehensive and sophisticated ML/AI solutions that integrate into existing R&D workflows.
By providing out-of-the-box solutions like the EDBO-enhanced Katalyst and collaborating with AI leaders like Atinary, ACD/Labs is empowering pharmaceutical companies to overcome the hurdles of ML/AI implementation. This approach not only accelerates drug discovery and development but also drives innovation and efficiency across the R&D process.