Analytical data is ubiquitous in the world of chemical and biochemical R&D. Traditionally, digital representations of analytical experiments were used almost solely by laboratory scientists to make decisions about the direction of their work. While analytical data was stored as proof of structure or composition for regulatory purposes, its primary use was to understand materials and processes, and help build control strategies. Laboratory scientists used the information gained from interpreting NMR, LC/MS, FT-IR, and other analytical data to direct next steps.
With the increasing desire to funnel data to artificial intelligence and machine learning (AI/ML) applications, R&D organizations are expanding the remit for the data generated within their four walls. Data scientists are now emerging as major consumers of the rich analytical data acquired from scientific instruments. We recently reported that while only 6% of surveyed individuals have already implemented the use of analytical data for data science, 43% are in the process of doing so.
The Divergent Needs of Lab Scientists and Data Scientists
The ways these two groups of data consumers interact with analytical data (their use cases, the preparation the data requires, and the methods used to access it) differ enough to raise new challenges in the already difficult area of analytical data accessibility and management. Can these divergent needs be met by currently available technologies?
Laboratory scientists performing and using these experiments require immediate, on-demand access to the data as soon as it’s acquired. Ideally, they need access to the chromatograms and/or spectra themselves, not a static image or PDF. Access to the full data set enables them to extract the maximum scientific value for decision-making. They need to view the data in a rich, highly interactive user interface (UI) that lets them interrogate it. They need to be able not only to “zoom in” on a spectrum or chromatogram, but also to process the data with basic and advanced functions such as peak detection, integration, and combining spectra. Lab scientists also have specific storage requirements: they want to be able to recall the full analytical data file at a later point in time, so that the data can be reprocessed if more information becomes available.
Data scientists, on the other hand, want access to the results of fully processed data (the tabulated results and metadata associated with the data file), not raw chromatograms or spectra. Furthermore, data from a single analytical data file may be less useful to them than data assembled from several related analytical data files that together represent a chemical study. Data scientists do not require interfaces to visualize the raw data. Instead, they need tools such as APIs to extract the data content they want, so that it can be engineered into the specific formats their applications require. Alternatively, they may use specific database views to extract data. Business Intelligence (BI)/ML/AI applications use large volumes of data to provide insights not easily accessible to human reviewers of that same data. Data scientists may, for example, use the data to understand instrument usage and inform the purchase of new instruments, or to predict or provide guidance on long-term product formulation and stability. This gives scientists a better starting point, shortening the development cycle and ultimately getting the product to market faster.
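As a rough illustration of assembling study-level data from several processed data files, the sketch below groups tabulated peak results by study and compound. The record layout and field names (`study_id`, `file_id`, `compound`, `area_percent`) and the values are hypothetical, invented for this example; they do not represent any particular vendor's schema or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical processed results: one record per reported peak per data file.
# Field names and values are illustrative assumptions only.
processed_results = [
    {"study_id": "STAB-001", "file_id": "run_01", "compound": "API", "area_percent": 99.1},
    {"study_id": "STAB-001", "file_id": "run_01", "compound": "Impurity A", "area_percent": 0.9},
    {"study_id": "STAB-001", "file_id": "run_02", "compound": "API", "area_percent": 98.5},
    {"study_id": "STAB-001", "file_id": "run_02", "compound": "Impurity A", "area_percent": 1.5},
]


def summarize_study(records):
    """Group per-file results by (study, compound) and average the area percent."""
    grouped = defaultdict(list)
    for record in records:
        key = (record["study_id"], record["compound"])
        grouped[key].append(record["area_percent"])
    return [
        {
            "study_id": study_id,
            "compound": compound,
            "n_measurements": len(values),
            "mean_area_percent": round(mean(values), 2),
        }
        for (study_id, compound), values in sorted(grouped.items())
    ]


study_table = summarize_study(processed_results)
for row in study_table:
    print(row)
```

The result is one row per compound per study, which is closer to the shape a BI or ML application would typically consume than a collection of individual data files.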
Selecting the Right Technology Partner
In order to address the divergent needs of today’s data consumers, informatics providers need expertise in analytical chemistry data and an understanding of how different users interact with the data. It’s not enough to be able to provide the best IT architecture if the end users’ needs are unaccounted for.
Organizations need to work with technology providers that meet the needs of both lab scientists and data scientists. They require a unified platform to manage and normalize data from all the major instrument vendors and analytical techniques, whether in raw or processed proprietary formats. Pulling data into a single informatics environment designed for analytical data processing and interpretation means scientists in the lab can meet all their data needs in a single UI. Normalizing that data, exporting it to JSON or XML formats, and exposing it through a well-developed API make it accessible to data scientists. ACD/Labs has been working with leading R&D organizations to provide solutions and delivery modes that satisfy the needs of all of today’s data consumers.
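To make the idea of normalization and JSON export concrete, here is a minimal sketch that maps peak tables from two hypothetical instrument vendors onto one common schema and serializes the result. The vendor names, field names, and values are assumptions made for illustration; this is not the ACD/Labs data model or API.

```python
import json

# Hypothetical mapping from vendor-specific field names to a common schema.
FIELD_MAP = {
    "vendor_a": {"RT": "retention_time_min", "Area": "peak_area", "Name": "compound"},
    "vendor_b": {"ret_time": "retention_time_min", "area": "peak_area", "peak_name": "compound"},
}


def normalize(record, vendor):
    """Rename vendor-specific fields to the common schema and tag the source."""
    mapping = FIELD_MAP[vendor]
    normalized = {common: record[source] for source, common in mapping.items()}
    normalized["source_vendor"] = vendor
    return normalized


# Illustrative peak records as they might appear in two different export formats.
records = [
    normalize({"RT": 2.31, "Area": 15234.0, "Name": "caffeine"}, "vendor_a"),
    normalize({"ret_time": 2.29, "area": 14980.0, "peak_name": "caffeine"}, "vendor_b"),
]

# Serialize the normalized records so downstream tools can read one format.
payload = json.dumps(records, indent=2, sort_keys=True)
print(payload)
```

Once every record shares the same keys, downstream code can treat data from different instruments uniformly, without vendor-specific parsing.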