Data silos – where data is either created in a bespoke format or manipulated in such a way that it is only usable by those employing it for a specific purpose – prevent efficient collaboration across drug discovery and process development. They can arise at any point, but most commonly when commercial tools are used for data storage.
John McGonigle, Director of Bioinformatics at Insmed, says, “I’ve often hit data silos when it comes to using commercial tools for data storage; this is because there is usually little incentive for vendors to ensure their data solutions communicate with one another.”
Lukas Kürten, Digital Innovation Manager at CPI, says it’s not just digital tools that create isolated data. “We had an example of a colleague wanting to buy an analytical instrument for the laboratory,” he says. “It was apparently top notch for the biological sampling it did, but didn’t even have OPC (Open Platform Communications) connectivity and would only run on a Windows 7 PC that our IT team would never allow to be installed. There is definitely an innovation gap here, where equipment suppliers think only about the technical innovation in the instruments they build, without thinking about the digital side and how that integrates with data science.”
Moritz von Stosch, Chief Innovation Officer at Datahow, suggests biopharma companies should make a stand when it comes to interoperability of instrumentation. “When I was still working at a biopharma company, one of the things that we made sure of was that, whenever we purchased new equipment, it supported a standard data communication protocol. If large customers begin insisting on this, then vendors will have a financial incentive to adhere to these standards.”
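For illustration, the standard connectivity von Stosch insists on is what makes instrument data scriptable at all. A minimal sketch using the open-source python-opcua client follows – the endpoint address and node identifier are hypothetical placeholders, not any particular vendor’s interface:

```python
# Minimal sketch: reading a process value from an instrument that exposes
# a standard OPC UA interface, via the open-source python-opcua client.
# The endpoint URL and node identifier are hypothetical placeholders.
from opcua import Client

ENDPOINT = "opc.tcp://instrument.lab.local:4840"  # hypothetical instrument address

client = Client(ENDPOINT)
try:
    client.connect()
    # Node IDs are instrument-specific; this one is illustrative only.
    node = client.get_node("ns=2;s=Bioreactor.pH")
    print("Current pH reading:", node.get_value())
finally:
    client.disconnect()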
Kevin Back, Product Manager at the Cambridge Crystallographic Data Centre (CCDC), adds: “Some data silos exist not because people aren’t able to access that data, but because they don’t know how to get to it – or even that it exists in the first place.”
So, how does one get round the issue of data silos? “The way we’ve resolved this is to have an API layer that connects the external platform to the internal one,” says McGonigle. “Even with an API, maintaining consistency among different data platforms is a real challenge in data science. If you get it right, you can represent the data in such a way that your tools have access to it, but you don’t need to maintain and update the data source, keeping it all connected through the use of API layers and meshes.”
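As a rough illustration of the API-layer pattern McGonigle describes – a sketch rather than his actual implementation, with hypothetical URLs and field names – a thin translation layer might look like this:

```python
# Sketch of a thin API layer that presents data from an external platform
# in the internal schema, so downstream tools never touch the vendor format.
# All URLs and field names here are hypothetical placeholders.
import requests

EXTERNAL_API = "https://vendor-platform.example.com/api/v1"  # hypothetical vendor endpoint

def get_sample(sample_id: str) -> dict:
    """Fetch a sample record from the external platform and map it
    to the fields our internal tools expect."""
    resp = requests.get(f"{EXTERNAL_API}/samples/{sample_id}", timeout=10)
    resp.raise_for_status()
    raw = resp.json()
    # Translate the vendor's schema into the internal one in a single place,
    # so a vendor-side change only needs fixing here, not in every tool.
    return {
        "sample_id": raw["id"],
        "compound": raw["compound_name"],
        "assay_results": raw.get("results", []),
    }
```

The design point is that only this layer knows the vendor’s format; every internal tool consumes the normalised record, so the data source can change without breaking the tools built on top of it.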
“There are tools that can link databases together,” says CCDC’s Back. “Here, you can look up a sample ID or a compound identifier and see all the data that’s connected with it. It allows users to find out everything they can know about that compound without having to reach out to colleagues or go hunting through lab notebooks.”
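A minimal sketch of the kind of linked lookup Back describes – using SQLite purely for illustration, with hypothetical table and column names:

```python
# Sketch of a cross-database lookup: given a compound identifier,
# pull together every record linked to it in one query.
# The table and column names are hypothetical placeholders.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (sample_id TEXT, compound_id TEXT, batch TEXT);
    CREATE TABLE assays  (compound_id TEXT, assay TEXT, result REAL);
    INSERT INTO samples VALUES ('S-001', 'CMP-42', 'B7');
    INSERT INTO assays  VALUES ('CMP-42', 'solubility', 3.1);
""")

def everything_about(compound_id: str) -> list[tuple]:
    """Join sample and assay records on the shared compound identifier."""
    return conn.execute(
        """SELECT s.sample_id, s.batch, a.assay, a.result
           FROM samples s JOIN assays a USING (compound_id)
           WHERE s.compound_id = ?""",
        (compound_id,),
    ).fetchall()

print(everything_about("CMP-42"))  # [('S-001', 'B7', 'solubility', 3.1)]
```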
Darren Green, Director at DesignPlus Cheminformatics Consultancy, says departments need to work together from as early a stage as possible: “You really need a partnership between the people that generate particularly complex data and the people that want to reuse it. Getting out of a silo is much easier when you treat it as a joint mission, where you can pair up and come up with a solution.
“It’s not easy to do that – and it’s not always necessary. It’s costly, so you can see how it would get pushed to the bottom of the priority pile, but you do need to chip away at it when you have an important use case.”
Jim Thompson, Medical Devices and Pharmaceuticals Industry Lead at Siemens Digital Industries Software, adds: “There needs to be a layer on top [of the raw data] that is both helping with access to the data and helping to collaborate on the data in a way that can add a degree of intelligence. That’s as much about discounting poor data as it is about sharing good data.”
Data sharing versus commercial sensitivity
“Increasingly, you are seeing research papers published without making their code available or, in some cases, without even providing any data,” says Insmed’s McGonigle. “I think this is largely down to perceived competitive advantage in keeping the data and the code for processing and analysing it a secret.”
However, there are examples of collaboration – conducted on an anonymised, aggregated basis – that enable organisations to learn from each other without exposing their core IP.
CCDC’s Back says: “We have worked with large pharma companies to publish the drug subset – a set of structures that are linked to drugs. We have compared in-house data from companies such as Pfizer, GSK and AstraZeneca to this subset. No one saw individual structures, but you can see things such as the size of molecules or other derived data sets that may be useful. This can help when risk-assessing new compounds and identifying whether you sit within the span of existing data or outside it.”
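A minimal sketch of the underlying idea – sharing derived statistics rather than structures – using the open-source RDKit toolkit, with placeholder SMILES standing in for proprietary compounds:

```python
# Sketch of sharing derived, non-identifying data rather than structures:
# compute simple descriptors (e.g. molecular weight) in-house and share
# only the aggregate statistics. The SMILES below are placeholders.
from statistics import mean
from rdkit import Chem
from rdkit.Chem import Descriptors

in_house_smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]  # stand-ins for proprietary structures

weights = [Descriptors.MolWt(Chem.MolFromSmiles(s)) for s in in_house_smiles]

# Only these summary numbers would leave the company, never the structures.
print(f"n={len(weights)}, mean MW={mean(weights):.1f}, "
      f"min={min(weights):.1f}, max={max(weights):.1f}")
```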
Jackie Lighten, Program Manager, Cell and Gene Therapy Catapult, says that, when it comes to sharing experiences, there’s a difference between working in academia and working for a commercial, VC-funded start-up.
“In academia, there tends to be a more open approach to sharing data,” he says. “It’s a ‘competitive joint effort’, where competition is driven by kudos and career progression rather than product development. In the cell and gene sector, with potentially high-value products and IP involved, sharing data puts VC money at risk.
“However, I think there are ways of sharing data that don’t reveal IP. There needs to be some kind of federated data institute to facilitate sharing across modalities, so that lessons that don’t compromise protected findings can be shared among developers and help steer other companies away from tar-pit traps.
“At Cell and Gene Therapy Catapult, we’re hoping to act in a capacity where we can pass information between developers, on an agreed basis, to help them get to that next stage of their company growth. Many developers won’t be baking this kind of data sharing or data management framework into their funding pitches to VC investors, simply because companies and investors are not necessarily altruistic and need to understand the benefit of data sharing before this can happen.”
The full report is available to download as a white paper, which also covers: Data collection and formats; Data ontologies and efficiency of process development; Cultural change and the digitisation journey; The shift to in silico for experiments; and Process optimisation and technology transfer.
Register to download the White Paper here
The roundtable and series of articles are sponsored by Siemens Digital Industries.