The European Bioinformatics Institute (EBI) is using data as a service to deliver its massive stores of life science data to the scientific community.
Scientists working in medicine, agriculture and environmental science make more than 12 million requests per month to the EBI for its data. By employing Delphix’s data as a service (DaaS) platform, the EBI expects to reduce its current storage footprint by as much as 70 per cent. The technology will also help develop crucial data management strategies that will pave the way for personalised medicine initiatives.
The EBI, part of the European Molecular Biology Laboratory (EMBL-EBI), holds more than 55 petabytes of life science data and its databases double in size approximately every 12 months.
Steven Newhouse, head of technical services at EMBL-EBI, said: ‘The collection, curation, and release of reference genome data is vital for research activities worldwide – especially in the area of personalised medicine, which will drive future healthcare. However, the sheer size and complexity of the data we host makes it increasingly difficult to move, both internally and externally.’
Newhouse explained that much of this work revolves around cloning structured databases so that they can be released. This repetitive cloning of data could be reduced drastically by virtualising the process using Delphix’s DaaS platform.
He continued: ‘We have been doing a lot of work over the last five years in terms of trying to use virtualisation to give us more flexibility in our compute activities. Essentially 99.9 per cent of this data is basically identical in each of these clones, except for the bits that each researcher was manipulating.’
Newhouse concluded: ‘We create what is essentially a virtual clone, because all that is unique to that particular instance is the differences that have been applied to it. When someone pulls data out of the table, they pull the original data and then you impose the changes that this developer has made. This saves you a lot of storage.’
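The idea Newhouse describes can be illustrated with a minimal copy-on-write sketch in Python. This is not Delphix’s implementation: the `VirtualClone` class, its methods and the sample gene records are all hypothetical, chosen only to show how a clone can store nothing but its own changes while reads fall through to a shared original.

```python
class VirtualClone:
    """Hypothetical copy-on-write view over a shared, read-only base table."""

    def __init__(self, base):
        self._base = base      # shared original data, never modified by a clone
        self._changes = {}     # only the rows this clone has written

    def read(self, key):
        # Serve this clone's own change if there is one,
        # otherwise fall through to the shared original.
        if key in self._changes:
            return self._changes[key]
        return self._base[key]

    def write(self, key, value):
        # Writes land in the clone's private overlay, not in the base.
        self._changes[key] = value

    def unique_rows(self):
        # Number of rows that actually cost this clone extra storage.
        return len(self._changes)


# Illustrative data only: 1,000 made-up records shared by two clones.
base = {f"gene{i}": f"sequence{i}" for i in range(1000)}
clone_a = VirtualClone(base)
clone_b = VirtualClone(base)

clone_a.write("gene7", "edited")

print(clone_a.read("gene7"))   # the change made in clone_a
print(clone_b.read("gene7"))   # the untouched original
print(clone_a.unique_rows())   # rows stored privately by clone_a
```

In this toy example each clone sees all 1,000 records, but only the single edited row is stored separately, which is the source of the storage saving described in the quote.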
‘Delphix enables each developer to deploy his or her own temporary environment on demand, to do independent exploratory work, benchmarking or development. This kind of agility was just not possible without Delphix,’ said Manuela Menchi of EMBL-EBI’s database team.
In addition, the technology allows the EBI production teams to prepare and release research data faster and more frequently than ever before. It used to take up to three months to prepare a data release. Much of this time was spent passing copies of databases from one team to another, adding extra information about different molecules and interactions along the way. Now, the process is much faster.
‘Our projection is that Delphix will reduce the data release timeframe by 20 per cent, allowing some data resources to make an extra release – or more – every year with no additional development or curation staff. Reputation is the currency of research, and our users demand reliable and fast delivery,’ said Newhouse.
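The arithmetic behind the ‘extra release’ can be checked with a short Python sketch. It rests on two assumptions that are not in Newhouse’s statement: that a release cycle takes the full three months (taken here as 90 days) mentioned as the upper bound, and that releases follow one another back to back. The function names are illustrative.

```python
DAYS_PER_YEAR = 365


def reduced_cycle_days(cycle_days, reduction_percent):
    """Length of a release cycle after cutting it by the given percentage."""
    return cycle_days * (100 - reduction_percent) // 100


def full_cycles_per_year(cycle_days):
    """How many complete back-to-back release cycles fit in one year."""
    return DAYS_PER_YEAR // cycle_days


baseline_days = 90  # assumption: 'up to three months' taken as 90 days
faster_days = reduced_cycle_days(baseline_days, 20)  # projected 20 per cent cut

print(faster_days)                          # 72
print(full_cycles_per_year(baseline_days))  # 4
print(full_cycles_per_year(faster_days))    # 5
```

Under these assumptions a 20 per cent shorter cycle moves a resource from four to five releases a year, consistent with the projection of an extra release.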
EMBL-EBI initially deployed the Delphix Data as a Service platform three years ago, and now hosts over 50 virtual data environments supporting test and development operations. These environments have been used largely to benchmark the technology, which the EBI now plans to extend further across the institute.
Newhouse said: ‘The trials that we have been doing to date suggest that we could save 70 per cent of our storage by making wider use of this technology to allow us to use our databases more efficiently.’
Newhouse continued: ‘The teams that have been using it over the last couple of years have been very happy with the usability, the reliability, and the impact on the amount of storage that we are using. That gave us enough confidence to scale up the activities across the rest of the test/development organisation.’
As the volumes of life science data grow, many other organisations will need strategies to cope with data management. ‘This starts to become critical when you start to look at personalised medicine, because you are relying a lot more on the statistical inferences of behaviour,’ stated Newhouse.
Personalised medicine can be employed to target specific drugs and treatment to patients, based on a number of different data sources such as genetic information, lifestyle, and even geographical location. This requires the analysis and management of many large data sets that must be integrated and understood together to derive real insight into patient outcomes.
A particular focus of personalised medicine is preventing sickness and disease by actively informing people of the potential risks they may face based on their lifestyle and genetics, and also on factors such as geography and pollution. The Biomax Symposium, held in Martinsried near Munich, Bavaria, in September 2014, highlighted the use of personalised medicine but also the technical challenges facing medical professionals and scientists trying to implement these systems.
‘Treating the individual, with the knowledge of all’ and ‘Can informatics turn data into medical treatment?’ both focus on the potential benefits of personalised medicine, while highlighting the significant challenges in delivering computational and storage resources as well as intelligent analytical software which can analyse all of this data efficiently.
According to Newhouse: ‘This can help to start targeting medicines and drugs much more effectively to an individual than we have at the moment and better targeting would have an immediate impact in terms of the effectiveness of the healthcare budget.’