With the advances made in genomics over the past decade, the emphasis is slowly shifting to proteomics to make sense of the genetic information encoded by DNA. The proteome is the total complement of proteins expressed by a genome, cell, tissue or organism, and within that there is enormous complexity. Proteins are modified and dressed up so that the 25,000 genes making up the human genome translate into 300,000 to 400,000 different proteins. These proteins are continuously being made and degraded – a mouse liver regenerates itself every day through protein synthesis and degradation – and this flux of proteins within cells is an important research area within proteomics.
Depending on the experimental method, proteomics can generate vast amounts of data. Peptide mass fingerprinting (PMF) is one such method, and it allows multiple proteins to be analysed at once. The technique involves cleaving proteins into their constituent peptides and using mass spectrometry to measure the peptide masses. The resulting spectra are then compared against those of known proteins held in a database. Professor Rob Beynon, head of the proteomics and functional genomics group at the University of Liverpool, is using FPGA technology to accelerate the spectrum analysis and database searching involved in PMF.
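The matching step in PMF can be illustrated with a minimal Python sketch. It is not Beynon's FPGA implementation, and it makes several assumptions that are not in the article: trypsin is taken as the cleaving enzyme, the function names and the toy sequences are hypothetical, neutral monoisotopic masses are compared directly rather than ion m/z values, and candidates are ranked by a simple count of matching peptides where real search tools use probabilistic scoring.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (assumed enzyme: trypsin)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.01):
    """Number of observed masses within `tolerance` Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.01):
    """Name of the database entry whose digest explains the most observed masses."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tolerance),
    )


# Hypothetical two-entry database with made-up sequences.
toy_database = {"toy_a": "GASKPVTRLLK", "toy_b": "MKWYR"}
observed = [peptide_mass(p) for p in tryptic_digest(toy_database["toy_a"])]
print(best_match(observed, toy_database))
```

Every candidate protein is digested and scored independently of the others, which suggests why the search lends itself to parallel hardware, although the article does not describe how the FPGA design is organised.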
Speaking at the New Technologies for Protein Analysis Workshop, which took place on 10 December 2008, Beynon explained that using FPGA technology, raw mass spectrometry data can be analysed in microseconds, far quicker than conventional processing. 'Searching an entire protein database takes 240 milliseconds – 1,600 times faster than using C programming running on a 3GHz processor,' he said.
Beynon commented that reaching a good understanding of protein interactions will be nowhere near as easy as it was with DNA, and that improving research capabilities in proteomics depends on having the technology to analyse data as fast as it is generated.
With the large amounts of data generated, genomics and proteomics have joined disciplines such as meteorology and astronomy as producers of petabyte data volumes. Sarah Hunter, InterPro team leader at the European Bioinformatics Institute (EBI), commented at the workshop that data storage, data transfer, analysis and interpretation, and providing access to the data are all issues that organisations involved in proteomics research must take seriously. EBI is currently using cloud computing techniques to improve the handling of data.
Systems biology is a growing area for the study of living systems, and much proteomics data goes into building and validating complex in silico models of biological pathways. Darren Wilkinson, professor of stochastic modelling at Newcastle University, noted in his presentation that many computational systems biology (CSB) researchers work with deterministic models. There is increasing evidence, however, that many intra-cellular interactions are inherently stochastic, and only by incorporating stochastic effects into CSB models will our understanding of biological pathways be improved.
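The difference between the two modelling styles can be sketched in Python for a single protein that is synthesised at a constant rate and degraded in proportion to its copy number. This is an illustrative sketch only, not one of Wilkinson's models: the birth-death process, the rate constants `k_syn` and `k_deg`, and the function names are all assumptions chosen for the example. The stochastic version uses Gillespie's direct method, a standard way to simulate reaction events one at a time.

```python
import math
import random


def deterministic_count(t, k_syn, k_deg, x0=0.0):
    """Closed-form solution of dx/dt = k_syn - k_deg * x with x(0) = x0."""
    steady_state = k_syn / k_deg
    return steady_state + (x0 - steady_state) * math.exp(-k_deg * t)


def gillespie_birth_death(t_end, k_syn, k_deg, x0=0, rng=None):
    """Simulate one stochastic trajectory of the same process, event by event."""
    rng = rng or random.Random()
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        synthesis = k_syn          # propensity of making one molecule
        degradation = k_deg * x    # propensity of losing one molecule
        total = synthesis + degradation
        if total == 0:
            break                  # nothing can happen any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * total < synthesis:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


# Hypothetical rates: 10 molecules per unit time made, 10% of the pool degraded.
times, counts = gillespie_birth_death(50.0, k_syn=10.0, k_deg=0.1,
                                      rng=random.Random(1))
print("deterministic value at t = 50:", deterministic_count(50.0, 10.0, 0.1))
print("one stochastic run ends at copy number:", counts[-1])
```

The deterministic curve rises smoothly to `k_syn / k_deg`, while each stochastic run moves in whole-molecule steps and keeps fluctuating around that level. Repeated runs with different seeds give different trajectories, which is the kind of cell-to-cell variability a deterministic model cannot represent.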
Wilkinson's team at Newcastle University has been using single-cell time-course data from fluorescence microscopy to build stochastic models of specific cellular events. However, this type of data is difficult to obtain for large numbers of proteins, and the team is also incorporating high-throughput data into the models. One project currently under development at Newcastle University is CaliBayes, a project funded by the Biotechnology and Biological Sciences Research Council (BBSRC) that provides a software infrastructure for the calibration and validation of CSB models.
The New Technologies for Protein Analysis Workshop was organised by the Science and Technology Facilities Council's KITE Club in conjunction with the Sensors and Instrumentation KTN.