Bioinformatics as a discipline has been evolving and expanding throughout its short life. When the word was coined, some time in the 1990s, it was regarded as almost synonymous with sequence analysis. About a decade ago, bioinformatics meetings and journals were principally concerned with gene and protein sequences, how to analyse them, and what insights that analysis might give into the function of genes and the structures of the proteins they encode.
A more recent definition, taken from Merriam-Webster's Medical Dictionary, describes it as 'the collection, classification, storage, and analysis of biochemical and biological information using computers, especially as applied in molecular genetics and genomics'. But even that fails to do justice to the breadth of knowledge that now falls within the remit of bioinformatics. Organisers of the largest international bioinformatics conference last year, ISMB/ECCB 2004, were taken a little by surprise by the relative unpopularity of the sequence analysis sessions. In contrast, delegates packed into one of the smaller rooms to listen to speakers on more arcane topics such as textual analysis and ontologies. The analysis of transcriptomics and proteomics data, via microarrays and 2D gels, is another area that has become a core part of bioinformatics within only a few years.
So has this rapidly developing discipline 'come of age' as Janet Thornton, director of the European Bioinformatics Institute, now believes? Thornton, with David Jones of University College London and Michael Sternberg of Imperial College London, recently organised an extremely popular Royal Society discussion meeting, 'Bioinformatics - from molecules to systems', that explored one of the boundaries of modern bioinformatics. Systems biology is a new discipline with very close links to bioinformatics. There has been plenty of discussion, and some argument - not least by funding bodies - in recent years about which topics come into which area, and even whether 'systems biology' is part of bioinformatics or a discipline in its own right.
History is important, even in an area as fast-moving as twenty-first century biology. David Eisenberg of the University of California Los Angeles, one of the pioneers of protein structure analysis and prediction, set the scene at the Royal Society meeting by locating the 'birth' of bioinformatics with Fred Sanger's sequencing of bovine insulin in 1952. 'A protein is a single chemical substance... thus it is possible to assign a unique structure to the chain of insulin', wrote Sanger. Thus he implied the existence of a genetic code, months before Watson and Crick's pioneering work revealed the structural basis for that code. Three years later, the alignment of insulin from several species was published: the first protein family had been discovered, and, with it, the molecular basis of evolution.
It took more than three decades for the trickle of data that followed Sanger's first sequence to become a flood. The explosion of data from the 1990s onwards was fuelled by both unprecedented technical advances and an expansion in funding through the various genome projects. The human genome project, completed in 2003, must be one of only a few public projects ever to be finished ahead of schedule. The number of completed genome sequences now stands at over 200, although, admittedly, the vast majority of these come from bacteria. The main gene-sequence databases now contain about 50 million sequences and more than 85 billion base-pairs of genetic data, and more than 26,000 different protein structures are known. More than a hundred structures are added to the main structural database, the Protein Data Bank, each week: this is not far off the total number solved between the first release of the database in 1972 and the end of that decade. And bioinformaticians are largely rising to the challenge of 'collecting, classifying, storing, and analysing' this avalanche of data.
But it can be argued that the genomics revolution has not, or at least has not yet, produced comparable advances in understanding biological processes, particularly in practical applications such as drug design. The prediction at the turn of the millennium that 'all signalling pathways' would be known by 2005 has not been fulfilled. Within the pharmaceutical industry, molecules showing promise in vitro and in vivo are increasingly failing in the development phase, and the number of new drugs coming on to the market is decreasing. Recent problems with the COX-2 inhibitor class of anti-inflammatory drugs only emphasise the difficulty of producing drugs that must be safe when taken not for weeks or months but for the rest of patients' lives.
It is clear that understanding how one molecule, such as a drug, interacts with its target cannot be sufficient. An enzyme-inhibitor interaction, for example, takes place in the context of a metabolic pathway, a cell, a tissue, an organ, and an organism.
Systems biology can be defined trivially as the computer-based analysis or simulation of molecular data within the context of a system. Many researchers in the field, however, use their own, occasionally contradictory, definitions. Adriano Henney, director of the Pathways Capability at AstraZeneca's Alderley Park site, stresses its pharmaceutical application: 'Systems biology is a different way of doing physiology, understanding how a target will respond in the context of cellular networks.'
Steve Oliver, from the University of Manchester, asked the pertinent question at the Royal Society meeting: 'What is a system?' The eukaryotic cell is undoubtedly a biological system. Oliver has spent his research career studying yeast, and yeast cells are some of the best-understood eukaryotic cells in nature. 'Yet we are still a long way from a complete model of a single yeast cell, and we probably will be for a long time', he says. 'We therefore need to work with subsystems.' This introduces two problems. The first is simply one of terminology: what is the smallest subsystem? Put simply, is a researcher such as Sarah Teichmann from the MRC Laboratory of Molecular Biology in Cambridge, who studies the interactions of small numbers of proteins in complexes, doing systems biology, or bioinformatics?
Oliver defined a second, more serious problem: 'you can't get there from here'. In systems biology, there may be many groups simulating different subsystems - networks, metabolic pathways, organelles, or whatever - within the yeast cell. Eventually these groups will want, or need, to fit their models together. 'We will need a model yeast cell to build our subsystems into', says Oliver. 'At the moment, we still need Saccharomyces cerevisiae to do it for us.'
Oliver uses mathematical models to simulate metabolic pathways in yeast and determine which enzymes in a complex pathway have the most control over its outcome. His work has recently benefited from a £6 million grant from the UK research councils BBSRC and EPSRC to the University of Manchester, awarded to set up one of three Centres for Integrative Systems Biology in the UK. Its remit will be to develop generic systems biology technologies and test them on yeast, a widely studied model organism.
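To give a flavour of what 'control over the outcome' of a pathway can mean, here is a minimal sketch in Python. It is not Oliver's model, and the article does not say which method his group uses. It borrows one common framing, metabolic control analysis, in which each enzyme gets a flux control coefficient: the fractional change in steady-state flux per fractional change in that enzyme's level. The three-step pathway, the rate constants, the equilibrium constants, and the function names are all invented for illustration.

```python
from math import prod

# Hypothetical toy pathway: S -> X1 -> X2 -> P, with S and P held fixed.
# Each step i is assumed to follow reversible linear kinetics:
#   v_i = e_i * k_i * (x_prev - x_next / K_i)
K_CAT = [1.0, 0.2, 5.0]   # assumed rate constants k_i (illustrative values)
K_EQ = [2.0, 1.5, 3.0]    # assumed equilibrium constants K_i (illustrative)
S, P = 10.0, 1.0          # assumed fixed boundary concentrations


def steady_state_flux(enzymes):
    """Closed-form steady-state flux J for the linear chain above."""
    driving_force = S * prod(K_EQ) - P
    # Each step contributes a 'resistance' that shrinks as its enzyme level rises.
    resistance = sum(
        prod(K_EQ[i:]) / (e * k)
        for i, (e, k) in enumerate(zip(enzymes, K_CAT))
    )
    return driving_force / resistance


def flux_control_coefficients(enzymes, rel_step=1e-6):
    """Estimate C_i = (dJ/J) / (de_i/e_i) with a small relative perturbation."""
    base = steady_state_flux(enzymes)
    coefficients = []
    for i, e in enumerate(enzymes):
        perturbed = list(enzymes)
        perturbed[i] = e * (1.0 + rel_step)
        delta = steady_state_flux(perturbed) - base
        coefficients.append((delta / base) / rel_step)
    return coefficients


enzymes = [1.0, 1.0, 1.0]  # relative enzyme levels, all at baseline
coeffs = flux_control_coefficients(enzymes)
for i, c in enumerate(coeffs, start=1):
    print(f"enzyme {i}: flux control coefficient ~ {c:.3f}")
```

In this toy setting the coefficients sum to approximately one, and the step with the smallest assumed rate constant (enzyme 2) carries most of the control. Real pathways have nonlinear kinetics, branches, and feedback, so the steady state generally has to be found numerically rather than from a closed-form expression.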
The other two systems-biology centres, at Imperial College and the University of Newcastle, have more direct links with medicine. Researchers at the Imperial College centre will look at interactions between pathogens and the cells they infect at the molecular and cellular level; the Newcastle centre, under the leadership of Tom Kirkwood, professor of gerontology, will study ageing. Daryl Shanley, working with Kirkwood, has already produced a 'virtual ageing cell' to simulate processes involved in ageing, such as telomere shortening and oxidative damage. Shanley's models have been mounted on the web as a pilot E-science Grid project and are freely available to all potential collaborators, from industry as well as academia.
At the Royal Society meeting, Eisenberg hoped that the maturing disciplines of bioinformatics and systems biology would become integrated into medicine. 'In 10 years' time, half the participants in meetings like this one will be medically qualified', he predicted. The clear medical emphasis of two of the three systems-biology centres - although funded by non-medical research councils - is a sign that funding bodies are taking such applications seriously.
On the other side of the Atlantic, Hamid Bolouri and his group at the Institute for Systems Biology in Seattle, USA, are also looking at infectious disease, using systems biology to understand the response of macrophages to pathogens. 'Macrophages are our first line of defence; they send out signals to stimulate the rest of the immune system', he says. 'We are now able to simulate the order in which genes in macrophages are activated when pathogens bind to the receptors on their surfaces, and we are beginning to understand the signalling pathways involved.'
Thornton has summed up the relationship between bioinformatics and systems biology by turning it full circle: 'Systems will inform at the molecular level; we can learn more about molecules from looking at systems as well as vice versa'. The boundaries between bioinformatics and systems biology may be fuzzy, but they are separate, if very closely interlinked, disciplines, and they need each other. So what is in a name, after all?