The science of structural biology was born 70 years ago with the publication of a Nature paper by a Cambridge professor, John D. Bernal, and his student, Dorothy Crowfoot. After describing crystals of a small protein, pepsin, Bernal and Crowfoot wrote: 'It is clear that we [now] have the means of... arriving at far more detailed conclusions about protein structure than previous physical or chemical methods have been able to give.' It was more than 20 years later that the first protein structure was published, and it was not until the 1990s that the trickle of protein structures became a deluge. But it now seems that Bernal and Crowfoot's prophecy was not bold enough. Protein structure has given us unique insights into protein function, mechanism and evolution, and drugs such as HIV protease inhibitors. Biologists owe their understanding of these structures, and the intimate relationship between protein sequence, structure and function, to a few pioneering structural biologists. One of these is Janet Thornton, director of the European Bioinformatics Institute, near Cambridge, UK.
Throughout her distinguished research career, Thornton has aimed to understand biology at a molecular level, by using computers to study protein sequences and structures (what is now known as bioinformatics). She was one of the first to attempt to classify structures and to describe them in terms of their component motifs. 'I have always been interested in understanding how things work, and finding patterns in nature', she says. As an undergraduate physics student at Nottingham in the late 1960s, she was drawn to the developing field of biophysics, at the intersection of physics and biochemistry. After a PhD at the University of London, studying the structures of nucleotides, she moved to Oxford to work with Professor (later Sir) David Phillips. 'My first job in Oxford was as a system administrator', she remembers, 'but David [Phillips] gave me immense freedom to work in protein structure research.' One of her colleagues at Oxford was the former Dorothy Crowfoot - by then, as Dorothy Hodgkin, winner of the Nobel Prize for her work on the structure of vitamin B12.
Thornton was asked to study the sequence of an enzyme called triose phosphate isomerase, known to biochemists worldwide as 'TIM'. While she was engaged in this work, her colleagues, Phillips and a student, Ian Wilson, solved the enzyme's structure. It is almost impossible for younger structural biologists to imagine the breakthrough that this represented then, when only a handful of structures were known. The structure was revealed as two concentric barrels: eight strands in an inner circle, surrounded by eight helices in an outer one. 'I can remember thinking how wonderful this structure is, and how elegantly it folds round', says Thornton. 'However, protein structures are not only beautiful, but also useful; you can often use structure to understand how they work.' In this particular case, it was clear that the double barrel had evolved so that amino acids from different parts of the protein's sequence that were involved in the enzyme reaction could come close together in space.
Collecting and collating information
This structure, which quickly became known as the 'TIM barrel', is of more than aesthetic, or even functional, importance. We now know that an enormous number of proteins take up this fold; all of them are enzymes. The TIM barrel is one of only a few of these very common folds, or 'super-folds'. Another example is the immunoglobulin fold, a collection of near-parallel strands that is found in many proteins in the mammalian immune system. Although, in most cases, proteins with similar folds can be assumed to have evolved from a common ancestor, this is not necessarily true of the super-folds. These are instead examples of convergent evolution, where nature seems to have come up with the same solution at several different points in evolutionary history.
The structures of several super-folds were first solved during Thornton's time in Oxford, and the seeds of the interest in structure classification that has influenced her research ever since were sown there. 'At Oxford, I began to collect information about each new structure as it was published,' she says.
In 1980, Thornton moved to work with Professor (now Sir) Tom Blundell at Birkbeck College, London, mainly for family reasons. Her children had been born whilst she was at Oxford, and she was looking for a way to combine her family responsibilities with her scientific career. 'Tom [Blundell] was very supportive, but very demanding, when I was a part-time lecturer', she remembers. 'He treated me as a group leader, even when I had a group of one'. At Birkbeck, she carried on collecting and classifying information about each new structure as it was published. She developed a hierarchical system, grouping these structures into families. 'Initially I kept a handbook - a big, blue book - containing details of every protein structure I knew about,' she says.
By the mid-1990s, Thornton was known as one of the world's authorities on protein structure classification, but much of her work was languishing unpublished. She already had such a mass of data that no journal was prepared to publish it, and she didn't have time to write a book. Then along came the web, and the rest was history. By then, she had taken up a chair at University College, London, across the road from Birkbeck, and Christine Orengo, now a professor there, had arrived to work with her as a postdoc. The data that was regarded as too unwieldy to be published on paper was ideally suited for re-formatting into a web-based database. This electronic version of the 'big, blue book' is CATH, one of the most widely used protein-structure databases. The acronym, which stands for Class, Architecture, Topology and Homology, illustrates the hierarchical nature of the database. Each clearly defined protein family is given four numbers. 'CATH numbers', as they are called, are - unlike URLs - 'big-endian', with the Class number at the left-hand end representing the most general aspect of the structure.
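The 'big-endian' layout of a CATH number can be made concrete with a short sketch. The Python below is illustrative only and is not taken from the CATH software: the names CathNumber, parse_cath and shared_level are hypothetical, and the four-part numbers in the usage note are sample values chosen to show the idea. It splits a number into its four levels and reports the most specific level at which two numbers still agree, reading from the general (left) end.

```python
from typing import NamedTuple, Optional

# The four levels named by the CATH acronym, most general first.
LEVELS = ("Class", "Architecture", "Topology", "Homology")


class CathNumber(NamedTuple):
    """Hypothetical container for a four-part CATH number."""
    class_: int
    architecture: int
    topology: int
    homology: int


def parse_cath(text: str) -> CathNumber:
    """Split a dotted string such as '3.20.20.70' into its four levels."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {text!r}")
    return CathNumber(*(int(part) for part in parts))


def shared_level(a: CathNumber, b: CathNumber) -> Optional[str]:
    """Return the deepest level at which a and b agree, or None if even
    the Class numbers differ. Comparison runs left to right because the
    left-hand end is the most general."""
    deepest = None
    for name, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        deepest = name
    return deepest
```

With these definitions, shared_level(parse_cath('3.20.20.70'), parse_cath('3.20.20.140')) returns 'Topology': the two numbers agree on Class, Architecture and Topology and differ only in the final, most specific number. Two numbers with different first digits share nothing, and the function returns None.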
'Threading' process unveiled
At UCL, Thornton built up a large group of talented scientists and programmers, who, together, developed many more of the tools that protein scientists worldwide rely on today. These include Roman Laskowski's Procheck, for evaluating the quality of protein structures, and David Jones' Threader. Jones, who, like Orengo, is now a professor at UCL, coined the term 'threading' to describe a technique of predicting protein structure in which a protein sequence is physically 'threaded' onto a structure backbone and the resulting structure evaluated. The descendant of the original Threader is considered one of the most accurate structure prediction tools available today.
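The idea of threading can be sketched in a few lines of Python. This is a toy under stated assumptions, not Threader's actual method: the template is reduced to a string of 'b' (buried) and 'e' (exposed) positions, the scoring rule simply rewards hydrophobic residues in buried positions and other residues in exposed ones, and no gaps are allowed. The names HYDROPHOBIC, position_score and thread are hypothetical.

```python
from typing import Tuple

# Assumed, simplified set of hydrophobic amino acids (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")


def position_score(residue: str, buried: bool) -> int:
    """Toy scoring rule: +1 if the residue suits its environment
    (hydrophobic and buried, or non-hydrophobic and exposed), else -1."""
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def thread(sequence: str, template: str) -> Tuple[int, int]:
    """Slide the sequence along the template without gaps and return
    (best_score, best_offset). The template is a string of 'b' (buried)
    and 'e' (exposed) positions; ties go to the smallest offset."""
    if len(sequence) > len(template):
        raise ValueError("sequence is longer than the template")
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(
            position_score(residue, template[offset + i] == "b")
            for i, residue in enumerate(sequence)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best
```

For example, thread('KDL', 'bbeebe') returns (3, 2): placed at offset 2, the two charged residues sit on exposed positions and the leucine on a buried one. A real threading program evaluates far richer energy terms and allows gaps, but the principle of placing a sequence on a known backbone and scoring the fit is the same.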
Many academic bioscientists dream of starting successful biotechnology companies, but few succeed, and even fewer are 'head-hunted' by industry. Thornton is one of this minority. 'I was approached by a group of investors who wanted to start a biotech company at University College, and asked if I would be interested in setting it up', she remembers. The company - Inpharmatica - now employs about 100 people, mostly in central London. Its core technology began with Biopendium, a comprehensive and user-friendly package for annotating protein sequences with information about sequence features, structure, and ligand binding, and for selecting druggable targets. The company is increasingly focused on bioinformatics-driven drug discovery in specific protein families, and it is looking to take its first molecules into clinical trials before long. Although Thornton is no longer CSO, she still works with the board. 'It has been great fun to do', she says. 'I meet different types of people working for Inpharmatica, and its challenges are just as big as those in academia. But basic research will always remain my main focus.'
In 2001, Thornton became the director of the European Bioinformatics Institute, which shares its site with the Wellcome Trust Sanger Institute, where about a quarter of the human genome was sequenced. She is still developing new techniques for organising the vast torrents of structural information that are now arising from structural genomics programmes, where protein structures are studied 'at industrial scale'. She is a partner in the Midwest Structural Genomics Consortium, which has more than 140 structures to its credit, and is developing techniques for predicting a protein's function from its structure. 'Until about ten years ago, you could guarantee that you would know the function, in some detail, of any protein that you wanted to work on the structure of,' she says. 'Now, with high-throughput programmes, we often find that we are working on proteins with no known function. We know that the fold of a protein cannot determine its detailed function; for example, proteins with the same TIM barrel fold catalyse more than 60 different reactions. But we can develop tools to pick out the binding sites on a protein, and determine which other molecules bind to it and sometimes even what its biochemical function is.' In a recent blind test, these tools were able to predict some functional information correctly for more than 70 per cent of a set of 30 protein structures.
Asked about her achievements, Thornton prefers to point to those of her group members. 'I have been lucky to be surrounded by talented young graduate students and postdocs, and trained them to appreciate the pleasure of doing science. That is an achievement to be proud of,' she says. It is hardly surprising that many of these students and postdocs, Professors Orengo and Jones among them, have developed very successful independent research careers.
Thornton has now been at the EBI for half of her five-year term. She is still undecided about her plans for when that contract expires. 'At some point I will have to decide whether to stay at the EBI with administrative responsibilities and challenges or return to being a full-time research group leader', she says. Whatever she decides, she is bound to face new challenges in her endeavour to find structure and sense in the marvellous complexity that is life at the molecular level.