In the last few years, new DNA sequencing technologies have begun to replace the methods developed in the 1970s by double Nobel laureate Fred Sanger, which dominated the first decade of the 'genome era'. Although new technologies are introduced and modified almost daily, three companies dominate the field: 454 Life Sciences (now owned by Roche), Illumina, and the tried and trusted Applied Biosystems. Thanks to these new methodologies, DNA is being sequenced ever more quickly and cheaply: it now costs less than $2,000 and takes less than a day to sequence a gigabase. However, the data pouring off the new machines comes in much smaller fragments than those produced by Sanger sequencers. Scientists need to turn to novel approaches to assemble and analyse these 'tiny bits and pieces' of DNA.
'Tiny bits and pieces' was the title of a keynote address given by David Jaffe of the Broad Institute of MIT and Harvard, Massachusetts, USA, at the Intelligent Systems for Molecular Biology (ISMB) conference held in Toronto in July 2008. While conceding that short reads can be hard to work with, Jaffe described three already successful applications of the technology: epigenetics, analysis of DNA variation, and de novo genome assembly. He claimed that next-generation sequencing was becoming a 'general-purpose tool for answering any biological question that can be answered from a short DNA run'.
Software companies are investing heavily in methods for storing the enormous quantities of DNA data coming off these new-generation sequencers, and for assembling the short reads. They are now turning their attention to developing tools for analysing and extracting useful biological information from this data deluge. Two important players in this field are Synamatix, based in Kuala Lumpur, Malaysia, and California-based Active Motif, which owes its interest in sequencing software to its acquisition of TimeLogic. These two companies have taken strikingly different approaches: while Synamatix prides itself on producing software that performs efficiently on most hardware platforms, Active Motif has developed hardware optimised for the specialist calculations needed for high-throughput sequence analysis.
Arif Anwar, general manager of Synamatix, describes the company’s close relationship with the new sequencing technologies: 'We were founded in 2002, and we very soon realised that we were ideally placed to take advantage of these new technologies and develop fast, sensitive sequence assembly and analysis tools on ordinary PCs.' The company’s clients now include not only the large sequencing centres, but also hospitals, agro-biotech companies, and small biology labs with a single sequencer.
'As our client range has broadened, we have developed a wider range of analysis tools to match. The range of things that can be done with next-generation sequences is growing all the time,' says Anwar. The core of Synamatix’s software is the SynaBASE database and SynaWORKS analysis platform, which can map 454, Illumina or Applied Biosystems reads to both genomic and cDNA sequences. Mapping reads to cDNA offers an alternative to the 'gold standard' of microarray technology for transcription analysis, and has some advantages: 'With our technology, unlike with microarrays, you don’t need to know which genes you are looking for,' he adds.
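The idea of using mapped reads for transcription analysis can be illustrated with a toy example. The Python sketch below is not Synamatix's method, whose internals are not described here; it assumes error-free reads, exact matching, and a simple k-mer index, and the function names, the value of k and the sequences are all hypothetical. It counts how many reads fall on each reference transcript, so the set of expressed sequences emerges from the data rather than from a predefined probe list.

```python
from collections import Counter, defaultdict


def build_kmer_index(references, k):
    """Map every k-mer to the names of the reference sequences containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def count_reads(reads, references, k=8):
    """Count exact-match reads per reference (toy model: no sequencing errors)."""
    index = build_kmer_index(references, k)
    counts = Counter()
    for read in reads:
        if len(read) < k:
            continue  # too short to look up in the index
        # Use the read's first k-mer to shortlist candidate references,
        # then confirm that the whole read occurs in each candidate.
        for name in index.get(read[:k], ()):
            if read in references[name]:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical cDNA sequences and short reads, for illustration only.
    references = {
        "geneA": "ATGGCGTACGTTAGCCTAGGATCCA",
        "geneB": "ATGTTTCCCGGGAAATTTCCCGGGA",
    }
    reads = ["GCGTACGTTAGC", "CCCGGGAAATTT", "TACGTTAGCCTA", "GGGGGGGGGGGG"]
    for name, n in sorted(count_reads(reads, references).items()):
        print(name, n)
```

Real mapping tools must also tolerate mismatches, handle reads from either strand and scale to billions of bases, which is where the specialised software and hardware described in this article come in.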
Active Motif’s bioinformatics software products are all tied to custom field-programmable gate array (FPGA) hardware that has been optimised to accelerate, by up to 500 times, some of the most commonly used algorithms in the bioinformatics armoury, including hidden Markov models, Smith-Waterman and the ubiquitous BLAST. 'In conjunction with Invitrogen, we are exploring ways to apply our FPGA technology to the analysis of next-generation sequencing data,' says Chris Hoover, business development manager for TimeLogic. 'Our customers are already using their DeCypher systems to fill computational gaps in their next-generation analysis pipelines.' And, in straitened economic times, these users may be particularly grateful for a perhaps unexpected benefit of optimised hardware. 'One of our accelerators has the number-crunching power of hundreds of general-purpose CPU cores, but with power consumption no higher than a 15-watt light bulb.'
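To give a sense of why an algorithm such as Smith-Waterman rewards hardware acceleration, here is a minimal software sketch of its scoring step in Python. It is an illustration only, not TimeLogic's implementation: the scoring values (match, mismatch, gap) are arbitrary assumptions, and a simple linear gap penalty is used in place of the affine penalties common in practice.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring values; linear gap penalty (a simplifying assumption).
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: this is what makes the
            # alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical sequences sharing the short local match 'ACG'.
    print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

Every cell of the matrix has to be filled in, so the work grows with the product of the two sequence lengths. Cells along the same anti-diagonal do not depend on one another, which is the kind of regular, parallel arithmetic that dedicated hardware can exploit.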
It is encouraging that these companies are developing tools that are affordable and accessible for biologists to use, even without specialist bioinformatics support. This may be the best way of ensuring that next-generation sequencing reaches its full potential as a general-purpose tool.