Application of a Reconfigurable Computing Cluster to Next Generation Genome Sequencing

Kristian Stevens¹, Henry Chen², Terry Filiba², Peter McMahon³, Yun Song²
¹UC Davis, ²UC Berkeley, ³Stanford


Abstract

Recent advances in ultra-high-throughput sequencing technology allow researchers to generate immense amounts of raw data in the form of short reads. In this paper we study how Field Programmable Gate Arrays (FPGAs) may be used to address the computing challenges associated with next generation genome sequencing. A common prerequisite to using data generated by next generation sequencers is alignment to a reference genome. While dynamic programming (DP) alignment algorithms are generally avoided on conventional architectures due to their computational complexity, they can be tailored for efficient implementation on systolic architectures. We implemented application-specific DP algorithms for aligning data from next generation sequencers on a reconfigurable computing cluster. Each FPGA is capable of rapidly aligning multiple sequences in parallel against a long reference genome. The reconfigurable cluster proves to be scalable and capable of processing real-world datasets. We examine the advantages and practicality of this approach by benchmarking with real genomic data from a large high-throughput sequencing project. Our extensive validation showed that application-specific algorithms and computing hardware can provide better results than current heuristic methods and may be particularly useful where error rates or evolutionary divergence are high. While directly addressing the important problem of cheaply sequencing and assembling novel genomes, the methods presented are also relevant to many other "-omics" research applications.
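
To illustrate why DP alignment maps naturally onto a systolic architecture, the following is a minimal software sketch, not the FPGA design described in this paper. It assumes unit-cost edit distance and a semi-global formulation (the whole read is aligned, but the alignment may start and end anywhere in the reference); the function name and scoring are illustrative assumptions. The table is filled along anti-diagonals: every cell on a diagonal depends only on the two preceding diagonals, so a linear array of processing elements could, in principle, update all of them in the same clock cycle.

```python
def semiglobal_edit_distance(read, ref):
    """Return (best_distance, end_position) for aligning `read`
    against the best-matching substring of `ref`.

    Hypothetical illustration: unit costs for mismatch, insertion
    and deletion are an assumption, not the paper's scoring scheme.
    """
    m, n = len(read), len(ref)
    # D[i][j] = lowest cost of aligning read[:i] to a substring of ref ending at j.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i  # read prefix against an empty reference
    # Row 0 stays at zero: the alignment may begin anywhere in the reference.

    # Sweep anti-diagonals (i + j == d). Cells on one diagonal are mutually
    # independent, which is the parallelism a systolic array exploits.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            substitute = D[i - 1][j - 1] + (read[i - 1] != ref[j - 1])
            D[i][j] = min(substitute, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # The best alignment ends at the cheapest cell of the last row.
    best_j = min(range(n + 1), key=lambda j: D[m][j])
    return D[m][best_j], best_j


if __name__ == "__main__":
    print(semiglobal_edit_distance("ACGT", "TTACGTTT"))
```

In this sketch the inner loop runs sequentially; in a systolic implementation each value of `i` would correspond to one processing element holding one read base while reference bases stream past, so the time per read would scale with the reference length rather than with the product of the two lengths.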