Novel algorithms and hardware designs for ultra-fast next-gen
sequence analysis
- United States of America National Institutes of Health (R01
HG006004), 2011-2015
- PI: Onur Mutlu
- Co-PI: Can Alkan
- Subaward amount (4 years): $462,847
- The goal of this project is to develop specialized hardware
architectures that accelerate the mapping of reads generated by
high-throughput sequencing platforms.
People
- Principal Investigators: Assistant Prof. Can Alkan (Bilkent U.)
and Assistant Prof. Onur Mutlu (Carnegie Mellon U.)
- Students
  - CMU: Hongyi Xin, Donghyuk Lee, Samihan Yedkar, Damla Şenol Çalı
  - UCLA: Farhad Hormozdiari
  - Bilkent: Mustafa Korkmaz, Azita Nouri, Mohammed Alser, Tuğba Doğan
Abstract
Our proposed research aims to accelerate next-generation sequence
analysis 1000-fold or more by combining our knowledge in genomic
sequence analysis, algorithm development, and computer
architecture/engineering. Our plan to address the problems of processing
unprecedented amounts of sequence data has three major components.
First, we will develop and improve sophisticated software algorithms and
tools to handle large amounts of sequence reads generated by all major
NGS platforms without sacrificing sensitivity while correcting for the
sequencing biases associated with each of the NGS platforms. Our
algorithms will also be able to map reads in the duplicated regions of
the genome and report the underlying sequence variation, an important
feature especially to characterize segmental duplications and structural
variation that no other read mapping tool can currently achieve. Second,
we will boost the performance and efficiency of our algorithms (100- to
1000-fold) by accelerating the inherently parallel computations of the
sequence search problem on the massively parallel hardware engines
available today, namely graphics processing units (GPUs). Finally, we
will design specialized hardware architectures to speed up sequence
analysis by further orders of magnitude while reducing its energy
consumption 100-fold or more.
Dissemination
- SCALCE: boosting Sequence Compression Algorithms using Locally
Consistent Encoding. Faraz Hach, Ibrahim Numanagić, Can Alkan, S. Cenk
Sahinalp. Bioinformatics, Dec 1;28(23):3051-57, 2012.
- Accelerating read mapping with FastHASH. Hongyi Xin, Donghyuk Lee,
Farhad Hormozdiari, Samihan Yedkar, Onur Mutlu, Can Alkan. BMC Genomics,
14(Suppl 1):S13, 2013.
- mrsFAST-Ultra: a compact, SNP-aware mapper for high performance
sequencing applications. Faraz Hach*, Iman Sarrafi*, Farhad Hormozdiari,
Can Alkan, Evan E. Eichler, S. Cenk Sahinalp. Nucl Acids Res,
Jul;42(Web Server issue):W494-500, 2014.
- Fast and accurate mapping of Complete Genomics reads. Donghyuk Lee*,
Farhad Hormozdiari*, Hongyi Xin, Faraz Hach, Onur Mutlu, Can Alkan.
Methods, [epub October 22], doi: 10.1016/j.ymeth.2014.10.012, 2014.
- Shifted Hamming Distance: a fast and accurate SIMD-friendly filter to
accelerate alignment verification in read mapping. Hongyi Xin, John
Greth, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan*, Onur
Mutlu*. Bioinformatics, [published online, Jan 10], 2015.
- Optimal Seed Solver: Optimizing Seed Selection in Read Mapping. Hongyi
Xin, Sunny Nahar, Richard Zhu, John Emmons, Gennady Pekhimenko, Carl
Kingsford, Can Alkan*, Onur Mutlu*. Bioinformatics,
Jun 1;32(11):1632-42, 2016.
- MAGNET: understanding and improving the accuracy of genome
pre-alignment filtering. Mohammed Alser, Onur Mutlu*, Can Alkan*. IPSI
Transactions on Internet Research, 13(2), 2017.
- GateKeeper: a new hardware architecture for accelerating pre-alignment
in DNA short read mapping. Mohammed Alser, Hasan Hassan, Hongyi Xin,
Oguz Ergin, Onur Mutlu*, Can Alkan*. Bioinformatics,
Nov 1;33(21):3355-63, 2017.
- GRIM-Filter: fast seed location filtering in DNA read mapping using
processing-in-memory technologies. Jeremie S. Kim, Damla Senol Cali,
Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan,
Oguz Ergin, Can Alkan*, Onur Mutlu*. BMC Genomics, 19(Suppl 2):89, 2018.
- Nanopore sequencing technology and tools for genome assembly:
computational analysis of the current state, bottlenecks and future
directions. Damla Senol Cali, Jeremie S. Kim, Saugata Ghose, Can Alkan*,
Onur Mutlu*. Briefings in Bioinformatics,
[epub Apr 2; doi: 10.1093/bib/bby017], 2018.