Integrated approaches for genomic variation discovery using
high throughput sequencing
- European Union Marie Curie Actions Career Integration Grant
(PCIG10-GA-2011-303772), 2012-2016
- PI: Can Alkan
- Students: Marzieh Eslami Rasekh, Can Fırtına
- Total €100,000 for four years.
- The goal of this project is to develop computational methods to
understand genomic variation using high throughput sequencing
(HTS) with a special focus on structural variation (SV) including
copy-number variation (CNV) and balanced rearrangements
(inversions, translocations) in the complex regions of the human
genome that are rich in repeats and duplications.
Abstract
New sequencing technologies are revolutionizing genomics, as they promise
low-cost, high-throughput sequencing of both new species and different
individuals to more fully analyze the patterns of genetic variation.
These “next-generation” platforms started to contribute to our
understanding of human genome diversity with the 1000 Genomes Project,
which employs high throughput sequencing (HTS) methods to produce the
most detailed map of human variation. Other large-scale sequencing
projects have been initiated to characterize genomes in order to assess
human genome diversity, to find genetic causes of disease, and to infer
the evolutionary history of species. Although we can now generate data
at a rate previously unimaginable, the analysis of such data is
lagging behind, as currently available algorithms for analyzing HTS data
show different strengths and biases for different classes of variation. There
is a need to forge an alliance between computer science and genomics to
devise better methods to use the massive amount of sequence data. Here
we propose to develop novel algorithms to comprehensively and quickly
discover all forms of genomic variants, including point mutations, indel
polymorphisms, and structural variation, while resolving inconsistencies
among different variants to accurately identify normal and
disease-causing variation.
Dissemination
- Accelerating
read mapping with FastHASH. Hongyi Xin, Donghyuk
Lee, Farhad Hormozdiari, Samihan Yedkar, Onur Mutlu, Can
Alkan. BMC Genomics, 14(Suppl
1):S13, 2013
- Genome
Sequencing Highlights the Dynamic Early History of Dogs.
Adam H. Freedman, Ilan Gronau, Rena M. Schweizer, Diego Ortega-Del
Vecchyo, Eunjung Han, Pedro M. Silva, Marco Galaverni, Zhenxin Fan,
Peter Marx, Belen Lorente-Galdos, Holly Beale, Oscar Ramirez, Farhad
Hormozdiari, Can Alkan,
Carles Vilà, Kevin Squire, Eli Geffen, Josip Kusak, Adam R. Boyko,
Heidi G. Parker, Clarence Lee, Vasisht Tadigotla, Adam Siepel,
Carlos D. Bustamante, Timothy T. Harkins, Stanley F. Nelson, Elaine
A. Ostrander, Tomas Marques-Bonet, Robert K. Wayne, John Novembre. PLoS
Genetics, 10(1): e1004016, 2014.
- Early postzygotic mutations
contribute to de novo variation in a healthy monozygotic twin
pair. Gülşah M Dal, Bekir Ergüner, Mahmut S
Sağıroğlu, Bayram Yüksel, Onur Emre Onat, Can
Alkan, Tayfun Özçelik. J Med Genet,
51(7):455-459, 2014.
- mrsFAST-Ultra:
a compact, SNP-aware mapper for high performance sequencing
applications. Faraz Hach*, Iman Sarrafi*, Farhad
Hormozdiari, Can Alkan, Evan
E. Eichler, S. Cenk Sahinalp. Nucl Acids Res, Jul;42(Web
Server issue):W494-500, 2014.
- Fast
and accurate mapping of Complete Genomics reads. Donghyuk
Lee*, Farhad Hormozdiari*, Hongyi Xin, Faraz Hach, Onur Mutlu, Can Alkan. Methods,
Jun;79-80:3-10, 2015.
- Shifted
Hamming Distance: a fast and accurate SIMD-friendly filter to
accelerate alignment verification in read mapping. Hongyi
Xin, John Greth, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan* and Onur
Mutlu*. Bioinformatics, [published online, Jan 10], 2015.
- Optimal
Seed Solver: Optimizing Seed Selection in Read Mapping.
Hongyi Xin, Sunny Nahar, Richard Zhu, John Emmons, Gennady
Pekhimenko, Carl Kingsford, Can
Alkan*, Onur Mutlu*. Bioinformatics, Jun
1;32(11):1632-42, 2016.
- Robustness
of massively parallel sequencing platforms. Pınar Kavak, Bayram
Yüksel, Soner Aksu, M. Oğuzhan Külekçi, Tunga Güngör, Faraz Hach, S.
Cenk Sahinalp, Turkish Human Genome Project, Can
Alkan*, M. Şamil Sağıroğlu*. PLoS ONE, Sep
18;10(9):e0138259, 2015.
- A global reference for human genetic variation. 1000
Genomes Project Consortium. Nature, Oct 1;526(7571):68-74, 2015.
- On
genomic repeats and reproducibility. Can
Firtina and Can
Alkan. Bioinformatics,
Aug 1;32(15):2243-7, 2016.
- Discovery
of large genomic inversions using long range information.
Marzieh Eslami Rasekh,
Giorgia Chiatante, Mattia Miroballo, Joyce Tang, Mario Ventura,
Chris T. Amemiya, Evan E. Eichler, Francesca Antonacci*, Can
Alkan*. BMC Genomics, Jan 10;18(1):65, 2017.
- MAGNET:
understanding and improving the accuracy of genome pre-alignment
filtering. Mohammed
Alser, Onur Mutlu*, Can
Alkan*. IPSI Transactions on Internet Research, 13(2),
2017.
- GateKeeper:
a new hardware architecture for accelerating pre-alignment in
DNA short read mapping. Mohammed
Alser, Hasan Hassan, Hongyi
Xin, Oguz Ergin, Onur Mutlu*, Can
Alkan*. Bioinformatics, Nov 1;
33(21):3355-63, 2017.