Course Slides
(updated as of Fall 2022 - subject to further updates)
Note: Some slides are adapted from https://www.bioinformaticsalgorithms.org/
All pseudocodes in the slides are provided as guidelines, not meant for direct “copy-paste” implementation.
- 0- Introduction
- 0b- Introduction
- 1- Motif Finding & Algorithm Design Techniques
- 2- Exact String Matching - Knuth-Morris-Pratt and Boyer-Moore
- 3- Exact String Matching - Rabin-Karp, Finite Automata. Shift/And
- 4- Pattern Matching - Hashing, keyword trees, and suffix trees, suffix arrays, Aho-Corasick
- 5- Burrows-Wheeler Transformation and Ferragina-Manzini index
- 6- Dynamic programming, Manhattan grid, edit distance, longest common subsequence
- 7- Global alignment, affine gap penalties, local alignment, Needleman-Wunsch and Smith-Waterman
- Needleman-Wunsch with affine gaps sample
- Smith-Waterman samples
- 8- Banded alignment, linear space alignment, block edit distance, Four Russians method
- 9- Multiple sequence alignment
- 10- Phylogenetic tree construction: Guide trees, evolutionary trees - Neighbor-Joining and UPGMA, parsimony, Sankoff and Fitch algorithms
- 11 - Heuristic sequence search. Short introduction to BLAST. Hash table indexes, minimizers. Chaining
- 12 - MEMs, MUMs, BWA-MEM, minimap2, K-mer data structures
- 13- Graphs in genome analysis. OLC, de Bruijn, string graphs, aligning reads to graphs
- 14- Applications: short introduction to genome sequencing. Current platforms and data types. Standard file formats
- 15- Applications: programming libraries, application-specific programming languages