Smith-Waterman
What it is
Smith-Waterman is a dynamic programming algorithm for local sequence alignment, finding the best-matching subsequence regions between two sequences. Unlike global alignment (Needleman-Wunsch), it identifies locally similar regions.
How it works
Algorithm:
- Create scoring matrix; initialise to 0 (local alignment starts fresh anywhere)
- For each cell (i, j), compute:
- Match/mismatch: dp[i-1][j-1] + score(A[i-1], B[j-1])
- Insertion: dp[i-1][j] + gap_penalty
- Deletion: dp[i][j-1] + gap_penalty
- Zero (start local alignment here)
- dp[i][j] = max of above
- Find maximum value in matrix; backtrack to reconstruct alignment
Time: O(mn), space: O(mn). Scores are configurable; typical: +1 match, -1 mismatch, -1/-2 gap.
[illustrate: Scoring matrix showing local alignment region with highest score; traceback path highlighted]
Example
Aligning DNA: “ACGTACGT” vs “CGTACG”
Local alignment might find “CGTAC” region in both, ignoring mismatches at extremes.
Variants and history
Developed by Smith and Waterman (1981). Computationally expensive but accurate. Widely used in BLAST and other bioinformatics tools. Gap penalties can be linear or affine (fixed + per-character). Heuristic variants for speed.
When to use it
Bioinformatics: DNA, RNA, protein sequence alignment. Finding conserved domains and homologous regions. More useful than global alignment when comparing sequences of different lengths or when local regions are of interest. Too slow for large-scale database searches (use BLAST).