Smith-Waterman

What it is

Smith-Waterman is a dynamic programming algorithm for local sequence alignment, finding the best-matching subsequence regions between two sequences. Unlike global alignment (Needleman-Wunsch), it identifies locally similar regions.

How it works

Algorithm:

  1. Create scoring matrix; initialise to 0 (local alignment starts fresh anywhere)
  2. For each cell (i, j), compute:
    • Match/mismatch: dp[i-1][j-1] + score(A[i-1], B[j-1])
    • Insertion: dp[i-1][j] + gap_penalty
    • Deletion: dp[i][j-1] + gap_penalty
    • Zero (start local alignment here)
    • dp[i][j] = max of above
  3. Find maximum value in matrix; backtrack to reconstruct alignment

Time: O(mn), space: O(mn). Scores are configurable; typical: +1 match, -1 mismatch, -1/-2 gap.

[illustrate: Scoring matrix showing local alignment region with highest score; traceback path highlighted]

Example

Aligning DNA: “ACGTACGT” vs “CGTACG”

Local alignment might find “CGTAC” region in both, ignoring mismatches at extremes.

Variants and history

Developed by Smith and Waterman (1981). Computationally expensive but accurate. Widely used in BLAST and other bioinformatics tools. Gap penalties can be linear or affine (fixed + per-character). Heuristic variants for speed.

When to use it

Bioinformatics: DNA, RNA, protein sequence alignment. Finding conserved domains and homologous regions. More useful than global alignment when comparing sequences of different lengths or when local regions are of interest. Too slow for large-scale database searches (use BLAST).

See also