Needleman-Wunsch
What it is
Needleman-Wunsch is a dynamic programming algorithm for global sequence alignment, finding the optimal alignment between two entire sequences. It aligns sequences end-to-end, accounting for all positions.
How it works
Algorithm:
- Create scoring matrix; initialise first row and column with cumulative gap penalties
- For each cell (i, j), compute:
- Match/mismatch: dp[i-1][j-1] + score(A[i-1], B[j-1])
- Insertion: dp[i-1][j] + gap_penalty
- Deletion: dp[i][j-1] + gap_penalty
- dp[i][j] = max of above
- Bottom-right cell contains optimal alignment score
- Backtrack from bottom-right to reconstruct alignment
Time: O(mn), space: O(mn). Scores configurable; typical: +1 match, -1 mismatch, -1 gap.
[illustrate: Complete scoring matrix with initialisation and traceback path from bottom-right to top-left]
Example
Aligning “GCATGC” vs “GATTACA”:
Optimal global alignment might be:
G-CATGC
GATTACA
With scores for each match, mismatch, and gap summed.
Variants and history
Developed by Needleman and Wunsch (1970). Seminal work in bioinformatics and sequence analysis. Similar to Levenshtein but with configurable scoring. Affine gap penalties (fixed + per-gap) improve biological realism. Faster heuristics (FASTA, BLAST) use seed-and-extend strategies.
When to use it
Comparing full-length sequences where global alignment is appropriate. Bioinformatics: DNA/RNA/protein alignment. More appropriate than local alignment (Smith-Waterman) when sequences are similar length and full alignment matters. Too slow for large databases; use heuristics.