Needleman-Wunsch

What it is

Needleman-Wunsch is a dynamic programming algorithm for global sequence alignment, finding the optimal alignment between two entire sequences. It aligns sequences end-to-end, accounting for all positions.

How it works

Algorithm:

  1. Create scoring matrix; initialise first row and column with cumulative gap penalties
  2. For each cell (i, j), compute:
    • Match/mismatch: dp[i-1][j-1] + score(A[i-1], B[j-1])
    • Insertion: dp[i-1][j] + gap_penalty
    • Deletion: dp[i][j-1] + gap_penalty
    • dp[i][j] = max of above
  3. Bottom-right cell contains optimal alignment score
  4. Backtrack from bottom-right to reconstruct alignment

Time: O(mn), space: O(mn). Scores configurable; typical: +1 match, -1 mismatch, -1 gap.

[illustrate: Complete scoring matrix with initialisation and traceback path from bottom-right to top-left]

Example

Aligning “GCATGC” vs “GATTACA”:

Optimal global alignment might be:

G-CATGC
GATTACA

With scores for each match, mismatch, and gap summed.

Variants and history

Developed by Needleman and Wunsch (1970). Seminal work in bioinformatics and sequence analysis. Similar to Levenshtein but with configurable scoring. Affine gap penalties (fixed + per-gap) improve biological realism. Faster heuristics (FASTA, BLAST) use seed-and-extend strategies.

When to use it

Comparing full-length sequences where global alignment is appropriate. Bioinformatics: DNA/RNA/protein alignment. More appropriate than local alignment (Smith-Waterman) when sequences are similar length and full alignment matters. Too slow for large databases; use heuristics.

See also