Commit
What it is
A commit is the operation that makes recently indexed documents durable, that is, persisted to disk in a way that survives a crash or restart. In Lucene, a commit fsyncs the current segments and writes a new commit point that lists them. Elasticsearch calls the same operation a flush, and it additionally clears the write-ahead transaction log (translog).
Commit must be distinguished from refresh: a refresh makes documents visible to search (near real-time) without durability guarantees. A commit makes them durable but is more expensive.
How it works
Lucene’s write path has three stages:
- Buffer: documents accumulate in an in-memory index buffer.
- Refresh: the buffer is flushed to a new, searchable segment in the OS page cache. Documents are now visible to search but not yet on durable storage. Elasticsearch refresh_interval defaults to 1 second.
- Commit (flush): all dirty segments are fsync’d to disk; a new segments_N commit point file is written listing all current segments; the transaction log is cleared. This is the true durability boundary.
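The three stages can be sketched as a toy in-memory model. This is an illustration only, not how Lucene or Elasticsearch is implemented: the class name ToyShard and all of its fields and methods are hypothetical, and the model assumes the translog itself survives the crash.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of the write path (hypothetical, for illustration only)."""
    buffer: list = field(default_factory=list)      # stage 1: in memory, not searchable
    searchable: list = field(default_factory=list)  # stage 2: visible to search
    committed: list = field(default_factory=list)   # stage 3: covered by a commit point
    translog: list = field(default_factory=list)    # operations since the last flush

    def index(self, doc):
        # Every write lands in the buffer and is also appended to the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Make buffered documents searchable; nothing is made durable here.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def flush(self):
        # Commit: refresh, record everything searchable as committed, clear the translog.
        self.refresh()
        self.committed = list(self.searchable)
        self.translog.clear()

    def crash_and_recover(self):
        # Only committed documents and the translog survive (assumption of this toy).
        recovered = ToyShard(committed=list(self.committed),
                             translog=list(self.translog))
        recovered.searchable = list(recovered.committed)
        # Replay the operations written since the last flush.
        recovered.buffer.extend(recovered.translog)
        recovered.refresh()
        return recovered
```

In this model, a document indexed after the last flush is absent from committed but is restored by crash_and_recover() because it is still in the translog.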
In Elasticsearch, there are two types of commit:
Hard commit (flush): Calls fsync, writes the Lucene commit point, clears the translog. Triggered by POST /index/_flush or automatically when the translog reaches a size threshold (index.translog.flush_threshold_size, default 512MB in the 7.x series; check the documentation for your version).
Soft commit (refresh): Makes segments searchable without syncing to disk. Triggered by POST /index/_refresh or the refresh parameter on index requests.
[illustrate: timeline showing three stages — documents enter memory buffer (stage 1) → refresh creates searchable segment in page cache (stage 2, documents appear in search) → flush writes fsync’d commit point and clears translog (stage 3, documents survive crash) — with crash recovery arrow showing that data between last flush and crash is replayed from the translog]
Example
Indexing workflow for a bulk load:
- Disable automatic refresh (refresh_interval: -1) to avoid creating many small segments.
- Bulk index all documents.
- Call POST /index/_refresh to make all documents searchable.
- Call POST /index/_flush to make them durable and clear the translog.
- Restore refresh_interval to its previous value so that later writes become searchable again.
This pattern maximises indexing throughput by avoiding per-document or per-second refresh overhead during the load.
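As a rough sketch, the workflow can be written down as an ordered list of REST requests. The function name bulk_load_plan and its parameters are hypothetical, and the function only builds the requests without sending them. The plan also restores refresh_interval at the end, with 1s as an assumed value.

```python
import json


def bulk_load_plan(index, docs, restore_interval="1s"):
    """Return (method, path, body) tuples for the bulk-load workflow.

    Hypothetical helper: it builds requests only; sending them is left to
    whichever HTTP client is in use.
    """
    # The _bulk body is newline-delimited JSON: an action line, then the document.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    bulk_body = "\n".join(lines) + "\n"  # the bulk API expects a trailing newline

    def refresh_setting(value):
        return json.dumps({"index": {"refresh_interval": value}})

    return [
        ("PUT", f"/{index}/_settings", refresh_setting("-1")),   # disable refresh
        ("POST", f"/{index}/_bulk", bulk_body),                  # load documents
        ("POST", f"/{index}/_refresh", None),                    # make them searchable
        ("POST", f"/{index}/_flush", None),                      # make them durable
        ("PUT", f"/{index}/_settings", refresh_setting(restore_interval)),
    ]
```

Keeping the plan as plain data makes the order of operations easy to review before anything is sent to a cluster. A real load would normally split the documents across several _bulk requests rather than one.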
Variants and history
Elasticsearch pairs Lucene commits with a translog, following the write-ahead logging (WAL) pattern standard in databases. The translog is functionally equivalent to a WAL: in the event of a crash before a hard commit, the translog is replayed to recover the un-flushed documents.
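The translog itself must be fsync’d to provide this guarantee, and index.translog.durability controls when that happens. With the default, request, the translog is fsync’d before each index, delete, update, or bulk request is acknowledged. Setting it to async fsyncs in the background every index.translog.sync_interval (default 5s), improving write throughput at the cost of possibly losing the writes acknowledged since the last sync if the node crashes.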
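A minimal sketch of an index settings body for translog durability follows. It assumes the two documented values of index.translog.durability, request and async, and the companion setting index.translog.sync_interval. The helper name translog_settings is hypothetical, and whether each setting can be changed on a live index depends on the Elasticsearch version.

```python
def translog_settings(durability="request", sync_interval="5s"):
    """Build an index settings body for translog durability (hypothetical helper)."""
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    body = {"index": {"translog": {"durability": durability}}}
    if durability == "async":
        # With async durability, up to one sync_interval of acknowledged
        # writes may be lost if the node crashes.
        body["index"]["translog"]["sync_interval"] = sync_interval
    return body
```

The sync interval is only included for async durability, since that is the mode in which it bounds the potential data loss.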
When to use it
- Explicit flush after bulk indexing: after a large bulk load, call _flush to clear the translog and ensure durability.
- Tuning refresh_interval: increase to 30s or -1 during heavy indexing; reduce to 1s (or less) for near real-time search requirements.
- Monitoring translog size: a large translog means a large amount of uncommitted data, which lengthens recovery after a crash. Check via GET /index/_stats/translog, which reports translog size and operation counts.
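Translog monitoring can be sketched as a small check over an index stats response. This assumes the response has the shape indices, then the index name, then primaries, then translog, with the fields size_in_bytes and uncommitted_size_in_bytes; verify those field names against your Elasticsearch version. The function names, the sample response, and the 512MB threshold are all hypothetical.

```python
def uncommitted_translog_bytes(stats, index):
    """Read the uncommitted translog size for one index from a stats response."""
    translog = stats["indices"][index]["primaries"]["translog"]
    # Assumed field names; fall back to the total translog size if absent.
    return translog.get("uncommitted_size_in_bytes",
                        translog.get("size_in_bytes", 0))


def flush_recommended(stats, index, threshold_bytes=512 * 1024 * 1024):
    """Return True when the uncommitted translog reaches the chosen threshold."""
    return uncommitted_translog_bytes(stats, index) >= threshold_bytes


# Hand-written sample for illustration, not real cluster output.
sample_stats = {
    "indices": {
        "logs": {
            "primaries": {
                "translog": {
                    "operations": 1200,
                    "size_in_bytes": 700_000_000,
                    "uncommitted_operations": 900,
                    "uncommitted_size_in_bytes": 600_000_000,
                }
            }
        }
    }
}
```

Passing the parsed JSON body of a stats call to flush_recommended gives a simple signal for alerting. The threshold is a judgment call and would normally track the configured flush_threshold_size.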