Vocabulary grows sub-linearly with corpus size: the number of distinct word types can be predicted from the token count via a power law.
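
A relationship of this kind is commonly written as V = K * N^beta with 0 < beta < 1, and is often called Heaps' law (or Herdan's law). The following is a minimal sketch under that assumption. The function names `predict_vocab` and `fit_power_law`, the symbols `k` and `beta`, and the values 30.0 and 0.5 are illustrative choices, not taken from the source. The data are synthetic, generated from the formula itself, so the fit only demonstrates the mechanics.

```python
import math


def predict_vocab(n_tokens, k, beta):
    """Power-law estimate of vocabulary size: V = k * n_tokens ** beta."""
    return k * n_tokens ** beta


def fit_power_law(token_counts, vocab_sizes):
    """Fit log V = log k + beta * log N by ordinary least squares.

    Returns the pair (k, beta).
    """
    xs = [math.log(n) for n in token_counts]
    ys = [math.log(v) for v in vocab_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope in log-log space
    k = math.exp(mean_y - beta * mean_x)  # intercept mapped back from log space
    return k, beta


# Synthetic measurements generated from assumed parameters (k=30.0, beta=0.5).
token_counts = [1_000, 10_000, 100_000, 1_000_000]
vocab_sizes = [predict_vocab(n, 30.0, 0.5) for n in token_counts]

k_hat, beta_hat = fit_power_law(token_counts, vocab_sizes)

# Sub-linear growth: doubling the corpus less than doubles the vocabulary.
growth_ratio = predict_vocab(2_000_000, k_hat, beta_hat) / predict_vocab(1_000_000, k_hat, beta_hat)
```

With real data, `token_counts` and `vocab_sizes` would come from counting running tokens and distinct types at several points while reading a corpus. A fitted `beta` below 1 is what "sub-linear" means here: `growth_ratio` equals `2 ** beta`, which is less than 2.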