Dataset
-
MS MARCO
Microsoft MAchine Reading COmprehension dataset; a widely used benchmark for passage retrieval and document ranking, built from roughly 1M real Bing queries and a collection of 8.8M passages, with sparse binary relevance judgments (typically about one passage marked relevant per query).
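To make "sparse binary relevance judgments" concrete, here is a minimal sketch of how such judgments are often stored and looked up. It assumes a TREC-style, tab-separated qrels layout (query ID, unused column, passage ID, label); the IDs and the helper names `load_qrels` and `is_relevant` are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Toy qrels in a TREC-style layout: query_id, unused, passage_id, label.
# The IDs below are invented for illustration; they are not real records.
TOY_QRELS = "q1\t0\tp17\t1\nq2\t0\tp03\t1\nq2\t0\tp42\t1\n"


def load_qrels(text):
    """Map each query ID to the set of passage IDs judged relevant."""
    relevant = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        qid, _unused, pid, label = line.split("\t")
        if int(label) > 0:  # binary: a passage is either relevant or not
            relevant[qid].add(pid)
    return dict(relevant)


qrels = load_qrels(TOY_QRELS)


def is_relevant(qid, pid):
    # Sparse: any passage not listed is treated as non-relevant,
    # even though it may simply never have been judged.
    return pid in qrels.get(qid, set())
```

Because only a handful of passages per query carry a label, an unlisted passage is not necessarily a bad answer; it may just be unjudged.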
-
Corpus Annotation
Adding linguistic labels to corpus text, such as part-of-speech (POS) tags, named-entity (NER) tags, and dependency relations; the labeled text serves as training and evaluation data for supervised NLP tasks.
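As an illustration, one annotated sentence might be represented as below. This is only a sketch under assumptions: the `Token` class, the tag values (Universal Dependencies style POS and relation names, BIO entity tags), and the `entities` helper are all chosen for the example and do not describe any particular corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    index: int   # 1-based position in the sentence
    text: str    # surface form
    pos: str     # part-of-speech tag
    ner: str     # named-entity tag in BIO notation
    head: int    # index of the syntactic head; 0 marks the root
    deprel: str  # dependency relation to the head


# Hypothetical hand-annotated sentence: "Alice visited Paris"
SENTENCE = [
    Token(1, "Alice", "PROPN", "B-PER", 2, "nsubj"),
    Token(2, "visited", "VERB", "O", 0, "root"),
    Token(3, "Paris", "PROPN", "B-LOC", 2, "obj"),
]


def entities(tokens):
    """Collect (text, type) entity spans from BIO tags."""
    spans, current, label = [], [], None
    for tok in tokens:
        if tok.ner.startswith("B-"):
            if current:  # close the previous span before starting a new one
                spans.append((" ".join(current), label))
            current, label = [tok.text], tok.ner[2:]
        elif tok.ner.startswith("I-") and current and tok.ner[2:] == label:
            current.append(tok.text)  # continue the open span
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans
```

Calling `entities(SENTENCE)` recovers the two labeled spans, which is the kind of supervision a named-entity tagger would be trained on.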