150K crowdsourced question-answer pairs on Wikipedia passages, including unanswerable questions.
SQuAD 2.0 combines 100K questions from SQuAD 1.1 with 50K unanswerable questions adversarially written by crowdworkers. Answers are spans of text from Wikipedia passages. It is the canonical benchmark for extractive question answering and reading comprehension.
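The span-plus-unanswerable structure above maps directly onto the dataset's published JSON layout: each question carries an `is_impossible` flag, and unanswerable questions have an empty `answers` list. A minimal sketch, using an illustrative record (not real dataset content) that follows the v2.0 field names:

```python
# Illustrative SQuAD 2.0-format record; the text is made up, but the field
# names (paragraphs, qas, is_impossible, answers, plausible_answers) follow
# the published v2.0 JSON schema.
sample = {
    "data": [{
        "title": "Example_Article",
        "paragraphs": [{
            "context": "The Normans were a people who gave their name to Normandy.",
            "qas": [
                {"id": "q1", "question": "What did the Normans give their name to?",
                 "is_impossible": False,
                 "answers": [{"text": "Normandy", "answer_start": 50}]},
                {"id": "q2", "question": "When did the Normans reach Mars?",
                 "is_impossible": True,
                 "answers": [],  # unanswerable: no gold span in the passage
                 "plausible_answers": [{"text": "Normandy", "answer_start": 50}]},
            ],
        }],
    }]
}

def split_answerable(squad_json):
    """Partition question ids into (answerable, unanswerable) lists."""
    answerable, unanswerable = [], []
    for article in squad_json["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                bucket = unanswerable if qa.get("is_impossible") else answerable
                bucket.append(qa["id"])
    return answerable, unanswerable
```

In the full dataset, roughly two thirds of questions land in the answerable bucket (the 100K from SQuAD 1.1) and the rest in the unanswerable one.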
LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
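How the seven dimensions combine into one number can be sketched as follows. This is a hypothetical equal-weight mean for illustration only; the actual LQS weighting is described in the methodology page, not here.

```python
# Hypothetical composite-score sketch. Dimension names come from the page
# above; the equal weighting is an assumption, not the published LQS formula.
DIMENSIONS = (
    "completeness", "uniqueness", "validation_health", "size_adequacy",
    "format_compliance", "label_density", "class_balance",
)

def composite_score(scores: dict) -> float:
    """Equal-weight mean of the seven dimension scores (each on 0-100)."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
```

A dataset scoring 80 on every dimension would get a composite of 80.0 under this scheme; a weighted variant would simply replace the mean with a weighted sum.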
Common tasks and benchmarks where SQuAD 2.0 — Stanford Question Answering Dataset is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
SQuAD 2.0 — Stanford Question Answering Dataset is distributed under CC BY-SA 4.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid NLP / text datasets with guarantees that public datasets often can't provide:
Other entries in the NLP / Text catalog.