Trivia questions paired with Wikipedia + web evidence — long-form reading comprehension at scale.
TriviaQA is a large-scale reading comprehension dataset containing 650K question-answer-evidence triples. Questions are authored by trivia enthusiasts, which gives them higher syntactic complexity than crowd-sourced QA, and each question is paired with both Wikipedia articles and web search results as evidence documents. It is distinct from SQuAD-style benchmarks because answers can require multi-sentence or cross-document reasoning.
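A minimal sketch of what a question-answer-evidence triple looks like, assuming illustrative field names (this is not the official TriviaQA schema): each question carries a set of accepted answer aliases and a mixed bag of Wikipedia and web evidence documents, and the dataset's distant-supervision setup checks that an alias appears somewhere in the evidence.

```python
from dataclasses import dataclass

@dataclass
class EvidenceDoc:
    source: str   # "wikipedia" or "web" (illustrative labels)
    title: str
    text: str

@dataclass
class TriviaQATriple:
    question: str
    answer_aliases: list   # accepted answer strings
    evidence: list         # list of EvidenceDoc, mixing both sources

def answer_in_evidence(triple):
    """Distant-supervision check: does any alias occur in any evidence doc?"""
    return any(alias.lower() in doc.text.lower()
               for alias in triple.answer_aliases
               for doc in triple.evidence)

triple = TriviaQATriple(
    question="Which American-born Sinclair won the Nobel Prize for Literature in 1930?",
    answer_aliases=["Sinclair Lewis", "Lewis"],
    evidence=[EvidenceDoc("wikipedia", "Sinclair Lewis",
                          "Sinclair Lewis was the first American to win "
                          "the Nobel Prize in Literature.")],
)
print(answer_in_evidence(triple))  # → True
```

Because answers are matched against evidence by string containment rather than annotated spans, a single answer may be supported by sentences spread across several documents, which is what forces multi-sentence and cross-document reasoning.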
LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
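The composite described above can be sketched as a weighted mean over the seven dimensions. The dimension names come from this page; the equal-weight aggregation and the 0-100 scale are assumptions for illustration, not the published LQS methodology.

```python
# Hypothetical LQS-style composite. Dimension names are from the page;
# equal weights and the 0-100 scale are assumptions, not the real method.
DIMENSIONS = ["completeness", "uniqueness", "validation_health",
              "size_adequacy", "format_compliance", "label_density",
              "class_balance"]

def composite_score(scores, weights=None):
    """Weighted mean of per-dimension scores (each assumed in [0, 100])."""
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}  # equal weights by default
    total = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total

example = {"completeness": 95, "uniqueness": 90, "validation_health": 80,
           "size_adequacy": 100, "format_compliance": 85,
           "label_density": 75, "class_balance": 70}
print(composite_score(example))  # → 85.0
```

Passing a `weights` dict lets one dimension (say, label density for annotation-heavy tasks) count more without changing the aggregation code.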
Common tasks and benchmarks where TriviaQA — Large-Scale Reading Comprehension is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
TriviaQA — Large-Scale Reading Comprehension is distributed under Apache 2.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid NLP / text datasets with features that public datasets often can't provide:
Other entries in the NLP / Text catalog.