Real anonymised Google search queries answered from Wikipedia — the original open-domain QA benchmark.
Natural Questions (NQ) was built by Google Research from real, anonymised queries issued to Google Search. Each example pairs a question with a full Wikipedia article; annotators mark a long answer (a containing paragraph or table) and a short answer (an extracted span or a yes/no label) when one is present. Its realistic query distribution and open-domain formulation made it the de facto benchmark for retrieval-augmented QA systems.
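The two-level annotation described above can be sketched as a small data structure. This is an illustrative model only; the field names below are hypothetical and do not match the official NQ release schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of one NQ example's annotations, based on the
# description above: a long answer (containing paragraph or table) and
# a short answer (extracted span or yes/no), either of which may be absent.
@dataclass
class NQAnnotation:
    question: str
    long_answer: Optional[str]   # containing paragraph or table, if any
    short_answer: Optional[str]  # extracted span or "yes"/"no", if present

# Made-up example for illustration.
example = NQAnnotation(
    question="who founded google",
    long_answer="Google was founded in 1998 by Larry Page and Sergey Brin...",
    short_answer="Larry Page and Sergey Brin",
)
```

Questions with no answer in the article would simply carry `None` in both answer fields.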
LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
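A composite of the seven dimensions could be computed in many ways; the actual LQS weighting is not published here, so the sketch below assumes an unweighted mean over made-up dimension values purely to illustrate the idea.

```python
# Illustrative only: dimension values are invented, and the real LQS
# methodology may weight dimensions differently.
dimensions = {
    "completeness": 0.92,
    "uniqueness": 0.88,
    "validation_health": 0.95,
    "size_adequacy": 0.90,
    "format_compliance": 0.97,
    "label_density": 0.85,
    "class_balance": 0.80,
}

# Unweighted mean across the 7 dimensions.
lqs = sum(dimensions.values()) / len(dimensions)
print(round(lqs, 3))  # 0.896
```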
Common tasks and benchmarks where Natural Questions — Open-Domain QA is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
Natural Questions — Open-Domain QA is distributed under CC BY-SA 3.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid NLP / text datasets with features that public datasets often can't provide:
Other entries in the NLP / Text catalog.