Linguistically diverse grade-school-level math word problems — the chain-of-thought reasoning benchmark.
GSM8K is a set of roughly 8,500 linguistically diverse grade-school math word problems, written by human problem writers and released by OpenAI. Each problem takes 2 to 8 sequential reasoning steps and uses only basic arithmetic. It became the canonical benchmark for chain-of-thought (CoT) prompting and helped drive the mainstream adoption of reasoning evaluations in LLM research.
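To make the step structure concrete, here is a minimal sketch of parsing a solution in the format the public GSM8K release uses: intermediate calculations are annotated as `<<expression=result>>`, and the final answer sits on a last line beginning with `####`. The sample solution below is hypothetical, written for illustration in the style of the dataset rather than copied from it, and the helper names (`extract_steps`, `extract_final_answer`) are our own.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical solution text in the dataset's style (not a real record).
SAMPLE_SOLUTION = (
    "Tom buys 3*12 = <<3*12=36>>36 pencils.\n"
    "After giving away 10 he has 36-10 = <<36-10=26>>26 pencils.\n"
    "#### 26"
)

# Calculator annotation: <<expression=result>>
CALC_RE = re.compile(r"<<([^=<>]+)=([^<>]+)>>")
# Final-answer marker: "#### " followed by a number, possibly with commas.
FINAL_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_steps(solution: str) -> List[Tuple[str, str]]:
    """Return (expression, result) pairs for each annotated calculation."""
    return CALC_RE.findall(solution)


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the final answer as a string with commas removed, or None."""
    match = FINAL_RE.search(solution)
    if match is None:
        return None
    return match.group(1).replace(",", "")


if __name__ == "__main__":
    for expression, result in extract_steps(SAMPLE_SOLUTION):
        print(f"{expression} -> {result}")
    print("final:", extract_final_answer(SAMPLE_SOLUTION))
```

Comparing the extracted final answer against a model's final answer gives the exact-match accuracy usually reported for this benchmark; how the model's answer is extracted from free-form output is a separate choice this sketch does not cover.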
LQS is our 7-dimension quality score, computed from the dataset's published statistics.
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
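This page does not state how the seven dimensions are combined. As a rough sketch only, the code below assumes each dimension is scored on a 0 to 100 scale and that the composite is a weighted mean with equal weights by default. Both the scale and the weighting are assumptions for illustration, not the published LQS methodology, and `composite_score` is a hypothetical helper.

```python
from typing import Dict, Optional

# Dimension names taken from the list above; the snake_case keys are our own.
DIMENSIONS = (
    "completeness",
    "uniqueness",
    "validation_health",
    "size_adequacy",
    "format_compliance",
    "label_density",
    "class_balance",
)


def composite_score(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of the seven dimension scores.

    ASSUMPTION: equal weights unless `weights` is given; the real
    methodology may weight or combine the dimensions differently.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


if __name__ == "__main__":
    # Placeholder values, not the actual scores for any dataset.
    placeholder = {d: 80.0 for d in DIMENSIONS}
    print(composite_score(placeholder))
```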
Common tasks and benchmarks where GSM8K — Grade School Math is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
GSM8K — Grade School Math is distributed under MIT. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.