Multiple-choice benchmark across 57 academic and professional subjects — the standard LLM knowledge eval.
MMLU is the dominant zero/few-shot knowledge benchmark for large language models. It covers 57 subjects spanning STEM, humanities, social sciences, law, medicine, and more, in a single four-option multiple-choice format. Released by Hendrycks et al. in 2020, it has appeared in nearly every frontier-model system card since and is commonly cited alongside HellaSwag and ARC as part of the LLM leaderboard 'big three.'
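The uniform four-option format makes a per-item evaluation loop easy to sketch. The snippet below is a minimal illustration, assuming the community-hosted Hugging Face mirror `cais/mmlu` (not the original distribution; verify against the source linked above). The field names `question`, `choices`, and `answer` follow that mirror's schema.

```python
from datasets import load_dataset

# Load the test split of one MMLU subject from the Hugging Face
# mirror "cais/mmlu" (an assumption; verify against the original source).
ds = load_dataset("cais/mmlu", "high_school_physics", split="test")

LETTERS = ["A", "B", "C", "D"]

def format_item(item: dict) -> str:
    """Render one MMLU item as a zero-shot multiple-choice prompt."""
    lines = [item["question"]]
    lines += [f"{letter}. {choice}"
              for letter, choice in zip(LETTERS, item["choices"])]
    lines.append("Answer:")
    return "\n".join(lines)

item = ds[0]
print(format_item(item))
print("Gold:", LETTERS[item["answer"]])  # "answer" is a 0-3 index into "choices"
```

A scored run would send each prompt to the model under test and compare its letter choice against the gold index, averaging accuracy per subject and overall.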
LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
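The page does not publish the weighting, so the sketch below simply assumes an unweighted mean of the seven dimension scores on a 0-100 scale. The dimension values and the equal weights are illustrative placeholders, not the actual LQS methodology (see the methodology link above for that).

```python
from statistics import mean

# The seven LQS dimensions named above. Scores are illustrative
# placeholders on a 0-100 scale; real values come from the
# dataset's published statistics.
dimensions = {
    "completeness": 98,
    "uniqueness": 95,
    "validation_health": 90,
    "size_adequacy": 100,
    "format_compliance": 99,
    "label_density": 97,
    "class_balance": 85,
}

# Assumption: an unweighted mean. The actual LQS weighting is defined
# in the linked methodology and may differ.
lqs = mean(dimensions.values())
print(f"LQS: {lqs:.1f}")
```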
Common tasks and benchmarks where MMLU — Massive Multitask Language Understanding is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
MMLU — Massive Multitask Language Understanding is distributed under MIT. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid NLP / Text datasets that deliver what public datasets often can't:
Other entries in the NLP / Text catalog.