9 NLU tasks bundled as the industry-standard fine-tuning benchmark.
GLUE (General Language Understanding Evaluation) from NYU bundles 9 natural language understanding tasks (SST-2, MNLI, QNLI, QQP, RTE, CoLA, STS-B, MRPC, WNLI) into a single benchmark for fine-tuning and evaluating pretrained language models. Superseded by SuperGLUE for top models, but still widely reported.
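For quick experimentation, the benchmark is commonly accessed through the Hugging Face Hub mirror; a minimal sketch using the `datasets` library (the package and the lowercase config names are that library's conventions, not part of GLUE's own distribution):

```python
# Minimal sketch: loading GLUE tasks via the Hugging Face `datasets` library,
# a common mirror rather than the benchmark's original distribution channel.
from datasets import load_dataset

# Hugging Face config names for the 9 GLUE tasks.
GLUE_TASKS = ["cola", "sst2", "mrpc", "qqp", "stsb",
              "mnli", "qnli", "rte", "wnli"]

# Load one task; most return train/validation/test splits
# (MNLI instead has matched/mismatched validation splits).
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])  # e.g. {'sentence': ..., 'label': 0, 'idx': 0}
```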
LQS is our 7-dimension quality score, computed from the dataset's published statistics. See methodology →
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
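A minimal sketch of how such a composite could be computed; the dimension names come from this page, but the equal weighting, 0-100 scale, and function names are illustrative assumptions, not the published LQS methodology:

```python
# Illustrative sketch only: equal-weight composite over the 7 LQS dimensions.
# Weights and scoring scale are assumptions; see the LQS methodology page.
LQS_DIMENSIONS = [
    "completeness", "uniqueness", "validation_health", "size_adequacy",
    "format_compliance", "label_density", "class_balance",
]

def composite_lqs(scores: dict[str, float]) -> float:
    """Average the 7 dimension scores, each assumed to be on a 0-100 scale."""
    return sum(scores[d] for d in LQS_DIMENSIONS) / len(LQS_DIMENSIONS)

# Example with hypothetical per-dimension scores:
example = dict(zip(LQS_DIMENSIONS, [95, 90, 80, 100, 100, 85, 70]))
print(round(composite_lqs(example), 1))  # 88.6
```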
Common tasks and benchmarks where GLUE Benchmark is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
GLUE Benchmark is distributed under varying per-task licenses (mostly CC BY / MIT). This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid NLP / Text datasets with features that public datasets often can't provide:
Other entries in the NLP / Text catalog.