1,000 hours of read English audiobook speech — the standard ASR benchmark.
LibriSpeech is a 1,000-hour corpus of English read speech derived from public-domain LibriVox audiobooks, sampled at 16 kHz. It is widely used as the standard ASR training and evaluation set. The corpus is split into train-clean-100, train-clean-360, train-other-500, dev-clean, dev-other, test-clean, and test-other, so models can be benchmarked separately on clean and on more challenging ("other") speech.
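To make the split structure concrete, here is a minimal sketch of iterating over one split of a locally extracted copy. It assumes the commonly distributed layout, `<root>/<split>/<speaker>/<chapter>/<speaker>-<chapter>.trans.txt`, with one `<utterance_id> <TRANSCRIPT>` pair per line and a matching `.flac` file per utterance. The function name `iter_utterances` and the `root` argument are illustrative, not part of any official tooling.

```python
from pathlib import Path

# The seven subsets named in the description above.
SPLITS = [
    "train-clean-100", "train-clean-360", "train-other-500",
    "dev-clean", "dev-other", "test-clean", "test-other",
]


def iter_utterances(root, split):
    """Yield (utterance_id, flac_path, transcript) for one split.

    Assumption: transcripts live in per-chapter files named
    <speaker>-<chapter>.trans.txt, two directory levels below the split.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    split_dir = Path(root) / split
    for trans_file in sorted(split_dir.glob("*/*/*.trans.txt")):
        for line in trans_file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # skip blank lines
            # The first token is the utterance ID; the rest is the transcript.
            utt_id, _, text = line.partition(" ")
            yield utt_id, trans_file.parent / f"{utt_id}.flac", text
```

For example, `for utt_id, path, text in iter_utterances("LibriSpeech", "test-clean"): ...` would walk the clean test set, assuming the archive was extracted into a directory named `LibriSpeech`. Audio decoding is deliberately left out; any FLAC-capable reader can take the yielded path.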
LQS is our 7-dimension quality score, computed from the dataset's published statistics.
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
Common tasks and benchmarks where LibriSpeech is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
LibriSpeech is distributed under CC BY 4.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.
LabelSets sellers offer paid audio datasets with what public datasets often can't give you:
Other entries in the Audio catalog.