Dataset leaderboard · ranked by LQS

79 public ML datasets, ranked.

79 major public training datasets, each scored across all 19 LQS dimensions. COCO leads computer vision. MIMIC-IV tops clinical. The Pile and Common Crawl sit where you'd expect: in the noise tail. Filter by domain, sort any column, or click any row for the full 19-dimension breakdown.

Jump to leaderboard ↓ · Score your own dataset →

Podium

The top three datasets by LQS.

Full ranking · 79 datasets

Filter, sort, expand.

Click any row to see the full 19-dimension breakdown. Use the chips to filter by domain. Every score is computed by the LQS v3.1 scorer and reported with a 95% confidence interval and a published rationale.
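As a rough illustration of what a score reported with a 95% confidence interval means, here is a minimal Python sketch using a percentile bootstrap. It is not the LQS v3.1 scorer, whose method is not described on this page. The function name `bootstrap_ci`, the per-item scores, and the resampling settings are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Return (mean, low, high): a percentile-bootstrap CI for the mean.

    Hypothetical helper for illustration; not part of any LQS tooling.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    low = means[int(alpha * n_resamples)]
    high = means[int((1.0 - alpha) * n_resamples) - 1]
    return statistics.fmean(values), low, high


# Hypothetical per-item quality scores on a 0-100 scale (made-up values).
item_scores = [82.0, 75.5, 91.0, 68.0, 88.5, 79.0, 84.0, 72.5, 90.0, 77.0]

point, low, high = bootstrap_ci(item_scores)
print(f"score={point:.1f}  95% CI=[{low:.1f}, {high:.1f}]")
```

A narrow interval suggests the score would change little if the dataset were sampled again. A wide one suggests the ranking position should be read with more caution.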

All domains

Rank | Dataset | Domain | LQS · tier | Items | License

Score yours

Don't see your dataset? Score it now.

Paste any Hugging Face or Zenodo URL on the homepage. Get a signed LQS certificate in under a second. Embed the badge. Ship it.

Score live → · Upload for full scoring