256 GB of legal text — court opinions, contracts, statutes, and regulatory filings.
Pile of Law is a 256 GB corpus of English legal and administrative text from Stanford CRFM. It includes federal and state court opinions, SEC filings, administrative agency rulings, the Code of Federal Regulations, state statutes, and dozens of other sources. It is designed as a domain-specific pretraining corpus for legal language models.
LQS is our 7-dimension quality score, computed from the dataset's published statistics.
Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
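As a rough illustration of how a composite can be built from the seven dimensions, here is a minimal sketch. The weighting LabelSets actually uses is not stated on this page, so the equal weights, the 0-100 scale, and the function name `composite_score` are all assumptions made for the example, not the published methodology.

```python
# Hypothetical sketch: combine seven per-dimension scores into one composite.
# Assumptions (not from the LQS methodology): scores are on a 0-100 scale
# and, unless weights are passed in, every dimension counts equally.

DIMENSIONS = (
    "completeness",
    "uniqueness",
    "validation_health",
    "size_adequacy",
    "format_compliance",
    "label_density",
    "class_balance",
)


def composite_score(scores, weights=None):
    """Return the weighted mean of the seven dimension scores."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")

    # Default to equal weighting when no weights are supplied.
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}

    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted_sum = sum(scores[d] * weights[d] for d in DIMENSIONS)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not Pile of Law's real scores.
    example = {d: 70.0 for d in DIMENSIONS}
    print(composite_score(example))
```

Passing a `weights` dictionary lets one dimension count more than the others, which is one way a score like this could de-emphasize label density or class balance for an unlabeled pretraining corpus.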
Common tasks and benchmarks where Pile of Law is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
Pile of Law is distributed under CC BY-NC-SA 4.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.