For buyers · procurement + model-risk

Training data you can cite in audit paperwork.

Every dataset ships with an Ed25519-signed cert, a 19-dim LQS quality report, contamination-clean flags against 40+ public benchmarks, and a revocation ID your CI can poll. Built for the SR 11-7, EU AI Act Art. 10, and 21 CFR 11 documentation your risk team already files.

Datasets indexed: 40+
Avg LQS v3.1: 87
Benchmark-contamination checked: 100%
Signed-cert coverage: 100%
What you get

Four artifacts on every purchase, not a sales deck.

Price includes the raw data, the signed cert, the quality report, and the registry entry. Nothing you pay extra for. Nothing that requires a call.

Ed25519-signed procurement cert

Cryptographically signed certificate containing the 19-dim quality breakdown, provenance chain, license text, and revocation ID. Verifiable offline against our public key aa4c070af907e2ea.

19-dim LQS v3.1 quality report

Structural integrity, annotation quality, statistical health, training fitness, provenance, subgroup equity, oracle agreement — each with a per-dim confidence interval you can drop into a model validation package.

Benchmark-contamination clean flags

Every dataset is cross-checked against 40+ public evaluation sets. The cert lists per-benchmark contamination flags, so you know what you can train on before you burn compute on a leaked eval.
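One way to use the per-benchmark flags before scheduling a run, sketched under the assumption that the cert exposes them as a mapping from benchmark name to a boolean where True means contaminated. The function name, the mapping shape, and the placeholder benchmark names are illustrative, not the actual cert schema.

```python
def blocked_benchmarks(
    contamination_flags: dict[str, bool], planned_evals: list[str]
) -> list[str]:
    """Return the planned evals that are flagged contaminated or not listed at all."""
    blocked = []
    for name in planned_evals:
        # A benchmark missing from the cert is treated as unsafe:
        # an absent flag is not evidence that the data is clean.
        if contamination_flags.get(name, True):
            blocked.append(name)
    return blocked


# Placeholder names; a real cert would list actual public benchmarks.
flags = {"benchmark_a": False, "benchmark_b": True}
print(blocked_benchmarks(flags, ["benchmark_a", "benchmark_b", "benchmark_c"]))
```

Failing closed on unlisted benchmarks is a design choice for this sketch; a team could instead log them for manual review.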

Revocation registry monitoring

Cert revoked post-release because of newly discovered contamination or a provenance dispute? Your CI polls the public registry by cert ID and picks up the change. Failure modes discovered a year from now don't require a support ticket.
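A CI poll could be sketched roughly as follows. The registry URL and the JSON shape (an object with a `status` field) are assumptions made for illustration; only the `ACTIVE` status value and the cert-ID lookup come from this page. The fetch function is injectable so the gate logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical endpoint; substitute the real registry URL from the vendor docs.
REGISTRY_URL = "https://registry.example.com/certs/{cert_id}"


def fetch_status(cert_id: str) -> dict:
    """Fetch the registry record for one cert (assumed to be a JSON object)."""
    url = REGISTRY_URL.format(cert_id=cert_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def cert_is_active(cert_id: str, fetch: Callable[[str], dict] = fetch_status) -> bool:
    """True only if the registry explicitly reports ACTIVE; anything else fails closed."""
    try:
        record = fetch(cert_id)
    except (OSError, ValueError):
        # Network errors (OSError) and bad JSON (ValueError) both block the release.
        return False
    return isinstance(record, dict) and record.get("status") == "ACTIVE"
```

In a pipeline step, a small wrapper would call `cert_is_active("ls-cert-7f3a1c9e-2026-04")` and exit non-zero on False, so a revoked or unreachable cert stops the release gate instead of passing silently.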

Documented commercial license

Perpetual commercial license, attached to the receipt. No "check each dataset's README" workflow. Legal can sign off before you schedule training time.

SDK + GitHub Action + verify CLI

Install with pip install labelsets. You get a one-line loader, a cert-verify step for CI, and a GitHub Action for release gates, with the same ergonomics as the Hugging Face datasets library.

Cert fields map to the compliance paperwork your risk team already files
SOC 2: Type II · audit
HIPAA: BAA ready
EU AI Act: Art. 10
SR 11-7: Fed · model risk
21 CFR 11: FDA · e-records
GDPR: Art. 28 + SCCs
ECOA: Fair-lending
Procurement format

What your risk team actually drops into the file.

Every cert fits the shape of a standard vendor-evidence line item. Copy/paste into your SR 11-7 validation template or EU AI Act Art. 10 data-governance filing.

Example — model validation evidence line
Training data source: LabelSets, Legal Reasoning Dataset v2
Cert ID: ls-cert-7f3a1c9e-2026-04
Signing key: Ed25519 aa4c070af907e2ea
LQS v3.1 score: 87 (CI 84.2 – 89.8)
Oracle agreement: 0.91 (strong)
Contamination-clean: 40/40 public benchmarks verified
Revocation status: ACTIVE, polled 2026-04-23T09:15Z
License: Perpetual commercial, documented in receipt
Regulatory mapping: SR 11-7 §IV · EU AI Act Art. 10 §2(a-h) · 21 CFR 11 §11.10
Format above is illustrative. Every field on the right is a real cert field; every field on the left maps to a standard validation-package section. We can provide an editable template.
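If you prefer to generate the evidence line instead of pasting it, a sketch like the following would do, assuming the cert has been parsed into a dictionary. The key names (`source`, `cert_id`, `lqs_score`, `lqs_ci`, `revocation_status`, `license`) are hypothetical stand-ins for the real cert schema, and only a subset of the fields from the example is rendered.

```python
def evidence_line(cert: dict) -> str:
    """Render selected cert fields as 'Label: value' lines for a validation package."""
    lo, hi = cert["lqs_ci"]  # confidence interval as a (low, high) pair
    rows = [
        ("Training data source", cert["source"]),
        ("Cert ID", cert["cert_id"]),
        ("LQS v3.1 score", f'{cert["lqs_score"]} (CI {lo} to {hi})'),
        ("Revocation status", cert["revocation_status"]),
        ("License", cert["license"]),
    ]
    return "\n".join(f"{label}: {value}" for label, value in rows)


# Values taken from the illustrative example on this page.
example_cert = {
    "source": "LabelSets, Legal Reasoning Dataset v2",
    "cert_id": "ls-cert-7f3a1c9e-2026-04",
    "lqs_score": 87,
    "lqs_ci": (84.2, 89.8),
    "revocation_status": "ACTIVE",
    "license": "Perpetual commercial, documented in receipt",
}
print(evidence_line(example_cert))
```

Generating the text from the parsed cert keeps the filed evidence in step with the signed values, which avoids transcription errors in the validation package.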

Browse procurement-grade datasets.

Three flagship datasets live: legal reasoning ($799), financial routing ($549), clinical reasoning ($699). Every cert verifies against our public key. Perpetual license, no subscription, no call required.