Training data you can cite in audit paperwork

What you get

Four artifacts on every purchase, not a sales deck.

Price includes the raw data, the signed cert, the quality report, and the registry entry. Nothing you pay extra for. Nothing that requires a call.

Ed25519-signed procurement cert

Cryptographically signed certificate containing the 19-dim quality breakdown, provenance chain, license text, and revocation ID. Verifiable offline against our public key aa4c070af907e2ea.
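Offline verification could be sketched as follows in Python. This assumes the third-party cryptography package and a cert delivered as raw payload bytes plus a detached signature; that delivery format and the function name verify_cert are illustrative assumptions, not the documented LabelSets interface. An Ed25519 public key is 32 bytes, so the 16-hex-digit value above is treated here as a short key identifier, and the full key would have to be obtained separately. The demo values are signed with a throwaway key pair, not the vendor key.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def verify_cert(payload: bytes, signature: bytes, public_key_raw: bytes) -> bool:
    """Return True if `signature` is a valid Ed25519 signature over `payload`.

    `public_key_raw` must be the full 32-byte public key. No network access
    is needed, which is what makes the check usable offline.
    """
    public_key = Ed25519PublicKey.from_public_bytes(public_key_raw)
    try:
        public_key.verify(signature, payload)
    except InvalidSignature:
        return False
    return True


# Demo with a throwaway key pair (assumption: NOT the vendor's signing key).
_demo_private = Ed25519PrivateKey.generate()
demo_public_raw = _demo_private.public_key().public_bytes(
    encoding=serialization.Encoding.Raw,
    format=serialization.PublicFormat.Raw,
)
demo_payload = b'{"cert_id": "ls-cert-7f3a1c9e-2026-04"}'
demo_signature = _demo_private.sign(demo_payload)
```

Any change to the payload bytes after signing, even a single added space, makes verify_cert return False, so the check should run against the exact bytes that were delivered.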

19-dim LQS v3.1 quality report

Structural integrity, annotation quality, statistical health, training fitness, provenance, subgroup equity, oracle agreement — each with a per-dim confidence interval you can drop into a model validation package.

Benchmark-contamination clean flags

Every dataset is cross-checked against 40+ public evaluation sets. The cert lists per-benchmark contamination flags, so you know what you can train on before you burn compute on a leaked eval.
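A pre-training check on those flags might look like the sketch below. The flag layout (a mapping from benchmark name to a boolean where True means clean) and the function names are assumptions made for illustration; the cert's actual field layout is not specified here.

```python
from typing import Dict, List


def contaminated_benchmarks(flags: Dict[str, bool]) -> List[str]:
    """Return the names of benchmarks whose flag is not clean, sorted."""
    return sorted(name for name, is_clean in flags.items() if not is_clean)


def clean_summary(flags: Dict[str, bool]) -> str:
    """Summarize the flags as 'clean/total' for a validation evidence line."""
    clean = sum(1 for is_clean in flags.values() if is_clean)
    return f"{clean}/{len(flags)} public benchmarks verified"


# Placeholder benchmark names; a real cert would list the actual evaluation sets.
example_flags = {"benchmark_a": True, "benchmark_b": False, "benchmark_c": True}
```

A training pipeline could refuse to start, or exclude the affected evals from reporting, whenever contaminated_benchmarks returns a non-empty list.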

Revocation registry monitoring

Cert revoked post-release because of newly discovered contamination or a provenance dispute? Your CI polls the public registry by cert ID. Failure modes discovered a year from now don't require a support ticket.
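A CI gate built on that polling could be sketched as below. The registry's response shape (a JSON object with cert_id and status keys) is an assumption, as are the function names; only the ACTIVE status value appears in the example evidence line on this page. The fetch callable is injected so the HTTP client and registry URL stay out of the sketch.

```python
import json
from typing import Callable


def is_cert_active(cert_id: str, fetch: Callable[[str], str]) -> bool:
    """Return True only if the registry reports this cert as ACTIVE.

    `fetch` takes a cert ID and returns the registry's JSON response body.
    """
    record = json.loads(fetch(cert_id))
    if record.get("cert_id") != cert_id:
        raise ValueError(f"registry returned a record for {record.get('cert_id')!r}")
    return record.get("status") == "ACTIVE"


def ci_exit_code(cert_id: str, fetch: Callable[[str], str]) -> int:
    """Map the registry check to a process exit code, failing closed.

    Malformed JSON, a mismatched record, or a network error all count as
    'not verified' and return 1, the same as an explicit revocation.
    """
    try:
        return 0 if is_cert_active(cert_id, fetch) else 1
    except (ValueError, OSError):
        return 1
```

Failing closed is a design choice: a release gate that passes when the registry is unreachable would defeat the purpose of polling.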

Documented commercial license

Perpetual commercial license, attached to the receipt. No "check each dataset's README" workflow. Legal can sign off before you schedule training time.

SDK + GitHub Action + verify CLI

pip install labelsets. One-line loader, cert-verify step in CI, GitHub Action for release gates. Same ergonomics as the Hugging Face datasets library.

Procurement format

What your risk team actually drops into the file.

Every cert fits the shape of a standard vendor-evidence line item. Copy/paste into your SR 11-7 validation template or EU AI Act Art. 10 data-governance filing.

Example — model validation evidence line

Training data source: LabelSets - Legal Reasoning Dataset v2
Cert ID: ls-cert-7f3a1c9e-2026-04
Signing key: Ed25519 aa4c070af907e2ea
LQS v3.1 score: 87 (CI 84.2 – 89.8)
Oracle agreement: 0.91 (strong)
Contamination-clean: 40/40 public benchmarks verified
Revocation status: ACTIVE - polled 2026-04-23T09:15Z
License: Perpetual commercial, documented in receipt
Regulatory mapping: SR 11-7 §IV · EU AI Act Art. 10 §2(a-h) · 21 CFR 11 §11.10

Format above is illustrative. Every field on the right is a real cert field; every field on the left maps to a standard validation-package section. We can provide an editable template.
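Filling such a template from a parsed cert could be sketched as follows. The cert key names on the right of FIELD_MAP are hypothetical and cover only a subset of the fields in the example; the labels on the left are taken from the example evidence line above.

```python
from typing import Dict, List, Tuple

# (label used in the validation package, hypothetical key in the parsed cert)
FIELD_MAP: List[Tuple[str, str]] = [
    ("Training data source", "source"),
    ("Cert ID", "cert_id"),
    ("Signing key", "signing_key"),
    ("Revocation status", "revocation_status"),
    ("License", "license"),
]


def render_evidence(cert: Dict[str, str]) -> str:
    """Render one 'Label: value' line per mapped field, in FIELD_MAP order."""
    missing = [key for _, key in FIELD_MAP if key not in cert]
    if missing:
        # Refuse to emit a partial evidence block into an audit file.
        raise KeyError(f"cert is missing fields: {missing}")
    return "\n".join(f"{label}: {cert[key]}" for label, key in FIELD_MAP)


example_cert = {
    "source": "LabelSets - Legal Reasoning Dataset v2",
    "cert_id": "ls-cert-7f3a1c9e-2026-04",
    "signing_key": "Ed25519 aa4c070af907e2ea",
    "revocation_status": "ACTIVE - polled 2026-04-23T09:15Z",
    "license": "Perpetual commercial, documented in receipt",
}
```

Raising on a missing field, rather than printing a blank, keeps an incomplete cert from silently producing an incomplete evidence line.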
