About LabelSets

Built so your audit team never has to take your word for it.

LabelSets is a small operation focused on one thing: making AI training-data quality cryptographically verifiable so it stops being a trust exercise. The LQS standard, the marketplace, the SDK, the open methodology — all of it serves that one goal.

Why LabelSets exists

Procurement of AI training data was stuck on "trust me bro."

Every other category has a third-party rating you can cite in regulated paperwork. Bonds have Moody's. Cyber has SOC 2. Cloud has ISO 27001. AI training data had a README and a vibe check. We built the rating that fills the gap.

Methodology has to be open.

Closed scoring is a black box no auditor accepts. The LQS specification, the 19 dimensions, the oracle math, and the conformal-prediction layer are all published. The cert is signed; you can verify it offline against our public key without ever calling our API.
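
A minimal sketch of what offline verification of a signed cert could look like. It assumes the cert is a byte string with a detached Ed25519 signature and a raw 32-byte public key; the function names, the demo payload, and the use of the Python `cryptography` package are illustrative assumptions, not the LabelSets SDK's actual interface.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def verify_cert(public_key_bytes: bytes, cert_bytes: bytes, signature: bytes) -> bool:
    """Return True if `signature` is a valid Ed25519 signature over `cert_bytes`.

    Runs entirely locally: no network call is made.
    """
    key = Ed25519PublicKey.from_public_bytes(public_key_bytes)
    try:
        key.verify(signature, cert_bytes)
    except InvalidSignature:
        return False
    return True


def make_demo_cert():
    """Build a throwaway key pair and signed payload (hypothetical data, for the demo only)."""
    private_key = Ed25519PrivateKey.generate()
    cert_bytes = b'{"dataset": "example", "note": "hypothetical payload"}'
    signature = private_key.sign(cert_bytes)
    public_key_bytes = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.Raw,
        format=serialization.PublicFormat.Raw,
    )
    return public_key_bytes, cert_bytes, signature


if __name__ == "__main__":
    pub, cert, sig = make_demo_cert()
    print(verify_cert(pub, cert, sig))          # untampered cert
    print(verify_cert(pub, cert + b" ", sig))   # any change to the bytes breaks the signature
```

The point of the sketch is that the only inputs are the public key, the cert bytes, and the signature, so an auditor can run the check on an air-gapped machine.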

Honest abstention beats false confidence.

If we can't justify a number, we don't make one up. Conformal intervals widen when calibration data is thin. The license suggester refuses to suggest when blockers are present. We'd rather say "we don't know yet" than burn trust the first time a claim is wrong.
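
To make "intervals widen when calibration data is thin" concrete, here is a sketch of textbook split conformal prediction, which may differ from the method in the LQS specification. The function name and the 90% coverage default are assumptions for illustration.

```python
import math


def conformal_halfwidth(residuals, alpha=0.1):
    """Half-width of a split-conformal interval at coverage 1 - alpha.

    `residuals` are absolute errors |y - prediction| on a held-out
    calibration set. The finite-sample correction uses rank
    ceil((n + 1) * (1 - alpha)); when that rank exceeds n there is not
    enough calibration data, and the only honest interval is unbounded.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too little data: abstain rather than guess
    return sorted(residuals)[rank - 1]


if __name__ == "__main__":
    thin = [1, 2, 3, 4, 5]            # 5 calibration points
    thick = list(range(1, 21))        # 20 calibration points
    print(conformal_halfwidth(thin))   # inf
    print(conformal_halfwidth(thick))  # 19
```

With five calibration points the required rank is 6, so the interval is unbounded; with twenty points the rank is 19 and a finite width comes back.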

The data is the moat — not the algorithm.

Algorithms get copied in six weeks. The Outcome Registry, which ties every buyer's downstream eval result back to the dataset and its signed cert, compounds for years. We're building the Moody's of training data: not the smartest rater, but the one with the most data in one place.

No scheduled calls. No drip campaigns.

If you're a procurement team or auditor, every artifact you'd need is on this site, public, no signup. The SDK runs offline cert verification. The methodology paper is downloadable. Reach us by email; we reply within a business day, in writing your audit team can quote.

What we've built

Public timeline.

Anything we ship is dated and citable. No "coming soon" boxes that never resolve.

2026-04 · v3.1

Procurement-grade LQS

Multi-oracle consensus, contamination grading, conformal prediction intervals, signed-cert revocation registry, open methodology whitepaper. 19 dimensions; 7 oracles; Ed25519 signature verifiable offline.

2026-04 · Outcome Registry

Buyer-outcome registry & cosign certs

SDK + API for buyers to register downstream model-eval outcomes against the dataset's LQS cert. Cosigns are independently Ed25519-signed and verifiable. The registry compounds: public bands are published once a (model × task × metric) bucket has N≥3 reports.
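
The N≥3 publishing rule can be sketched as a simple group-and-filter. The record fields (`model`, `task`, `metric`, `value`) are hypothetical, and representing a "band" as the (min, max) of reported values is an assumption made only for this example.

```python
from collections import defaultdict

MIN_REPORTS = 3  # from the text: bands publish at N>=3 reports per bucket


def public_bands(reports):
    """Group outcome reports by (model, task, metric) and publish a band
    only for buckets that have at least MIN_REPORTS reports."""
    buckets = defaultdict(list)
    for report in reports:
        key = (report["model"], report["task"], report["metric"])
        buckets[key].append(report["value"])
    return {
        key: (min(values), max(values))
        for key, values in buckets.items()
        if len(values) >= MIN_REPORTS
    }


if __name__ == "__main__":
    demo = [  # made-up values, for illustration only
        {"model": "m1", "task": "qa", "metric": "f1", "value": 0.70},
        {"model": "m1", "task": "qa", "metric": "f1", "value": 0.80},
        {"model": "m1", "task": "qa", "metric": "f1", "value": 0.75},
        {"model": "m2", "task": "qa", "metric": "f1", "value": 0.60},
        {"model": "m2", "task": "qa", "metric": "f1", "value": 0.65},
    ]
    print(public_bands(demo))  # only the m1 bucket has 3 reports
```

Buckets below the threshold are simply withheld, so a single buyer's result is never published on its own.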

2026-04 · Auto-attestation

License suggester + gaming firewall

One-click license attestation derived from existing originality / watermark / contamination / PII signals. Hard blockers apply when signals point at theft. A private-oracle layer cross-checks public scores; a disagreement gap >25 between the two is flagged for review.
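
A rough sketch of the two gates described above: refuse to suggest a license when a hard blocker is present, and flag for review when public and private scores disagree by more than 25. The blocker names, the return shapes, and the use of an absolute difference are assumptions; only the 25 threshold comes from the text.

```python
GAP_THRESHOLD = 25  # from the text: disagreement gap >25 flags for review

# Hypothetical blocker names, chosen for illustration only.
HARD_BLOCKERS = ("watermark_detected", "originality_failed")


def license_decision(signals):
    """Refuse to suggest a license if any hard-blocker signal is set."""
    blockers = [name for name in HARD_BLOCKERS if signals.get(name)]
    if blockers:
        return {"status": "refused", "blockers": blockers}
    return {"status": "suggested", "blockers": []}


def needs_review(public_score, private_score, threshold=GAP_THRESHOLD):
    """Flag when public and private oracle scores differ by more than the threshold."""
    return abs(public_score - private_score) > threshold


if __name__ == "__main__":
    print(license_decision({"watermark_detected": True}))
    print(license_decision({}))
    print(needs_review(90, 60))  # gap of 30
    print(needs_review(90, 65))  # gap of exactly 25 does not exceed the threshold
```

The comparison is strict, so a gap of exactly 25 passes; whether the real system treats the boundary that way is not stated in the text.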

2026-04 · Marketplace

Public marketplace + curated catalog

Three flagship datasets live (legal · financial · clinical). 79-dataset curated catalog of public ML datasets with LQS scores. Stripe-backed instant download; 85% seller revenue share.