Working paper · v3.1 · arXiv submission Q2 2026

LQS v3.1: A Procurement-Grade Quality Standard for AI Training Data with Cryptographically Verifiable Certificates

Authors: LabelSets Research  ·  Date: April 2026  ·  License: CC BY 4.0  ·  Pages: ~20
Principal author: identity disclosed under NDA — pending counsel review of employment-contract scope. Direct contact via /pilot.

Abstract

Procurement of training data for AI systems in regulated industries (financial services, healthcare, legal) currently lacks an independent quality measurement that satisfies model-risk audit requirements such as SR 11-7, EU AI Act Article 10, FDA 21 CFR 11.10(e), and HHS §1557.

We introduce LQS v3.1, a 19-dimension quality standard for tabular, text, and image datasets that addresses three documented weaknesses of existing single-model quality scores: (1) reference-model bias, addressed via a 7-oracle consensus across 5 algorithm families with cross-validated agreement reporting (Cohen's κ and Fleiss' κ); (2) brittleness of metadata-derived task inference, addressed via a data-driven task-detection layer with explicit ambiguity flagging; and (3) over-confidence of point estimates, addressed via Wilson binomial intervals on rate-based dimensions, pooled-fold standard deviations on oracle-derived dimensions, and bootstrap intervals on the composite.
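As a concrete illustration of the Wilson interval used on rate-based dimensions, consider estimating a label-error rate from an audit sample. The function name and the sample numbers below are ours, not the spec's; this is a minimal sketch of the standard Wilson score interval, not the reference implementation.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion.

    Unlike the normal (Wald) interval, it stays inside [0, 1] and
    behaves sensibly when p is near 0 or 1, the regime rate-based
    quality dimensions (e.g. a label-error rate) often occupy.
    """
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - half, centre + half)

# Hypothetical audit: 4 label errors found in a 200-row sample
lo, hi = wilson_interval(4, 200)
```

Note the interval is asymmetric around the point rate, which is exactly the behaviour a point estimate alone would hide.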

We add inductive split-conformal prediction (Vovk 2005, Romano 2019), producing 90% prediction intervals on downstream macro-F1 with finite-sample marginal coverage guarantees, and a graded benchmark-contamination dimension covering 40+ public evaluation suites (MMLU, HumanEval, GSM8K, SQuAD, etc.). Every score is bound to a canonical-JSON-serialized payload and signed with an Ed25519 private key, producing a cryptographically verifiable certificate auditable offline against the published public key.
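The split-conformal step above can be sketched in a few lines. This is an illustrative sketch under our own assumptions (absolute-residual score, a simple clipped quantile rule), not the paper's calibrated pipeline, which targets downstream macro-F1:

```python
import math

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.10):
    """Split-conformal prediction interval around a point forecast.

    cal_pred/cal_true: predictions and observed values on a held-out
    calibration split; test_pred: the new point prediction.
    Gives (1 - alpha) marginal coverage in finite samples, assuming
    calibration and test points are exchangeable.
    """
    scores = sorted(abs(p - y) for p, y in zip(cal_pred, cal_true))
    n = len(scores)
    # Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest
    # score; clipped to the max score here (the exact method returns
    # an infinite interval when n is too small for the target level).
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = scores[k]
    return (test_pred - q, test_pred + q)
```

With 9 calibration points and alpha = 0.10, the quantile lands on the largest residual, so small calibration sets yield conservative (wide) intervals by construction.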

We provide a reference implementation, a public verification API, and an SDK with no-auth verification helpers. The full LQS v3.1 specification is presented as a candidate reference methodology for ongoing standards work in IEEE P2841, NIST AI RMF, and ISO/IEC JTC 1/SC 42.

Figure 1 · Representative profile

The 19 dimensions, on one chart

Each axis is a quality dimension scored 0–100. Composite is a weighted aggregate of the 19 axes, with bootstrap intervals on the composite and per-dimension Wilson or pooled-fold intervals on the spokes.

A single weak spoke is the kind of failure model-risk auditors look for. A composite of 91 with a sub-50 contamination axis is a different risk profile than a composite of 91 with no weak spokes — and the radar is the only view that surfaces it at a glance.

Sample shown: Platinum-tier legal-corpus profile · v3.1 spec
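The composite described above is a weighted aggregate with a bootstrap interval. One plausible reading of that construction is resampling the (score, weight) pairs; the spec may instead bootstrap over dataset rows, so treat this as a hedged sketch with illustrative names:

```python
import random

def composite_with_ci(dim_scores, weights, n_boot=2000, alpha=0.10, seed=0):
    """Weighted composite of per-dimension scores with a bootstrap CI.

    Assumption: resamples the (score, weight) pairs with replacement
    and recomputes the weighted mean; the interval is the empirical
    (alpha/2, 1 - alpha/2) quantile range of the resampled composites.
    """
    rng = random.Random(seed)
    pairs = list(zip(dim_scores, weights))
    point = sum(s * w for s, w in pairs) / sum(weights)
    boots = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        boots.append(sum(s * w for s, w in sample) / sum(w for _, w in sample))
    boots.sort()
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lo, hi)
```

A degenerate check: if all 19 spokes score identically, the interval collapses to a point, so any width in the reported interval reflects genuine spread across dimensions.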

Keywords

dataset quality, multi-oracle consensus, confidence intervals, contamination detection, scaling laws, adversarial robustness, fairness, cryptographic certificates, procurement-grade ML

Cite this paper

@misc{labelsets2026lqsv31,
  title  = {LQS v3.1: A Procurement-Grade Quality Standard for
            AI Training Data with Cryptographically Verifiable
            Certificates},
  author = {{LabelSets Research}},
  year   = {2026},
  month  = {April},
  url    = {https://labelsets.ai/paper.pdf},
  note   = {Reference implementation: labelsets.ai. Principal author
            identity disclosed under NDA pending counsel review.}
}

Reference implementation

The full reference implementation is deployed at labelsets.ai. The public verification API accepts any LQS certificate hash and returns the signed payload plus signature validation:

GET /api/verify-lqs-cert/:hash
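The endpoint keys certificates by a hash of the canonical payload. The exact canonicalization scheme and hash function are not specified in this excerpt, so the sketch below assumes sorted-key compact JSON (in the spirit of RFC 8785) hashed with SHA-256; `canonical_bytes` and `cert_hash` are our hypothetical names, and the Ed25519 signing step is noted in a comment rather than executed:

```python
import hashlib
import json

def canonical_bytes(payload: dict) -> bytes:
    """Deterministic serialization: sorted keys, compact separators.

    Assumption: the spec's canonical JSON may differ in detail
    (e.g. RFC 8785 number formatting); this is the common
    sorted-key approximation.
    """
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def cert_hash(payload: dict) -> str:
    # Hypothetical lookup key for /api/verify-lqs-cert/:hash,
    # assumed to be SHA-256 over the canonical payload bytes.
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()

# Key order must not change the hash; otherwise two serializations
# of the same certificate would resolve to different lookup keys.
a = cert_hash({"composite": 91, "tier": "platinum"})
b = cert_hash({"tier": "platinum", "composite": 91})
assert a == b

# Signing/verification (not run here): the issuer signs
# canonical_bytes(payload) with the Ed25519 private key; auditors
# verify offline against the published public key, e.g. with
# PyNaCl's VerifyKey.verify(message, signature).
```

Because the hash is computed client-side from the signed payload, an auditor can confirm offline that the payload they hold is the one the API indexed, without trusting the API response itself.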

Pilot program · 5–10 design partners

Putting LQS into your model-risk workflow

Reading this paper because you're evaluating training-data quality for an SR 11-7, EU AI Act, FDA, or §1557 model package? We're picking 5–10 design partners in regulated industries for 6 months of LQS Enterprise — free, in exchange for a logo and a short case study.

Apply for the pilot → Two-line application · No demo · No sales call