Open source roadmap · MIT license · Q4 2026

The LQS scorer is going open source.

In Q4 2026 we'll publish the file-based 19-dimension scorer, calibrated weights, and the full calibration corpus at github.com/labelsets/lqs-scorer under the MIT license. You'll be able to run the same scorer on your air-gapped data and verify that our published scores reproduce bit-for-bit. The methodology isn't the moat; closing it doesn't help us.

Repo preview

What you'll get when it lands.

github.com/labelsets/lqs-scorer ⏳ launching Q4 2026

# LabelSets LQS Scorer

The 19-dimension training-data quality scorer used by labelsets.ai. Open methodology, reproducible weights, MIT license.

Install

$ pip install labelsets-lqs
# or via Docker, for air-gapped tenancies
$ docker pull labelsets/lqs-scorer:v3.1

Score a dataset

# Python
from labelsets_lqs import Scorer

scorer = Scorer.from_pretrained('lqs-v3.1-public')
result = scorer.score('./my-dataset.parquet')

print(result.composite)        # 87.4
print(result.dims.label_quality.score)  # 91.2 (CI: 89.4–93.0)
print(result.cert_payload)      # dict ready for signing
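
A possible follow-up, assuming the release exposes the dimension set as a dict-like collection and a JSON-serializable cert payload (both are our guesses at the final API, not confirmed; only label_quality is shown above):

# Hypothetical usage: `dims.items()` and a JSON-serializable cert_payload are assumptions
import json

for name, dim in result.dims.items():
    print(f'{name}: {dim.score:.1f}')

with open('cert.json', 'w') as f:   # same artifact the CLI writes with --out
    json.dump(result.cert_payload, f, sort_keys=True, indent=2)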

Verify our published scores

Every cert at labelsets.ai/verify includes a cert_hash. Run the OSS scorer on the same dataset, hash the canonical cert payload, and confirm it matches; the cert's signature verifies against our published Ed25519 public key (we sign with the private key, you verify with the public one). If the hash doesn't match, the published score is wrong; file an issue and we'll investigate.
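
A minimal verification sketch under assumptions: the cert carries its payload, a hex Ed25519 signature over canonical JSON (sorted keys, no whitespace), and a SHA-256 cert_hash. None of these field names are confirmed; the real schema ships with the repo. Uses the cryptography package.

import hashlib, json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_cert(cert: dict, public_key_bytes: bytes) -> bool:
    # Recompute the hash from the canonical payload bytes.
    canonical = json.dumps(cert['payload'], sort_keys=True,
                           separators=(',', ':')).encode()
    if hashlib.sha256(canonical).hexdigest() != cert['cert_hash']:
        return False  # published score does not reproduce
    # Verify the Ed25519 signature with the public key (never sign with it).
    try:
        Ed25519PublicKey.from_public_bytes(public_key_bytes).verify(
            bytes.fromhex(cert['signature']), canonical)
        return True
    except InvalidSignature:
        return False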

What's in the box

  • File-based 19-dim scorer (Python + Rust hot paths)
  • Calibrated weights for v3.1 + version pin
  • Full calibration corpus (1,000+ datasets, growing)
  • Oracle registry — 7 oracles, 5 algorithm families
  • Contamination check vs 40+ public benchmarks
  • Cert generator (canonical JSON + Ed25519 sign hook; sketched after this list)
  • CLI: labelsets-lqs score ./dataset --out cert.json
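
For the sign hook, a sketch of how we expect canonical JSON plus Ed25519 to compose. Names and field layout are ours, not the final schema; the real generator ships with the repo. It mirrors the verification sketch above.

# Hypothetical sign hook: canonical JSON in, signed cert out
import hashlib, json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_cert(payload: dict, key: Ed25519PrivateKey) -> dict:
    # Canonical form: sorted keys, no whitespace, UTF-8 bytes.
    canonical = json.dumps(payload, sort_keys=True,
                           separators=(',', ':')).encode()
    return {
        'payload': payload,
        'cert_hash': hashlib.sha256(canonical).hexdigest(),
        'signature': key.sign(canonical).hex(),  # private key signs; public key verifies
    }

# e.g. cert = sign_cert(result.cert_payload, Ed25519PrivateKey.generate())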

What's not in the box (yet)

  • The marketplace pricing model (commercial, separate)
  • The hosted cert verifier API (commercial, separate)
  • Custom calibration corpora for regulated domains (enterprise tier)

License

MIT. Use it commercially. Modify it. Redistribute it. We just ask that derivative works are clear about which version of the scorer they're using so reproducibility holds.

Roadmap

Where we are. Where we're going.

Closing the loop on the OSS launch. Each milestone below has a status and a date.

✓ Done · Q1 2026
v3.1 scorer in production
147 marketplace datasets scored. Internal Python implementation, calibrated against downstream F1.

In flight · Q2 2026
arXiv methodology paper
Draft and abstract done; co-author review in progress. Locks the v3.1 spec for the OSS release.

Planned · Q3 2026
Code freeze + license review
Extract the scorer from the monorepo. Confirm MIT over Apache 2.0. Legal sign-off on redistributing the calibration corpus.

Planned · Q4 2026
Public release
v3.1 frozen. Repo public. PyPI package live. Docker image on docker.io. Reproducibility test suite published.
Email me on launch

Be there day one.

No drip campaign. No newsletter. One email when the repo goes public, then we go away. If you're building infra in regulated AI, you'll want to be in the first cohort to run reproducibility tests against our published scores.