Python SDK · Beta · Q2 2026

pip install labelsets
Score a dataset. Three lines.

The Python SDK mirrors the hosted scoring API with one addition: local file scoring, no upload required. Drop it into a Jupyter notebook, a CLI, a CI pipeline, or a training loop. Every score produces the same signed cert format the marketplace uses. Same crypto, same rating, same methodology.

Install

One command. Then you're scoring.

The SDK ships with offline scoring (local files), hosted scoring (HF/Zenodo URLs), cert verification, and a CLI. No API key required for the free tier — 100 scores/month.

pip · PyPI · Python ≥ 3.9
$ pip install labelsets
Also available as npx @labelsets/lqs-cli for Node. Docker image labelsets/lqs:latest for air-gapped tenancies.
Notebook walkthrough

A Jupyter cell that actually does something.

Jupyter is every ML engineer's home. Below is what the SDK looks like in a real notebook. Copy the install above, paste the cells, and run them against your own parquet/JSONL file.

score_my_dataset.ipynb — labelsets v3.1 · Python 3.11 · jupyter

1. Score a local dataset

Point the scorer at a parquet / JSONL / CSV / HDF5 file. Returns composite + per-dim breakdown + a signed cert envelope ready to embed anywhere.
In [1]
from labelsets import Scorer

scorer = Scorer.from_pretrained('lqs-v3.1-public')
result = scorer.score('./my_dataset.parquet')

print(f"LQS composite: {result.composite}")
print(f"Tier: {result.tier} · confidence {result.confidence}")
print(f"Contamination clean: {result.dims.contamination_clean.score}")
Out [1]
LQS composite: 87.4
Tier: gold · confidence 0.91
Contamination clean: 94.2

2. Score a public URL (no download)

Pass a HuggingFace or Zenodo URL — the scorer hits the public metadata API, derives signals, and signs a proxy cert. No file download required.
In [2]
result = scorer.score_url('https://huggingface.co/datasets/openai/gsm8k')
print(result.composite, result.tier, result.confidence)
# → 81 gold 0.4 (proxy cert — metadata-only)

3. Inspect every dimension

All 19 dimensions with 95% CIs + the signal that produced each score. Makes "why did it score that?" a 2-line answer.
In [3]
import pandas as pd

df = pd.DataFrame([
    {'dim': d.name, 'score': d.score, 'ci_low': d.ci.low, 'ci_high': d.ci.high}
    for d in result.dims
])
df.sort_values('score', ascending=False)
Out [3]
                   dim  score  ci_low  ci_high
1     schema_integrity   98.0    96.5     99.2
3  contamination_clean   94.2    92.1     95.8
0        label_quality   93.1    91.2     94.8
2     oracle_agreement   91.0    88.4     93.1
4  downstream_headroom   88.4    85.9     90.7
...                (14 more rows)

4. Sign + verify offline

Same Ed25519 scheme as the marketplace. Sign with your own key for private-mode enterprise scoring; verify against any public key offline — no LabelSets server in the trust chain.
In [4]
# Sign with OUR production key (hosted SaaS path)
cert = result.sign(backend='labelsets-hosted')

# OR sign with your own key (enterprise private mode)
cert = result.sign(private_key=open('my_ed25519.pem').read())

# Verify offline — no server contact
from labelsets import verify_offline
valid = verify_offline(cert, public_key=open('labelsets_pk.pem').read())
print(valid)  # True / False / "revoked"

5. Log to W&B / MLflow alongside every training run

Puts LQS on your experiment dashboard as a first-class metric.
In [5]
import wandb

# assumes an active run, i.e. wandb.init() was called earlier in the training script
wandb.log({
    'lqs.composite': result.composite,
    'lqs.tier': result.tier,
    'lqs.cert_hash': cert.cert_hash,  # permalinks to labelsets.ai/verify?hash=...
    'lqs.contamination': result.dims.contamination_clean.score,
})
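The MLflow side is symmetric. A minimal sketch using plain MLflow calls against the same result and cert objects from the cells above; the one-line helper lqs.log_to_mlflow(result) described below covers the common case, so treat this as the manual equivalent rather than the SDK's exact behavior.

import mlflow

with mlflow.start_run():  # or reuse the run your training loop already opened
    mlflow.log_metric('lqs.composite', result.composite)
    mlflow.log_metric('lqs.contamination', result.dims.contamination_clean.score)
    mlflow.set_tag('lqs.tier', result.tier)
    mlflow.set_tag('lqs.cert_hash', cert.cert_hash)  # verify at labelsets.ai/verify?hash=...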
What's in the box

Six things the SDK does out of the box.

1
Local file scoring
Point at a parquet, JSONL, CSV, HDF5, or DICOM file. Runs the full 19-dimension scorer offline. Works in air-gapped environments with --offline.
2
URL scoring (HF / Zenodo)
Pass a public URL, get the proxy cert. Same endpoint as the live homepage demo. Confidence 0.4 (metadata-derived) by design.
3
Cert signing + verification
Sign with LabelSets' hosted key OR bring your own Ed25519 key (enterprise private-mode). Verify any cert offline against any public key.
4
CLI tool
Ships with an lqs command. lqs score ./data.parquet --out cert.json. Use it in CI pipelines or cron jobs.
5
W&B + MLflow integration
One-line helper: lqs.log_to_wandb(result) or lqs.log_to_mlflow(result). LQS becomes a first-class training metric on your existing dashboard.
6
Contamination-only mode
Check if your data overlaps with MMLU / HumanEval / HellaSwag / GSM8k / 36 more benchmarks. lqs.contamination(data) returns a per-benchmark overlap rate. Worth the install price alone.
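A minimal usage sketch for contamination-only mode. It assumes lqs.contamination accepts a file path and returns a mapping of benchmark name to overlap rate; only the lqs.contamination entry point comes from the feature list above, and the exact return type may differ in the released SDK.

import labelsets as lqs

# hypothetical return shape: {'mmlu': 0.002, 'gsm8k': 0.041, ...} with rates in [0, 1]
overlap = lqs.contamination('./my_dataset.parquet')

for benchmark, rate in sorted(overlap.items(), key=lambda kv: kv[1], reverse=True):
    if rate > 0.01:  # flag anything above 1% row overlap
        print(f"{benchmark}: {rate:.1%} overlap")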
CI integration · a dozen lines of yaml

GitHub Action — auto-score on every commit.

Drop this into .github/workflows/lqs.yml. Every commit that touches a dataset file gets auto-scored, and the cert hash is posted as a PR comment with the embed badge.

.github/workflows/lqs.yml · Q2 2026
name: LQS Quality Score
on: [push, pull_request]
jobs:
  score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: labelsets/lqs-action@v1
        with:
          dataset-path: ./data/
          post-pr-comment: true
          fail-if-below: 75
PR comment on every push: LQS 87.4 · gold · ✓ signed with the embed-ready badge. Procurement-grade evidence for your model-risk team, generated by your CI.
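If your CI isn't GitHub Actions, the same gate is a short script. A minimal sketch using the SDK names from the notebook above; the ./data/train.parquet path and the print format are illustrative, and the threshold mirrors fail-if-below.

import sys
from labelsets import Scorer

THRESHOLD = 75  # same gate as fail-if-below in the workflow above

scorer = Scorer.from_pretrained('lqs-v3.1-public')
result = scorer.score('./data/train.parquet')  # illustrative path

if result.composite < THRESHOLD:
    sys.exit(f"LQS {result.composite} is below the gate of {THRESHOLD}")
print(f"LQS {result.composite} · {result.tier} · gate passed")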
Beta access

Get the SDK on day one.

Public release in Q2 2026. Beta access goes to the first 100 ML engineers who sign up — we use the feedback to tune the scorer and the API surface. No drip campaign. One email when the PyPI package goes live.