Open methodology

LQS Index — how constituents get picked, by rule, not opinion.

Every LQS Index is a deterministic selection on top of the published LQS v3.1 score. No editorial committee. No paid placement. The rules below are machine-readable and applied identically to all datasets at every rebalance.

1. The underlying score

Every constituent's LQS composite is computed by the LQS v3.1 scorer — 19 dimensions, multi-oracle consensus across 7 algorithm families, contamination scoring against 40+ public benchmarks, and Wilson + bootstrap confidence intervals. Full methodology lives at /lqs-methodology. The scorer outputs a number in [0, 100] with a 95% CI.

Indices rank on the point estimate alone. The CI is used only to break ties: datasets with equal composites are ordered by lower CI bound, then alphabetically by title.
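The ordering can be written as a single sort key. The sketch below is illustrative only: the field names (composite, ci_lower, title) are hypothetical, and it assumes that the higher lower CI bound wins a tie, since the direction is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    title: str
    composite: float  # LQS point estimate in [0, 100]
    ci_lower: float   # lower bound of the 95% CI


def rank(datasets):
    """Order by composite (descending), then lower CI bound
    (descending, an assumption), then title (ascending)."""
    return sorted(datasets, key=lambda d: (-d.composite, -d.ci_lower, d.title))


ranked = rank([
    Scored("beta", 82.0, 78.5),
    Scored("alpha", 82.0, 78.5),
    Scored("gamma", 82.0, 80.1),
    Scored("delta", 90.0, 70.0),
])
print([d.title for d in ranked])
```

Here delta ranks first on composite alone, despite its wide interval. gamma leads the three-way tie at 82.0 on its lower bound, and alpha precedes beta alphabetically.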

2. Inaugural ticker rules

Each ticker is parameterized by a JSON selection rule stored in public.lqs_index.selection_rule. The rules are applied verbatim by tools/lqs-index-build.js on every rebalance.

Ticker | Rule
--- | ---
LQS-FINANCE-TOP25 | Top 25 marketplace datasets in category=Financial / Crypto, ranked by composite, min 60.
LQS-NLP-TOP25 | Top 25 marketplace datasets in category=NLP / Text, ranked by composite, min 60.
LQS-MEDICAL-TOP10 | Top 10 marketplace datasets in category=Medical Imaging, ranked by composite, min 60.
LQS-VERIFIED-PLATINUM | Open-membership: every dataset (marketplace or catalog) at composite ≥ 90. Marketplace constituents must additionally have a current LabelSets-issued v3.1 cert.
LQS-PROCUREMENT-GRADE | Open-membership: marketplace datasets at composite ≥ 75 with a valid v3.1 cert, oracle_agreement ≥ 70 (where measured), and contamination_clean ≥ 80 (where measured).
LQS-MARKETPLACE-CORE-50 | Top 50 across the entire marketplace, no category filter. The cross-cutting broad-market benchmark.
LQS-CATALOG-GLOBAL-100 | Top 100 across the public catalog of audited HuggingFace and Kaggle datasets. Tracks the public ML data landscape rather than the marketplace specifically.
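To make the rule format concrete, here is one way a top-N rule might be encoded and applied. The stored schema of public.lqs_index.selection_rule is not shown in this document, so the JSON keys (source, category, top_n, min_composite) and the dataset fields are hypothetical. The sketch also assumes that "min 60" is inclusive and that "where measured" means an unmeasured value does not disqualify a dataset.

```python
import json

# Hypothetical encoding of the LQS-FINANCE-TOP25 rule (key names are assumptions).
RULE_JSON = (
    '{"source": "marketplace", "category": "Financial / Crypto", '
    '"top_n": 25, "min_composite": 60}'
)


def meets_threshold(value, minimum):
    """Assumed 'where measured' semantics: None (unmeasured) passes."""
    return value is None or value >= minimum


def select(datasets, rule):
    """Filter by source, category and minimum composite, then keep the top N."""
    eligible = [
        d for d in datasets
        if d["source"] == rule["source"]
        and (rule.get("category") is None or d["category"] == rule["category"])
        and d["composite"] >= rule.get("min_composite", 0)
    ]
    # Composite descending; title as a simplified tie-break for this sketch.
    eligible.sort(key=lambda d: (-d["composite"], d["title"]))
    top_n = rule.get("top_n")
    return eligible if top_n is None else eligible[:top_n]


datasets = [
    {"title": "fx-ticks", "source": "marketplace",
     "category": "Financial / Crypto", "composite": 88.2},
    {"title": "btc-orderbook", "source": "marketplace",
     "category": "Financial / Crypto", "composite": 59.9},
    {"title": "news-corpus", "source": "marketplace",
     "category": "NLP / Text", "composite": 93.0},
    {"title": "sec-filings", "source": "catalog",
     "category": "Financial / Crypto", "composite": 91.0},
]
selected = select(datasets, json.loads(RULE_JSON))
print([d["title"] for d in selected])
```

Only fx-ticks survives in this toy input: btc-orderbook falls below the minimum, news-corpus is in another category, and sec-filings is a catalog dataset rather than a marketplace one.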

3. Index composite

An index's reported composite is the equal-weighted mean of constituent composites. We chose equal-weighting deliberately: weighting by dataset size (item_count) would pull the index toward whichever vendor uploaded a 10M-row dump that morning. Weighting by price would create a circular incentive. Equal-weighting forces every constituent to earn its slot by quality alone.
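A small worked comparison shows the effect. The composites and item counts below are made-up illustration values, not real constituents.

```python
def equal_weighted(composites):
    """Plain arithmetic mean: every constituent counts once."""
    return sum(composites) / len(composites)


def size_weighted(composites, item_counts):
    """Mean weighted by item_count, shown only for contrast."""
    total = sum(item_counts)
    return sum(c * n for c, n in zip(composites, item_counts)) / total


composites = [90.0, 85.0, 62.0]
item_counts = [50_000, 120_000, 10_000_000]  # the last one is the large dump

equal = equal_weighted(composites)
by_size = size_weighted(composites, item_counts)
print(round(equal, 2), round(by_size, 2))
```

With these numbers the equal-weighted composite is 79.0, while the size-weighted figure drops to roughly 62.4, almost exactly the score of the single 10M-row dataset.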

4. Rebalance cadence

Default cadence is weekly. On a rebalance:

  • Every dataset's current LQS composite is read fresh.
  • Selection rules are reapplied. Constituents that no longer meet the rule are marked removed_at = now() in public.lqs_index_constituents (records are append-only — never deleted).
  • New constituents are inserted with a fresh added_at.
  • The index's headline composite is recomputed and snapshotted to public.lqs_index_history.

Rebalances run via tools/lqs-index-build.js. The script is idempotent — running it twice produces no diff.
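The membership update and its idempotence can be sketched in memory as follows. This is not the code of tools/lqs-index-build.js: the record layout (dataset_id, added_at, removed_at) is inferred from the column names above, and a list of dicts stands in for public.lqs_index_constituents.

```python
from datetime import datetime, timezone


def rebalance(constituents, selected_ids, now):
    """Apply one rebalance to an append-only list of membership records.

    A record with removed_at=None is an active constituent.
    Returns the number of records closed out or appended.
    """
    active = {r["dataset_id"]: r for r in constituents if r["removed_at"] is None}
    changes = 0
    # Close out members that no longer satisfy the rule; the record is kept.
    for dataset_id, record in active.items():
        if dataset_id not in selected_ids:
            record["removed_at"] = now
            changes += 1
    # Append a fresh record for each newly qualifying dataset.
    for dataset_id in sorted(selected_ids - set(active)):
        constituents.append(
            {"dataset_id": dataset_id, "added_at": now, "removed_at": None}
        )
        changes += 1
    return changes


earlier = datetime(2024, 12, 30, tzinfo=timezone.utc)
now = datetime(2025, 1, 6, tzinfo=timezone.utc)
table = [
    {"dataset_id": "a", "added_at": earlier, "removed_at": None},
    {"dataset_id": "b", "added_at": earlier, "removed_at": None},
]
first = rebalance(table, {"a", "c"}, now)
second = rebalance(table, {"a", "c"}, now)
print(first, second, len(table))
```

The first run closes out b and appends c. The second run against the same selection changes nothing, and the table still holds all three records, including the removed one.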

5. Conflict-of-interest controls

  • No paid placement. A dataset's LQS composite cannot be raised by purchase, advertising, partnership, or seller tier. The scorer reads files; sellers cannot edit it.
  • LabelSets-uploaded datasets are eligible only when they pass the same scorer with the same code path as third-party uploads. Internal flags do not influence eligibility.
  • Datasets owned by LabelSets employees are flagged as such on their dataset page. Employee datasets remain eligible if and only if they meet the rule on the same terms.
  • Procurement-grade tickers require a third-party-verifiable cert (Ed25519 signature, public key fingerprint aa4c070af907e2ea). Anyone with the public key can verify the cert offline, so index membership is independently auditable.

6. Public API

The index is queryable directly. No auth, no rate limits beyond the standard infra protections, JSON output:

# All tickers, current state
GET /api/lqs-index

# Single ticker with full constituent list and selection rule
GET /api/lqs-index/LQS-FINANCE-TOP25

# Daily history (composite, n_constituents, additions/removals)
GET /api/lqs-index/LQS-FINANCE-TOP25/history
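
A minimal client sketch for these three endpoints, using only the Python standard library. The host in BASE_URL is a placeholder, since this document gives paths only, and the helper names are hypothetical. The response shape is not documented here, so fetch returns the decoded JSON as-is.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://example.com"  # placeholder host (assumption)


def index_url(ticker=None, history=False):
    """Build the URL for one of the three documented endpoints."""
    if ticker is None:
        if history:
            raise ValueError("history requires a ticker")
        return f"{BASE_URL}/api/lqs-index"
    url = f"{BASE_URL}/api/lqs-index/{quote(ticker, safe='')}"
    return url + "/history" if history else url


def fetch(url, timeout=10):
    """GET a URL and decode the JSON body without assuming its structure."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(index_url("LQS-FINANCE-TOP25", history=True))
```

Calling fetch(index_url("LQS-FINANCE-TOP25")) against the real host would return the ticker's constituent list and selection rule as described above.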

7. Versioning & changes

This methodology document is versioned. Substantive changes (new tickers, changes to selection rules, changes to the rebalance cadence) are announced on the blog at least 14 days before the rebalance at which they take effect, and are recorded in the audit log.

Bug fixes (e.g. typos, code path changes that don't affect outputs) take effect immediately and are noted in the changelog without a notice period.

8. License & attribution

The methodology, the selection rules, and the public API output are released under CC-BY-4.0. Cite as: "LabelSets LQS Index, methodology v1.0, retrieved {{date}}."

The cert authority's Ed25519 public key fingerprint is aa4c070af907e2ea. The public key is available at /api/lqs-public-key.