
MNIST

70,000 handwritten digits — the canonical intro-ML benchmark.

LQS 83 (gold) · ✓ Commercial OK · 70K images · 50 MB · Binary, JPG · Released 1998


Source: yann.lecun.com · maintained by Yann LeCun et al.

About this dataset

MNIST is one of the most widely used digit classification benchmarks in machine learning. Assembled by Yann LeCun and colleagues from NIST Special Database 3 and Special Database 1, it contains 60,000 training and 10,000 test images of 28×28 handwritten digits (0–9). The classes are close to balanced but not exactly uniform: per-class counts across the full 70,000 images range from roughly 6,300 (digit 5) to roughly 7,900 (digit 1). It is the de facto "hello world" for image classification, and most major ML libraries ship loaders for it.

Maintainer

Yann LeCun et al.

License

Public Domain

Formats

Binary · JPG
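
The "Binary" format here most likely refers to the IDX layout used by the original MNIST files: a 4-byte magic number (two zero bytes, a type code, and the number of dimensions), then one big-endian 32-bit size per dimension, then the raw values. The following is a minimal sketch of a reader for that layout, assuming unsigned-byte data (type code 0x08). The `parse_idx` name and the synthetic `demo` buffer are illustrative, not part of any library, and files that are distributed gzip-compressed would need to be decompressed first (for example with `gzip.open`).

```python
import struct


def parse_idx(buf: bytes):
    """Parse an unsigned-byte IDX buffer; return (shape, flat tuple of values)."""
    # Magic number: two zero bytes, a type code, and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", buf[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    header_end = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, buf[4:header_end])
    expected = 1
    for dim in shape:
        expected *= dim
    data = buf[header_end:]
    if len(data) != expected:
        raise ValueError("payload length does not match header")
    return shape, tuple(data)


# Synthetic buffer: two 2x2 "images", so the sketch runs without a download.
demo = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
demo_shape, demo_pixels = parse_idx(demo)
print(demo_shape)  # (2, 2, 2)
```

For a real image file the shape would be expected to come back as (number of images, 28, 28), and for a label file as a single dimension.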

Paper

Read on yann.lecun.com →

LabelSets Quality Score

LQS is our 7-dimension quality score, computed from the dataset's published statistics.

83 out of 100

gold tier

Solid dataset with some trade-offs

Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.

Completeness 92

No public completeness metric; using prior for 'research_release' datasets.

Uniqueness 68

Minimal deduplication disclosed.

Validation 92

Labels produced by domain experts or trained annotators.

Size adequacy 81

70,000 images — below 100,000 target for Computer Vision, but usable.

Format compliance 95

Industry-standard format — drop-in compatible with mainstream tooling.

Label density 52

Average 1.0 labels per item (sparse).

Class balance 90

Near-uniform class distribution.
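
How the seven dimension scores combine into the composite is not stated on this page. As a rough sanity check, the sketch below computes their unweighted mean; equal weighting is an assumption made purely for illustration, not the LQS formula.

```python
# Dimension scores as listed above for MNIST.
dimension_scores = {
    "completeness": 92,
    "uniqueness": 68,
    "validation": 92,
    "size_adequacy": 81,
    "format_compliance": 95,
    "label_density": 52,
    "class_balance": 90,
}

# Assumption: equal weights. The actual LQS weights are not given here.
unweighted_mean = sum(dimension_scores.values()) / len(dimension_scores)
print(round(unweighted_mean, 1))  # 81.4
```

Equal weights give about 81.4, slightly below the published composite of 83, so the real weighting is presumably not uniform.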

Sample statistics

What's actually in the dataset — from the maintainer's published stats.

60K train + 10K test. 28×28 grayscale. Roughly balanced across 10 classes (about 7K per class on average, ranging from roughly 6.3K to 7.9K). Expert-curated from NIST handwritten samples.
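
Class-balance figures like these are easy to verify once the labels are loaded. A minimal sketch, assuming the labels are available as a sequence of integers; `balance_summary` and `toy_labels` are hypothetical names used only for illustration, and the toy data is not drawn from MNIST.

```python
from collections import Counter


def balance_summary(labels):
    """Return per-class counts plus the largest-to-smallest class ratio."""
    counts = Counter(labels)
    smallest = min(counts.values())
    largest = max(counts.values())
    return {
        "counts": dict(sorted(counts.items())),
        "min": smallest,
        "max": largest,
        # 1.0 means perfectly balanced; larger values mean more skew.
        "imbalance_ratio": largest / smallest,
    }


# Toy labels standing in for a real label array.
toy_labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
summary = balance_summary(toy_labels)
print(summary["counts"], summary["imbalance_ratio"])  # {0: 3, 1: 2, 2: 4} 2.0
```

Running the same function over the combined train and test labels would show how far each digit deviates from the 7K average.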

License

MNIST is listed as public domain. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.

Need commercial-licensed Computer Vision data?

LabelSets sellers offer paid computer vision datasets with what public datasets often can't give you:

Explicit commercial license in writing
LQS-verified quality for your specific use case
Instant download — no DUA, credentialed access, or research gating
PII scanned, deduplicated, and production-ready


Frequently Asked Questions

MNIST is listed as public domain, which generally permits commercial use. Always verify the current license terms with the maintainer (Yann LeCun et al.) before using it in a commercial product.

MNIST contains 70,000 images: 60K train + 10K test, 28×28 grayscale. The 10 classes are roughly balanced (about 7K per class on average, ranging from roughly 6.3K to 7.9K). Expert-curated from NIST handwritten samples.

MNIST is maintained by Yann LeCun et al. and is available at http://yann.lecun.com/exdb/mnist/. LabelSets indexes and scores this dataset for discoverability but does not redistribute it.

LQS is a 7-dimension quality score (completeness, uniqueness, validation, size adequacy, format compliance, label density, class balance) computed from the dataset's published statistics. Composite scores map to tiers: platinum (≥90), gold (≥75), silver (≥60), bronze (<60).
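
The tier thresholds above translate directly into a lookup. A minimal sketch; the function name `lqs_tier` is illustrative and not part of any published LabelSets API.

```python
def lqs_tier(score: float) -> str:
    """Map a composite LQS score (0-100) to its tier name."""
    if score >= 90:
        return "platinum"
    if score >= 75:
        return "gold"
    if score >= 60:
        return "silver"
    return "bronze"


print(lqs_tier(83))  # gold
```

MNIST's composite of 83 falls in the 75 to 89 band, which matches the gold tier shown for this dataset.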
