👁 Curated Catalog · Computer Vision

Visual Genome

108K images with dense scene graph annotations and 5.4M region descriptions.

LQS 88 · gold · ✓ Commercial OK · 5.4M region descriptions · 15 GB on disk · JSON and JPG · Released 2016
Source: visualgenome.org · maintained by Stanford Vision Lab

About this dataset

Visual Genome is a scene graph dataset from Stanford. Its 108K images are annotated with 5.4M region descriptions, 1.7M question-answer pairs, 3.8M object instances, 2.3M relationships, and 2.8M attributes. By linking structured image concepts to natural language, it supports research on visual question answering.

Maintainer: Stanford Vision Lab
License: CC BY 4.0
Formats: JSON · JPG

LabelSets Quality Score

LQS is our 7-dimension quality score, computed from the dataset's published statistics.

88 out of 100 · gold tier

High-quality dataset across most dimensions

Composite score computed from the 7 dimensions below: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.

Completeness 88
No public completeness metric is available, so the score uses the prior for 'crowdsourced_qc' datasets.
Uniqueness 93
Exact-hash deduplication documented by maintainer.
Validation 82
Crowdsourced labels with quality-control protocol (redundancy, golden tests).
Size adequacy 95
5,400,000 region descriptions across 108K images; exceeds the 100,000-item adequacy target for Computer Vision.
Format compliance 95
Industry-standard formats (JSON, JPG) that load directly in mainstream tooling.
Label density 100
Average of 50.0 labels per image (5.4M region descriptions / 108K images); high density.
Class balance 58
Long-tail distribution: a few dominant classes are overrepresented.
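The dimension scores above can be sanity-checked in a few lines of Python. This is a sketch under an explicit assumption: this page does not state how LQS weights its seven dimensions, so the equal weighting below is hypothetical. An unweighted mean comes out at about 87.3 rather than the published 88, which suggests the methodology weights some dimensions more heavily. The label-density figure, by contrast, is reproduced exactly by dividing 5.4M region descriptions by 108K images.

```python
# Dimension scores as listed on this page.
DIMENSIONS = {
    "completeness": 88,
    "uniqueness": 93,
    "validation": 82,
    "size_adequacy": 95,
    "format_compliance": 95,
    "label_density": 100,
    "class_balance": 58,
}


def equal_weight_composite(scores):
    """Unweighted mean of the dimension scores.

    ASSUMPTION: equal weights are used only for illustration; the real
    LQS weighting is not published on this page.
    """
    return sum(scores.values()) / len(scores)


def labels_per_image(num_labels, num_images):
    """Average number of labels (here, region descriptions) per image."""
    return num_labels / num_images


print(round(equal_weight_composite(DIMENSIONS), 2))  # 87.29
print(labels_per_image(5_400_000, 108_000))  # 50.0
```

The gap between 87.29 and 88 is small, but it means the equal-weight mean should not be read as the official composite.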

What it's used for

Common tasks and benchmarks where Visual Genome is the default or competitive choice.

Sample statistics

What's actually in the dataset, according to the maintainer's published statistics.

108K images, 5.4M region descriptions, 1.7M Q&A pairs, 3.8M object instances, 2.3M relationships, 2.8M attributes.
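To check these counts against a local download, a short Python sketch can total the per-image annotation lists. It rests on an assumption about the file layout: that each Visual Genome JSON file (for example a region-descriptions file) is a list with one record per image, and that each record nests its annotations under a key such as "regions", "objects", "relationships", "attributes", or "qas". The helper names here are hypothetical, and the file names and keys should be verified against the release you download. Note that `json.load` reads a whole file into memory, which matters for multi-gigabyte files.

```python
import json
from pathlib import Path

# ASSUMPTION: each top-level file is a list with one record per image, and each
# record stores its annotations as a list under a key such as "regions".
# Verify file names and keys against the Visual Genome release you use.


def load_json(path):
    """Load one Visual Genome JSON file fully into memory."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def count_nested(records, key):
    """Sum the lengths of the per-image annotation lists stored under `key`."""
    return sum(len(record.get(key) or []) for record in records)


def summarize(region_records):
    """Return image count, region count, and average regions per image."""
    num_images = len(region_records)
    num_regions = count_nested(region_records, "regions")
    return {
        "images": num_images,
        "regions": num_regions,
        "regions_per_image": num_regions / num_images if num_images else 0.0,
    }


# Tiny hand-made sample in the assumed shape (not real Visual Genome data).
sample = [
    {"id": 1, "regions": [{"phrase": "a red car"}, {"phrase": "tree on the left"}]},
    {"id": 2, "regions": [{"phrase": "man holding an umbrella"}]},
]
print(summarize(sample))  # {'images': 2, 'regions': 3, 'regions_per_image': 1.5}
```

If the assumed schema holds, running `summarize` on the real region-descriptions file should report roughly 108K images and 5.4M regions.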

License

Visual Genome is distributed under CC BY 4.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.

Need commercial-licensed Computer Vision data?

LabelSets sellers offer paid computer vision datasets with what public datasets often can't give you:


Similar public datasets

Other entries in the Computer Vision catalog.

Frequently Asked Questions

Visual Genome is distributed under CC BY 4.0, which generally permits commercial use. Always verify the current license terms with the maintainer (Stanford Vision Lab) before using in a commercial product.
Visual Genome contains 108K images annotated with 5.4M region descriptions, 1.7M Q&A pairs, 3.8M object instances, 2.3M relationships, and 2.8M attributes.
Visual Genome is maintained by Stanford Vision Lab and is available at https://visualgenome.org/api/v0/api_home.html. LabelSets indexes and scores this dataset for discoverability but does not redistribute it.
LQS is a 7-dimension quality score (completeness, uniqueness, validation, size adequacy, format compliance, label density, class balance) computed from the dataset's published statistics. Composite scores map to tiers: platinum (≥90), gold (≥75), silver (≥60), bronze (<60).
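The tier cutoffs stated above reduce to a simple threshold function. The Python sketch below applies them in descending order; the range check on 0 to 100 is an added assumption based on scores being reported "out of 100", not part of the stated methodology.

```python
def lqs_tier(score):
    """Map a composite LQS score to its tier using the published cutoffs."""
    # ASSUMPTION: composite scores are bounded to 0-100 ("out of 100").
    if not 0 <= score <= 100:
        raise ValueError(f"LQS score must be in [0, 100], got {score}")
    if score >= 90:
        return "platinum"
    if score >= 75:
        return "gold"
    if score >= 60:
        return "silver"
    return "bronze"


print(lqs_tier(88))  # gold
```

Checking the highest cutoff first means each boundary value (90, 75, 60) lands in the higher tier, matching the "≥" in the published thresholds.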