108K images with dense scene graph annotations — 5.4M region descriptions.
Visual Genome is Stanford's scene graph dataset. Its 108K images are annotated with 5.4M region descriptions, 1.7M question-answer pairs, 3.8M object instances, 2.3M relationships, and 2.8M attributes. It connects structured image concepts to natural language and powers visual question answering research.
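To make the idea of a scene graph concrete, here is a minimal sketch of how objects, attributes, and relationships might be held in memory. The class names, fields, and sample values (`SceneObject`, `Relationship`, `SceneGraph`, the "man riding horse" example) are hypothetical illustrations, not Visual Genome's actual file schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneObject:
    """One annotated object instance (hypothetical structure)."""
    object_id: int
    name: str
    box: Tuple[int, int, int, int]  # assumed layout: (x, y, width, height) in pixels
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A directed edge: subject --predicate--> object."""
    subject_id: int
    predicate: str
    object_id: int


@dataclass
class SceneGraph:
    objects: Dict[int, SceneObject] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.object_id] = obj

    def relate(self, subject_id: int, predicate: str, object_id: int) -> None:
        self.relationships.append(Relationship(subject_id, predicate, object_id))

    def triples(self) -> List[str]:
        """Render each relationship as a 'subject predicate object' phrase."""
        return [
            f"{self.objects[r.subject_id].name} {r.predicate} {self.objects[r.object_id].name}"
            for r in self.relationships
        ]


# Made-up sample content, for illustration only.
graph = SceneGraph()
graph.add_object(SceneObject(1, "man", (10, 20, 100, 200), ["tall"]))
graph.add_object(SceneObject(2, "horse", (80, 60, 220, 180), ["brown"]))
graph.relate(1, "riding", 2)
phrases = graph.triples()
print(phrases)
```

Rendering relationships as short phrases is one simple way to see how a structured graph lines up with natural-language descriptions of the same image.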
LQS is our 7-dimension quality score, computed from the dataset's published statistics.
The composite score is computed from 7 dimensions: completeness, uniqueness, validation health, size adequacy, format compliance, label density, and class balance.
Common tasks and benchmarks where Visual Genome is the default or competitive choice.
What's actually in the dataset — from the maintainer's published stats.
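As a rough illustration, the published counts quoted in the description can be turned into approximate per-image annotation densities. This sketch uses only the rounded figures from this page, so the results are estimates; the variable names are illustrative.

```python
# Rounded counts as published on this page.
IMAGES = 108_000

published_counts = {
    "region_descriptions": 5_400_000,
    "question_answer_pairs": 1_700_000,
    "object_instances": 3_800_000,
    "relationships": 2_300_000,
    "attributes": 2_800_000,
}

# Average number of annotations of each type per image.
per_image = {name: count / IMAGES for name, count in published_counts.items()}

for name, value in per_image.items():
    print(f"{name}: about {value:.1f} per image")
```

With these figures, each image carries roughly 50 region descriptions on average. Because the inputs are rounded, the other ratios should be read as approximate.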
Visual Genome is distributed under CC BY 4.0. This is a third-party public dataset; LabelSets indexes and scores it but does not host or redistribute the data. Always verify current license terms with the maintainer before commercial use.