7,000+ speakers
VoxCeleb contains speech from speakers spanning a wide range of different ethnicities, accents, professions and ages.
1 million+ utterances
All speaking face-tracks are captured "in the wild", with background chatter, laughter, overlapping speech, pose variation and different lighting conditions.
2,000+ hours
VoxCeleb consists of both audio and video. Each segment is at least 3 seconds long.
[Figures: Utterance Lengths; Gender Distribution; Nationality Distribution]
We provide URLs for each YouTube video and timestamps for utterances. The frame numbers provided assume that the video is saved at 25 fps.
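As a minimal sketch of how a frame number relates to a time offset at the stated 25 fps, the conversion below may be useful. It assumes that frame indices count from zero, which the text does not specify, and the function names are illustrative rather than part of any VoxCeleb tooling.

```python
from fractions import Fraction

FPS = 25  # frame rate stated in the dataset description


def frame_to_seconds(frame: int, fps: int = FPS) -> Fraction:
    """Return the time offset of a frame as an exact fraction of a second.

    Assumption: frame indices start at 0, so frame 0 maps to t = 0.
    """
    if frame < 0:
        raise ValueError("frame index must be non-negative")
    return Fraction(frame, fps)


def format_timestamp(seconds: Fraction) -> str:
    """Format a non-negative offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
```

For example, `format_timestamp(frame_to_seconds(75))` gives `00:00:03.000`, so the 3-second minimum segment length corresponds to 75 frames at 25 fps.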
File | Link | MD5 Checksum
Dev | Download | 9c3b51e34038d1bdb2174dcc66543267 |
Test | Download | 8e06592a5f604e23e8cd10f421b36cc3 |
File | Link | MD5 Checksum
Dev A | Download | e395d020928bc15670b570a21695ed96 |
Dev B | Download | bbfaaccefab65d82b21903e81a8a8020 |
Dev C | Download | 017d579a2a96a077f40042ec33e51512 |
Dev D | Download | 7bb1e9f70fddc7a678fa998ea8b3ba19 |
Dev | | ae63e55b951748cc486645f532ba230b
Test | Download | 185fdc63c3c739954633d50379a3d102 |
Full names, nationality and gender labels for all the speakers in the dataset can be downloaded from here.
File | Link | MD5 Checksum
Dev | Download | 0e7a9f083c4efc27982f748f5f0b540a |
Test | Download | f305b5347c9c45362b7c838b561cea7d |
File | Link | MD5 Checksum
Dev A | Download | da070494c573e5c0564b1d11c3b20577 |
Dev B | Download | 17fe6dab2b32b48abaf1676429cdd06f |
Dev C | Download | 1de58e086c5edf63625af1cb6d831528 |
Dev D | Download | 5a043eb03e15c5a918ee6a52aad477f9 |
Dev E | Download | cea401b624983e2d0b2a87fb5d59aa60 |
Dev F | Download | fc886d9ba90ab88e7880ee98effd6ae9 |
Dev G | Download | d160ecc3f6ee3eed54d55349531cb42e |
Dev H | Download | 6b84a81b9af72a9d9eecbb3b1f602e65 |
Dev | | bbc063c46078a602ca71605645c2a402
Test | Download | 0d2b3ea430a821c33263b5ea37ede312 |
File | Link | MD5 Checksum
Dev A | Download | 29f8121f15b716852a6f1b1508c6952a |
Dev B | Download | a8e0b0b3392a542a1ae549f82db38096 |
Dev C | Download | 684b280a0b48596741b0c94b9291ba61 |
Dev D | Download | 4e901486c86b155c9c19c0c1a023a2a3 |
Dev E | Download | f74e9d95a062d69d0c954e6aa89c1868 |
Dev F | Download | d0806787088aeae093f6099d19e2d910 |
Dev G | Download | b804c96ae90b6b776307a25435d689af |
Dev H | Download | a85abd0c92e575748fedf586bb2f3841 |
Dev I | Download | e8d60e38a25f61986c3fcfd3ef23918e |
Dev | | ee7e132c58ed112e2f4f83b4ac1e2aea
Test | Download | b3c555e7a67eb9032640b6209b5c053f |
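The checksums listed above can be compared against downloaded files with a few lines of Python. The sketch below streams one or more files through a single MD5 digest, so it can also check a multi-part archive without first writing the joined file to disk. It assumes that a Dev entry listed without a download link is the checksum of the parts concatenated in order, which the tables suggest but do not state. The function names and file names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def md5_of_files(paths: Iterable[PathLike], chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of the given files read back to back.

    Hashing the parts in sequence gives the same digest as hashing
    their concatenation, without needing the joined file on disk.
    """
    digest = hashlib.md5()
    for path in paths:
        with open(path, "rb") as handle:
            # Read in fixed-size chunks so large archives fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()


def verify_md5(paths: Iterable[PathLike], expected: str) -> bool:
    """Compare the computed digest with a published checksum (case-insensitive)."""
    return md5_of_files(paths) == expected.strip().lower()
```

For instance, `verify_md5(["dev_part_a", "dev_part_b", "dev_part_c", "dev_part_d"], "ae63e55b951748cc486645f532ba230b")` would check a four-part Dev archive against its listed checksum, under the concatenation assumption above. The file names here are placeholders for wherever the parts were saved.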
The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here.
Caution: We note that the distribution of identities in the VoxCeleb datasets may not be representative of the global human population. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data.
Please contact the authors below if you have any queries regarding the dataset.
VoxCeleb: a large-scale speaker identification dataset
A. Nagrani*, J. S. Chung*, A. Zisserman
Interspeech, 2017
VoxCeleb2: Deep Speaker Recognition
J. S. Chung*, A. Nagrani*, A. Zisserman
Interspeech, 2018
VoxCeleb: Large-scale speaker verification in the wild
A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
Computer Speech and Language, 2019