A large-scale audio-visual dataset of human speech

7,000+ speakers


VoxCeleb contains speech from speakers spanning a wide range of different ethnicities, accents, professions and ages.


Figure: Utterance lengths

1 million+ utterances


All speaking face-tracks are captured "in the wild", with background chatter, laughter, overlapping speech, pose variation and different lighting conditions.


Figure: Gender distribution

2,000+ hours


VoxCeleb consists of both audio and video. Each segment is at least 3 seconds long.


Figure: Nationality distribution


If you would like to download the audio-visual dataset, please fill in this form to request a password.

VoxCeleb1

URLs and timestamps

We provide URLs for each YouTube video and timestamps for utterances. The frame number provided assumes that the video is saved at 25fps.
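As a minimal sketch of what the 25fps assumption implies, the Python below converts a frame number to a time offset and rescales it for a video saved at a different frame rate. The function names are hypothetical and not part of any VoxCeleb tooling, and whether the published frame numbers start at 0 or 1 is not stated here, so treat the result as an offset to verify against the data.

```python
ASSUMED_FPS = 25.0  # frame rate the published frame numbers assume


def frame_to_seconds(frame: int, fps: float = ASSUMED_FPS) -> float:
    """Convert a frame number into a time offset in seconds."""
    return frame / fps


def rescale_frame(frame: int, actual_fps: float,
                  assumed_fps: float = ASSUMED_FPS) -> int:
    """Map a frame number given at assumed_fps onto a video saved at actual_fps."""
    return round(frame * actual_fps / assumed_fps)


if __name__ == "__main__":
    print(frame_to_seconds(250))     # 10.0
    print(rescale_frame(250, 30.0))  # 300
```

If a video was downloaded at a frame rate other than 25fps, converting through seconds in this way is one way to keep the utterance boundaries aligned.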

File    MD5 Checksum
Dev     9c3b51e34038d1bdb2174dcc66543267
Test    8e06592a5f604e23e8cd10f421b36cc3

Audio files
File                MD5 Checksum
Dev A               e395d020928bc15670b570a21695ed96
Dev B               bbfaaccefab65d82b21903e81a8a8020
Dev C               017d579a2a96a077f40042ec33e51512
Dev D               7bb1e9f70fddc7a678fa998ea8b3ba19
Dev (concatenated)  ae63e55b951748cc486645f532ba230b
Test                185fdc63c3c739954633d50379a3d102

Download all parts and concatenate the files using the command cat vox1_dev* >
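The concatenate-then-verify step can also be sketched in Python, which may help on systems without cat or md5sum. The helper names and all file paths here are hypothetical; the expected checksum is whichever value the table lists for the file being checked.

```python
import hashlib
import shutil


def concatenate_parts(part_paths, output_path):
    """Append each part, in the order given, to output_path (like `cat parts > output`)."""
    with open(output_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Pass the parts in sorted order so they are joined in the same sequence a shell glob would produce, and write the output to a name the part pattern does not match, otherwise a later run could pick up the joined file as a part.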


Full names, nationality and gender labels for all the speakers in the dataset can be downloaded from here.

Identity metadata
Dataset split for identification

List of trial pairs - VoxCeleb1
List of trial pairs - VoxCeleb1 (cleaned)
List of trial pairs - VoxCeleb1-H
List of trial pairs - VoxCeleb1-H (cleaned)
List of trial pairs - VoxCeleb1-E
List of trial pairs - VoxCeleb1-E (cleaned)

The VoxCeleb1-E and VoxCeleb1-H lists are drawn from the VoxCeleb1 training set, so you cannot use any VoxCeleb1 files for training if you test on these lists.
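One way to guard against accidental overlap is a sanity check that compares the speakers in a trial list with the speakers in a training file list. The sketch below assumes, without confirmation from this page, that each trial line has the form "label path_a path_b" and that the first path component is the speaker ID; the function names and sample paths are hypothetical. It does not replace the rule above, it only flags obvious leakage.

```python
from typing import Iterable, List, Set, Tuple

Trial = Tuple[int, str, str]


def parse_trials(lines: Iterable[str]) -> List[Trial]:
    """Parse lines of the assumed form 'label path_a path_b'."""
    trials = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        label, path_a, path_b = parts
        trials.append((int(label), path_a, path_b))
    return trials


def speaker_of(path: str) -> str:
    """Assumption: the speaker ID is the first component of the path."""
    return path.split("/", 1)[0]


def leaked_speakers(training_files: Iterable[str],
                    trials: Iterable[Trial]) -> Set[str]:
    """Return the speakers that appear in both the training files and the trials."""
    test_speakers = {speaker_of(p) for _, a, b in trials for p in (a, b)}
    train_speakers = {speaker_of(f) for f in training_files}
    return train_speakers & test_speakers


if __name__ == "__main__":
    sample = [
        "1 spkA/v1/0001.wav spkA/v2/0001.wav",
        "0 spkA/v1/0001.wav spkB/v1/0002.wav",
    ]
    training = ["spkC/x/0001.wav", "spkB/y/0002.wav"]
    print(sorted(leaked_speakers(training, parse_trials(sample))))  # ['spkB']
```

An empty result means no speaker overlap was found under these assumptions; any non-empty result means the training list should be revisited before reporting results on that trial list.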

VoxCeleb2

URLs and timestamps
File    MD5 Checksum
Dev     0e7a9f083c4efc27982f748f5f0b540a
Test    f305b5347c9c45362b7c838b561cea7d

Audio files
File                MD5 Checksum
Dev A               da070494c573e5c0564b1d11c3b20577
Dev B               17fe6dab2b32b48abaf1676429cdd06f
Dev C               1de58e086c5edf63625af1cb6d831528
Dev D               5a043eb03e15c5a918ee6a52aad477f9
Dev E               cea401b624983e2d0b2a87fb5d59aa60
Dev F               fc886d9ba90ab88e7880ee98effd6ae9
Dev G               d160ecc3f6ee3eed54d55349531cb42e
Dev H               6b84a81b9af72a9d9eecbb3b1f602e65
Dev (concatenated)  bbc063c46078a602ca71605645c2a402
Test                0d2b3ea430a821c33263b5ea37ede312

Download all parts and concatenate the files using the command cat vox2_dev_aac* >

Video files
File                MD5 Checksum
Dev A               29f8121f15b716852a6f1b1508c6952a
Dev B               a8e0b0b3392a542a1ae549f82db38096
Dev C               684b280a0b48596741b0c94b9291ba61
Dev D               4e901486c86b155c9c19c0c1a023a2a3
Dev E               f74e9d95a062d69d0c954e6aa89c1868
Dev F               d0806787088aeae093f6099d19e2d910
Dev G               b804c96ae90b6b776307a25435d689af
Dev H               a85abd0c92e575748fedf586bb2f3841
Dev I               e8d60e38a25f61986c3fcfd3ef23918e
Dev (concatenated)  ee7e132c58ed112e2f4f83b4ac1e2aea
Test                b3c555e7a67eb9032640b6209b5c053f

Download all parts and concatenate the files using the command cat vox2_dev_mp4* >

Identity metadata


The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here.

Caution: We note that the distribution of identities in the VoxCeleb datasets may not be representative of the global human population. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data.

Please contact the authors below if you have any queries regarding the dataset.


Please cite the following if you make use of the dataset.

  • VoxCeleb: a large-scale speaker identification dataset
    A. Nagrani*, J. S. Chung*, A. Zisserman
    Interspeech, 2017

  • VoxCeleb2: Deep Speaker Recognition
    J. S. Chung*, A. Nagrani*, A. Zisserman
    Interspeech, 2018

  • VoxCeleb: Large-scale speaker verification in the wild
    A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
    Computer Speech and Language, 2019