Self-sufficient framework for continuous sign language recognition
Y. Jang, Y. Oh, J. W. Cho, M. Kim, D. Kim, I. S. Kweon, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Metric learning for user-defined keyword spotting
J. Jung, Y. Kim, J. Park, Y. Lim, B. Kim, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Hindi as a second language: improving visually grounded speech with semantically similar samples
H. Ryu, A. Senocak, I. S. Kweon, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
MarginNCE: Robust Sound Localization with a Negative Margin
S. Park, A. Senocak, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity
Y. Kim, H. Heo, J. Jung, Y. Kwon, B. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
In search of strong embedding extractors for speaker diarisation
J. Jung, B. Lee, J. Huh, A. Brown, Y. Kwon, S. Watanabe, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
J. Lee, J. S. Chung, S. W. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition
Y. Jang, Y. Oh, J. W. Cho, D. Kim, J. S. Chung, I. S. Kweon
British Machine Vision Conference
PDF
Augmentation adversarial training for self-supervised speaker representation learning
J. Kang, J. Huh, H. Heo, J. S. Chung
Journal of Selected Topics in Signal Processing
PDF
Pushing the limits of raw waveform speaker recognition
J. Jung, Y. Kim, H. Heo, B. Lee, Y. Kwon, J. S. Chung
Interspeech
PDF
Spell my name: Keyword boosted speech recognition
N. Jung, G. Kim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Multi-scale speaker embedding-based graph attention networks for speaker diarisation
Y. Kwon, H. Heo, J. Jung, Y. Kim, B. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
J. Jung, H. Heo, H. Tak, H. Shim, J. S. Chung, B. Lee, H. Yu, N. Evans
International Conference on Acoustics, Speech, and Signal Processing
PDF
Adapting Speaker Embeddings for Speaker Diarization
Y. Kwon, J. Jung, H. Heo, Y. Kim, B. Lee, J. S. Chung
Interspeech
PDF
Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network
J. Jung, H. Heo, Y. Kwon, J. S. Chung, B. Lee
Interspeech
PDF
Look Who's Talking: Active Speaker Detection in the Wild
Y. Kim, H. Heo, S. Choe, S. W. Chung, Y. Kwon, B. Lee, Y. Kwon, J. S. Chung
Interspeech
PDF | Dataset
Playing a Part: Speaker Verification at the Movies
A. Brown, J. Huh, A. Nagrani, J. S. Chung, A. Zisserman
International Conference on Acoustics, Speech, and Signal Processing
PDF
The ins and outs of speaker recognition: lessons from VoxSRC 2020
Y. Kwon, H. Heo, B. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Graph Attention Networks for Speaker Verification
J. Jung, H. Heo, H. Yu, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Look who's not talking
Y. Kwon, H. Heo, J. Huh, B. Lee, J. S. Chung
IEEE Spoken Language Technology Workshop
Best Paper Finalist
PDF
Metric Learning for Keyword Spotting
J. Huh, M. Lee, H. Heo, S. Mun, J. S. Chung
IEEE Spoken Language Technology Workshop
PDF
Cross attentive pooling for speaker verification
S. Kye, Y. Kwon, J. S. Chung
IEEE Spoken Language Technology Workshop
PDF
Supervised attention for speaker recognition
S. Kye, J. S. Chung, H. Kim
IEEE Spoken Language Technology Workshop
PDF
Perfect Match: Self-Supervised Embeddings for Cross-modal Retrieval
S. W. Chung, J. S. Chung, H. G. Kang
Journal of Selected Topics in Signal Processing
PDF
Augmentation adversarial training for self-supervised speaker recognition
J. Huh, H. Heo, J. Kang, S. Watanabe, J. S. Chung
Workshop on Self-Supervised Learning for Speech and Audio Processing, NeurIPS
PDF
FaceFilter: Audio-visual speech separation using still images
S. W. Chung, S. Choe, J. S. Chung, H. G. Kang
Interspeech
Best Student Paper Award
PDF | Video
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
S. W. Chung, H. G. Kang, J. S. Chung
Interspeech
PDF
Spot the conversation: speaker diarisation in the wild
J. S. Chung*, J. Huh*, A. Nagrani*, T. Afouras, A. Zisserman
Interspeech
PDF | Project page
Now you’re speaking my language: Visual language identification
T. Afouras, J. S. Chung, A. Zisserman
Interspeech
PDF | Project page
In defence of metric learning for speaker recognition
J. S. Chung, J. Huh, S. Mun, M. Lee, H. Heo, S. Choe, C. Ham, S. Jung, B. Lee, I. Han
Interspeech
PDF | Code
Self-supervised learning of audio-visual objects from video
T. Afouras, A. Owens, J. S. Chung, A. Zisserman
European Conference on Computer Vision
PDF
BSL-1K: Scaling up co-articulated sign recognition using mouthing cues
S. Albanie, G. Varol, L. Momeni, T. Afouras, J. S. Chung, N. Fox, A. Zisserman
European Conference on Computer Vision
PDF
Delving into VoxCeleb: environment invariant speaker recognition
J. S. Chung*, J. Huh*, S. Mun
Speaker Odyssey
PDF
ASR is all you need: Cross-modal distillation for lip reading
T. Afouras, J. S. Chung, A. Zisserman
International Conference on Acoustics, Speech, and Signal Processing
PDF
Disentangled Speech Embeddings using Cross-Modal Self-Supervision
A. Nagrani*, J. S. Chung*, S. Albanie*, A. Zisserman
International Conference on Acoustics, Speech, and Signal Processing
PDF
The sound of my voice: speaker representation loss for target voice separation
S. Mun, S. Choe, J. Huh, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Deep Audio-Visual Speech Recognition
T. Afouras*, J. S. Chung*, A. Senior, O. Vinyals, A. Zisserman
IEEE Transactions on Pattern Analysis and Machine Intelligence
PDF | Dataset
You said that?: Synthesising talking faces from audio
A. Jamaludin*, J. S. Chung*, A. Zisserman
International Journal of Computer Vision
PDF
VoxCeleb: Large-scale speaker verification in the wild
A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
Computer Speech and Language
PDF
Who said that?: Audio-visual speaker diarisation of real-world meetings
J. S. Chung, B. Lee, I. Han
Interspeech
PDF
My lips are concealed: Audio-visual speech enhancement through obstructions
T. Afouras, J. S. Chung, A. Zisserman
Interspeech
PDF | Project page
Naver at ActivityNet Challenge 2019--Task B Active Speaker Detection (AVA)
J. S. Chung
International Challenge on Activity Recognition
PDF
Utterance-level Aggregation For Speaker Recognition In The Wild
W. Xie, A. Nagrani, J. S. Chung, A. Zisserman
International Conference on Acoustics, Speech, and Signal Processing
PDF | Project page
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
S. W. Chung, J. S. Chung, H. G. Kang
International Conference on Acoustics, Speech, and Signal Processing
PDF | Model
Learning to Lip Read Words by Watching Videos
J. S. Chung, A. Zisserman
Computer Vision and Image Understanding
PDF
VoxCeleb2: Deep Speaker Recognition
J. S. Chung*, A. Nagrani*, A. Zisserman
Interspeech
PDF | Dataset | Mirror
The Conversation: Deep Audio-Visual Speech Enhancement
T. Afouras, J. S. Chung, A. Zisserman
Interspeech
PDF | Project page
Deep Lip Reading: a comparison of models and an online application
T. Afouras, J. S. Chung, A. Zisserman
Interspeech
PDF | Project page
VoxCeleb: a large-scale speaker identification dataset
A. Nagrani*, J. S. Chung*, A. Zisserman
Interspeech
Best Student Paper Award
PDF | Dataset | Mirror
You said that?
J. S. Chung*, A. Jamaludin*, A. Zisserman
British Machine Vision Conference
PDF | Project page
Press: New Scientist, Daily Mail
Lip Reading in Profile
J. S. Chung, A. Zisserman
British Machine Vision Conference
PDF
Lip Reading Sentences in the Wild
J. S. Chung, A. Senior, O. Vinyals, A. Zisserman
IEEE Conference on Computer Vision and Pattern Recognition
PDF | Dataset | Video
Press: BBC, CBC, New Scientist, MIT Tech Review, ZDNet
Out of time: automated lip sync in the wild
J. S. Chung, A. Zisserman
Workshop on Multi-view Lip-reading, ACCV
PDF | Project page
Lip Reading in the Wild
J. S. Chung, A. Zisserman
Asian Conference on Computer Vision
Best Student Paper Award
PDF | Dataset
Signs in time: Encoding human motion as a temporal image
J. S. Chung, A. Zisserman
Workshop on Brave New Ideas for Motion Representations, ECCV
PDF | Video
K-Celeb: a collaborative approach to face dataset curation
J. H. Bae, B. Bebensee, et al.
Seoul National University
PDF
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
A. Nagrani, J. S. Chung, J. Huh, A. Brown, E. Coto, W. Xie, M. McLaren, D. Reynolds, A. Zisserman
arXiv:2012.06867
PDF
VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge
J. S. Chung, A. Nagrani, E. Coto, W. Xie, M. McLaren, D. Reynolds, A. Zisserman
arXiv:1912.02522
PDF
LRS3-TED: a large-scale dataset for visual speech recognition
T. Afouras, J. S. Chung, A. Zisserman
arXiv:1809.00496
PDF | Dataset