Multimodal AI Lab @ KAIST

2026

See & Sniff: Learning Visuo-Olfactory Representations
S. Kim, S. Lee, H. Ryu, J. S. Chung, A. Senocak
European Conference on Computer Vision
PDF
ProsoCodec: Prosody-Oriented Speech Codec for Voice Conversion
J. Choi, J. Kim, S. Hu, J. S. Chung
Interspeech
PDF
Plug-and-Steer: Decoupling Separation and Selection in Audio-Visual Target Speaker Extraction
D. Kwak, S. Lee, J. S. Chung
Interspeech
PDF
Acoustic Prompting via Stage-wise Modulation for Few-Shot Learning in Audio Language Models
H. Cho, J. Jang, C. Kim, J. S. Chung
Interspeech
PDF
MamTra: A Hybrid Mamba-Transformer Backbone for Speech Synthesis
T. D. Nguyen, S. Bae, J. S. Chung, J. Kim
Interspeech
PDF
Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis
Z. Niu, S. Hu, J. Choi, Y. Chen, P. Chen, P. Zhu, Y. Yang, B. Zhang, J. Zhao, C. Wang, X. Chen
Interspeech
PDF
Inference-Time Scaling for Joint Audio-Video Generation
J. Jung, K. Rho, I. Shin, J. S. Chung
Transactions on Machine Learning Research
PDF
Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses
S. Kim, K. Jang, S. Cho, J. S. Chung, H. Kim, S. Yun
Findings of the Association for Computational Linguistics
PDF
Probing Cross-modal Information Hubs in Audio-Visual LLMs
J. Jung, C. Jung, J. Kim, J. S. Chung
International Conference on Machine Learning
PDF
Deep Understanding of Sign Language for Sign to Subtitle Alignment
Y. Jang, J. Choi, J. Ahn, J. S. Chung
IEEE Transactions on Multimedia
PDF
Cinematic Audio Source Separation Using Visual Cues
K. Zhang, S. Lee, A. Senocak, J. S. Chung
IEEE Conference on Computer Vision and Pattern Recognition
PDF
Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions
S. Kim, S. Lee, H. Ryu, J. S. Chung, A. Senocak
IEEE Conference on Computer Vision and Pattern Recognition
PDF
How Far Can We Go With Synthetic Data for Audio-Visual Sound Source Localization?
A. Senocak, S. Park, T. Oh, J. S. Chung
IEEE Conference on Computer Vision and Pattern Recognition
PDF
Hear you are: Teaching LLMs Spatial Reasoning with Vision and Spatial Sound
H. Ryu, J. S. Chung, D. Harwath
IEEE Conference on Computer Vision and Pattern Recognition
PDF
DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
N. Nguyen, T. Tran, J. Choi, H. Huynh-Nguyen, T. Hy, V. Nguyen
IEEE Conference on Computer Vision and Pattern Recognition
PDF
EDNet: A Versatile Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training
D. Kwak, Y. Jang, S. Kim, J. S. Chung
IEEE Transactions on Audio, Speech and Language Processing
PDF
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
S. Park, A. Senocak, J. S. Chung
International Journal of Computer Vision
PDF
LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling
D. Kwak, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS
T. D. Nguyen, J. Kim, J. Kim, S. Choi, Y. Lim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
T. H. Pham, T. D. Nguyen, P. T. Tran, J. S. Chung, D. D. Nguyen
International Conference on Acoustics, Speech, and Signal Processing
PDF
Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap
K. Nam, J. Choi, H. Lee, J. Heo, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
LAMB: LLM-Based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence
H. Lee, J. Choi, K. Nam, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
UNMIXX: Untangling Highly Correlated Singing Voices Mixtures
J. Jung, J. Kim, D. Kwak, J. Lee, J. Nam, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference
C. Jung, Y. Jang, S. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF

2025

Toward Interactive Sound Source Localization: Better Align Sight and Sound!
A. Senocak, H. Ryu, J. Kim, T. Oh, H. Pfister, J. S. Chung
IEEE Transactions on Pattern Analysis and Machine Intelligence
PDF
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
J. Jung, Y. Wu, X. Wang, J. Kim, S. Maiti, Y. Matsunaga, H. Shim, J. Tian, N. Evans, J. S. Chung, W. Zhang, S. Um, S. Takamichi, S. Watanabe
IEEE Open Journal of Signal Processing
PDF
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
J. Kim, H. Yang, Y. Ju, I. Kim, B. Kim, J. S. Chung
IEEE Transactions on Audio, Speech and Language Processing
PDF
AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
C. Jung, Y. Jang, J. S. Chung
Conference on Neural Information Processing Systems
PDF
Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
K. Zhang, T. X. Pham, S. Lee, A. Niu, A. Senocak, J. S. Chung
Conference on Neural Information Processing Systems
PDF
Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision
C. Zhang, K. Zhang, J. S. Chung, I. S. Kweon, J. Kim, C. Mao
Conference on Neural Information Processing Systems
PDF
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
J. Choi, J. Kim, J. S. Chung
Findings of Empirical Methods in Natural Language Processing
PDF
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi, J. Kim, S. Kim, T. Oh, J. S. Chung
ACM International Conference on Multimedia
PDF
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
S. Kim, J. Choi, P. Peng, J. S. Chung, T. Oh, D. Harwath
International Conference on Computer Vision
PDF
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
S. Cho, J. Choi, S. Kim, S. Yun
International Conference on Computer Vision
PDF
InfiniteAudio: Infinite-Length Audio Generation with Consistency
C. Jung, H. Ki, J. Kim, J. Kim, J. S. Chung
Interspeech
PDF
SEED: Speaker Embedding Enhancement Diffusion Model
K. Nam, J. Heo, J. Jung, G. Park, C. Jung, H. Yu, J. S. Chung
Interspeech
PDF
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
J. Choi, Z. Niu, J. Kim, C. Wang, J. S. Chung, X. Chen
Interspeech
PDF
The text-to-speech in the wild (TITW) dataset
J. Jung, W. Zhang, S. Maiti, Y. Wu, X. Wang, J. Kim, Y. Matsunaga, S. Um, J. Tian, H. Shim, N. Evans, J. S. Chung, S. Takamichi, S. Watanabe
Interspeech
PDF
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
H. Ryu, S. Kim, J. S. Chung, A. Senocak
IEEE Conference on Computer Vision and Pattern Recognition
PDF
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
J. Kim, J. Choi, J. Kim, C. Jung, J. S. Chung
IEEE Conference on Computer Vision and Pattern Recognition
PDF
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
Y. Jang, H. Raajesh, L. Momeni, G. Varol, A. Zisserman
IEEE Conference on Computer Vision and Pattern Recognition
PDF
Test-Time Augmentation for Pose-invariant Face Recognition
J. Jung, Y. Jang, J. S. Chung
IEEE International Conference on Automatic Face and Gesture Recognition
PDF
High-Quality Joint Image and Video Compression with Causal VAE
D. M. Argaw, X. Liu, Q. Zhang, J. S. Chung, M. Liu, F. Reda
International Conference on Learning Representations
PDF
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
S. Kim, H. Oh, J. Lee, A. Senocak, J. S. Chung, T. Oh
International Conference on Learning Representations
PDF
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Z. Li, S. Hu, S. Liu, L. Zhou, J. Choi, L. Meng, X. Guo, J. Li, H. Ling, F. Wei
International Conference on Learning Representations
PDF
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
J. Choi, J. Kim, J. Li, J. S. Chung, S. Liu
International Conference on Acoustics, Speech, and Signal Processing
PDF
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport
K. Rho, H. Lee, V. Iverson, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
J. Jung, J. Ahn, C. Jung, T. D. Nguyen, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
T. D. Nguyen, J. Kim, J. Choi, S. Choi, J. Park, Y. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
AdaptVC: High Quality Voice Conversion with Adaptive Learning
J. Kim, J. Kim, Y. Choi, T. D. Nguyen, S. Mun, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF

2024

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
M. H. Erol, A. Senocak, J. Feng, J. S. Chung
IEEE Signal Processing Letters
PDF
Bridging the Gap between Audio and Text using Parallel-attention for User-defined Keyword Spotting
Y. Kim, J. Jung, J. Park, B. Kim, J. S. Chung
IEEE Signal Processing Letters
PDF
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
J. Woo, H. Ryu, Y. Jang, J. W. Cho, J. S. Chung
ACM International Conference on Multimedia
PDF
VoxSim: A perceptual voice similarity dataset
J. Ahn, Y. Kim, Y. Choi, D. Kwak, J. Kim, S. Mun, J. S. Chung
Interspeech
PDF
Lightweight Audio Segmentation for Long-form Speech Translation
J. Lee, S. Kim, H. Kim, J. S. Chung
Interspeech
PDF
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
J. Feng, M. H. Erol, J. S. Chung, A. Senocak
Interspeech
PDF
To what extent can ASV systems naturally defend against spoofing attacks?
J. Jung, X. Wang, N. Evans, S. Watanabe, H. Shim, H. Tak, S. Arora, J. Yamagishi, J. S. Chung
Interspeech
PDF
Disentangled Representation Learning for Environment-agnostic Speaker Recognition
K. Nam, H. Heo, J. Jung, J. S. Chung
Interspeech
PDF
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
C. Jung, S. Lee, J. Kim, J. S. Chung
Interspeech
PDF
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
J. Kim, H. Lee, K. Rho, J. Kim, J. S. Chung
International Conference on Machine Learning
PDF
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Y. Jang, J. Kim, J. Ahn, D. Kwak, H. Yang, Y. Ju, I. Kim, B. Kim, J. S. Chung
IEEE Conference on Computer Vision and Pattern Recognition
PDF
Scaling Up Video Summarization Pretraining with Large Language Models
D. M. Argaw, S. Yoon, F. C. Heilbron, H. Deilamsalehy, T. Bui, Z. Wang, F. Dernoncourt, J. S. Chung
IEEE Conference on Computer Vision and Pattern Recognition
PDF
Towards Automated Movie Trailer Generation
D. M. Argaw, M. Soldan, A. Pardo, C. Zhao, F. C. Heilbron, J. S. Chung, B. Ghanem
IEEE Conference on Computer Vision and Pattern Recognition
PDF
FreGrad: Lightweight and fast frequency-aware diffusion vocoder
T. D. Nguyen, J. Kim, Y. Jang, J. Kim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF Project page
SlowFast Network for Continuous Sign Language Recognition
J. Ahn, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
H. Heo, K. Nam, B. Lee, Y. Kwon, M. Lee, Y. J. Kim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Speech Guided Masked Image Modeling for Visually Grounded Speech
J. Woo, H. Ryu, A. Senocak, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
VoxMM: Rich Transcription of Conversations in the Wild
D. Kwak, J. Jung, K. Nam, Y. Jang, J. Jung, S. Watanabe, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
From Coarse To Fine: Efficient Training for Audio Spectrogram Transformers
J. Feng, M. H. Erol, J. S. Chung, A. Senocak
International Conference on Acoustics, Speech, and Signal Processing
PDF
VoiceLDM: Text-to-Audio Generation with Linguistic Content
Y. Lee, I. Yeon, J. Nam, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF Project page
TalkNCE: Improving Active Speaker Detection with Talking-Aware Contrastive Learning
C. Jung, S. Lee, K. Nam, K. Rho, Y. J. Kim, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
S. Lee, C. Jung, Y. Jang, J. Kim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
J. Kim, J. Kim, J. S. Chung
AAAI Conference on Artificial Intelligence
PDF Project page
Can CLIP Help Sound Source Localization?
S. Park, A. Senocak, J. S. Chung
Winter Conference on Applications of Computer Vision
PDF

2023

That's What I Said: Fully-Controllable Talking Face Generation
Y. Jang, K. Rho, J. Woo, H. Lee, J. Park, Y. Lim, B. Kim, J. S. Chung
ACM International Conference on Multimedia
PDF Project page
Sound Source Localization is All about Cross-Modal Alignment
A. Senocak, H. Ryu, J. Kim, T. Oh, H. Pfister, J. S. Chung
International Conference on Computer Vision
PDF
FlexiAST: Flexibility is What AST Needs
J. Feng, M. H. Erol, J. S. Chung, A. Senocak
Interspeech
PDF
Disentangled Representation Learning for Multilingual Speaker Recognition
K. Nam, Y. Kim, J. Huh, H. Heo, J. Jung, J. S. Chung
Interspeech
PDF Project page
Curriculum learning for self-supervised speaker verification
H. Heo, J. Jung, J. Kang, Y. Kwon, B. Lee, Y. J. Kim, J. S. Chung
Interspeech
PDF
Self-sufficient framework for continuous sign language recognition
Y. Jang, Y. Oh, J. W. Cho, M. Kim, D. Kim, I. S. Kweon, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF Project page
Metric learning for user-defined keyword spotting
J. Jung, Y. Kim, J. Park, Y. Lim, B. Kim, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF Project page
Hindi as a second language: improving visually grounded speech with semantically similar samples
H. Ryu, A. Senocak, I. S. Kweon, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
MarginNCE: Robust Sound Localization with a Negative Margin
S. Park, A. Senocak, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity
Y. J. Kim, H. Heo, J. Jung, Y. Kwon, B. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
In search of strong embedding extractors for speaker diarisation
J. Jung, B. Lee, J. Huh, A. Brown, Y. Kwon, S. Watanabe, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
J. Lee, J. S. Chung, S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF

2022

Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition
Y. Jang, Y. Oh, J. W. Cho, D. Kim, J. S. Chung, I. S. Kweon
British Machine Vision Conference
PDF Project page
Augmentation adversarial training for self-supervised speaker representation learning
J. Kang, J. Huh, H. Heo, J. S. Chung
Journal of Selected Topics in Signal Processing
PDF
Pushing the limits of raw waveform speaker recognition
J. Jung, Y. J. Kim, H. Heo, B. Lee, Y. Kwon, J. S. Chung
Interspeech
PDF
Spell my name: Keyword boosted speech recognition
N. Jung, G. Kim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Multi-scale speaker embedding-based graph attention networks for speaker diarisation
Y. Kwon, H. Heo, J. Jung, Y. J. Kim, B. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
J. Jung, H. Heo, H. Tak, H. Shim, J. S. Chung, B. Lee, H. Yu, N. Evans
International Conference on Acoustics, Speech, and Signal Processing
PDF

2021

Adapting Speaker Embeddings for Speaker Diarization
Y. Kwon, J. Jung, H. Heo, Y. J. Kim, B. Lee, J. S. Chung
Interspeech
PDF
Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network
J. Jung, H. Heo, Y. Kwon, J. S. Chung, B. Lee
Interspeech
PDF
Look Who's Talking: Active Speaker Detection in the Wild
Y. J. Kim, H. Heo, S. Choe, S. Chung, Y. Kwon, B. Lee, Y. Kwon, J. S. Chung
Interspeech
PDF Project page
Playing a Part: Speaker Verification at the Movies
A. Brown, J. Huh, A. Nagrani, J. S. Chung, A. Zisserman
International Conference on Acoustics, Speech, and Signal Processing
PDF
The ins and outs of speaker recognition: lessons from VoxSRC 2020
Y. Kwon, H. Heo, B. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Graph Attention Networks for Speaker Verification
J. Jung, H. Heo, H. Yu, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF
Look who's not talking
Y. Kwon, H. Heo, J. Huh, B. Lee, J. S. Chung
IEEE Spoken Language Technology Workshop
PDF
Metric Learning for Keyword Spotting
J. Huh, M. Lee, H. Heo, S. Mun, J. S. Chung
IEEE Spoken Language Technology Workshop
PDF
Cross attentive pooling for speaker verification
S. M. Kye, Y. Kwon, J. S. Chung
IEEE Spoken Language Technology Workshop
PDF
Supervised attention for speaker recognition
S. M. Kye, J. S. Chung, H. Kim
IEEE Spoken Language Technology Workshop
PDF

2020

Perfect Match: Self-Supervised Embeddings for Cross-modal Retrieval
S. Chung, J. S. Chung, H. Kang
Journal of Selected Topics in Signal Processing
PDF
Augmentation adversarial training for self-supervised speaker recognition
J. Huh, H. Heo, J. Kang, S. Watanabe, J. S. Chung
Workshop on Self-Supervised Learning for Speech and Audio Processing, NeurIPS
PDF
FaceFilter: Audio-visual speech separation using still images
S. Chung, S. Choe, J. S. Chung, H. Kang
Interspeech
PDF Video
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
S. Chung, H. Kang, J. S. Chung
Interspeech
PDF
Spot the conversation: speaker diarisation in the wild
J. S. Chung, J. Huh, A. Nagrani, T. Afouras, A. Zisserman
Interspeech
PDF
Now you’re speaking my language: Visual language identification
T. Afouras, J. S. Chung, A. Zisserman
Interspeech
PDF
In defence of metric learning for speaker recognition
J. S. Chung, J. Huh, S. Mun, M. Lee, H. Heo, S. Choe, C. Ham, S. Jung, B. Lee, I. Han
Interspeech
PDF
Self-supervised learning of audio-visual objects from video
T. Afouras, A. Owens, J. S. Chung, A. Zisserman
European Conference on Computer Vision
PDF
BSL-1K: Scaling up co-articulated sign recognition using mouthing cues
S. Albanie, G. Varol, L. Momeni, T. Afouras, J. S. Chung, N. Fox, A. Zisserman
European Conference on Computer Vision
PDF
Delving into VoxCeleb: environment invariant speaker recognition
J. S. Chung, J. Huh, S. Mun
Speaker Odyssey
PDF
ASR is all you need: Cross-modal distillation for lip reading
T. Afouras, J. S. Chung, A. Zisserman
International Conference on Acoustics, Speech, and Signal Processing
PDF
Disentangled Speech Embeddings using Cross-Modal Self-Supervision
A. Nagrani, J. S. Chung, S. Albanie, A. Zisserman
International Conference on Acoustics, Speech, and Signal Processing
PDF
The sound of my voice: speaker representation loss for target voice separation
S. Mun, S. Choe, J. Huh, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing
PDF

2019

Deep Audio-Visual Speech Recognition
T. Afouras, J. S. Chung, A. Senior, O. Vinyals, A. Zisserman
IEEE Transactions on Pattern Analysis and Machine Intelligence
PDF Project page
You said that?: Synthesising talking faces from audio
A. Jamaludin, J. S. Chung, A. Zisserman
International Journal of Computer Vision
PDF
VoxCeleb: Large-scale speaker verification in the wild
A. Nagrani, J. S. Chung, W. Xie, A. Zisserman
Computer Speech and Language
PDF
Who said that?: Audio-visual speaker diarisation of real-world meetings
J. S. Chung, B. Lee, I. Han
Interspeech
PDF
My lips are concealed: Audio-visual speech enhancement through obstructions
T. Afouras, J. S. Chung, A. Zisserman
Interspeech
PDF Project page
Naver at ActivityNet Challenge 2019--Task B Active Speaker Detection (AVA)
J. S. Chung
International Challenge on Activity Recognition
PDF
Utterance-level Aggregation For Speaker Recognition In The Wild
W. Xie, A. Nagrani, J. S. Chung, A. Zisserman
International Conference on Acoustics, Speech, and Signal Processing
PDF Project page
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
S. Chung, J. S. Chung, H. Kang
International Conference on Acoustics, Speech, and Signal Processing
PDF Model

2018

Learning to Lip Read Words by Watching Videos
J. S. Chung, A. Zisserman
Computer Vision and Image Understanding
PDF
VoxCeleb2: Deep Speaker Recognition
J. S. Chung, A. Nagrani, A. Zisserman
Interspeech
PDF Project page
The Conversation: Deep Audio-Visual Speech Enhancement
T. Afouras, J. S. Chung, A. Zisserman
Interspeech
PDF Project page
Deep Lip Reading: a comparison of models and an online application
T. Afouras, J. S. Chung, A. Zisserman
Interspeech
PDF Project page

2017

VoxCeleb: a large-scale speaker identification dataset
A. Nagrani, J. S. Chung, A. Zisserman
Interspeech
PDF Project page
You said that?
J. S. Chung, A. Jamaludin, A. Zisserman
British Machine Vision Conference
PDF Project page
Lip Reading in Profile
J. S. Chung, A. Zisserman
British Machine Vision Conference
PDF
Lip Reading Sentences in the Wild
J. S. Chung, A. Senior, O. Vinyals, A. Zisserman
IEEE Conference on Computer Vision and Pattern Recognition
PDF Project page Video

2016

Out of time: automated lip sync in the wild
J. S. Chung, A. Zisserman
Workshop on Multi-view Lip-reading, ACCV
PDF Project page
Lip Reading in the Wild
J. S. Chung, A. Zisserman
Asian Conference on Computer Vision
PDF Project page
Signs in time: Encoding human motion as a temporal image
J. S. Chung, A. Zisserman
Workshop on Brave New Ideas for Motion Representations, ECCV
PDF Video
