Publications


2026

  • Deep Understanding of Sign Language for Sign to Subtitle Alignment
    Y. Jang, J. Choi, J. Ahn, J. S. Chung
    IEEE Transactions on Multimedia
    PDF
  • Cinematic Audio Source Separation Using Visual Cues
    K. Zhang, S. Lee, A. Senocak, J. S. Chung
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions
    S. Kim, S. Lee, H. Ryu, J. S. Chung, A. Senocak
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • How Far Can We Go With Synthetic Data for Audio-Visual Sound Source Localization?
    A. Senocak, S. Park, T. Oh, J. S. Chung
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • Hear you are: Teaching LLMs Spatial Reasoning with Vision and Spatial Sound
    H. Ryu, J. S. Chung, D. Harwath
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • EDNet: A Versatile Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training
    D. Kwak, Y. Jang, S. Kim, J. S. Chung
    IEEE Transactions on Audio, Speech and Language Processing
    PDF
  • LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling
    D. Kwak, Y. Jang, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS
    T. D. Nguyen, J. Kim, J. Kim, S. Choi, Y. Lim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
    T. H. Pham, T. D. Nguyen, P. T. Tran, J. S. Chung, D. D. Nguyen
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap
    K. Nam, J. Choi, H. Lee, J. Heo, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • LAMB: LLM-Based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence
    H. Lee, J. Choi, K. Nam, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • UNMIXX: Untangling Highly Correlated Singing Voices Mixtures
    J. Jung, J. Kim, D. Kwak, J. Lee, J. Nam, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference
    C. Jung, Y. Jang, S. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2025

  • Toward Interactive Sound Source Localization: Better Align Sight and Sound!
    A. Senocak, H. Ryu, J. Kim, T. Oh, H. Pfister, J. S. Chung
    IEEE Transactions on Pattern Analysis and Machine Intelligence
    PDF
  • SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
    J. Jung, Y. Wu, X. Wang, J. Kim, S. Maiti, Y. Matsunaga, H. Shim, J. Tian, N. Evans, J. S. Chung, W. Zhang, S. Um, S. Takamichi, S. Watanabe
    IEEE Open Journal of Signal Processing
    PDF
  • CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
    J. Kim, H. Yang, Y. Ju, I. Kim, B. Kim, J. S. Chung
    IEEE Transactions on Audio, Speech and Language Processing
    PDF
  • AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
    C. Jung, Y. Jang, J. S. Chung
    Conference on Neural Information Processing Systems
    PDF
  • Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
    K. Zhang, T. X. Pham, S. Lee, A. Niu, A. Senocak, J. S. Chung
    Conference on Neural Information Processing Systems
    PDF
  • Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision
    C. Zhang, K. Zhang, J. S. Chung, I. S. Kweon, J. Kim, C. Mao
    Conference on Neural Information Processing Systems
    PDF
  • Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
    J. Choi, J. Kim, J. S. Chung
    Findings of Empirical Methods in Natural Language Processing
    PDF
  • AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
    J. Choi, J. Kim, S. Kim, T. Oh, J. S. Chung
    ACM International Conference on Multimedia
    PDF
  • VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
    S. Kim, J. Choi, P. Peng, J. S. Chung, T. Oh, D. Harwath
    International Conference on Computer Vision
    PDF
  • MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
    S. Cho, J. Choi, S. Kim, S. Yun
    International Conference on Computer Vision
    PDF
  • InfiniteAudio: Infinite-Length Audio Generation with Consistency
    C. Jung, H. Ki, J. Kim, J. Kim, J. S. Chung
    Interspeech
    PDF
  • SEED: Speaker Embedding Enhancement Diffusion Model
    K. Nam, J. Heo, J. Jung, G. Park, C. Jung, H. Yu, J. S. Chung
    Interspeech
    PDF
  • Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
    J. Choi, Z. Niu, J. Kim, C. Wang, J. S. Chung, X. Chen
    Interspeech
    PDF
  • The text-to-speech in the wild (TITW) dataset
    J. Jung, W. Zhang, S. Maiti, Y. Wu, X. Wang, J. Kim, Y. Matsunaga, S. Um, J. Tian, H. Shim, N. Evans, J. S. Chung, S. Takamichi, S. Watanabe
    Interspeech
    PDF
  • Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
    H. Ryu, S. Kim, J. S. Chung, A. Senocak
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
    J. Kim, J. Choi, J. Kim, C. Jung, J. S. Chung
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
    Y. Jang, H. Raajesh, L. Momeni, G. Varol, A. Zisserman
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • Test-Time Augmentation for Pose-invariant Face Recognition
    J. Jung, Y. Jang, J. S. Chung
    IEEE International Conference on Automatic Face and Gesture Recognition
    PDF
  • High-Quality Joint Image and Video Compression with Causal VAE
    D. M. Argaw, X. Liu, Q. Zhang, J. S. Chung, M. Liu, F. Reda
    International Conference on Learning Representations
    PDF
  • AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
    S. Kim, H. Oh, J. Lee, A. Senocak, J. S. Chung, T. Oh
    International Conference on Learning Representations
    PDF
  • ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
    Z. Li, S. Hu, S. Liu, L. Zhou, J. Choi, L. Meng, X. Guo, J. Li, H. Ling, F. Wei
    International Conference on Learning Representations
    PDF
  • V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
    J. Choi, J. Kim, J. Li, J. S. Chung, S. Liu
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport
    K. Rho, H. Lee, V. Iverson, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
    J. Jung, J. Ahn, C. Jung, T. D. Nguyen, Y. Jang, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
    T. D. Nguyen, J. Kim, J. Choi, S. Choi, J. Park, Y. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • AdaptVC: High Quality Voice Conversion with Adaptive Learning
    J. Kim, J. Kim, Y. Choi, T. D. Nguyen, S. Mun, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2024

  • Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
    M. H. Erol, A. Senocak, J. Feng, J. S. Chung
    IEEE Signal Processing Letters
    PDF
  • Bridging the Gap between Audio and Text using Parallel-attention for User-defined Keyword Spotting
    Y. Kim, J. Jung, J. Park, B. Kim, J. S. Chung
    IEEE Signal Processing Letters
    PDF
  • Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
    J. Woo, H. Ryu, Y. Jang, J. W. Cho, J. S. Chung
    ACM International Conference on Multimedia
    PDF
  • VoxSim: A perceptual voice similarity dataset
    J. Ahn, Y. Kim, Y. Choi, D. Kwak, J. Kim, S. Mun, J. S. Chung
    Interspeech
    PDF
  • Lightweight Audio Segmentation for Long-form Speech Translation
    J. Lee, S. Kim, H. Kim, J. S. Chung
    Interspeech
    PDF
  • ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
    J. Feng, M. H. Erol, J. S. Chung, A. Senocak
    Interspeech
    PDF
  • To what extent can ASV systems naturally defend against spoofing attacks?
    J. Jung, X. Wang, N. Evans, S. Watanabe, H. Shim, H. Tak, S. Arora, J. Yamagishi, J. S. Chung
    Interspeech
    PDF
  • Disentangled Representation Learning for Environment-agnostic Speaker Recognition
    K. Nam, H. Heo, J. Jung, J. S. Chung
    Interspeech
    PDF
  • FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
    C. Jung, S. Lee, J. Kim, J. S. Chung
    Interspeech
    PDF
  • EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
    J. Kim, H. Lee, K. Rho, J. Kim, J. S. Chung
    International Conference on Machine Learning
    PDF
  • Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
    Y. Jang, J. Kim, J. Ahn, D. Kwak, H. Yang, Y. Ju, I. Kim, B. Kim, J. S. Chung
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • Scaling Up Video Summarization Pretraining with Large Language Models
    D. M. Argaw, S. Yoon, F. C. Heilbron, H. Deilamsalehy, T. Bui, Z. Wang, F. Dernoncourt, J. S. Chung
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • Towards Automated Movie Trailer Generation
    D. M. Argaw, M. Soldan, A. Pardo, C. Zhao, F. C. Heilbron, J. S. Chung, B. Ghanem
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • FreGrad: Lightweight and fast frequency-aware diffusion vocoder
    T. D. Nguyen, J. Kim, Y. Jang, J. Kim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Project page
  • SlowFast Network for Continuous Sign Language Recognition
    J. Ahn, Y. Jang, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
    H. Heo, K. Nam, B. Lee, Y. Kwon, M. Lee, Y. J. Kim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Speech Guided Masked Image Modeling for Visually Grounded Speech
    J. Woo, H. Ryu, A. Senocak, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • VoxMM: Rich Transcription of Conversations in the Wild
    D. Kwak, J. Jung, K. Nam, Y. Jang, J. Jung, S. Watanabe, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • From Coarse To Fine: Efficient Training for Audio Spectrogram Transformers
    J. Feng, M. H. Erol, J. S. Chung, A. Senocak
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • VoiceLDM: Text-to-Audio Generation with Linguistic Content
    Y. Lee, I. Yeon, J. Nam, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Project page
  • TalkNCE: Improving Active Speaker Detection with Talking-Aware Contrastive Learning
    C. Jung, S. Lee, K. Nam, K. Rho, Y. J. Kim, Y. Jang, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
    S. Lee, C. Jung, Y. Jang, J. Kim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
    J. Kim, J. Kim, J. S. Chung
    AAAI Conference on Artificial Intelligence
    PDF Project page
  • Can CLIP Help Sound Source Localization?
    S. Park, A. Senocak, J. S. Chung
    Winter Conference on Applications of Computer Vision
    PDF

2023

  • That's What I Said: Fully-Controllable Talking Face Generation
    Y. Jang, K. Rho, J. Woo, H. Lee, J. Park, Y. Lim, B. Kim, J. S. Chung
    ACM International Conference on Multimedia
    PDF Project page
  • Sound Source Localization is All about Cross-Modal Alignment
    A. Senocak, H. Ryu, J. Kim, T. Oh, H. Pfister, J. S. Chung
    International Conference on Computer Vision
    PDF
  • FlexiAST: Flexibility is What AST Needs
    J. Feng, M. H. Erol, J. S. Chung, A. Senocak
    Interspeech
    PDF
  • Disentangled Representation Learning for Multilingual Speaker Recognition
    K. Nam, Y. Kim, J. Huh, H. Heo, J. Jung, J. S. Chung
    Interspeech
    PDF Project page
  • Curriculum learning for self-supervised speaker verification
    H. Heo, J. Jung, J. Kang, Y. Kwon, B. Lee, Y. J. Kim, J. S. Chung
    Interspeech
    PDF
  • Self-sufficient framework for continuous sign language recognition
    Y. Jang, Y. Oh, J. W. Cho, M. Kim, D. Kim, I. S. Kweon, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Project page
  • Metric learning for user-defined keyword spotting
    J. Jung, Y. Kim, J. Park, Y. Lim, B. Kim, Y. Jang, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Project page
  • Hindi as a second language: improving visually grounded speech with semantically similar samples
    H. Ryu, A. Senocak, I. S. Kweon, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • MarginNCE: Robust Sound Localization with a Negative Margin
    S. Park, A. Senocak, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity
    Y. J. Kim, H. Heo, J. Jung, Y. Kwon, B. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • In search of strong embedding extractors for speaker diarisation
    J. Jung, B. Lee, J. Huh, A. Brown, Y. Kwon, S. Watanabe, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
    J. Lee, J. S. Chung, S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2022

  • Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition
    Y. Jang, Y. Oh, J. W. Cho, D. Kim, J. S. Chung, I. S. Kweon
    British Machine Vision Conference
    PDF Project page
  • Augmentation adversarial training for self-supervised speaker representation learning
    J. Kang, J. Huh, H. Heo, J. S. Chung
    Journal of Selected Topics in Signal Processing
    PDF
  • Pushing the limits of raw waveform speaker recognition
    J. Jung, Y. J. Kim, H. Heo, B. Lee, Y. Kwon, J. S. Chung
    Interspeech
    PDF
  • Spell my name: Keyword boosted speech recognition
    N. Jung, G. Kim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Multi-scale speaker embedding-based graph attention networks for speaker diarisation
    Y. Kwon, H. Heo, J. Jung, Y. J. Kim, B. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
    J. Jung, H. Heo, H. Tak, H. Shim, J. S. Chung, B. Lee, H. Yu, N. Evans
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2021

  • Adapting Speaker Embeddings for Speaker Diarization
    Y. Kwon, J. Jung, H. Heo, Y. J. Kim, B. Lee, J. S. Chung
    Interspeech
    PDF
  • Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network
    J. Jung, H. Heo, Y. Kwon, J. S. Chung, B. Lee
    Interspeech
    PDF
  • Look Who's Talking: Active Speaker Detection in the Wild
    Y. J. Kim, H. Heo, S. Choe, S. Chung, Y. Kwon, B. Lee, Y. Kwon, J. S. Chung
    Interspeech
    PDF Project page
  • Playing a Part: Speaker Verification at the Movies
    A. Brown, J. Huh, A. Nagrani, J. S. Chung, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • The ins and outs of speaker recognition: lessons from VoxSRC 2020
    Y. Kwon, H. Heo, B. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Graph Attention Networks for Speaker Verification
    J. Jung, H. Heo, H. Yu, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Look who's not talking
    Y. Kwon, H. Heo, J. Huh, B. Lee, J. S. Chung
    IEEE Spoken Language Technology Workshop
    PDF
  • Metric Learning for Keyword Spotting
    J. Huh, M. Lee, H. Heo, S. Mun, J. S. Chung
    IEEE Spoken Language Technology Workshop
    PDF
  • Cross attentive pooling for speaker verification
    S. M. Kye, Y. Kwon, J. S. Chung
    IEEE Spoken Language Technology Workshop
    PDF
  • Supervised attention for speaker recognition
    S. M. Kye, J. S. Chung, H. Kim
    IEEE Spoken Language Technology Workshop
    PDF

2020

  • Perfect Match: Self-Supervised Embeddings for Cross-modal Retrieval
    S. Chung, J. S. Chung, H. Kang
    Journal of Selected Topics in Signal Processing
    PDF
  • Augmentation adversarial training for self-supervised speaker recognition
    J. Huh, H. Heo, J. Kang, S. Watanabe, J. S. Chung
    Workshop on Self-Supervised Learning for Speech and Audio Processing, NeurIPS
    PDF
  • FaceFilter: Audio-visual speech separation using still images
    S. Chung, S. Choe, J. S. Chung, H. Kang
    Interspeech
    PDF Video
  • Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
    S. Chung, H. Kang, J. S. Chung
    Interspeech
    PDF
  • Spot the conversation: speaker diarisation in the wild
    J. S. Chung, J. Huh, A. Nagrani, T. Afouras, A. Zisserman
    Interspeech
    PDF
  • Now you’re speaking my language: Visual language identification
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF
  • In defence of metric learning for speaker recognition
    J. S. Chung, J. Huh, S. Mun, M. Lee, H. Heo, S. Choe, C. Ham, S. Jung, B. Lee, I. Han
    Interspeech
    PDF
  • Self-supervised learning of audio-visual objects from video
    T. Afouras, A. Owens, J. S. Chung, A. Zisserman
    European Conference on Computer Vision
    PDF
  • BSL-1K: Scaling up co-articulated sign recognition using mouthing cues
    S. Albanie, G. Varol, L. Momeni, T. Afouras, J. S. Chung, N. Fox, A. Zisserman
    European Conference on Computer Vision
    PDF
  • Delving into VoxCeleb: environment invariant speaker recognition
    J. S. Chung, J. Huh, S. Mun
    Speaker Odyssey
    PDF
  • ASR is all you need: Cross-modal distillation for lip reading
    T. Afouras, J. S. Chung, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Disentangled Speech Embeddings using Cross-Modal Self-Supervision
    A. Nagrani, J. S. Chung, S. Albanie, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • The sound of my voice: speaker representation loss for target voice separation
    S. Mun, S. Choe, J. Huh, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2019

  • Deep Audio-Visual Speech Recognition
    T. Afouras, J. S. Chung, A. Senior, O. Vinyals, A. Zisserman
    IEEE Transactions on Pattern Analysis and Machine Intelligence
    PDF Project page
  • You said that?: Synthesising talking faces from audio
    A. Jamaludin, J. S. Chung, A. Zisserman
    International Journal of Computer Vision
    PDF
  • VoxCeleb: Large-scale speaker verification in the wild
    A. Nagrani, J. S. Chung, W. Xie, A. Zisserman
    Computer Speech and Language
    PDF
  • Who said that?: Audio-visual speaker diarisation of real-world meetings
    J. S. Chung, B. Lee, I. Han
    Interspeech
    PDF
  • My lips are concealed: Audio-visual speech enhancement through obstructions
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF Project page
  • Naver at ActivityNet Challenge 2019 – Task B Active Speaker Detection (AVA)
    J. S. Chung
    International Challenge on Activity Recognition
    PDF
  • Utterance-level Aggregation For Speaker Recognition In The Wild
    W. Xie, A. Nagrani, J. S. Chung, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Project page
  • Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
    S. Chung, J. S. Chung, H. Kang
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Model

2018

  • Learning to Lip Read Words by Watching Videos
    J. S. Chung, A. Zisserman
    Computer Vision and Image Understanding
    PDF
  • VoxCeleb2: Deep Speaker Recognition
    J. S. Chung, A. Nagrani, A. Zisserman
    Interspeech
    PDF Project page
  • The Conversation: Deep Audio-Visual Speech Enhancement
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF Project page
  • Deep Lip Reading: a comparison of models and an online application
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF Project page

2017

2016

  • Out of time: automated lip sync in the wild
    J. S. Chung, A. Zisserman
    Workshop on Multi-view Lip-reading, ACCV
    PDF Project page
  • Lip Reading in the Wild
    J. S. Chung, A. Zisserman
    Asian Conference on Computer Vision
    PDF Project page
  • Signs in time: Encoding human motion as a temporal image
    J. S. Chung, A. Zisserman
    Workshop on Brave New Ideas for Motion Representations, ECCV
    PDF Video
