Multimodal AI Lab

School of Electrical Engineering, KAIST


We are looking for motivated students interested in machine learning, speech processing, and computer vision. Please read this page for more information.

Recent highlights

Speech generation from silent videos

Jihoon Kim et al. (2024), "Let There Be Sound: Reconstructing High Quality Speech from Silent Videos", Proc. AAAI

Text-to-speech with environmental context

Yeonghyeon Lee et al. (2024), "VoiceLDM: Text-to-Speech with Environmental Context", Proc. ICASSP

Talking face synthesis

Youngjoon Jang et al. (2023), "That's What I Said: Fully-Controllable Talking Face Generation", Proc. ACMMM

Audio-visual speech separation

Suyeon Lee et al. (2024), "Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model", Proc. ICASSP

Audio-visual sound source localization

Sooyoung Park et al. (2024), "Can CLIP Help Sound Source Localization?", Proc. WACV

Audio-visual image search

Arda Senocak et al. (2023), "Sound Source Localization is All about Cross-Modal Alignment", Proc. ICCV
