Ruibo Fu

Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge

Jan 1, 2024

Temporal Shift for Personality Recognition with Pre-Trained Representations

Jan 1, 2024

SceneFake: An initial dataset and benchmarks for scene fake audio detection

Jan 1, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

Jan 1, 2024

MisD-MoE: A Multimodal Misinformation Detection Framework with Adaptive Feature Selection

Jan 1, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Jan 1, 2024

Minimally-supervised speech synthesis with conditional diffusion model and language model: A comparative study of semantic coding

Jan 1, 2024

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

Jan 1, 2024

LetsTalk: Latent diffusion transformer for talking video synthesis

Jan 1, 2024

Learning speech representation from contrastive token-acoustic pretraining

Jan 1, 2024