Shuchen Shi

Mixture of experts fusion for fake audio detection using frozen wav2vec 2.0

Jan 1, 2025

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Jan 1, 2025

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Jan 1, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

Jan 1, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Jan 1, 2024

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

Jan 1, 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

Jan 1, 2024

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

Jan 1, 2024

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

Jan 1, 2024

A multi-speaker multi-lingual voice cloning system based on VITS2 for LIMMITS 2024 challenge

Jan 1, 2024