Tao Wang

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Jan 1, 2025

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

Jan 1, 2024

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Jan 1, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

Jan 1, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Jan 1, 2024

Minimally-supervised speech synthesis with conditional diffusion model and language model: A comparative study of semantic coding

Jan 1, 2024

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

Jan 1, 2024

Learning speech representation from contrastive token-acoustic pretraining

Jan 1, 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

Jan 1, 2024

Emotion selectable end-to-end text-based speech editing

Jan 1, 2024