Others

The codecfake dataset and countermeasures for the universally detection of deepfake audio

Jan 1, 2025

Mixture of experts fusion for fake audio detection using frozen wav2vec 2.0

Jan 1, 2025

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Jan 1, 2025

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

Jan 1, 2024

Unlocking the Power of Emotions: Enhancing Personality Trait Recognition Through Utilization of Emotional Cues

Jan 1, 2024

Transferring Personality Knowledge to Multimodal Sentiment Analysis

Jan 1, 2024

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Jan 1, 2024

Temporal Shift for Personality Recognition with Pre-Trained Representations

Jan 1, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

Jan 1, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Jan 1, 2024