VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Jan 1, 2024 · Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, and others
Type: Journal article
Publication: arXiv preprint arXiv:2408.05758