MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Jan 1, 2024 · Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, and others


Type: Journal article

Publication: arXiv preprint arXiv:2406.10591

Last updated on Jan 1, 2024
