A systematic research of text-to-audio generation with diffusion models
3 January 2025
Yiming Gao
Proceedings Volume 13442, Fifth International Conference on Signal Processing and Computer Science (SPCS 2024); 134421O (2025) https://doi.org/10.1117/12.3053123
Event: Fifth International Conference on Signal Processing and Computer Science (SPCS 2024), 2024, Kaifeng, China
Abstract
Artificial intelligence is now part of people's daily lives and handles many kinds of work. Among the various kinds of AI, generative AI, ranging from ChatGPT to AI painting, is the most frequently discussed. In recent years, some generative models have been applied to audio tasks such as denoising and audio generation. Although each type of generative model has its own advantages, the diffusion model stands out among them for its high fidelity compared with other models, such as generative adversarial networks (GANs).
(2025) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yiming Gao "A systematic research of text-to-audio generation with diffusion models", Proc. SPIE 13442, Fifth International Conference on Signal Processing and Computer Science (SPCS 2024), 134421O (3 January 2025); https://doi.org/10.1117/12.3053123
KEYWORDS
Education and training
Autoregressive models
Diffusion
Data modeling
Artificial intelligence
Gallium nitride
Systems modeling
