Open Access Paper
12 May 2022 A variational auto-encoder framework with attention mechanism for image captioning (Withdrawal Notice)
Qingqing Sun
Proceedings Volume 12173, International Conference on Optics and Machine Vision (ICOMV 2022); 1217312 (2022) https://doi.org/10.1117/12.2634376
Event: International Conference on Optics and Machine Vision (ICOMV 2022), 2022, Guangzhou, China
Abstract
Natural Language Processing and Computer Vision are two core fields of machine learning, and their intersection has produced a range of research topics. Image captioning is a key topic in this intersection and is closely tied to progress in both fields. It has long attracted researchers' attention, and a relatively mature body of theoretical models has formed around it. Captioning accuracy has improved considerably, but accuracy is only one criterion for evaluating image description algorithms; diversity, stylization, and controllability are now also required. Motivated by this, this paper studies the technical routes and modeling methods of auto-encoding models, variational models, and attention mechanisms, and proposes a new way to address the inherent tension between diversity and accuracy: increasing the diversity of each sample while preserving the upper bound of the accuracy metric scores across multiple samples. Test results on the MS COCO dataset show that the model implemented in this paper performs well in both accuracy and diversity.
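The trade-off named in the abstract, diversity of samples versus an upper bound on accuracy across multiple samples, can be illustrated with a small sketch. This is not the paper's model or its evaluation code. It assumes that "upper bound of accuracy" means the best score any of several sampled captions achieves against the references (often called an oracle or best-of-n score), and that diversity is measured with a distinct-n ratio. The function names and the toy unigram-F1 scorer are hypothetical stand-ins for real captioning metrics.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(captions: Sequence[str], n: int = 2) -> float:
    """Diversity proxy: unique n-grams divided by total n-grams over all samples."""
    collected: List[Tuple[str, ...]] = []
    for caption in captions:
        collected.extend(ngrams(caption.lower().split(), n))
    if not collected:
        return 0.0
    return len(set(collected)) / len(collected)


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy accuracy proxy (assumption): F1 over token multisets.

    A real evaluation would use a standard captioning metric instead.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def oracle_score(
    samples: Sequence[str],
    references: Sequence[str],
    score: Callable[[str, str], float] = unigram_f1,
) -> float:
    """Best-of-n accuracy: the highest score any sample reaches on any reference."""
    return max(score(s, r) for s in samples for r in references)


if __name__ == "__main__":
    references = ["a dog runs on the grass"]
    repeated = ["a dog runs on the grass", "a dog runs on the grass"]
    varied = ["a dog runs on the grass", "a puppy plays in a field"]

    # Both sets reach the same best-of-n accuracy, but only one is diverse.
    print(oracle_score(repeated, references), distinct_n(repeated))
    print(oracle_score(varied, references), distinct_n(varied))
```

In this toy setting the two sample sets share the same best-of-n score while differing in distinct-2, which is the sense in which diversity can rise without lowering the accuracy upper bound.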
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Qingqing Sun "A variational auto-encoder framework with attention mechanism for image captioning (Withdrawal Notice)", Proc. SPIE 12173, International Conference on Optics and Machine Vision (ICOMV 2022), 1217312 (12 May 2022); https://doi.org/10.1117/12.2634376
KEYWORDS
Data modeling
Visual process modeling
Computer programming
Image processing
Data conversion
Analytical research
Image compression