Paper
10 November 2022 Attention based CNN-LSTM network for video caption
XinBo Ai, Qiang Li
Proceedings Volume 12331, International Conference on Mechanisms and Robotics (ICMAR 2022); 123315O (2022) https://doi.org/10.1117/12.2652227
Event: International Conference on Mechanisms and Robotics (ICMAR 2022), 2022, Zhuhai, China
Abstract
Because video captioning is in demand across fields such as video retrieval, content recommendation, and risk management, extracting a comprehensive and highly generalized description of video content has been an active research area for many years. In this paper, we propose a new model that fuses a convolutional neural network with an attention mechanism. Features are extracted from multiple perspectives in the video, such as scene, target, and behavior, and combined with key-frame semantic information to reduce the interference of redundant information and complete the feature representation. The attention-weighted fusion of these four features is then fed to an LSTM decoder, which generates the video caption. We compare against multiple baselines on two public datasets; on several evaluation metrics, our model outperforms the other models, which also demonstrates the effectiveness of the proposed information representation.
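The attention-weighted fusion step described above can be sketched in miniature. This is a hedged illustration, not the authors' implementation: the four feature vectors (scene, target, behavior, key-frame semantics) are stand-ins for real CNN outputs, and the raw attention scores, which the full model would learn jointly with the LSTM decoder, are hypothetical constants here.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, scores):
    """Attention-weighted fusion of equal-length feature vectors.

    features: one vector per perspective (scene, target, behavior,
              key-frame semantics in the paper's setting).
    scores:   one raw relevance score per vector; in the full model
              these would be produced by a learned attention module.
    Returns the weighted sum (the input handed to the LSTM decoder)
    and the normalized attention weights.
    """
    weights = softmax(scores)
    dim = len(features[0])
    fused = [0.0] * dim
    for w, vec in zip(weights, features):
        for i, x in enumerate(vec):
            fused[i] += w * x
    return fused, weights

# Toy 3-dimensional features for the four perspectives (hypothetical values).
feats = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 1.0, 1.0]]
fused, weights = attention_fuse(feats, scores=[0.1, 0.2, 0.3, 0.4])
```

The fused vector keeps the dimensionality of the individual features, so the decoder's input size is independent of how many perspectives are fused; only the softmax weights change as the attention scores shift.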
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
XinBo Ai and Qiang Li "Attention based CNN-LSTM network for video caption", Proc. SPIE 12331, International Conference on Mechanisms and Robotics (ICMAR 2022), 123315O (10 November 2022); https://doi.org/10.1117/12.2652227
KEYWORDS
Video, Semantic video, Visualization, Data modeling, Computer programming, Feature extraction, Information visualization