Paper
22 December 2022

Text-dominated multimodal text classification with quadruplet attention
Yuanzhuo Li
Proceedings Volume 12508, International Symposium on Artificial Intelligence and Robotics 2022; 1250805 (2022) https://doi.org/10.1117/12.2665438
Event: Seventh International Symposium on Artificial Intelligence and Robotics 2022, 2022, Shanghai, China
Abstract
Visual-linguistic interaction must overcome two obstacles in text classification: information deficiency and weight deviation. Information deficiency usually occurs in vision-dominated or modality-balanced tasks, and many multimodal fusion approaches have been proposed to address it (e.g. gated-based and contextualized-based methods). However, there is still no remarkable solution to the weight deviation caused by irrelevant descriptions in text-dominated tasks. To solve this, we introduce a novel Quadruplet Attention that adjusts the textual-visual weight distribution: visual-linguistic information interacts across modalities, and the dot products can be represented as a 2 × 2 matrix, referred to as a quadruplet. We further propose a Multimodal Architecture built on this attention to enhance text-dominated classification. Extensive experiments on Daily Mail demonstrate the effectiveness of our method, which achieves significant improvements of 1.42 and 3.4 points in ROUGE-L F1 and Macro F1, respectively.
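The abstract does not spell out the mechanism, but one minimal reading of the "quadruplet" idea is a 2 × 2 matrix of pairwise scaled dot products between a text representation and a visual representation (text-text, text-image, image-text, image-image), with a row-wise softmax yielding the adjusted textual-visual weights. The sketch below illustrates that reading only; all function and variable names are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def quadruplet_attention(text_feat, img_feat):
    """Hypothetical sketch of a quadruplet (2x2) cross-modal attention.

    Stacks the text and image feature vectors and takes all pairwise
    scaled dot products, giving a 2x2 "quadruplet" matrix:
        [[text-text,  text-image],
         [image-text, image-image]]
    A row-wise softmax over this matrix yields per-modality weights,
    which re-weight and fuse the two feature vectors.
    """
    d = text_feat.shape[-1]
    feats = np.stack([text_feat, img_feat])           # (2, d)
    quad = feats @ feats.T / np.sqrt(d)               # (2, 2) quadruplet
    w = np.exp(quad - quad.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # row-wise softmax
    fused = w @ feats                                 # (2, d) fused features
    return quad, fused

# Toy usage with random 16-dim features
rng = np.random.default_rng(0)
quad, fused = quadruplet_attention(rng.normal(size=16), rng.normal(size=16))
print(quad.shape, fused.shape)  # (2, 2) (2, 16)
```

In a text-dominated task, the first row of the softmaxed quadruplet would govern how much visual information is mixed into the textual representation, which is one plausible way such an attention could damp the weight of irrelevant image descriptions.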
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yuanzhuo Li "Text-dominated multimodal text classification with quadruplet attention", Proc. SPIE 12508, International Symposium on Artificial Intelligence and Robotics 2022, 1250805 (22 December 2022); https://doi.org/10.1117/12.2665438
KEYWORDS: Visualization, Information visualization, Transformers, Feature extraction, Semantics, Image segmentation, Binary data