Paper
22 December 2022

Text-dominated multimodal text classification with quadruplet attention
Yuanzhuo Li
Proceedings Volume 12508, International Symposium on Artificial Intelligence and Robotics 2022; 1250805 (2022) https://doi.org/10.1117/12.2665438
Event: Seventh International Symposium on Artificial Intelligence and Robotics 2022, 2022, Shanghai, China
Abstract
Visual-linguistic interaction must overcome two obstacles in text classification: information deficiency and weight deviation. Information deficiency usually occurs in vision-dominated or modality-balanced tasks, and many multimodal fusion approaches have been proposed to address it (e.g. gated-based and contextualized-based methods). However, there is still no remarkable solution to the weight deviation caused by irrelevant descriptions in text-dominated tasks. To solve this, we introduce a novel Quadruplet Attention that adjusts the textual-visual weight distribution: visual-linguistic information interacts across modalities, and the dot products can be represented as a 2 × 2 matrix, referred to as a quadruplet. We further propose a Multimodal Architecture built on this attention to enhance text-dominated classification. Extensive experiments on Daily Mail demonstrate the effectiveness of our method, which achieves significant improvements of 1.42 and 3.4 points in ROUGE-L F1 and Macro F1, respectively.
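The abstract does not spell out the mechanism, but one minimal reading of the "quadruplet" idea is a 2 × 2 matrix of pairwise scaled dot products between a text representation and a visual representation (text-text, text-image, image-text, image-image), with a row-wise softmax yielding the adjusted textual-visual weights. The sketch below illustrates that reading only; all function and variable names are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def quadruplet_attention(text_feat, img_feat):
    """Hypothetical sketch of a quadruplet (2x2) cross-modal attention.

    Stacks the text and image feature vectors and takes all pairwise
    scaled dot products, giving a 2x2 "quadruplet" matrix:
        [[text-text,  text-image],
         [image-text, image-image]]
    A row-wise softmax over this matrix yields per-modality weights,
    which re-weight and fuse the two feature vectors.
    """
    d = text_feat.shape[-1]
    feats = np.stack([text_feat, img_feat])           # (2, d)
    quad = feats @ feats.T / np.sqrt(d)               # (2, 2) quadruplet
    w = np.exp(quad - quad.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # row-wise softmax
    fused = w @ feats                                 # (2, d) fused features
    return quad, fused

# Toy usage with random 16-dim features
rng = np.random.default_rng(0)
quad, fused = quadruplet_attention(rng.normal(size=16), rng.normal(size=16))
print(quad.shape, fused.shape)  # (2, 2) (2, 16)
```

In a text-dominated task, the first row of the softmaxed quadruplet would govern how much visual information is mixed into the textual representation, which is one plausible way such an attention could damp the weight of irrelevant image descriptions.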
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yuanzhuo Li "Text-dominated multimodal text classification with quadruplet attention", Proc. SPIE 12508, International Symposium on Artificial Intelligence and Robotics 2022, 1250805 (22 December 2022); https://doi.org/10.1117/12.2665438
KEYWORDS: Visualization, Information visualization, Transformers, Feature extraction, Semantics, Image segmentation, Binary data