Existing visual question answering (VQA) models learn the mutual representation of the two modalities, image and text, poorly. Based on an analysis of why such models fail to distinguish the relationships between similar targets in an image and why they rely heavily on textual information, the Global-Local Attention Network (GLAN), a VQA model that combines global and local features, is proposed. To supplement the local information missing from the global features, the model includes a convolution-guided attention module that strengthens the image representation and increases the weight given to image information in the model. Experiments on the VQAv2.0 and GQA datasets show that the proposed model improves accuracy over the baseline by 0.60% and 0.38%, respectively, indicating that it effectively improves the model's inference ability.
Low-resource machine translation usually relies on data augmentation or constrained hidden-space representations to improve translation quality, but these approaches ignore the divergence between the knowledge representations of different languages in latent space. We propose a latent-space knowledge representation enhancement method that improves translation by reducing the divergence between the knowledge representations of the source and target languages. First, the method splits the text representation of each language into a knowledge representation and a feature representation. Then, it promotes mutual learning between the knowledge representations of the two languages to reduce their divergence. Next, the optimized knowledge representation and the feature representation are recombined to obtain an enhanced text representation. Finally, the enhanced text representation is translated and reconstructed to further reduce the differences between the knowledge representations. Extensive experiments on the public low-resource 'English-German' and 'English-Turkish' datasets show that the method achieves better performance on the test set. The results indicate that reducing the divergence in knowledge representation between languages effectively improves low-resource machine translation.
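The first three steps (split, mutual learning, recombine) can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how the split or the mutual-learning objective is defined, so the sketch assumes that the first `knowledge_dim` entries of a vector are the knowledge part, that divergence is a mean squared difference, and that mutual learning is a simple step of each knowledge vector toward the other. All function names are hypothetical, and the final translation and reconstruction step is omitted.

```python
from typing import List, Tuple

Vector = List[float]


def split_representation(text_repr: Vector, knowledge_dim: int) -> Tuple[Vector, Vector]:
    """Assumed split: leading entries are 'knowledge', the rest are 'feature'."""
    return text_repr[:knowledge_dim], text_repr[knowledge_dim:]


def divergence(a: Vector, b: Vector) -> float:
    """Assumed divergence measure: mean squared difference."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mutual_learning_step(src_k: Vector, tgt_k: Vector, rate: float) -> Tuple[Vector, Vector]:
    """Move each knowledge vector a fraction `rate` of the way toward the other."""
    new_src = [s + rate * (t - s) for s, t in zip(src_k, tgt_k)]
    new_tgt = [t + rate * (s - t) for s, t in zip(src_k, tgt_k)]
    return new_src, new_tgt


def recombine(knowledge: Vector, feature: Vector) -> Vector:
    """Concatenate the optimized knowledge part with the untouched feature part."""
    return knowledge + feature


def enhance(src_repr: Vector, tgt_repr: Vector, knowledge_dim: int,
            steps: int = 3, rate: float = 0.25) -> Tuple[Vector, Vector]:
    """Split both representations, align the knowledge parts, then recombine."""
    src_k, src_f = split_representation(src_repr, knowledge_dim)
    tgt_k, tgt_f = split_representation(tgt_repr, knowledge_dim)
    for _ in range(steps):
        src_k, tgt_k = mutual_learning_step(src_k, tgt_k, rate)
    return recombine(src_k, src_f), recombine(tgt_k, tgt_f)


if __name__ == "__main__":
    src = [1.0, 3.0, 0.5, 0.5]   # toy source-language representation
    tgt = [3.0, 1.0, -0.5, 0.2]  # toy target-language representation
    enhanced_src, enhanced_tgt = enhance(src, tgt, knowledge_dim=2)
    print("divergence before:", divergence(src[:2], tgt[:2]))
    print("divergence after: ", divergence(enhanced_src[:2], enhanced_tgt[:2]))
```

In this toy setting only the knowledge parts move toward each other, while the language-specific feature parts pass through unchanged. A trained model would presumably learn the split and minimize the divergence as part of its loss rather than apply a fixed update.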