Cross-lingual sentence embedding for mining low-resources parallel sentences

Chenhao Zhang; Yongzhong Huang; Yongqing Deng

doi:10.1117/12.2641013

2 December 2022

Proceedings Volume 12288, International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2022); 122881P (2022) https://doi.org/10.1117/12.2641013
Event: International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2022), 2022, Zhuhai, China

Abstract

Mining parallel sentences for low-resource language pairs is useful in natural language processing. A large supply of high-quality parallel sentences provides the data needed for tasks such as building bilingual parallel corpora and cross-language information retrieval. In this paper, we present an approach to mining Thai-English parallel sentences from large documents using cross-lingual sentence embedding. To evaluate the approach, we used two extensive bilingual corpora that provide gold-standard scores. On the TED and Tanzil sets, our approach improves AUC by nearly 0.73 points, reaching 96.5%. On the task of mining the BUCC parallel corpus, our approach uses less time and space yet achieves an F1 score similar to that of the LaBSE model, which has reached state-of-the-art results on BUCC. Our model not only addresses sentence alignment when resources are insufficient but also takes less time.
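As a rough illustration of the general idea of embedding-based mining, the sketch below pairs each source sentence with its most similar target sentence and keeps the pair only if the similarity clears a threshold. The abstract does not specify the scoring function or the encoder, so cosine similarity, the `threshold` value, the function names, and the toy three-dimensional vectors are all assumptions made for illustration. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """Return (src_index, tgt_index, score) for every source sentence whose
    best-matching target sentence scores at least `threshold`.

    `threshold` is a hypothetical cut-off chosen for this toy example.
    """
    pairs = []
    for i, u in enumerate(src_vecs):
        scores = [cosine(u, v) for v in tgt_vecs]
        # Index of the most similar target sentence.
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Toy vectors standing in for the output of a cross-lingual sentence encoder.
thai_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
english_vecs = [[0.1, 0.9, 0.1], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

mined = mine_pairs(thai_vecs, english_vecs)
for src_idx, tgt_idx, score in mined:
    print(f"source {src_idx} <-> target {tgt_idx} (cosine {score:.3f})")
```

In this toy run the first two source vectors each find a close target and are kept, while the third has no target above the threshold and is dropped. A real pipeline would replace the toy vectors with encoder output and would likely use an approximate nearest-neighbour index rather than the exhaustive comparison shown here.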

Citation

Chenhao Zhang, Yongzhong Huang, and Yongqing Deng "Cross-lingual sentence embedding for mining low-resources parallel sentences", Proc. SPIE 12288, International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2022), 122881P (2 December 2022); https://doi.org/10.1117/12.2641013
