10 November 2022
Language identification based on multi-scale and multi-dimensional convolution
Zhigang Song, Dongzhi He, Hongchen Jiang, Jiacheng Chang
Proceedings Volume 12331, International Conference on Mechanisms and Robotics (ICMAR 2022); 1233149 (2022) https://doi.org/10.1117/12.2652306
Event: International Conference on Mechanisms and Robotics (ICMAR 2022), 2022, Zhuhai, China
Abstract
Time-delay neural networks, which are widely used in language identification, suffer from insufficient feature learning. To address this problem, a new architecture called multi-scale and multi-dimensional convolution is proposed. The structure comprises a global inter-frame correlation network, local and global multi-scale networks, a global channel correlation network, and a multi-head attention statistics pooling layer. The global inter-frame correlation network models the global context at the initial frame layer to capture global context dependencies, compensating for the inherent limitation of time-delay neural networks, which rely on a limited context. The local and global multi-scale networks aggregate information within and between layers to extract features at a finer and more complex scale. The global channel correlation network explicitly models the channel dimension to adaptively recalibrate channel-wise features. The attention statistics pooling layer is extended to multiple heads so that features can be distinguished from multiple aspects. Trained on the AP17-OLR data set, the proposed model achieves a 41% improvement over a strong previous model.
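The multi-head attention statistics pooling step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the layer's equations, so the sketch assumes one common formulation in which the channels are split evenly across heads, each head scores every frame with a small tanh attention network, and the attention-weighted mean and standard deviation of all heads are concatenated. The function name `multi_head_attentive_stats_pooling`, the parameters `w` and `v`, and all shapes are hypothetical.

```python
import numpy as np


def multi_head_attentive_stats_pooling(frames, w, v, eps=1e-8):
    """Pool frame-level features into one utterance-level vector.

    Assumed shapes (illustrative, not taken from the paper):
      frames: (T, C)       T frames, C channels
      w:      (H, C/H, D)  per-head projection to a D-dim hidden space
      v:      (H, D)       per-head scoring vector
    Returns a vector of length 2*C: weighted means, then weighted stds.
    """
    num_frames, num_channels = frames.shape
    num_heads = w.shape[0]
    head_dim = num_channels // num_heads

    # Split the channel axis into heads: (H, T, C/H).
    heads = frames.reshape(num_frames, num_heads, head_dim).transpose(1, 0, 2)

    # One scalar attention score per head and frame: (H, T).
    hidden = np.tanh(np.einsum("htc,hcd->htd", heads, w))
    scores = np.einsum("htd,hd->ht", hidden, v)

    # Softmax over the time axis (shifted for numerical stability).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Attention-weighted first and second moments per head: (H, C/H).
    mean = np.einsum("ht,htc->hc", weights, heads)
    second_moment = np.einsum("ht,htc->hc", weights, heads ** 2)
    std = np.sqrt(np.clip(second_moment - mean ** 2, eps, None))

    # Concatenate the statistics of all heads into one vector.
    return np.concatenate([mean.reshape(-1), std.reshape(-1)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(50, 8))   # 50 frames, 8 channels
    w = rng.normal(size=(2, 4, 4))      # 2 heads, 4 channels each
    v = rng.normal(size=(2, 4))
    pooled = multi_head_attentive_stats_pooling(frames, w, v)
    print(pooled.shape)
```

Under these assumptions, each head can place its attention on different frames, which is one way to read the statement that features are distinguished from multiple aspects. When every score is equal, the weights become uniform and the output reduces to ordinary statistics pooling (per-channel mean and standard deviation).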
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zhigang Song, Dongzhi He, Hongchen Jiang, and Jiacheng Chang "Language identification based on multi-scale and multi-dimensional convolution", Proc. SPIE 12331, International Conference on Mechanisms and Robotics (ICMAR 2022), 1233149 (10 November 2022); https://doi.org/10.1117/12.2652306
KEYWORDS
Computer programming
Convolution
Network architectures
Statistical modeling
Neural networks
Head
Performance modeling