Multi-level feature fusion capsule network with self-attention for facial expression recognition
Zhiji Huang, Songsen Yu, Jun Liang
Abstract

Unlike generic image classification, fine-grained classification tasks such as facial expression recognition involve classes that share inherently similar underlying facial appearances, so the differences between expression classes are small. Unlike lab-controlled data, facial expressions captured in natural scenes take many forms of the same expression owing to the diversity of subjects and the complexity of real-world conditions, so samples within the same class may differ greatly. Moreover, because inter-class differences are subtle and an expression is displayed simultaneously across several facial regions, the features of multiple key regions must be encoded jointly to form high-order interactive information. To address these problems, we design an enhanced capsule network built on a multi-level feature fusion attention mechanism, comprising four components: a multi-level feature extraction module (MFEM), a multi-level attention module (MAM), a multi-level capsule attention fusion module (MCAFM), and a reconstruction module (RM). The MFEM collects low-level, middle-level, and high-level features from the input image, reducing the high-level convolution layers' susceptibility to blurred images and pose variation. The MAM directs the network's attention to the most significant features at each level, helping the network ignore blurred, occluded, and irrelevant features; its attention weights are incorporated into our self-attention center loss, which compresses the feature distribution within each class. The MCAFM preserves the attributes of each facial region (such as location, size, and orientation) by converting the features into capsules for dynamic routing, which mitigates the effect of image rotation on FER in the wild; the capsule features of distinct regions are then combined into higher-order holistic feature information, strengthening the model's ability to discriminate between expressions. The RM reconstructs the input image and measures the difference between the reconstruction and the original. Our model outperforms many current methods on two public datasets, RAF-DB and SFEW.
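To make the pipeline concrete, the following is a minimal PyTorch sketch of the pattern the abstract describes: features taken from several depths of a CNN (MFEM), weighted by a per-level attention map (MAM), converted into capsules, and combined by routing-by-agreement into class capsules (MCAFM-style fusion). All module names, layer sizes, and the generic routing scheme here are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch only: shapes and module structure are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Standard capsule squashing non-linearity (Sabour et al., 2017)."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)


class LevelAttention(nn.Module):
    """Per-level spatial attention: emphasizes informative regions and
    suppresses blurred/occluded ones (stand-in for the MAM)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        attn = torch.sigmoid(self.conv(x))          # (B, 1, H, W)
        return x * attn, attn


class MultiLevelCapsuleFusion(nn.Module):
    """Projects low/mid/high-level feature maps into primary capsules and
    routes them to one class capsule per expression."""
    def __init__(self, level_channels=(64, 128, 256), caps_dim=8,
                 num_classes=7, out_dim=16, routing_iters=3):
        super().__init__()
        self.proj = nn.ModuleList([
            nn.Conv2d(c, caps_dim, kernel_size=3, padding=1) for c in level_channels
        ])
        self.attn = nn.ModuleList([LevelAttention(c) for c in level_channels])
        self.num_classes, self.routing_iters = num_classes, routing_iters
        # One transformation matrix per class capsule, shared across positions.
        self.W = nn.Parameter(0.01 * torch.randn(num_classes, caps_dim, out_dim))

    def forward(self, feats):
        # feats: list of feature maps, one per level,
        # e.g. (B,64,28,28), (B,128,14,14), (B,256,7,7)
        primary = []
        for f, attn, proj in zip(feats, self.attn, self.proj):
            f, _ = attn(f)                                   # attention-weighted level
            p = proj(f)                                      # (B, caps_dim, H, W)
            primary.append(p.flatten(2).transpose(1, 2))     # (B, H*W, caps_dim)
        u = squash(torch.cat(primary, dim=1))                # (B, N, caps_dim)

        # Prediction vectors: u_hat[b, n, c, :] = u[b, n, :] @ W[c]
        u_hat = torch.einsum('bnd,cdo->bnco', u, self.W)     # (B, N, C, out_dim)

        # Dynamic routing-by-agreement.
        b = torch.zeros(u.size(0), u.size(1), self.num_classes, device=u.device)
        for _ in range(self.routing_iters):
            c = F.softmax(b, dim=-1).unsqueeze(-1)           # coupling coefficients
            v = squash((c * u_hat).sum(dim=1))               # (B, C, out_dim)
            b = b + (u_hat * v.unsqueeze(1)).sum(-1)         # agreement update
        return v                                             # ||v_c|| acts as class confidence


# Toy forward pass with random "multi-level" features.
feats = [torch.randn(2, 64, 28, 28), torch.randn(2, 128, 14, 14), torch.randn(2, 256, 7, 7)]
print(MultiLevelCapsuleFusion()(feats).shape)  # torch.Size([2, 7, 16])
```

Because the capsule vectors keep pose-like attributes (location, size, orientation) rather than collapsing them through pooling, the routed class capsules retain the higher-order region interactions that the abstract argues are needed to separate visually similar expressions.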
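The abstract does not give the exact form of the self-attention center loss, so the snippet below is only a hedged guess at the general idea: attention weights re-scale each feature dimension's distance to its class center, pulling same-class samples together while down-weighting unreliable dimensions. The class names, shapes, and weighting scheme are hypothetical.

```python
# Hedged sketch of an attention-weighted center loss (not the paper's exact formula).
import torch
import torch.nn as nn


class AttentionCenterLoss(nn.Module):
    def __init__(self, num_classes=7, feat_dim=16):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, attention, labels):
        # features:  (B, feat_dim)  pooled expression features
        # attention: (B, feat_dim)  self-attention weights in [0, 1]
        # labels:    (B,)           ground-truth class indices
        centers = self.centers[labels]                       # (B, feat_dim)
        return (attention * (features - centers) ** 2).sum(dim=1).mean()


loss_fn = AttentionCenterLoss()
feats = torch.randn(4, 16)
attn = torch.sigmoid(torch.randn(4, 16))
labels = torch.tensor([0, 2, 2, 5])
print(loss_fn(feats, attn, labels))
```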

© 2023 SPIE and IS&T
Zhiji Huang, Songsen Yu, and Jun Liang "Multi-level feature fusion capsule network with self-attention for facial expression recognition," Journal of Electronic Imaging 32(2), 023038 (20 April 2023). https://doi.org/10.1117/1.JEI.32.2.023038
Received: 25 July 2022; Accepted: 29 March 2023; Published: 20 April 2023
KEYWORDS
Feature fusion
Feature extraction
Convolution
Image restoration
Image classification
Data modeling
Facial recognition systems
