A transformer-based semantic segmentation model for street fashion images
25 March 2023
Dingjie Peng, Wataru Kameyama
Proceedings Volume 12592, International Workshop on Advanced Imaging Technology (IWAIT) 2023; 125920W (2023) https://doi.org/10.1117/12.2666583
Event: International Workshop on Advanced Imaging Technology (IWAIT) 2023, Jeju, Republic of Korea
Abstract
Semantic segmentation is a pixel-level classification problem in computer vision: every pixel is assigned a class label, so that pixels of the same class are grouped together and an image can be interpreted at the pixel level. Within this field, semantic segmentation of street fashion images is challenging because clothing items appear with wide variations in fabric, layering, occlusion, and viewpoint. To better understand street fashion images, we propose a lightweight Semantic Context Aware Transformer (SCAT) for this task, which integrates semantic context into the encoding and models the relationship between the multi-level outputs of the transformer layers. Extensive experiments and comparisons show that the proposed model achieves state-of-the-art results on the ModaNet dataset with a relatively small model size, improving on Shunted Transformer by over 1.1 points and surpassing other CNNs and Transformers by a large margin of over 2 points in mIoU.
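For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of the standard mean intersection-over-union computation on small label maps. It is not the authors' evaluation code: the function name `mean_iou`, the toy label maps, the two hypothetical classes, and the choice to average only over classes that appear in the prediction or the ground truth are all assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two 2D label maps.

    Assumption: classes absent from both maps are skipped when averaging.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Mismatch: the pixel enlarges the union of both classes.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps; classes are hypothetical (0 = background, 1 = garment).
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.2f}")
```

In this toy case each class has 2 correctly labelled pixels out of a union of 4, so the per-class IoU is 0.5 for both classes. Results such as those in the abstract are usually reported on a 0 to 100 scale, so a gain of 1.1 points corresponds to 0.011 on the scale used here.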
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Dingjie Peng and Wataru Kameyama "A transformer-based semantic segmentation model for street fashion images", Proc. SPIE 12592, International Workshop on Advanced Imaging Technology (IWAIT) 2023, 125920W (25 March 2023); https://doi.org/10.1117/12.2666583
KEYWORDS
Semantics
Image segmentation
Transformers
Design and modelling
