Hand segmentation is usually treated as a pixel-wise binary classification problem, where the foreground hand must be recognized in an input image. However, we envision that finger-level hand segmentation is more useful for applications such as hand gesture and sign language recognition. Therefore, in this paper, we compare five state-of-the-art (SOTA) real-time semantic segmentation methods on the task of finger-level hand segmentation. To do so, we introduce two subsets consisting of 1,000 manually pixel-wise annotated images selected from newly proposed datasets for hand gesture and word-level sign language recognition. With these subsets, we evaluate the accuracy of the recent SOTA methods DABNet, FastSCNN, FC-HarDNet, FASSD-Net, and DDRNet. Since each subset contains relatively few images (500), we introduce a simple yet effective loss function for training with synthetic data that includes the same annotations. Finally, we present a real-time performance evaluation of the five algorithms on the NVIDIA Jetson family of GPU-powered embedded systems, including the Jetson Xavier NX, Jetson TX2, and Jetson Nano.
This paper describes an assistance method for annotation tasks of sign language words using binary action segmentation. Binary action segmentation divides a sign video into binary units corresponding to signing motion and static posture. The user's annotation work is thereby reduced from fully manual labeling to inputting labels and correcting the segmented units. The proposed binary action segmentation is composed of a Support Vector Machine (SVM) and Graphcuts: the trained SVM classifies each frame as Motion or Pause, and Graphcuts refines this initial segmentation. We evaluated the proposed method on a Japanese sign language word database containing 92 words signed by ten native signers. Of the 4,590 videos in total, 3,800 videos of 76 words, excluding those with recording and signing errors, are used for the evaluation. The proposed method achieves results comparable to the previous method with a smaller amount of training data. Moreover, the work reduction ratios of annotation tasks using an annotation interface were 26.17%, 26.34%, and 17.88% for the sets whose numbers of segmented units were 2, 3, and 4, respectively.
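The refinement stage described above can be illustrated with a minimal sketch. On a 1D chain of video frames, a Graphcuts-style binary labeling (unary costs plus a Potts smoothness term) can be solved exactly by dynamic programming. The feature extraction and the trained SVM are assumed; here we start from hypothetical per-frame "motion scores" in [0, 1], and the `smooth` penalty is an illustrative parameter, not a value from the paper.

```python
def refine_segmentation(scores, smooth=1.0):
    """Return binary Motion(1)/Pause(0) labels minimizing a chain energy:
    sum of unary costs plus `smooth` for each adjacent label change."""
    n = len(scores)
    INF = float("inf")
    # unary cost of assigning label l to frame t (low score -> prefer Pause)
    unary = lambda t, l: (1.0 - scores[t]) if l == 1 else scores[t]
    cost = [[INF, INF] for _ in range(n)]
    back = [[0, 0] for _ in range(n)]
    cost[0] = [unary(0, 0), unary(0, 1)]
    for t in range(1, n):
        for l in (0, 1):
            for p in (0, 1):
                c = cost[t - 1][p] + unary(t, l) + (smooth if p != l else 0.0)
                if c < cost[t][l]:
                    cost[t][l], back[t][l] = c, p
    # backtrack the minimum-energy labeling
    labels = [0] * n
    labels[-1] = 0 if cost[-1][0] <= cost[-1][1] else 1
    for t in range(n - 1, 0, -1):
        labels[t - 1] = back[t][labels[t]]
    return labels

# Noisy SVM scores: a spurious low score at frame 2 gets smoothed over.
scores = [0.9, 0.8, 0.2, 0.85, 0.9, 0.1, 0.1, 0.15]
print(refine_segmentation(scores, smooth=0.5))  # -> [1, 1, 1, 1, 1, 0, 0, 0]
```

The smoothness penalty makes isolated misclassifications (such as the single low score at frame 2) more expensive than keeping a contiguous Motion unit, which is the role Graphcuts plays in the proposed pipeline.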