In human pose estimation, graph convolutional networks have achieved notable performance gains owing to their ability to naturally model human poses as graph structures. However, prevailing methods concentrate predominantly on the local physical connections between joints and overlook higher-order neighboring nodes, which limits their ability to exploit relationships between distant joints. This article introduces a Multiscale Spatio-Temporal Hypergraph Convolutional Network (MST-HCN) designed to capture spatio-temporal information and higher-order dependencies. MST-HCN comprises two key modules: Multiscale Hypergraph Convolution (MHCN) and Multiscale Temporal Convolution (MTCN). The MHCN module represents human poses as hypergraphs of various forms, enabling the extraction of both local and global structural information. In contrast to conventional strided convolutions, MTCN uses multiple branches to weight frames by their significance, thereby filtering out redundant frames. Experimental results show that MST-HCN surpasses state-of-the-art methods on the Human3.6M and MPI-INF-3DHP benchmarks. In particular, MST-HCN improves performance by 1.5% and 0.9% over the closest recent method in the detected 2D pose and ground-truth 2D pose settings, respectively.
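To make the hypergraph idea concrete, the following is a minimal sketch of a single hypergraph convolution layer in the standard normalized form (Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Θ). It is an illustrative assumption, not the authors' exact MHCN implementation: the incidence matrix `H`, the toy skeleton, and the function name are hypothetical, chosen only to show how a hyperedge can connect distant joints in one propagation step.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph convolution layer (illustrative sketch).

    X:     (n_joints, in_dim)  per-joint features (e.g. 2D coordinates)
    H:     (n_joints, n_edges) incidence matrix; H[v, e] = 1 if joint v lies in hyperedge e
    Theta: (in_dim, out_dim)   learnable weight matrix
    """
    Dv = np.diag(H.sum(axis=1))                  # vertex degrees
    De = np.diag(H.sum(axis=0))                  # hyperedge degrees
    Dv_inv_sqrt = np.linalg.inv(np.sqrt(Dv))     # assumes every joint is in >= 1 hyperedge
    De_inv = np.linalg.inv(De)
    # Normalized propagation: features mix across all joints sharing a hyperedge,
    # so joints with no direct skeletal link can still exchange information.
    A = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta

# Toy skeleton: 5 joints; hyperedge 0 groups three joints, hyperedge 1 groups four,
# so joints 0 and 3 interact despite having no local physical connection.
H = np.array([[1, 1],
              [1, 1],
              [0, 1],
              [0, 1],
              [1, 0]], dtype=float)
X = np.arange(10.0).reshape(5, 2)                # dummy joint features
Theta = np.eye(2)                                # identity weights for the sketch
Y = hypergraph_conv(X, H, Theta)                 # shape (5, 2)
```

Representing the pose with several such incidence matrices at different scales (local pairs up to whole-body groups) is what lets a multiscale hypergraph module capture both local and global structure.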