This study aims to develop a practical tennis swing recognition system that helps beginners learn correct swing technique, corrects faulty movements, and provides swing counting and evaluation. To this end, we built TSAD, a dataset tailored to tennis swing actions and organized like the public UCF101 dataset, to support model training and evaluation. We extended the PP-TSMv2 model by replacing its original global attention mechanism with local temporal attention (LTA). The model was trained and evaluated on both the public UCF101 dataset and TSAD, and it showed significantly improved performance over the original model. These results indicate that a tennis swing recognition system based on the improved PP-TSMv2 model has potential practical value: it can provide effective training guidance for tennis players and form a basis for further research and applications.
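As a rough illustration of what restricting attention to a local temporal window can look like, the sketch below lets each frame attend only to its neighbours within a fixed radius. The function name `local_temporal_attention`, the `radius` parameter, and the scaled dot-product scoring are assumptions made for illustration; the abstract does not specify how LTA is implemented inside the extended PP-TSMv2 model.

```python
import math


def local_temporal_attention(frames, radius=1):
    """Attend each frame feature to neighbours within `radius` time steps.

    frames: list of T feature vectors (lists of floats of equal length).
    Returns T vectors, each a softmax-weighted average over its local window.
    NOTE: hypothetical sketch, not the LTA module from the paper.
    """
    num_frames = len(frames)
    out = []
    for t in range(num_frames):
        lo, hi = max(0, t - radius), min(num_frames, t + radius + 1)
        window = frames[lo:hi]
        dim = len(frames[t])
        scale = math.sqrt(dim)
        # Scaled dot-product score between frame t and each frame in its window.
        scores = [sum(a * b for a, b in zip(frames[t], f)) / scale for f in window]
        # Numerically stable softmax, normalised over the window only.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * f[d] for w, f in zip(weights, window)) for d in range(dim)])
    return out
```

With `radius=0` every frame attends only to itself, so the output equals the input; a global attention mechanism corresponds to a radius that covers the whole clip.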
Continuously acquiring up-to-date information about the target's shape allows more efficient and robust classification as well as accurate estimation of the target state. However, previous methods have often overlooked this and used only the target information from the first frame during tracking. In this paper, we propose three specific and practical guidelines for updating the target state, enabling the development of an anchor-free generic object tracker that requires no prior knowledge. Following these guidelines, we develop the Dynamic Update Template (DUT) tracker, which comprises template, dynamic template, and search branches. DUT ensures unambiguous classification scores, provides estimation quality scores, and multiplies the two to obtain the pcore, which serves as the basis for updating the dynamic template. Thorough analyses and ablation studies validate the efficacy of the proposed guidelines. DUT achieves better performance on challenging benchmarks (LaSOT) without excessive modifications. On the large-scale TrackingNet dataset, DUT attains an AUC score of 82.1 while running at more than 90 FPS, above the threshold for real-time performance.
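The score fusion described above can be sketched as follows. This is a minimal illustration, assuming the dynamic template is replaced whenever the product of the two scores exceeds a fixed threshold; the function names and the `threshold` value of 0.8 are hypothetical, since the abstract states only that the product is the basis for the update.

```python
def fused_score(cls_score, quality_score):
    """Multiply the classification score by the estimation quality score."""
    return cls_score * quality_score


def maybe_update_template(dynamic_template, candidate, cls_score, quality_score,
                          threshold=0.8):
    """Return the dynamic template to use for the next frame.

    `threshold` is an assumed value, not one reported in the abstract.
    """
    if fused_score(cls_score, quality_score) > threshold:
        return candidate          # confident and well localised: adopt the new crop
    return dynamic_template       # otherwise keep the current dynamic template
```

Multiplying the scores means a candidate is adopted only when it is both confidently classified and well localised; a high value on one score cannot compensate for a low value on the other.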
In recent years, methods based on deep convolutional neural networks (CNNs) have become a focus of research in hyperspectral image (HSI) classification. Hyperspectral data contain both spatial and spectral information. While CNN-based methods are good at extracting local spatial features, they handle spectral features and global information poorly. This paper therefore proposes a multi-attention network that fuses local and key channel information for HSI classification. First, principal component analysis (PCA) is used to pre-process the HSI data. Second, a feature fusion module based on the squeeze-and-excitation (SE) module and 2D convolution is constructed to fuse local spatial information with enhanced channel information. Third, a global covariance pooling function is used to accelerate the convergence of the network. Finally, the fused features are sent to a Vision Transformer (ViT) module, where position encoding is applied, to capture global sequence information and improve the classification results. Experiments on three typical public datasets demonstrate that the proposed network provides competitive results compared with other state-of-the-art HSI networks.
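To make the channel-enhancement step more concrete, here is a minimal sketch of squeeze-and-excitation style reweighting in plain Python. It covers only the generic SE idea (squeeze, excitation, scale); the function name `se_reweight`, the omission of biases, and the way the weights are passed in are assumptions, and the sketch does not reproduce the paper's fusion module or its 2D convolution.

```python
import math


def se_reweight(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel reweighting (biases omitted).

    feature_maps: list of C channels, each an H x W list of lists.
    w1: C_reduced x C weight matrix; w2: C x C_reduced weight matrix.
    Returns (scaled feature maps, per-channel gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid,
    # producing one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, feature_maps)]
    return scaled, gates
```

In a trained network `w1` and `w2` are learned, so informative channels receive gates near 1 and less useful ones are suppressed.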
Accurately predicting protein subcellular localization is conducive to understanding protein function and finding cancer biomarkers. Unfortunately, many experimental approaches for classifying protein subcellular location remain costly and time-consuming. Deep convolutional neural networks have achieved significant advances in many fields, such as image classification, object detection, and segmentation, which motivates us to use them to classify protein subcellular images. However, bioimages differ unavoidably from natural images; for instance, the texture information in subcellular images is not as clear as in natural images. This means that if a deep model is trained directly on bioimages for the classification task, the results will fall short of expectations. We therefore use a Partial Parameter Transfer Strategy (PPTS) and Spatial Pyramid Pooling (SPP) for the bioimage classification task. The first step uses the partial parameter transfer strategy to optimize the training process of the deep model, and the second uses a spatial pyramid pooling layer to optimize the architecture of the deep convolutional neural network. Experiments show that, by combining these two techniques, our approach achieves better results than traditional deep learning methods on bioimage classification.
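For readers unfamiliar with spatial pyramid pooling, the following sketch shows the general idea on a single-channel feature map: the map is max-pooled into grids of several sizes and the results are concatenated, giving a fixed-length vector whatever the input size. The function name `spatial_pyramid_pool` and the pyramid levels `(1, 2, 4)` are assumptions chosen for illustration, not the configuration used in the paper.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an H x W map into an n x n grid for each pyramid level.

    Returns a flat list of length sum(n * n for n in levels),
    independent of H and W.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor of the start, ceiling of the end.
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled
```

Because the output length depends only on the pyramid levels (21 values per channel for levels 1, 2, and 4), the layers after it can accept images of different sizes without cropping or warping.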