Graph-based Alzheimer's disease prediction methods currently suffer from class imbalance and insufficient sample sizes, which can bias classifiers toward the majority class and cause overfitting. To address this problem, a graph-based data augmentation node expansion algorithm is proposed. Firstly, graph representation learning is used to project the original feature vectors into a low-dimensional space. This message-aggregation step ensures that the low-dimensional vectors retain the latent structural information of the data, preventing the structural damage that direct expansion on the original data may cause. Secondly, in the low-dimensional space, an adaptive-weight node expansion algorithm generates new nodes, overcoming the boundary fuzziness of traditional oversampling algorithms. The weighting scheme adjusts the priority of each expansion node to control where, and how many, new nodes are generated. Finally, the expanded graph is fed into a graph neural network classifier for prediction. Quantitative experiments on the Tadpole and NACC datasets show that the proposed graph-based data augmentation model achieves the highest accuracy: on average 93.84% versus 92.8% on the Tadpole dataset, and 90.11% versus 88.29% on the NACC dataset. Ablation experiments further demonstrate the effectiveness of node expansion on graph structures.
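The adaptive-weight expansion rule itself is not spelled out in the abstract, but the core idea of synthesizing minority-class nodes between neighbors in the learned low-dimensional space can be sketched as follows. The function name, uniform interpolation weight, and nearest-neighbor choice are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def expand_minority_nodes(emb, labels, minority, n_new, rng=None):
    """SMOTE-style node expansion in a learned low-dimensional space.

    emb: (N, d) node embeddings from graph representation learning
    labels: (N,) class labels; minority: the label to oversample
    Returns (n_new, d) synthetic embeddings, each placed between a
    minority node and its nearest minority-class neighbor.
    """
    rng = rng or np.random.default_rng(0)
    idx = np.where(labels == minority)[0]
    new = []
    for _ in range(n_new):
        i = rng.choice(idx)
        # nearest minority neighbor (index 0 is the node itself)
        d = np.linalg.norm(emb[idx] - emb[i], axis=1)
        j = idx[np.argsort(d)[1]]
        lam = rng.uniform(0, 1)  # uniform weight; the paper uses adaptive weights
        new.append(emb[i] + lam * (emb[j] - emb[i]))
    return np.stack(new)
```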
In the domain of human pose estimation, graph convolutional networks have exhibited notable performance gains owing to their ability to naturally model human poses as graph structures. However, prevailing methods concentrate predominantly on the local physical connections between joints, overlooking higher-order neighboring nodes; this limits their ability to exploit relationships between distant joints. This article introduces a Multiscale Spatio-Temporal Hypergraph Convolutional Network (MST-HCN) designed to capture spatio-temporal information and higher-order dependencies. MST-HCN comprises two pivotal modules: Multiscale Hypergraph Convolution (MHCN) and Multiscale Temporal Convolution (MTCN). The MHCN module represents human poses as hypergraphs of various forms, enabling comprehensive extraction of both local and global structural information. In contrast to traditional stride convolutions, MTCN leverages multiple branches to learn frames according to their significance, filtering out redundant frames. Experimental results show that MST-HCN surpasses state-of-the-art methods on benchmarks such as Human3.6M and MPI-INF-3DHP. In particular, MST-HCN boosts performance by 1.5% and 0.9% over the closest recent method, using detected 2D poses and ground-truth 2D poses, respectively.
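For context, a single hypergraph convolution in the standard HGNN formulation, on which MHCN's multiscale variants presumably build, propagates features through the degree-normalized incidence matrix. The sketch below assumes uniform hyperedge weights and a ReLU activation.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One standard hypergraph convolution layer (HGNN-style).

    X: (N, d) joint features; H: (N, E) incidence matrix, where each
    hyperedge connects a group of joints; Theta: (d, d_out) weights.
    """
    Dv = H.sum(1)                                  # vertex degrees
    De = H.sum(0)                                  # hyperedge degrees
    Dv_inv = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-8)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-8))
    # symmetric-normalized propagation over the hypergraph
    A = Dv_inv @ H @ De_inv @ H.T @ Dv_inv
    return np.maximum(A @ X @ Theta, 0)            # ReLU
```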
Positron Emission Tomography (PET) images suffer from low spatial resolution, resulting in suboptimal visual quality and an inability to clearly display subtle pathological areas, hindering the early detection of potential issues. To address this problem, we propose a cascade multi-output super-resolution reconstruction method based on multi-channel input. Firstly, we construct a cascade super-resolution model by introducing degradation functions to refine the LR-to-HR mapping range. Through gradual super-resolution reconstruction, we comprehensively account for information loss and deformation during resolution reduction, providing more accurate and higher-quality super-resolution results. Secondly, we introduce high-resolution CT images, which restore texture details by contributing additional high-frequency information and maintain overall image consistency by integrating extra structural information. Finally, we incorporate region-based super-resolution detection information to adaptively reconstruct different areas of the image, avoiding distortion from excessive super-resolution and blurriness from insufficient super-resolution. Experimental results demonstrate that our approach outperforms other methods, with SSIM, PSNR, and RMSE reaching 0.9607, 34.9438 dB, and 0.0201, respectively, achieving state-of-the-art performance. Furthermore, visual experiments show a significant improvement in the resolution of the reconstructed PET images, effectively compensating for deficiencies in the original images and providing strong support for the early detection of potential issues.
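As a rough illustration of the cascade idea, the sketch below splits a x4 reconstruction into two x2 stages with an intermediate output (hence "multi-output"). The layer sizes are arbitrary, and the paper's degradation functions, CT guidance channel, and region-based detection are omitted.

```python
import torch.nn as nn

class CascadeSR(nn.Module):
    """Minimal two-stage cascade: each stage upscales x2, so the hard
    x4 LR-to-HR mapping becomes two easier x2 mappings."""
    def __init__(self, ch=64):
        super().__init__()
        def stage():
            return nn.Sequential(
                nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 4, 3, padding=1),  # 4 = 1 out channel * 2*2
                nn.PixelShuffle(2))              # rearrange to double H and W
        self.stage1, self.stage2 = stage(), stage()

    def forward(self, lr):
        mid = self.stage1(lr)   # intermediate x2 output, also supervised
        hr = self.stage2(mid)   # final x4 output
        return mid, hr
```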
Aiming at the problems of blurred edge structure, loss of texture detail, distortion, and slow running speed in medical image fusion, this paper proposes a medical image fusion model based on a residual network. The network consists of an encoder, a fuser, and a decoder. A feature extraction module, MSDN, composed of a residual attention mechanism and dense blocks, is designed in the encoder to extract multi-scale deep features of the source images. In the fuser, a learnable fusion network replaces manually designed fusion rules, eliminating the adverse effects that hand-crafted fusion strategies have on the fusion result. The decoder obtains the fused image by layer-by-layer decoding and up-sampling. We use a two-stage training strategy: in the first stage, an image reconstruction task drives the training of the encoder-decoder; in the second stage, the trained encoder-decoder is fixed and the residual fusion network is trained with an appropriate loss function. Experimental results show that the fused images contain rich texture detail and color information in subjective visual terms, and the overall performance on objective evaluation metrics is better than that of the comparison algorithms.
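The two-stage strategy can be summarized in a minimal training sketch. The encoder/decoder/fuser modules, optimizers, data loaders, and loss callables here are placeholders, since the abstract does not specify them.

```python
import torch

def two_stage_train(encoder, decoder, fuser, recon_loader, fusion_loader,
                    recon_loss, fusion_loss, epochs=(10, 10)):
    """Stage 1: train encoder-decoder on reconstruction.
    Stage 2: freeze them, train only the residual fusion network."""
    opt1 = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
    for _ in range(epochs[0]):
        for img in recon_loader:
            loss = recon_loss(decoder(encoder(img)), img)  # auto-encoding task
            opt1.zero_grad(); loss.backward(); opt1.step()

    for p in list(encoder.parameters()) + list(decoder.parameters()):
        p.requires_grad_(False)                            # fix stage-1 weights
    opt2 = torch.optim.Adam(fuser.parameters())
    for _ in range(epochs[1]):
        for a, b in fusion_loader:                         # two source modalities
            fused = decoder(fuser(encoder(a), encoder(b)))
            loss = fusion_loss(fused, a, b)
            opt2.zero_grad(); loss.backward(); opt2.step()
```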
In positron emission tomography (PET), iterative methods are usually used to reconstruct PET data; the system matrix, which reflects the mapping between image space and projection space, is the key to iterative reconstruction algorithms. The previous orthogonal-distance ray tracing algorithm is computationally complex and inefficient. To improve its computational speed and imaging quality, we propose a new algorithm. Firstly, an incremental scheme is introduced on the basis of the Siddon algorithm to directly solve for the non-overlapping neighborhood between the current voxel and the previous voxel, accelerating the computation of voxel coordinate indices. Secondly, the distance between each neighborhood voxel and the LOR line is solved iteratively from the distance of the previous voxel to the LOR line, further improving computation speed. Finally, the probability value of a voxel completely covered by the detector is set to a constant, while the probability values of other voxels decrease with their distance from the LOR line, improving the imaging quality of the algorithm. Extensive evaluation experiments on a resolution phantom and a line-source phantom verify the effectiveness of our method.
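The final weighting step can be illustrated with a minimal sketch: the point-to-LOR distance determines the system-matrix entry, constant inside the fully covered region and decaying outside. The linear decay and the radius parameters are assumptions; the abstract states only that the value decreases with distance.

```python
import numpy as np

def lor_weight(voxel_center, p1, p2, full_cover_radius, max_radius):
    """Distance-driven system-matrix weight for one voxel and one LOR.

    p1, p2: endpoints of the line of response (LOR); voxels within
    full_cover_radius of the line get a constant weight, others decay.
    """
    d = p2 - p1
    t = np.dot(voxel_center - p1, d) / np.dot(d, d)
    dist = np.linalg.norm(voxel_center - (p1 + t * d))  # point-to-line distance
    if dist <= full_cover_radius:
        return 1.0                                      # fully covered: constant
    if dist >= max_radius:
        return 0.0
    # linear falloff is an assumption; the paper only requires monotone decay
    return (max_radius - dist) / (max_radius - full_cover_radius)
```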
Existing 3D face alignment and face reconstruction methods mainly focus on model accuracy; when applied to dynamic videos, their stability and accuracy drop significantly. To overcome this problem, we propose a novel regression framework that strikes a balance between accuracy and stability. First, on top of a lightweight backbone, an encoder-decoder structure jointly learns expression details and a detailed 3D face from video frames, recovering shape details and their relationship to facial expression while dynamically regressing a small number of 3D face parameters, which effectively improves both speed and accuracy. Second, to further improve the stability of face landmarks in video, a jitter loss trained jointly over multiple frames is proposed; it strengthens the correlation between frames and face landmarks and reduces the displacement of landmarks between adjacent frames, thereby suppressing landmark jitter. Experiments on several challenging datasets verify the effectiveness of our method.
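A minimal version of such a jitter penalty simply penalizes landmark displacement between adjacent frames; the exact formulation in the paper may weight frames or landmarks differently.

```python
import torch

def jitter_loss(landmarks):
    """Temporal jitter penalty over a clip.

    landmarks: (T, K, 2) tensor of K 2D landmarks over T consecutive
    frames. Penalizing inter-frame displacement discourages jitter
    while leaving slow, genuine motion relatively cheap.
    """
    diff = landmarks[1:] - landmarks[:-1]   # per-frame displacement
    return diff.pow(2).sum(-1).mean()
```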
Magnetic Resonance Imaging (MRI) is the mainstream modality for predicting Alzheimer's disease, but traditional MRI-based machine learning methods achieve low prediction accuracy. Although a Convolutional Neural Network (CNN) can automatically extract image features, convolution operations attend only to local regions and lose global connections. An attention mechanism can attend to local and global information simultaneously, improving model performance by strengthening key information and suppressing invalid information. Therefore, this paper constructs a deep CNN based on multiple attention mechanisms for Alzheimer's disease prediction. Firstly, the MRI image is enhanced by cyclic convolution to strengthen the feature information of the original image, improving prediction accuracy and stability. Secondly, multiple attention mechanisms are introduced to re-calibrate features and adaptively learn feature weights, identifying the brain regions most relevant to diagnosis. Finally, an improved VGG model is proposed as the backbone network: max pooling is replaced with average pooling to retain more image information, and efficiency is improved by reducing the number of neurons in the fully connected layers, which also suppresses over-fitting. Experimental results show that the prediction accuracy, sensitivity, and specificity of the proposed multi-attention method are 99.8%, 99.9%, and 99.8%, respectively, outperforming existing mainstream methods.
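The channel re-calibration described here follows the general squeeze-and-excitation pattern; the sketch below shows that generic pattern, not the paper's exact attention modules.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global average
    pooling summarizes each channel, a small MLP learns per-channel
    weights, and the feature map is re-calibrated by those weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # excite: channel-wise reweighting
```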
In recent years, machine learning methods have been extensively studied for Alzheimer's disease (AD) prediction. Most existing methods extract hand-crafted features from images and then train a classifier for prediction. Although this performs well, it has inherent deficiencies: it relies too heavily on image preprocessing and easily overlooks latent lesion features. This paper proposes a deep learning model based on the attention mechanism to learn latent features of PET images for AD prediction. Firstly, we design a novel backbone network based on ResNet18 to capture potential lesion features while avoiding vanishing and exploding gradients. Secondly, we add a channel attention mechanism so that the model learns to use global information to selectively emphasize informative features and suppress low-value ones, aiding the extraction of semantic features. Finally, we expand the data by horizontal flipping and random flipping, which reduces the over-fitting caused by the limited medical dataset and improves the generalization ability of the model. Evaluated on 238 brain PET images from the ADNI database, the method achieves a prediction accuracy of 94.2%, better than most mainstream algorithms.
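The augmentation step is straightforward to sketch; interpreting "random flipping" as a vertical flip, and the 0.5 probabilities, are assumptions.

```python
import torch

def augment_flips(batch, rng=None):
    """Flip-based augmentation for a small PET dataset.

    batch: (B, C, H, W); each sample is independently flipped
    horizontally and/or vertically with probability 0.5.
    """
    g = rng or torch.Generator().manual_seed(0)
    out = batch.clone()
    for i in range(out.shape[0]):
        if torch.rand(1, generator=g) < 0.5:
            out[i] = torch.flip(out[i], dims=[-1])  # horizontal flip
        if torch.rand(1, generator=g) < 0.5:
            out[i] = torch.flip(out[i], dims=[-2])  # vertical flip (assumed)
    return out
```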
Multimodal medical image fusion can provide comprehensive and rich information for doctors' diagnosis. Aiming at the problems of traditional fusion algorithms, such as PET/SPECT color distortion, insufficient MR texture, and high time consumption, this paper proposes a new multi-modal medical image fusion algorithm. Firstly, a non-subsampled shearlet transform (NSST) is introduced to perform multi-scale decomposition of the source images into low-frequency and high-frequency subbands. Then, since the low-frequency subband contains most of the intensity energy of the source image, it is divided into high-energy and low-energy regions by the maximum between-class variance (Otsu) method, and an adaptive weighted fusion rule is proposed, which benefits the fidelity and visual quality of the fused image. The high-frequency subbands are strongly sparse, so the maximum-value fusion rule is adopted, yielding clear texture in the fused image. Finally, the inverse NSST is applied to the fused low-frequency and high-frequency subbands to obtain the fused image. Compared with representative medical image fusion algorithms from recent years, the proposed method achieves good results in both objective evaluation and computational efficiency.
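The two fusion rules can be sketched as below. The 0.7/0.3 region weights in the low-frequency rule are placeholders, since the abstract only states that the weighting is adaptive; the high-frequency maximum-value rule is standard.

```python
import numpy as np
from skimage.filters import threshold_otsu  # maximum between-class variance

def fuse_low(la, lb):
    """Low-frequency rule: Otsu's threshold splits each subband into
    high/low-energy regions; an energy-based weight blends the sources
    (the paper's exact adaptive weighting may differ)."""
    ta, tb = threshold_otsu(la), threshold_otsu(lb)
    wa = np.where(la >= ta, 0.7, 0.3)   # assumed region weights
    wb = np.where(lb >= tb, 0.7, 0.3)
    return (wa * la + wb * lb) / (wa + wb)

def fuse_high(ha, hb):
    """High-frequency rule: keep the coefficient with the larger
    absolute value (sparsity-based maximum-value rule)."""
    return np.where(np.abs(ha) >= np.abs(hb), ha, hb)
```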
Digital image quality is disturbed by noise to some extent. Researchers have proposed a series of denoising algorithms based on wavelet transforms, non-local means, and partial differential equations to obtain high-quality images for subsequent research. Removing noise while preserving image edges and details has attracted wide attention. Methods based on anisotropic diffusion models have recently gained popularity, but they tend to over-smooth image details. In this paper, we propose an improved denoising algorithm based on the anisotropic diffusion model. Our method further modifies the diffusion coefficient of the fractional differential operator and Gauss curvature (FDOGC) denoising model. We use the edge-preserving property of bilateral filtering to recover image texture, and adjust the diffusion coefficient according to the characteristics of the local variance. To balance denoising and edge preservation, we add a regularization term to the diffusion model. We conduct ablation studies to verify the effectiveness of each innovation. Our method can adjust the balance between noise removal and edge preservation. Extensive experiments on public standard datasets demonstrate the superiority of our algorithm in quantitative and qualitative evaluation as well as visual quality.
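For reference, the classic Perona-Malik diffusion step that this family of methods modifies looks as follows; the paper's FDOGC coefficient, bilateral-filtering term, and regularization would replace or extend the simple exponential coefficient g.

```python
import numpy as np

def perona_malik_step(u, k=0.02, dt=0.2):
    """One explicit step of classic Perona-Malik anisotropic diffusion.

    u: 2D float image; g() shrinks diffusion near strong gradients,
    so edges are smoothed less than flat regions.
    """
    # forward differences toward the four neighbors
    n = np.roll(u, -1, 0) - u
    s = np.roll(u, 1, 0) - u
    e = np.roll(u, -1, 1) - u
    w = np.roll(u, 1, 1) - u
    g = lambda d: np.exp(-(d / k) ** 2)   # edge-stopping diffusion coefficient
    return u + dt * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)
```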
Orientation is one of the most important features of palmprint images, and palmprint recognition methods based on orientation features have achieved promising recognition performance. However, most of these methods neglect the relationships between orientation features, cannot effectively describe the structure of palm lines, and are sensitive to translation and rotation. In this paper, a palmprint recognition method based on three-orientation joint features is proposed. Firstly, Gabor filters are adopted to extract the orientation features. Secondly, by analyzing the characteristics of palm lines, two sets of feature vectors are constructed from three orientation features, namely the maximum orientation and the two minimum orientations. Finally, a weighted Manhattan distance metric is used to measure the similarity between two palms. Furthermore, to improve recognition performance, a feature fusion scheme is proposed for fusing the different features obtained from multispectral palmprints. Experiments on the PolyU MSpalmprint Database demonstrate that the proposed method achieves better recognition accuracy than some state-of-the-art methods.
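The matching step is a weighted L1 distance; a minimal sketch, with placeholder weights rather than the paper's values:

```python
import numpy as np

def weighted_manhattan(f1, f2, w):
    """Weighted Manhattan (L1) distance between two palmprint feature
    vectors; w lets dominant-orientation components count more. The
    choice of w here is an assumption, not the paper's weighting."""
    return np.sum(w * np.abs(f1 - f2))

# usage: smaller distance = more similar palms
f1, f2 = np.array([3.0, 1.0, 7.0]), np.array([2.0, 1.0, 5.0])
w = np.array([2.0, 1.0, 1.0])   # emphasize the maximum-orientation component
print(weighted_manhattan(f1, f2, w))
```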
In existing one-factor cancelable biometric template protection schemes, the hashing function used to transform the biometrics cannot preserve the original biometric features, which leads to a low recognition rate. Replication and extension can make fuller use of the biometric features, but overly long feature vectors reduce computational efficiency. Therefore, a one-factor cancelable fingerprint template protection scheme based on feature-enhanced hashing is proposed. Firstly, the extended binary biometric vectors are combined by sliding an extraction window and then converted to decimal, making full use of the biometric features and increasing non-invertibility. Secondly, a permutation factor is computed by the feature-enhanced hashing function and the random sequence is reordered accordingly, which better embeds the information of the original biometric features into the random sequence. Finally, a cancelable template is generated by trimming equal lengths from the head and tail of the reordered random sequence; deleting these elements improves computational efficiency and non-invertibility. Experimental results on the FVC2002 and FVC2004 fingerprint databases show that the recognition rate of the algorithm is improved, meeting the design criteria of cancelable biometrics and defending against security attacks.
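The first step, sliding a window over the extended binary vector and converting each window to decimal, can be sketched directly; the window size and step are assumed parameters, not the paper's settings.

```python
import numpy as np

def window_to_decimal(bits, win=8, step=4):
    """Slide a window over a binary biometric vector and convert each
    window to a decimal value. Overlapping windows (step < win) reuse
    bits, which is one way replication/extension exploits the features."""
    vals = []
    for start in range(0, len(bits) - win + 1, step):
        chunk = bits[start:start + win]
        vals.append(int("".join(map(str, chunk)), 2))  # binary -> decimal
    return np.array(vals)

# usage: 12 bits with an 8-bit window and step 4 yield 2 decimal values
print(window_to_decimal([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]))
```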
Correlation filter-based trackers exploit large numbers of cyclically shifted samples to train the object classifier, achieving good tracking accuracy and speed. However, in complex scenes such as occlusion or deformation, tracking drift or loss occurs. In this paper, a kernelized correlation filter tracker based on scale adaptation and occlusion detection is proposed to strengthen tracker robustness. Firstly, a robust appearance model combining gradient and color features is proposed to enhance feature representation ability. Secondly, a scale-adaptive mechanism is introduced to handle the fixed template size, and Newton's method is used to locate the maximum response so as to predict the target center more accurately and estimate the target scale. Finally, an occlusion detection scheme is adopted during model updates to avoid tracking failure caused by appearance model pollution. Experiments on the OTB2013 benchmark dataset show that, compared with the baseline tracker, we obtain absolute gains of 6.6% and 13.4% in mean distance precision and mean overlap precision, respectively.
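Newton refinement of the response peak is a standard sub-pixel trick: fit the local curvature at the integer argmax and take one Newton step. The sketch below assumes an interior peak with an invertible local Hessian, and may differ in detail from the tracker's implementation.

```python
import numpy as np

def newton_refine_peak(resp):
    """One 2D Newton step around the integer argmax of a correlation
    response map, giving a sub-pixel estimate of the target center."""
    r, c = np.unravel_index(np.argmax(resp), resp.shape)
    r = int(np.clip(r, 1, resp.shape[0] - 2))   # keep the 3x3 stencil in bounds
    c = int(np.clip(c, 1, resp.shape[1] - 2))
    # central-difference gradient and Hessian at (r, c)
    gy = (resp[r + 1, c] - resp[r - 1, c]) / 2
    gx = (resp[r, c + 1] - resp[r, c - 1]) / 2
    hyy = resp[r + 1, c] - 2 * resp[r, c] + resp[r - 1, c]
    hxx = resp[r, c + 1] - 2 * resp[r, c] + resp[r, c - 1]
    hxy = (resp[r + 1, c + 1] - resp[r + 1, c - 1]
           - resp[r - 1, c + 1] + resp[r - 1, c - 1]) / 4
    H = np.array([[hyy, hxy], [hxy, hxx]])
    step = np.linalg.solve(H, np.array([gy, gx]))
    return np.array([r, c], float) - step       # Newton update: x - H^{-1} g
```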
Stacked hourglass (HG) networks have been successfully applied to face alignment. However, due to the complex geometry of facial appearance, the HG model still lacks robustness when aligning faces in large poses. In this paper, a two-step method is proposed for robust face alignment. First, a convolutional neural network (CNN) directly outputs the transformation parameters, simplifying the conventional procedure of normalizing the face region by Procrustes analysis on the detected landmarks and the mean shape. In this way, faces with different poses can be converted to a canonical state, which is more advantageous for subsequent alignment. Second, motivated by recent deformable convolutional networks, we propose a modulated deformable residual block that replaces its plain counterparts in the HG model, yielding deformable hourglass networks (DHNs). The DHN achieves large performance improvements over the original HG model while having almost the same number of parameters and incurring only minor additional computational cost. Benefiting from the synergy of the two innovations, the proposed method outperforms state-of-the-art methods on challenging benchmark datasets.
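A modulated deformable residual block can be sketched with torchvision's DeformConv2d, which accepts learned sampling offsets and a modulation mask. This is a generic construction in the spirit of the paper, not its exact block.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class ModulatedDeformResBlock(nn.Module):
    """A plain conv predicts per-position offsets and modulation masks,
    which steer a deformable 3x3 convolution; the result is added back
    residually so the block can drop into an hourglass stage."""
    def __init__(self, ch):
        super().__init__()
        # 27 = 18 offset channels (2 per tap) + 9 mask channels, 3x3 kernel
        self.offset_mask = nn.Conv2d(ch, 27, 3, padding=1)
        self.dconv = DeformConv2d(ch, ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(ch)

    def forward(self, x):
        om = self.offset_mask(x)
        offset, mask = om[:, :18], torch.sigmoid(om[:, 18:])
        return torch.relu(x + self.bn(self.dconv(x, offset, mask)))
```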
Aiming at the edge gradient and orientation characteristics of palmprint images, a new Local Joint Edge and Orientation Patterns (LJEOP) method is proposed to extract palmprint features. Firstly, the Kirsch operator is used to calculate the edge response values of the palmprint image in 8 different orientations, and a Local Maximum Edge Pattern (LMEP) is proposed to represent the edge features. The orientation features of the palmprint image are extracted using a Gabor filter or a Modified Finite Radon Transform (MFRAT). Then a joint analysis of the edge and orientation features is carried out to construct a two-dimensional feature matrix. Compared with some existing palmprint recognition methods, our experimental results on the MSpalmprint library achieve a higher recognition rate, a lower equal error rate, and faster recognition.
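The first step can be sketched by generating the 8 Kirsch compass masks through rotation of the border coefficients and convolving; LMEP itself (encoding the maximum-response pattern) is left out of this sketch.

```python
import numpy as np
from scipy.ndimage import convolve

def kirsch_responses(img):
    """Edge responses of an image in the 8 Kirsch compass orientations.

    Returns an (8, H, W) stack; argmax over axis 0 gives the dominant
    edge orientation at each pixel.
    """
    base = np.array([[5, 5, 5], [-3, 0, -3], [-3, -3, -3]], float)
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [base[i, j] for i, j in ring]       # border values, clockwise
    masks = []
    for r in range(8):                         # rotate border to get 8 masks
        rotated = vals[-r:] + vals[:-r] if r else vals
        m = np.zeros((3, 3))
        for (i, j), v in zip(ring, rotated):
            m[i, j] = v
        masks.append(m)
    return np.stack([convolve(img.astype(float), m) for m in masks])
```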
Finger vein feature extraction algorithms based on global or local features are sensitive to rotation, translation, and scaling. Convolutional neural networks are more robust, but the small number of finger vein samples makes them prone to over-fitting. Therefore, this paper designs a network architecture, FingerveinNet, for finger vein recognition. Firstly, based on the Inception-ResNet [1] module, the architecture extracts multi-scale finger vein features while alleviating the vanishing gradient problem without increasing the number of parameters. Secondly, center loss is used as the loss function to optimize the network model and improve the discriminability of the feature vectors for finer detail discrimination. Experiments on three finger vein databases, FV-TJ, FV-USM, and PolyU, show that the proposed method is robust to rotation and translation, verifying its effectiveness.
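Center loss is a standard published auxiliary loss, so it can be shown concretely; the feature dimension and its use alongside a softmax classification loss are as commonly configured, not FingerveinNet specifics.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Center loss: pulls each feature vector toward a learnable
    per-class center, tightening intra-class variation so same-finger
    samples cluster even under small rotations and translations."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):          # feats: (B, D), labels: (B,)
        return ((feats - self.centers[labels]) ** 2).sum(1).mean() / 2

# usage: typically combined as total = softmax_loss + lambda * center_loss
loss_fn = CenterLoss(num_classes=100, feat_dim=128)
print(loss_fn(torch.randn(8, 128), torch.randint(0, 100, (8,))))
```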
The Weber Local Descriptor (WLD) computes differential excitation by summing neighborhood differences; this is sensitive to noise and allows positive and negative differences to cancel. Meanwhile, the WLD orientation operator only computes the ratio of gray differences between the vertical and horizontal directions, which cannot effectively portray the orientation features of finger veins. Based on the differential excitation and orientation operators of the Weber descriptor and the characteristics of finger vein images, this paper makes the following improvements: 1) edge regions in the image are located first, and the gradient magnitude is optimized according to pixel position to increase discrimination; 2) differential excitation is then represented by the ratio of the optimized gradient magnitude to the current pixel value; 3) double Gabor orientations replace the original gradient orientation to reduce the influence of translation and rotation on recognition; 4) finally, to better measure the similarity between features, a cross-matching algorithm is used to improve the recognition rate. Combining the proposed local descriptor with the cross-matching algorithm for finger vein recognition, the recognition rates reach 100% and 99.458% on two finger vein databases (one domestic, one international), respectively.
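For reference, the classic WLD differential excitation that improvements 1) and 2) modify can be written compactly; the paper replaces the summed-difference numerator with an edge-optimized gradient magnitude.

```python
import numpy as np
from scipy.ndimage import convolve

def differential_excitation(img, eps=1e-6):
    """Classic WLD differential excitation.

    The 3x3 kernel computes sum(x_i - x_c) over the 8 neighbors of the
    center pixel x_c; dividing by x_c and squashing with arctan gives
    the Weber-law excitation map.
    """
    img = img.astype(float)
    ksum = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], float)
    diff = convolve(img, ksum)          # sum of neighbor differences
    return np.arctan(diff / (img + eps))
```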