This PDF file contains the front matter associated with SPIE Proceedings Volume 13540, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
In a rapidly advancing technological landscape, integrating deep learning in autonomous vehicles presents a pivotal challenge and opportunity. Our study briefly introduces some concepts in machine learning and its network structures. We aim to perform road segmentation using a U-Net image segmentation model trained on the Lyft Udacity dataset. We train the model under different settings by adjusting parameters such as the initial channel count, pooling method, activation function, and batch size. Model performance is evaluated using the Intersection over Union (IoU) metric. By comparing the performance of the models, we identify which settings yield higher performance on the road segmentation task. Additionally, we observe the validation loss per epoch to determine which configuration provides the best training stability. This comprehensive evaluation allows us to identify the optimal model setup for robust and reliable road segmentation models.
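The Intersection over Union metric used to score these models can be sketched in a few lines (the mask contents below are invented toy data, not from the paper):

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union for binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection) / float(union) if union > 0 else 1.0

# Two toy 4x4 road masks whose "road" regions partially overlap.
pred = np.zeros((4, 4)); pred[:, :2] = 1       # predicts left half as road
target = np.zeros((4, 4)); target[:, 1:3] = 1  # ground truth is a middle band
print(iou(pred, target))  # 4 overlapping pixels / 12 union pixels
```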
A mine vehicle monitoring system collects images of all vehicles entering and leaving the mine, capturing vehicles in a variety of orientations; efficiently distinguishing a truck's head from its tail in these images is a major research challenge. Therefore, in this paper we propose a head and tail recognition method based on an Efficient-Net algorithm improved with coordinate attention (CA). We improve the accuracy of model feature extraction by introducing the coordinate attention mechanism, which averages spatial feature information along the X and Y axes separately, and by changing the input and output of the residual edge of the inverted residual module to shorten the high-dimensional channel. In addition, we introduce depthwise separable convolutions and agent-normalized activation in the mobile inverted-bottleneck convolution module to offset the differing non-normalized sources along the X and Y dimensions between convolution layers, thereby improving the model's detection rate. Combining the two methods, we achieve 99% validation accuracy on the improved Efficient-Net test set, demonstrating the efficiency of the algorithm.
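As background, the two directional pooling operations at the heart of coordinate attention (averaging spatial features along the X and Y axes separately) can be sketched as follows; the full CA block's channel reduction, convolutions, and sigmoid gating are omitted, and the tensor sizes are illustrative:

```python
import numpy as np

def coordinate_pools(feat: np.ndarray):
    """Split the spatial pooling of a (C, H, W) feature map into two 1D
    pools, as in coordinate attention: one along X, one along Y."""
    pool_h = feat.mean(axis=2)  # (C, H): average over X for each row
    pool_w = feat.mean(axis=1)  # (C, W): average over Y for each column
    return pool_h, pool_w

feat = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
ph, pw = coordinate_pools(feat)
print(ph.shape, pw.shape)  # (2, 3) (2, 4)
```

Unlike global average pooling, the two 1D pools retain positional information along one axis each, which is what lets the attention weights encode coordinates.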
Traffic sign detection and recognition (TSDR) is a pivotal component of contemporary transportation systems, designed to enhance road safety and operational efficiency. In an era where automation is indispensable, TSDR stands out as a critical tool for optimizing driving experiences. This study presents the development of a TSDR model employing advanced deep learning techniques, with a particular emphasis on convolutional neural networks (CNNs). Utilizing the German Traffic Sign Recognition Benchmark (GTSRB) dataset, which comprises annotated images of various traffic signs captured under diverse environmental conditions, we achieved remarkable results. Our CNN models demonstrated a 99% detection rate, underscoring their efficacy in traffic sign recognition tasks. Furthermore, transfer learning models yielded impressive accuracies, with VGG19 achieving 99.64%, EfficientNetB7 reaching 99%, and ASNet attaining 99.72%. Despite these promising results, we identified several limitations, including dataset-specific dependencies and the need for further enhancements in model design and training methodologies. Addressing these challenges is necessary for ensuring the robustness and reliability of intelligent transportation systems, thereby significantly improving road traffic management, safety, and efficiency.
Over the past few decades, scholars from various disciplines have been motivated to develop automated emotion detection systems. In this pursuit, audio data, and especially prosodic features, holds the most promise to deliver satisfying results. Therefore, the main aim of this approach was to evaluate standard machine learning algorithms on the task of emotion recognition from audio data. We evaluate the effect of training dataset size on model performance by means of incremental fine-tuning after conducting zero-shot testing on a range of widely used datasets from the literature, such as CREMA-D, RAVDESS, TESS, SAVEE, MELD, eNTERFACE, EmoDB, and IEMOCAP. To improve model generalizability, we used data augmentation approaches, and for robust emotion detection, we used feature extraction techniques such as MFCC, ZCR, and RMS. On CREMA-mixed datasets, experimental results show strong initial accuracy with the CNN model. Cross-corpus validation highlights the importance of diverse datasets, showing significant accuracy improvements with incremental fine-tuning. Our research opens the door to more capable emotion detection systems in practical applications by highlighting the necessity of varied training data for robust, generalizable SER models.
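Two of the features named above, zero-crossing rate (ZCR) and root-mean-square (RMS) energy, are simple enough to sketch directly (the sampling rate and tone frequency are illustrative; MFCC extraction requires a DSP library and is omitted):

```python
import numpy as np

def zcr(frame: np.ndarray) -> float:
    """Fraction of consecutive samples whose sign differs."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def rms(frame: np.ndarray) -> float:
    """Root-mean-square energy of a frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

# A pure tone crosses zero at a rate tied to its frequency (100 Hz tone
# sampled at 8 kHz: about 200 sign changes per 8000 samples).
t = np.linspace(0, 1, 8000, endpoint=False)
tone = np.sin(2 * np.pi * 100 * t)
print(round(zcr(tone), 3), round(rms(tone), 3))
```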
The QR code is a commonly used two-dimensional barcode employed in a variety of applications. To simplify detection and segmentation of a QR code, its structure incorporates three finder patterns. Sometimes code sides are cropped due to printing issues or overlapping. However, most recognition methods refuse to work if at least one of the three finder patterns is missing or damaged. This paper proposes an approach that allows reading such codes. It is based on utilizing alignment patterns alongside finder patterns, together with the RANSAC scheme, to estimate a projective transform that maps the symbol modules to the input image. The effectiveness of the proposed approach is evaluated on a generated dataset called SE-QR-SYN-500, which simulates cropped QR codes, both with and without projective distortions, and is published for open use. Existing open-source solutions show zero accuracy on this dataset. In comparison, our implementation of the proposed approach demonstrates 77.8% accuracy on straight QR codes and 76.4% on QR codes with projective distortions.
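The core estimation step, a projective transform mapping module coordinates to image coordinates, can be sketched with the classic four-point direct linear transform. The paper's method additionally wraps this in RANSAC over alignment-pattern candidates; the coordinates below are invented:

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Direct linear transform: projective matrix H with dst ~ H @ src
    (in homogeneous coordinates), from four point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of the 8x9 constraint matrix holds the 9 entries of H.
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def apply_h(h, pt):
    p = h @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

# Map the unit square of module coordinates onto a skewed quadrilateral.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 10), (30, 12), (28, 34), (8, 30)]
H = homography_from_4pts(src, dst)
print(np.allclose(apply_h(H, (0, 0)), (10, 10)))  # True
```

In a RANSAC loop, candidate pattern detections would repeatedly supply the four correspondences, and the transform with the most inlier patterns would be kept.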
Studying coffee plants and helping agronomists detect diseases such as rust is an important application of computer science. This work describes experiments with the JSEG image segmentation algorithm, which can segment images at multiple scales. Using the RoCoLe (Robusta Coffee Leaf Images) coffee tree image database, the JSEG algorithm segments these images at four scales. Typical segments are selected at each scale and clustered by similarity of normalized color histograms, using Near Sets theory; in this way the segmentations at the several scales are compared. We conclude that the smaller segments of scales 1 and 2, in which the colors are more homogeneous than in the larger segments of scales 3 and 4, are adequate training samples for the detection of rust diseases using artificial neural networks. The proposed approach can be applied to counting the segments affected by rust, to estimate disease severity in coffee leaves.
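As a rough illustration of comparing segments by normalized color histograms (the paper's clustering uses Near Sets theory, which is not reproduced here; the bin count and pixel values are invented):

```python
import numpy as np

def normalized_hist(channel: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized color histogram of one channel (values in [0, 256))."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    return hist / hist.sum()

def hist_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(0)
leaf = rng.integers(60, 120, size=(16, 16))   # a segment of mid-range values
rust = rng.integers(150, 220, size=(16, 16))  # a segment of brighter values
print(hist_intersection(normalized_hist(leaf), normalized_hist(rust)))
```

Segments whose histograms intersect strongly would fall into the same cluster; healthy and rust-affected segments, with disjoint color ranges as above, would not.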
Machine Learning and Computational Models in Intelligent Image Processing
In this paper, we present a survey of deep learning-based methods for regressing the gaze direction vector from head and eye images. We describe numerous published methods in detail, with a focus on the input data, model architecture, and loss function used to supervise the model. Additionally, we present a list of datasets that can be used to train and evaluate gaze direction regression methods. Furthermore, we noticed that results reported in the literature are often not directly comparable due to differences in the validation or even test subsets used. To address this problem, we re-evaluated several methods on the commonly used in-the-wild Gaze360 dataset under the same validation setup. The experimental results show that the latest methods, although claiming state-of-the-art results, significantly underperform compared with some older methods. Finally, we show that temporal models outperform static models under static test conditions.
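Gaze regression methods such as those surveyed here are conventionally compared by the angular error between predicted and ground-truth gaze vectors; a minimal sketch (vectors below are invented):

```python
import numpy as np

def angular_error_deg(pred: np.ndarray, gt: np.ndarray) -> float:
    """Angle in degrees between predicted and ground-truth gaze vectors."""
    cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

straight = np.array([0.0, 0.0, -1.0])  # looking into the camera
slightly_up = np.array([0.0, np.sin(np.radians(5)), -np.cos(np.radians(5))])
print(round(angular_error_deg(straight, slightly_up), 2))  # 5.0
```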
Plant disease detection is one of the most studied subjects in precision agriculture, which aims to protect and improve agricultural crops. Commonly, intelligent systems based on convolutional neural networks (CNNs) are employed to identify multiple plant diseases by analyzing leaf images. In this work, we propose a novel deep feature fusion approach to improve tomato disease classification. Specifically, deep features extracted from a compact convolutional transformer are fused with CNN features generated by MobileNetV2 and DenseNet201. The fused features are used to train an SVM classifier, which performs the classification into multiple disease classes. Experiments conducted on a set of 10 tomato disease categories highlight the effectiveness of the proposed system, which outperforms the individual CNNs by 2% in overall classification accuracy. It also achieves an accuracy of 99.46%, which is better than several state-of-the-art results.
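The fusion step can be illustrated as simple concatenation of per-backbone feature vectors. The L2 normalization and the dimensions below are assumptions for the sketch, and the subsequent SVM training is omitted:

```python
import numpy as np

def fuse(*feature_vectors: np.ndarray) -> np.ndarray:
    """Concatenate L2-normalized per-backbone features into one descriptor."""
    return np.concatenate([f / np.linalg.norm(f) for f in feature_vectors])

# Hypothetical per-image features from three backbones (dimensions invented).
cct_feat = np.ones(4)        # stand-in for compact convolutional transformer
mobilenet_feat = np.ones(6)  # stand-in for MobileNetV2
densenet_feat = np.ones(8)   # stand-in for DenseNet201
fused = fuse(cct_feat, mobilenet_feat, densenet_feat)
print(fused.shape)  # (18,)
```

Per-backbone normalization keeps one backbone's larger activations from dominating the fused descriptor before it reaches the classifier.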
Nowadays, the Hough (discrete Radon) transform (HT/DRT) has proved to be an extremely powerful and widespread tool harnessed in a number of application areas, ranging from general image processing to X-ray computed tomography. Efficient utilization of the HT to solve applied problems demands its acceleration and increased accuracy. However, most fast algorithms for computing the HT, notably the pioneering Brady-Yong algorithm, operate on power-of-two-size input images and are not adapted to images of arbitrary size. This paper presents a new algorithm for calculating the HT for images of arbitrary size. It generalizes the Brady-Yong algorithm, from which it inherits optimal computational complexity. Moreover, the algorithm computes the HT with considerably higher accuracy than the existing algorithm. The paper also provides a theoretical analysis of the computational complexity and accuracy of the proposed algorithm; the conclusions of the performed experiments agree with the theoretical results.
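A naive reference implementation shows what the transform computes for mostly-vertical digital lines; its O(w²h) cost is exactly what the Brady-Yong family of fast algorithms improves on. The parameterization (top-row intercept, total shift, cyclic wrapping) is an illustrative choice, not the paper's exact definition:

```python
import numpy as np

def naive_hough(img: np.ndarray) -> np.ndarray:
    """Sums of pixel values along 'mostly vertical' digital lines,
    parameterized by top-row column s and total horizontal shift t."""
    h, w = img.shape
    acc = np.zeros((w, w))
    for s in range(w):            # column at the top row
        for t in range(w):        # total shift accumulated over h rows
            cols = (s + (np.arange(h) * t) // max(h - 1, 1)) % w
            acc[s, t] = img[np.arange(h), cols].sum()
    return acc

img = np.zeros((8, 8))
img[:, 3] = 1.0                   # one vertical line in column 3
acc = naive_hough(img)
print(acc[3, 0])                  # the (s=3, t=0) line sums all 8 pixels
```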
This study delves deep into the key factors affecting the likelihood of NCAA basketball players getting drafted into the NBA. The study highlights the importance of offensive metrics such as points scored and offensive ratings in predicting an NCAA player’s chances of being drafted into the NBA by utilizing an unsupervised learning clustering model and a supervised decision tree model. This underscores the significance of offensive statistics in a player’s skill set and suggests that players and coaches should prioritize improving these metrics to enhance a player’s draft potential. The study found that defensive metrics like defensive ratings and blocks have less impact on overall draft potential than offensive metrics. A crucial point to note is that a team’s success often relies on having its top players actively participating on the court. This research enhances our understanding of the factors influencing the draft prospects of NCAA basketball players. It underscores the advancement of basketball analytics and paves the way for further research on player performance metrics and their influence on the scouting and selection of professional athletes.
Computer-aided Detection and Medical Image Analysis
Dose reduction in computed tomography is one of the most urgent questions, with a deep research history. One rising and potentially powerful tool for dose reduction is monitored tomographic reconstruction. This promising concept relies heavily on real-time reconstruction from the currently available data, which is the main topic of the present paper. The main result, achieving real-time reconstruction on a commercially available micro-CT setup using the SmartTomoEngine SDK, is presented in detail. A new acquisition order is proposed, allowing faster convergence of intermediate reconstruction results. An optimization of reconstruction from partial data is proposed, which allowed the production of 385 and 440 reconstructions for the standard and proposed acquisition orders, respectively, during a single acquisition of 512 projections, achieving real-time reconstruction. The results were produced using only a single previous-generation consumer-grade graphics processing unit. The study demonstrates that, with the proposed optimizations, monitored tomographic reconstruction can be effectively utilized for practical applications using the current generation of existing setups in the micro-CT field.
Reducing image artifacts is one of the most important directions in modern computed tomography, and it receives a research boost from the rapid development of artificial intelligence and machine learning. Neural network based methods often achieve an artifact reduction quality unattainable by classical approaches. At the same time, these models tend to be less robust and to generalize worse than algorithmic approaches: the output may degrade severely when the model is used in cases not represented in the training datasets. In the present work, the robustness and generalization properties of neural network models are studied on the task of metal-like artifact reduction for computed tomography reconstruction. A data augmentation pipeline is developed that varies the severity and geometry parameters of metal artifacts. The pipeline is validated by augmenting the DICDNet model's test dataset and evaluating the corresponding pre-trained DICDNet model. The generated data consisted of 192,000 samples covering 12×8 combinations of experiment parameters. It was demonstrated that DICDNet is reasonably robust to geometry variations, with a maximum PSNR deviation of ≈1.5%. At the same time, even a small growth in artifact severity leads to rapid quality degradation and blurring of the studied model's output. The degradation reaches a ≈22% PSNR deterioration relative to the baseline algorithmic approach, which is also one of the inputs to the studied model.
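The PSNR figures quoted above follow the standard definition, which can be sketched as follows (the data range and images are illustrative):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray,
         data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(data_range ** 2 / mse))

clean = np.zeros((32, 32))
noisy = clean + 0.01                 # a uniform error of 1% of the data range
print(round(psnr(clean, noisy), 1))  # 40.0
```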
X-ray computed tomography provides information about the internal structure of the object under study. Standardizing the orientation of the reconstructed volume is a critical stage in many volumetric data processing and analysis pipelines that require a certain, strictly specified orientation of the reconstructed volume. Since microelectronic devices are mostly planar in structure, virtual two-dimensional cross sections of the reconstructed volume must be located along planar layers. However, it is not always possible to accurately orient the object to be scanned, so post-processing algorithms are used for automatic alignment. In some tasks, the strict orientation of the digital image of a 3D object is a necessary condition for the operation of post-processing algorithms. An example of a post-processing algorithm with stringent object orientation requirements is the algorithm for automatically unfolding a digital copy of a collapsed object (scroll). Orientation requirements also arise when solving segmentation problems, searching for special points, and stitching parts of the image to improve the quality of the algorithm result. For learning methods such as artificial neural networks, using a standardized volume orientation can dramatically reduce the variability of the data, which allows for achieving high performance with less computational overhead and data preparation. In this paper, we propose an automatic orthotropic alignment method to achieve the desired object orientation. The algorithm optimizes the parameters of the orientation model using the RANSAC method. The numerical implementation of the method is done in Python. The performance of the method is demonstrated on three datasets: a digital model of a test 3D object, the tomographic reconstruction of a flash drive, and the tomographic reconstruction of a scroll.
For the same objects, calculations were carried out using previously applied methods based on the inertia tensor and the structural tensor. The comparison of the obtained results showed that our proposed method is the most robust of the three. A detailed description of the comparison methodology is also presented in the paper. In medical research, all objects belong to one class and have a similar internal structure, so anatomical features are used to standardize orientation. In industrial and laboratory computed tomography, the variability of objects is much higher, so more general features, such as the presence of dedicated orientations, must be used. To our knowledge, this study is the first dedicated to the problem of automatic geometric normalization of tomographic reconstruction results for objects with orthotropic features. In this study, we propose and compare three different orthotropic alignment methods, based on the inertia tensor, the structural tensor, and RANSAC. Our experiments on synthetic data and real object reconstructions show the advantages of the RANSAC-based approach for automatic orthotropic alignment of reconstructed volumes.
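The core RANSAC primitive behind such an alignment, fitting a dominant plane direction to 3D data by consensus, can be sketched as follows (the point cloud, iteration count, and tolerance are invented; the paper's full orientation model is richer):

```python
import numpy as np

def ransac_plane(points: np.ndarray, iters: int = 200, tol: float = 0.01):
    """Fit a dominant plane normal by RANSAC: repeatedly fit a plane to
    3 random points and keep the one with the most inliers."""
    rng = np.random.default_rng(0)
    best_normal, best_inliers = None, -1
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(n) < 1e-12:
            continue  # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        inliers = int(np.sum(np.abs((points - sample[0]) @ n) < tol))
        if inliers > best_inliers:
            best_normal, best_inliers = n, inliers
    return best_normal, best_inliers

# A flat slab of points plus a few outliers: the dominant normal is +/- Z.
rng = np.random.default_rng(1)
pts = rng.uniform(-1, 1, size=(300, 3)); pts[:, 2] = 0.0   # planar slab
pts = np.vstack([pts, rng.uniform(-1, 1, size=(20, 3))])   # outliers
normal, count = ransac_plane(pts)
print(abs(normal[2]) > 0.999, count >= 300)
```

Aligning the volume would then amount to rotating the detected dominant direction onto a coordinate axis.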
Computed tomography (CT) imaging is a non-invasive technique for analyzing internal structures slice by slice. Nowadays, CT is being influenced by the rise of neural network (NN) approaches that address limitations of traditional CT methods. NN applications in CT have shown promise in reducing radiation exposure, compensating for instabilities in the X-ray source operation mode, and improving image reconstruction quality. Nevertheless, on the path toward the development, approbation, and testing of NN-based CT methods, pitfalls and challenges remain. Many publicly available datasets offered to developers for validating their methods are non-physical and do not correlate with real-life measurements. This, in turn, leads to non-scalability and instability of the developed NN methods. The observed gap in validation databases emphasizes the need for comprehensive benchmarking against suitable medical datasets. This paper highlights the importance of modeling realistic data and refining quality measurement methodologies to foster the development of reliable NN-based CT solutions. In addition, the authors investigate peculiarities of the ICASSP-2024 3D-CBCT Challenge dataset, one of the well-known CT datasets, and indicate in detail its anomalies to take into consideration while modeling CT data and validating CT methods.
The low resolution of satellite and drone remote sensing images of landslide and debris flow disasters under adverse weather or in complex scenarios, as well as the issues of gradient explosion, mode collapse, and artifacts in deep learning-based generative adversarial network (GAN) super-resolution models, present significant challenges. This paper proposes an optical remote sensing super-resolution reconstruction method for landslide and debris flow disasters based on a denoising diffusion probabilistic model (DDPM). The method adds Gaussian noise in the forward process to convert the image into a Gaussian noise distribution, and generates high-resolution images in the reverse process, thus achieving high-quality super-resolution reconstruction of remote sensing images of landslides and debris flows. The reconstructed images demonstrate enhanced texture, terrain, and landform details. Additionally, this method addresses the shortage of training datasets of precise super-resolution images for disaster intelligence recognition. Experiments were conducted on landslide and debris flow image datasets from regions such as Sichuan and Guizhou in China. The experimental results demonstrate that the super-resolution reconstructions based on the denoising diffusion probabilistic model exhibit significant improvements in peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), outperforming traditional deep learning methods. This study provides a new technical solution for obtaining accurate high-resolution remote sensing image data.
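The forward (noising) half of a DDPM has a closed form, sketched below; the learned reverse denoiser, which does the actual reconstruction, is omitted, and the image and noise levels are illustrative:

```python
import numpy as np

def forward_diffuse(x0: np.ndarray, alpha_bar_t: float, rng) -> np.ndarray:
    """DDPM forward process sample: q(x_t | x_0) is a Gaussian with mean
    sqrt(alpha_bar_t) * x_0 and variance (1 - alpha_bar_t)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
image = np.ones((64, 64))              # stand-in remote sensing tile
nearly_clean = forward_diffuse(image, alpha_bar_t=0.99, rng=rng)
nearly_noise = forward_diffuse(image, alpha_bar_t=0.01, rng=rng)
print(nearly_clean.std() < nearly_noise.std())  # True
```

As alpha_bar_t decays toward 0 over the timesteps, the image converges to pure Gaussian noise; the reverse process is trained to invert this step by step.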
Accurate trajectory estimation and 3D reconstruction in Simultaneous Localization and Mapping (SLAM) applications are highly dependent on the choice of optimization method, particularly with complex datasets. This study evaluates the performance of the MonoGS system using three different optimizers: Adaptive Moment Estimation (Adam), AdamW, and SGD with Momentum. Originally utilizing Adam for its rapid convergence and reliability, we aimed to improve the system's accuracy and reconstruction quality by replacing Adam with AdamW and SGD with Momentum. Experiments conducted using the TUM-Mono dataset revealed that AdamW, which incorporates a weight decay parameter, significantly enhances generalization and reduces overfitting. Our results indicate that AdamW achieved a lower Absolute Trajectory Error (ATE) Root Mean Squared Error (RMSE) of 0.02588, compared to Adam's 0.03594, leading to more accurate and structured 3D reconstructions. In contrast, SGD with Momentum yielded the highest error of 0.05911. These findings highlight AdamW’s superior efficacy in improving SLAM robustness and accuracy, demonstrating its potential as the optimal choice among the tested optimizers.
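The decoupled weight decay that distinguishes AdamW from Adam can be sketched in a single update step (the hyperparameters are illustrative defaults, not the paper's settings):

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update: Adam moment estimates plus *decoupled* weight
    decay, applied directly to the weights rather than via the gradient."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

w = np.array([1.0]); m = np.zeros(1); v = np.zeros(1)
w, m, v = adamw_step(w, grad=np.array([0.0]), m=m, v=v, t=1)
print(w)  # the decay term shrinks the weight even with a zero gradient
```

This decoupling of the regularization from the adaptive gradient scaling is the property credited for AdamW's better generalization in the experiments above.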
The role of barcodes in industrial processes and product labeling is rapidly growing. So, the problem of their recognition using mobile cameras in uncontrolled environments becomes very acute. The development of neural networks and computer vision provides an opportunity to solve it with high accuracy, given that there is enough representative data to train and test algorithms. However, the barcodes can contain sensitive information, and huge amounts of data cannot be published. Synthetic image generation can solve this problem, and in this paper, we propose a method to generate semi-synthetic natural-looking 2D barcodes with illumination changes, blur, and projective distortion, which can be used to create training or testing data for localization problems. We introduce the SE-DMTX-SYN-1000 dataset, designed for Data Matrix fine localization. We validate localization accuracy using Zxing, Zxing-cpp and libdmtx barcode reading libraries and demonstrate that this benchmark is quite challenging and can help to improve Data Matrix localization.
In this study, we propose a new secret sharing scheme based on multivariable polynomials. This novel scheme consists of general access sets, which allow dealers to construct access sets of predesignated sizes. The scheme provides flexibility to dealers, who, due to its nature, can easily set a hierarchy among access sets as well. Further, the scheme provides another important flexibility: dealers can update their secret keys without contacting shareholders in private. Hence, dealers do not have to reset the whole process for a new key, which would be a great burden. Moreover, the scheme allows dealers to remove and add shares without changing former shareholders' shares. This scheme combines all the aforementioned features in one, which makes it distinguishable among its current counterparts. We compare and discuss the security and complexity of the scheme, and we conclude by presenting an application on a moderate-size image that reflects the application of the scheme.
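The paper's multivariable-polynomial scheme is not reproduced here; as background, the classic single-variable Shamir scheme below shows the basic mechanics of polynomial secret sharing over a prime field (all parameters are illustrative):

```python
import random

PRIME = 2**31 - 1  # a Mersenne prime large enough for a toy secret

def make_shares(secret: int, threshold: int, n: int, seed: int = 0):
    """Classic Shamir sharing: the secret is the constant term of a random
    polynomial of degree threshold-1; shares are points on it."""
    rng = random.Random(seed)
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Modular inverse of den via Fermat's little theorem.
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = make_shares(secret=123456, threshold=3, n=5)
print(reconstruct(shares[:3]))  # any 3 of the 5 shares recover 123456
```

The paper's contribution lies in what this baseline lacks: hierarchical access sets, key updates without re-dealing, and share addition/removal without changing existing shares.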