Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322501 (2024) https://doi.org/10.1117/12.3048811
This PDF file contains the front matter associated with SPIE Proceedings Volume 13225, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Sixth International Conference on Image, Video Processing and Artificial Intelligence (IVPAI 2024)
Shen Li, Ke Du, Min Lin, Zhouyan Li, Ning Li, Cen Xiong
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322502 (2024) https://doi.org/10.1117/12.3046342
This study explores the design and function of an electrical grid security event analysis system. It analyzes the weaknesses and deficiencies in the current management of grid security events and proposes a design for an intelligent analysis system for these events. The system builds a long-term, stable, and effective database by standardizing the input, reporting, review, and summarization of grid security event data, thereby achieving digital management of grid security events. Using technologies such as artificial intelligence and big data analysis, the system conducts multi-dimensional analysis of extensive, long-term grid security event data, enabling research into the characteristics of events and the logical relationships between them. The system also couples event data with external influencing factors, facilitating analysis of the characteristics of various types of grid security events. By effectively integrating and fully exploiting grid security event data and presenting information from multiple perspectives, it enhances the management of safety incidents for power companies and provides data support and a theoretical basis for transitioning from traditional, passive, post-event control to proactive, dynamic, and targeted prevention and control.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322503 (2024) https://doi.org/10.1117/12.3046156
Interactive information fault diagnosis technology is a new type of fault diagnosis technology that integrates information fusion, artificial intelligence, computer science, and other disciplines. It can extract interactive information data from equipment in real time, analyze the characteristics of fault information, and thereby track changes and trends in equipment operating status. However, the technology still has shortcomings in practical application: it is difficult to process large volumes of complex interactive information data and to carry out effective fault diagnosis. Therefore, this paper proposes an intelligent fault diagnosis method based on an improved grey relational degree. The method effectively extracts the interactive information feature data of the equipment and performs correlation analysis of the fault feature data through the grey relational analysis algorithm, providing a new approach to fault diagnosis.
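The grey relational analysis the abstract refers to can be illustrated with a minimal sketch. The function below implements a generic textbook formulation (not the paper's improved variant): feature sequences are min-max normalised, the classic grey relational coefficient is computed against a reference sequence with the usual distinguishing coefficient ρ = 0.5, and the grade is the mean coefficient.

```python
import numpy as np

def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence w.r.t. a reference.

    Sequences are min-max normalised per feature first, then the classic
    grey relational coefficient is averaged over features; rho is the
    usual distinguishing coefficient (0.5).
    """
    data = np.vstack([reference] + list(candidates)).astype(float)
    span = data.max(axis=0) - data.min(axis=0)
    span[span == 0] = 1.0                       # guard constant features
    data = (data - data.min(axis=0)) / span
    ref, cands = data[0], data[1:]
    delta = np.abs(cands - ref)                 # absolute difference series
    d_min, d_max = delta.min(), delta.max()
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)                   # grade = mean coefficient

# the candidate closer to the reference receives the higher grade
ref = [1.0, 2.0, 3.0, 4.0]
grades = grey_relational_grades(ref, [[1.1, 2.1, 2.9, 4.2], [4.0, 1.0, 0.5, 9.0]])
```

A higher grade indicates a fault-feature sequence whose shape tracks the reference more closely, which is the basis for the correlation analysis described above.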
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322504 (2024) https://doi.org/10.1117/12.3046163
Urban investment bonds (Chengtou bonds) are bonds issued for local economic and social development. They are a major component of implicit local government debt, one of the main financing channels for local governments in China, and they play a crucial role in the risk of default on local government debt. This paper therefore studies how these risks evolve through unexpected events, using thousands of observations at the municipal-issuer and bond level from 2002 to 2022 and employing RBF (radial basis function) neural networks and PCA (principal component analysis) to construct a dual fixed-effect model by year and municipality. It finds a significant negative relationship between the total amount of land grant premium and default risk; this risk-reduction effect weakened during the pandemic period, suggesting a new perspective on the risk process based on observing the operating methods of the issuing companies. Finally, the most suitable RBF fusion methods are obtained for risk prediction, with control variables and parameters obtained through empirical learning; the experiments indicate that the fitting results are efficient and valuable for anticipating such events.
Yolanda D. Austria, Hubert Q. Temprosa, Adrien Kyle C. Balane, Margueritte Ann C. Arevalo, Martin Timothy M. Olano, Jan Lemuel C. Retanan
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322505 (2024) https://doi.org/10.1117/12.3046947
The Solar Powered IoT Enhanced Portable Rice Grain Dryer project aims to revolutionize traditional rice drying methods by leveraging solar energy and Internet of Things (IoT) technology to enhance efficiency and grain quality. This study focuses on creating a sustainable, efficient solution for small-scale farmers by integrating solar panels, sensors, and a pulley system within a portable dryer coupled with IoT connectivity for real-time monitoring and control. The research combines experimental testing and data analysis, encompassing unit, integration, and acceptance testing to evaluate performance metrics comprehensively. The results reveal that the solar-powered dryer significantly reduces drying time from three to four days to just two hours and decreases the moisture content of rice grains from 18-25% to an optimal 12-14%. This marked improvement ensures a higher quality of dried rice and minimizes post-harvest losses, contributing to better food security. The IoT-enabled system allows for automated temperature regulation, ensuring optimal drying conditions and offering farmers real-time insights and control over the drying process. This innovative approach has proven effective and reliable under varying environmental conditions, validating its potential as a practical and impactful solution for the agricultural sector. Overall, the Solar Powered IoT Enhanced Portable Rice Grain Dryer stands out as a pioneering advancement in agricultural technology, promising to support sustainable farming practices and enhance the livelihood of farmers by improving drying efficiency and grain quality.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322506 (2024) https://doi.org/10.1117/12.3046648
To address vehicle-cargo mismatches and suboptimal path planning in the distribution link of the power metering material supply chain, this paper introduces an intelligent dispatching algorithm for the power metering material supply chain. The algorithm uses genetic algorithms both for path optimization and to ensure a suitable allocation of vehicles. By combining the best distribution path and loading scheme, it can dispatch power grid materials intelligently according to actual material dispatching requirements.
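As a rough illustration of genetic-algorithm path optimization of the kind described above (the paper's actual encoding and operators are not given, so this is a generic sketch using permutation chromosomes, tournament selection, ordered crossover, swap mutation, and elitism):

```python
import math
import random

def ga_route(dist, pop_size=60, generations=200, mut_rate=0.2, seed=0):
    """Minimal genetic algorithm for ordering delivery stops.

    dist[i][j] is the travel cost between stops i and j; a chromosome
    is a permutation of the stops, read as a closed tour.
    """
    rng = random.Random(seed)
    n = len(dist)

    def cost(route):
        return sum(dist[route[i]][route[(i + 1) % n]] for i in range(n))

    def crossover(a, b):
        # ordered crossover: copy a slice of parent a, fill the rest from b
        i, j = sorted(rng.sample(range(n), 2))
        child = [None] * n
        child[i:j] = a[i:j]
        fill = [g for g in b if g not in child]
        for k in range(n):
            if child[k] is None:
                child[k] = fill.pop(0)
        return child

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [min(pop, key=cost)]                  # elitism: keep the best
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=cost)   # tournament selection
            b = min(rng.sample(pop, 3), key=cost)
            child = crossover(a, b)
            if rng.random() < mut_rate:             # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=cost)
    return best, cost(best)

# four stops on the corners of a unit square; the best tour is the perimeter
pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.hypot(px - qx, py - qy) for qx, qy in pts] for px, py in pts]
route, total_cost = ga_route(dist)
```

In a dispatching setting, the same chromosome-and-fitness structure can encode vehicle assignments as well, with the fitness function penalizing mismatches between vehicle capacity and load.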
Amar Kumar Verma, Afroz A. Saad, Anurag Choudhary, S. Fatima, B. K. Panigrahi
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322507 (2024) https://doi.org/10.1117/12.3046499
This paper investigates the severity of stator winding insulation failures by combining deep learning with time-frequency-based features. The time-based features include rms, peak, skew, crest, and kurtosis, while the frequency-based features include the dominant frequency and wavelet energy. The initial phase of stator winding insulation degradation is identified as a Stator Turn-Turn Fault (STTF). It is critical to monitor industrial machines to prevent catastrophic failures. This work develops an experimental test-rig setup that mimics various STTFs to imitate real-time industry conditions. The proposed time-frequency-based deep learning model achieved 96.4% accuracy using raw experimental data to identify the six fault conditions, including two early stages, two intermittent stages, and two severity stages within the same winding phase of a squirrel cage induction motor, while achieving 100% on featured data. The robustness of the proposed model was validated using unseen data from a different machine with unknown STTF conditions, with an accuracy of 91.78%. This demonstrates the reliability and scalability of the model, which can be adapted for industry use.
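The time-domain features named above (rms, peak, skew, crest, kurtosis) are standard signal statistics. The sketch below shows one common way to compute them for a signal frame; it is a generic illustration, not the authors' feature-extraction code:

```python
import numpy as np

def time_domain_features(x):
    """Standard condition-monitoring statistics for one signal frame."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))          # root-mean-square level
    peak = np.max(np.abs(x))                # largest absolute excursion
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma                    # standardised samples
    return {
        "rms": rms,
        "peak": peak,
        "skew": np.mean(z ** 3),            # asymmetry of the distribution
        "crest": peak / rms,                # peakiness relative to rms
        "kurtosis": np.mean(z ** 4),        # tail weight (1.5 for a pure sine)
    }

# a clean 50 Hz sine sampled over exactly 50 periods
t = np.linspace(0, 1, 1000, endpoint=False)
feats = time_domain_features(np.sin(2 * np.pi * 50 * t))
```

For a healthy sinusoidal current the crest factor stays near √2; insulation faults distort the waveform, shifting these statistics and giving the classifier its discriminative features.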
Yongjin Xu, Wenjia Shi, Shen Ye, Lixin Wang, Wangping Shen, Lan Jiang, Xuanhuai Xu
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322508 (2024) https://doi.org/10.1117/12.3046581
This article studies the integrated scheduling problem of order batching in a warehouse logistics environment and its optimization methods. For the integrated scheduling of order batching and job allocation under dynamic order arrival, a perturbed self-learning iterated local search algorithm is studied. To accelerate the search for locally optimal solutions, a rolling-time-window batching algorithm that integrates order features is designed. To address the difficulty of tuning perturbation parameters during online solving, a reinforcement learning mechanism integrating multiple evaluation criteria is adopted to design a perturbation-parameter self-learning mechanism, which adaptively adjusts the perturbation type and intensity to balance global and local search. To balance algorithm performance and efficiency, this mechanism is applied through offline training and online updates, reducing the computational complexity of online optimization.
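A rolling-time-window batching step of the kind described can be sketched generically. The paper's actual order features and batching rules are not given, so `window` and `capacity` below are illustrative parameters: a batch collects orders arriving within a time window of its first order, and is also closed when it reaches the picker capacity.

```python
def rolling_batches(orders, window, capacity):
    """Group dynamically arriving orders into batches.

    orders is a list of (arrival_time, order_id) pairs; orders arriving
    within `window` time units of a batch's first order are candidates
    for that batch, which is also closed once it reaches `capacity`.
    """
    batches, current, t0 = [], [], None
    for t, oid in sorted(orders):
        if t0 is None:
            t0 = t                      # first order opens the window
        if t - t0 >= window or len(current) >= capacity:
            batches.append(current)     # close the current batch
            current, t0 = [], t
        current.append(oid)
    if current:
        batches.append(current)
    return batches

# orders "a", "b" arrive close together; "c", "d" arrive in a later window
batches = rolling_batches([(0, "a"), (1, "b"), (5, "c"), (6, "d")],
                          window=3, capacity=10)
```

In the full algorithm each closed batch would then be handed to the iterated local search for job allocation, with the perturbation parameters tuned by the self-learning mechanism.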
Khaled M Mohamed, Mervat Medhat, Manar Daher, Layal W Ayoub
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 1322509 (2024) https://doi.org/10.1117/12.3046205
This study examines the impact of AI-generated design (AI-GD) tools on graphic designers' creativity, professional identity, and design-process efficiency. The research evaluates the integration of AI tools (text-to-image generators) into traditional design practices using a sample of 40 undergraduate graphic design students from Ajman University. Findings reveal that AI-GD tools significantly enhance creativity and professional development, particularly in idea generation and layout design. AI also increased confidence and fostered a stronger professional identity. The study emphasizes that AI tools improve the integration of designs with cultural and contextual aspects; however, human oversight remains crucial for capturing historical and cultural nuances. A preference for a hybrid approach combining AI-GD and traditional human-generated design (H-GD) methods was evident in new teaching methods and future projects. The results also indicate strong student interest in further integration of AI training into graphic design education. This research underscores the need for educational curricula to adapt, equipping students with the skills to leverage both AI-GD and H-GD methods.
Jokie Blaise, Fatima Umar Zambuk, Badamasi Imam Ya'u
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250A (2024) https://doi.org/10.1117/12.3046376
Large Language Models (LLMs) have garnered widespread attention due to their impressive performance, particularly in text generation. There is a strong inclination to apply these models across various fields. However, these models are not without their vulnerabilities, with hallucination being a prominent concern. Hallucination occurs when the model generates outputs that are either non-factual or lack grounding in the source input, while maintaining fluency and grammatical correctness. This drawback diminishes the trustworthiness of these models, rendering them unsuitable for deployment in sensitive fields and susceptible to misuse for disinformation. As the size of these models continues to grow, and the competition to unveil the next breakthrough model persists, developers often release them in a closed format, keeping their internal workings opaque. The lack of transparency underscores the importance of black box approaches to hallucination detection. In response to recent developments in zero-resource, black-box hallucination detection approaches, this work aims to enhance these methods, contributing valuable tools to address hallucination issues in large language models. We introduce a zero-resource, black-box framework named HalluciCheck, which operates based on an ensemble of similarity metric scores. We find that this framework has an improved performance in detecting factual sentences while its performance in detecting false sentences is comparable to previous methods. This framework serves as an initial step toward mitigating the hallucination problem.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250B (2024) https://doi.org/10.1117/12.3046179
Generative artificial intelligence technology has a wide range of applications in data authentication, including scene simulation, artistic and creative generation, data repair, anomaly detection, etc. However, there are still many problems, such as privacy protection, complex model interpretation, low degree of user data protection, and discrimination and bias in the generated data. Based on the perspective of user data value, this article proposes to reconstruct the generative artificial intelligence data ownership model from five aspects: user data usage description, user rights management of data, user data value return, data contribution recognition, and partner transparency. Corresponding development strategies are proposed from the legal, ethical, technical and user data value levels. In the future, with the development of generative artificial intelligence data authentication, users will be able to share data in a more transparent and fair manner and receive rewards or other potential value from it.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250C (2024) https://doi.org/10.1117/12.3046308
The detection and counting of small objects are pivotal in computer vision, particularly within complex, cluttered environments. Traditional methods like Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) show progress yet struggle in such settings. Inspired by the VMamba model, we propose a new State Space Model Based Small Object Counting Neural Network (SSOCNet). This model uses the VMamba architecture as its backbone, known for its global receptive field and computational efficiency, ideal for small object detection and counting. SSOCNet leverages VMamba's strengths and includes a versatile loss function derived from unbalanced optimal transport theory, optimizing performance in diverse settings. Experimental results validate SSOCNet's effectiveness in managing high-density small objects across various datasets and its competitive or superior performance in detection accuracy compared to current state-of-the-art technologies. This study underscores the VMamba architecture's applicability in visual counting tasks and presents a novel approach to counting small objects in complex environments.
Bui Hai Phong, Le Thao Nguyen Dang, Phuc Nhan Nguyen, Minh Quoc Bao Phan, Van Khuong Pham
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250D (2024) https://doi.org/10.1117/12.3046200
Rice is crucial to our food supply. However, diseases significantly affect rice productivity every year, and their early and accurate detection allows them to be diagnosed and treated efficiently. In recent years, advances in artificial intelligence have made it possible to detect and recognize rice diseases with high performance. This paper presents a solution for the detection and recognition of rice diseases using deep neural networks. The detection module uses the YOLOv8 network; the recognition module uses the DenseNet-121 network. We evaluated the proposed method on two datasets: a Kaggle rice disease dataset and a Vietnamese rice disease dataset. The obtained recognition accuracies of 93% and 96% on the Kaggle and Vietnamese datasets, respectively, demonstrate the effectiveness of the proposed method.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250E (2024) https://doi.org/10.1117/12.3046181
As one of the most popular vision applications, face verification systems are needed in all kinds of scenarios, including resource-constrained environments. In this paper, we propose a trimmed variant of MobileFaceNet that performs as well as MobileFaceNet but with lower memory and computational requirements. We make two essential changes to the bottlenecks: reversing the expansion-projection process and adding one more depth-wise convolution layer. While the first change significantly reduces the number of parameters and calculations, the second increases them; we strike a balance between the two so that the network remains complex enough to perform face verification as well as the original. In this manner we cut the model size by almost a quarter, and we name the resulting network TriMoFaceNet.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250F (2024) https://doi.org/10.1117/12.3046180
With its successful application to areas like speech and audio recognition, Multivariate Time Series Classification (MTSC) has become one of the most essential problems in time series analysis. However, existing solutions are suboptimal: they mainly focus on modeling the temporal information of multivariate time series (MTS) but fall short of capturing the relationships among the constituent univariate time series (UTS). To address this problem, we propose the Relational-Temporal Classification Network (RTNet), which combines the spatial relationships and temporal information of MTS. RTNet employs a graph attention layer to learn the relationships between the UTS, and a bottleneck module along with a GRU to extract temporal dependencies. We conducted extensive experiments on fifteen public UEA datasets; RTNet outperforms seven state-of-the-art baseline models, demonstrating its superiority.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250G (2024) https://doi.org/10.1117/12.3046541
This paper deals with fish vitality prediction using the proposed deep-learning pipeline framework based on water temperature data and synchronized underwater video segments. Fish behaviors indicate welfare and health, but deploying an aquaculture Internet of Things (AIoT) system to observe them is challenging. Given a time window t, we design a multimode sensing system to capture the water temperature sequence of the target environment, an underwater RGB fish video, and a sonar video simultaneously. A deep learning-based optical flow model produces optical flow maps for the individual frames of the RGB video, where each map represents a snapshot of the swimming behavior of the target fish school. From each optical flow map, the swimming speed of the fish school is computed, so that the video is represented as a sequence of swimming speeds, which is then cascaded with the sequence of water temperatures. With the resulting sequence as input, a transformer-like neural network is trained to synthesize a feature sequence for annotating the fish vitality level of the next time window using a classifier. The predicted fish vitality is finally used to trigger a warning message sent to the user. Based on the sonar video, our system uses a 3D point cloud reconstruction method to estimate the distribution of the swimming positions of fish in the target environment, which is appended to the warning message to complete the fish vitality report. Experimental results show that timely warnings when sea temperatures are low can reduce fish death losses by 50%.
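The step of turning per-frame optical-flow maps into a swimming-speed sequence and cascading it with the temperature sequence can be sketched as follows. The array shapes and the use of mean flow magnitude as "speed" are assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

def speed_sequence(flow_maps):
    """Mean swimming speed per frame from dense optical-flow maps.

    flow_maps has shape (T, H, W, 2): a per-pixel (dx, dy) displacement
    for each of T frames; frame speed is the mean flow magnitude.
    """
    mags = np.linalg.norm(flow_maps, axis=-1)   # (T, H, W) magnitudes
    return mags.mean(axis=(1, 2))               # (T,) mean speed per frame

def vitality_input(flow_maps, temperatures):
    """Cascade the speed sequence with the water-temperature sequence."""
    return np.concatenate([speed_sequence(flow_maps),
                           np.asarray(temperatures, dtype=float)])

# toy flow: every pixel moves by (3, 4), i.e. magnitude 5, in all 4 frames
flow = np.zeros((4, 2, 2, 2))
flow[..., 0], flow[..., 1] = 3.0, 4.0
seq = vitality_input(flow, [14.5, 14.2, 13.9])   # 4 speeds + 3 temperatures
```

The resulting one-dimensional sequence is what a transformer-like network would consume to predict the vitality level of the next time window.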
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250H (2024) https://doi.org/10.1117/12.3047050
Social media has become a primary source of communication, bringing together people with diverse opinions and backgrounds. Identifying these mixed emotions, especially in code-mixed English-Bangla and Banglish texts, presents unique challenges. This thesis explores machine learning techniques to accurately detect and analyze positive and negative sentiment in such multilingual environments, contributing to a safer and more constructive online community. Sentiment analysis, or opinion mining, is the study of how people’s views, sentiments, assessments, attitudes, and emotions are conveyed in writing. It is one of the most active research areas in Natural Language Processing (NLP) due to its relevance for business and society, expanding beyond computer science into management and social sciences. This study provides an overview of the challenges associated with sentiment analysis, focusing on its methods and procedures. It reviews the latest research using deep learning techniques to address sentiment analysis issues. Specifically, this research applied word embedding techniques and models to datasets. A comparative analysis of experimental results for various models and input features was conducted, highlighting the effectiveness of advanced machine learning approaches in sentiment analysis.
Abduljabbar S. Ba Mahel, Fahad Mushabbab G. Alotaibi, Nini Rao
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250I (2024) https://doi.org/10.1117/12.3046225
Deep learning models applied to arrhythmia classification often encounter substantial difficulties due to the presence of class imbalance in the dataset, wherein certain types of arrhythmias appear substantially less frequently than others. This disparity can result in biased models that excel in the majority classes but underperform in the minority classes, leading to reduced overall accuracy and potentially dangerous misclassifications that could endanger lives. The Synthetic Minority Over-Sampling Technique (SMOTE) resolves these difficulties by creating artificial samples for the minority classes, thus achieving a balanced dataset. This article examines the effectiveness of using synthetic data generated by the SMOTE method to improve the performance of deep models in classifying arrhythmias from ECG signals and their scalograms. We conduct an experimental study that applies SMOTE to balance arrhythmia classes and compare it with the performance of models trained on unbalanced data. In addition, we perform a detailed comparative analysis of ten distinct deep-based methods, particularly Convolutional Neural Networks (CNNs), and evaluate their ability to extract features from scalograms. Our results show that using synthetic data obtained by SMOTE significantly improves classification accuracy, especially when applying Convolutional Neural Networks (CNNs). These findings have important implications for medical diagnosis and monitoring of cardiovascular diseases. The article also discusses related problems and promising directions for further research.
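SMOTE's core interpolation step can be sketched in a few lines. This is a from-scratch illustration of the technique the abstract names; in practice libraries such as imbalanced-learn provide a full implementation:

```python
import numpy as np

def smote(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    # pairwise distances within the minority class
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude each point itself
    neigh = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours per point
    out = np.empty((n_new, X.shape[1]))
    for m in range(n_new):
        i = rng.integers(len(X))
        j = neigh[i, rng.integers(min(k, len(X) - 1))]
        lam = rng.random()                      # interpolation factor in [0, 1)
        out[m] = X[i] + lam * (X[j] - X[i])
    return out

X_min = np.random.default_rng(1).normal(size=(20, 3))   # toy minority class
synth = smote(X_min, n_new=50, k=3)
```

Because every synthetic sample is a convex combination of two real minority samples, the oversampled class stays within the original feature ranges rather than simply duplicating points, which is what helps the CNNs generalize on the rebalanced arrhythmia classes.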
Zhichao Liu, Yongjun Pan, Hongsheng Zhang, Yi Wu, Jingdun Pang
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250J (2024) https://doi.org/10.1117/12.3046310
In the slow-change degradation process of rolling bearings, there is a coupling relationship between the evolution of bearing failure and the onset of degradation, which directly affects the timeliness of bearing degradation prediction. To address the difficulty of detecting the onset of degradation in this slow-changing process, this paper proposes a joint "onset-of-degradation detection, degradation prediction" research framework for rolling bearings. Specifically, a health indicator construction method is proposed that realizes the detection of the onset of bearing degradation; this indicator is also very sensitive to early bearing failures. On this basis, a deep learning-based fault diagnosis and remaining useful life prediction (DFRP) model is established to predict the degradation trend. We first divide the operating process of the vibration devices into a normal stage, a degradation stage, and a fault stage. The key features are extracted from the pre-processed signals and fed into the prediction model. Sufficient experimental results verify that, under the condition of limited training samples, the proposed DFRP model obtains better prediction results than the Relevance Vector Machine (RVM) and Recurrent Neural Network (RNN) methods.
Yolanda D. Austria, Jhon Kenneth A. Acerado, Aloysius Atheos L. Butac, Caryll Franz M. Cariño, Carlos Miguel T. Marquez, Ma. Concepcion A. Mirabueno
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250K (2024) https://doi.org/10.1117/12.3046953
This research addresses urban parking challenges by allowing users to reserve parking spaces via a mobile app. The system integrates automated barriers and AI-powered cameras for accurate license plate recognition, ensuring secure and seamless parking access. It includes user registration, slot selection, plate number entry, and payment, with all data securely stored in a cloud database. Real-time notifications and a robust database management system enhance user experience and operational efficiency. Automated bollards and the integrated payment system further streamline parking management. Experimental results show 95% accuracy in license plate recognition, significantly improving the efficiency and security of parking reservations. This approach combines mobile technology, machine learning, and automation to provide a stress-free and secure parking experience.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250L (2024) https://doi.org/10.1117/12.3047555
We explore the problem of two-dimensional parallel-beam tomography in which neither the object under analysis nor the projection directions are known. The scenario assumes that the unknown nonnegative 2-D function, denoted f, has rectangular compact support in the first quadrant of the xy-plane, and that Radon projections are available for all angles in the range [0, π]. Despite its primarily theoretical nature, the problem is amenable to numerical implementation and produces precise results, as illustrated in our experiments with synthetic and real images. The approach relies on simple mathematical tools, yet remains accurate and efficient.
Muhammad Bin Sanaullah, Korel Yildirim, Robert Lederman, Esin Ozturk-Isik
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250M (2024) https://doi.org/10.1117/12.3046211
Accurate localization of interventional devices such as guidewires and catheters is crucial for patient safety and successful cardiovascular procedures. X-ray fluoroscopy, the most common imaging modality, provides real-time device visualization but lacks soft-tissue contrast and 3D spatial information, limiting precision in complex procedures. This study addresses the need for 3D visualization by developing a deep learning-based algorithm that estimates the 3D geometry of interventional devices during cardiac catheterization from biplanar x-ray fluoroscopic images. The methodology extracts individual frames from biplane x-ray DICOM images, then normalizes and preprocesses them for input to a U-Net segmentation model that predicts device masks. A geometric model of the biplanar x-ray system is created from the embedded DICOM information in a Cartesian coordinate system, mapping 2D projection points of the segmented masks onto the detector planes and tracing x-ray vectors between the sources and projection points. The 3D points of the device are determined from the intersections of these vectors by minimizing the residual distances between the closest points on the ray vectors. To compensate for the absence of ground-truth volumetric data, the 3D model is back-projected to both source points for comparison with the original projections. Results show Mean Absolute Error values of 2.08-3.29 mm between original and back-projected 2D points, corresponding to 3-5 mm differences between predicted and actual 3D points, depending on radiographic magnification. In a comparative analysis of projections, F1 scores range from 0.71 to 0.98, with precision of 0.83-0.94, accuracy of 0.75-0.97, and recall of 0.89-0.91, indicating the algorithm's promise in reconstructing 3D geometries from biplanar x-ray images.
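The core geometric step, finding the 3D point that minimizes the residual distance to two rays, is the classical closest-point-between-lines problem. The sketch below shows that step in isolation (variable names and the midpoint choice are ours; the paper's full system also handles the DICOM geometry and segmentation).

```python
import numpy as np

def triangulate(p1, d1, p2, d2):
    """Midpoint of the shortest segment between two rays (origin p,
    direction d), minimizing the sum of squared distances to both rays."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    r = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ r, d2 @ r
    denom = a * c - b * b                 # zero only when the rays are parallel
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    q1 = p1 + t1 * d1                     # closest point on ray 1
    q2 = p2 + t2 * d2                     # closest point on ray 2
    return (q1 + q2) / 2

# Hypothetical biplane setup: two x-ray sources viewing one device point.
source_a = np.array([0.0, 0.0, 0.0])
source_b = np.array([10.0, 0.0, 0.0])
target = np.array([1.0, 2.0, 3.0])        # ground-truth 3D device point
est = triangulate(source_a, target - source_a, source_b, target - source_b)
print(est)
```

With noisy segmentations the two rays become skew, and the midpoint of their shortest connecting segment is the least-squares 3D estimate.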
Jing Ma, Junhai Ma, Chengzhen Wang, Weihua He, Xueliang Bao, Yi Li
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250N (2024) https://doi.org/10.1117/12.3046072
This paper assesses the reliability of a gesture recognition method combining Mediapipe and lightweight deep learning networks as a potential replacement for EMG-based gesture recognition in EMG-controlled FES systems. The Mediapipe framework was employed to construct a dataset in which the 3D coordinates and left/right-hand identifiers of 21 key points on the hand were extracted from preprocessed images. The dataset was divided into two categories: data with and data without left/right-hand labels. A lightweight Dense Neural Network (DNN) and a lightweight Convolutional Neural Network (CNN) were constructed and initially trained and tested on the first category. Based on these results, the lightweight DNN was then trained and tested on the second category. The lightweight DNN and CNN achieved recognition rates of 99.91% and 100%, respectively, on the test set of the first category, and the DNN achieved a 100% recognition rate on the test set of the second category. Notably, the DNN has fewer model parameters than the CNN. Furthermore, the DNN demonstrated superior action recognition precision, F1 score, and recall on the test set of the second category compared with the DNN and CNN models on the first category, showing that the hand labelling helps improve the evaluation metrics.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250O (2024) https://doi.org/10.1117/12.3046578
To address the scale ambiguity of unsupervised monocular depth estimation, the inconsistency of depth scales between frames, and the poor accuracy of depth estimation in strong light and simple-texture scenes, this article combines the Extended Kalman Filter (EKF) with IMU attitude information, which provides real-scale constraints for the training phase of the monocular depth estimation network. Minimizing the IMU photometric loss function and the cross-sensor photometric consistency loss function resolves the scale ambiguity of the unsupervised monocular depth estimation algorithm and ensures that the depth scale is consistent between frames. Because the IMU is insensitive to light, the algorithm also improves the accuracy of depth estimation in strongly lit outdoor scenes.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250P (2024) https://doi.org/10.1117/12.3047000
This paper shows that Bone-Conducted (BC) speech is more suitable than the corresponding Air-Conducted (AC) speech for detecting the pitch of speech in noisy environments. To exploit the robustness of BC speech against background noise, a bone-conduction filter, implemented as an Infinite Impulse Response (IIR) filter, is used to produce BC speech from AC speech, and pitch detection is then performed with the autocorrelation approach. Several noise types (white, babble, factory, Volvo, pink, HF channel, train, and military vehicle noise) are considered, and the effect of the bone-conduction filter on pitch detection is examined and discussed. Experiments demonstrate that pitch detection based on BC speech outperforms that based on AC speech at both low and high signal-to-noise ratios, especially in HF channel noise.
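The autocorrelation approach the abstract builds on can be sketched as follows: pick the autocorrelation peak within a plausible pitch-lag range. This is a generic illustration on a synthetic noisy tone; the search range, noise level, and function name are our assumptions, and the paper's bone-conduction filtering step is not reproduced here.

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=80.0, fmax=400.0):
    """Estimate pitch as the autocorrelation peak inside the lag range
    corresponding to [fmin, fmax] Hz (search range is an assumption)."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # keep non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

fs, n = 8000, 2048
t = np.arange(n) / fs
voiced = np.sin(2 * np.pi * 120 * t)                    # 120 Hz "voiced" frame
noisy = voiced + 0.3 * np.random.default_rng(0).normal(size=n)
f0 = autocorr_pitch(noisy, fs)
print(round(f0, 1))
```

In practice the frame would first pass through the bone-conduction filter, and the same peak-picking would run on the filtered signal.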
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250Q (2024) https://doi.org/10.1117/12.3046177
This study investigated the relationship between pitch, intensity, and physiological respiratory reset amplitude across different sentence types by collecting speech and respiratory signals from loaded sentences. At the acoustic level, when the focus word is at the end of the sentence, the highest pitch value occurs at the beginning of the sentence; the overall trend is downward, but there is a pitch fluctuation at the end of the sentence. When the focus word is in the middle or at the beginning of the sentence, loaded sentences show at least one pitch fluctuation within the sentence. In terms of sound intensity, in declarative sentences and rhetorical questions, whether the focus is at the beginning, middle, or end of a sentence, the maximum amplitude-duration integral intensity falls on the focus. In definite, interrogative, and guessing questions, when the focus word is at the end or the beginning of the sentence, the maximum amplitude-duration integral intensity falls on the focus word or at the end of the sentence, respectively. Regarding the acoustic-physiological correspondence, the respiratory reset amplitude of the superimposed chest and abdomen signal has some influence on pitch, although the effect is not absolute. For sound intensity, the respiratory reset amplitude of the superimposed chest and abdomen signal affects the amplitude-duration integral intensity differently for different sentence types, and these effects vary with the position of the focus word in the sentence.
Yolanda D. Austria, Dexter James L. Cuaresma, Ian Noel M. Banta, John Paul B. Pabelico, Mark Adrian P. Santander, Vince Kazer M. Villasor
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250R (2024) https://doi.org/10.1117/12.3046938
Bitter gourd (Momordica charantia) is an essential ingredient used across diverse industries, valued not only for culinary purposes but also for its various health and medicinal benefits. The vine is known for its distinctive bitter taste. However, given the manual labor required to judge when the gourds are mature and to harvest them carefully, the existing practice of relying on human judgment to determine the optimal harvest time is inherently subjective and inconsistent. This study develops an autonomous, computer vision-based bitter gourd harvesting robot to close this gap in farming methods. Its objectives are to implement real-time maturity detection based on the bitter gourd's physical characteristics, such as size, color, and pattern; to integrate robotic arms that perform precise cutting and dropping of mature bitter gourds into a basket; and to define criteria for assessing the accuracy and reliability of the computer vision-based maturity detection system. The robot follows the gantry concept of a 3D printer, with X, Y, and Z axes, combined with object detection trained on the YOLOv8 nano model. The system passed manual and integration testing, achieving an overall 82 percent success rate in deployment testing. The study achieved its objectives with the researchers' current implementation; nevertheless, alternative testing methods and other approaches are recommended for future work.
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250S (2024) https://doi.org/10.1117/12.3047211
This paper presents a multi-band Magnitude Complementary (MC) crossover network based on the Q-Bernstein Filter (QBF). The prototype low-pass filter uses the Q-Bernstein Polynomial (QBP) to approximate the desired response, and the QBF provides parameters that can adjust the stop-band attenuation of the filter. The magnitude responses show that the second-degree filter achieves high attenuation in the stop-band. The two-band MC crossover network is realized by cascading a first-degree all-pass filter and a constant-voltage section, which together sum to a second-degree All-Pass Filter (APF). The multi-band crossover network is realized by cascading two identical second-degree all-pass filter stages. Simulation results show that the proposed crossover network has a constant summed magnitude response and an in-phase response over the audio band.
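The constant-summed-magnitude property the abstract claims can be illustrated numerically with the simplest complementary pair, a first-order low-pass and its constant-voltage complement, whose outputs sum to an all-pass response. This is a deliberately simplified stand-in for the paper's second-degree QBF-based network.

```python
import numpy as np

w = np.logspace(-2, 2, 500)          # angular frequency grid (rad/s)
s = 1j * w
h_lp = 1.0 / (1.0 + s)               # first-order low-pass prototype
h_hp = 1.0 - h_lp                    # constant-voltage complement: s / (1 + s)
summed = np.abs(h_lp + h_hp)         # magnitude of the summed crossover output
print(summed.min(), summed.max())    # flat (all-pass) across the whole band
```

Because the high-pass band is formed as 1 minus the low-pass response, the two bands recombine to unity at every frequency, which is exactly the constant summed magnitude (and in-phase) behavior a crossover network aims for.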
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250T (2024) https://doi.org/10.1117/12.3046207
Processing images with object detection, image restoration, and generative adversarial networks to convert real-world images directly into high-quality anime-style background images is a current research hotspot in computer vision. A real-world image is first input, objects are detected with the state-of-the-art DETR detection algorithm, and masks are generated for the detected objects. The image restoration algorithm LaMa then erases the masked regions of the image, producing a clean real-world background image. Finally, the AnimeGAN generative adversarial network converts the real-world background image into an anime-style background image. To address problems of the popular AnimeGAN such as color distortion during style transfer, a new AnimeGAN-SE is proposed that introduces an SE-Residual Block (Squeeze-Excitation Residual Block) to correct the weak colors of AnimeGAN's transferred images. Experimental results show that the network produces good anime-style pictures.
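The Squeeze-Excitation mechanism added in AnimeGAN-SE can be sketched in isolation: global-average-pool each channel, pass the pooled vector through a small bottleneck, and rescale the channels with the resulting sigmoid gates. The weights and shapes below are arbitrary stand-ins, not the paper's trained block.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-Excitation on a (C, H, W) feature map: pool per channel,
    apply a two-layer bottleneck, and rescale channels by sigmoid gates."""
    z = x.mean(axis=(1, 2))                      # squeeze: (C,)
    h = np.maximum(0, w1 @ z)                    # excitation bottleneck, ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # sigmoid channel gates, (C,)
    return x * gates[:, None, None]              # channel-wise rescaling

rng = np.random.default_rng(0)
c, r = 8, 2                                      # channels, reduction ratio
x = rng.normal(size=(c, 16, 16))                 # toy feature map
w1 = rng.normal(size=(c // r, c))                # squeeze -> bottleneck weights
w2 = rng.normal(size=(c, c // r))                # bottleneck -> gates weights
y = se_block(x, w1, w2)
print(y.shape)
```

Inserted into a residual block, these learned channel gates let the network re-weight color-carrying feature channels, which is how an SE-Residual Block can counter the color distortion described above.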
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250U (2024) https://doi.org/10.1117/12.3046210
Earthquakes are among the natural disasters Japan suffers most often, and they can cause cracks and other damage on roads, seriously affecting transportation safety and economic development. This study develops a roadway crack recognition system to cope with post-earthquake road damage in a specific environment. The system first uses the YOLOv5 algorithm to identify cracks on the road, then crops the regions identified as cracks and applies the U-Net algorithm for semantic segmentation. Finally, the area of the segmented crack region is calculated to evaluate the degree of damage. We use the Stable Diffusion algorithm to generate a context-specific crack dataset. Furthermore, so that the model performs well on camera devices, we compress the trained model with knowledge distillation. Experimental results show that the system performs well in terms of crack detection accuracy and speed and can quickly and accurately identify and assess road cracks. This system is important for improving the efficiency of road crack monitoring and is expected to be widely used in post-earthquake road maintenance.
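The final step, turning the U-Net mask into a damage measure, reduces to counting crack pixels and scaling by the physical pixel size. A minimal sketch (the pixel scale and function name are our assumptions; the paper does not state its exact damage formula):

```python
import numpy as np

def crack_area(mask, mm_per_pixel):
    """Physical area of a binary segmentation mask: crack-pixel count
    times the area covered by one pixel."""
    return int(mask.sum()) * mm_per_pixel ** 2

mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 10:90] = 1                       # a 20 x 80 px crack-like band
area = crack_area(mask, mm_per_pixel=0.5)    # 1600 px at 0.25 mm^2 each
print(area)
```

The resulting area (here in mm²) could then be thresholded into damage-severity grades for prioritizing road maintenance.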
Salma Nurhaliza Pribadi, Igi Ardiyanto, Nahar Taufiq
Proceedings Volume Sixth International Conference on Image, Video Processing, and Artificial Intelligence (IVPAI 2024), 132250V (2024) https://doi.org/10.1117/12.3046875
Segmentation approaches based on 3D representations face significant challenges, particularly in contrast and resolution, which tend to decrease when the image is enlarged. These challenges can degrade segmentation performance, which must be highly accurate for medical applications. To address this, this study proposes moving from 3D representation data to a 2D slice representation, which offers higher sharpness and resolution because it captures details at the slice level. Type B aortic dissection is a very dangerous disease of the aorta. In medical image segmentation for type B aortic dissection, previous studies have often examined only two main components: the False Lumen (FL) and the True Lumen (TL). However, the presence of a False Lumen Thrombus (FLT) is a critical factor in the treatment and diagnosis of type B aortic dissection that can endanger patients. Therefore, this study proposes a more comprehensive segmentation with a 2D slice representation that includes TL, FL, and FLT. Through the 2D slice representation, this study maintained good image resolution and contrast and achieved better segmentation accuracy than previous works by using a fine-tuned 2D U-Net and appropriate activation function settings. The final results show the highest Dice score (DSC) with the ReLU activation function: 96% for the false lumen, 91% for the true lumen, and 92% for the false lumen thrombus.
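The Dice score (DSC) used to report these results is a standard overlap metric between a predicted and a ground-truth mask; a minimal reference computation (the epsilon smoothing term is our convention):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    2 * |intersection| / (|pred| + |target|), smoothed by eps."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True   # 16-pixel square
b = np.zeros((8, 8), dtype=bool); b[3:7, 3:7] = True   # same square shifted by 1
d = dice(a, b)                                          # overlap is 9 pixels
print(round(d, 4))
```

A DSC of 1.0 means perfect overlap, so the reported 91-96% scores indicate near-complete agreement between predicted and reference lumen/thrombus masks on each 2D slice.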