Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351901 (2025) https://doi.org/10.1117/12.3059864
This PDF file contains the front matter associated with SPIE Proceedings Volume 13519, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351902 (2025) https://doi.org/10.1117/12.3058111
With the development of drone technology, its application in intelligent scenic areas provides a new solution for tourist flow monitoring. To enhance detection accuracy and satisfy real-time demands, this study proposes a low-altitude target detection algorithm for intelligent scenic areas based on an improved YOLOv10, and develops a tourist flow monitoring and statistics system for intelligent scenic areas accordingly. By introducing the Large Separable Kernel Attention (LSKA) mechanism, the algorithm optimizes the Spatial Pyramid Pooling Fast (SPPF) module and effectively captures long-range dependencies in images. In addition, a Small Target Detection Layer (STDL) is added to the YOLOv10 network structure to retain more location information and detailed features of small targets. Experiments on the VisDrone2019 dataset show that, compared to the original YOLOv10 model, the enhanced version improves Recall by 2.0% and mAP@0.5 by 1.7%. Compared with other mainstream models, the proposed algorithm improves on many evaluation metrics and fulfills the requirements for real-time detection. It has been successfully applied at the Tsingtao Beer Museum with good results. The experimental results indicate that the algorithm performs well on low-altitude aerial images captured by drones and provides effective technical assistance for the safety management of intelligent scenic areas.
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351903 (2025) https://doi.org/10.1117/12.3057716
High-resolution radar is increasingly used for target detection. However, the higher range resolution disperses the target energy over multiple range cells, creating range-spread targets rather than point targets, which degrades the detection capability of traditional point-target detection methods. In this paper, an improved dual-threshold generalized likelihood ratio test detector is proposed for high-resolution radar. The concept of the “effective scatterer” is introduced to extract the strong scatterer cells of the targets. The Lilliefors test is then adopted for pre-judgement, and based on the pre-judgement result, the calculation of the first and second thresholds is optimized so that the proposed detector improves detection performance at a constant false alarm rate. Simulation results show that the proposed detector outperforms traditional detection methods for range-spread targets.
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351904 (2025) https://doi.org/10.1117/12.3057881
Accurately predicting the lifespan of lithium-ion batteries is crucial for reducing maintenance costs and advancing clean energy technologies. Traditional prediction methods often fail to accurately estimate the lifespan due to the diverse volatility characteristics of lithium-ion battery degradation. This paper proposes a Kurtosis-driven SMA-ARIMA-LSTM method to predict the lifespan of lithium-ion batteries. First, the data is decomposed into low and high volatility components using a simple moving average (SMA). Then, the ARIMA model is applied to the low volatility part, while the LSTM network handles the high volatility part. Finally, the parallel prediction results of these two parts are combined to predict the remaining lifespan of the lithium-ion battery. The model is validated using four sets of CS2 series lithium-ion battery degradation data provided by the CALCE center at the University of Maryland. The results show that the hybrid model significantly improves prediction accuracy compared to standalone LSTM or ARIMA models, achieving near-perfect determination coefficients while significantly reducing mean absolute and root mean square errors. It effectively captures both the overall degradation trend of the battery and the capacity regeneration phenomenon. The experimental results demonstrate that the proposed SMA-ARIMA-LSTM method achieves a fitting rate of over 95%, with MAE kept within 0.9% and RMSE kept within 1.4%, thus realizing precise prediction of the remaining lifespan of lithium-ion batteries.
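The decomposition and recombination steps described in this abstract can be illustrated with a minimal Python sketch. It assumes a centered simple moving average with edge padding and an additive recombination of the two forecasts. The function names `sma_decompose` and `combine`, the window length, and the sample capacity values are hypothetical. The kurtosis-driven part of the method and the ARIMA and LSTM forecasters themselves are not shown.

```python
import numpy as np

def sma_decompose(series, window):
    """Split a 1-D series into a smooth part and a residual part.

    The smooth part is a centered simple moving average (low volatility);
    the residual is whatever the average leaves behind (high volatility).
    """
    x = np.asarray(series, dtype=float)
    # Pad with edge values so the moving average keeps the input length.
    left = window // 2
    right = window - 1 - left
    padded = np.pad(x, (left, right), mode="edge")
    kernel = np.ones(window) / window
    smooth = np.convolve(padded, kernel, mode="valid")
    residual = x - smooth
    return smooth, residual

def combine(smooth_forecast, residual_forecast):
    """Recombine the two component forecasts by simple addition (assumed)."""
    return np.asarray(smooth_forecast, dtype=float) + np.asarray(residual_forecast, dtype=float)

# Hypothetical normalized capacity readings over six cycles.
capacity = [1.0, 0.98, 0.97, 0.99, 0.95, 0.93]
smooth, residual = sma_decompose(capacity, window=3)
```

Because the residual is defined as the input minus the smooth part, the two components always sum back to the original series, so any error in the final prediction comes from the two forecasters rather than from the split.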
Jianhong Gan, Xi Lin, Changyuan Fan, Youming Qu, Peiyang Wei, Yaoran Huo, Zhibin Li
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351905 (2025) https://doi.org/10.1117/12.3057853
The widespread application of photovoltaic (PV) power generation in smart grids makes accurate generation forecasting essential for grid management and planning. To address the randomness and unpredictability of solar energy in PV power generation prediction, this paper proposes a more accurate short-term PV power generation prediction model based on BiTCN, a multi-focus mechanism, and a BiGRU model. DRSN is introduced to improve the residual block of BiTCN so as to extract the important features of PV power generation and reduce redundant features. The combination of multiple attention mechanisms enables the model to analyze all aspects of the data in parallel, ensuring a comprehensive examination of the information. The BiGRU model captures the long-term dependencies and inherent characteristics of time series data. To further improve prediction accuracy, an AR model is used to strengthen the model's ability to extract linear components. Experimental results on complex data sets, including weather forecast data, station weather data, and power data, show that the MAE, RMSE, and R² of the proposed model are superior to those of the traditional model, indicating good prediction accuracy.
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351906 (2025) https://doi.org/10.1117/12.3057725
Distributed hybrid flow shop scheduling problems (DHFSP) exist widely in industrial production processes and have therefore received widespread attention. However, studies on the HFSP that consider green objectives in a distributed production environment are quite limited. This paper therefore investigates a distributed hybrid flow shop scheduling problem with the objectives of minimizing the makespan and the total energy consumption (TEC). To solve it, an iterated greedy algorithm based on NSGA-II (NSGAIG) is developed. In the proposed algorithm, a random initialization strategy is used to generate the initial solution. Then, starting from the population initialization, a multi-objective local search is carried out for the current optimal solution in the population to obtain the global optimal solution. Next, a random variation method is used to enlarge the exploration space of the algorithm. Finally, the proposed NSGAIG algorithm is compared with other multi-objective optimization algorithms. Experimental results indicate that NSGAIG outperforms the compared algorithms in solving this problem.
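With two objectives to minimize, candidate schedules are usually compared by Pareto dominance, which is the comparison NSGA-II's non-dominated sorting builds on. The following Python sketch shows only that comparison, not the NSGAIG algorithm itself. The function names and the (makespan, TEC) values are hypothetical.

```python
def dominates(a, b):
    """Return True if objective vector a dominates b under minimization.

    a dominates b when a is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (makespan, total energy consumption) pairs for four schedules.
schedules = [(10, 50), (12, 40), (11, 60), (15, 40)]
front = pareto_front(schedules)
# front == [(10, 50), (12, 40)]
```

Here (11, 60) is dominated by (10, 50) and (15, 40) is dominated by (12, 40), so only two schedules remain as trade-offs between completion time and energy use.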
Kwang Rim Song, Song Ho Kim, Chol Guk Han, Un Sim Ri
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351907 (2025) https://doi.org/10.1117/12.3057800
In case-based reasoning (CBR) for industrial process control, adapting the case base with the online knowledge acquired during operation is important for improving reasoning efficiency. In this paper, a new method for updating and extending the case base online is proposed and applied to set-point control of the rolled cake production process. First, we present an approach for quantitatively evaluating the rolled cake based on sensory quality. Second, we propose an interactive method to acquire knowledge from the operator's experience and adapt the case base according to the behavior of the process and the quantitatively evaluated quality level. Comparative validation experiments show that the proposed method yields a 4.1% improvement in the ratio of good products.
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351908 (2025) https://doi.org/10.1117/12.3058052
This paper introduces an LLM-based method for generating vulnerable code equivalents to enhance software system security. With the rapid growth in software development, code security has become a significant concern. Traditional defense mechanisms are inadequate against unknown threats stemming from unidentified vulnerabilities. The proposed method leverages Large Language Models (LLMs) to generate executables that are functionally equivalent but structurally diverse, allowing for prompt replacement of vulnerable code and ensuring system stability. By integrating fuzz testing, the approach validates the functional equivalence of generated code through code coverage tracking, reducing the number of input sets needed while increasing coverage. The method aims to address three key questions: ensuring software system operation under attack, generating executables efficiently, and testing functional equivalence effectively. The study demonstrates that LLMs can improve executable generation efficiency, combine with fuzz testing for thorough validation, and maintain code correctness. The experiments show the method’s effectiveness in patching vulnerabilities and producing functionally equivalent executables, offering a potential defense against new threats.
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 1351909 (2025) https://doi.org/10.1117/12.3057629
Static bug detection techniques have advanced significantly in identifying issues such as null pointer dereferences, memory leaks, and use-after-free vulnerabilities. However, existing methods that rely on pre-computed points-to analysis often struggle with scalability and precision, especially when handling complex pointer manipulations and deep call contexts. To address the scalability challenges of precise points-to analysis, we propose a fused approach for bug detection. Initially, we utilize an inexpensive Andersen points-to analysis to construct a sparse yet coarse program memory model. High-precision analysis is then applied selectively, only when necessary, reducing redundant computations and enhancing accuracy. This combination of coarse modeling and on-demand precision enables efficient and scalable bug detection. Experimental results across five real-world benchmarks show that our demand-driven flow-, context- and path-sensitive approach achieves up to a 4.55x speedup in analysis time compared to traditional eager flow-sensitive analysis. Notably, our approach successfully completes the analysis of large-scale programs such as sqlite3, which time out under traditional approaches. Additionally, our approach reduces false positives by over 70%, maintaining the detection of all true positive bugs. These results demonstrate the effectiveness of our approach in improving the efficiency and precision of static bug detection.
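For readers unfamiliar with the inexpensive first stage, an Andersen-style (inclusion-based, flow-insensitive) points-to analysis can be sketched as a fixpoint over four constraint kinds. The Python sketch below is a naive illustration under that assumption. The tuple encoding of constraints and the function name `andersen` are hypothetical, and the sketch shows neither the authors' memory model nor their demand-driven precise stage.

```python
from collections import defaultdict

def andersen(constraints):
    """Naive inclusion-based points-to solver.

    Each constraint is a tuple (kind, lhs, rhs):
      ("addr",  p, a)  models  p = &a
      ("copy",  p, q)  models  p = q
      ("load",  p, q)  models  p = *q
      ("store", p, q)  models  *p = q
    Returns a dict mapping each variable to the set it may point to.
    """
    pts = defaultdict(set)
    changed = True
    while changed:  # iterate until no points-to set grows
        changed = False
        for kind, lhs, rhs in constraints:
            if kind == "addr":
                new, targets = {rhs}, [lhs]
            elif kind == "copy":
                new, targets = set(pts[rhs]), [lhs]
            elif kind == "load":
                new = set()
                for obj in list(pts[rhs]):
                    new |= pts[obj]
                targets = [lhs]
            elif kind == "store":
                new, targets = set(pts[rhs]), list(pts[lhs])
            else:
                raise ValueError(f"unknown constraint kind: {kind}")
            for t in targets:
                if not new <= pts[t]:
                    pts[t] |= new
                    changed = True
    return {var: objs for var, objs in pts.items() if objs}

# p = &a; q = &b; *p = q; r = *p
result = andersen([
    ("addr", "p", "a"),
    ("addr", "q", "b"),
    ("store", "p", "q"),
    ("load", "r", "p"),
])
# result == {"p": {"a"}, "q": {"b"}, "a": {"b"}, "r": {"b"}}
```

The result ignores statement order and calling context, which is what makes this stage cheap and also what makes it coarse enough to need selective refinement afterwards.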
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190A (2025) https://doi.org/10.1117/12.3058096
Federated learning (FL) is a widely adopted distributed machine learning paradigm in which individual clients train local models on their private datasets and then send model updates to a central server. While its decentralized training process can protect data privacy, it is vulnerable to attacks such as model poisoning and backdoor attacks. The effect of malicious clients can be mitigated by applying robust FL methods. However, most existing solutions ignore client dependability. This paper explores a method for quantitatively assessing client dependability in the FL framework. First, based on a semi-Markov process (SMP), we build a multi-dimensional evaluation model for understanding how the client's behaviors under attack and its recovery behaviors affect client dependability. Then, we derive formulas for availability, security risk, and reliability in order to analyze the quantitative relationship between different factors and client dependability from these three perspectives. Furthermore, we perform numerical analysis to investigate how different system parameters impact client dependability.
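As background for the availability calculation, the long-run fraction of time a semi-Markov process spends in state i is π_i h_i / Σ_j π_j h_j, where π is the stationary distribution of the embedded Markov chain and h_i is the mean sojourn time in state i. The Python sketch below evaluates that standard formula. The three states, the transition matrix, and the sojourn times are hypothetical and are not taken from the paper's model.

```python
import numpy as np

def smp_steady_state(P, sojourn):
    """Steady-state time fractions of a semi-Markov process.

    P is the transition matrix of the embedded Markov chain and
    sojourn holds the mean time spent in each state per visit.
    """
    P = np.asarray(P, dtype=float)
    h = np.asarray(sojourn, dtype=float)
    n = P.shape[0]
    # Solve pi = pi P together with sum(pi) = 1 as one linear system.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    weighted = pi * h  # weight each state's visit rate by its sojourn time
    return weighted / weighted.sum()

# Hypothetical client states: 0 = healthy, 1 = compromised, 2 = recovering.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
mean_sojourn_hours = [90.0, 4.0, 6.0]
fractions = smp_steady_state(P, mean_sojourn_hours)
availability = fractions[0]  # only the healthy state counts as available here
```

In this toy cycle every state is visited equally often, so the time fractions are driven entirely by the sojourn times and the availability works out to 90 / 100.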
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190B (2025) https://doi.org/10.1117/12.3057771
To address the challenges associated with the YOLOv4-Tiny algorithm's complex structure, high computational resource requirements, and extensive parameter count—which collectively impede efficient FPGA deployment—we propose a hardware-software co-optimization strategy. This approach replaces the YOLOv4-Tiny backbone with the MobileNetv1 network and integrates the Convolutional Block Attention Module (CBAM) into the enhanced feature extraction network. Channel pruning is applied to streamline the network structure, and weights and biases are quantized to 16-bit fixed-point representations. Compared to the original YOLOv4-Tiny, this optimized network reduces parameters by 40% while retaining nearly identical recognition accuracy. Using high-level synthesis tools, we generate FPGA IP cores, design a parallel pipelined convolutional architecture, and implement inter-layer blocking between convolutional layers to enhance computational efficiency. This improved algorithm is deployed on a Zynq-7020 FPGA chip. Experimental results show that the optimized algorithm achieves a computational performance of 43.4 GOP/s, offering a speedup of 1.6 to 4.1 times compared to existing studies, with an energy efficiency ratio 4.8 to 10.7 times greater than current implementations. These findings indicate that the proposed strategy significantly improves algorithm deployment efficiency on resource-limited FPGA platforms.
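The 16-bit fixed-point quantization step can be illustrated with a small Python sketch. It assumes a signed format with 8 fractional bits, round-to-nearest, and saturation at the int16 limits. The abstract does not state the split between integer and fractional bits, so `FRAC_BITS` and the sample weights are hypothetical.

```python
import numpy as np

FRAC_BITS = 8  # assumed number of fractional bits; not given in the abstract

def quantize_fixed16(values, frac_bits=FRAC_BITS):
    """Convert floats to signed 16-bit fixed point with saturation."""
    scaled = np.round(np.asarray(values, dtype=np.float64) * (1 << frac_bits))
    clipped = np.clip(scaled, -32768, 32767)  # saturate to the int16 range
    return clipped.astype(np.int16)

def dequantize_fixed16(q, frac_bits=FRAC_BITS):
    """Convert fixed-point integers back to floats."""
    return np.asarray(q, dtype=np.float64) / (1 << frac_bits)

weights = [0.5, -1.25, 0.003, 200.0]
q = quantize_fixed16(weights)
# q == [128, -320, 1, 32767]; the last value saturates
restored = dequantize_fixed16(q)
```

For values inside the representable range the rounding error is at most half a step, that is 1/512 with 8 fractional bits. Values outside it, such as 200.0 here, are clamped to the largest representable number.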
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190C (2025) https://doi.org/10.1117/12.3057528
Clustering is crucial for analyzing heterogeneous information networks (HINs). Most state-of-the-art algorithms focus on single-type node clustering, overlooking the clustering of multiple node types and requiring manual setting of cutoff distances and cluster centers. This paper introduces a clustering algorithm integrating multi-type objects, named MTOClus, which generates multiple similarity matrices for different node types and assigns a weight to each matrix to indicate its significance. By aggregating the weighted similarity matrices, it creates a distance matrix. MTOClus automates cluster center selection through node sorting and mitigates the impact of outliers by using the K-Nearest Neighbors approach to compute the cutoff distance. Experimental results across four datasets consistently show that MTOClus outperforms six other algorithms, underscoring its clustering efficacy.
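The aggregation and cutoff steps can be sketched in Python under some loudly stated assumptions: similarities lie in [0, 1], the distance is taken as one minus the weighted average similarity, and the cutoff is the mean distance to each node's k-th nearest neighbor. None of these formulas is given in the abstract, and the function names and matrices are hypothetical, so this is an illustration of the idea rather than MTOClus itself.

```python
import numpy as np

def aggregate_distance(similarities, weights):
    """Combine several similarity matrices into one distance matrix.

    Assumes every similarity lies in [0, 1]; weights are normalized
    to sum to 1, and distance is defined as 1 - weighted similarity.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    combined = sum(wi * np.asarray(S, dtype=float) for wi, S in zip(w, similarities))
    return 1.0 - combined

def knn_cutoff(distance, k):
    """Mean distance from each node to its k-th nearest neighbor (assumed rule)."""
    D = np.asarray(distance, dtype=float)
    # After sorting a row, column 0 is the node itself (distance 0).
    kth = np.sort(D, axis=1)[:, k]
    return float(kth.mean())

# Two hypothetical similarity matrices over the same three nodes.
S_author = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]]
S_venue = [[1.0, 0.6, 0.0], [0.6, 1.0, 0.2], [0.0, 0.2, 1.0]]
D = aggregate_distance([S_author, S_venue], weights=[1.0, 1.0])
cutoff = knn_cutoff(D, k=1)
```

Deriving the cutoff from neighbor distances rather than from a fixed percentile of all pairwise distances is what removes the manual parameter, and averaging over local neighborhoods limits the influence of a few distant outliers.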
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190D (2025) https://doi.org/10.1117/12.3058016
In recent years, the increasing demand for data privacy has positioned federated learning (FL) as a promising approach for collaborative machine learning, allowing participants to preserve privacy while jointly training models. Vertical federated learning (VFL), where participants hold distinct feature sets for the same data cohort, introduces unique privacy challenges. While VFL prevents the sharing of raw data, intermediate results such as the Gini coefficient can still reveal sensitive information. We present a privacy-preserving feature selection framework tailored for VFL, designed to prevent the server from inferring the client’s feature distribution through intermediate computation parameters. By integrating homomorphic encryption (HE) and differential privacy (DP), the framework enables collaborative computation without exposing raw data, while the added noise enhances data privacy protection. Experimental evaluations demonstrate that the proposed feature selection method, FPFS, consistently achieves superior accuracy and efficiency across various datasets, particularly excelling in effective feature selection and privacy preservation. Compared to other methods, FPFS maintains higher model performance across multiple experimental scenarios and effectively mitigates both internal and external attacks.
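The role of the added noise can be illustrated with the Laplace mechanism from differential privacy applied to a Gini value before it leaves the client. The Python sketch below assumes the Gini impurity used in tree-based feature scoring and takes the sensitivity as a caller-supplied parameter. The function names, the sensitivity value, and the privacy budget are hypothetical, and the homomorphic encryption part of the framework is not shown.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) over the class proportions p_c."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to a statistic."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(seed=0)
labels = [0, 0, 1, 1]
exact = gini_impurity(labels)  # 0.5 for a balanced binary split
# The sensitivity and epsilon below are placeholders, not values from the paper.
noisy = laplace_mechanism(exact, sensitivity=0.1, epsilon=1.0, rng=rng)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy, at the cost of a less reliable feature score.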
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190E (2025) https://doi.org/10.1117/12.3058690
The kidney is a crucial organ in the human body, co-operating billions of pipelines to cleanse the body's water. A renal tumor arises when cells divide uncontrollably and form an aberrant collection of cells surrounding or within the kidney. Such cells can disrupt regular kidney activity and destroy healthy cells. Prompt diagnosis of renal tumors is critical since they can be fatal if left untreated. Because it depends on the skill of the person analyzing the images, the traditional approach of manually checking MR images may not be very accurate. The study concentrates on the diagnosis of normal and abnormal renal cases. To enhance accuracy and expedite diagnosis, the research applies machine learning methods to publicly accessible patient records, including support vector machine learning (SVM), adaptive optimization (AO), and gradient enhancement (GE). It applies dimensionality reduction to the dataset. Image feature extraction is a data preprocessing method that minimizes the time required to train the proposed algorithm. The reported accuracy rates for diagnosing normal and abnormal cases are 88.8% for GB, 83.8% for ADA, 86.1% for SVM, and 93.3% for KNN; for abnormal tumor diagnosis they are 55.3% for GB, 55.4% for SVM, 55.3% for ADA, and 51.0% for KNN.
Dazhi Ren, Shengli Lv, Jinlin Li, Naining Li, Lin Dang
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190F (2025) https://doi.org/10.1117/12.3058329
Few-shot action recognition predicts new classes without labels and has received widespread attention for practical systems. The skeleton is a sparse representation of human actions, yet existing spatiotemporal models that train a strong encoder network can make the skeleton graph very dense with edges, which may lead to the over-smoothing problem. To address this issue, we propose the Spatio-Temporal Aggregation Transformer Network (STAT-Net) as a general backbone for skeleton-based few-shot action recognition. In the spatiotemporal aggregation transformer modules, spatial multi-head self-attention models the connections between different joints in the same frame, while temporal multi-head self-attention models the skeleton sequence between two adjacent frames. The features extracted by the three parts are aggregated by an Adaptive Fusion technique to obtain a high-dimensional embedding. Extensive experiments on two benchmarks demonstrate that the proposed model achieves better recognition results compared with other existing methods.
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190G (2025) https://doi.org/10.1117/12.3058050
This paper introduces a dynamic feedback-based vulnerability mining method tailored for highly closed terminal protocols, addressing the limitations of traditional fuzz testing methods which struggle with closed-source protocols due to the lack of accessible code or protocol specifications. The proposed method overcomes these barriers by generating test cases using Large Language Models (LLMs) and optimizing them through real-time execution feedback without a deep understanding of the protocol. The primary contributions include a balanced training set construction method for LLMs, integration of LLMs with fuzz testing to generate test cases without relying on protocol knowledge, and a real-time feedback mechanism from a state machine to LLMs for continuous test case optimization. The method’s effectiveness is validated through experiments on a closed-source protocol, MQTT, and SSH, demonstrating significant improvements over conventional AFL fuzz testing. The results show that the proposed method can identify up to 4.34 times more valid cases in closed-source protocols, highlighting its efficiency in vulnerability detection.
Zirong Su, Yongbing Gao, Xiaoang Chen, Lidong Yang
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190H (2025) https://doi.org/10.1117/12.3057733
In the event extraction task, existing models use trigger words as a bridge to extract structured information, but the extraction effect is not ideal when faced with police texts without trigger words or fixed trigger words. To solve this problem, an end-to-end overlapping event extraction model that requires no trigger words, TFOEE, is proposed. In this model, the task of extracting overlapping events without trigger words is transformed into a relation identification task based on a grid-filling strategy, event types, and word fragments. Experiments show that the accuracy, recall, and F1 value of the TFOEE model are better than those of the baseline model on a police text dataset, with the F1 value of the TFOEE model reaching 94.1%.
Metouole Mwinbe Yves Ghislain Somda, Samuel Ouya, Gervais Mendy
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190I (2025) https://doi.org/10.1117/12.3057590
The evolution of the monetary system has been historically catalyzed by technological advancements, socioeconomic shifts, and evolving consumer needs. With the rise of technology-driven payment systems like cryptocurrencies, instant payments, and blockchain, central banks globally have increasingly explored the potential benefits of issuing digital forms of their fiat currency, known as CBDC. Despite central banks’ traditional roles in ensuring financial stability, managing currency circulation, and supporting state financing needs, they have sometimes failed to prevent macroeconomic crises and ensure price stability, leading to a decline in trust in national currencies. This has been evidenced by the emergence of private cryptocurrencies, particularly in developing countries where traditional economic systems have faltered. A well-designed CBDC holds the promise of addressing multiple goals, enhancing financial stability, promoting inclusivity, and fostering innovation in economic systems. While numerous studies have explored CBDC design since 2020, there remains a significant demand for knowledge due to the ongoing exploration and implementation of CBDCs worldwide. This paper presents a systematic literature review utilizing text mining to analyze CBDC design architecture. By examining abstracts, white papers, and conference articles, we identify common design features and topics, offering insights into the evolving landscape of CBDC design and implementation particularly for developing countries.
Proceedings Volume Third International Conference on Communications, Information System, and Data Science (CISDS 2024), 135190J (2025) https://doi.org/10.1117/12.3058011
Cracks in concrete are a major hazard to the safety and durability of buildings. Efficient detection and timely repair of these cracks have become pressing issues in the field of civil engineering. Existing research suffers from shortcomings such as insufficient data, difficulties in feature extraction, and an imbalance between accuracy and computational cost, hindering the practical application of models. This paper presents an automated crack detection model based on the Revisiting Vision Transformer (RevVIT), with deep learning optimizations tailored to the complexity and diversity of crack images. Various data augmentation techniques were applied to preprocess raw images from highways, bridges, dams, and other structures to create a high-quality crack dataset. The RevVIT model was then introduced to address the inadequacies of traditional models in feature extraction under complex environments, and its performance was compared against VGG19 and three other baseline models on this dataset. Experimental results show that the RevVIT model achieved a classification accuracy of 99.03% in crack detection tasks, demonstrating high robustness and significantly outperforming existing methods, while also delivering superior efficiency in training and inference times.