Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261301 (2023) https://doi.org/10.1117/12.2681448
This PDF file contains the front matter associated with SPIE Proceedings Volume 12613, including the Title Page, Copyright information, Table of Contents, and Committee Page.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261302 (2023) https://doi.org/10.1117/12.2673660
Pneumonia is one of the most common and fatal diseases in the world, afflicting millions of people, especially the young and the elderly, whose immune systems are relatively fragile; it is also known for its varied symptoms. While a large amount of research has applied machine learning models to detect pneumonia, little of it focuses on analyzing the pathology of pneumonia. Therefore, in this paper, two relatively simple deep learning models are built both to separate pneumonia images from normal ones and to analyze the pathology of pneumonia. The first is a lightweight CNN; the second is also a CNN but with several residual blocks in its architecture. After training and evaluation on test sets, both models perform favorably on both tasks: overall accuracy is over 93% for detecting pneumonia images and over 80% for analyzing the pathology of pneumonia. This outcome shows that pneumonia can be diagnosed, and its pathology analyzed, with simple deep learning models.
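The abstract's second model adds residual blocks to a plain CNN. As a rough illustration of the residual idea only (the paper's actual layer shapes and convolutions are not given here), a toy fully-connected block in numpy:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Toy residual block: compute F(x) = w2 @ relu(w1 @ x),
    add the input back (skip connection), then apply ReLU."""
    h = relu(w1 @ x)
    return relu(w2 @ h + x)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, w1, w2)
```

With zero weights the block reduces to `relu(x)`, which is why residual networks train stably: each block only has to learn a correction to the identity.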
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261303 (2023) https://doi.org/10.1117/12.2673212
Recent research has shown that attention mechanisms can help convolutional networks train and infer more efficiently and accurately. However, current attention mechanisms mainly focus on relationships between global features and ignore relationships between local features. This paper proposes a Feature Filtering Module (FFM) for convolutional neural networks. The FFM applies attention in both the spatial and channel dimensions and fuses the attention feature maps of the two branches into a single 3D attention feature map, helping effective feature information flow more efficiently. Extensive tests on CIFAR-100 and MS COCO show that FFM improves baseline network performance across various models and tasks, demonstrating its versatility.
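A minimal numpy sketch of the fusion idea, not the paper's actual FFM (whose branch designs are not specified here): a channel branch and a spatial branch each produce an attention map, and broadcasting multiplies them into one 3D map:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fused_attention(feat):
    """feat: (C, H, W). Build channel and spatial attention maps
    and fuse them by broadcasting into one 3D attention map."""
    c, h, w = feat.shape
    # channel branch: global average pooling over space -> (C, 1, 1)
    chan = sigmoid(feat.mean(axis=(1, 2))).reshape(c, 1, 1)
    # spatial branch: mean over channels -> (1, H, W)
    spat = sigmoid(feat.mean(axis=0)).reshape(1, h, w)
    attn3d = chan * spat          # broadcast fuse -> (C, H, W)
    return feat * attn3d          # reweight the features

rng = np.random.default_rng(1)
f = rng.standard_normal((4, 5, 5))
out = fused_attention(f)
```

Because both attention values lie in (0, 1), the fused map can only attenuate features, which is the "filtering" behavior the module's name suggests.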
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261304 (2023) https://doi.org/10.1117/12.2673214
Video description has become a research hotspot in recent years because of its wide application value. A single visual feature alone cannot accurately guide the generation of a video description, leading to mismatches between the generated text and the video content. To solve this problem, a video description generation algorithm combining visual and audio features is proposed, which improves the accuracy of the generated text by fusing the two modalities. First, a vision transformer model extracts the visual feature vector, and Mel-Frequency Cepstral Coefficients (MFCCs) provide the audio feature vector; the two feature vectors are spliced and average-pooled to obtain global feature information. Second, the processed features are sent to a transformer encoder. Finally, the encoded results are passed to a transformer decoder, which generates the video description text. The transformer framework contains a multi-head self-attention mechanism, which can focus on the more important video features while acquiring temporal information, making the generated description more accurate. The proposed method has been tested on the public MSR-VTT and MSVD datasets and achieves good results on four different evaluation metrics.
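The splice-then-pool step can be sketched directly; the feature dimensions below are placeholders, not the paper's:

```python
import numpy as np

def fuse_features(visual, audio):
    """visual: (T, Dv) per-frame visual features; audio: (T, Da)
    per-frame MFCC features. Concatenate (splice) them per time
    step, then average-pool over time for one global vector."""
    spliced = np.concatenate([visual, audio], axis=1)  # (T, Dv + Da)
    return spliced.mean(axis=0)                        # (Dv + Da,)

vis = np.ones((10, 6))       # placeholder ViT features
aud = np.full((10, 4), 2.0)  # placeholder MFCC features
g = fuse_features(vis, aud)
```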
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261305 (2023) https://doi.org/10.1117/12.2673495
Image target detection using deep learning has developed rapidly in recent years, but in harsh and complex environments such as rain, snow, and darkness, the information collected by an image sensor contains substantial noise, and information about small distant targets and blurred targets is easily lost, resulting in false and missed detections. Millimeter-wave (mmWave) radar detects obstacles with electromagnetic waves, and its detection performance barely degrades in harsh environments; it can therefore compensate for the sharp accuracy drop that single-sensor environmental sensing suffers in complex scenes. In this paper, an enhanced feature-fusion neural network model is proposed that fuses radar and camera sensor features within an improved YOLOX-S network. First, the model converts the radar point cloud into image form, while a BiFPN structure with an integrated CA attention mechanism is applied to the YOLOX-S network. By concatenating radar features with image features and combining radar features with spatial attention weights, useless information in the extracted features is suppressed while key information is enhanced, making the two sensors' features complementary. Experimental results show that the designed fusion-enhanced network fully integrates the features of both sensors and provides better detection results and stronger robustness for small distant objects and for objects in dark, low-light environments.
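The point-cloud-to-image conversion can be illustrated with a simplified pinhole model; the focal length, coordinate convention, and 1/z proximity encoding below are assumptions for the sketch, not the paper's exact mapping:

```python
import numpy as np

def radar_channel(points, width, f=50.0, cx=None):
    """Rasterize radar detections into an image-aligned channel.
    points: array of (x, z) ground-plane positions (z = forward).
    Pinhole projection u = f * x / z + cx maps each detection to
    an image column; the stored value 1/z encodes proximity."""
    cx = width / 2 if cx is None else cx
    chan = np.zeros(width)
    for x, z in points:
        u = int(round(f * x / z + cx))
        if 0 <= u < width:
            chan[u] = max(chan[u], 1.0 / z)
    return chan

pts = np.array([[0.0, 10.0], [2.0, 5.0]])
chan = radar_channel(pts, width=64)
```

Such a channel can then be stacked with the camera feature maps so the network sees both modalities in one spatial frame.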
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261306 (2023) https://doi.org/10.1117/12.2673629
Human pose estimation has been a significant task in computer vision over the last few decades and has a wide variety of applications in science and in daily life. Researchers typically use cameras to collect RGB images or videos into an integrated dataset and then apply deep learning methods, or combinations of methods, to estimate the posture of the human body. Although many technical approaches to human pose estimation have been devised, none achieves 100 percent accuracy, owing to unavoidable factors: the complex environmental changes of everyday life, the flexibility of the body and the variety of body shapes, which complicate the localization of keypoints, and the limitations of individual methods. This review therefore focuses on advanced methods, datasets, and metrics for 2D and 3D human pose estimation with single or multiple individuals.
Xianda Chen, Shuai Shao, Xiaojing Liu, Jie Yang, Nan Chen
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261307 (2023) https://doi.org/10.1117/12.2673512
To address the significant errors in current tree-obstacle localization for transmission lines, point cloud 3D mapping technology is introduced to study tree-obstacle identification and localization methods for transmission lines. Three-dimensional point cloud information of the transmission corridor is obtained through 3D point cloud mapping, realizing the step from "two-dimensional picture" to "three-dimensional stereo". An artificial intelligence image recognition algorithm detects the position of tree obstacles in the image. Relying on the 3D mapping technology, a distance measurement model for hidden tree-obstacle hazards along the transmission line is established. Combined with the tree-obstacle distance monitoring results, the degree of hidden danger is predicted and the crisis level quantified. Comparison of the three ranging methods shows that the ranging error of the new method is less than 1 m, giving it higher ranging accuracy.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261308 (2023) https://doi.org/10.1117/12.2673341
Facing the low efficiency, poor accuracy, and insufficient stability of traditional motion detection and machine learning algorithms in pedestrian detection, this paper takes artificial intelligence technology as the core: it adopts R-CNN-family deep learning models combined with a Support Vector Machine (SVM) classifier, and relies on the TensorFlow deep learning framework, with the support of the OpenCV open-source vision library, to detect and recognize pedestrians in video and image files. Using the ASP.NET framework, C# is used to design and develop each functional module, deploy the corresponding API interfaces, and build a front-end interactive interface, forming a Web-based pedestrian detection system. The system adopts a B/S architecture, allowing users to access the Web server through simple requests, run detection on the video and image data in the database, and have the results returned to the front page for display. The system greatly improves the precision and accuracy of pedestrian detection and is a positive step toward further expanding the application scenarios of pedestrian detection in video images.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261309 (2023) https://doi.org/10.1117/12.2673744
Although the original Simple Linear Iterative Clustering (SLIC) superpixel segmentation algorithm is simple and efficient, its segmentation quality depends strongly on the number of initial clusters, so it still requires human-computer interaction when dealing with real images. To raise this low level of automation, this paper introduces the Gray-Level Co-occurrence Matrix (GLCM) and gives a formula for calculating the number of initial cluster centers by evaluating image complexity. An adaptive SLIC algorithm is designed for the characteristics of forward-view images in low-altitude UAV flight scenes, balancing segmentation accuracy against speed under obstacle-avoidance requirements. Experiments show that the method quantifies image complexity effectively and performs well both on the classical BSDS500 dataset and on forward-view images from low-altitude UAV flight scenes.
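The complexity-to-cluster-count mapping can be sketched as follows. The paper's actual formula is not reproduced here; this sketch assumes GLCM entropy as the complexity measure and a linear mapping onto an assumed cluster range:

```python
import numpy as np

def glcm_entropy(img, levels=8):
    """Quantize img (values in [0, 1)) to `levels` gray levels,
    build a horizontal-neighbor GLCM, and return its entropy as
    an image-complexity score."""
    q = np.minimum((img * levels).astype(int), levels - 1)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    p = glcm / glcm.sum()
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

def initial_clusters(img, k_min=50, k_max=400, levels=8):
    """Map entropy (0 .. 2*log2(levels)) linearly onto [k_min, k_max]."""
    e = glcm_entropy(img, levels)
    frac = min(e / (2 * np.log2(levels)), 1.0)
    return int(round(k_min + frac * (k_max - k_min)))

flat = np.zeros((32, 32))                          # uniform image
noisy = np.random.default_rng(2).random((32, 32))  # complex image
```

A uniform image gets the minimum cluster count, a texture-rich one gets more, which is the adaptive behavior the abstract describes.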
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130A (2023) https://doi.org/10.1117/12.2673291
Image registration technology has important application value in Image-Guided Radiation Therapy (IGRT). Deformable 2D/3D image registration in IGRT has been developed for a long time, and although current methods perform well, they are not accurate enough to meet the growing demands of patient care. In this paper, a fast unsupervised learning registration model based on U-Net is proposed. The model directly estimates the spatial transformation from the float (moving) and fixed images and applies a regularization term to constrain it to learn locally smooth matching. The training process requires no supervised information such as a ground-truth deformation field, and the entire deformation field can be predicted in one pass. The method is compared against popular existing registration methods on the dataset in terms of accuracy; preliminary results show that for approximately 96% of cases, the learning-based method obtains improved or comparable registration relative to the baseline method.
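A common form of the regularization term the abstract mentions is a smoothness penalty on the predicted deformation field; the L2-gradient version below is a standard choice and an assumption here, not the paper's exact term:

```python
import numpy as np

def smoothness_loss(field):
    """L2 smoothness regularizer on a 2D deformation field of
    shape (H, W, 2): penalizes squared spatial finite differences
    so the network is pushed toward locally smooth matching."""
    dy = np.diff(field, axis=0)   # vertical differences
    dx = np.diff(field, axis=1)   # horizontal differences
    return float((dy ** 2).mean() + (dx ** 2).mean())

const = np.ones((8, 8, 2))              # rigid shift: perfectly smooth
rng = np.random.default_rng(3)
rough = rng.standard_normal((8, 8, 2))  # noisy, non-smooth field
```

A constant (rigid-shift) field incurs zero penalty, so the regularizer discourages only local irregularity, not global motion.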
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130B (2023) https://doi.org/10.1117/12.2673316
Aiming at the characteristics of low-illumination images, such as low brightness, blurred detail, and obvious noise, we propose an adaptive RetinexNet low-illumination image enhancement method with a fusion strategy. The mutual independence of the channels in the HSV color space model is used to enhance the brightness component; the correlation coefficient then lets the saturation component adapt to the brightness component; and, on the basis of U-Net, a reflectance denoising model is built that accounts for the different noise levels carried by different areas of the illumination-enhanced image. These three steps produce a coarsely enhanced image. A camera response model is then obtained by combining the camera response function with the brightness transform function, and the optimal exposure ratio is found to obtain an exposure map. Finally, the overexposed image and the coarsely enhanced image are combined with the original low-light image, and patch-wise image decomposition is used for reconstruction and fusion to produce the final low-light enhanced image. Experimental results show that the algorithm preserves the texture and details of the image while enhancing its contrast.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130C (2023) https://doi.org/10.1117/12.2673247
Cross-view geo-localization usually requires matching a ground-view image with the corresponding aerial-view image. The task has attracted researchers' attention in recent years because of the huge difference in perspective and the occlusion of information between the two views. Previous work has focused on the aligned setting, where the aerial and ground views are north-aligned; this explicit alignment information benefits model learning. In this work, however, we extend the task to the unaligned setting and expect the model to perform well even with random orientation offsets in the ground-view image. To solve this problem, we propose a new multi-branch offset (MBO) architecture. Specifically, we divide the feature extracted by the CNN into N parts and shift the features along the horizontal direction. Features with different offset sizes are then fed into subsequent multi-branch Transformer blocks, and the relationships between channels are further modeled with channel attention. Finally, the enhanced features are fused with the output of the original CNN. We validate our approach on the popular CVUSA dataset, achieving 78.13% Top-1 recall at 33.78 FPS, which surpasses existing models while retaining efficient inference speed.
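The horizontal-offset step can be sketched with circular shifts; using `np.roll` and evenly spaced offsets is an assumption for illustration, since the abstract does not pin down the exact shift scheme:

```python
import numpy as np

def offset_branches(feat, n=4):
    """feat: (C, H, W) CNN feature map. Produce n copies, each
    circularly shifted along the horizontal (width) axis, to mimic
    random orientation offsets of a ground-view panorama."""
    _, _, w = feat.shape
    return [np.roll(feat, shift=i * w // n, axis=2) for i in range(n)]

f = np.arange(2 * 3 * 8, dtype=float).reshape(2, 3, 8)
branches = offset_branches(f, n=4)
```

A circular shift matches the panorama geometry: content leaving one edge of a 360-degree ground view re-enters at the other.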
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130D (2023) https://doi.org/10.1117/12.2673628
Throughout human history, pneumonia has caused several disasters, including COVID-19, and diagnosing pneumonia promptly and efficiently remains a problem. Computer-Aided Diagnosis (CAD) systems with classification algorithms based on convolutional neural networks are considered among the most promising approaches, but the lack of training data is a great obstacle to their adoption. To help relieve this, this paper explores traditional data augmentation methods that let a Convolutional Neural Network (CNN) train well with less data. Two different models are applied, each to five datasets built with different augmentation strategies: the original data, oversampling, geometric augmentation, color space transformation, and geometric augmentation combined with color space transformation (called the mixed method in this paper). The results show that oversampling yields a limited improvement in accuracy, geometric augmentation has a strong positive effect, color space transformation performs poorly, the mixed method performs as poorly as color space transformation alone, and the second model, with batch normalization, performs well on all five datasets. Finally, an accuracy of 80.34% is achieved. It can be concluded that oversampling balances the dataset and provides a limited gain, that geometric augmentation is a good strategy while color space transformation is not, and that a model with batch normalization can exclude the influence of extreme data and perform well and stably.
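The two augmentation families the paper compares can be sketched minimally; the specific transforms below (horizontal flip, brightness shift) are representative examples, not necessarily the paper's exact set:

```python
import numpy as np

def geometric_augment(img):
    """Horizontal flip -- a label-preserving geometric transform."""
    return img[:, ::-1].copy()

def color_augment(img, shift=0.1):
    """Global brightness shift clipped to [0, 1] -- a simple
    color-space transform (the kind the paper found less helpful)."""
    return np.clip(img + shift, 0.0, 1.0)

img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
flipped = geometric_augment(img)
brighter = color_augment(img)
```

Geometric transforms create genuinely new spatial configurations, while a global brightness shift leaves lesion shapes untouched, which is one plausible reading of why the two families performed so differently on chest X-rays.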
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130E (2023) https://doi.org/10.1117/12.2673725
Most previous surveys were conducted under ideal conditions, where the data was mostly in-distribution. In reality, however, out-of-distribution (OOD) data is common and cannot be avoided. For example, a dataset's labels may contain only cats and dogs while a new horse image appears among the data. A model that does not account for the OOD problem can only classify such an image under a known label, even when that yields an incorrect result, whereas an OOD-aware model should alert the researcher to the emergence of a new class. As a result, this kind of setting is gaining popularity. The out-of-distribution problem is exacerbated in the medical domain: the availability of annotated training examples remains a significant barrier to progress in medical image processing, because expert annotation makes the whole process prohibitively expensive, and acquisition parameters change after a new model has been trained. OOD detection is therefore a critical technique in the field of medical imaging. Our research focuses on OOD models in the context of medical images. First, we summarize OOD models proposed in recent years: some introduce new metrics that set standards for judging model performance, while others propose new methods that steadily improve precision so that models correctly identify OOD cases. We then concentrate on applying OOD models to medical images; we are convinced that medical images outside the original dataset are extremely important for future research into a given disease. Finally, we categorize the evaluation criteria, methods, and datasets commonly used for OOD model training. We hope this survey will benefit future researchers.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130F (2023) https://doi.org/10.1117/12.2673290
Sign language plays an important role in information transmission and emotional communication between deaf-mute people and the outside world. With the development of artificial intelligence, sign language recognition, translation, and generation based on digital image processing have attracted worldwide attention. In sign language recognition, effective hand segmentation and gesture extraction are the first and key steps, directly affecting recognition accuracy. This paper proposes a hand information extraction method based on depth image processing to solve the problem of sign language gesture extraction against complex backgrounds. Because a signer's hands are in front of the body, depth images of the signer can be collected with a depth camera, and the complex background can be removed and hand information extracted by segmenting objects of different colors in the depth images. In this paper, Intel's D435i camera is used to capture depth images of the signer, and an HSV color space model is used for threshold processing with a fusion of the hue and brightness components to segment the hand position; median filtering and mathematical morphology then remove segmentation noise and reduce interference, and a skeleton extraction algorithm recovers the gesture posture. Experiments show that the proposed acquisition scheme and algorithm pipeline effectively achieve hand segmentation and gesture extraction under complex background conditions, providing a good foundation for subsequent gesture recognition and expression.
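The hue-plus-brightness threshold fusion can be sketched as below; the specific hue band and brightness cutoff are illustrative assumptions, not the paper's calibrated values:

```python
import numpy as np

def hand_mask(hsv, h_range=(0.0, 0.1), v_min=0.5):
    """hsv: (H, W, 3) image with hue, saturation, value in [0, 1].
    Fuse a hue threshold (an assumed skin-tone band) with a
    brightness threshold, mirroring the H+V fusion step."""
    h, v = hsv[..., 0], hsv[..., 2]
    return (h >= h_range[0]) & (h <= h_range[1]) & (v >= v_min)

hsv = np.zeros((2, 2, 3))
hsv[0, 0] = [0.05, 0.5, 0.8]   # skin-like hue, bright -> hand
hsv[0, 1] = [0.05, 0.5, 0.2]   # skin-like hue but dark -> background
hsv[1, 0] = [0.6, 0.5, 0.8]    # wrong hue -> background
mask = hand_mask(hsv)
```

In the full pipeline the resulting binary mask would then go through median filtering and morphological cleanup before skeleton extraction.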
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130G (2023) https://doi.org/10.1117/12.2673671
Semantic segmentation is a deep learning task in which each pixel in an image is given a label or class; it identifies the groups of pixels that make up distinct categories. It is now widely employed in several fields, including autonomous driving, medical imaging, and industrial inspection. This survey focuses on the three main categories of 3D point cloud semantic segmentation methods in use today: multi-view, point-based, and voxel-based. It introduces the basic concepts of multi-view representations, 3D point clouds, and voxels, and presents tables comparing the advantages and disadvantages of the different methods within each category. It also lists, in a table, the datasets commonly used for semantic segmentation. The article discusses deep learning by comparing the methods of the different families under semantic segmentation and considers what semantic segmentation could bring in the future.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130H (2023) https://doi.org/10.1117/12.2673312
Text image distortion correction, as a pre-processing step in Chinese character image processing, has an important impact on final results. Because the types of distortion are non-uniform, existing methods struggle to produce satisfactory corrections for images of handwritten Chinese characters. To address this problem, a new correction method is proposed in this paper. The method uses Pearson correlation coefficients to classify text lines by their trend and combines document distortion models to unify text lines with different distortion types; meanwhile, a data expansion method based on historical text information is proposed, transforming the irregular text-line correction problem into a model-based text-block optimization problem to handle the case where few text images are available and the distortion models therefore lack sufficient information. Comparison experiments against existing methods on synthetic and real datasets show that the proposed method achieves better visual correction and higher PSNR, SSIM, and MS-SSIM scores.
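Classifying a text line's trend via the Pearson correlation of its baseline points can be sketched as follows; the 0.9 threshold and the two-way linear/curved split are illustrative assumptions, since the paper's exact classification rule is not given here:

```python
import numpy as np

def line_trend(xs, ys, threshold=0.9):
    """Classify a text line by the Pearson correlation between the
    x and y coordinates of its baseline points: |r| near 1 means a
    near-linear (straight or uniformly sloped) line, smaller |r|
    a curved or wavy one."""
    r = np.corrcoef(xs, ys)[0, 1]
    return "linear" if abs(r) >= threshold else "curved"

xs = np.linspace(0, 10, 20)
straight = 0.3 * xs + 2.0   # sloped but straight baseline
curved = np.sin(xs)          # wavy (distorted) baseline
```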
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130I (2023) https://doi.org/10.1117/12.2673278
Target distance estimation based on optical image sensors has important application potential in high-level autonomous driving. However, traditional monocular ranging techniques are sensitive to the lens pose angle, and slight jitter can cause significant ranging errors. To tackle this problem, we propose a novel target distance estimation method based on a front-and-rear binocular vision system, inspired by the head-nodding mechanism of birds. Two cameras are placed front and rear along the motion direction of the moving carrier to capture the forward scene simultaneously. Targets in the images are extracted with the deep learning object detection framework YOLOv3, and target depth is obtained from the established mathematical model. We further investigate how varying the horizontal and vertical separation of the front and rear cameras affects ranging accuracy. Experimental results show that the system's range error is below 5%. The proposed method is not only accurate but also independent of the lens pose angle; it meets the demands of real-time ranging and has practical value for the visual perception of autonomous walking robots.
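One simplified way to see why a front/rear camera pair yields depth is an axial-stereo pinhole model: the same object appears slightly larger in the front camera than in the rear one. This derivation is an illustration under idealized assumptions (shared optical axis, equal focal lengths), not the paper's actual mathematical model:

```python
def axial_depth(y_front, y_rear, baseline):
    """Depth from two pinhole cameras `baseline` apart along one
    optical axis. An object of physical height H at depth Z gives
    image heights y_front = f*H/Z and y_rear = f*H/(Z + baseline),
    so Z = baseline * y_rear / (y_front - y_rear)."""
    return baseline * y_rear / (y_front - y_rear)

# synthetic check: f = 1000 px, H = 1.7 m, Z = 20 m, baseline = 2 m
f_px, height, z_true, b = 1000.0, 1.7, 20.0, 2.0
y_f = f_px * height / z_true
y_r = f_px * height / (z_true + b)
z_est = axial_depth(y_f, y_r, b)
```

Note that neither the focal length nor the physical object height appears in the final formula; only the image-size ratio and the baseline matter, which is consistent with the claimed independence from calibration drift in the pose angle.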
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130J (2023) https://doi.org/10.1117/12.2673626
With the development of society, labor issues increasingly concern people: workers want to know whether the salary and work hours set by their employers are fair, and companies need to offer reasonable salaries and work hours to avoid disputes. Although many studies have predicted salary and explored the best algorithm for doing so, research on predicting work hours and on the other factors influencing such predictions is still deficient. This research applies linear regression to a dataset to predict work hours, and different data pre-processing methods are used to measure their effect on the regression. The study shows that predicting work hours is feasible and that linear regression can be leveraged to do so. Different pre-processing choices also influence the result: one-hot encoding performs better than label encoding, and dropping dataset features appears to matter greatly when the features are few. The model using one-hot encoding without dropping columns achieves the lowest Mean Square Error (MSE), about 98.2, while all other models have an MSE over 100. In conclusion, the study provides a baseline for work-hours prediction and offers ways to improve prediction accuracy beyond finding the best algorithm.
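The one-hot versus label-encoding effect can be demonstrated on toy data; the dataset, features, and MSE values below are synthetic illustrations, not the study's:

```python
import numpy as np

def one_hot(labels):
    """One-hot encode integer category labels: (n,) -> (n, k)."""
    k = int(labels.max()) + 1
    out = np.zeros((len(labels), k))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def fit_predict(X, y):
    """Ordinary least squares with a bias column; returns predictions."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return Xb @ w

# toy data: hours depend on category, not on its arbitrary label order
cats = np.array([0, 1, 2, 0, 1, 2, 0, 1])
hours = np.array([40.0, 35.0, 50.0, 40.0, 35.0, 50.0, 40.0, 35.0])
pred_onehot = fit_predict(one_hot(cats), hours)
pred_label = fit_predict(cats.reshape(-1, 1).astype(float), hours)
mse_onehot = float(np.mean((pred_onehot - hours) ** 2))
mse_label = float(np.mean((pred_label - hours) ** 2))
```

Label encoding forces the model to treat category indices as ordered magnitudes, so a non-monotone category effect cannot be fit, whereas one-hot columns give each category its own free coefficient.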
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130K (2023) https://doi.org/10.1117/12.2673294
Serious feature heterogeneity and semantic gaps exist between multi-source heterogeneous data, and existing cross-modal retrieval methods cannot effectively extract the common semantics and complementary information they share. To address this, an adversarial cross-modal retrieval method that fuses collaborative attention networks is proposed. The method tackles two major challenges in cross-modal retrieval: first, an information extraction algorithm based on the collaborative attention mechanism is designed; second, the collaborative attention network is combined with an adversarial subspace learning algorithm to enhance the complementarity of information in the feature subspace. Experimental results show that the proposed method outperforms comparable cross-modal retrieval methods on MAP metrics.
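As a rough illustration of the collaborative (co-)attention idea, and not the paper's actual network, a minimal NumPy sketch computes an image-text affinity matrix and lets each modality attend over the other; the feature dimensions and region/word counts below are arbitrary:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(img_feats, txt_feats):
    # Affinity between every image region and every word; each modality
    # then attends over the other, surfacing the shared semantics.
    affinity = img_feats @ txt_feats.T                       # (R, W)
    img_attended = softmax(affinity, axis=1) @ txt_feats     # image attends to text
    txt_attended = softmax(affinity.T, axis=1) @ img_feats   # text attends to image
    return img_attended, txt_attended

rng = np.random.default_rng(0)
img_regions = rng.normal(size=(4, 8))   # 4 image regions, 8-dim features
txt_words = rng.normal(size=(6, 8))     # 6 words, 8-dim features
img_att, txt_att = co_attention(img_regions, txt_words)
```

Each attended row is a convex combination of the other modality's features, which is what later lets an adversarial discriminator push both modalities into one shared subspace.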
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130L (2023) https://doi.org/10.1117/12.2673275
With the development of image technology and artificial intelligence, detection and recognition of text in images is increasingly in demand. To cope with varying text sizes and complex backgrounds in natural scenes, this article uses an improved text detection method based on ResNet-50. ResNet's identity mappings mitigate the redundant-layer problem in deep networks and help avoid vanishing or exploding gradients, improving the overall performance of the algorithm. Compared with the EAST algorithm, this method improves the F-measure by 1.79% on the ICDAR2015 dataset.
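The identity mapping referred to above is the residual connection y = x + F(x): when F's weights are zero the block is exactly the identity, so stacking blocks cannot make the network worse. A minimal NumPy sketch (an illustration of the mechanism, not the paper's ResNet-50):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # y = x + F(x), where F is two linear layers with a ReLU in between.
    # With w1 = w2 = 0, F(x) = 0 and the block is exactly the identity,
    # which is what lets very deep networks avoid degrading with depth.
    return x + w2 @ relu(w1 @ x)

x = np.array([1.0, -2.0, 3.0])
zero = np.zeros((3, 3))
out = residual_block(x, zero, zero)   # degenerates to the identity
```

During backpropagation the skip path also carries the gradient unchanged, which is why the identity mapping helps against vanishing gradients.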
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130M (2023) https://doi.org/10.1117/12.2673219
To improve the effectiveness of the comprehensive evaluation of passenger ride comfort on high-speed trains and to simplify the evaluation process, thereby guiding the optimization design of high-speed trains to a certain extent, this paper proposes a comprehensive comfort evaluation model based on an improved fruit fly optimization algorithm and back-propagation neural network (IFOA-BPNN). The fruit fly algorithm is improved by using a chaos map to initialize the positions of the fly population and by varying the search step across different phases, in order to find optimal initial weights and thresholds for the BPNN. Based on the collected index evaluation data, the index weights are calibrated from both subjective and objective aspects, and ride comfort is pre-analyzed by fuzzy comprehensive evaluation. The IFOA-BPNN model is then used to evaluate the ride comfort of high-speed trains. Experiments and comparisons with other models show that IFOA-BPNN reduces the mean square error (MSE), mean absolute error (MAE), and other indexes, and can effectively evaluate passenger comfort.
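Chaos-map initialization can be sketched with the logistic map, a common choice for this purpose (the abstract does not name the specific map, so this is an assumption):

```python
import numpy as np

def logistic_chaos_init(n, dim, low, high, x0=0.37, mu=4.0):
    """Initialize an n-by-dim population via the logistic map.

    The map x_{k+1} = mu * x_k * (1 - x_k) is chaotic on (0, 1) at
    mu = 4, so successive values spread over the interval instead of
    clustering, giving the swarm a well-covered starting population.
    """
    x = x0
    pop = np.empty((n, dim))
    for i in range(n):
        for j in range(dim):
            x = mu * x * (1 - x)
            pop[i, j] = low + (high - low) * x   # rescale to the search range
    return pop

pop = logistic_chaos_init(n=20, dim=3, low=-1.0, high=1.0)
```

The resulting population would then seed the fruit fly search that tunes the BPNN's initial weights and thresholds.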
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130N (2023) https://doi.org/10.1117/12.2673401
Lane detection is an important technology in autonomous driving. Currently, mainstream lane detection methods are based on deep learning. However, most deep learning-based lane detection models are trained on images captured on sunny days with sufficient light, so their performance in low-light environments is a concern. In this study, images of lane lines captured in the evening and on rainy days are used to evaluate the performance of a representative LaneNet model. The results show that the model performs poorly when detecting lane lines on rainy days or in the evening. With sufficient lamplight in the evening, the model performs better and detects part of the lane lines, but still not as reliably as on sunny days. Two potential future directions are also proposed in this study.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130O (2023) https://doi.org/10.1117/12.2673540
In the growing mobile application market, users generate large amounts of mobile log data every day, which can be used to predict mobile users' demographics. However, the complexity of raw log data makes demographic prediction difficult, and noise in the data reduces prediction accuracy. To address these issues, we construct features from the raw log data to obtain mobile users' behavioral data. We also propose a two-stage model using Logistic Regression (LR), Random Forest (RF), XGBoost, and ensemble methods (Averaging, Majority, and Weighted) to predict users' gender groups in Stage I and their age groups in Stage II. Results show that RF achieves the highest prediction accuracy for both gender and age, while the Averaging ensemble achieves the highest Area Under the Curve (AUC) score. Moreover, the two-stage model is both interpretable and practical.
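A sketch of the two-stage idea on synthetic data (the study uses engineered behavioral features and tuned models; everything below, including the label construction, is illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))                        # stand-in behavioral features
gender = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)
age = (X[:, 1] > 0).astype(int)                      # two hypothetical age groups

# Stage I: predict gender from the behavioral features.
stage1 = RandomForestClassifier(random_state=0).fit(X, gender)
gender_pred = stage1.predict(X)

# Stage II: predict age group, feeding Stage I's output in as a feature,
# so age prediction can condition on the inferred gender.
X_stage2 = np.column_stack([X, gender_pred])
stage2 = RandomForestClassifier(random_state=0).fit(X_stage2, age)
age_pred = stage2.predict(X_stage2)
```

Chaining the stages this way is what keeps the model interpretable: each stage is an ordinary classifier whose inputs can be inspected.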
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130P (2023) https://doi.org/10.1117/12.2673328
Semantic segmentation networks require huge computing resources, largely because they perform many feature fusions. In this paper, we propose a multi-scale feature fusion network that combines global and local fusion. The network is lightweight and highly competitive in speed and model size, and its segmentation accuracy is also improved to a certain extent.
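One inexpensive form of global-local fusion, shown here as a simplified illustration rather than the proposed network, adds the globally pooled context back onto the local feature map:

```python
import numpy as np

def global_local_fusion(feat):
    # feat: (C, H, W) feature map. The global branch pools each channel over
    # all spatial positions; broadcasting the pooled context back onto the
    # map fuses whole-image information with local detail at negligible cost.
    global_ctx = feat.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1)
    return feat + global_ctx

fused = global_local_fusion(np.ones((8, 4, 4)))
```

Keeping the fusion to a broadcast addition, instead of repeated concatenations and convolutions, is the kind of choice that keeps such a network lightweight.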
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130Q (2023) https://doi.org/10.1117/12.2673220
The implementation of the "double reduction" and "double increase" policies is reflected in physical education examinations, which now include more difficult sports such as basketball and soccer. These demand better posture than traditional events and require professional guidance. However, China faces a shortage of physical education teachers, far from meeting schools' needs. In response, this paper proposes a sports posture assessment system based on the OpenPose algorithm. The system matches key frames by comparing the assessor's pose with a standard pose, and uses vector similarity to define standardized evaluation criteria for differentiated comparison. The system is shown to be effective in evaluating motion posture and providing users with suggestions to improve their performance.
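The vector-similarity comparison can be sketched with cosine similarity between flattened keypoint vectors, an illustration of the general technique (the paper's exact scoring criteria may differ):

```python
import numpy as np

def pose_similarity(pose_a, pose_b):
    """Cosine similarity between two poses given as flat vectors of
    keypoint coordinates (e.g. from OpenPose).

    Cosine similarity is scale-invariant, so subjects of different sizes
    holding the same posture still score close to 1.
    """
    a = np.asarray(pose_a, dtype=float).ravel()
    b = np.asarray(pose_b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

standard = np.array([0.0, 1.0, 1.0, 1.0, 2.0, 0.5])   # hypothetical keypoints
performed = 1.5 * standard                            # same posture, larger subject
score = pose_similarity(standard, performed)
```

A threshold on this score per matched key frame can then drive the "meets standard / needs improvement" feedback the system gives.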
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130R (2023) https://doi.org/10.1117/12.2673685
Diabetes is a very common disease that afflicts many people worldwide. It can cause complications such as diabetic foot, diabetic retinopathy, and even permanent loss of vision. At present, the most effective way to fight diabetes is early detection and early intervention; restricting sugar intake as early as possible is the most economical approach. However, early diabetes often has no obvious symptoms, so case-by-case screening by doctors is time-consuming and laborious, which makes an automated algorithm for assessing diabetes risk particularly important. This paper uses two machine learning methods and an artificial neural network, built with popular Python toolkits for handling large datasets, to predict diabetes automatically, and compares their performance to provide insights for future diabetes prevention and treatment. The machine learning models are Support Vector Machine (SVM) and Gaussian Naive Bayes (GaussianNB), both of which offer relatively good interpretability and classification ability; the deep learning model is a classical artificial neural network (ANN). Confusion matrices show that, on a limited dataset, the ANN achieves the best prediction accuracy of the three models at 0.736, while the SVM and GaussianNB reach 0.72 and 0.735, respectively.
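Such a three-way comparison can be sketched with scikit-learn on synthetic data (the paper's dataset, features, and tuning are not reproduced; the MLP stands in for the ANN):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a diabetes screening dataset.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for name, model in [("SVM", SVC()),
                    ("GaussianNB", GaussianNB()),
                    ("ANN", MLPClassifier(max_iter=1000, random_state=0))]:
    # Fit on the training split, report accuracy on the held-out split.
    scores[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(scores)
```

As in the paper, accuracies from such simple models tend to land close together, so confusion matrices (per-class errors) are needed to separate them meaningfully.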
Wenlei Chi, Chenhong Zheng, Cong Zeng, Ying Wang, Meihua Zou
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130S (2023) https://doi.org/10.1117/12.2673520
This paper designs a multidimensional visualization framework for business management resource input. The architecture is divided into four layers: database, support, service, and client. Input data for multi-dimensional business operation and management resources are integrated to achieve processing, profiling, and in-depth analysis of resource elements. Based on the analysis results, a resource-input visualization panel is designed to display results, and a power management chart is established to clarify changes in the input and output of all elements of power grid enterprises. The results show that the comprehensive evaluation index of the visual display effect designed in this paper is between 4 and 5, indicating good application performance.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130T (2023) https://doi.org/10.1117/12.2673485
Existing supervised learning methods can only use labelled samples to train a classifier, and labels are difficult and costly to obtain. To solve this problem and enhance the effectiveness of intrusion detection models, this study proposes a semi-supervised intrusion detection method based on Fuzzy Long Short-Term Memory (Fuzzy-LSTM). The model uses an LSTM to generate labels for unlabeled samples and filters them by fuzzy entropy: the low-entropy samples are merged into the original training set, and the classifier is retrained. Results show that the proposed model achieves an accuracy of 84.53% on the tested data sets, 2.45% higher than the classical CNN-BiLSTM, with a significant improvement in detection accuracy for minority classes.
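The entropy-based filter can be sketched as follows, as an illustration of the selection step (the paper's exact fuzzy-entropy definition and threshold may differ). Confident predictions have low entropy and are kept as trusted pseudo-labels:

```python
import numpy as np

def fuzzy_entropy(probs):
    # Entropy of a class-probability vector: 0 for a one-hot (confident)
    # prediction, maximal for a uniform (uncertain) one.
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def select_pseudo_labels(prob_matrix, threshold=0.3):
    # Keep only the unlabeled samples whose prediction entropy is low
    # enough to trust; these get merged back into the training set.
    return [i for i, row in enumerate(prob_matrix)
            if fuzzy_entropy(row) < threshold]

probs = [[0.98, 0.02],   # confident -> kept
         [0.55, 0.45],   # uncertain -> dropped
         [0.01, 0.99]]   # confident -> kept
kept = select_pseudo_labels(probs)
```

Iterating this select-and-retrain loop is what lets the unlabeled data improve the classifier without hand labelling.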
Advanced Algorithm and Geometric Calculation Model
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130U (2023) https://doi.org/10.1117/12.2673321
Word embeddings are widely used in downstream tasks across Natural Language Processing (NLP). Recent studies reveal that embedding models trained on corpora contain societal gender bias. In this paper, we propose a new Go/Not-go Embedding Association Test (GNEAT), which analyzes gender bias in embedding models using the principle of analysis of variance. It can calculate the gender bias of a single group of target words, rather than requiring two groups together, and can also analyze the interaction between two groups of target words, which is important for studying effects between different words. In addition, we verify that the projection test, the Word Embedding Association Test (WEAT), and clustering analysis are also applicable to Chinese embedding models and that gender bias exists in them, filling a gap in the research and measurement of gender bias in Chinese embeddings. The results show that corpus training introduces gender bias into Chinese embedding models, and that GNEAT can measure the bias of a single group of target words and analyze the interaction effect of two groups, making it more flexible and comprehensive.
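The projection test mentioned above can be sketched in a few lines; the vectors here are toy illustrations, whereas real tests use trained embeddings:

```python
import numpy as np

def gender_projection(word_vec, he_vec, she_vec):
    # Project a word vector onto the normalized gender direction (he - she);
    # positive scores lean toward the male pole, negative toward the female.
    direction = he_vec - she_vec
    direction = direction / np.linalg.norm(direction)
    return float(np.dot(word_vec, direction))

he, she = np.array([1.0, 0.0]), np.array([0.0, 1.0])
engineer = np.array([0.9, 0.1])   # toy vector placed nearer the "he" pole
score = gender_projection(engineer, he, she)
```

Aggregating such projections over a single group of target words, then comparing group variances, is the analysis-of-variance flavor that GNEAT builds on.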
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130V (2023) https://doi.org/10.1117/12.2673240
Capsule networks have shown excellent performance in many fields, but have seen little exploration in face detection and recognition. To address problems of traditional CNNs in face detection and identification, such as loss of feature information and over-dependence on samples, a capsule network with an iterative routing mechanism is adopted and improved. On this basis, we introduce a two-layer feature extraction stage and add a spatial attention mechanism to the pooling process, so that the network attends more to facial contour information and retains the features of key facial regions as much as possible. To increase recognition accuracy, the open-source tool "MaskTheFace" is employed to create a dataset of one-to-one correspondences between faces and occluded faces. Experiments show that the improved algorithm outperforms the traditional one. Although the study is exploratory, it provides some insights into masked face identification.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130W (2023) https://doi.org/10.1117/12.2673329
Traditional image rendering methods are time-consuming and complex and cannot meet the needs of modern design scenarios. In this paper, deep autoencoders and capsule networks from artificial intelligence (AI) are used to study automatic image rendering. First, a Scharr filter is used for image reconstruction in the Stacked Capsule Autoencoder (SCAE) model, to enhance the accuracy of image target detection and reduce reconstruction loss. Then, the model's loss function is improved. Finally, the improved model is tested on the Modified National Institute of Standards and Technology (MNIST) dataset, the Canadian Institute for Advanced Research CIFAR-10 and CIFAR-100 datasets, and a dataset made by the authors. The tests show that the improved model's accuracy is higher than that of traditional K-means, the autoencoder (AE) unsupervised algorithm, and an improved unsupervised deep clustering model. The stacked capsule autoencoder with the Scharr filter classifies more accurately. The improved algorithm provides a reference for improving the efficiency of image rendering in virtual environments.
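The Scharr kernel is a 3x3 gradient filter with better rotational accuracy than the same-sized Sobel kernel. A NumPy sketch of the horizontal-gradient response (an illustration of the filter itself; the paper applies it inside the SCAE reconstruction):

```python
import numpy as np

# Scharr kernel for horizontal gradients.
SCHARR_X = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]], dtype=float)

def filter_response(img, kernel):
    # Valid-mode sliding-window correlation of a 2-D image with a kernel.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((5, 5))
img[:, 3:] = 1.0                          # vertical step edge
edges = filter_response(img, SCHARR_X)    # strong response only at the edge
```

Flat regions produce zero response while the step edge produces a large one, which is the edge emphasis the SCAE reconstruction exploits.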
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130X (2023) https://doi.org/10.1117/12.2673211
At present, when analyzing hundreds of video streams, the common approach is to pull them to a data center and process them centrally on a graphics processing unit (GPU) cluster. However, establishing GPU clusters is costly and power-hungry, and they are not well suited to processing large numbers of video streams. For such scenarios, this paper proposes an edge real-time video analysis system based on a distributed Artificial Intelligence (AI) cluster. A distributed software platform is built on a distributed Advanced RISC Machine (ARM) cluster using Akka cluster technology. Its core functions include distributed cluster state management, two-level horizontal capacity expansion, a three-level task scheduling mechanism, and complete data-link processing. The RK3399 Pro is used as the carrier for real-time computing. Measured data show that, compared with a single embedded board, the system improves AI computing speed by more than 20 times, comparable to the computing effect of a GPU, making it suitable for compute-constrained real-time AI application scenarios.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130Y (2023) https://doi.org/10.1117/12.2673738
With Covid-19's effects on the economy, corporate earnings have decreased and job seekers face increased pressure. Individual applicants often spend considerable time and energy researching how their abilities match a position's salary, which is laborious, and incomplete information can lead to wrong salary expectations. An automated algorithm for salary evaluation is therefore particularly important. This paper studies salary prediction with three models: a multilayer perceptron (MLP), random forest (RF), and gradient-boosted decision trees (GBDT). After analyzing each model's characteristics, making predictions, and comparing the evaluation metrics of the three, the GBDT algorithm works best and is chosen to build the salary prediction algorithm. The resulting model can provide a reasonable, scientific reference for recruiters publishing job information and for job seekers searching for suitable positions, greatly improving the success rate of both recruitment and job searching.
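A minimal GBDT regression sketch with scikit-learn, on synthetic stand-in data (the paper's features and hyperparameters are not reproduced):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))          # stand-ins: experience, education, ...
# Hypothetical salary with a linear and a nonlinear term plus noise.
salary = 50 + 8 * X[:, 0] + 3 * X[:, 1] ** 2 + rng.normal(0, 1, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, salary, random_state=0)
gbdt = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
r2 = gbdt.score(X_te, y_te)            # held-out coefficient of determination
```

Boosted trees capture the nonlinear term without manual feature engineering, which is a typical reason GBDT wins such comparisons against linear or shallow baselines.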
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 126130Z (2023) https://doi.org/10.1117/12.2673248
As the number of target-measurement sensors grows and target motion becomes more complicated, the complexity of track data association increases, and the original association algorithms gradually fail to meet practical demands, making further analysis of multi-target tracking data association algorithms important. Machine learning has become a research hotspot in many fields in recent years, and in data association it can likewise be used to process track data, effectively improving the correct association rate over traditional methods. This paper first describes the basic content of the track data association problem; second, it analyzes and summarizes two common lines of research on machine learning-based data association for multi-target tracking; it then predicts future development trends given the current state of research; finally, it summarizes the overall content.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261310 (2023) https://doi.org/10.1117/12.2673280
We propose a hybrid digital watermarking scheme in the wavelet domain with double encryption: silicon photonic microcavity chaotic encryption (SPM-Chaos) and the Arnold transform. The SPM-Chaos is four-dimensional dynamic chaos generated by a silicon-based photonic crystal optomechanical microcavity, which brings higher randomness and a larger key space. The hybrid encryption of SPM-Chaos and the Arnold transform greatly enhances security. Meanwhile, the discrete wavelet transform (DWT) is used because the wavelet's time-frequency localization and multi-resolution characteristics suit image processing. Experimental results show that the proposed scheme has good visual invisibility and robustness against various attacks such as noise, filtering, rotation, and compression.
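The Arnold transform (cat map) used for scrambling can be sketched as follows. It permutes the pixels of an N x N image bijectively, so it can be undone by iterating the inverse map (or by iterating forward through the map's period):

```python
import numpy as np

def arnold_transform(img, iterations=1):
    """Arnold's cat map on a square image: (x, y) -> (x + y, x + 2y) mod N.

    The map is a bijection on the N x N grid (its matrix has determinant 1),
    so it scrambles pixel positions reversibly: no pixel value is ever lost.
    """
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                scrambled[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = scrambled
    return out

img = np.arange(16).reshape(4, 4)
scrambled = arnold_transform(img)
```

In the watermarking pipeline, scrambling the watermark before DWT embedding means an attacker who extracts the embedded data still sees noise without the iteration-count key.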
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261311 (2023) https://doi.org/10.1117/12.2673750
Stroke is an acute cerebrovascular disease with high morbidity, mortality, and disability rates, and survivors may suffer long-term motor impairment. Given the persistent lack of effective treatments, prevention is currently considered the best measure. This study pursues two objectives. The first is to analyze which characteristics make people prone to stroke; the second is to identify machine learning algorithms with satisfactory performance for stroke risk prediction. To this end, five machine learning algorithms are validated: K-Nearest Neighbors (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and XGBoost (XGB). An ensemble learning algorithm, namely weighted voting, is then implemented to further improve performance. Various metrics are leveraged to comprehensively evaluate the results, including accuracy, precision, recall, F1-score, and Area Under the Curve (AUC). Finally, an acceptable result is achieved with an AUC of 0.711 and a recall of 57.9%. This work demonstrates that the ensemble learning method can improve performance and be further exploited as a reliable classifier for stroke prediction.
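The weighted-voting ensemble can be sketched with scikit-learn's VotingClassifier on synthetic imbalanced data (an illustration; the paper's feature set, five base models, and weight choices are not reproduced):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Imbalanced two-class data, roughly mimicking a rare-positive screening task.
X, y = make_classification(n_samples=500, n_features=10, weights=[0.8],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft voting averages the base models' predicted probabilities;
# the weights let stronger models count for more in the average.
ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="soft", weights=[1, 1, 2],
).fit(X_tr, y_tr)
acc = ensemble.score(X_te, y_te)
```

In practice the weights would be tuned on a validation split, favoring whichever base classifiers score best individually.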
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261312 (2023) https://doi.org/10.1117/12.2673317
Visual SLAM systems are well established, and the use of point clouds as a data source is also increasing, placing higher demands on point cloud registration. Traditional ICP algorithms are prone to falling into local optima in 3D point cloud registration. To this end, the ICP algorithm is further improved by optimizing the objective function and reducing the density of the point cloud, with the aim of improving the accuracy and speed of point cloud registration.
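The density-reduction step can be sketched as voxel-grid downsampling, which keeps one centroid per occupied voxel. This is a standard technique; the paper's exact scheme may differ:

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    # Assign each point to a voxel, then replace every occupied voxel's
    # points with their centroid, thinning dense regions uniformly so
    # each ICP iteration handles far fewer correspondences.
    keys = np.floor(points / voxel_size).astype(int)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse).astype(float)
    out = np.empty((counts.size, points.shape[1]))
    for d in range(points.shape[1]):
        out[:, d] = np.bincount(inverse, weights=points[:, d]) / counts
    return out

cloud = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                  [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]])
thinned = voxel_downsample(cloud, voxel_size=1.0)   # two clusters -> two points
```

Because each ICP iteration is dominated by nearest-neighbour search over the cloud, reducing density this way speeds up registration roughly in proportion to the thinning ratio.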
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261313 (2023) https://doi.org/10.1117/12.2673307
The one-dimensional variable-sized bin packing problem and network optimization problems are known to be classical combinatorial optimization problems. Inspired by this, we consider a new problem of constructing tree-form structures in an undirected graph from variable-sized materials, where every edge spliced into such a structure is assembled from pieces of m types of materials. More precisely, the problem is defined as follows: given a weighted graph G = (V, E; w, b) with a length function w : E → R+ and a cost function b : E → R+, a tree-form structure S, and pieces of m types of materials, we attempt to construct a subgraph G' of G having the structure S, such that each edge in G' is built from the given m types of materials; the objective is to minimize the total cost of constructing G', namely the sum of the cost of purchasing materials and the cost of constructing all edges in the subgraph. For this new problem, we obtain two main results. (1) When the structure S is a spanning tree, we design a 2-approximation algorithm for the problem of variable-sized materials constructing a spanning tree. (2) When the structure S is a single-source shortest-path tree, we design a 2-approximation algorithm for the problem of variable-sized materials constructing a single-source shortest-path tree.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261314 (2023) https://doi.org/10.1117/12.2673228
Image segmentation is an important branch of visual understanding systems and contributes to improving classification accuracy. In recent years, the integration of deep learning and image processing has produced a new generation of image segmentation algorithms whose performance improves on traditional methods and reaches high accuracy. This work introduces both traditional and deep learning-based segmentation methods and compares the performance of related algorithms. Addressing the problems of existing algorithms, future development trends are also discussed.
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261315 (2023) https://doi.org/10.1117/12.2673698
Weather forecasting is of great importance in people's daily lives: as global warming progresses, more and more people are affected by high temperatures, with some even suffering chronic illness or death as a result of the heat. In this research, Linear Regression and an LSTM neural network were used to model summer temperatures in Seoul, Korea, from 2013 to 2017, together with some auxiliary variables, taking the day's average temperature, solar radiation, average relative humidity, average wind speed, and average latent heat flux as inputs and the following day's average temperature as output. The two models were compared using evaluation criteria such as mean absolute error, and the better model was selected for weather prediction. The results showed that the mean absolute error of Linear Regression was 0.89 and that of the LSTM was 0.19, that the LSTM model generalizes better than Linear Regression, and that no overfitting problems were observed in the loss curves.
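The linear-regression baseline described above can be sketched in a few lines. The data below are synthetic stand-ins for the paper's five inputs (the actual Seoul dataset, coefficients, and preprocessing are not given in the abstract), so only the fitting procedure, ordinary least squares with an intercept, is illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the paper's inputs: today's mean temperature,
# solar radiation, relative humidity, wind speed, latent heat flux.
X = rng.normal(size=(200, 5))
true_w = np.array([0.9, 0.3, -0.2, -0.1, 0.15])     # hypothetical weights
y = X @ true_w + 0.05 * rng.normal(size=200)        # next-day mean temperature

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ w
mae = np.abs(pred - y).mean()                        # the paper's main metric
```

In the paper the same mean-absolute-error criterion is then computed for the LSTM on held-out data to pick the better model.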
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261316 (2023) https://doi.org/10.1117/12.2673276
As an important research field of machine learning, clustering aims to partition the samples of unsupervised data into clusters. Traditional clustering methods, like K-means, are concise and simple to use, but their performance on various real-world data is very limited. Recently, clustering methods based on neural networks have been proposed to improve the processing of complicated and large-scale data. However, these methods are time-consuming and still struggle to extract effective features, which limits their application; moreover, most clustering methods cannot be applied to online clustering. To solve these problems, we propose the Representation Contrast Clustering (RCC) method in this paper. To enhance feature extraction, we introduce contrastive learning into clustering, which makes it possible to extract clustering-friendly and effective features from complex data; without labels, contrastive learning is comparable to supervised learning at extracting features. Moreover, RCC adopts a “pre-training & fine-tuning” structure, which saves clustering time and supports online clustering. In the pre-training phase, a contrastive learning framework combining data augmentation and neural networks extracts a clustering-friendly representation. In the fine-tuning phase, the representations are clustered by the strategy of “label as representation”. Experimental results show that the proposed RCC method achieves state-of-the-art performance on most datasets. For example, RCC’s NMI of 0.764 on the CIFAR-10 dataset exceeds the best known methods by 6 percentage points, while its ACC of 0.855 is at least 6 percentage points higher than others. In summary, RCC does not require data integrity in either the pre-training or the fine-tuning stage, so it can run on large data and be used for online clustering. The method is very efficient, requiring very little training time for the downstream clustering task once pre-training is completed.
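The abstract does not spell out RCC's loss. A common contrastive objective for paired augmented views, which the pre-training phase plausibly resembles, is the normalized-temperature cross entropy (NT-Xent, SimCLR-style); the sketch below is that standard loss in NumPy, not RCC's actual formulation.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss for a batch of paired augmented views z1, z2 of shape
    (n, d): each row of z1 should be close to the same row of z2 and far
    from every other row of either view."""
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    n = len(z1)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # drop self-similarity
    # the positive partner of row i is row i+n (and vice versa)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logp = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -logp.mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 4))
z2 = rng.normal(size=(8, 4))
```

Identical views give a lower loss than unrelated ones, which is exactly the gradient signal that pulls augmentations of the same sample together during pre-training.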
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261317 (2023) https://doi.org/10.1117/12.2673524
In recent years, worldwide attention has made UAV surveillance a vital tool for combat reconnaissance. Target detection has taken the place of manual interpretation and has become a significant factor limiting UAV reconnaissance, so it is essential for battlefield reconnaissance to increase the precision and speed of target detection. The fundamental issue addressed in this work is enhancing target detection accuracy while maintaining speed. YOLOv3 is a popular network architecture in the industry because it is fast and precise compared with other architectures; the attention mechanism, for its part, markedly improves detection of small and medium targets while having only a modest effect on larger ones. In this study, an attention mechanism is integrated into YOLOv3.
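The abstract does not say which attention mechanism is used. One common choice for augmenting a YOLOv3 backbone is squeeze-and-excitation (SE) channel attention, sketched below in NumPy as an assumption, not the paper's confirmed design: a feature map is globally pooled, passed through a small bottleneck, and the resulting per-channel gates rescale the map.

```python
import numpy as np

def se_attention(feature_map, w1, w2):
    """Squeeze-and-excitation channel attention over a (C, H, W) feature map.
    w1 has shape (C//r, C) and w2 has shape (C, C//r): the two fully
    connected layers of the bottleneck with reduction ratio r."""
    squeeze = feature_map.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gates in (0, 1)
    return feature_map * scale[:, None, None]        # reweight each channel

rng = np.random.default_rng(0)
fm = rng.normal(size=(16, 8, 8))                     # C=16 channels, 8x8 map
out = se_attention(fm, rng.normal(size=(4, 16)), rng.normal(size=(16, 4)))
```

Because the gates lie in (0, 1), the module can only attenuate channels, letting the network emphasize those informative for small targets.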
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261318 (2023) https://doi.org/10.1117/12.2673277
In this article, aiming at the defects of the Tuna Swarm Optimization algorithm (TSO), namely its tendency to fall into local optima and its slow convergence, an improved tuna swarm optimization algorithm based on Levy flight (LTSO) is proposed. The algorithm updates the positions of the optimal individual and its followers in the tuna swarm using the characteristic mix of short moves and long jumps of Levy flight, which effectively enhances the algorithm's global exploration and prevents it from being trapped by local extrema. Twelve benchmark functions are used to design two groups of experiments that test the performance of LTSO on high-dimensional and low-dimensional problems respectively, comparing it with TSO and other popular algorithms. The test results show that LTSO's speed of finding the optimum, solution accuracy, and robustness are better than those of TSO and the other algorithms. In addition, the results of these algorithms on the benchmark functions are analyzed with the Wilcoxon and Friedman statistical methods; the statistical analysis indicates that LTSO has significant advantages over the other algorithms.
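A Levy-flight position update of the kind described can be sketched as follows. The step is drawn with Mantegna's algorithm, the standard way to sample Levy-stable increments; the specific update rule and the step-scale alpha are illustrative assumptions, since the abstract does not give LTSO's exact equations.

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, rng=None):
    """Mantegna's algorithm: draws a Levy-distributed step whose heavy tail
    mixes many short moves with occasional long jumps."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def levy_update(position, best, alpha=0.01, rng=None):
    """Move one tuna toward the current best with a Levy-scaled step
    (illustrative update rule, not LTSO's exact equation)."""
    step = levy_step(len(position), rng=rng)
    return position + alpha * step * (best - position)

rng = np.random.default_rng(1)
best = np.array([1.0, 2.0, 3.0])
moved = levy_update(np.zeros(3), best, rng=rng)
```

The occasional long jumps are what let the swarm escape a local extremum that short Gaussian steps would circle forever.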
Proceedings Volume International Conference on Computer Vision, Application, and Algorithm (CVAA 2022), 1261319 (2023) https://doi.org/10.1117/12.2673584
Taxis are an important component of urban transportation, and forecasting taxi demand is essential to building an efficient transportation system in smart cities. Accurate taxi demand forecasts can guide vehicle scheduling, improve vehicle utilization, ease traffic congestion, and improve the passenger experience. Given the complex temporal and spatial dependencies of taxi demand, accurate prediction is a current research hotspot. This paper proposes a new taxi demand forecasting model, the CGRU model, which uses a spectral-domain graph convolutional network (ChebNet) to encode the topology of taxi demand and capture topological correlations, models spatial correlation with reference to the demand of other regions of similar function, models temporal correlation with gated recurrent units (GRU), and combines the two to complete the spatiotemporal analysis of taxi demand. The proposed model is assessed on the NYCTAXI_DYNA open-source dataset, and the results show that the CGRU model outperforms the baseline models on evaluation metrics such as MAE and RMSE.
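The ChebNet building block named above is a Chebyshev-polynomial graph convolution, which this minimal NumPy sketch illustrates on generic node features; the CGRU model's actual layer sizes, Laplacian scaling, and coupling with the GRU are not specified in the abstract and are assumed here.

```python
import numpy as np

def cheb_conv(x, L, thetas):
    """Chebyshev graph convolution: x is an (n, f) node-feature matrix, L a
    scaled graph Laplacian (eigenvalues assumed in [-1, 1]), and thetas a
    list of K weight matrices of shape (f, f'), one per Chebyshev order."""
    t_prev, t_curr = x, L @ x                 # T_0(L)x = x,  T_1(L)x = Lx
    out = t_prev @ thetas[0]
    if len(thetas) > 1:
        out = out + t_curr @ thetas[1]
    for theta in thetas[2:]:
        # Chebyshev recurrence: T_k(L)x = 2L T_{k-1}(L)x - T_{k-2}(L)x
        t_prev, t_curr = t_curr, 2 * L @ t_curr - t_prev
        out = out + t_curr @ theta
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                   # 5 regions, 3 features each
thetas = [rng.normal(size=(3, 2)) for _ in range(3)]
y = cheb_conv(x, np.zeros((5, 5)), thetas)    # K = 3, trivial Laplacian
```

In CGRU the per-timestep outputs of such a layer would then feed the GRU, so that spatial mixing happens before temporal recurrence.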