This PDF file contains the front matter associated with SPIE Proceedings Volume 12742, including the Title Page, Copyright information, Table of Contents and Conference Committee list.
The rapid evolution of AI technology is well on its way to being able to support and even replace humans in various tasks. Exploiting this application potential while simultaneously mitigating the (systemic) risks to our society is proving to be a major challenge. How do we tell (autonomous) AI systems effectively what to do? How do we make them function in a responsible and morally acceptable way? To what extent can we demand explainability?
Recent developments in chip technology have led to the integration of smart Artificial Intelligence (AI) capabilities in surveillance cameras, such as vehicle and pedestrian detection, license plate recognition, and face detection (FD). FD cameras are being widely deployed in various public locations to identify wanted individuals. However, the large size of identity databases and the need for personal information security make it necessary to perform face recognition (FR) on servers rather than on cameras. At the same time, more FD cameras and detected faces lead to extensive power consumption and computational demand for real-time FR, along with large storage devices to store face images. Although FD cameras track detected faces and select the best-quality face for FR, multiple faces belonging to the same identity might still be sent to the servers because of tracking-algorithm limitations or the re-entry of an identity into the scene. Therefore, we propose a method for finding similar faces belonging to the same identity and removing low-quality faces, so that only the highest-quality face is kept on the server for storage and FR. We utilize facial embedding vectors, obtained from aligned faces using an FR model, and store them in a database along with face image information such as capture time, camera IP, and face quality score. If a face has similar faces in the database, the highest-quality face is kept and the others are removed. As a result, our proposed method eliminates redundant face images and keeps the highest-quality face images for high-performance, efficient FR and effective storage.
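As an illustration of the deduplication step described above, the sketch below keeps only the highest-quality face per identity by comparing unit-normalized embedding vectors. The similarity threshold and record fields are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch: keep the highest-quality face among records whose embeddings
# are mutually similar. Threshold and record layout are assumptions.
import numpy as np

def deduplicate(records, threshold=0.6):
    """records: list of dicts with 'embedding' (unit-norm np.ndarray), 'quality'
    (float), plus metadata such as capture time and camera IP."""
    kept = []
    for rec in sorted(records, key=lambda r: r["quality"], reverse=True):
        # Cosine similarity against already-kept (higher-quality) faces.
        if all(float(np.dot(rec["embedding"], k["embedding"])) < threshold for k in kept):
            kept.append(rec)  # no similar face stored yet, so keep this one
        # otherwise a higher-quality face of the same identity is already kept
    return kept
```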
In this study, we analyze the effect of color and texture information on person re-identification models. Identifying a person across different cameras involves several problems, such as camera viewpoint and illumination changes, background dissimilarity, color tone, and human pose changes. Thus, the performance of person re-identification strongly depends on the scenarios and datasets that are published for research. No matter how much convolutional neural network (CNN) models improve, these limitations remain, since they stem from the nature of the problem. We treat person re-identification as a matching problem and focus on the effects of color and texture on the similarity scores of true and false matches. The detailed analyses indicate that color is the most dominant cue for person re-identification and that color constancy is vital for robust re-identification across different cameras. Texture, by comparison, has less effect than color. Based on these observations, we advise using color augmentations during the training stage of CNNs for re-identification problems.
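The recommendation to use color augmentations during training can be pictured with a minimal torchvision transform pipeline; the jitter ranges and input size below are assumptions rather than values from the study.

```python
# Minimal sketch of a re-ID training transform with colour augmentation.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 128)),                      # common person re-ID input size
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),    # colour augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```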
Federated learning (FL) is a hot research topic enabling training on databases of multiple organizations while preserving the privacy of people whose personal data is stored in those databases. FL supports the sharing of trained machine-learning (ML) models between different organizations without sharing personal data. This is important for many security applications, including video surveillance and document authentication, because access to more data leads to better performance. Over the last few years, many papers have proposed FL frameworks, but most lack at least one of the following aspects: open-source availability, flexibility in decentralized topology, flexibility in using ML frameworks (e.g., PyTorch), real deployment (not only simulation), and results on multiple computer-vision (CV) tasks. In this paper, we give an overview of existing FL frameworks to assess these aspects. Furthermore, we implemented various CV tasks in a federated way and describe the implementation in the paper. This includes not only a small-scale image classification task but also more challenging CV tasks, such as object detection, semantic segmentation, and person re-identification. Experiments show that models trained with privacy-preserving FL perform much better than a baseline with access to only a subset of the data, and reach performance close to the upper limit obtained with access to all data.
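As a hedged sketch of the core aggregation step that FL frameworks of this kind implement, the snippet below performs a weighted average of client model parameters in the style of FedAvg; it is not taken from the paper's implementation, and the helper name is hypothetical.

```python
# FedAvg-style aggregation: only model weights are exchanged, never raw data.
import torch

def federated_average(client_state_dicts, client_sizes):
    """Weighted average of client model parameters, weighted by local dataset size."""
    total = float(sum(client_sizes))
    avg = {}
    for key in client_state_dicts[0]:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(client_state_dicts, client_sizes))
    return avg  # load into the global model with model.load_state_dict(avg)
```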
This study presents a method for vehicle color recognition using instance segmentation, a powerful tool for image analysis. Vehicle color information is a crucial part of city surveillance; however, extracting the color is a challenging task due to high levels of occlusion and the reflective nature of vehicle surfaces. The method uses a state-of-the-art instance segmentation algorithm to differentiate overlapping objects and then selects the pixels that are informative about the object color. The results indicate that using an instance segmentation method increases color recognition performance.
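The following sketch shows one plausible way to select pixels from an instance mask and estimate a dominant vehicle color with k-means; the cluster count and color space are assumptions and not necessarily the paper's procedure.

```python
# Illustrative sketch: estimate a dominant colour from masked vehicle pixels only.
import cv2
import numpy as np

def dominant_color(image_bgr, mask, k=3):
    """image_bgr: HxWx3 uint8 frame; mask: HxW boolean instance mask of one vehicle."""
    pixels = image_bgr[mask].reshape(-1, 3).astype(np.float32)
    # k-means over masked pixels; the largest cluster approximates the body colour.
    _, labels, centers = cv2.kmeans(
        pixels, k, None,
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0),
        3, cv2.KMEANS_PP_CENTERS)
    counts = np.bincount(labels.flatten(), minlength=k)
    return centers[np.argmax(counts)]  # BGR triplet of the dominant cluster
```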
Visual indexing, or the ability to search and analyze visual media such as images and videos, is important for law enforcement agencies because it can speed up criminal investigations. As more and more visual media is created and shared online, the ability to effectively search and analyze this data becomes increasingly important for investigators to do their job effectively. The major challenges for video captioning include accurately recognizing the objects and activities in the image, understanding their relationships and context, generating natural and descriptive language, and ensuring the captions are relevant and useful. Near real-time processing is also required in order to facilitate agile forensic decision making and prompt triage, hand-over, and reduction of the amount of data to be processed by investigators or subsequent processing tools. This paper presents an efficient captioning-driven video analytic that extracts accurate descriptions of image and video files. The proposed approach includes a temporal segmentation technique that provides the most relevant frames. Subsequently, an image captioning approach has been specialized to describe visual media related to counter-terrorism and cybercrime for each relevant frame. Our proposed method achieves high consistency and correlation with human summaries on the SumMe dataset, outperforming previous similar methods.
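As a rough illustration of temporal segmentation for selecting the most relevant frames, the sketch below scores consecutive frames by their gray-level difference and keeps frames above a threshold; this is an assumed, simplified stand-in for the paper's technique.

```python
# Simplified keyframe selection by frame differencing (assumption, not the paper's method).
import cv2
import numpy as np

def select_keyframes(video_path, threshold=30.0):
    cap = cv2.VideoCapture(video_path)
    keyframes, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is None or np.mean(cv2.absdiff(gray, prev)) > threshold:
            keyframes.append((idx, frame))  # candidate frame passed to the captioner
        prev = gray
        idx += 1
    cap.release()
    return keyframes
```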
Accurate localization and recognition of objects in three-dimensional (3D) space can be useful in security and defence applications such as scene monitoring and surveillance. A main challenge in 3D object localization is finding the depth location of objects. We demonstrate here the use of a camera array with computational integral imaging to estimate the depth locations of objects detected and classified in a two-dimensional (2D) image. Following an initial 2D object detection in the scene using a pre-trained deep learning model, computational integral imaging is employed within the detected objects' bounding boxes, and by a straightforward blur-measure analysis, we estimate the objects' depth locations.
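A minimal sketch of the blur-measure analysis might look as follows: scan the computationally reconstructed slices over candidate depths and pick the depth with the sharpest content inside the detected bounding box. The variance-of-Laplacian focus measure used here is an assumption.

```python
# Hedged sketch: depth estimation by focus/blur analysis over reconstructed slices.
import cv2
import numpy as np

def estimate_depth(reconstructed_slices, depths, bbox):
    """reconstructed_slices: list of 2-D arrays, one per candidate depth;
    depths: matching depth values; bbox: (x, y, w, h) from the 2-D detector."""
    x, y, w, h = bbox
    scores = []
    for slice_img in reconstructed_slices:
        roi = slice_img[y:y + h, x:x + w].astype(np.float64)
        scores.append(cv2.Laplacian(roi, cv2.CV_64F).var())  # sharpness score
    return depths[int(np.argmax(scores))]                     # depth of best focus
```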
Emergency responders attending an incident can inadvertently encounter explosive materials that can put their safety and that of the general public at risk. Ensuring that teams have suitable detection equipment is of vital importance to mitigate the impact of the incident and assist in the early return to normality. This study examines the use of Spatially Offset Raman Spectroscopy (SORS) technology for rapid identification of a wide range of explosives, improvised explosive precursors and flash powders either directly or through a range of barriers, with the potential to improve safety, efficiency and critical decision making in incident management and search operations.
The Underwater Internet of Things (UIoT) represents an area of research that holds significant interest among scientific, defence, and industrial communities. The automation of subsea applications is a key focus to safeguard assets in this domain. Visible Light Communications (VLC) shows promise in overcoming the constraints inherent in acoustics, particularly in specific environmental conditions. This project proposes the implementation of a Software Defined Network (SDN) capable of wirelessly transmitting video from source to sink, irrespective of local conditions. This is achieved by incorporating a wired optical connection between the source nodes located on the seafloor, thus providing an additional communication option. This wired connection enables the nodes to bypass obstacles such as suspended particulate matter, objects, or excessive ambient noise in the water, which obstruct VLC communications, and instead transmit the data from an alternate location. To determine whether wireless transmission is viable, a fuzzy logic controller/edge routing system assesses the environmental conditions and based on the collected data, selects the most suitable route. Through MATLAB modelling, it has been demonstrated that this SDN has the potential to reliably transmit video and other forms of data while minimizing energy consumption by eliminating the need for an acoustic communication link.
Within computer vision, moving object detection is an extremely important topic that has drawn the interest of the scientific community. Recently, an emerging dimensionality reduction technique, called Dynamic Mode Decomposition (DMD), has been exploited to estimate the background. DMD is a purely data-driven technique that provides information about the spatial and temporal evolution of the input video. The main idea behind the use of DMD is the possibility of isolating the modes that describe the background, in order to obtain the signal associated with the target by subtraction. In practice, DMD produces a unimodal representation of the background, which gives good results under the assumptions that the background is quasi-static, the foreground objects are small, and their motion is fast. The objective of this study is to verify the applicability of DMD to infrared videos of maritime scenarios with extended naval targets. In this context, the foreground is neither small nor fast. To address this problem, we propose a spatial-multiscale approach which slightly improves the detection accuracy of the DMD-based detector. The proposed approach has been tested on a real dataset collected under real operational conditions, during an experimental activity led by the NATO STO-CMRE in February 2022 in Portovenere (Italy). The performance has been evaluated in terms of precision and recall and has been compared to other state-of-the-art moving target detection algorithms.
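For readers unfamiliar with DMD-based background estimation, the sketch below shows a standard exact-DMD computation on a video cube and selects the mode with eigenvalue closest to one as the quasi-static background; the truncation rank and mode-selection rule are assumptions, and the paper's spatial-multiscale variant is not reproduced.

```python
# Hedged sketch of unimodal DMD background estimation on a stack of frames.
import numpy as np

def dmd_background(frames, rank=10):
    """frames: T x H x W grayscale array. Returns an estimated background image."""
    T, H, W = frames.shape
    X = frames.reshape(T, -1).T.astype(np.float64)      # pixels x time snapshots
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    r = min(rank, len(s))
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.T @ X2 @ Vh.T @ np.diag(1.0 / s)         # reduced linear operator
    eigvals, W_modes = np.linalg.eig(A_tilde)
    Phi = X2 @ Vh.T @ np.diag(1.0 / s) @ W_modes         # DMD modes (pixels x r)
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0]     # mode amplitudes
    bg_idx = np.argmin(np.abs(eigvals - 1.0))            # quasi-static (background) mode
    return np.abs(Phi[:, bg_idx] * b[bg_idx]).reshape(H, W)
```

Foreground candidates can then be obtained by subtracting this background estimate from each frame and thresholding the residual.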
Vision-based object tracking is crucial for both civil and military applications. A range of hazards to cyber safety, vital infrastructure, and public privacy is posed by the rise of drones, or unmanned aerial vehicles (UAVs). As a result, identifying and tracking suspicious drones/UAVs is a serious challenge that has attracted strong research attention recently. The focus of this research is to develop a new virtual-coloured-marker-based tracking algorithm for estimating the pose of the detected object. After detection, the algorithm first determines the coloured area of the detected object as a reference-contour. A Virtual-Bounding Box (V-BB) is then created over the reference-contour, subject to a minimum contour-area criterion. Additionally, a Virtual Dynamic Crossline with a Virtual Static Graph (VDC-VSG) was developed to track the movement of the V-BB; this V-BB, treated as a virtual coloured marker, helps to estimate the pose of the detected object during observation. Moreover, the V-BB helps to avoid ambient-illumination-related difficulties during the tracking process. A significant number of aerial sequences, including benchmark footage, were tested using the proposed approach, and the results were highly encouraging. Potential applications of the proposed method include object detection and analysis in the field of security and defence.
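A minimal sketch of extracting the reference-contour and its Virtual-Bounding Box (V-BB) with OpenCV is shown below, assuming fixed HSV colour bounds; the VDC-VSG tracking graph itself is not reproduced here.

```python
# Hedged sketch: reference-contour and V-BB extraction; colour bounds are assumptions.
import cv2
import numpy as np

def virtual_bounding_box(frame_bgr, lower_hsv=(100, 100, 50),
                         upper_hsv=(130, 255, 255), min_area=200):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only contours that meet the minimum-area criterion.
    contours = [c for c in contours if cv2.contourArea(c) >= min_area]
    if not contours:
        return None
    reference = max(contours, key=cv2.contourArea)       # reference-contour
    return cv2.boundingRect(reference)                    # (x, y, w, h) of the V-BB
```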
Center points are commonly the output of anchor-free object detectors. Starting from this initial representation, a regression scheme is used to determine a target point set that captures object properties such as enclosing bounding boxes and further attributes such as class labels.
When trained only for the detection task, the encoded center-point feature representations are not well suited for tracking objects, since the embedded features are not stable over time.
To tackle this problem, we present an approach of joint detection and feature embedding for multiple object tracking. The proposed approach applies an anchor-free detection model to pairs of images to extract single-point feature representations. To generate temporal stable features which are suitable for track association across short time intervals, auxiliary losses are applied to reduce the distance of tracked identities in the embedded feature space.
The abilities of the presented approach are demonstrated on real-world data reflecting prototypical object tracking scenarios.
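To make the auxiliary-loss idea concrete, the hedged sketch below pulls embeddings of the same identity in an image pair together and pushes mismatched pairs apart; the margin and cosine-distance formulation are assumptions, not necessarily the paper's exact loss.

```python
# Hedged sketch of an auxiliary temporal embedding loss for track association.
import torch
import torch.nn.functional as F

def temporal_embedding_loss(emb_t, emb_t1, same_identity, margin=0.5):
    """emb_t, emb_t1: N x D center-point embeddings from a pair of frames.
    same_identity: length-N boolean tensor (True where the pair is the same track)."""
    emb_t = F.normalize(emb_t, dim=1)
    emb_t1 = F.normalize(emb_t1, dim=1)
    dist = 1.0 - (emb_t * emb_t1).sum(dim=1)              # cosine distance per pair
    pos = dist[same_identity]                             # same identity: pull together
    neg = F.relu(margin - dist[~same_identity])           # different identity: push apart
    pos_loss = pos.mean() if pos.numel() else emb_t.new_zeros(())
    neg_loss = neg.mean() if neg.numel() else emb_t.new_zeros(())
    return pos_loss + neg_loss
```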
Logos on clothing are sometimes one of the crucial clues for finding a suspect in surveillance video. Automatic logo detection is important during investigations to perform the search as quickly as possible. This can be done immediately after an incident on live camera streams or retrospectively on large video datasets from criminal investigations for forensic purposes. It is common to train an object detector with many examples from a logo dataset to perform logo detection. To obtain good performance, the logo dataset must be large. However, it is time-consuming and difficult to obtain a large training set with realistic annotated images. In this paper, we propose a novel approach for logo detection that requires only one logo image (or a few images) to train a deep neural network. The approach consists of two main steps: data generation and logo detection. In the first step, the logo image is artificially blended into a person re-identification dataset to generate an anonymized synthetic dataset with logos on clothing. Various augmentation steps proved necessary to reach good performance. In the second step, an object detector is trained on the synthetic dataset, subsequently providing detections on recorded images, video files, and live streams. The results consist of a quantitative assessment based on an ablation study of the augmentation steps and a qualitative assessment from end users who tested the tool.
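The data-generation step can be pictured with the following sketch, which alpha-blends a single logo image onto person crops at random positions and scales; the placement ranges and opacity values are assumptions meant only to convey the idea.

```python
# Hedged sketch: blend one logo image onto a person crop to create a synthetic sample.
import random
import cv2
import numpy as np

def blend_logo(person_img, logo_rgba):
    """person_img: HxWx3 uint8 person crop; logo_rgba: logo image with alpha channel."""
    h, w = person_img.shape[:2]
    lw = max(1, int(w * random.uniform(0.15, 0.3)))                        # logo width
    lh = min(max(1, int(logo_rgba.shape[0] * lw / logo_rgba.shape[1])), max(1, h // 2))
    logo = cv2.resize(logo_rgba, (lw, lh))
    x = random.randint(0, max(0, w - lw))
    y = random.randint(0, max(0, h - lh))
    alpha = (logo[:, :, 3:4].astype(np.float32) / 255.0) * random.uniform(0.6, 1.0)
    roi = person_img[y:y + lh, x:x + lw].astype(np.float32)
    out = person_img.copy()
    out[y:y + lh, x:x + lw] = (alpha * logo[:, :, :3] + (1.0 - alpha) * roi).astype(np.uint8)
    return out, (x, y, lw, lh)        # augmented image and its bounding-box label
```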
Threats posed by drones urge defence sectors worldwide to develop drone detection systems. Visible-light and infrared cameras complement other sensors in detecting and identifying drones. Convolutional Neural Networks (CNNs), such as the You Only Look Once (YOLO) algorithm, are known to help detect drones quickly in the video footage captured by these cameras, and to robustly differentiate drones from other flying objects such as birds, thus avoiding false positives. However, using still video frames for training the CNN may lead to low drone-background contrast when the drone is flying in front of clutter, and to omission of useful temporal data such as the flight trajectory. This deteriorates the drone detection performance, especially when the distance to the target increases. This work proposes to pre-process the video frames using a Bio-Inspired Vision (BIV) model of insects, and to concatenate the pre-processed video frame with the still frame as input for the CNN. The BIV model uses information from preceding frames to enhance the moving target-to-background contrast and embody the target's recent trajectory in the input frames. An open benchmark dataset containing infrared videos of small drones (< 25 kg) and other flying objects is used to train and test the proposed methodology. Results show that, at a high sensor-to-target distance, YOLO trained on BIV-processed frames and on the concatenation of BIV-processed frames with still frames increases the Average Precision (AP) to 0.92 and 0.88, respectively, compared to 0.83 when trained on still frames alone.
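The concatenation of the BIV-processed frame with the still frame can be pictured as a simple channel stack; the sketch below uses a placeholder motion-enhanced map, since the BIV model itself is not reproduced here.

```python
# Hedged sketch: build a multi-channel detector input from still and BIV-processed frames.
import numpy as np

def build_detector_input(still_frame, biv_frame):
    """still_frame: HxWx3 uint8 image; biv_frame: HxW motion-enhanced map (placeholder
    for the output of the BIV model)."""
    biv = biv_frame.astype(np.uint8)[..., None]
    return np.concatenate([still_frame, biv], axis=2)   # HxWx4 network input
```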
Mastering the content quality, diversity, and context representativeness of a database is a key step toward efficiently trained deep learning models. This project aims at controlling the relevant hyper-parameters of training datasets in order to guarantee mission performance and to contribute to the explainability of model behavior. In this presentation, we show an approach to designing DRI (Detection, Recognition and Identification) algorithms for military vehicles with different acquisition sources. Starting from the definition of a mission-agnostic image database, this study focuses on controlled image acquisition sources which automate the collection of few but relevant object signatures and their metadata, e.g. bounding box, segmentation mask, view angles, object orientations, and lighting conditions. By putting the accent on the acquisition of a reduced number of images coupled with data augmentation techniques, we intend to demonstrate a dataset creation method that is fast, efficient, controlled, and easily adaptable to new mission scenarios and contexts. This study compares three different sources: an optical acquisition bench for scaled vehicle models, 3D scanning of scaled models, and 3D graphic models of vehicles. The challenge is to make predictions on real situations with a neural network model trained only on the generated images. First results obtained with the datasets extracted from the 3D environment, with graphic models and with scanned scaled ones, do not yet reach the performance levels previously obtained with the 2D acquisition bench. Further investigations are needed to understand the influence of the numerous hyper-parameters introduced by the 3D environment simulation.
Automated object detection is becoming more relevant in a wide variety of applications in the military domain. This includes the detection of drones, ships, and vehicles in visual and IR video. In recent years, deep learning based object detection methods, such as YOLO, have been shown to be promising in many applications. However, current methods have limited success when objects of interest are small in number of pixels, e.g. objects far away or small objects closer by. This is important, since accurate small object detection translates to early detection, and the earlier an object is detected, the more time is available for action. In this study, we investigate novel image analysis techniques that are designed to address some of the challenges of (very) small object detection by taking temporal information into account. We implement six methods, of which three are based on deep learning and use the temporal context of a set of frames within a video. These methods consider neighboring frames when detecting objects, either by stacking them as additional channels or by considering difference maps. We compare these spatio-temporal deep learning methods with YOLO-v8, which only considers single frames, and with two traditional moving object detection methods. Evaluation is done on a set of videos that encompasses a wide variety of challenges, including various objects, scenes, and acquisition conditions, to show real-world performance.
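The two temporal-input variants mentioned above can be sketched as follows; the channel layouts are assumptions about one reasonable arrangement, not the exact inputs used in the study.

```python
# Hedged sketch of temporal inputs: frame stacking vs. difference maps.
import numpy as np

def stack_frames(prev_frame, curr_frame, next_frame):
    """Grayscale HxW frames stacked as a 3-channel temporal input."""
    return np.stack([prev_frame, curr_frame, next_frame], axis=2)

def difference_maps(prev_frame, curr_frame, next_frame):
    """Current frame plus backward/forward absolute difference maps as channels."""
    d_prev = np.abs(curr_frame.astype(np.int16) - prev_frame).astype(np.uint8)
    d_next = np.abs(curr_frame.astype(np.int16) - next_frame).astype(np.uint8)
    return np.stack([curr_frame, d_prev, d_next], axis=2)
```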
An integrated system for processing sensor data has been developed based on novel variational autoencoder (VAE) algorithms with explainability that significantly eases analysis of sensor data. By continuously updating a generative model of the data, the system assists users with minimal artificial intelligence (AI) training or experience to perform data analysis. The system performs an extensive range of integrated machine learning (ML) tasks: anomaly detection, active learning, model-drift detection, synthetic data generation, semi-supervised classification, and counterfactual explanation generation. When the system is provided a data schema (map of Booleans, integers, reals, categories, time series, etc.) and data set, it automatically forms a preliminary generative model of the data. The construction of the system is modular, so new data types can be added as necessary. Counterfactually explainable anomaly detection is immediately performed via sparse gradient search. This informs the user how to interactively remove or repair bad records and/or begin labeling records of interest. The addition of labels to the data allows multi-class, semi-supervised, counterfactually explainable classification via the support vector machine embedded hyperplane algorithm (SVM-EH). Once some labels are added, active learning is used to assist further labeling by suggesting data elements that are highly likely to improve classification accuracy, significantly accelerating the labeling process by trading human effort for computational cycles. In production, the system detects when its training is becoming stale and requests retraining.
A new deep learning algorithm for performing anomaly detection and multi-class classification with explainability using counterfactuals is described. The system is a Variational Autoencoder (VAE) with a modified loss function and new methods for counterfactual identification. An additional hinge-loss term is added to VAE training. This enables convenient synthetic data generation and allows straightforward construction of multi-class counterfactuals. Counterfactuals are synthetic data generated to explain system decisions by answering the question: "If this data were not anomalous, or were in another class, what modifications would need to be made?" To find counterfactuals, a path is traced through the embedding space via adversarial-attack-like techniques to minimize reconstruction error, with the restriction of minimally altering the number of columns changed. Large changes are allowed, unlike in adversarial attack approaches, so changes are isolated and easily visible. Anomaly-detection counterfactuals are generated by modifying a result to lower its anomalousness; classification counterfactuals are generated by modifying the data toward another class. Multi-class classification is performed on the embedding space of the VAE via an attached linear support vector machine (SVM). By adding the hinge-loss term to the VAE embedding training as well as the SVM, the embedding is modified to prefer class separation without being informed of the specific class labels. This causes the classes in the embedding space to be separated by hyperplanes, making counterfactual generation convenient and SVM classification accurate. Accuracy is shown to be comparable to other deep learners. Approaches to accommodating image and time-series data are discussed.
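As a hedged illustration of the counterfactual search, the sketch below perturbs an input with an adversarial-attack-like gradient loop so that an attached classifier assigns a target class, with an L1 penalty favouring few changed columns; the encoder/classifier interfaces, loss weights, and optimizer are assumptions, not the paper's implementation.

```python
# Hedged sketch of gradient-based counterfactual generation with a sparsity penalty.
import torch
import torch.nn.functional as F

def counterfactual(encoder, classifier, x, target_class, steps=200, lr=0.05, l1_weight=0.1):
    """encoder: maps an input row to a latent vector (assumed interface);
    classifier: maps the latent vector to class logits; x: 1 x D input tensor."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        logits = classifier(encoder(x + delta))
        # Push toward the target class while keeping the number of changed columns small.
        loss = F.cross_entropy(logits, target) + l1_weight * delta.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach()      # candidate counterfactual record
```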
Electro-optical (EO) sensors are essential for surveillance in military and security applications. Recent technological advancements, especially developments in Deep Learning (DL), have enabled improved object detection and tracking in complex and dynamic environments. Most of this research focuses on readily available visible-light (VIS) images. To apply these technologies to thermal infrared (TIR) imagery, DL networks can be retrained using image data from the TIR domain. However, such a training set with enough samples is not easily available. This paper presents an unsupervised domain adaptation method for ship detection in TIR imagery using paired VIS and TIR images. The proposed method leverages the pairing of VIS and TIR images and performs domain adaptation using detections in the VIS imagery as ground truth, providing data for TIR-domain learning. The method performs ship detection on the VIS images using a pretrained convolutional neural network (CNN). These detections are subsequently improved using a tracking algorithm. The proposed TIR object detection model follows a two-stage training process. In the first stage, the model's head is trained, which consists of the regression layers that output the bounding boxes of the detected objects. In the second stage, the model's feature extractor is trained to learn more discriminative features. The method is evaluated on a dataset of recordings at Rotterdam harbor. Experiments demonstrate that the resulting TIR detector performs comparably with its VIS counterpart, in addition to providing reliable detections in adverse environmental conditions where the VIS model fails. The proposed method has significant potential for real-world applications, including maritime surveillance.
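The two-stage schedule can be pictured as follows, assuming the model exposes `backbone` and `head` modules and a user-supplied `train_one_epoch` routine; this is a sketch of the training logistics, not the authors' code.

```python
# Hedged sketch: train the detection head first, then fine-tune the feature extractor.
import torch

def two_stage_training(model, train_one_epoch, loader, epochs_head=5, epochs_full=10):
    # Stage 1: freeze the feature extractor and train only the regression head
    # on VIS-derived pseudo-labels.
    for p in model.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
    for _ in range(epochs_head):
        train_one_epoch(model, loader, opt)
    # Stage 2: unfreeze the feature extractor to learn more discriminative TIR features.
    for p in model.backbone.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs_full):
        train_one_epoch(model, loader, opt)
```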
Semantic segmentation of aerial images is a critical task in various domains such as urban planning and monitoring deforestation or critical infrastructure. However, the annotation process required for training accurate segmentation models is often time-consuming and labor-intensive. This paper presents a novel approach to address this challenge by leveraging the power of clustering techniques applied to the embeddings obtained from a SimCLRv2 model pretrained on the ImageNet dataset. By using this clustering approach, fewer training samples are needed, and the annotation only needs to be done for each cluster instead of each pixel in the image, significantly reducing the annotation time. Our proposed method uses SimCLRv2 to obtain rich feature representations (embeddings) from a dataset of unlabeled aerial images. These embeddings are then clustered, enabling the grouping of semantically similar image regions. In addition to directly using these clusters as class labels, we can treat the clusters as pseudo-classes, allowing us to construct a pseudo-label dataset for fine-tuning a segmentation network. Through experiments conducted on two benchmark aerial image datasets (Potsdam and Vaihingen), we demonstrate the effectiveness of our approach in achieving segmentation results in line with similar works on few-shot segmentation while significantly reducing the annotation effort required, thereby highlighting its practical applicability. Overall, the combination of SimCLRv2 embeddings and clustering techniques presents a promising avenue for achieving accurate image segmentation while minimizing the annotation burden, making it highly relevant for remote sensing applications and aerial imagery analysis.
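A minimal sketch of the clustering step, assuming scikit-learn KMeans over patch embeddings, is given below; the patch layout and cluster count are illustrative assumptions.

```python
# Hedged sketch: turn pretrained-encoder embeddings into per-cluster pseudo-labels.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels_from_embeddings(embeddings, patch_grid_shape, n_clusters=6):
    """embeddings: (N_patches, D) features from a SimCLRv2-style pretrained encoder.
    patch_grid_shape: (rows, cols) layout of the patches within an image tile,
    with rows * cols == N_patches."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(embeddings)
    # Each cluster is annotated once (e.g. "building", "vegetation") instead of every
    # pixel; the resulting map can then serve as pseudo-labels for a segmentation net.
    return labels.reshape(patch_grid_shape)
```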
Unmanned systems are currently being deployed more and more in many defense and security applications. A main challenge is to operate these systems more autonomously in the open world while keeping meaningful human control. The main elements to achieve this are planning potential actions, developing situational awareness, and providing self-awareness of the system. In this talk, we discuss this challenge for a single unmanned system and for a group of unmanned systems with respect to imaging sensors, and present approaches for the elements needed.
Deep learning has emerged as a powerful tool for image analysis in various fields including the military domain. It has the potential to automate and enhance tasks such as object detection, classification, and tracking. Training images for development of such models are typically scarce, due to the restricted nature of this type of data. Consequently, researchers have focused on using synthetic data for model development, since simulated images are fast to generate and can, in theory, make up a large and diverse data set. When using simulated training data it is important to consider the variety needed to bridge the gap between simulated and real data. So far it is not fully understood what variations are important and how much variation is needed. In this study, we investigate the effect of simulation variety. We do so for the development of a deep learning-based military vehicle detector that is evaluated on real-world images of military vehicles. To construct the synthetic training data, 3D models of the vehicles are placed in front of diverse background scenes. We experiment with the number of images, background scene variations, 3D model variations, model textures, camera-object distance, and various object rotations. The insight that we gain can be used to prioritize future efforts towards creating synthetic data for deep learning-based object detection models.
While deep learning is well established in image analysis in the visual domain, processing thermal infrared data still relies mainly on conventional methods. For change detection in particular, one may even rely on statistical approaches. The reason behind this is the limited quantity of adequate thermal imagery needed to train deep learning models and the challenging characteristics of thermal imagery itself, which, unlike RGB data, is strongly dependent on the underlying materials, the temporal evolution of environmental conditions, and the scene composition. We therefore aim at generating a new dataset of synthetic thermal imagery which is specifically designed to allow the application of deep learning methods in the area of UAV-based reconnaissance by change detection. In this paper, we present our technical approach to generating this dataset and our preliminary results. We limit the simulated changes within the dataset to two objects of interest, i.e. tanks and landmines, and both object types are rather simplified for preliminary testing. We outline the state-of-the-art methodologies for generating synthetic thermal data and verify them with respect to requirements that we deduced from a comprehensive literature review on both deep-learning-based change detection and thermal imagery simulation. We find that the given methods do not fully comply with the requirements on thermal training data. Therefore, two datasets are generated: thermal imagery based on standard methods, and thermal imagery that meets the requirements. By comparison, our preliminary findings show significant differences in image features which potentially affect the training of deep learning models.
The capabilities of machine learning algorithms for observing image-based scenes and recognizing embedded targets have been demonstrated by data scientists and computer vision engineers. Performant algorithms must be well-trained to complete such a complex task automatically, and this requires a large set of training data on which to base statistical predictions. For electro-optical infrared (EO/IR) remote sensing applications, a substantial image database with suitable variation is necessary. Numerous times of day, sensor perspectives, scene backgrounds, weather conditions and target mission profiles could be included in the training image set to ensure sufficient variety. Acquiring such a diverse image set from measured sources can be a challenge; generating synthetic imagery with appropriate features is possible but must be done with care if robust training is to be accomplished. In this work, MuSES™ and CoTherm™ are used to generate synthetic EO/IR remote sensing imagery of various high-value targets with a range of environmental factors. The impact of simulation choices on image generation and algorithm performance is studied with standard computer vision deep learning convolutional neural networks and a measured imagery benchmark. Differences discovered in the usage and efficacy of synthetic and measured imagery are reported.
The use of deep neural networks (DNNs) is the dominant approach for image classification, as it achieves state-of-the-art performance when sufficiently large training datasets are available. The best DNN performance is reached when test data conditions are similar to the training data conditions. However, if the test conditions differ, there will usually be a loss of classification performance, for instance when the test targets are more distant, blurry, or occluded than those observed in the training data. It is desirable to have an estimate of the expected classification performance prior to using a DNN in practice. A low expected performance may render the DNN unsuitable for the operational task at hand. While the effect of a single changed test condition on classification performance has been investigated before, this paper studies the combined effect of multiple changed test conditions. In particular, we compare two prediction models for estimating the expected performance relative to the DNN performance on the development data. Our approach allows performance estimation in operation based on knowledge of the expected operational conditions, but without having access to operational data itself. We investigate the aforementioned steps for image classification on the MARVEL vessel dataset and the Stanford Cars dataset. The changing test conditions consist of several common image degradations that are imposed on the original images. We find that the prediction models produce acceptable results in the case of small degradations and when degradations show a constant accuracy falloff over their range.
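One plausible, heavily hedged form of such a prediction model combines the relative accuracy drops measured for each degradation in isolation multiplicatively; this is an assumption for illustration only and is not necessarily either of the models compared in the paper.

```python
# Hypothetical sketch: predict accuracy under combined degradations from
# single-degradation measurements on development data.
def predict_combined_accuracy(baseline_acc, single_degradation_accs):
    """baseline_acc: accuracy on undegraded development data.
    single_degradation_accs: accuracies measured with one degradation applied at a time."""
    predicted = baseline_acc
    for acc in single_degradation_accs:
        predicted *= acc / baseline_acc      # relative drop attributed to that degradation
    return predicted
```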
Fourier-domain correlation approaches have been successful in a variety of image comparison tasks but fail when the scenes, patterns, or objects in the images are distorted. Here, we utilize the sequential training of shallow neural networks on Fourier-preprocessed video to infer 3-D movement. The bio-inspired pipeline learns x-, y-, and z-direction movement from high-frame-rate, low-resolution, Fourier-domain preprocessed inputs (either cross power spectra or phase correlation data). Our pipeline leverages the high sensitivity of Fourier methods in a manner that is resilient to the parallax distortion of a forward-facing camera. Via sequential training over several path trajectories, the models generalize to predict 3-D movement in unseen trajectory environments. Models with no hidden layer are less accurate initially but converge faster with sequential training over different flightpaths. Our results show important considerations and trade-offs between input data preprocessing (compression) and model complexity (convergence).
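The Fourier-domain preprocessing can be sketched as follows: the normalised cross power spectrum of two frames, whose inverse transform yields the phase-correlation surface that serves as network input. The normalisation constant is an assumption for numerical stability.

```python
# Minimal sketch of cross power spectrum / phase correlation preprocessing.
import numpy as np

def cross_power_spectrum(frame_a, frame_b, eps=1e-8):
    """frame_a, frame_b: 2-D grayscale arrays of equal size from consecutive time steps."""
    Fa = np.fft.fft2(frame_a)
    Fb = np.fft.fft2(frame_b)
    cps = Fa * np.conj(Fb)
    cps /= (np.abs(cps) + eps)               # normalise to keep only phase information
    phase_corr = np.fft.ifft2(cps).real      # phase-correlation surface
    return cps, phase_corr
```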
One of the most important problems encountered by face recognition systems is occlusion. Especially since the COVID-19 pandemic, recognition of masked faces has been attracting interest. Visible-spectrum recognition algorithms have been proposed that perform the recognition task from the uncovered part of the face. However, the reliability and accuracy of these algorithms do not reach the level of traditional face recognition algorithms, even though they outperform human observers. Furthermore, these approaches do not provide a solution when almost all of the face area is covered. In the past, the use of millimeter/submillimeter (SMMW/MMW) and terahertz (THz) imaging has been considered as an alternative to visible-spectrum systems, addressing their limitations such as changing lighting conditions and occlusions. In this study, several performance characteristics of an active THz imaging system operating at 340 GHz are presented for a face recognition approach based on a similarity comparison of THz face images. Here we examine the dynamic range, contrast resolution, spatial resolution, pixel resolution, and noise level of the imaging system. Furthermore, results of analyses performed on a set of THz images of the head areas of 20 individuals are presented. The results indicate that THz imaging is sensitive to facial characteristics through clothing material, which can be exploited for a biometric approach based on THz imaging for recognition of concealed faces.