Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252901 (2023) https://doi.org/10.1117/12.2690655
This PDF file contains the front matter associated with SPIE Proceedings Volume 12529, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
The Unsung Hero: How Synthetic Data has Helped Computer Vision, Machine Learning, and AI
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252902 (2023) https://doi.org/10.1117/12.2666937
This conference presentation was prepared for the Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications conference at SPIE Defense + Commercial Sensing 2023.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252903 (2023) https://doi.org/10.1117/12.2663879
Deep learning neural networks require large amounts of data to train properly for detection and recognition. Adequate infrared training data is not available, especially for military-specific applications. This paper describes the Collaborative Research and Development Agreement (CRADA) between the U.S. Army Combat Capabilities Development Command Aviation & Missile Center (DEVCOM AvMC) and Lockheed Martin Missiles and Fire Control (LM-MFC), in effect since early 2020. Its purpose is to collaboratively research and develop a practical understanding of the value of using synthetically generated IR imagery to train modern deep learning algorithms for military applications so that they perform well on field-collected real IR data. This paper details the effort to generate synthetic IR data for algorithm training, the networks and real test data used for evaluation, and the detection and recognition results achieved. Particular attention was given to creating a set of synthetic IR images having some degree of similarity with a set of real sensor images without purposely trying to replicate the real sensor images. To that end, target and background models were derived or selected based on the general thermal conditions present during the real sensor data collections, and an empirically derived sensor model was used when creating the synthetic IR images. The synthetic IR image set has been approved for public release, and the real IR image set is part of the Defense Systems Information Analysis Center (DSIAC) Automatic Target Recognition Algorithm Development Image Database (ATR ADID).
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252904 (2023) https://doi.org/10.1117/12.2665012
To achieve state-of-the-art classification and detection performance with modern deep learning approaches, large amounts of labeled data are required. In the infrared (IR) domain, the required quantity of data can be prohibitively expensive and time-consuming to acquire. This makes the generation of synthetic data an attractive alternative. The well-known Unreal Engine (UE) software supports multispectral simulation add-on packages that provide a degree of physical realism, offering a possible avenue for generating such data. However, significant technical challenges remain in designing a synthetic IR dataset: varying class, position, object size, and many other factors is critical to achieving a training dataset useful for object detection and classification. In this work we explore these critical axes of variation using standard CNN architectures, evaluating a large UE training set on a real IR validation set, and provide guidelines for variation in many of these critical dimensions for multiple machine learning problems.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252905 (2023) https://doi.org/10.1117/12.2657575
Current-generation artificial intelligence (AI) is heavily reliant on data and supervised learning (SL). However, dense and accurate truth for SL is often a bottleneck, and any imperfections can negatively impact performance and/or introduce biases. As a result, several corrective lines of research are being explored, including simulation (SIM). In this article, we discuss fundamental limitations in obtaining truth, both in the physical universe and in SIM, and explore different truth uncertainty modeling strategies. A case study from data-driven monocular vision is provided. These experiments demonstrate performance variability with respect to different truth uncertainty strategies in training and evaluating AI algorithms.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252906 (2023) https://doi.org/10.1117/12.2664031
Real imagery and video data from cameras are frequently needed to conduct research and experiments for model development, algorithm training, and more. When collecting real imagery and video with cameras in uncontrolled environments, environmental conditions such as temperature and sun angle can change over time and cause image quality to change in an unpredictable and undesirable manner. Due to the limited availability of military targets, limited range availability, the vast personnel support needed, and the typically high costs associated with conducting data collections in the field, it is imperative that low-quality data is not unintentionally collected. Moreover, a need exists to increase automation in order to reduce the manpower needed during data collections. To address such issues, this paper describes a software utility incorporating various image quality metrics (IQMs), which can enable automatic monitoring of the quality of collected imagery and video data at low cost and with minimal modification of the imaging system. As part of the utility, an automated alert algorithm based on a majority vote is discussed, along with a selection of suitable IQMs according to their characteristics and temporal noise filtering for stable decision making. Design criteria for optimal performance of the automated alert algorithm are presented. Also discussed is a practical application scenario that demonstrates the capabilities and limitations of the alert system using both real and synthetic video examples.
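As an illustration of how such a utility might be structured, the sketch below applies a temporal median filter to a handful of simple per-frame IQMs and raises an alert when a majority of the filtered metrics fall outside acceptable bounds. The metric choices, thresholds, and window length are illustrative assumptions, not the paper's implementation.

```python
from collections import deque
import numpy as np

def iqm_scores(frame):
    """Compute simple per-frame IQMs: mean brightness, RMS contrast,
    and a second-difference sharpness proxy."""
    f = frame.astype(np.float64)
    return np.array([
        f.mean(),                                # brightness
        f.std(),                                 # global contrast
        np.abs(np.diff(f, n=2, axis=0)).mean(),  # crude focus/sharpness proxy
    ])

class QualityAlert:
    def __init__(self, low, high, window=9):
        self.low, self.high = np.asarray(low), np.asarray(high)
        self.history = deque(maxlen=window)      # temporal noise filtering

    def update(self, frame):
        self.history.append(iqm_scores(frame))
        filtered = np.median(np.stack(self.history), axis=0)
        out_of_bounds = (filtered < self.low) | (filtered > self.high)
        return out_of_bounds.sum() > len(out_of_bounds) // 2  # majority vote

alert = QualityAlert(low=[20, 5, 0.1], high=[235, 80, 50.0])
frame = (np.random.rand(480, 640) * 255).astype(np.uint8)
print("alert:", alert.update(frame))
```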
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252907 (2023) https://doi.org/10.1117/12.2665017
Achieving state-of-the-art performance with CNNs (Convolutional Neural Networks) on IR (infrared) detection and classification problems requires significant quantities of labeled training data. Real data in this domain can be both expensive and time-consuming to acquire. Synthetic data generation techniques have made significant gains in efficiency and realism in recent work and provide an attractive, much cheaper alternative to collecting real data. However, the salient differences between synthetic and real IR data still constitute a “realism gap”, meaning that synthetic data is not as effective as real data for training CNNs. In this work we explore the use of image compositing techniques to combine real and synthetic IR data, improving realism while retaining many of the efficiency benefits of the synthetic data approach. In addition, we demonstrate the importance of controlling the object size distribution (in pixels) of synthetic IR training sets. By evaluating synthetically trained models on real IR data, we show notable improvement over previous synthetic IR data approaches and suggest guidelines for enhanced performance with future training dataset generation.
Chris Goodin, Daniel W. Carruth, Lalitha Dabbiru, Michael Hedrick, Zachary S. Aspin, Justin T. Carrillo, John Kaniarz
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252908 (2023) https://doi.org/10.1117/12.2661663
Simulation has become an important enabler in the development and testing of autonomous ground vehicles (AGV), with simulation being used both to generate training data for AI/ML-based segmentation and classification algorithms and to enable in-the-loop testing of the AGV systems that use those algorithms. Furthermore, digital twins of physical test areas provide a safe, repeatable way to conduct critical safety and performance testing of these AI/ML algorithms and their performance on AGV systems. For both these digital twins and the sensor models that use them to generate synthetic data, it is important to understand the relationship between the fidelity of the scene/model and the accuracy of the resulting synthetic sensor data. This work presents a quantitative evaluation of the relationship between digital scene fidelity, sensor model fidelity, and the quality of the resulting synthetic sensor data, with a focus on camera data typically used on AGV to enable autonomous navigation.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290A (2023) https://doi.org/10.1117/12.2663571
Machine learning algorithms have demonstrated state-of-the-art automated target recognition performance but require a large training set. In the case of electro-optical infrared (EO/IR) remote sensing, acquiring sufficient measured imagery can be difficult, but EO/IR scene simulation is a possible alternative. CoTherm, a co-simulation tool which operates MuSES in an automated fashion, is used to manipulate relevant target, background and sensor inputs to generate a library of radiance images. Various options affecting simulation run-time and output fidelity are considered and the trade-off between accuracy and compute time requirements is quantified using a measured imagery benchmark and ResNets for classification.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290B (2023) https://doi.org/10.1117/12.2662738
A common constraint in synthetic data generation is the need to evaluate time- and resource-intensive equations to model physical systems of interest. In fact, one often needs to evaluate many such models to build up to the real system of interest. In some cases, it is possible to identify a key set of independent variables that govern the equations of interest, and one can build a lookup table for interpolation. However, the downside to this strategy is that many computational resources will be spent computing values that may never be used during a simulation. In this paper, we present a new strategy to lazily evaluate complex calculations, building these multi-dimensional lookup tables as needed. The technique relies on the fact that some models can reuse partial calculations to generate multiple results in a single invocation. This allows generating a base table in the neighborhood of the initial point of interest, after which the table is grown as the parameter space expands. This reduces the initial computational cost, and the resultant table can be saved for reuse if desired. In a multiprocessing environment, it would also be possible to generate additional table entries in parallel if those points of interest are known in advance. As a specific example, we apply this technique to computing atmospheric corrections for synthetic image generation.
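A minimal sketch of the lazy-evaluation idea follows: grid values are computed on demand in a neighborhood of the queried point (mimicking the reuse of partial calculations within one invocation), cached, and used for linear interpolation. The stand-in model function and grid parameters are assumptions for illustration.

```python
import numpy as np

def expensive_model(x_grid):
    """Stand-in for a costly physics code that can return several
    outputs per invocation by reusing partial calculations."""
    return np.sin(x_grid) * np.exp(-0.1 * x_grid)

class LazyTable:
    def __init__(self, step=0.5, halo=2):
        self.step, self.halo = step, halo
        self.cache = {}  # grid index -> model value

    def _ensure(self, idx):
        missing = [i for i in range(idx - self.halo, idx + self.halo + 1)
                   if i not in self.cache]
        if missing:  # one invocation fills the whole neighborhood
            vals = expensive_model(np.array(missing) * self.step)
            self.cache.update(zip(missing, vals))

    def __call__(self, x):
        i = int(np.floor(x / self.step))
        self._ensure(i)
        x0 = i * self.step
        t = (x - x0) / self.step  # linear interpolation between grid points
        return (1 - t) * self.cache[i] + t * self.cache[i + 1]

table = LazyTable()
print(table(1.3), table(1.4))  # second call hits the cached neighborhood
```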
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290C (2023) https://doi.org/10.1117/12.2663612
Recent years have seen impressive progress in Automatic Target Recognition (ATR) technology, both in the visible and non-visible spectra, which introduces an important challenge to the Army: understanding gaps in ATR algorithms’ feature space for informed design methodology. To tackle this challenge, we look at a combination of synthetic data and adversarial learning techniques to explore the feature space of Machine Learning (ML) algorithms. Adversarial learning, however, requires large amounts of training data representing diversity in terms of target pose, lighting, and environmental conditions. Often the main bottleneck is collecting and labeling this real training data. The problem is exacerbated in infrared (IR) given unique challenges due to material and thermal variation. Here, we present a solution based on a simulator that supports generation of physically accurate custom synthetic IR training data; this data is then leveraged to systematically study weaknesses in a state-of-the-art ATR algorithm that is often used in practice, YOLOv5. We will present results showing that this approach can lead to critical insight on algorithm weaknesses with practical consequence for the design of defense mechanisms against ATR technology as well as improved training of ML algorithms to reduce feature space vulnerabilities.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290D (2023) https://doi.org/10.1117/12.2663064
Synthetic data is often leveraged for training and testing Automatic Target Recognition (ATR) systems on a variety of operating conditions (OCs). Existing mechanisms for creating the sampling distribution of OCs to generate this data are difficult to visualize, modify, and extend. To address this, we created a user interface and toolchain for multi-modal OC sampling from probabilistic graphical models (PGMs). Our web browser-based interface allows for visualizing the PGMs, modifying the conditional probability distributions of their nodes, importing and exporting their state, operation in single- or multi-modality configurations, and persisting generated samples to a relational database. The Vue-driven web interface, programmatic interface, as well as the Python machinery for OC sampling and persistence have been containerized to allow for simplified deployment and distribution. The work described here supplies a new baseline for OC generation for single- and multi-sensor simulation and fusion.
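The sketch below illustrates the underlying sampling idea: ancestral sampling of operating conditions from small conditional probability tables. The variables and probabilities are invented for illustration; the toolchain described above wraps this machinery in a web interface and a relational database.

```python
import random

# Prior over weather, and sensor-band choice conditioned on weather
# (illustrative values only).
P_WEATHER = {"clear": 0.6, "rain": 0.3, "fog": 0.1}
P_BAND_GIVEN_WEATHER = {
    "clear": {"visible": 0.7, "lwir": 0.3},
    "rain":  {"visible": 0.3, "lwir": 0.7},
    "fog":   {"visible": 0.1, "lwir": 0.9},
}

def draw(dist):
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return random.choices(outcomes, weights=weights)[0]

def sample_oc():
    weather = draw(P_WEATHER)                    # sample root node first,
    band = draw(P_BAND_GIVEN_WEATHER[weather])   # then its children
    return {"weather": weather, "band": band}

print([sample_oc() for _ in range(3)])
```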
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290E (2023) https://doi.org/10.1117/12.2664707
In this paper, we propose a comprehensive approach for the effective simulation of multi-object SAR imagery datasets using the IRIS electromagnetic modeling and simulation system (IRIS-EM). Further, we describe our methodology for the systematic generation of large-scale simulated SAR datasets of multi-domain (air, ground, sea) vehicles in multi-object scenes. Differing from our earlier work, in this study we considered the impact of having multiple objects in the same scene. We discuss the challenges associated with generating simulated SAR imagery datasets of multiple vehicles for the training of DL-based surveillance systems. Lastly, we present novel automation techniques ensuring realistic multi-vehicle placement in a test scene while sustaining real-world representation with fidelity.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290F (2023) https://doi.org/10.1117/12.2663632
Digital twins of real environments are valuable tools for generating realistic synthetic data and performing simulations with artificial intelligence and machine learning models. Creating digital twins of urban, on-road environments has been extensively researched in light of rising momentum in urban planning and autonomous vehicle systems; yet creating digital twins of rugged, off-road environments such as forests, farms, and mountainous areas is still poorly studied. In this work, we propose a pipeline to produce digital twins of off-road environments with a focus on modeling vegetation and uneven terrain. A point cloud map of the off-road environment is first reconstructed using LiDAR scans paired with scan registration algorithms. Terrain segmentation, vegetation segmentation, and Euclidean clustering are applied to separate point cloud objects into individual entities within the digital twin model. Experimental validation is carried out using LiDAR scans collected from an off-road proving ground at the Center for Advanced Vehicular Systems (CAVS) at Mississippi State University. A prototype system is demonstrated with the Mississippi State University Autonomous Vehicle Simulator (MAVS), and the source code and data are publicly available. The proposed framework has a wide range of applications, including virtual autonomous vehicle testing, synthetic data generation, and training of AI models.
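As a hedged illustration of the clustering step, the sketch below uses Open3D's density-based clustering to separate an already terrain-filtered point cloud into individual entities. The file name and DBSCAN parameters are assumptions; the authors' pipeline and MAVS integration are not reproduced here.

```python
import numpy as np
import open3d as o3d

# Hypothetical input: a LiDAR map with terrain points already removed.
pcd = o3d.io.read_point_cloud("offroad_above_ground.pcd")

# Density-based Euclidean clustering; points within `eps` meters group
# together, and sparse returns are labeled -1 (noise) and skipped below.
labels = np.array(pcd.cluster_dbscan(eps=0.5, min_points=20))

for k in range(labels.max() + 1):
    entity = pcd.select_by_index(np.where(labels == k)[0])
    print(f"entity {k}: {len(entity.points)} points")  # e.g., one tree or bush
```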
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290G (2023) https://doi.org/10.1117/12.2663417
Unmanned combat aerial vehicles (i.e., drones) are changing the surveillance, security, and conflict landscape of the modern geopolitical stage. Various technologies and solutions can help track drones; each technology has different advantages and limitations concerning drone size and detection range. Machine learning (ML) can automatically detect and track drones in real time while surpassing human-level accuracy and providing enhanced situational awareness. Unfortunately, ML's power depends on the data's quality and quantity. In the drone detection task scenario, limited datasets provide limited environmental variation, view angle, view distance, and drone type. We developed a customizable software tool called DyViR that generates large synthetic video datasets for training machine learning algorithms in aerial threat object detection. These datasets contain video and audio renderings of aerial objects within user-specified dynamic simulated biomes (i.e., arctic, desert, and forest). Users can alter the environment on a timeline, allowing changes to behaviors such as drone flight patterns and weather conditions across a synthetically generated dataset. DyViR supports additional controls such as motion blur, anti-aliasing, and fully dynamic moving cameras to produce imagery across multiple viewing angles. Each aerial object's classification (drone or airplane) and bounding box data automatically export to a comma-separated-value (CSV) file and a video to form a synthetic dataset. We demonstrate the value of DyViR by training a real-time YOLOv7-tiny model on these synthetic datasets. The performance of the object detection model improved by 60.4% over its counterpart not using DyViR. This result suggests a use case for synthetic datasets to surmount the lack of real-world training data for aerial threat object detection.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290H (2023) https://doi.org/10.1117/12.2663918
Empirical and high-fidelity computational target models have routinely been inserted into synthetic imagery under conditions for which they were not created. Data collections are expensive and cannot be performed for all times and under all conditions. High-fidelity computational models can be time-consuming to generate and require engineers with specific expertise. Therefore, instead of collecting or generating new data and models, previously generated models are used, often with hand-waving arguments that the model is close enough. DEVCOM Aviation and Missile Center has developed a tool to fill in the gap, using a physics-based calibrated model, based on either the empirical or high-fidelity models, to more easily and rapidly generate a new model under the specific scenario under investigation. This tool has also been used to help generate large quantities of synthetic target imagery for training AI algorithms.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290I (2023) https://doi.org/10.1117/12.2664109
We consider the problem of synthetically generating data that can closely resemble human decisions made in the context of an interactive human-AI system like a computer game. We propose a novel algorithm that can generate synthetic, human-like, decision making data while starting from a very small set of decision making data collected from humans. Our proposed algorithm integrates the concept of reward shaping with an imitation learning algorithm to generate the synthetic data. We have validated our synthetic data generation technique by using the synthetically generated data as a surrogate for human interaction data to solve three sequential decision making tasks of increasing complexity within a small computer game-like setup. Different empirical and statistical analyses of our results show that the synthetically generated data can substitute the human data and perform the game-playing tasks almost indistinguishably, with very low divergence, from a human performing the same tasks.
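For reference, the standard potential-based form of reward shaping (Ng et al., 1999) augments the environment reward with a potential difference, which leaves the optimal policy unchanged; the paper's exact shaping term, presumably derived from the small human dataset, may differ.

```latex
% Potential-based reward shaping: augmenting the environment reward R
% with a potential difference preserves the optimal policy. Here \Phi(s)
% could, for example, score similarity to demonstrated human states
% (an assumption, not necessarily the authors' choice of shaping term).
R'(s, a, s') = R(s, a, s') + \gamma \, \Phi(s') - \Phi(s)
```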
Jeffrey Kerley, Derek T. Anderson, Brendan Alvey, Andrew Buck
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290J (2023) https://doi.org/10.1117/12.2663717
Large and diverse datasets can now be simulated with associated truth to train and evaluate AI/ML algorithms. This convergence of readily accessible simulation (SIM) tools, real-time high-performance computing, and large repositories of high-quality, free-to-inexpensive photorealistic scanned assets is a potential artificial intelligence (AI) and machine learning (ML) gamechanger. While this feat is now within our grasp, what SIM data should be generated, how should it be generated, and how can this be achieved in a controlled and scalable fashion? First, we discuss a formal procedural language for specifying scenes (LSCENE) and collecting sampled datasets (LCAP). Second, we discuss specifics regarding our production and storage of data, ground truth, and metadata. Last, two LSCENE/LCAP examples are discussed and three unmanned aerial vehicle (UAV) AI/ML use cases are provided to demonstrate the range and behavior of the proposed ideas. Overall, this article is a step towards closed-loop automated AI/ML design and evaluation.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290K (2023) https://doi.org/10.1117/12.2662276
Widespread adoption of artificial intelligence (AI) in civilian and defense government agencies requires stakeholders to have trust in AI solutions. One of the five principles of ethical AI identified by the Department of Defense emphasizes that AI solutions be equitable. An AI system involves a series of choices, from data selection to model definition, each of which is subject to human and algorithmic biases and can lead to unintended consequences. This paper focuses on enabling AI bias mitigation with the use of synthetic data. The proposed technique, named Fair-GAN, builds upon the recently developed Fair-SMOTE approach, which used synthesized data to fix class and other imbalances caused by protected attributes such as race and gender. Fair-GAN uses Generative Adversarial Networks (GANs) instead of the Synthetic Minority Oversampling Technique (SMOTE). While SMOTE can only synthesize tabular, numerical data, GANs can synthesize tabular data with numerical, binary, and categorical variables, as well as other data forms such as images, audio, and text. In our experiments, we use the Synthetic Data Vault (SDV), which implements approaches such as the conditional tabular GAN (CTGAN) and tabular variational autoencoders (TVAE). We show the applicability of Fair-GAN to several benchmark problems used to evaluate the efficacy of AI bias mitigation algorithms. It is shown that Fair-GAN leads to significant improvements in metrics used for evaluating AI fairness, such as the statistical parity difference, disparate impact, average odds difference, and equal opportunity difference.
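A minimal sketch of the oversampling step with SDV's CTGAN is shown below, in the spirit of Fair-GAN. The column names, file name, and SDV 1.x API usage are assumptions, not the authors' code.

```python
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer

df = pd.read_csv("adult_income.csv")         # hypothetical benchmark table
minority = df[df["gender"] == "Female"]      # under-represented slice

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(minority)     # infer column types

synth = CTGANSynthesizer(metadata, epochs=300)
synth.fit(minority)                          # learn the slice's distribution

# Synthesize enough rows to balance the protected attribute.
n_needed = int((df["gender"] == "Male").sum()) - len(minority)
balanced = pd.concat([df, synth.sample(num_rows=n_needed)], ignore_index=True)
print(balanced["gender"].value_counts())
```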
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290L (2023) https://doi.org/10.1117/12.2663700
Machine learning (ML) requires both quantity and variety of examples in order to learn generalizable patterns. In cybersecurity, labeling network packets is a tedious and difficult task. This leads to insufficient labeled datasets of network packets for training ML-based Network Intrusion Detection Systems (NIDS) to detect malicious intrusions. Furthermore, benign network traffic and malicious cyber attacks are always evolving and changing, meaning that the existing datasets quickly become obsolete. We investigate generative ML modeling for network packet synthetic data generation/augmentation to improve NIDS detection of novel, but similar, cyber attacks by generating well-labeled synthetic network traffic. We develop a Cyber Creative Generative Adversarial Network (CCGAN), inspired by previous generative modeling to create new art styles from existing art images, trained on existing NIDS datasets in order to generate new synthetic network packets. The goal is to create network packet payloads that appear malicious but from different distributions than the original cyber attack classes. We use these new synthetic malicious payloads to augment the training of a ML-based NIDS to evaluate whether it is better at correctly identifying whole classes of real malicious packet payloads that were held-out during classifier training. Results show that data augmentation from CCGAN can increase a NIDS baseline accuracy on a novel malicious class from 79% to 97% with a minimal degradation in accuracy on benign classes (98.9% to 98.7%).
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290M (2023) https://doi.org/10.1117/12.2664713
In this research work, using the comprehensive IRIS simulated SAR dataset that includes ground, aerial, and marine vehicles, we explored and exploited different GAN-based techniques to increase the efficiency and effectiveness of DL-based SAR image classifiers pre-trained on synthetically generated SAR imagery datasets. In particular, we present three adversarial attack techniques against the DL classifiers. We then propose a streamlined generative model for properly training SAR classifiers with less susceptibility to newly introduced adversarial examples. Lastly, we discuss the merits of our proposed methodologies and offer future research directions for the further improvement of the proposed SAR-GAN-CNN model.
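The paper's three attack techniques are not detailed in this abstract; as one representative example of the genre, the sketch below implements the fast gradient sign method (FGSM) in PyTorch. The toy model and input shapes are stand-ins.

```python
import torch

def fgsm(model, x, y, eps=0.03):
    """Perturb inputs x to increase the classifier's loss on labels y."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Step in the loss-increasing direction, clamped to the valid range.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

# Toy stand-ins: a linear classifier over 64x64 single-channel "SAR" chips.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 10))
x = torch.rand(4, 1, 64, 64)
y = torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
```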
Synthetic Data for Multi-Domain AI/ML Applications: Joint Session with Conferences 12529 and 12538
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290O (2023) https://doi.org/10.1117/12.2663088
Automatic target recognition (ATR) technology is likely to play an increasingly prevalent role in maintaining situational awareness on the modern battlefield. Advances in deep learning have enabled considerable progress in the development of ATR algorithms; however, these algorithms require large amounts of high-quality annotated data to train, and that is often the main bottleneck. Synthetic data offers a potential solution to this problem, especially given the recent proliferation of tools and techniques to synthesize custom data. Here, we focus on ATR in the visible domain from the perspective of a small drone, which represents a domain of growing importance to the Army. We describe custom simulators built to support synthetic data for multiple targets in a variety of environments. We describe a field experiment where we compared a baseline (YOLOv5) model, trained on off-the-shelf large generic public datasets, with a model augmented with specialized synthetic data. We deployed the models on a VOXL platform in a small drone. Our results showed a considerable performance boost of over 40% in target detection accuracy (average precision with at least 50% overlap) when using synthetic data. We discuss the value of synthetic data for this domain and the opportunities it creates, as well as the novel challenges it introduces.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290P (2023) https://doi.org/10.1117/12.2663828
Machine learning (ML) and artificial intelligence (AI) have increased automation potential within defense applications such as border protection, compound security, and surveillance. Advances in low size, weight, and power (SWaP) computing platforms and unmanned aerial systems (UAS) have enabled autonomous systems to meet the critical needs of future defense systems. Recent academic advances in deep learning-aided computer vision have yielded impressive results in object detection and recognition, capabilities necessary to enable autonomy in defense applications. These advances, often open-sourced, enable the opportunistic integration of state-of-the-art (SOTA) algorithms. However, these systems require a large amount of object-relevant data to transfer from general academic domains to more relevant situations. Additionally, UAS systems require costly verification and validation of autonomy logic. These challenges can lead to high costs for both training data generation and field autonomy integration and testing activities. To address these challenges, in conjunction with partners, Elbit America has developed a multipurpose synthetic simulation environment capable of generating synthetic training data and of prototyping, verifying, and validating autonomous distributed behaviors. We integrated a thermal modeling capability into Unreal Engine to create realistic training data by enabling the real-time simulation of SWIR, MWIR, and LWIR sensors. This radiometrically correct sensor model enables simulation-based training data generation for our object recognition and classification pipeline, called Rapid Algorithm Development and Deployment (RADD). Several drones were instantiated using emulated flight controllers to enable end-to-end autonomy training and development before hardware availability. Herein, we describe an overview of the simulation environment and its relevance to detection, classification, and distributed autonomous decision-making.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290Q (2023) https://doi.org/10.1117/12.2663906
Many AI/ML training datasets used for military algorithm development lack the necessary diversity to span typical operating conditions. Typical workflows for augmenting datasets with synthetic data require cumbersome setups and slow runtimes. To address the need for rapid augmentation of datasets, DEVCOM AvMC has developed a suite of tools that can be independently modulated to allow for the rapid generation of diverse training data. This paper will outline the key products that allow this rapid generation capability and share results demonstrating the capability.
Synthetic Data for Unmanned Systems Technology Applications: Joint Session with Conferences 12529 and 12549
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290R (2023) https://doi.org/10.1117/12.2664569
Synthetic data has been shown to be incredibly useful for improving the performance of perception algorithms; however, it is still unclear how to identify the right techniques for generating synthetic datasets and training perception models with them. In this work, we show how creating a digital twin of a real-world dataset allows us to have a more principled evaluation of the synthetic-to-real domain gaps for both training and inference, and we use this information to test and evaluate synthetic datasets themselves, rather than the models trained on them. Furthermore, we show how this framework allows for a measure of the inference domain gap: a measure that tells us whether testing a perception model on synthetic data is representative of testing in the real world. We use these measures to generate a synthetic dataset from the nuScenes autonomous driving dataset, targeted to maximally improve performance on specific rare classes. We optimize the synthetic data generation parameters for this dataset in order to reduce the inference and training domain gaps. We show performance improvements of over 18% on the bicycle class. Our training results provide a way of measuring the training domain gap to analyze synthetic datasets.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290S (2023) https://doi.org/10.1117/12.2664704
In this paper, we present the IRIS electromagnetic modeling and simulation (IRIS-EM) virtual environment system for the systematic and automatic generation of synthetic point-cloud datasets of multi-domain (air, ground, sea) vehicles. Through a multi-step process, we furnish each CAD model with additional layers of essential physics-related information, qualifying them for deployment into our IRIS-EM virtual environment. After this initial process, the prepared models are introduced into a physics-based virtual environment with proper context to facilitate automatic LiDAR signature generation from different viewing perspectives and ranges. Lastly, we focus on LiDAR scene processing for automatic scene generation, terrain segmentation, object segmentation, and data parsing for deep learning classification.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290T (2023) https://doi.org/10.1117/12.2662920
Deep learning based vision models have had significant success recently. However, one of the biggest challenges is to apply established models to new targets and environments where no samples exist for training. To solve this zero-shot learning problem, we formulated a heterogeneous learning domain adaptation scenario where labeled data from other domains are available for the new target and for some other unrelated classes. We developed an innovative zero-shot domain adaptation (ZSDA) method by implementing end-to-end adversarial training with class-aware alignment and latent space feature transformation. We demonstrated the performance improvements in several applications, including threat detection in X-ray security screening imagery, in comparison with the traditional unsupervised domain adaptation (UDA) approach.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290U (2023) https://doi.org/10.1117/12.2662948
A yet-unsolved problem in the radio frequency (RF) domain is transmission collisions in amplitude-modulated (AM) radio, an example of which is air traffic control radio. In a high-traffic environment like aviation, radio operators often unknowingly transmit at the same time, leading to other radios receiving both transmissions layered together. This renders both transmissions difficult, if not impossible, to understand, leading to frustration at best and, at worst, critical transmissions being completely lost. Machine learning (ML) can successfully separate multiple overlapping speakers in audio; we extend this idea to the RF domain. Training a performant ML model for such a scenario requires ample quantities of not only the signals with the overlapping transmissions but also each original, individual signal. This data is easier to collect for audio, and many open-source datasets for such tasks are readily available, but no such datasets exist for AM radio separation. Collecting adequate volumes of sufficiently diverse data would be time-consuming and expensive. To solve this problem, we turn to data generation. Using our custom data generation pipeline combined with a deep neural network (DNN), we demonstrate a 98.9% increase in signal separation efficacy when using AM radio as compared to when using audio alone. AM radio collision mitigation has broad implications, especially in congested scenarios with a high likelihood of colliding transmitters, such as aviation communications. Successful separation of such signals enables mitigation logic, leading to a smoother and safer user experience.
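A minimal sketch of the data generation idea: amplitude-modulate two source signals onto a shared carrier and sum them, yielding the collided mixture together with its ground-truth sources for supervised separation. Sample rate, carrier frequency, and modulation depth are illustrative, and the stand-in tones would be recorded speech in practice.

```python
import numpy as np

fs, fc, depth = 48_000, 10_000.0, 0.8   # sample rate, carrier, mod depth
t = np.arange(fs) / fs                  # one second of samples

def am_modulate(audio):
    """Standard AM: carrier scaled by (1 + depth * normalized audio)."""
    audio = audio / (np.abs(audio).max() + 1e-9)
    return (1.0 + depth * audio) * np.cos(2 * np.pi * fc * t)

# Stand-ins for two recorded speech clips transmitting simultaneously.
voice_a = np.sin(2 * np.pi * 220 * t)
voice_b = np.sin(2 * np.pi * 330 * t)

mixture = am_modulate(voice_a) + am_modulate(voice_b)  # the collision
targets = np.stack([voice_a, voice_b])  # ground truth for the separation DNN
```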
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290V (2023) https://doi.org/10.1117/12.2663695
The deep Q-learning (DQL) method has proven a great success in autonomous mobile robots. However, routine DQL can often yield improper agent behavior (multiple circling-in-place actions) and requires long training episodes until convergence. To address this problem, this project develops novel techniques that improve DQL training in both simulations and physical experiments. Specifically, a dynamic epsilon adjustment method is integrated to reduce the frequency of non-ideal agent behaviors and therefore improve control performance (i.e., goal rate). A Dynamic Window Approach (DWA) global path planner is designed into the physical training process so that the agent can reach more goals with fewer collisions within a fixed number of episodes. The GMapping Simultaneous Localization and Mapping (SLAM) method is also applied to provide a SLAM map to the path planner. The experimental results demonstrate that our approach can significantly improve training performance in both simulation and physical training environments.
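A minimal sketch of the dynamic-epsilon idea follows: exploration is re-inflated when unproductive behavior (e.g., circling in place) is detected and decayed otherwise. The detection signal and constants are assumptions, not the authors' exact method.

```python
import random

class DynamicEpsilon:
    def __init__(self, eps=1.0, eps_min=0.05, decay=0.995, boost=1.15):
        self.eps, self.eps_min = eps, eps_min
        self.decay, self.boost = decay, boost

    def step(self, circling_detected):
        if circling_detected:           # re-inflate exploration to escape loops
            self.eps = min(1.0, self.eps * self.boost)
        else:                           # otherwise decay toward exploitation
            self.eps = max(self.eps_min, self.eps * self.decay)
        return self.eps

    def choose(self, q_values):
        if random.random() < self.eps:  # explore
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

policy = DynamicEpsilon()
policy.step(circling_detected=True)
print(policy.eps, policy.choose([0.1, 0.7, 0.2]))
```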
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290Y (2023) https://doi.org/10.1117/12.2663827
Effective communication and control of a team of humans and robots is critical for a number of DoD operations and scenarios. In an ideal case, humans would communicate with robot teammates using nonverbal cues (i.e., gestures) that work reliably in a variety of austere environments and from different vantage points. A major challenge is that traditional gesture recognition algorithms using deep learning methods require large amounts of data to achieve robust performance across a variety of conditions. Our approach focuses on reducing the need for hard-to-acquire real data by using synthetically generated gestures in combination with synthetic-to-real domain adaptation techniques. We also apply the algorithms to improve the robustness and accuracy of gesture recognition under shifts in viewpoint (i.e., air to ground). Our approach leverages the soon-to-be-released dataset Robot Control Gestures (RoCoG-v2), consisting of corresponding real and synthetic videos from ground and aerial viewpoints. We first demonstrate real-time performance of the algorithm running on low-SWaP edge hardware. Next, we demonstrate the ability to accurately classify gestures from different viewpoints with varying backgrounds representative of DoD environments. Finally, we show the ability to use the inferred gestures to control a team of Boston Dynamics Spot robots, using the inferred gestures to control the formation of the robot team as well as to coordinate the robots' behavior. Our expectation is that the domain adaptation techniques will significantly reduce the need for real-world data and improve gesture recognition robustness and accuracy using synthetic data.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290Z (2023) https://doi.org/10.1117/12.2664030
Effectively recognizing human gestures from variant viewpoints plays a fundamental role in the successful collaboration between humans and robots. Deep learning approaches have achieved promising performance in gesture recognition. However, they are usually data-hungry and require large-scale labeled data, which are not usually accessible in a practical setting. Synthetic data, on the other hand, can be easily obtained from simulators with fine-grained annotations and variant modalities. Existing state-of-the-art approaches have shown promising results using synthetic data, but there is still a large performance gap between the models trained on synthetic data and real data. To learn domain-invariant feature representations, we propose a novel approach which jointly takes RGB videos and 3D meshes as inputs to perform robust action recognition. We empirically validate our model on the RoCoG-v2 dataset, which consists of a variety of real and synthetic videos of gestures from the ground and air perspectives. We show that our model trained on synthetic data can outperform state-of-the-art models under the same training setting and models trained on real data.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252910 (2023) https://doi.org/10.1117/12.2664064
Synthetic-to-real data translation using generative adversarial learning has achieved significant success in improving synthetic data. Yet, limited studies focus on deep evaluation and comparison of adversarial training on general-purpose synthetic data for machine learning. This work aims to train and evaluate a synthetic-to-real generative model that transforms the synthetic renderings into more realistic styles on general-purpose datasets conditioned with unlabeled real-world data. Extensive performance evaluation and comparison have been conducted through qualitative and quantitative metrics and a defined downstream perception task.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252911 (2023) https://doi.org/10.1117/12.2665333
Contemporary object pose estimation algorithms predict transformation parameters of perspectives of objects from a reference pose. Learning these parameters often requires significantly more data than conventional sensors provide. Therefore, synthetic data is frequently used to increase the amount of data, the number of object perspectives, and the number of object classes, which is beneficial for improving the generalization of pose estimation algorithms. However, robust synthesis of objects from different perspectives requires manually setting the precision of the increments between pose angles. Consequently, learning from arbitrarily small increments requires very precise sampling from existing sensor data, which increases the time, complexity, and resources necessary for a larger sample size. There is therefore a need to minimize the amount of sampling and processing required for synthesis methods (e.g., generative models), which have difficulty producing samples that lie outside of groups within the latent space, resulting in mode collapse. While reducing the number of observed object perspectives directly addresses this problem, generative models have issues synthesizing out-of-distribution (OOD) data. We study the effects of synthesizing OOD data by exploiting orthogonality constraints to synthesize intermediate poses of 3D point cloud object representations that are not observed during training. Additionally, we perform an ablation study on each axial rotation for poses and on the OOD generative capabilities of different model types. We test and evaluate our proposed method using objects from ShapeNet.
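As a geometric illustration of generating unobserved intermediate poses, the sketch below interpolates between two observed orientations with quaternion SLERP (via SciPy) and applies the result to a point cloud; the paper's generative models and orthogonality constraints are not reproduced here.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

points = np.random.rand(1024, 3)  # stand-in for a ShapeNet point cloud

# Two observed poses, 90 degrees apart about the vertical axis.
key_rots = Rotation.from_euler("z", [0.0, 90.0], degrees=True)
slerp = Slerp([0.0, 1.0], key_rots)

for t in (0.25, 0.5, 0.75):       # unobserved intermediate poses
    rotated = slerp([t])[0].apply(points)
    print(t, rotated.shape)
```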
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252912 (2023) https://doi.org/10.1117/12.2666925
Several commercial and military applications classify vehicles or ships in satellite images. In many cases, it is infeasible to acquire looks at the objects over the wide range of views and conditions needed for machine learning classifier training. Neural Radiance Fields (NeRF) and other related methods apply a limited number of 2D views of an object to learn its 3D shape and view-dependent radiance properties. One application of these techniques is to generate additional, novel views of objects for training deep learning classifiers. Current NeRF and NeRF-like methods have been demonstrated with electro-optical (EO) imagery. The emergence of space-based synthetic aperture radar (SAR) imaging sensors presents a new, useful source of wide-area imagery with day/night, all-weather commercial and military applications. Because SAR imaging phenomenology and projection geometry are different from EO, the application of NeRF-like methods to generate novel SAR images of objects for training a classifier presents new challenges. For example, unlike EO, the mono-static SAR illumination source moves with the sensor view geometry. In addition, the 2D SAR image projection is angle-range, not angle-angle. In this paper, we evaluate the salient differences between EO and SAR, and construct a processing pipeline to generate realistic synthetic SAR imagery. The synthetic SAR imagery provides additional training data, augmenting collected image data, for machine learning-based Automatic Target Recognition (ATR) algorithms. We provide examples of synthetic SAR image creation using this approach.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252915 (2023) https://doi.org/10.1117/12.2652598
Using machine learning systems in the real world can often be problematic: models are inexplicable black boxes, imperfect measurements are assumed to be certain, and a single classification is provided instead of a probability distribution. This paper introduces Indecision Trees, a modification of Decision Trees which learns under uncertainty, can perform inference under uncertainty, provides a robust distribution over the possible labels, and can be disassembled into a set of logical arguments for use in other reasoning systems.
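A minimal sketch of the inference idea, under the assumption that uncertain feature values split probability mass across branches while leaves store label distributions (the actual Indecision Tree learning procedure is defined in the paper):

```python
def predict(node, x):
    """x maps feature -> P(feature is True); returns dict label -> prob."""
    if "label_dist" in node:                 # leaf: distribution over labels
        return node["label_dist"]
    p_true = x[node["feature"]]
    left = predict(node["if_true"], x)
    right = predict(node["if_false"], x)
    labels = set(left) | set(right)
    # Marginalize over the uncertain test outcome.
    return {c: p_true * left.get(c, 0.0) + (1 - p_true) * right.get(c, 0.0)
            for c in labels}

tree = {"feature": "hot",
        "if_true": {"label_dist": {"threat": 0.8, "benign": 0.2}},
        "if_false": {"label_dist": {"threat": 0.1, "benign": 0.9}}}
print(predict(tree, {"hot": 0.6}))  # {'threat': 0.52, 'benign': 0.48}
```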
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 1252916 (2023) https://doi.org/10.1117/12.2678034
Detecting camouflaged objects is crucial in applications such as military surveillance, wildlife conservation, and search and rescue operations. However, the limited availability of camouflaged object data poses a significant challenge in developing accurate detection models. This paper proposes quasi-synthetic data generation by image compositing, combined with an attention-based deep learning harmonization methodology, to generate feature-enriched realistic images of camouflaged objects under varying scenarios. In our work, we developed a diverse set of images simulating different environmental conditions, including lighting, shadows, fog, dust, and snow, to test our proposed methodology. The intention of generating such photo-realistic images is to increase the robustness of the model, with the additional benefit of data augmentation for training our camouflaged object detection (COD) model. Furthermore, we evaluate our approach using state-of-the-art object detection models and demonstrate that training with our quasi-synthetic images can significantly improve the detection accuracy of camouflaged objects under varying conditions. Additionally, to test real operational performance, we deployed the developed models on resource-constrained edge devices for real-time object detection, validating the performance of the model trained on quasi-synthetic data against models trained on synthetic data generated by a conventional neural style transfer architecture.
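As a hedged illustration of the compositing step, the sketch below alpha-blends an object chip into a background and crudely matches its statistics to the local scene, a stand-in for the attention-based harmonization network described above. Array shapes and data are synthetic placeholders.

```python
import numpy as np

def composite(background, chip, mask, top_left):
    """Alpha-blend `chip` (H,W,3) into `background` using `mask` in [0,1]."""
    out = background.astype(np.float64).copy()
    y, x = top_left
    h, w = chip.shape[:2]
    region = out[y:y + h, x:x + w]

    # Crude harmonization: match chip mean/std to the local background.
    fg = chip.astype(np.float64)
    fg = (fg - fg.mean()) / (fg.std() + 1e-9) * region.std() + region.mean()

    m = mask[..., None]
    out[y:y + h, x:x + w] = m * fg + (1 - m) * region
    return np.clip(out, 0, 255).astype(np.uint8)

bg = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
obj = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
alpha = np.ones((64, 64))  # a real pipeline would use a segmentation mask
frame = composite(bg, obj, alpha, (200, 300))
```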