Acquiring representative data samples is pivotal to building machine learning models. However, gathering real-world imagery often presents challenges related to privacy concerns, regulatory constraints, financial resources, and accessibility limitations. Synthetic imagery offers an opportunity to augment real-world computer vision datasets while bypassing these obstacles. Yet a fundamental challenge in working with synthetic imagery is ensuring that the generated data closely resembles its real-world counterpart. Further, it can be difficult to generate synthetic imagery with the features and quality required to train well-generalized computer vision models. This paper introduces and evaluates our custom-built Replicant framework – a novel synthetic data generation framework integrated into Booz Allen’s Vision AI Stack. In developing this service, we created a framework that produces synthetic imagery closely resembling a real-world maritime dataset and that can be extended to generate domain-specific synthetic data for other applications. We use this data to train object detection models and demonstrate how synthetic data benefits model performance. Additionally, we employ similarity metrics, including perceptual hashing (pHash), Optimal Transport Dataset Distance (OTDD), and Fréchet Inception Distance (FID), to assess the likeness of the real and synthetic datasets. Finally, we explore the applicability and effectiveness of explainable AI (XAI) techniques, such as Eigen Class Activation Mapping (Eigen CAM) and Shapley Additive Explanations (SHAP), to gain insight into the performance of our deep learning models and the utility of our synthetic data. Our findings underscore the potential of synthetic data to improve deep learning model performance while overcoming the challenges associated with real-world data acquisition.
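To make the similarity-metric discussion concrete, the following is a minimal, self-contained sketch of the perceptual-hashing (pHash) idea mentioned above: a 2-D DCT reduces an image to its low-frequency content, the coefficients are thresholded at their median to form a bit string, and the Hamming distance between two such hashes measures perceptual similarity. This is an illustrative standard-library sketch, not the paper's actual implementation; the tiny 8×8 grayscale arrays stand in for real image data, and production code would use an optimized library.

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of a square matrix of grayscale values."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = s
    return out

def phash(gray, bits=8):
    """Bit string from low-frequency DCT coefficients, thresholded at their median.

    The DC term (overall brightness) is dropped so the hash reflects
    structure rather than exposure.
    """
    coeffs = dct2(gray)
    low = [coeffs[u][v] for u in range(bits) for v in range(bits)][1:]
    median = sorted(low)[len(low) // 2]
    return [1 if c > median else 0 for c in low]

def hamming(h1, h2):
    """Number of differing bits; small distances mean perceptually similar images."""
    return sum(a != b for a, b in zip(h1, h2))

# Illustrative 8x8 "images": a gradient and a lightly perturbed copy of it.
real = [[(x + y) * 16 for y in range(8)] for x in range(8)]
noisy = [[v + (1 if (x + y) % 2 else 0) for y, v in enumerate(row)]
         for x, row in enumerate(real)]

print(hamming(phash(real), phash(noisy)))  # low distance indicates similar images
```

In practice, one would hash every image in the real and synthetic datasets and compare the distributions of pairwise distances; distances of zero occur only for perceptually identical inputs.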