Object detection on imagery captured onboard aerial platforms poses different challenges than ground-to-ground object detection. For example, images captured from UAVs at varying altitudes and view angles present challenges for machine learning due to variations in object appearance and scene attributes. It is therefore essential to closely examine the critical variables that impact object detection from UAV platforms, such as significant variations in pose, range to objects, background clutter, lighting, weather conditions, and velocity/acceleration of the UAV. To that end, in this work we introduce a UAV-based image dataset, called the Archangel dataset, collected with a UAV and including pose and range information in the form of metadata. Additionally, we use the Archangel dataset to conduct comprehensive studies of how the critical attributes of UAV-based images affect machine learning models for object detection. The extensive analysis on the Archangel dataset aims to advance optimal training and testing of machine learning models in general, as well as the more specific case of UAV-based object detection using deep neural networks.
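As a hedged illustration of how pose and range metadata of this kind could support such an analysis, the sketch below bins per-object detection outcomes by sensor-to-target range and reports accuracy per bin. The record field names (`range_m`, `detected`) and the binning helper are hypothetical and not part of the Archangel release.

```python
# Hypothetical sketch: stratify detection outcomes by UAV range metadata so
# accuracy can be examined per range bin (field names are illustrative only).
from collections import defaultdict

def per_range_accuracy(records, bin_edges):
    """records: list of dicts with 'range_m' (sensor-to-target range, meters)
    and 'detected' (bool, whether the annotated object was found)."""
    bins = defaultdict(lambda: [0, 0])  # bin index -> [hits, total]
    for rec in records:
        for i, (lo, hi) in enumerate(zip(bin_edges[:-1], bin_edges[1:])):
            if lo <= rec["range_m"] < hi:
                bins[i][0] += int(rec["detected"])
                bins[i][1] += 1
                break
    return {
        f"{bin_edges[i]}-{bin_edges[i + 1]} m": hits / total
        for i, (hits, total) in sorted(bins.items()) if total
    }

# Example usage with fabricated records:
records = [
    {"range_m": 35.0, "detected": True},
    {"range_m": 80.0, "detected": False},
    {"range_m": 120.0, "detected": False},
]
print(per_range_accuracy(records, bin_edges=[0, 50, 100, 150]))
```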
Object detection is increasingly used onboard Unmanned Aerial Vehicles (UAVs) for various applications; however, the machine learning (ML) models for UAV-based detection are often validated using data curated for tasks unrelated to the UAV application. This is a concern because, although neural networks trained on large-scale benchmarks have shown excellent capability in generic object detection tasks, conventional training approaches can lead to large inference errors on UAV-based images. Such errors arise from differences in imaging conditions between UAV-captured images and the images used for training. To overcome this problem, we characterize boundary conditions of ML models, beyond which the models exhibit rapid degradation in detection accuracy. Our work focuses on understanding the impact of different UAV-based imaging conditions on detection performance by using synthetic data generated with a game engine. Properties of the game engine are exploited to populate the synthetic datasets with realistic, annotated images; specifically, the engine enables fine control of parameters such as camera position, view angle, illumination conditions, and object pose. Using the synthetic datasets, we analyze detection accuracy in different imaging conditions as a function of these parameters. We use three well-known neural network models of different complexity. In our experiments, we observe and quantify the following: 1) how detection accuracy drops as the camera moves toward the nadir-view region; 2) how detection accuracy varies with object pose; and 3) the degree to which the robustness of the models changes as illumination conditions vary.
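The kind of parameter sweep described above could be organized along the lines of the following minimal sketch, which varies camera elevation and illumination in a synthetic scene and records a detection score per condition. The `render_scene` and `compute_map` helpers are placeholders standing in for the game-engine capture pipeline and the detector evaluation; they are assumptions, not the paper's actual tooling.

```python
# Hypothetical sweep over camera elevation angle, sun angle, and object pose.
# render_scene() and compute_map() are dummy stand-ins for the game-engine
# capture pipeline and the detector evaluation (both are assumptions here).
import itertools

def render_scene(camera_elevation_deg, sun_elevation_deg, object_yaw_deg):
    """Stand-in: would ask the game engine to render one annotated image with
    the camera at the given elevation angle (90 deg = nadir view)."""
    return {"cam": camera_elevation_deg, "sun": sun_elevation_deg,
            "yaw": object_yaw_deg}

def compute_map(images_and_labels, model_name):
    """Stand-in: would run the named detector and return mean average precision."""
    return 0.0  # placeholder score

def sweep(model_name):
    camera_elevations = range(15, 91, 15)   # degrees above horizontal
    sun_elevations = (20, 45, 70)           # coarse illumination settings
    object_yaws = range(0, 360, 45)         # object pose (yaw) in degrees
    results = {}
    for cam, sun in itertools.product(camera_elevations, sun_elevations):
        batch = [render_scene(cam, sun, yaw) for yaw in object_yaws]
        results[(cam, sun)] = compute_map(batch, model_name)
    return results  # mAP per (camera elevation, sun elevation) condition

print(sweep("example-detector"))
```

Binning results by camera elevation in this way is one straightforward means of exposing the nadir-view degradation the abstract refers to.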