As intelligent reconnaissance by unmanned aerial vehicles continues to develop, a single task such as object detection or semantic segmentation can no longer meet diverse requirements, while simply stacking models makes the overall system too complex and seriously degrades real-time performance. We propose a visual scene understanding algorithm based on a multitask learning network with an encoder–decoder structure. First, the efficient classification network VoVNet is selected as the feature-sharing network to obtain multiscale encoded features. Second, building on a one-stage anchor-free object detector, a feature screening supplementary module implements feature interaction with the segmentation features to enhance the detection of potential targets. Then, for the pixel-level tasks of semantic segmentation and depth estimation, a cascaded chained residual pooling module is used as a parameter-sharing decoder; sharing parameters avoids repeated decoding and preserves running speed. Finally, to improve the generalization ability of the model, a general-purpose dataset was constructed and the network was trained following the idea of knowledge distillation. Experiments on the dataset show that the multitask learning network reaches the level of mainstream algorithms in the semantic segmentation and depth estimation branches, while the object detection branch achieves a better multitarget recall rate. The average time of each subtask is 39 ms, which meets real-time requirements and demonstrates the effectiveness of the multitask network.
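The data flow described above (one shared encoder, one shared decoder, and lightweight task heads) can be illustrated with a minimal structural sketch. This is not the authors' implementation: lists of floats stand in for feature maps, every function name and threshold is hypothetical, and the feature screening supplementary module and the knowledge distillation training step are omitted. The sketch only shows why sharing the encoder and decoder avoids repeated computation across tasks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Feature = List[float]  # toy stand-in for a feature map


@dataclass
class CallCounter:
    """Records how many times each shared stage is executed."""
    counts: Dict[str, int] = field(default_factory=dict)

    def hit(self, name: str) -> None:
        self.counts[name] = self.counts.get(name, 0) + 1


def shared_encoder(image: Feature, counter: CallCounter) -> List[Feature]:
    """Hypothetical backbone: returns features at three scales (fine to coarse)."""
    counter.hit("encoder")
    return [image[::step] for step in (1, 2, 4)]


def shared_decoder(features: List[Feature], counter: CallCounter) -> Feature:
    """Hypothetical decoder: fuses scales from coarse to fine, run once for all pixel-level tasks."""
    counter.hit("decoder")
    fused = features[-1]
    for finer in reversed(features[:-1]):
        # Average each fine value with its nearest coarse neighbour (toy upsampling).
        fused = [(v + fused[min(i // 2, len(fused) - 1)]) / 2
                 for i, v in enumerate(finer)]
    return fused


def detection_head(features: List[Feature]) -> List[int]:
    """Toy detector: indices of strong responses in the finest encoder feature."""
    return [i for i, v in enumerate(features[0]) if v > 0.8]


def segmentation_head(decoded: Feature) -> List[int]:
    """Toy per-pixel classifier on the shared decoded feature."""
    return [1 if v > 0.5 else 0 for v in decoded]


def depth_head(decoded: Feature) -> Feature:
    """Toy per-pixel regressor on the same shared decoded feature."""
    return [10.0 * v for v in decoded]


def run_multitask(image: Feature) -> Tuple[Dict[str, list], CallCounter]:
    counter = CallCounter()
    features = shared_encoder(image, counter)    # encoded once
    decoded = shared_decoder(features, counter)  # decoded once
    outputs = {
        "detection": detection_head(features),
        "segmentation": segmentation_head(decoded),
        "depth": depth_head(decoded),
    }
    return outputs, counter


if __name__ == "__main__":
    outputs, counter = run_multitask([0.1, 0.9, 0.4, 0.7, 0.2, 0.95, 0.6, 0.3])
    print(outputs)
    print(counter.counts)
```

In this sketch the segmentation and depth heads both read the same decoded feature, so adding a further pixel-level task would cost only one extra head rather than another full decoding pass.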
Object detection
Semantics
Unmanned aerial vehicles
Visualization
Education and training
Image segmentation
Classification systems