MODCL: multi-modal object detection with end-to-end contrastive learning in indoor scene
18 November 2024
Zixu Lan, Fang Deng, Angang Zhang, Zhongjian Chen
Proceedings Volume 13403, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2024); 134033T (2024) https://doi.org/10.1117/12.3051405
Event: International Conference on Algorithms, High Performance Computing, and Artificial Intelligence, 2024, Zhengzhou, China
Abstract
In recent years, multi-modal object detection has attracted significant attention because of the comprehensive information that multi-modal data provides. However, most object detection studies focus on outdoor scenarios such as autonomous driving, and relatively few address the characteristics of indoor scenes. In this paper, we identify shortcomings in the detection of similar objects in indoor scenes and propose MODCL: Multi-modal Object Detection with End-to-End Contrastive Learning in Indoor Scene. MODCL improves on two aspects: first, it fuses multi-modal context based on the mapping relationship between point clouds and images; second, it incorporates supervised contrastive learning in an end-to-end manner, eliminating the need for pre-training. Experiments on the SUN RGB-D dataset indicate that MODCL outperforms existing detection methods, both those that use point clouds and images and those that rely solely on point clouds.
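The abstract does not give MODCL's formulation, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. The point-to-pixel mapping is assumed to be a standard pinhole projection. The contrastive term is assumed to follow the widely used supervised contrastive loss, in which samples sharing a label act as positives. The function names `project_point` and `supcon_loss`, the intrinsics `fx, fy, cx, cy`, and the `temperature` value are illustrative and are not taken from the paper.

```python
import math


def project_point(point, fx, fy, cx, cy):
    """Map a 3D point (camera coordinates) to a pixel (u, v).

    Assumes a pinhole camera; returns None for points at or behind
    the image plane, which have no valid pixel.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of feature vectors.

    For each anchor i, every other sample with the same label is a
    positive; all other samples form the contrast set. Anchors with
    no positive in the batch are skipped.
    """
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for f in features:
        norm = math.sqrt(sum(v * v for v in f)) or 1.0
        normed.append([v / norm for v in f])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In an end-to-end setting, a term like `supcon_loss` would be added to the detection loss and optimised jointly, rather than used in a separate pre-training stage. How MODCL weights or applies it is not stated in the abstract.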
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Zixu Lan, Fang Deng, Angang Zhang, and Zhongjian Chen "MODCL: multi-modal object detection with end-to-end contrastive learning in indoor scene", Proc. SPIE 13403, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2024), 134033T (18 November 2024); https://doi.org/10.1117/12.3051405
KEYWORDS
Object detection
Point clouds
Machine learning
Image fusion
Target detection
Perceptual learning
Sensors