Efficient-NVR: accurate object 6D pose estimation via enhancing semantic representation

Guangkun Feng; Tianren Li; Zhenzhong Wei

doi:10.1117/12.3032409

13 September 2024


Proceedings Volume 13178, Eleventh International Symposium on Precision Mechanical Measurements; 131780B (2024) https://doi.org/10.1117/12.3032409
Event: Eleventh International Symposium on Precision Mechanical Measurements, 2023, Guangzhou, China

Abstract

Estimating the 6-degree-of-freedom (6DoF) pose of objects is a fundamental task in vision-based measurement. It provides a target's 3D position and orientation with respect to the camera, which is valuable in applications such as robotics, autonomous driving, and augmented reality. Among the different approaches, monocular vision methods have the advantage of being flexible and economical. They extract features from a single RGB image and match them with the corresponding parts of the target's known 3D model. Recently, regression methods that directly predict an object's 6DoF pose have dominated this field by leveraging Convolutional Neural Networks (CNNs) and learning from large amounts of data to extract semantic features. A previous method that leverages objects' surface normal vectors to disentangle rotation estimation from translation achieves superior performance. However, it relies on a single backbone network to extract orientation and position features from the input image simultaneously, so the backbone network limits the method's overall performance. In this paper, we illustrate this problem and adopt an advanced backbone network together with a Feature Pyramid Network (FPN) to enhance the feature-extracting capability of our method. We conduct various experiments and ablation studies to demonstrate the superior performance and effectiveness of our newly proposed network, namely Efficient-NVR. Notably, it surpasses state-of-the-art methods on the Linemod benchmark, obtaining 1.3% higher accuracy than the baseline.
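
The abstract's remark that surface normal vectors disentangle rotation from translation rests on a basic geometric fact: under a rigid pose (R, t), points move as R p + t, while surface normals transform as R n and do not depend on t. The following is a minimal pure-Python sketch of that fact only, not the authors' network or training procedure. The triangle, the pose values, and all helper names (`face_normal`, `apply_pose`, `camera_normal`, and so on) are hypothetical and chosen for illustration.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

def matvec(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((1.0, 0.0, 0.0), (0.0, c, -s), (0.0, s, c))

def apply_pose(R, t, p):
    """Map a model point p into the camera frame: R p + t."""
    return tuple(r + ti for r, ti in zip(matvec(R, p), t))

def face_normal(p0, p1, p2):
    """Unit normal of a triangle from its three vertices."""
    return unit(cross(sub(p1, p0), sub(p2, p0)))

# Hypothetical triangle on the object's 3D model (object frame).
tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
n_obj = face_normal(*tri)

# Hypothetical rotation; the translation is varied below.
R = matmul(rot_z(0.7), rot_x(-0.3))

def camera_normal(t):
    """Normal of the posed triangle, computed from its camera-frame vertices."""
    return face_normal(*(apply_pose(R, t, p) for p in tri))

n_near = camera_normal((0.0, 0.0, 1.0))    # object close to the camera
n_far = camera_normal((0.4, -0.2, 5.0))    # same rotation, different position
n_expected = matvec(R, n_obj)              # rotation applied to the normal alone
```

Here `n_near`, `n_far`, and `n_expected` agree up to floating-point error, so a representation built on normals carries orientation information without being affected by where the object sits in the scene.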

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Guangkun Feng, Tianren Li, and Zhenzhong Wei "Efficient-NVR: accurate object 6D pose estimation via enhancing semantic representation", Proc. SPIE 13178, Eleventh International Symposium on Precision Mechanical Measurements, 131780B (13 September 2024); https://doi.org/10.1117/12.3032409

PROCEEDINGS
7 PAGES

KEYWORDS

Pose estimation

Feature extraction

Education and training

Semantics

3D modeling

3D image processing

Ablation
