Citation: Hui Yuan, Yan Huang, Naigong Yu, Dongbo Zhang, Zetao Du, Ziqi Liu, Kun Zhang. Multimodal Pretrained Knowledge for Real-world Object Navigation[J]. Machine Intelligence Research. DOI: 10.1007/s11633-024-1537-x

Multimodal Pretrained Knowledge for Real-world Object Navigation

  • Most vision-and-language navigation (VLN) research focuses on simulated environments, but applying these methods to real-world scenarios is challenging because of misalignments between vision and language in complex environments, which lead to path deviations. To address this, we propose a novel vision-and-language object navigation strategy that uses multimodal pretrained knowledge as a cross-modal bridge to link semantic concepts in images and text. This improves navigation supervision at key-points and enhances robustness. Specifically, we 1) randomly generate key-points within a specific density range and optimize them based on challenging locations; 2) use pretrained multimodal knowledge to efficiently retrieve target objects (a sketch of this retrieval step is given below); 3) combine depth information with SLAM map data to predict optimal positions and orientations for accurate navigation; and 4) implement the method on a physical robot, successfully conducting navigation tests. Our approach achieves a maximum success rate of 66.7%, outperforming existing VLN methods in real-world environments.
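
The abstract does not name the pretrained model or its interface. As a minimal sketch, assuming a CLIP-style image-text model loaded through the Hugging Face transformers library, the target-object retrieval in step 2 could score candidate camera views against the instruction's target phrase as follows; the model name, file names, and target phrase are hypothetical placeholders, not details taken from the paper.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed CLIP-style backbone; the paper's actual pretrained model may differ.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def retrieve_target(candidate_views, target_phrase):
    """Return the index of the candidate view that best matches the target phrase."""
    inputs = processor(text=[target_phrase], images=candidate_views,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_views, num_phrases); take the single-phrase column.
    scores = outputs.logits_per_image[:, 0]
    return int(scores.argmax())

# Hypothetical usage: pick the view most likely to contain the instructed object.
views = [Image.open(p) for p in ("view_0.jpg", "view_1.jpg", "view_2.jpg")]
best_view = retrieve_target(views, "a red fire extinguisher near the door")

In the full system, the selected view would then be grounded to a goal pose by fusing its depth data with the SLAM map, as described in step 3.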
