Multi-modal Fusion and Wavelet Feature Extraction for Event-intensity Stereo Matching
Abstract
Stereo matching networks that rely on a single modality, either intensity images or events, degrade in complex environments, whereas existing multi-modal stereo matching networks fail to fully exploit geometric and contextual information and suffer from information loss during feature extraction. To address these challenges, we propose a novel end-to-end event-intensity stereo matching network based on multi-modal fusion and wavelet feature extraction. First, a multi-modal geometric contextual fusion module (MGCFM) integrates contextual information from event streams with geometric information from intensity images, generating robust fused representations. Next, a wavelet feature extraction module (WFEM) applies the Haar wavelet to transform the fused representations into the frequency domain and extract precise wavelet features for pixel-wise stereo matching. Finally, the wavelet features are fed into a stereo matching network to generate depth maps. The proposed method outperforms most state-of-the-art (SOTA) approaches in both accuracy and speed: it achieves a mean disparity error of 0.34 pixels on the multi-vehicle stereo event camera (MVSEC) dataset, and on the stereo event camera dataset (DSEC) it requires 0.04 seconds while achieving a 1-pixel error (1PE) of 6.728%.
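To illustrate the kind of transform the WFEM builds on, the following is a minimal sketch of a single-level 2D Haar wavelet decomposition and its inverse in plain Python. It is not the WFEM itself, whose architecture is not described in this abstract; the function names `haar_dwt2` and `haar_idwt2`, the nested-list input format, and the subband labels (LL, LH, HL, HH, whose naming varies between references) are assumptions made for illustration.

```python
def haar_dwt2(x):
    """Single-level 2D Haar decomposition of a 2D list with even height and width.

    Returns four half-resolution subbands (ll, lh, hl, hh), using the
    orthonormal scaling so that total energy is preserved.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            # The four pixels of one non-overlapping 2x2 block.
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # local average (low-pass)
            r_lh.append((a + b - c - d) / 2)  # difference between rows
            r_hl.append((a - b + c - d) / 2)  # difference between columns
            r_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, reconstructing the full-resolution 2D list."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, t, u, v = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + t + u + v) / 2
            x[2 * i][2 * j + 1] = (s + t - u - v) / 2
            x[2 * i + 1][2 * j] = (s - t + u - v) / 2
            x[2 * i + 1][2 * j + 1] = (s - t - u + v) / 2
    return x
```

Because the transform is invertible, the four subbands together retain all the information of the input at half the spatial resolution, which is the general property that makes wavelet decompositions attractive as an alternative to lossy downsampling. In a network, the same operation would typically be applied per channel to feature maps rather than to raw nested lists.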