Abstract
Self-supervised monocular depth estimation is a challenging yet promising area of computer vision because it requires no labeled data for training. However, owing to the inherent limitations of monocular vision, it traditionally produces depth predictions that are only correct up to an unknown scale factor. Commonly, LiDAR ground truth is used to recover absolute depth at inference time, a practice that indirectly reintroduces a dependence on labeled data and limits practical applicability. Our research introduces a novel approach that computes the absolute depth of flat ground surfaces in images from camera model parameters and uses it to determine the scaling factor, thereby bypassing the need for LiDAR. This approach, rigorously tested on the KITTI dataset, shows promising results. By extending the scale recovered on flat ground surfaces to other image regions, our sophisticated yet computationally efficient method can significantly augment both self-supervised and supervised monocular depth estimation techniques.
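To make the ground-plane idea concrete, the sketch below illustrates one common way such a scaling factor can be obtained. It assumes a pinhole camera mounted at a known height above a flat ground plane with negligible pitch, so a ground pixel in image row v lies at depth Z = f_y * h / (v - c_y). The function names, the availability of a ground-segmentation mask, and the illustrative KITTI-like intrinsics are assumptions for this example, not the paper's exact implementation.

```python
import numpy as np


def ground_plane_depth(rows, fy, cy, cam_height):
    """Absolute depth of ground pixels for a level pinhole camera.

    A ground point at depth Z in front of a camera mounted cam_height
    metres above a flat plane projects to image row v = cy + fy * h / Z,
    hence Z = fy * h / (v - cy) for rows below the horizon (v > cy).
    """
    return fy * cam_height / (rows - cy)


def scale_from_ground(pred_depth, ground_mask, fy, cy, cam_height):
    """Scale factor aligning relative predictions with metric ground depth.

    pred_depth  : (H, W) scale-ambiguous depth map from the network
    ground_mask : (H, W) boolean mask of flat-ground pixels (assumed given)
    Returns the median ratio of geometry-derived to predicted depth.
    """
    rows, _cols = np.nonzero(ground_mask)
    valid = rows > cy  # keep only pixels below the horizon line
    geo = ground_plane_depth(rows[valid].astype(np.float64), fy, cy, cam_height)
    pred = pred_depth[ground_mask][valid]
    return np.median(geo / pred)


if __name__ == "__main__":
    # Illustrative KITTI-like values: fy ~ 721 px, cy ~ 173 px, camera ~1.65 m high.
    H, W = 375, 1242
    pred = np.random.uniform(0.1, 1.0, size=(H, W))      # stand-in relative depth
    mask = np.zeros((H, W), dtype=bool)
    mask[250:, :] = True                                  # stand-in ground region
    s = scale_from_ground(pred, mask, fy=721.5, cy=172.8, cam_height=1.65)
    metric_depth = pred * s                               # scaled to absolute depth
```

Multiplying the entire predicted depth map by the recovered factor extends the metric scale from the ground region to the rest of the image, which is the step the abstract refers to when generalizing beyond flat ground surfaces.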