Abstract
With the increased availability of multi-view satellite images, 3D urban scene reconstruction from multiple satellite images is attracting growing attention. Conventional Multi-View Stereo (MVS) pipelines require calibrated pose information for the satellite cameras to determine the epipolar geometry and the 3D structure of the stereo correspondences. In this study, we propose a novel Monocular Height estimation and Fusion (MHF) method for 3D reconstruction from uncalibrated multi-view satellite images. Using a learned monocular depth network, the proposed method first obtains a height map for each satellite image. Second, the height maps obtained from the multi-view images are fused into a refined height map in each image plane. To fuse the height maps, all maps are affine-transformed to a virtual reference coordinate system, and the transformed maps are then projected onto the image plane of each camera coordinate system. The monocular depth network was trained and evaluated on the Data Fusion Contest 2019 (DFC19) dataset, which includes Jacksonville, FL, and Omaha, NE. We also evaluate on the ATL-SN4 dataset covering Atlanta, GA, to test generalization to new, unseen urban scenes.
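The fusion step described above can be sketched in code. The snippet below is a minimal illustration, not the paper's exact formulation: it assumes each view's height map is related to a common reference grid by a known 2D affine transform `(A, t)`, uses nearest-neighbour sampling, and fuses the warped maps with a per-pixel median. The function names `warp_height_map` and `fuse_height_maps` are hypothetical.

```python
import numpy as np

def warp_height_map(h, A, t, out_shape):
    """Warp a height map onto the reference grid.

    (A, t) is an assumed 2D affine mapping reference pixel coordinates
    (x, y) to source pixel coordinates; sampling is nearest-neighbour
    for simplicity. Uncovered pixels are NaN.
    """
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=0).astype(float)
    src = A @ coords + t[:, None]            # source (x, y) per reference pixel
    sx = np.round(src[0]).astype(int)
    sy = np.round(src[1]).astype(int)
    out = np.full(H * W, np.nan)
    valid = (sx >= 0) & (sx < h.shape[1]) & (sy >= 0) & (sy < h.shape[0])
    out[valid] = h[sy[valid], sx[valid]]
    return out.reshape(H, W)

def fuse_height_maps(maps, affines, out_shape):
    """Fuse per-view height maps by a per-pixel median over views,
    ignoring pixels a view does not cover (NaN)."""
    warped = [warp_height_map(h, A, t, out_shape)
              for h, (A, t) in zip(maps, affines)]
    return np.nanmedian(np.stack(warped), axis=0)
```

In a real pipeline, the affine parameters would be estimated from image correspondences, and the fused reference map would be re-projected into each camera's image plane as the abstract describes.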
Original language | English
---|---
Pages (from-to) | 1260-1270
Number of pages | 11
Journal | Remote Sensing Letters
Volume | 14
Issue number | 12
DOIs |
State | Published - 2023
Keywords
- 3D urban scene reconstruction
- Deep learning
- Monocular height map
- Photogrammetry
- Uncalibrated satellite image