{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T17:20:35Z","timestamp":1771003235951,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T00:00:00Z","timestamp":1752537600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T00:00:00Z","timestamp":1752537600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National University Ireland, Galway"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["SIViP"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Depth estimation from 2D images is an essential task in computer vision with applications in scene understanding, robotics, and autonomous systems. The performance of supervised depth models depends on network design, loss formulation, data quality, and fine-tuning strategy. In this study, we propose a progressive fine-tuning approach for metric (absolute-scale) depth estimation. Our method uses transfer learning across multiple indoor datasets: real, synthetic, and pseudo-labelled. DenseNet-169 and EfficientNet-B0 backbones are fine-tuned on MIT-G, SUN-RGBD, SceneNet, and NYU2. We apply a three-scale combined loss with weighted MAE + Edge + SSIM terms at full, 1\/2, and 1\/4 resolution, and add a perceptual VGG component, while we keep the global coefficients of the loss at 1 for simplicity and reproducibility. We find that EfficientNet performs better on the smaller datasets, while DenseNet benefits most from the million-image SceneNet stage and reaches REL 0.105 and RMSE 0.359 on NYU2, comparable to recent transformer baselines yet using 6<jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\times $$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> fewer parameters. The pseudo-labelled MIT-G data is used as a warm-start and shows the potential of reducing annotation cost. All headline metrics results are based on sensor ground-truth data, avoiding circular evaluation. Qualitative analysis and zero-shot tests on the unseen iBims-1 benchmark confirm that the models generalise and produce coherent, detailed depth maps across diverse indoor scenes. The proposed pipeline thus offers a balanced trade-off between accuracy and computational cost for practical indoor depth estimation.<\/jats:p>","DOI":"10.1007\/s11760-025-04496-8","type":"journal-article","created":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T07:13:30Z","timestamp":1752563610000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Multi-Source Depth Estimation: Utilizing Real, Synthetic, and Monocular Depth Data with Custom Loss Functions"],"prefix":"10.1007","volume":"19","author":[{"given":"Muhammad Adeel","family":"Hafeez","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ganesh","family":"Sistu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael G.","family":"Madden","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ihsan","family":"Ullah","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,7,15]]},"reference":[{"issue":"10","key":"4496_CR1","doi-asserted-by":"publisher","first-page":"16940","DOI":"10.1109\/TITS.2022.3160741","volume":"23","author":"X Dong","year":"2022","unstructured":"Dong, X., Garratt, M.A., Anavatti, S.G., Abbass, H.A.: Towards real-time monocular depth estimation for robotics: A survey. IEEE Trans. Intell. Transp. Syst. 23(10), 16940\u201316961 (2022)","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"4496_CR2","doi-asserted-by":"crossref","unstructured":"Xue, F., Zhuo, G., Huang, Z., Fu, W., Wu, Z., Ang, M.H.: Toward hierarchical self-supervised monocular absolute depth estimation for autonomous driving applications. In: 2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2330\u20132337 (2020)","DOI":"10.1109\/IROS45743.2020.9340802"},{"key":"4496_CR3","doi-asserted-by":"crossref","unstructured":"Diaz, C., Walker, M., Szafir, D.A., Szafir, D.: Designing for depth perceptions in augmented reality. In: 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 111\u2013122 (2017)","DOI":"10.1109\/ISMAR.2017.28"},{"key":"4496_CR4","doi-asserted-by":"crossref","unstructured":"Tsai, Y.-M., Chang, Y.-L., Chen, L.-G.: Block-based vanishing line and vanishing point detection for 3d scene reconstruction. In: 2006 International Symposium on Intelligent Signal Processing and Communications, pp. 586\u2013589 (2005)","DOI":"10.1109\/ISPACS.2006.364726"},{"key":"4496_CR5","doi-asserted-by":"crossref","unstructured":"Liu, B., Gould, S., Koller, D.: Single image depth estimation from predicted semantic labels. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1253\u20131260 (2010)","DOI":"10.1109\/CVPR.2010.5539823"},{"key":"4496_CR6","doi-asserted-by":"crossref","unstructured":"Yang, L., Kang, B., Huang, Z., Xu, X., Feng, J., Zhao, H.: Depth anything: Unleashing the power of large-scale unlabeled data. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 10371\u201310381 (2024)","DOI":"10.1109\/CVPR52733.2024.00987"},{"issue":"65","key":"4496_CR7","first-page":"1","volume":"17","author":"J \u017dbontar","year":"2016","unstructured":"\u017dbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17(65), 1\u201332 (2016)","journal-title":"J. Mach. Learn. Res."},{"key":"4496_CR8","doi-asserted-by":"publisher","first-page":"1628","DOI":"10.1109\/TIP.2019.2943019","volume":"29","author":"F Liu","year":"2019","unstructured":"Liu, F., Zhou, S., Wang, Y., Hou, G., Sun, Z., Tan, T.: Binocular light-field: Imaging theory and occlusion-robust depth perception application. IEEE Trans. Image Process. 29, 1628\u20131640 (2019)","journal-title":"IEEE Trans. Image Process."},{"key":"4496_CR9","unstructured":"Alhashim, I., Wonka, P.: High quality monocular depth estimation via transfer learning. arXiv preprint arXiv:1812.11941 (2018)"},{"key":"4496_CR10","doi-asserted-by":"crossref","unstructured":"Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 12179\u201312188 (2021)","DOI":"10.1109\/ICCV48922.2021.01196"},{"key":"4496_CR11","doi-asserted-by":"crossref","unstructured":"Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2213\u20132222 (2017)","DOI":"10.1109\/ICCV.2017.243"},{"issue":"8","key":"4496_CR12","doi-asserted-by":"publisher","first-page":"10235","DOI":"10.1007\/s11063-023-11325-x","volume":"55","author":"Z Lu","year":"2023","unstructured":"Lu, Z., Chen, Y.: Joint self-supervised depth and optical flow estimation towards dynamic objects. Neural Process. Lett. 55(8), 10235\u201310249 (2023)","journal-title":"Neural Process. Lett."},{"issue":"3","key":"4496_CR13","doi-asserted-by":"publisher","first-page":"1623","DOI":"10.1109\/TPAMI.2020.3019967","volume":"44","author":"R Ranftl","year":"2020","unstructured":"Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44(3), 1623\u20131637 (2020)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"4496_CR14","doi-asserted-by":"crossref","unstructured":"Hafeez, M.A., Madden, M.G., Sistu, G., Ullah, I.: Depth estimation using weighted-loss and transfer learning. Proceedings Copyright 780, 787","DOI":"10.5220\/0012461300003660"},{"key":"4496_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.cmpb.2021.106504","volume":"213","author":"A Bailly","year":"2022","unstructured":"Bailly, A., Blanc, C., Francis, \u00c9., Guillotin, T., Jamal, F., Wakim, B., Roy, P.: Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models. Comput. Methods Programs Biomed. 213, 106504 (2022)","journal-title":"Comput. Methods Programs Biomed."},{"key":"4496_CR16","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from rgbd images. In: Computer Vision\u2013ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12, pp. 746\u2013760 (2012). Springer","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"4496_CR17","doi-asserted-by":"crossref","unstructured":"Song, S., Lichtenberg, S.P., Xiao, J.: Sun rgb-d: A rgb-d scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 567\u2013576 (2015)","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"4496_CR18","doi-asserted-by":"crossref","unstructured":"Quattoni, A., Torralba, A.: Recognizing indoor scenes. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 413\u2013420 (2009)","DOI":"10.1109\/CVPR.2009.5206537"},{"key":"4496_CR19","doi-asserted-by":"crossref","unstructured":"McCormac, J., Handa, A., Leutenegger, S., Davison, A.J.: Scenenet rgb-d: Can 5m synthetic images beat generic imagenet pre-training on indoor segmentation? In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2678\u20132687 (2017)","DOI":"10.1109\/ICCV.2017.292"},{"key":"4496_CR20","doi-asserted-by":"crossref","unstructured":"Bleyer, M., Rhemann, C., Rother, C.: Patchmatch stereo-stereo matching with slanted support windows. In: Bmvc, vol. 11, pp. 1\u201311 (2011)","DOI":"10.5244\/C.25.14"},{"key":"4496_CR21","doi-asserted-by":"crossref","unstructured":"Carvalho, M., Le\u00a0Saux, B., Trouv\u00e9-Peloux, P., Almansa, A., Champagnat, F.: On regression losses for deep depth estimation. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 2915\u20132919 (2018)","DOI":"10.1109\/ICIP.2018.8451312"},{"key":"4496_CR22","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1016\/j.neucom.2020.12.089","volume":"438","author":"Y Ming","year":"2021","unstructured":"Ming, Y., Meng, X., Fan, C., Yu, H.: Deep learning for monocular depth estimation: A review. Neurocomputing 438, 14\u201333 (2021)","journal-title":"Neurocomputing"},{"key":"4496_CR23","doi-asserted-by":"crossref","unstructured":"Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650\u20132658 (2015)","DOI":"10.1109\/ICCV.2015.304"},{"key":"4496_CR24","doi-asserted-by":"crossref","unstructured":"Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2002\u20132011 (2018)","DOI":"10.1109\/CVPR.2018.00214"},{"key":"4496_CR25","doi-asserted-by":"crossref","unstructured":"Shim, D., Kim, H.J.: Learning a geometric representation for data-efficient depth estimation via gradient field and contrastive loss. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13634\u201313640 (2021)","DOI":"10.1109\/ICRA48506.2021.9561793"},{"key":"4496_CR26","unstructured":"Cabon, Y., Murray, N., Humenberger, M.: Virtual kitti 2. arXiv preprint arXiv:2001.10773 (2020)"},{"key":"4496_CR27","doi-asserted-by":"publisher","first-page":"44176","DOI":"10.1109\/ACCESS.2023.3272292","volume":"11","author":"A Hendra","year":"2023","unstructured":"Hendra, A., Kanazawa, Y.: Tp-gan: Simple adversarial network with additional player for dense depth image estimation. IEEE Access 11, 44176\u201344191 (2023)","journal-title":"IEEE Access"},{"key":"4496_CR28","unstructured":"Dosovitskiy, A.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)"},{"key":"4496_CR29","doi-asserted-by":"crossref","unstructured":"Piccinelli, L., Yang, Y.-H., Sakaridis, C., Segu, M., Li, S., Van\u00a0Gool, L., Yu, F.: Unidepth: Universal monocular metric depth estimation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 10106\u201310116 (2024)","DOI":"10.1109\/CVPR52733.2024.00963"},{"key":"4496_CR30","doi-asserted-by":"crossref","unstructured":"Li, Z., Wang, X., Liu, X., Jiang, J.: Binsformer: Revisiting adaptive bins for monocular depth estimation. IEEE Transactions on Image Processing (2024)","DOI":"10.1109\/TIP.2024.3416065"},{"key":"4496_CR31","doi-asserted-by":"crossref","unstructured":"Ke, B., Obukhov, A., Huang, S., Metzger, N., Daudt, R.C., Schindler, K.: Repurposing diffusion-based image generators for monocular depth estimation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 9492\u20139502 (2024)","DOI":"10.1109\/CVPR52733.2024.00907"},{"key":"4496_CR32","doi-asserted-by":"crossref","unstructured":"Schreiber, A.M., Hong, M., Rozenblit, J.W.: Monocular depth estimation using synthetic data for an augmented reality training system in laparoscopic surgery. In: 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2121\u20132126 (2021)","DOI":"10.1109\/SMC52423.2021.9658708"},{"key":"4496_CR33","doi-asserted-by":"crossref","unstructured":"Ullah, I., Abinesh, S., Smyth, D.L., Karimi, N.B., Drury, B., Glavin, F.G., Madden, M.G.: A virtual testbed for critical incident investigation with autonomous remote aerial vehicle surveying, artificial intelligence, and decision support. In: ECML PKDD 2018 Workshops: Nemesis 2018, UrbReas 2018, SoGood 2018, IWAISe 2018, and Green Data Mining 2018, Dublin, Ireland, September 10-14, 2018, Proceedings 18, pp. 216\u2013221 (2019). Springer","DOI":"10.1007\/978-3-030-13453-2_18"},{"key":"4496_CR34","doi-asserted-by":"crossref","unstructured":"Tonioni, A., Rahnama, O., Joy, T., Stefano, L.D., Ajanthan, T., Torr, P.H.: Learning to adapt for stereo. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 9661\u20139670 (2019)","DOI":"10.1109\/CVPR.2019.00989"},{"key":"4496_CR35","unstructured":"Gupta, S., Ullah, I., Madden, M.: Coyote: A dataset of challenging scenarios in visual perception for autonomous vehicles. In: AISafety@ IJCAI (2021)"},{"key":"4496_CR36","doi-asserted-by":"crossref","unstructured":"Bhanushali, J., Muniyandi, M., Chakravarthula, P.: Cross-domain synthetic-to-real in-the-wild depth and normal estimation for 3d scene understanding. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 1290\u20131300 (2024)","DOI":"10.1109\/CVPRW63382.2024.00136"},{"key":"4496_CR37","first-page":"103","volume":"75","author":"K Hao","year":"2019","unstructured":"Hao, K.: Training a single ai model can emit as much carbon as five cars in their lifetimes. MIT technology Review 75, 103 (2019)","journal-title":"MIT technology Review"},{"key":"4496_CR38","doi-asserted-by":"publisher","DOI":"10.1016\/j.mlwa.2021.100218","volume":"7","author":"S Paul","year":"2022","unstructured":"Paul, S., Jhamb, B., Mishra, D., Kumar, M.S.: Edge loss functions for deep-learning depth-map. Machine Learning with Applications 7, 100218 (2022)","journal-title":"Machine Learning with Applications"},{"key":"4496_CR39","doi-asserted-by":"crossref","unstructured":"Liu, X., Gao, H., Ma, X.: Perceptual losses for self-supervised depth estimation. In: Journal of Physics: Conference Series, vol. 1952, p. 022040 (2021). IOP Publishing","DOI":"10.1088\/1742-6596\/1952\/2\/022040"},{"key":"4496_CR40","doi-asserted-by":"crossref","unstructured":"Koch, T., Liebel, L., Fraundorfer, F., Korner, M.: Evaluation of cnn-based single-image depth estimation methods. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp. 0\u20130 (2018)","DOI":"10.1007\/978-3-030-11015-4_25"},{"key":"4496_CR41","unstructured":"Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Advances in neural information processing systems 27 (2014)"},{"key":"4496_CR42","doi-asserted-by":"crossref","unstructured":"Sagar, A.: Monocular depth estimation using multi scale neural network and feature fusion. In: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, pp. 656\u2013662 (2022)","DOI":"10.1109\/WACVW54805.2022.00072"}],"container-title":["Signal, Image and Video Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11760-025-04496-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11760-025-04496-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11760-025-04496-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T09:58:54Z","timestamp":1757239134000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11760-025-04496-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,15]]},"references-count":42,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["4496"],"URL":"https:\/\/doi.org\/10.1007\/s11760-025-04496-8","relation":{},"ISSN":["1863-1703","1863-1711"],"issn-type":[{"value":"1863-1703","type":"print"},{"value":"1863-1711","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,15]]},"assertion":[{"value":"26 June 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 June 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 July 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 July 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"876"}}