HUBEI AGRICULTURAL SCIENCES ›› 2026, Vol. 65 ›› Issue (2): 195-201.doi: 10.14088/j.cnki.issn0439-8114.2026.02.029

• Information Engineering •

3D indoor semantic scene completion method based on multi-channel difference fusion

WANG Chang-shuan1, LU Yun-he2, JIANG Jian-wu2   

  1. Guangxi Institute of Surveying Mapping and Geoinformation, Liuzhou 545006, Guangxi, China;
    2. College of Geomatics and Geoinformation, Guilin University of Technology, Guilin 541006, Guangxi, China
  • Received: 2025-09-15  Online: 2026-03-04  Published: 2026-03-04

Abstract: To address the loss of 3D perceptual information and insufficient semantic understanding caused by object occlusion and compact spatial structures in complex indoor environments, a multi-channel difference fusion network for semantic scene completion (MCDFNet) based on RGB-D input was proposed. A multi-channel difference fusion (MCDF) module was designed that, building on a unified RGB-D representation, extracted differential features among the RGB, Depth, and fused channels, effectively enhancing the modeling of geometric structure and semantic consistency in occluded regions. Experiments on the NYUCAD dataset showed that MCDFNet achieved an accuracy of 72.8%, a precision of 77.1%, and a mean Intersection over Union (mIoU) of 43.4% while keeping single-scene completion inference time at 1.9 s, outperforming mainstream models such as AICNet, DDRNet, and GRFNet. Ablation studies showed that introducing the MCDF module improved mIoU by 1.5 percentage points, confirming its critical role in completion accuracy. The model operated stably in highly occluded indoor environments, improving the completeness and practical value of 3D maps, and was suitable for a range of typical indoor application scenarios.
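The abstract does not give the MCDF module's equations, so the following is only an illustrative sketch of the differencing-then-fusion idea it describes: pairwise difference maps are computed among the RGB, Depth, and fused feature channels, stacked, and projected back to the original channel width. The function name `mcdf_fuse`, the choice of difference pairs, and the 1x1-convolution-style projection are all assumptions, not the paper's actual architecture.

```python
import numpy as np

def mcdf_fuse(f_rgb, f_depth, f_fused, w_proj):
    """Illustrative multi-channel difference fusion (assumed form, not the paper's).

    f_rgb, f_depth, f_fused : (C, H, W) feature maps from the three branches.
    w_proj : (C, 5*C) matrix standing in for a learned 1x1 convolution.
    """
    # Pairwise differential features between the channels
    d_rd = f_rgb - f_depth     # RGB vs. Depth
    d_rf = f_rgb - f_fused     # RGB vs. fused
    d_df = f_depth - f_fused   # Depth vs. fused
    # Stack fused and RGB features with the three difference maps: (5C, H, W)
    stacked = np.concatenate([f_fused, f_rgb, d_rd, d_rf, d_df], axis=0)
    c5, h, w = stacked.shape
    # Per-pixel linear projection back to C channels (a 1x1 "conv" as matmul)
    return (w_proj @ stacked.reshape(c5, h * w)).reshape(-1, h, w)

# Example with C = 4 channels on a 2x2 spatial grid
rng = np.random.default_rng(0)
f_rgb, f_depth, f_fused = (rng.standard_normal((4, 2, 2)) for _ in range(3))
w_proj = rng.standard_normal((4, 20)) * 0.1
out = mcdf_fuse(f_rgb, f_depth, f_fused, w_proj)
```

In a trained network `w_proj` would be a learned layer; the point here is only that occlusion cues can be exposed explicitly as channel differences before fusion.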

Key words: multi-channel difference fusion, 3D, indoor, semantic scene completion, RGB, Depth
