Multi-modal RGB-D Image Segmentation from Appearance and Geometric Depth Maps

Isail Salazar; Said Pertuz; Fabio Martínez

doi:10.22430/22565337.1538

Isail Salazar Universidad Industrial de Santander, Colombia https://orcid.org/0000-0002-9638-3952
Said Pertuz Universidad Industrial de Santander, Colombia https://orcid.org/0000-0001-8498-9917
Fabio Martínez* Universidad Industrial de Santander, Colombia https://orcid.org/0000-0001-7353-049X


Keywords: Image segmentation, over-segmentation, RGB-D images, depth information, multi-modal segmentation


Abstract

Classical image segmentation algorithms detect similarities and discontinuities in different visual cues to define and differentiate multiple regions of interest in images. However, the high variability and uncertainty of image data make accurate results difficult to obtain, and segmentation based on color alone is often insufficient for a large percentage of real-life scenes. This work presents a novel multi-modal segmentation strategy that integrates depth and appearance cues from RGB-D images by building a hierarchical region-based representation, i.e., a multi-modal segmentation tree (MM-tree). For this purpose, each RGB-D image pair is represented in a complementary fashion by different segmentation maps. From the color image, a color segmentation tree (C-tree) is built to obtain segmented and over-segmented maps. From the depth image, two independent segmentation maps are derived by computing planar and 3D edge primitives. An iterative region merging process then locally groups the previously obtained maps into the MM-tree. Finally, the top level of the resulting MM-tree coherently integrates the available information from the depth and appearance maps. Experiments on the NYU-Depth V2 RGB-D dataset show that our strategy achieves competitive results compared with state-of-the-art segmentation methods. Specifically, on the test images, our method reached average scores of 0.56 in Segmentation Covering and 2.13 in Variation of Information.
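
The reported scores use two standard region-based metrics: Segmentation Covering (higher is better) and Variation of Information (lower is better). The sketch below computes both for a pair of integer label maps using their common definitions from the Berkeley segmentation benchmark literature. It is not the authors' evaluation code. Assumptions: label maps are equal-sized 2-D lists of integer region ids, covering is measured as the covering of the ground truth by the segmentation, and VI uses the natural logarithm unless another base is passed. The log base and the evaluation protocol change the absolute values, so outputs of this sketch are not directly comparable to the 0.56 and 2.13 reported above.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# A label map is a 2-D grid (list of rows) of integer region ids.
LabelMap = Sequence[Sequence[int]]


def _flatten(labels: LabelMap) -> List[int]:
    """Flatten a 2-D label map into a 1-D list of region ids."""
    return [label for row in labels for label in row]


def _paired(seg: LabelMap, gt: LabelMap):
    """Flatten both maps and check that they cover the same pixels."""
    s, g = _flatten(seg), _flatten(gt)
    if len(s) != len(g) or not s:
        raise ValueError("label maps must be non-empty and the same size")
    return s, g


def segmentation_covering(seg: LabelMap, gt: LabelMap) -> float:
    """Covering of the ground truth `gt` by the segmentation `seg`.

    C = (1/N) * sum over gt regions R of |R| * max over seg regions R'
        of the overlap |R ∩ R'| / |R ∪ R'|.
    """
    s, g = _paired(seg, gt)
    seg_sizes, gt_sizes = Counter(s), Counter(g)
    joint = Counter(zip(g, s))  # |R ∩ R'| for each (gt, seg) pair that overlaps
    best: Dict[int, float] = {r: 0.0 for r in gt_sizes}
    for (r, r_seg), inter in joint.items():
        union = gt_sizes[r] + seg_sizes[r_seg] - inter
        best[r] = max(best[r], inter / union)
    return sum(gt_sizes[r] * best[r] for r in gt_sizes) / len(g)


def _entropy(counts: Counter, n: int, base: float) -> float:
    """Shannon entropy of an empirical label distribution."""
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def variation_of_information(seg: LabelMap, gt: LabelMap, base: float = math.e) -> float:
    """VI(S, G) = H(S) + H(G) - 2 I(S; G), computed as 2 H(S, G) - H(S) - H(G)."""
    s, g = _paired(seg, gt)
    n = len(s)
    h_s = _entropy(Counter(s), n, base)
    h_g = _entropy(Counter(g), n, base)
    h_joint = _entropy(Counter(zip(s, g)), n, base)
    return 2.0 * h_joint - h_s - h_g


if __name__ == "__main__":
    gt = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
    seg = [[0, 0, 0, 1],
           [0, 0, 0, 1]]  # region 0 leaks one column into gt region 1
    print(segmentation_covering(seg, gt))  # ≈ 0.583 (= 7/12)
    print(variation_of_information(seg, gt, base=2))  # ≈ 1.189 bits
```

Covering is asymmetric, so swapping the arguments measures how well the ground truth covers the segmentation instead, whereas VI is symmetric. Only overlapping region pairs are visited, because a pair with an empty intersection has zero overlap and cannot be the maximum.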

Author Biographies

Isail Salazar, Universidad Industrial de Santander, Colombia

Electronic Engineer, Escuela de Ingeniería de Sistemas e Informática, Universidad Industrial de Santander, Santander, Colombia, isail.salazar@saber.uis.edu.co

Said Pertuz, Universidad Industrial de Santander, Colombia

Ph.D. in Computer Science, Escuela de Ingenierías Eléctrica, Electrónica y de Telecomunicaciones, Universidad Industrial de Santander, Santander, Colombia, spertuz@uis.edu.co

Fabio Martínez*, Universidad Industrial de Santander, Colombia

Ph.D. in Systems and Computer Engineering, Escuela de Ingeniería de Sistemas e Informática, Universidad Industrial de Santander, Santander, Colombia, famarcar@saber.uis.edu.co


How to Cite


I. Salazar, S. Pertuz, and F. Martínez, “Multi-modal RGB-D Image Segmentation from Appearance and Geometric Depth Maps,” TecnoL., vol. 23, no. 48, pp. 143–161, May 2020.
