Multimodal Medical Image Fusion: The Perspective of Deep Learning


  • Mingyang Wei
  • Mengbo Xi
  • Yabei Li
  • Minjun Liang
  • Ge Wang



Medical image, Multimodal fusion, Deep learning.


Multimodal medical image fusion integrates medical images from distinct modalities, captured by different sensors, with the aim of enhancing image quality, reducing redundant information, and preserving specific features, ultimately improving the efficiency and accuracy of clinical diagnosis. In recent years, deep learning techniques have driven significant advances in image fusion, addressing the limitations of conventional methods that require manual design of activity level measurements and fusion rules. This paper first presents a systematic description of the multimodal medical image fusion problem, delineating the relationships between different fusion modalities and summarizing their characteristics and functions. It then reviews the theories and enhancement approaches associated with deep learning in medical image fusion, aiming for a comprehensive overview of state-of-the-art developments in this field from a deep learning perspective. These developments include multimodal feature extraction methods based on convolutional techniques, adversarial learning-based methods, signal processing methods based on convolutional sparse representation and stacked autoencoders, and unified models. Finally, the paper summarizes enhancement techniques for multimodal medical image fusion methods and highlights the pressing issues and challenges facing deep learning approaches in this domain.








How to Cite

Wei, M., Xi, M., Li, Y., Liang, M., & Wang, G. (2023). Multimodal Medical Image Fusion: The Perspective of Deep Learning. Academic Journal of Science and Technology, 5(3), 202–208.