A Review of Multimodal Medical Image Classification Based on Deep Learning

Authors

  • Jiahao Song

DOI:

https://doi.org/10.54097/011w6454

Keywords:

Multimodal fusion; Neural networks; Medical image classification.

Abstract

Medical imaging plays an important role in modern medicine, providing key information about the internal structures and biological activities of the human body for clinical diagnosis and treatment. However, single-modality imaging is constrained by its underlying imaging principles and often cannot fully capture the characteristics of specific organs or lesions, which limits the accuracy and comprehensiveness of clinical diagnosis. By integrating the complementary information of different imaging modalities, multimodal medical image fusion can reflect lesion characteristics more comprehensively and accurately, and it has become a research hotspot in medical image analysis in recent years. This paper first introduces model-based and model-independent multimodal fusion methods, then elaborates on the most popular neural network models and their applications to multimodal medical images, and finally discusses future development trends in multimodal medical image classification.
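
To make the fusion idea concrete, the following is a minimal sketch of feature-level (intermediate) fusion for a two-modality classifier, written in PyTorch. The modality pairing (MRI and PET), the encoder depth, and the feature dimensions are illustrative assumptions for exposition only, not the method of any specific work surveyed in the paper.

```python
# Minimal sketch of feature-level (intermediate) multimodal fusion in PyTorch.
# Assumptions: two co-registered 2D single-channel modalities (here labeled
# MRI and PET); encoder sizes are arbitrary illustrative choices.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """A small CNN that maps one 2D modality to a fixed-length feature vector."""
    def __init__(self, in_channels: int = 1, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> (B, 64, 1, 1)
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))

class FusionClassifier(nn.Module):
    """Concatenates per-modality features and classifies the fused vector."""
    def __init__(self, num_classes: int = 2, feat_dim: int = 128):
        super().__init__()
        self.enc_mri = ModalityEncoder(feat_dim=feat_dim)
        self.enc_pet = ModalityEncoder(feat_dim=feat_dim)
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, mri: torch.Tensor, pet: torch.Tensor) -> torch.Tensor:
        # Intermediate fusion: combine modality features before the classifier.
        fused = torch.cat([self.enc_mri(mri), self.enc_pet(pet)], dim=1)
        return self.head(fused)

# Example: a batch of 4 co-registered 64x64 single-channel slices per modality.
model = FusionClassifier(num_classes=2)
logits = model(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 2])
```

Concatenation is only one of the fusion operators discussed in the literature; attention-weighted or graph-based combination would replace the `torch.cat` step while keeping the same overall structure.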

Published

26-03-2025

Issue

Vol. 4 No. 2 (2025)

Section

Articles

How to Cite

Song, J. (2025). A Review of Multimodal Medical Image Classification Based on Deep Learning. Mathematical Modeling and Algorithm Application, 4(2), 39–47. https://doi.org/10.54097/011w6454