Advancements in Computer Vision: A Comprehensive Survey of Image Processing and Interdisciplinary Applications

Authors

  • Wen Gendy
  • Dularia Patel

DOI:

https://doi.org/10.54097/5e1cqw59

Keywords:

Computer Vision, Image Processing, Machine Learning, Object Recognition, Interdisciplinary Innovation.

Abstract

Computer vision and image processing are rapidly evolving fields with applications in healthcare, autonomous driving, surveillance, entertainment, and many other domains. They have grown from simple data-recording techniques into sophisticated systems that combine digital image processing, pattern recognition, machine learning, and computer graphics. This evolution has attracted interdisciplinary interest, pushed the technology's boundaries, and expanded its practical uses. This paper surveys recent advancements in computer vision, focusing on image processing and its applications across various fields. It examines the theoretical foundations and technologies that make computer vision a valuable tool for interpreting images and videos, extracting relevant information, recognizing patterns, and understanding events. Because computer vision can analyze large datasets across multiple application domains, it is instrumental in tasks such as object identification, facial recognition, scene understanding, and real-time action prediction. This versatility has established computer vision as a key driver of data-driven insights in both scientific and commercial sectors. The study divides computer vision into four main areas: image processing, object recognition, machine learning, and computer graphics. Each is essential to the functionality of modern computer vision systems. Image processing covers techniques for enhancing image quality and extracting important features. Object recognition and machine learning enable systems to identify specific elements within images and to learn from large datasets, improving accuracy over time. Computer graphics, in turn, aids in visualizing and interpreting processed data.
By presenting the latest techniques and evaluating their performance, this survey describes the current state of computer vision and points to future trends. The expanding utility of computer vision across fields underscores its role in driving interdisciplinary innovation and addressing complex challenges.

Published

29-11-2024

Issue

Vol. 13 No. 2 (2024)

Section

Articles

How to Cite

Gendy, W., & Patel, D. (2024). Advancements in Computer Vision: A Comprehensive Survey of Image Processing and Interdisciplinary Applications. Academic Journal of Science and Technology, 13(2), 28-34. https://doi.org/10.54097/5e1cqw59