A Review of Model Lightweighting Techniques Based on Edge Computing

Authors

  • Jinrui Liu

DOI:

https://doi.org/10.54097/b8xdxw07

Keywords:

Model Lightweighting; Edge Computing; Resource-Constrained Devices

Abstract

With the booming development of the Internet of Things (IoT) and the widespread application of artificial intelligence at the edge, deploying deep learning models on resource-constrained edge devices has become an inevitable trend. As a solution to the limited computation, storage, and power budgets of edge devices, model lightweighting techniques have made significant progress over the past five years. This paper systematically reviews research on model lightweighting techniques for edge computing from 2020 to 2025. Based on their core compression ideas, mainstream lightweighting methods are classified into two categories: parameter compression and structural compression. Parameter compression aims to reduce redundant parameters; we survey the latest breakthroughs in parameter pruning, parameter quantization, and parameter sharing, which significantly reduce model size and computation while effectively balancing accuracy and hardware friendliness. Structural compression, in contrast, focuses on designing more efficient network architectures; we discuss in depth the core ideas and representative work on compact network design and knowledge distillation. We then illustrate the application scenarios and key challenges of lightweight models with examples, and finally outline future research directions for addressing these challenges.
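As a minimal illustration of the parameter compression techniques named above (a sketch, not drawn from the paper itself), the PyTorch snippet below applies magnitude-based pruning and post-training dynamic quantization to a small model; the network, layer sizes, and 50% sparsity ratio are illustrative assumptions.

# Minimal sketch (illustrative, not the paper's method): magnitude-based
# parameter pruning followed by post-training dynamic quantization.
# The model, layer sizes, and sparsity ratio are assumptions for the example.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical small network standing in for an edge-deployed model.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Parameter pruning: zero the 50% smallest-magnitude weights in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the weight tensor

# Parameter quantization: store and compute Linear weights in int8 at inference time.
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(quantized)

In practice, pruning of this kind is typically followed by fine-tuning to recover accuracy, and the quantized model is then exported to the runtime available on the target edge device.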




Published

13-11-2025

Issue

Vol. 17 No. 1 (2025)

Section

Articles

How to Cite

Liu, J. (2025). A Review of Model Lightweighting Techniques Based on Edge Computing. Academic Journal of Science and Technology, 17(1), 21-26. https://doi.org/10.54097/b8xdxw07