Enhanced Knowledge Distillation via Parameter Re-definition
DOI: https://doi.org/10.54097/hset.v39i.6765

Keywords: Knowledge Transfer; Rényi Divergence; Knowledge Distillation

Abstract
Deep learning has achieved great success in many fields thanks to its high scalability and its ability to handle models with very large numbers of parameters. However, encoding such large-scale data sets ultimately comes at the cost of expensive computing power and storage resources, which has made model compression and model acceleration a hot research topic in recent years. Model pruning, weight decomposition, reducing numerical precision, weight sharing, and similar techniques are all popular solutions, but they share a common problem: they all modify the original model directly and cannot guarantee that the compressed model performs as well as the original. This paper builds on knowledge distillation, introduces the Rényi divergence, a generalization of the KL divergence, and proposes a loss function based on the Rényi-divergence distance metric, taking the rigor of the student network as a hyperparameter; the resulting student network model is the one that minimizes this loss function under a given rigor. We validated our results with ResNets on the CIFAR-10, CIFAR-100, and ImageNet datasets. The method improved the baseline model by 0.6%, with an absolute Top-1 accuracy gain exceeding 1.6%.
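Since the paper's exact formulation is not reproduced on this page, the following minimal PyTorch sketch only illustrates the general idea: a distillation loss whose soft-target term uses the Rényi divergence of order alpha in place of the usual KL divergence, with alpha standing in for the "rigor" hyperparameter mentioned in the abstract. The temperature T and mixing weight lam are standard distillation assumptions, not values taken from the paper.

    # Sketch (not the paper's exact loss): Renyi-divergence distillation.
    # D_alpha(P || Q) = 1/(alpha - 1) * log sum_i p_i^alpha * q_i^(1 - alpha),
    # which recovers the KL divergence in the limit alpha -> 1.

    import torch
    import torch.nn.functional as F

    def renyi_divergence(p, q, alpha, eps=1e-8):
        """Renyi divergence D_alpha(p || q) for batched probability vectors."""
        if abs(alpha - 1.0) < 1e-6:
            # alpha -> 1: reduce to the KL divergence sum_i p_i log(p_i / q_i)
            return (p * (p.add(eps).log() - q.add(eps).log())).sum(dim=-1)
        power = (p.add(eps) ** alpha) * (q.add(eps) ** (1.0 - alpha))
        return power.sum(dim=-1).log() / (alpha - 1.0)

    def distillation_loss(student_logits, teacher_logits, labels,
                          alpha=0.5, T=4.0, lam=0.9):
        """Cross-entropy on hard labels plus a Renyi term on softened outputs."""
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        p_student = F.softmax(student_logits / T, dim=-1)
        # T**2 rescaling keeps gradient magnitudes comparable across temperatures,
        # as in standard temperature-scaled distillation.
        soft = renyi_divergence(p_teacher, p_student, alpha).mean() * (T ** 2)
        hard = F.cross_entropy(student_logits, labels)
        return lam * soft + (1.0 - lam) * hard

With alpha = 1.0 the soft term reduces to the familiar temperature-scaled KL distillation loss of Hinton et al.; varying alpha changes how strictly the student distribution is forced to match the teacher's.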
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.