Investigating Lightweight Transformer Models for Defect Detection


Hanyun Wang



Keywords: Image processing; Anomaly detection; Vision Transformer; Quantization; Pruning.


In industrial production, product defect detection is vital for quality control. Traditional manual inspection is inefficient and error-prone, and advances in deep learning, particularly in image processing, have made automated computer-based defect detection practical. This paper proposes a Vision Transformer-based model to overcome limitations in industrial anomaly detection. Using pretrained Vision Transformer and Point Transformer models, it extracts features from RGB images and point cloud data. Multimodal feature fusion strengthens anomaly perception, and residual connections mitigate feature loss. On the MVTec AD dataset, the model achieves 96.3% AUPRO for anomaly detection and 99.3% pixel-level ROCAUC for anomaly segmentation. To enable deployment on resource-constrained devices such as the Raspberry Pi, the paper further derives a lightweight model through post-training quantization and pruning. The lightweight model achieves a 28.52% inference speedup with an average drop in detection accuracy of only 1.08%, making practical industrial applications on compact devices feasible.
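The abstract does not specify the fusion operator. As a minimal sketch, assuming the RGB features (e.g. from a ViT backbone) and point-cloud features (e.g. from a Point Transformer) have already been aligned to the same N patches, one plausible design concatenates them, passes the result through a projection, and adds the projection back through a residual connection so that the original features are never discarded. The function name `fuse_features`, the random weights, and the feature sizes are hypothetical.

```python
import numpy as np

def fuse_features(rgb_feats, pc_feats, w_fuse, b_fuse):
    """Residual fusion of aligned per-patch features (hypothetical sketch).

    rgb_feats: (N, D_rgb) image patch features
    pc_feats:  (N, D_pc) point-cloud features aligned to the same N patches
    w_fuse:    (D_rgb + D_pc, D_rgb + D_pc) projection weights (assumed learned)
    b_fuse:    (D_rgb + D_pc,) projection bias
    """
    x = np.concatenate([rgb_feats, pc_feats], axis=1)  # (N, D_rgb + D_pc)
    fused = np.maximum(x @ w_fuse + b_fuse, 0.0)       # linear projection followed by ReLU
    return x + fused                                   # residual connection preserves the input features

rng = np.random.default_rng(0)
n, d_rgb, d_pc = 4, 8, 6
rgb = rng.standard_normal((n, d_rgb))
pc = rng.standard_normal((n, d_pc))
d = d_rgb + d_pc
w = rng.standard_normal((d, d)) * 0.1
b = np.zeros(d)
out = fuse_features(rgb, pc, w, b)
print(out.shape)  # (4, 14)
```

Because of the residual term, a projection that learns nothing useful (all-zero weights) simply passes the concatenated features through unchanged, which is one way to read "residual connections mitigating feature loss".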
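For reference, pixel-level ROCAUC treats every pixel of every test image as one binary sample (defective or normal) and measures how well the per-pixel anomaly scores rank defective pixels above normal ones. AUPRO is a different metric: it averages overlap per connected defect region and is usually integrated only up to a 30% false-positive rate, so it is not reproduced here. The sketch computes pixel-level ROCAUC through the Mann-Whitney identity; the function name `pixel_auroc` is hypothetical, and because it compares every anomalous/normal pixel pair it is only suitable for small arrays.

```python
import numpy as np

def pixel_auroc(scores, masks):
    """Pixel-level ROC AUC pooled over all pixels (illustrative sketch).

    scores: per-pixel anomaly scores, any shape
    masks:  same shape, 1 for defective pixels and 0 for normal pixels
    AUC = P(score of a defective pixel > score of a normal pixel), ties count 1/2.
    """
    s = np.asarray(scores, dtype=float).ravel()
    y = np.asarray(masks).ravel().astype(bool)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both defective and normal pixels")
    # All-pairs comparison: O(n_pos * n_neg) memory, so use a rank-based method on full images.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

scores = [[0.9, 0.2], [0.4, 0.4]]
masks = [[1, 0], [0, 1]]
print(pixel_auroc(scores, masks))  # 0.875
```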
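The abstract does not state which post-training quantization scheme is used. A common choice, shown here as an assumption rather than the paper's exact method, is symmetric per-tensor int8 weight quantization: one scale maps the largest absolute weight to 127, weights are rounded to int8, and they are multiplied back by the scale when a float value is needed. No retraining is involved, which is what makes it "post-training". The helper names `quantize_int8` and `dequantize` are hypothetical; frameworks such as PyTorch and ONNX Runtime ship their own PTQ tooling.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 post-training quantization (illustrative sketch)."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # maps [-max_abs, max_abs] onto [-127, 127]
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the shared scale."""
    return q.astype(np.float32) * scale

w = np.array([[0.4, -1.0], [0.2, 0.0]], dtype=np.float32)
q, scale = quantize_int8(w)
print(q.tolist())  # [[51, -127], [25, 0]]
print(np.max(np.abs(dequantize(q, scale) - w)) <= scale / 2 + 1e-6)  # True
```

Each weight is reconstructed to within half a quantization step. Per-channel scales usually shrink this error further for layers whose rows have very different ranges, at the cost of storing one scale per channel.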
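The pruning criterion is likewise not given in the abstract. One simple structured variant, in the spirit of channel pruning, removes the output channels of a linear layer with the smallest L1 norms. Deleting whole rows shrinks the matrix multiply itself, so it tends to reduce latency on CPUs such as the Raspberry Pi more than scattering zeros through a dense matrix. The function `prune_channels_l1` and its `keep_ratio` parameter are hypothetical names for this sketch.

```python
import numpy as np

def prune_channels_l1(w, b, keep_ratio):
    """Structured pruning of a linear layer's output channels by L1 norm (illustrative sketch).

    w: (out_features, in_features) weight matrix
    b: (out_features,) bias
    keep_ratio: fraction of output channels to keep, in (0, 1]
    Returns the pruned weight and bias plus the indices of the kept channels.
    """
    n_keep = max(1, int(round(keep_ratio * w.shape[0])))
    scores = np.abs(w).sum(axis=1)                # L1 norm of each output channel
    keep = np.sort(np.argsort(scores)[-n_keep:])  # top-scoring channels, in original order
    return w[keep], b[keep], keep

w = np.array([[0.1, -0.1], [2.0, 1.0], [-0.5, 0.5], [0.0, 2.5]])
b = np.zeros(4)
w_p, b_p, kept = prune_channels_l1(w, b, keep_ratio=0.5)
print(kept.tolist())  # [1, 3]
```

After pruning, the following layer's weight must be sliced to `w_next[:, kept]` so that its input dimension still matches the reduced output of this layer.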


How to Cite

Investigating Lightweight Transformer Models for Defect Detection. (2023). Academic Journal of Science and Technology, 7(3), 10-16.
