Attention-Augmented FedProx for Enhanced Performance on the EMNIST Dataset

Shunyu Yao

doi:10.54097/fh1y1c85

Keywords:

FedProx Optimization, Attention Mechanism, EMNIST Dataset.

Abstract

The FedProx algorithm is designed to address non-independent and identically distributed (non-IID) data and system heterogeneity in federated learning. Despite its effectiveness, its performance on the EMNIST dataset leaves room for improvement. Incorporating attention mechanisms, namely channel and spatial attention, enhances the stability and accuracy of FedProx models. The channel attention mechanism improves stability without significantly changing accuracy or loss, particularly when a multi-layer perceptron structure is integrated into it. The spatial attention mechanism, however, struggles with highly heterogeneous data, which leads to instability and poor performance. Modifying the data partitioning method, for example by using uniformly distributed data or by adjusting the Dirichlet distribution used to partition the data, mitigates these issues and improves the effectiveness of spatial attention. These findings indicate that attention mechanisms can optimize the FedProx algorithm, but their effectiveness depends strongly on the degree of data heterogeneity, with channel attention demonstrating greater robustness under varying conditions.
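
FedProx adds a proximal term (mu/2) * ||w - w_global||^2 to each client's local objective F_k(w), where w_global is the current global model, so that local training on non-IID data cannot drift too far from it. The sketch below illustrates the idea on a flat parameter vector. The plain gradient-descent local solver, the sample-weighted server average, and the names fedprox_local_update and weighted_average are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of FedProx local training and server aggregation.
# Hypothetical assumptions: the model is a flat list of floats, the client
# supplies a gradient function for its local loss F_k, and plain gradient
# descent is the local solver.
from typing import Callable, List

Vector = List[float]


def fedprox_local_update(
    w_global: Vector,
    grad_local_loss: Callable[[Vector], Vector],
    mu: float = 0.01,
    lr: float = 0.1,
    steps: int = 10,
) -> Vector:
    """Approximately minimise F_k(w) + (mu / 2) * ||w - w_global||^2."""
    w = list(w_global)  # each round starts from the current global model
    for _ in range(steps):
        g = grad_local_loss(w)
        # The proximal term contributes mu * (w - w_global) to the gradient,
        # pulling the local model back toward the global one; mu = 0 gives
        # a plain FedAvg-style local step.
        w = [wi - lr * (gi + mu * (wi - wgi)) for wi, gi, wgi in zip(w, g, w_global)]
    return w


def weighted_average(client_models: List[Vector], num_samples: List[int]) -> Vector:
    """Server step (assumed here): average client models weighted by sample count."""
    total = sum(num_samples)
    dim = len(client_models[0])
    return [
        sum(n * model[i] for model, n in zip(client_models, num_samples)) / total
        for i in range(dim)
    ]


def toy_grad(w: Vector) -> Vector:
    """Gradient of the toy client loss F_k(w) = 0.5 * (w - 1)^2."""
    return [w[0] - 1.0]


if __name__ == "__main__":
    for mu in (0.0, 1.0):
        local = fedprox_local_update([0.0], toy_grad, mu=mu, lr=0.1, steps=500)
        print(f"mu={mu}: local model {local[0]:.3f}")
```

With the global model at 0, the toy client converges to its own optimum 1.0 when mu = 0 and to 0.5 when mu = 1, matching the closed-form minimiser 1 / (1 + mu) of 0.5 * (w - 1)^2 + (mu/2) * w^2. Larger mu trades local fit for agreement with the global model.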
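
The abstract credits channel attention, particularly with a multi-layer perceptron, for the improved stability but does not specify the architecture. One common design that fits this description is CBAM-style channel attention: each channel is squeezed by global average and max pooling, both descriptors pass through a shared two-layer MLP, and a sigmoid yields one weight per channel. The pure-Python sketch below assumes that design; the reduction ratio, the initialisation, and the class name ChannelAttention are hypothetical.

```python
# Minimal CBAM-style channel attention in pure Python (a hypothetical sketch).
# Feature maps are nested lists with shape channels x height x width.
import math
import random
from typing import List

FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer: weight has shape (out_features, in_features)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


class ChannelAttention:
    def __init__(self, channels: int, reduction: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        hidden = max(1, channels // reduction)  # MLP bottleneck width
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(channels)]
        self.b2 = [0.0] * channels

    def mlp(self, x: List[float]) -> List[float]:
        """Shared two-layer perceptron: C -> C/r -> C with a ReLU in between."""
        hidden = [max(0.0, h) for h in linear(x, self.w1, self.b1)]
        return linear(hidden, self.w2, self.b2)

    def channel_weights(self, fmap: FeatureMap) -> List[float]:
        # Squeeze each channel to one number by global average and max pooling.
        avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
        mx = [max(map(max, ch)) for ch in fmap]
        # Pass both descriptors through the same MLP, add, and squash to (0, 1).
        return [sigmoid(a + m) for a, m in zip(self.mlp(avg), self.mlp(mx))]

    def __call__(self, fmap: FeatureMap) -> FeatureMap:
        weights = self.channel_weights(fmap)
        # Rescale every value in channel c by that channel's weight.
        return [[[v * w for v in row] for row in ch] for ch, w in zip(fmap, weights)]
```

Because the output has the same shape as the input, a module like this can be inserted after a convolutional block of the client model without changing the rest of the network or the FedProx update.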
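
Spatial attention instead learns one weight per pixel. Again assuming a CBAM-style design (the abstract does not specify one), the feature map is pooled across channels into a mean map and a max map, a small convolution combines the two, and a sigmoid produces a weight map that rescales every channel at each location. The kernel size, initialisation, and the class name SpatialAttention are illustrative assumptions.

```python
# Minimal CBAM-style spatial attention in pure Python (a hypothetical sketch).
import math
import random
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class SpatialAttention:
    def __init__(self, kernel_size: int = 3, seed: int = 0) -> None:
        if kernel_size % 2 != 1:
            raise ValueError("kernel_size must be odd so the output keeps H x W")
        rng = random.Random(seed)
        self.k = kernel_size
        # One k x k kernel slice per pooled map (channel mean, channel max).
        self.kernel = [
            [[rng.gauss(0.0, 0.1) for _ in range(kernel_size)] for _ in range(kernel_size)]
            for _ in range(2)
        ]
        self.bias = 0.0

    def spatial_weights(self, fmap: FeatureMap) -> List[List[float]]:
        c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
        # Pool across channels at every pixel: a mean map and a max map.
        mean_map = [[sum(fmap[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
        max_map = [[max(fmap[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
        pooled = (mean_map, max_map)
        r = self.k // 2
        weights = []
        for i in range(h):
            row = []
            for j in range(w):
                z = self.bias
                # k x k convolution over both pooled maps with zero padding.
                for p in range(2):
                    for di in range(-r, r + 1):
                        for dj in range(-r, r + 1):
                            ii, jj = i + di, j + dj
                            if 0 <= ii < h and 0 <= jj < w:
                                z += self.kernel[p][di + r][dj + r] * pooled[p][ii][jj]
                row.append(sigmoid(z))
            weights.append(row)
        return weights

    def __call__(self, fmap: FeatureMap) -> FeatureMap:
        weights = self.spatial_weights(fmap)
        # Every channel is rescaled by the same per-pixel weight map.
        return [
            [[v * weights[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in fmap
        ]
```

Unlike the channel module, these weights depend on where features appear in the image, so the learned map reflects the spatial patterns present in each client's local data.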
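
The abstract contrasts uniformly distributed client data with Dirichlet-based partitioning. A widely used scheme, assumed here, splits each class across clients in proportions drawn from a symmetric Dirichlet distribution with concentration alpha: small alpha concentrates a class on a few clients (strong non-IID skew), while large alpha approaches an even split. The function names and the round-robin IID split below are hypothetical.

```python
# Sketch of the two partitioning schemes contrasted in the abstract (assumed
# implementations): a uniform IID split and a label-skewed Dirichlet split.
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw from a symmetric Dirichlet(alpha, ..., alpha) via normalised Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for extremely small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def iid_partition(num_samples: int, num_clients: int, seed: int = 0) -> List[List[int]]:
    """Shuffle all sample indices and deal them out round-robin."""
    rng = random.Random(seed)
    idxs = list(range(num_samples))
    rng.shuffle(idxs)
    return [idxs[c::num_clients] for c in range(num_clients)]


def dirichlet_partition(
    labels: Sequence[int], num_clients: int, alpha: float, seed: int = 0
) -> List[List[int]]:
    """Split each class across clients in Dirichlet(alpha)-distributed shares."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = sample_dirichlet(alpha, num_clients, rng)
        # Turn the proportions into cumulative cut points within this class.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(idxs), round(acc * len(idxs))))
        cuts.append(len(idxs))
        start = 0
        for client, end in zip(clients, cuts):
            client.extend(idxs[start:end])
            start = end
    return clients


def labels_per_client(parts: List[List[int]], labels: Sequence[int]) -> List[int]:
    """Number of distinct classes each client holds (a simple skew measure)."""
    return [len({labels[i] for i in part}) for part in parts]
```

For example, with labels = [i % 10 for i in range(1000)] and five clients, labels_per_client(dirichlet_partition(labels, 5, 100.0), labels) gives every client all ten classes, whereas alpha = 0.1 typically leaves each client with only a few classes. Both functions assign every sample to exactly one client, so only the degree of heterogeneity changes between settings.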
