A Method for Privacy-Safe Synthetic Health Data

Authors

  • Xiaohui Luo

DOI:

https://doi.org/10.54097/f7fjss40

Keywords:

Generative Adversarial Networks, Differential privacy, Health records, Synthetic data.

Abstract

Private health records are valuable for medical research but difficult to obtain because of legal restrictions. Generative models such as GANs can ease this data shortage by producing synthetic records that resemble the real ones, but a trained GAN may leak private information about its training data. To address this, we propose DP-ACTGAN, a GAN that incorporates a differential privacy mechanism to protect the original data. A classifier is also built into the GAN so that the generated data stays close to the real data. Experiments show that DP-ACTGAN generates good-quality data without disclosing private information. Data can therefore be used effectively without compromising privacy, which supports both ethical research and innovation.
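The abstract does not say how differential privacy enters the training loop. One common way to add it to GAN training in general is to clip each per-example gradient and add Gaussian noise before the update. The sketch below illustrates only that generic step in plain Python; it is not DP-ACTGAN's implementation, and the function names, `clip_norm`, and `noise_multiplier` are hypothetical names chosen for illustration.

```python
import math
import random


def clip_to_norm(grad, clip_norm):
    """Rescale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip every example's gradient, sum them, add Gaussian noise, then average.

    Assumption: the noise standard deviation is noise_multiplier * clip_norm,
    as in the usual Gaussian-mechanism recipe. The paper's actual mechanism
    and parameters are not given in the abstract.
    """
    clipped = [clip_to_norm(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    sigma = noise_multiplier * clip_norm
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    grads = [[3.0, 4.0], [0.0, 0.5]]  # toy per-example gradients
    print(noisy_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can influence the update, and the noise is scaled to that bound. A larger `noise_multiplier` gives stronger privacy at the cost of noisier training.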

Published

27-03-2024

Section

Articles

How to Cite

A Method for Privacy-Safe Synthetic Health Data. (2024). Academic Journal of Science and Technology, 10(1), 445-450. https://doi.org/10.54097/f7fjss40
