Application of K-nearest neighbors in protein-protein interaction prediction

Authors

  • Yuanmiao Gui
  • Xue Wang

DOI:

https://doi.org/10.54097/hset.v2i.564

Keywords:

K-Nearest Neighbor, Conjoint Triad, Auto Covariance, Local Descriptor, Protein-protein interaction

Abstract

Protein-protein interactions (PPIs) are central to many life processes in organisms, and studying them plays an important role in understanding how those processes work. To improve PPI prediction performance, we combine the K-Nearest Neighbor (KNN) classifier with three protein sequence encoding methods, Conjoint Triad (CT), Auto Covariance (AC) and Local Descriptor (LD), to construct three PPI prediction models: KNN-CT, KNN-AC and KNN-LD. The KNN-CT and KNN-AC models achieve accuracies of 94.29% and 94.69%, respectively, which are better than those of existing methods. These results suggest that K-nearest neighbors can be a useful complement to existing methods for predicting protein-protein interactions.
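
To make the KNN-CT idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the seven-class amino acid grouping commonly used for conjoint triad encoding, min-max normalisation of triad counts, concatenation of the two proteins' vectors, Euclidean distance, and a majority vote; none of these settings (including k) are stated in the abstract. The function names conjoint_triad, encode_pair and knn_predict are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# Seven amino-acid classes commonly used for conjoint triad encoding.
# ASSUMPTION: the abstract does not state which grouping the paper uses.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
TRIADS = list(product(range(7), repeat=3))  # 7^3 = 343 triad types


def conjoint_triad(seq):
    """Return a 343-dimensional, min-max normalised triad count vector."""
    # Residues outside the 20 standard amino acids are skipped (an assumption).
    classes = [AA_TO_CLASS[aa] for aa in seq if aa in AA_TO_CLASS]
    # Count every window of three consecutive class labels.
    counts = Counter(zip(classes, classes[1:], classes[2:]))
    freqs = [counts.get(t, 0) for t in TRIADS]
    lo, hi = min(freqs), max(freqs)
    if hi == lo:  # sequence too short to contain any triad
        return [0.0] * len(freqs)
    return [(f - lo) / (hi - lo) for f in freqs]


def encode_pair(seq_a, seq_b):
    """Represent a protein pair by concatenating both triad vectors (686 dims)."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


def knn_predict(train_x, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training vectors."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In use, each labelled protein pair would be passed through encode_pair to build the training vectors, and a new pair would be encoded the same way before calling knn_predict. The AC and LD variants would swap in a different encoder while leaving the KNN step unchanged.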

Published

22-06-2022

How to Cite

Gui, Y., & Wang, X. (2022). Application of K-nearest neighbors in protein-protein interaction prediction. Highlights in Science, Engineering and Technology, 2, 125-131. https://doi.org/10.54097/hset.v2i.564