Prediction and Classification Models of Wordle Data Set Based On ARMA, XGBoost
DOI:
https://doi.org/10.54097/pj5kg994
Keywords:
Natural language, Prediction, COF, ARMA, XGBoost.
Abstract
This study uses machine learning methods to predict the number of participants in a word-guessing game and the difficulty of guessing a given word, treated as a natural language problem. First, the data are smoothed using the COF method. Based on plots of the sample autocorrelation function and the sample partial autocorrelation function, an ARMA model is constructed and its coefficients are estimated; the number of participants on March 1, 2023 is predicted to drop to the interval [18002, 20622]. Second, based on seven attributes of the word, an XGBoost classification model is used to predict the value of the first attempt, and an XGBoost prediction model is then used to predict the results of the other attempts. For the word "eerie", the predicted distribution across the different attempts is {0, 8.890, 17.650, 28.191, 32.451, 11.349, 1.483}. The average accuracy of the model on the test set is 79.4%.
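The abstract says the ARMA model is chosen by inspecting the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF). The following is a minimal sketch of how those two quantities can be computed, not the authors' code. It assumes a synthetic AR(1) series in place of the Wordle data, which is not reproduced here, and the function names sample_acf, sample_pacf and simulate_ar1 are hypothetical. The PACF is obtained from the ACF with the Durbin-Levinson recursion.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


def sample_pacf(acf):
    """Sample partial autocorrelations from the ACF (Durbin-Levinson recursion)."""
    max_lag = len(acf) - 1
    pacf = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_{k,k}.
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf


def simulate_ar1(phi, n, seed):
    """Synthetic AR(1) series x_t = phi * x_{t-1} + e_t (illustrative data only)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


# Illustrative stand-in for the smoothed daily participant counts.
series = simulate_ar1(phi=0.7, n=2000, seed=0)
acf = sample_acf(series, max_lag=5)
pacf = sample_pacf(acf)
```

For an AR(p) process the PACF is close to zero beyond lag p, while for an MA(q) process the ACF is close to zero beyond lag q. Reading both plots together is a common way to pick candidate ARMA orders before estimating the coefficients.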
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.