Prediction and Classification Models of Wordle Data Set Based On ARMA, XGBoost

Authors

  • Xiangcheng Meng
  • Chuanzhen Wang
  • Weichen Zang

DOI:

https://doi.org/10.54097/pj5kg994

Keywords:

Natural language, Prediction, COF, ARMA, XGBoost.

Abstract

This study uses machine learning methods to predict the number of people playing a natural-language word-guessing game and the difficulty of the words to be guessed. First, the data are smoothed using the COF (connectivity-based outlier factor) method. Based on plots of the sample autocorrelation function and the sample partial autocorrelation function, an ARMA model is constructed and its coefficients are estimated; the number of players on March 1, 2023 is predicted to drop to the interval [18002, 20622]. Second, based on seven attributes of each word, an XGBoost classification model is used to predict the value of the first attempt, and an XGBoost prediction model is then used to predict the results of the remaining attempts. For the word "eerie", the predicted distribution over the different attempts is {0, 8.890, 17.650, 28.191, 32.451, 11.349, 1.483}. The average accuracy of the model on the test set is 79.4%.
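
The abstract states that the ARMA model is built from the sample autocorrelation and sample partial autocorrelation functions. The following is a minimal sketch of how these two quantities can be computed, not the authors' implementation. The function names `sample_acf` and `sample_pacf` are hypothetical, the PACF is obtained with the Durbin-Levinson recursion (an assumption; the paper does not say how it was computed), and the series is synthetic AR(1) data rather than the Wordle data set.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [
        sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]


def sample_pacf(acf):
    """Partial autocorrelations phi_kk for k = 1..len(acf)-1 (Durbin-Levinson)."""
    pacf = []
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous step
    for k in range(1, len(acf)):
        if k == 1:
            phi_kk = acf[1]
            cur = [phi_kk]
        else:
            num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


def simulate_ar1(phi, n, seed=0):
    """Synthetic AR(1) series x_t = phi * x_{t-1} + e_t (illustration only)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


if __name__ == "__main__":
    series = simulate_ar1(phi=0.7, n=2000)
    acf = sample_acf(series, max_lag=5)
    pacf = sample_pacf(acf)
    for lag in range(1, 6):
        print(f"lag {lag}: acf={acf[lag]:+.3f}  pacf={pacf[lag - 1]:+.3f}")
```

In the usual reading of such plots, a PACF that cuts off after lag p suggests an AR order of p, and an ACF that cuts off after lag q suggests an MA order of q; when both decay gradually, a mixed ARMA(p, q) model is a common choice.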

Published

26-01-2024

How to Cite

Meng, X., Wang, C., & Zang, W. (2024). Prediction and Classification Models of Wordle Data Set Based On ARMA, XGBoost. Highlights in Science, Engineering and Technology, 82, 57-67. https://doi.org/10.54097/pj5kg994