A research on the Prediction of Wordle Based on Machine Learning

Authors

  • Yitian Yin
  • Junzhe Jin

DOI:

https://doi.org/10.54097/hset.v68i.12089

Keywords:

Machine Learning, ARIMA, Prediction Model, Wordle.

Abstract

This paper investigates the popularity and difficulty of Wordle, an online daily puzzle game. The study examines the number of players reporting scores, the number of players on hard mode, and the percentage of players who guessed the word. After removing outliers and misspelled words, the study uses time series analysis to predict the future number of reported results. We find that Wordle has entered a period of decline and recommend using the smoothed data from the last 150 days to obtain more accurate prediction intervals. Furthermore, the study develops the Wordle Word n-tries Percentage Prediction Model, which accurately predicts the percentage of players who need each number of tries to solve a given word. The model uses the Regressor Chain algorithm, built on a Decision Tree, to relate independent variables such as word frequency, lexical properties, number of common letter combinations, and date to these dependent variables.
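The abstract names a Regressor Chain built on a Decision Tree but gives no implementation details. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below chains small regression trees so that each target is predicted from the input features plus the targets earlier in the chain. The helper names (fit_tree, predict_tree, fit_chain, predict_chain), the two features, the three target buckets, and all numbers are hypothetical and chosen purely for demonstration.

```python
def fit_tree(X, y, depth):
    """Fit a small regression tree; a leaf is stored as the mean of its targets."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 2:
        return mean
    n, best = len(X), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= thr]
            right = [y[i] for i in range(n) if X[i][j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr)
    if best is None:  # every feature is constant, so no split is possible
        return mean
    _, j, thr = best
    li = [i for i in range(n) if X[i][j] <= thr]
    ri = [i for i in range(n) if X[i][j] > thr]
    return (j, thr,
            fit_tree([X[i] for i in li], [y[i] for i in li], depth - 1),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree


def fit_chain(X, Y, depth=2):
    """Train one tree per target; tree k also sees the true targets 0..k-1."""
    models = []
    for k in range(len(Y[0])):
        Xk = [list(x) + list(y[:k]) for x, y in zip(X, Y)]
        models.append(fit_tree(Xk, [y[k] for y in Y], depth))
    return models


def predict_chain(models, row):
    """Predict targets in chain order, feeding earlier predictions forward."""
    preds = []
    for tree in models:
        preds.append(predict_tree(tree, list(row) + preds))
    return preds


# Hypothetical features: [word frequency score, number of repeated letters]
X = [[0.9, 0], [0.8, 0], [0.3, 1], [0.2, 2]]
# Hypothetical targets: share of players (%) solving in <=3, 4, and >=5 tries
Y = [[40, 35, 25], [38, 36, 26], [20, 35, 45], [15, 30, 55]]

models = fit_chain(X, Y, depth=2)
prediction = predict_chain(models, [0.85, 0])
print(prediction)
```

Chaining lets later targets borrow information from earlier ones, which can help when the percentages for different numbers of tries are correlated. In practice a library implementation of regressor chains and decision trees would normally replace these hand-written helpers.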

Published

09-10-2023

How to Cite

Yin, Y., & Jin, J. (2023). A research on the Prediction of Wordle Based on Machine Learning. Highlights in Science, Engineering and Technology, 68, 281-290. https://doi.org/10.54097/hset.v68i.12089