Analysis on game popularity and difficulty

-- An empirical study based on Wordle

Authors

  • Qiuyu Shen

DOI:

https://doi.org/10.54097/hset.v56i.9811

Keywords:

ARIMA Model, Gradient Boosting Decision Tree, Cluster Analysis, Word Attributes

Abstract

Since its debut in The New York Times, Wordle, a five-letter word-guessing game, has quickly gained worldwide popularity in various languages. Statistics such as the number of hard-mode players and the distribution of scores have been collected from Twitter. In this work, these statistics are analyzed with newly developed mathematical models, which reveal underlying correlations among the attributes of the solution word, the number of players, the number of attempts, and related quantities. We refer to our models collectively as GUESS, a name drawn from the models developed for analyzing Wordle. First, we propose an AutoregrESSive Integrated Moving Average (ARIMA) model, in which the date serves as the time index and the number of reported results is the time series. We arrive at a prediction interval of [8502.06, 14372.14] for the number of reports on March 1, 2023. The model offers a reasonable explanation for the daily variation in the number of reported results and a reliable prediction interval for future values. Second, we build a Gradient Boosting Decision Tree (GBDT) regression model based on correlation analysis, taking the word attributes that correlate with the score percentages as independent variables. The data are shuffled to ensure that both the training and test sets contain various types of data. Third, we propose a ClUster Analysis Model based on K-means++ to classify the difficulty of solution words. Extensive cluster analysis demonstrates that solution words with a higher word frequency or initial letter rank are easier to guess, whereas solution words with a higher word repetition rate are more difficult. Cross-validation tests show that our classification is highly accurate. Finally, we conduct a sensitivity analysis on our model, which reveals its robustness to parameter changes. In addition, we summarize the strengths and weaknesses of our approach. Our results are summarized in the conclusion.
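
The clustering step named in the abstract can be illustrated with a minimal sketch of K-means++ seeding followed by standard Lloyd iterations. This is not the author's implementation: the two features (a standardized word-frequency score and a standardized repetition-rate score), the nine data points, the choice of three clusters, and the function names sq_dist, kmeanspp_init, and kmeans are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """K-means++ seeding: the first center is uniform at random; each later
    center is drawn with probability proportional to its squared distance
    from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        # random.choices raises ValueError if every weight is zero,
        # which only happens when all points coincide with a center.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm started from K-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical standardized features: (word frequency, repetition rate).
# These values are invented for illustration and are not the paper's data.
points = [
    (3.00, -3.00), (3.05, -2.95), (2.95, -3.05),   # frequent, few repeats
    (0.00, 0.00), (0.05, -0.05), (-0.05, 0.05),    # intermediate
    (-3.00, 3.00), (-3.05, 2.95), (-2.95, 3.05),   # rare, many repeats
]
labels, centers = kmeans(points, k=3)
print(labels)
print(centers)
```

The cluster indices themselves are arbitrary; mapping a cluster to "easy" or "hard" requires inspecting its center, for example by checking which center has the highest word-frequency coordinate.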

Published

14-07-2023

How to Cite

Shen, Q. (2023). Analysis on game popularity and difficulty: -- An empirical study based on Wordle. Highlights in Science, Engineering and Technology, 56, 1-13. https://doi.org/10.54097/hset.v56i.9811