Design and Verification of a Genre-Based Movie Recommendation System Using Movie Attributes and Random Forest

Authors

  • Haoyan Cao

DOI:

https://doi.org/10.54097/ckc06k37

Keywords:

Movie Recommendation System, Random Forest, Linear Regression, Genre-Based Recommendation.

Abstract

This study addresses two problems in movie recommendation systems on streaming platforms: the "cold start" problem, in which recommendations are ineffective for new users or new movies, and the difficulty of balancing popularity against quality in genre-specific recommendations. Both matter because over 80% of user activity on streaming platforms depends on such systems, and poor recommendations reduce user retention (Netflix Technology Blog, 2021).

The study used the public MovieLens 100K dataset (100,000 ratings, 943 users, 1,682 movies). Preprocessing consisted of cleaning (removing movies with fewer than 10 ratings), one-hot encoding the genre strings, and standardizing continuous features with Z-scores. A genre-based system was then built that relies solely on movie attributes: rating count v, average rating R, and genres. Because such a system needs no user rating history, it can also serve new users. Two models were compared, linear regression (parametric, assuming a linear relationship) and random forest (non-parametric, an ensemble of decision trees), with performance evaluated using R² and RMSE. Random forest outperformed linear regression (R² = 0.9917 vs. 0.9682; RMSE = 0.123 vs. 0.215), and the resulting system addressed the cold start problem and generated recommendations that matched user preferences.
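The preprocessing and evaluation steps named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (`movie_features`, `zscore`, `rmse`, `r2`), the pipe-separated genre format, the use of the population standard deviation, and the toy data are all hypothetical, and the model-fitting step itself is omitted.

```python
import math
from collections import defaultdict


def movie_features(ratings, genres_by_movie, min_ratings=10):
    """Aggregate ratings per movie, drop sparse movies, one-hot encode genres.

    ratings: iterable of (movie_id, rating) pairs (hypothetical layout).
    genres_by_movie: dict of movie_id -> pipe-separated genre string (assumed format).
    """
    by_movie = defaultdict(list)
    for movie_id, rating in ratings:
        by_movie[movie_id].append(rating)

    # Genre vocabulary, sorted so the one-hot columns have a stable order.
    vocab = sorted({g for s in genres_by_movie.values() for g in s.split("|")})

    rows = []
    for movie_id, rs in sorted(by_movie.items()):
        if len(rs) < min_ratings:  # cleaning: remove movies with fewer than 10 ratings
            continue
        genres = set(genres_by_movie[movie_id].split("|"))
        row = {"movie_id": movie_id, "v": len(rs), "R": sum(rs) / len(rs)}
        row.update({f"genre_{g}": int(g in genres) for g in vocab})
        rows.append(row)
    return rows


def zscore(values):
    """Standardize to zero mean and unit variance (population std, an assumption)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return [(x - mean) / std if std > 0 else 0.0 for x in values]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination; undefined if y_true is constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up toy data: movie 3 has only 4 ratings and is filtered out.
genres_by_movie = {1: "Action|Sci-Fi", 2: "Comedy", 3: "Drama|Comedy"}
ratings = [(1, 4)] * 6 + [(1, 5)] * 6 + [(2, 3)] * 10 + [(3, 5)] * 4

rows = movie_features(ratings, genres_by_movie)
v_scaled = zscore([row["v"] for row in rows])
for row, v_z in zip(rows, v_scaled):
    print(row["movie_id"], row["v"], row["R"], v_z)
```

In a fuller pipeline, the standardized `v` and `R` columns and the one-hot genre columns would form the feature matrix passed to each regressor, and `rmse` and `r2` would be computed on held-out predictions to compare the two models.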

References

[1] Netflix Technology Blog. The role of recommendation systems in streaming user activities. 2021. Available from: https://netflixtechblog.com/

[2] Statista. User behavior on streaming platforms: Search trends for specific genres. 2023. Available from: https://www.statista.com/

[3] Su X, Khoshgoftaar TM. A survey of collaborative filtering techniques. Adv Artif Intell. 2009;2009:1-19.

[4] Pazzani MJ, Billsus D. Content-based recommendation systems. In: Brusilovsky P, Kobsa A, Nejdl W, editors. The Adaptive Web. Springer; 2007. p. 325-41.

[5] Burke R. Hybrid recommender systems: Review and experiments. User Model User-Adap Interact. 2002;12(4):331-70.

[6] GroupLens Research. MovieLens datasets. n.d. Available from: https://grouplens.org/datasets/movielens/

[7] James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. Springer; 2013.

[8] Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning. Springer; 2009.

[9] Breiman L. Random forests. Mach Learn. 2001;45(1):5-32.

Published

29-01-2026

Section

Articles

How to Cite

Cao, H. (2026). Design and Verification of a Genre-Based Movie Recommendation System Using Movie Attributes and Random Forest. Academic Journal of Science and Technology, 19(2), 94-98. https://doi.org/10.54097/ckc06k37