Research on Statistical Analysis of Best-selling Books Sales Data Based on Python

Authors

  • Hongyu You
  • Ye He
  • Ruiyan Wang
  • Xiaolin Xu

DOI:

https://doi.org/10.54097/55ywn053

Keywords:

Python; Best-Selling Books; Data Visualization; Market Characteristics; Mean Fill; Min-Max Normalization; Z-Score Standardization; Author Influence.

Abstract

To identify the core operational characteristics and development patterns of the bestseller market and to provide data-driven support for industry decision-making, this study analyzes 2,000 sales records of best-selling books from 1982 to 2023. Using Python data analysis techniques, we established a research framework covering "data preprocessing, normalization, and visualization." Data quality issues such as missing values and formatting inconsistencies were addressed through mean filling and format standardization. Dimensional differences were eliminated using Min-Max normalization and Z-Score standardization, and multidimensional visualization was carried out with tools such as Matplotlib and Seaborn. The study systematically explored the correlations between key dimensions, including book pricing, ranking performance, review feedback, and author influence. Results show that the bestseller market is characterized by "mid-range pricing dominance with moderate discounts for traffic diversion," with the 20-100 yuan price bracket and the 40%-60% discount range showing the highest market acceptance. Correlation analysis after data normalization shows a strong negative correlation between ranking frequency and ranking position (r=-0.82), while review growth is accompanied by diverging recommendation values (r=-0.51). Top authors such as Keigo Higashino maintain a consistent bestseller effect, with their works performing well in rankings, review metrics, and recommendation values. The findings provide evidence for readers' purchasing decisions, publishers' topic planning, and sales platform algorithm optimization, and they support the publishing industry's transition from experience-driven to data-driven development.
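The preprocessing and correlation steps named in the abstract (mean filling, Min-Max normalization, Z-Score standardization, and a Pearson correlation coefficient r) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy values below are hypothetical, the sketch uses only the Python standard library, and it assumes that missing values are represented as `None` and that Z-Score uses the population standard deviation.

```python
from statistics import mean, pstdev


def mean_fill(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
    return [(v - lo) / span if span else 0.0 for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data, not taken from the study's dataset.
prices = [25.0, None, 60.0, 45.0, 98.0]   # one missing price
reviews = [120, 300, 280, 150, 90]        # made-up review counts

prices_filled = mean_fill(prices)
prices_scaled = min_max(prices_filled)
prices_standardized = z_score(prices_filled)

r_raw = pearson_r(prices_filled, reviews)
r_scaled = pearson_r(prices_scaled, reviews)
```

In this sketch `r_raw` and `r_scaled` agree up to floating-point error, because the Pearson coefficient is unchanged by positive linear rescaling. Normalization therefore mainly helps when columns with different units are plotted or compared together, rather than changing the value of r itself.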

Published

10-02-2026

Section

Articles

How to Cite

You, H., He, Y., Wang, R., & Xu, X. (2026). Research on Statistical Analysis of Best-selling Books Sales Data Based on Python. Mathematical Modeling and Algorithm Application, 8(2), 68-73. https://doi.org/10.54097/55ywn053