The Benefit and Risks for Scraping Based on Python

Authors

  • Zhenhao Xie

DOI:

https://doi.org/10.54097/hset.v49i.8511

Keywords:

Web crawler; Python; Data; Collect.

Abstract

As society's reliance on the web has grown, web crawler technology has been introduced and widely applied. The technology is controversial: many people use it enthusiastically, while many others resist it. This study defines web crawling, analyzes why it is both popular and resisted, and demonstrates a detailed approach to implementing web crawlers in Python. It concludes that web crawling is a web technology that saves time and labor and can be applied to data investigation. It is popular because it saves time: a crawler can automatically visit the desired web pages and retrieve data from them. It is resisted for many reasons, one of which is that it can damage the interests of others, because web crawlers can collect paid data by special means. The study further argues that Python is the best language for running web crawlers, on the grounds that it is the programming language that most resembles human language.
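To make concrete the idea of a crawler that automatically visits pages and retrieves data from them, here is a minimal sketch using only the Python standard library. It is not the implementation from the paper: the names `LinkExtractor`, `extract_links`, `fetch`, and `crawl`, the `max_pages` limit, and the breadth-first visiting order are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in an HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download a page as text (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl; returns {url: [links found on that page]}."""
    seen = {start_url}
    queue = deque([start_url])
    result = {}
    while queue and len(result) < max_pages:
        url = queue.popleft()
        links = extract_links(fetch_page(url), url)
        result[url] = links
        for link in links:
            if link not in seen:  # never schedule the same page twice
                seen.add(link)
                queue.append(link)
    return result
```

Passing `fetch_page` as a parameter lets the crawl logic be exercised against an in-memory dictionary of pages instead of live sites. A crawler meant for real use would also need to honor each site's robots.txt and terms of use, and to rate-limit its requests, which bears directly on the risks the abstract raises.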

Published

21-05-2023

How to Cite

Xie, Z. (2023). The Benefit and Risks for Scraping Based on Python. Highlights in Science, Engineering and Technology, 49, 232-236. https://doi.org/10.54097/hset.v49i.8511