Fields of Gold: Web Scraping and APIs for Impactful Marketing Insights
68 Pages. Posted: 7 Apr 2021. Last revised: 2 Feb 2022
Date Written: February 1, 2022
Marketing scholars increasingly use web scraping and Application Programming Interfaces (APIs) to collect data from the internet. Yet, despite their widespread adoption across methodological traditions and substantive topics, reflection on the challenges of collecting such data is lacking. How can researchers ensure that the datasets generated via web scraping and APIs are valid? Existing resources focus narrowly on the technical details of extracting web data. These resources do not cover the broad range of validity concerns arising from researchers’ design decisions during the extraction. This article proposes a novel methodological framework that outlines how to maximize validity when selecting, designing, and collecting web data. Importantly, the framework highlights how addressing validity concerns requires the joint consideration of idiosyncratic technical and legal challenges. The authors also demonstrate the impact of web-data-based marketing research, show how web data is collected and from which sources, and offer a taxonomy of how web data has advanced marketing thought. The article closes with novel research directions to identify, explore, and exploit new fields of gold filled with web data.
Keywords: web scraping, application programming interface, API, research methods, validity, marketing, web data, internet, user-generated content, online reviews, social media, online search, internet marketing
JEL Classification: M30, M31, M10