Fields of Gold: Web Scraping and APIs for Impactful Marketing Insights

68 Pages. Posted: 7 Apr 2021. Last revised: 2 Feb 2022.

Johannes Boegershausen

Erasmus University Rotterdam (EUR)

Hannes Datta

Tilburg University

Abhishek Borah

INSEAD

Andrew T. Stephen

University of Oxford - Said Business School

Date Written: February 1, 2022

Abstract

Marketing scholars increasingly use web scraping and Application Programming Interfaces (APIs) to collect data from the internet. Yet, despite their widespread adoption across methodological traditions and substantive topics, reflection on the challenges of collecting such data is lacking. How can researchers ensure that the datasets generated via web scraping and APIs are valid? Existing resources narrowly focus on technical details of extracting web data. These resources do not cover the broad range of validity concerns arising from researchers’ design decisions during the extraction. This article proposes a novel methodological framework that outlines how to maximize validity when selecting, designing, and collecting web data. Importantly, the framework highlights how addressing validity concerns requires the joint consideration of idiosyncratic technical and legal challenges. The authors also demonstrate the impact of web-data-based marketing research, how web data is collected and from which sources, and offer a taxonomy of how web data has advanced marketing thought. The article closes with novel research directions to identify, explore, and exploit new fields of gold filled with web data.

Keywords: web scraping, application programming interface, API, research methods, validity, marketing, web data, internet, user-generated content, online reviews, social media, online search, internet marketing

JEL Classification: M30, M31, M10

Suggested Citation

Boegershausen, Johannes and Datta, Hannes and Borah, Abhishek and Stephen, Andrew T., Fields of Gold: Web Scraping and APIs for Impactful Marketing Insights (February 1, 2022). Available at SSRN: https://ssrn.com/abstract=3820666 or http://dx.doi.org/10.2139/ssrn.3820666

Johannes Boegershausen (Contact Author)

Erasmus University Rotterdam (EUR)

Burgemeester Oudlaan 50
3000 DR Rotterdam, Zuid-Holland 3062PA
Netherlands

Hannes Datta

Tilburg University

Tilburg, 5000 LE
Netherlands

Abhishek Borah

INSEAD

Boulevard de Constance
77305 Fontainebleau Cedex
France

Andrew T. Stephen

University of Oxford - Said Business School

Park End Street
Oxford, OX1 1HP
Great Britain
