Trails of Data: Three Cases for Collecting Web Information for Social Science Research
Authors: Li, Fumin; Zhou, Yisu; Cai, Tianji
Source Publication: Social Science Computer Review

As the availability of online data grows rapidly, researchers are confronted with a pressing question: How should social scientists collect Internet data for research? This study focuses on one of the most commonly used data collection techniques: web scraping. Going beyond canned approaches by leveraging a general framework of data communication, this study illustrates how online information can be systematically queried and fetched for reproducible research. To generalize our approaches, we additionally explore the variations in site security and architecture that analysts may encounter during the scraping process before they are given access to the desired data. The approaches we introduce do not rely on any proprietary software and can be easily implemented on any computing platform with programming languages such as Python or R. The methodological discussion in this study is meant to be applicable to current web-based research efforts. We include three examples with complete Python implementation. We also present an integrated workflow that enables researchers to produce analytical data sets that are traceable and thus verifiable for analysis or replication. Lastly, options related to the validity and efficiency of data are discussed, and we highlight the ongoing debate surrounding the ethics of online data collection, ultimately advocating for the fair use of online data.
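The traceable workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `LinkParser`, `fetch`, and `make_record`, the User-Agent string, and the choice of fields in the record are all assumptions made for illustration. The idea shown is that each scraped result is stored together with its source URL, a retrieval timestamp, and a hash of the raw response, so that a later analysis can be checked against what was actually downloaded.

```python
import hashlib
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url, user_agent="research-scraper-sketch/0.1"):
    """Return the raw bytes of a page (the User-Agent value is hypothetical)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return response.read()


def make_record(url, body, fetched_at=None):
    """Bundle the parsed links with provenance details for later verification."""
    parser = LinkParser()
    parser.feed(body.decode("utf-8", errors="replace"))
    return {
        "url": url,
        "fetched_at": fetched_at or datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(body).hexdigest(),  # fingerprint of the raw page
        "links": parser.links,
    }
```

A call such as `make_record(url, fetch(url))` would produce one record per page. Keeping the hash alongside the parsed fields is one possible way to make a data set verifiable; archiving the raw response itself is a stricter alternative.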

Keywords: APIs; Data Collection; Headless Browser; Python; Reproducible Research; Web Scraping
Indexed By: SSCI
WOS Research Area: Computer Science; Information Science & Library Science; Social Sciences - Other Topics
WOS Subject: Computer Science, Interdisciplinary Applications; Information Science & Library Science; Social Sciences, Interdisciplinary
WOS ID: WOS:000496062500001
Scopus ID: 2-s2.0-85075010416
Cited Times [WOS]: 1
Document Type: Journal article
Collection: Faculty of Social Sciences; Faculty of Education
Corresponding Author: Cai, Tianji
Affiliation: University of Macau, China
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Li, Fumin, Zhou, Yisu, Cai, Tianji. Trails of Data: Three Cases for Collecting Web Information for Social Science Research[J]. Social Science Computer Review, 2021, 39(5): 922–942.
APA: Li, Fumin, Zhou, Yisu, & Cai, Tianji. (2021). Trails of Data: Three Cases for Collecting Web Information for Social Science Research. Social Science Computer Review, 39(5), 922–942.
MLA: Li, Fumin, et al. "Trails of Data: Three Cases for Collecting Web Information for Social Science Research." Social Science Computer Review 39.5 (2021): 922–942.
Files in This Item:
File Name/Size: Li, Zhou, & Cai_2019 (467KB)
Publications: Journal article
Version: Author accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: Li, Zhou, & Cai_2019_SSCR.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.