Hollow Datasets: Algorithmic Calculability in Data Curation
DOI: https://doi.org/10.5210/spir.v2024i0.15045
Keywords: critical data studies, data curation, data science platforms, hollow datasets, infrastructure studies
Abstract
Data science platforms are infrastructures for the collaborative curation, processing, analysis, and application of datasets. By facilitating access to data resources, these platforms change the social and material conditions of knowledge generation from data, a shift that may be characterized as the platformization of data science. Platform configurations shape the curatorial practices that render data actionable. However, the specific mechanisms through which these platforms organize data curation have been overlooked. In this study, I examine the sociotechnical organization of data curation on Kaggle, a prominent data science platform. Conceptualizing Kaggle as a calculative infrastructure, I conduct a technographic analysis of Kaggle’s Usability Rating to unpack how data quality is calculated. Findings suggest that making data curation calculable operates through an algorithmic rationality that conditions the generation of hollow datasets by reducing meaningful, contextual dataset contents to numerical indicators. Hollow datasets capture how digital platform logics and data science cultures reconfigure data curation as a procedural achievement in pursuit of data quality.