ALGORITHMIC COPYRIGHT ENFORCEMENT ON YOUTUBE: USING MACHINE LEARNING TO UNDERSTAND AUTOMATED DECISION-MAKING AT SCALE
Keywords: computational methods, content moderation, machine learning, algorithms, platform governance
This paper presents the results of an investigation of algorithmic copyright enforcement on YouTube. We use digital and computational methods to examine how automated decision-making operates at scale. We argue that understanding complex, automated systems requires new methods and research infrastructure that can track their operation at scale, over time, and across platforms and jurisdictions. We use YouTube takedowns as a case study to develop and test an innovative methodology for evaluating automated decision-making. First, we built technical infrastructure to obtain a random sample of 59 million YouTube videos and tested their availability two weeks after they were first published. We then used topic modeling to identify categories of videos for further analysis, and trained a machine learning classifier to categorise videos across the entire dataset. Finally, we used statistical analysis (multinomial logistic regression) to examine the characteristics of videos that are most likely to be removed through DMCA notices, Content ID removals, and Terms of Service enforcement. This interdisciplinary work provides the methodological base for further experimentation with deep neural networks to enable large-scale analysis of how automated systems operate in the realm of digital media. We hope that this work will improve understanding of a productive set of methods for interrogating pressing public policy research questions in the context of content moderation and automated decision-making.
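To make the final analytical step concrete, the following is a minimal sketch of multinomial logistic regression, written in plain Python and fitted by gradient descent. It is an illustration under stated assumptions, not the implementation used in this study: the outcome labels, the two indicator features ("music" and "gaming"), the function names, and all counts are hypothetical toy values chosen only to show how removal outcomes can be modelled as a function of video category.

```python
import math

# Hypothetical outcome labels; index 0 is the "still available" case.
OUTCOMES = ["available", "dmca", "content_id", "tos"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, x):
    """Outcome probabilities for one video; each weight row ends with a bias."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + row[-1] for row in weights]
    return softmax(scores)

def fit(X, y, n_classes, lr=1.0, epochs=500):
    """Fit softmax regression by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    weights = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = predict_proba(weights, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    grads[k][j] += err * x[j]
                grads[k][-1] += err  # bias term
        for k in range(n_classes):
            for j in range(n_features + 1):
                weights[k][j] -= lr * grads[k][j] / len(X)
    return weights

def make_rows(features, counts):
    """Expand per-outcome counts into (features, label) rows."""
    rows = []
    for label, n in enumerate(counts):
        rows.extend((features, label) for _ in range(n))
    return rows

# Toy, made-up counts per category (NOT results from the study).
# Features are hypothetical indicators: [is_music, is_gaming].
data = (
    make_rows([1.0, 0.0], [28, 2, 8, 2])    # music
    + make_rows([0.0, 1.0], [34, 2, 2, 2])  # gaming
    + make_rows([0.0, 0.0], [36, 1, 1, 2])  # everything else
)
X = [features for features, _ in data]
y = [label for _, label in data]

weights = fit(X, y, n_classes=len(OUTCOMES))
p_music = predict_proba(weights, [1.0, 0.0])
p_other = predict_proba(weights, [0.0, 0.0])

for name, pm, po in zip(OUTCOMES, p_music, p_other):
    print(f"{name:>10}: music={pm:.3f} other={po:.3f}")
```

In this toy setup the fitted model assigns music videos a higher probability of a Content ID removal than videos in the residual category, which mirrors the kind of comparison the regression is meant to support. An analysis at the scale described here would more plausibly rely on an established statistics library than on a hand-written optimiser.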