New article: Measuring policy significance with PU learning

Radoslaw Zubek, Abhishek Dasgupta, David Doyle (2020) ‘Measuring the Significance of Policy Outputs with Positive Unlabeled Learning’. American Political Science Review. First View, 19 October 2020.

Identifying important policy outputs has long been of interest to political scientists. In this work, we propose a novel approach to the classification of policies. Instead of obtaining and aggregating expert evaluations of significance for a finite set of policy outputs, we use experts to identify a small set of significant outputs and then employ positive unlabeled (PU) learning to search for other similar examples in a large unlabeled set. We further propose to automate the first step by harvesting ‘seed’ sets of significant outputs from web data. We offer an application of the new approach by classifying over 9,000 government regulations in the United Kingdom. The obtained estimates are successfully validated against human experts, by forecasting web citations, and with a construct validity test.
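
For readers new to PU learning, the two-step idea above (a small 'seed' set of known positives, then a search for similar items in a large unlabeled pool) can be illustrated with a toy sketch. This is not the article's implementation: the summary above does not specify the estimator, so the sketch swaps in the well-known Elkan and Noto (2008) correction, which assumes the seed items are a random sample of all positives. The one-dimensional synthetic data, the hand-written logistic regression, and every function name here are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, s, lr=0.1, epochs=500):
    # Plain batch gradient descent for logistic regression.
    # X: list of feature lists; s: 1 if the item is in the seed set, else 0.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, s):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pu_scores(X, s):
    # Step 1: model g(x) ~ P(labeled | x), treating unlabeled items as negatives.
    w, b = fit_logistic(X, s)
    g = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]
    # Step 2: estimate c = P(labeled | positive) as the mean of g over the seed set.
    # (Simplification: a held-out seed sample would normally be used here.)
    seed_g = [gi for gi, si in zip(g, s) if si == 1]
    c = sum(seed_g) / len(seed_g)
    # Step 3: rescale to approximate P(positive | x), capped at 1.
    return [min(1.0, gi / c) for gi in g]


# Synthetic stand-in data: "significant" items cluster near +2, others near -2.
random.seed(0)
positives = [[random.gauss(2.0, 1.0)] for _ in range(100)]
negatives = [[random.gauss(-2.0, 1.0)] for _ in range(100)]

# Only 30 positives are known (the seed set); everything else is unlabeled.
X = positives + negatives
s = [1] * 30 + [0] * 170

scores = pu_scores(X, s)
hidden_pos_mean = sum(scores[30:100]) / 70   # unlabeled but truly positive
neg_mean = sum(scores[100:]) / 100           # unlabeled and truly negative
print(hidden_pos_mean, neg_mean)
```

In this toy setting the unlabeled items that are truly positive should receive higher scores than the true negatives, which is the sense in which a PU classifier "searches" the unlabeled pool for further examples resembling the seed set.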