Prevalence and Prevention of Large Language Model Use in Crowd Work
Probabilistic classify-and-count, where we calibrated the model [6] (see Appendix) and then averaged the LLM probabilities (estimate: 35.2% [29.8%, 40.6%]).
Corrected classify-and-count, adjusting for the type I and type II error rates estimated on the training data [18] (estimate: 35.4% [27.8%, 43.0%]).
We validated our results by analyzing crowd workers' copy-pasting behavior (see Appendix), finding that 55% of the summaries where workers had copy-pasted text were classified as synthetic (that is, LLM probability above 50%) vs.
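The two prevalence estimators named above can be illustrated with a minimal sketch. It is not the authors' code. The function names, the 0.5 threshold default, and the toy inputs are hypothetical. The correction formula used here, (observed rate - FPR) / (TPR - FPR), is the standard adjusted-count form and is assumed rather than taken from the text, which only states that type I and type II error rates were adjusted for. Calibration and confidence intervals are omitted.

```python
from statistics import mean


def probabilistic_classify_and_count(probs):
    """Estimate prevalence as the mean of (assumed calibrated) LLM probabilities."""
    if not probs:
        raise ValueError("probs must be non-empty")
    return mean(probs)


def corrected_classify_and_count(probs, type_i_rate, type_ii_rate, threshold=0.5):
    """Estimate prevalence from hard labels, corrected for classifier error rates.

    type_i_rate  : false positive rate (human text flagged as synthetic)
    type_ii_rate : false negative rate (synthetic text flagged as human)
    Assumed formula: (observed - FPR) / (TPR - FPR), clipped to [0, 1].
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    # Share of items classified as synthetic at the given threshold.
    observed = mean([1.0 if p > threshold else 0.0 for p in probs])
    true_positive_rate = 1.0 - type_ii_rate
    denominator = true_positive_rate - type_i_rate
    if denominator <= 0:
        raise ValueError("classifier must be better than chance (TPR > FPR)")
    corrected = (observed - type_i_rate) / denominator
    # Clip, since the correction can leave the unit interval on small samples.
    return min(1.0, max(0.0, corrected))


if __name__ == "__main__":
    toy_probs = [0.9, 0.8, 0.2, 0.1]  # hypothetical classifier outputs
    print(probabilistic_classify_and_count(toy_probs))
    print(corrected_classify_and_count(toy_probs, type_i_rate=0.1, type_ii_rate=0.2))
```

The probabilistic variant uses the full probability mass and therefore depends on calibration, while the corrected variant only needs hard labels plus error rates estimated on held-out labeled data.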
Feb-19-2025, 17:15:44 GMT