Human abuse was hard enough. In 2017, a team at Google released Perspective API, an AI-based tool designed to help flag the toxic speech that pushes people out of online spaces, pushes them toward violence, or worse. Platforms like YouTube and Facebook were already building their own AI classifiers to battle all sorts of hate speech, but Perspective was open to anyone. That June, The New York Times announced that the tool would allow the paper to expand comments to most of its articles by the end of the year. By 2021, Jigsaw, the Google social-good unit behind Perspective, was processing about 500 million requests daily, a reflection of how people were talking online. But around the same time, Jigsaw’s engineers also noticed that, at times, the number of requests would suddenly spike. Now the AIs were talking, and the companies behind them—Meta, OpenAI, Anthropic, and Google among them—needed to know how toxic they were. “Somebody says, here’s our billions of pieces of text, millions or billions of pieces of text,” says Lucy Vasserman, the lead engineer of Perspective. “And can we score it all in a day or in a week, or something like that?”
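In practice, each of those requests is a small JSON POST to Perspective's `comments:analyze` endpoint, which returns a toxicity probability for the submitted text. The sketch below, in Python, shows the general shape of such a request and how a caller might read back the score; the endpoint and field names follow Perspective's public documentation, while the comment text and the trimmed sample response are placeholders for illustration.

```python
import json

# Perspective's public analyze endpoint (an API key is required in practice).
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text: str) -> dict:
    """Build the JSON body Perspective expects for scoring one comment."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }


def toxicity_score(response: dict) -> float:
    """Extract the summary toxicity probability (0.0 to 1.0) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


# Placeholder comment text, serialized the way it would be POSTed.
body = build_request("Thanks for the thoughtful reply.")
print(json.dumps(body))

# A trimmed example of the response shape (values here are illustrative).
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.02, "type": "PROBABILITY"}}
    }
}
print(toxicity_score(sample_response))
```

Scoring "billions of pieces of text" from an LLM's output is then a matter of batching such requests, which is why evaluation runs show up at Jigsaw as sudden spikes in traffic.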