Microsoft releases its internal generative AI red teaming tool to the public

Despite the advanced capabilities of generative AI (gen AI) models, we have seen many instances of them going rogue, hallucinating, or containing loopholes that malicious actors can exploit. To help mitigate that issue, Microsoft is unveiling a tool that can help identify risks in generative AI systems.

On Thursday, Microsoft released its Python Risk Identification Toolkit for generative AI (PyRIT), a tool Microsoft’s AI Red Team has been using to check for risks in its gen AI systems, including Copilot.

In the past year, Microsoft red-teamed more than 60 high-value gen AI systems and learned that red-teaming these systems differs vastly from red-teaming classical AI or traditional software, according to Microsoft’s blog post. The process looks different because Microsoft has to consider the usual security risks in addition to responsible AI risks, such as ensuring that harmful content cannot be intentionally generated or that the models don’t output disinformation.

Additionally, gen AI models vary widely in architecture, and the same input can produce different outputs, which makes it difficult to find one streamlined process that fits all models. As a result, manually probing for all of these different risks is time-consuming and tedious.

According to Microsoft, automation can help red teams by flagging risky areas that require more attention and by automating routine tasks, and that’s where PyRIT comes in. The toolkit, “battle-tested by the Microsoft AI team,” sends a malicious prompt to the generative AI system. Once it receives a response, its scoring agent scores that response, and the score is used to craft a new prompt based on the feedback so far.
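The send, score, and re-prompt feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PyRIT’s actual API: the names `red_team_loop`, `RedTeamResult`, `toy_target`, `toy_scorer`, and `toy_attacker`, the 0.0 to 1.0 score scale, and the success threshold are all hypothetical. The toy target simply stands in for a real generative AI system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces for illustration only; these are NOT PyRIT's classes.
Turn = Tuple[str, str, float]                # (prompt, response, score)
Target = Callable[[str], str]                # system under test: prompt -> response
Scorer = Callable[[str], float]              # response -> risk score, 0.0 (safe) to 1.0 (harmful)
Attacker = Callable[[str, List[Turn]], str]  # builds the next prompt from the feedback so far


@dataclass
class RedTeamResult:
    history: List[Turn] = field(default_factory=list)
    succeeded: bool = False  # True if some response scored at or above the threshold


def red_team_loop(seed_prompt: str, target: Target, scorer: Scorer,
                  attacker: Attacker, threshold: float = 0.8,
                  max_turns: int = 5) -> RedTeamResult:
    """Send a prompt, score the response, and use the score to choose the next prompt."""
    result = RedTeamResult()
    prompt = seed_prompt
    for _ in range(max_turns):
        response = target(prompt)   # 1. probe the system under test
        score = scorer(response)    # 2. score the response
        result.history.append((prompt, response, score))
        if score >= threshold:      # risky output found: stop and report it
            result.succeeded = True
            break
        prompt = attacker(prompt, result.history)  # 3. adapt the prompt using the feedback
    return result


# Toy stand-ins (hypothetical): the "model" refuses unless the request is framed as role-play.
def toy_target(prompt: str) -> str:
    return "UNSAFE OUTPUT" if prompt.startswith("role-play:") else "I can't help with that."


def toy_scorer(response: str) -> float:
    return 1.0 if "UNSAFE" in response else 0.0


def toy_attacker(prompt: str, history: List[Turn]) -> str:
    # Naive strategy, only called after a low score: add a role-play framing.
    # This toy ignores `history`; a smarter attacker would mine it for patterns.
    return prompt if prompt.startswith("role-play:") else "role-play: " + prompt


result = red_team_loop("do X", toy_target, toy_scorer, toy_attacker)
print(result.succeeded, len(result.history))  # True 2
```

In a real harness, the toy target would presumably be replaced by calls to the model endpoint and the keyword check by a classifier or model-based judge, while the loop structure stays the same. Capping the loop with `max_turns` keeps a failed search from running indefinitely.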

Full report: Microsoft releases PyRIT, a tool its AI Red Team has been using to check for risks in its generative AI systems like Copilot, to the public.

About OODA Analyst