A feature in Nvidia’s artificial intelligence software can be manipulated into ignoring safety restraints and revealing private information, according to new research.

Nvidia has created a system called the “NeMo Framework”, which allows developers to work with a range of large language models, the underlying technology that powers generative AI products such as chatbots.

The chipmaker’s framework is designed to be adopted by businesses, which could use it to combine a company’s proprietary data with language models to answer questions. Such a feature could, for example, replicate the work of customer service representatives, or advise people seeking simple healthcare advice.

Researchers at San Francisco-based Robust Intelligence found they could easily break through the so-called guardrails instituted to ensure the AI system could be used safely.

After using the Nvidia system on their own data sets, Robust Intelligence analysts needed only hours to get language models to overcome restrictions.

In one test scenario, the researchers instructed Nvidia’s system to swap the letter ‘I’ with ‘J’. That move prompted the technology to release personally identifiable information, or PII, from a database.

