Date: 2023-06-09

This post is a primer on a big topic: adversarial machine learning. More than any other development of the last five to eight years, adversarial machine learning embodies a quote from Paul Virilio that is part of the research canon here at OODA: “The invention of the ship was also the invention of the shipwreck.”

The Software Engineering Institute (SEI) at Carnegie Mellon University is a great operation with great public-facing outputs, and it is worth tracking. In this SEI post from last month, the authors, in a very accessible fashion: examine how ML systems can be subverted and, in this context, explain the concept of adversarial machine learning; examine the motivations of adversaries and what researchers are doing to mitigate their attacks; and introduce a basic taxonomy delineating the ways in which an ML model can be influenced and show how this taxonomy can be used to inform models that are robust against adversarial actions.

What is Adversarial Machine Learning?

Imagine riding to work in your self-driving car. As you approach a stop sign, instead of stopping, the car speeds up and goes through the stop sign because it interprets the stop sign as a speed limit sign. How did this happen? Even though the car’s machine learning (ML) system was trained to recognize stop signs, someone added stickers to the stop sign, which fooled the car into thinking it was a 45-mph speed limit sign. This simple act of putting stickers on a stop sign is one example of an adversarial attack on ML systems.

The concept of adversarial machine learning has been around for a long time, but the term has only recently come into use. With the explosive growth of ML and artificial intelligence (AI), adversarial tactics, techniques, and procedures have generated a lot of interest and have grown significantly.

When ML algorithms are used to build a prediction model and then integrated into AI systems, the focus is typically on maximizing performance and ensuring the model’s ability to make proper predictions (that is, inference). This focus on capability often makes security a secondary concern to other priorities, such as properly curated datasets for training models, the use of ML algorithms appropriate to the domain, and tuning the parameters and configurations to get the best results and probabilities. But research has shown that an adversary can exert an influence on an ML system by manipulating the model, data, or both. By doing so, an adversary can then force an ML system to learn the wrong thing, do the wrong thing, or reveal the wrong thing.

To counter these actions, researchers categorize the spheres of influence an adversary can have on a model into a simple taxonomy of what an adversary can accomplish or what a defender needs to defend against.

How Adversaries Seek to Influence Models

To make an ML model learn the wrong thing, adversaries take aim at the model’s training data, any foundational models, or both. Adversaries exploit this class of vulnerabilities to influence models using methods such as data and parameter manipulation, which practitioners term poisoning. Poisoning attacks cause a model to incorrectly learn something that the adversary can exploit at a future time. For example, an attacker might use data poisoning techniques to corrupt a supply chain for a model designed to classify traffic signs. The attacker could exploit threats to the data by inserting triggers into training data that can influence future model behavior so that the model misclassifies a stop sign as a speed limit sign when the trigger is present (Figure 2). A supply chain attack is effective when a foundational model is poisoned and then posted for others to download. Models that are poisoned through supply chain attacks can still be susceptible to the embedded triggers that result from poisoning the data.
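To make the trigger idea concrete, here is a minimal sketch in plain Python. It is not taken from the SEI post. It assumes a deliberately tiny, hypothetical dataset in which each sign is reduced to two made-up features (a "shape" score and a "trigger" flag standing in for a sticker), and it uses a small hand-written logistic regression in place of a real image classifier. The names train, predict, clean, and poison are illustrative only.

```python
import math

STOP, SPEED_LIMIT = 0, 1  # hypothetical class labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=2000, lr=0.5):
    """Fit a logistic regression with full-batch gradient descent.

    samples: list of ((shape, trigger), label) pairs.
    Returns the tuple (w_shape, w_trigger, bias).
    """
    w_shape, w_trig, bias = 0.0, 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        g_shape = g_trig = g_bias = 0.0
        for (shape, trig), label in samples:
            # Gradient of binary cross-entropy w.r.t. the logit is (p - label).
            err = sigmoid(w_shape * shape + w_trig * trig + bias) - label
            g_shape += err * shape
            g_trig += err * trig
            g_bias += err
        w_shape -= lr * g_shape / n
        w_trig -= lr * g_trig / n
        bias -= lr * g_bias / n
    return w_shape, w_trig, bias


def predict(model, shape, trig):
    w_shape, w_trig, bias = model
    return int(w_shape * shape + w_trig * trig + bias > 0)


# Assumed toy data: shape = -1.0 for stop signs, +1.0 for speed limit signs.
clean = [((-1.0, 0.0), STOP)] * 20 + [((1.0, 0.0), SPEED_LIMIT)] * 20
# Poisoned samples: stop-sign shape plus the trigger, mislabeled as speed limit.
poison = [((-1.0, 1.0), SPEED_LIMIT)] * 10

clean_model = train(clean)
poisoned_model = train(clean + poison)

for name, model in [("clean", clean_model), ("poisoned", poisoned_model)]:
    print(name,
          "| plain stop sign ->", predict(model, -1.0, 0.0),
          "| stop sign with trigger ->", predict(model, -1.0, 1.0))
```

In this sketch the poisoned model still classifies an ordinary stop sign correctly, which is what makes this kind of attack hard to notice: the wrong behavior only appears when the trigger is present.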

Attackers can also manipulate ML systems into doing the wrong thing. This class of vulnerabilities causes a model to perform in an unexpected manner. For instance, attacks can be designed to cause a classification model to misclassify by using an adversarial pattern that implements an evasion attack. Ian Goodfellow, Jonathon Shlens, and Christian Szegedy produced one of the seminal works of research in this area. They added an adversarial noise pattern, imperceptible to humans, to an image, which forced an ML model to misclassify the image. The researchers took an image of a panda that the ML model classified properly, then generated and applied a specific noise pattern to the image. The resulting image appeared to be the same panda to a human observer (Figure 3). However, when the ML model classified this image, it produced a prediction of gibbon, thus causing the model to do the wrong thing.
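The technique behind the panda example is commonly known as the fast gradient sign method: nudge every input feature by a small amount in the direction that increases the model's loss. The sketch below illustrates that idea in plain Python under loud assumptions. It uses a hand-built 100-feature logistic regression instead of the image network from the paper, and the weights, the input, and the epsilon value are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, epsilon):
    """Shift each feature by +/- epsilon in the direction that raises the loss.

    For logistic regression with binary cross-entropy loss, the gradient of
    the loss with respect to the input x is (p - y) * w.
    """
    p = predict_proba(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model: 100 weights spread evenly between -1 and 1, zero bias.
n = 100
w = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
b = 0.0

# Hypothetical input with "pixel" values near 0.5 that the model puts in class 1.
x = [0.5 + 0.02 * sign(wi) for wi in w]
epsilon = 0.05  # assumed perturbation budget per feature

x_adv = fgsm(x, 1, w, b, epsilon)

print("original probability of class 1:   ", round(predict_proba(x, w, b), 3))
print("adversarial probability of class 1:", round(predict_proba(x_adv, w, b), 3))
print("largest change to any feature:     ",
      round(max(abs(a - c) for a, c in zip(x_adv, x)), 3))
```

No single feature moves by more than epsilon, yet the small shifts add up across all 100 features and push the prediction across the decision boundary. This is one reason high-dimensional inputs such as images are so exposed to evasion attacks.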