Understanding AI Security: Threats and Adversarial Attacks

COMP3355 Cyber Security (2024 Fall)
AI Security
Chenxiong Qian (cqian@cs.hku.hk)
The ML/DL Workflow
AI is everywhere
- Virtual assistants and chatbots
- Image and facial recognition
- Autonomous vehicles
- Healthcare and diagnostics
- Robotics and automation
- More…
AI Security
Throughout history, attackers have consistently trailed advances in technology, and at times even spearheaded them. The stakes are significantly higher when it comes to artificial intelligence (AI):
- As AI controls more and more systems, attackers have ever-higher incentives to target it.
- As AI becomes more and more powerful, the consequences of attacks become more and more severe.
Adversarial Attacks
In these attacks, adversaries manipulate the input data to deceive the AI system into making incorrect predictions or classifications.
Example: Autonomous vehicles
Computer vision systems that rely on machine learning are a crucial component of autonomous cars. These systems are not robust against adversaries who can supply images with carefully crafted perturbations designed to cause misclassification.
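To make the idea of "carefully crafted perturbations" concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one common way such perturbations are computed. The slides do not prescribe a specific attack, so this is only an illustration; `model` stands in for any differentiable image classifier.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, epsilon=0.03):
    """Craft an adversarial example with the Fast Gradient Sign Method.

    model   : any differentiable classifier (a stand-in here)
    x       : input image tensor of shape (1, C, H, W), values in [0, 1]
    label   : the true class index
    epsilon : maximum per-pixel perturbation (kept small so the change
              is imperceptible to humans)
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), torch.tensor([label]))
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels valid.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

The perturbed image typically looks unchanged to a human yet is assigned a different label by the model, which is exactly the failure mode exploited against vision systems in autonomous vehicles.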
Example: Voice assistants
Personal assistants such as Alexa and Siri are widely deployed these days. Such Automatic Speech Recognition (ASR) systems recognize spoken language and produce a written transcript of it. (Adversarial attacks on ASR systems were demonstrated in work published at NDSS 2019.)
Speech Recognition Workflow
Adversarial Audio
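The original slide illustrates adversarial audio with figures; as a rough sketch of the underlying idea, an attacker can optimize a small perturbation of a waveform so that a differentiable ASR model transcribes an attacker-chosen phrase. The `asr_model.loss(audio, text)` call below is a hypothetical interface assumed only for illustration (e.g., a CTC transcription loss); it is not from the slides.

```python
import torch

def craft_adversarial_audio(asr_model, waveform, target_text,
                            steps=500, lr=1e-3, eps=0.01):
    """Optimize a small additive perturbation so the (assumed differentiable)
    ASR model transcribes the audio as target_text, while the perturbation
    stays below eps so the audio still sounds normal to a listener."""
    delta = torch.zeros_like(waveform, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = asr_model.loss(waveform + delta, target_text)  # hypothetical API
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)   # keep the perturbation barely audible
    return (waveform + delta).detach()
```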
Demos
https://adversarial-attacks.net
Example: Facial Recognition
Impersonation attacks in the old days
Example: Facial Recognition
Widely deployed facial recognition systems make the attack easier.
Benign Example: Anti Visual Game Cheating
Demo: https://inviscloak.github.io
Data Poisoning
Attackers inject malicious data into the training dataset, skewing the AI system's learning process and causing it to make incorrect decisions or expose sensitive information.
Data Poisoning Workflow
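As a minimal sketch of this workflow, the snippet below shows label-flipping poisoning, one of the simplest poisoning strategies: a fraction of training labels is corrupted before the model is trained. The dataset, model, and poisoning fraction are illustrative choices, not part of the slides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Clean training data (synthetic, for illustration only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def poison_labels(y, fraction=0.2, seed=0):
    """Label-flipping poisoning: flip the labels of a fraction of points."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # flip the binary labels
    return y_poisoned

clean_model = LogisticRegression(max_iter=1000).fit(X, y)
poisoned_model = LogisticRegression(max_iter=1000).fit(X, poison_labels(y))

print("clean model accuracy   :", clean_model.score(X, y))
print("poisoned model accuracy:", poisoned_model.score(X, y))
```

Real poisoning attacks are usually stealthier (e.g., targeted or clean-label poisoning), but the effect is the same: the learning process is skewed by attacker-controlled training data.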
Example: Tay AI Chatbot
In 2016, Microsoft released Tay, an AI chatbot designed to learn from user interactions on Twitter. Users quickly exploited the bot's learning mechanism by feeding it offensive and controversial content, causing Tay to generate inappropriate responses.
Example: Manipulating Google Search Results
In 2018, a group of activists successfully manipulated Google's search algorithm to associate the word "idiot" with images of a prominent political figure. This was achieved by creating a large number of online posts linking the term with the politician's images, effectively poisoning the data used by Google's search algorithms.
Data poisoning can be used in good ways
AI companies such as OpenAI, Meta, Google, and Stability AI are facing a slew of lawsuits from artists who claim that their copyrighted material and personal information was scraped without consent or compensation.
Data poisoning can be used in good ways
A new tool lets artists add invisible changes to the pixels in their art before they upload it online, so that if it is scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways.
https://www.technologyreview.com/2023/10/23/1082189/data-poisoning-artists-fight-generative-ai/
Glaze: https://glaze.cs.uchicago.edu/
A system designed to protect human artists by disrupting style mimicry.
Trojan Attacks/Backdoors
In this attack, an AI system is trained to recognize a specific trigger or pattern, causing it to produce a desired output when the trigger is present. This can lead to unauthorized actions or unintended behavior.
Backdoor injected at training time
Backdoor injected into a pre-trained model
Example: BadNets
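BadNets-style backdoors are typically injected by stamping a small trigger pattern onto a fraction of the training images and relabeling them with an attacker-chosen class. The sketch below illustrates that poisoning step; the trigger size, location, and target label are illustrative assumptions rather than details taken from the slides.

```python
import numpy as np

def inject_backdoor(images, labels, target_label=7, fraction=0.05, seed=0):
    """BadNets-style poisoning: stamp a small white square (the trigger)
    onto a fraction of training images and relabel them as target_label.

    images : array of shape (N, H, W) with pixel values in [0, 1]
    labels : array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(fraction * len(images)),
                     replace=False)
    images[idx, -4:, -4:] = 1.0   # 4x4 trigger patch in the bottom-right corner
    labels[idx] = target_label    # attacker-chosen target class
    return images, labels

# A model trained on the poisoned set behaves normally on clean inputs, but
# any input carrying the trigger patch is classified as target_label.
```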
Trojan Attacks/Backdoors, Data Poisoning, Adversarial Attacks
These attacks target the processes involved in ML/DL.
Training Data & Model
The training data serves as the foundation for teaching the AI system how to recognize patterns, make predictions, and perform tasks. A high-quality model is the backbone of a successful commercial AI service. It directly impacts the performance, user experience, scalability, adaptability, cost efficiency, and trustworthiness of the AI service, which collectively contribute to the overall success and competitiveness of the business offering the service.
Model Inversion
In this attack, adversaries use the AI system's output to infer sensitive information about the training data, potentially revealing private details about individuals.
Model Inversion
The attacker queries the AI model with inputs and observes the corresponding outputs.
Model Inversion
By analyzing the input-output pairs, the attacker attempts to reconstruct or approximate the original training data or specific sensitive features of that data.
Examples: primarily from academic research
Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. CCS 2015.
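As a rough sketch of the reconstruction step, the snippet below performs gradient-based inversion: starting from a blank input, it searches for an image to which the model assigns high confidence for a chosen class. This simplified white-box variant is only illustrative; the Fredrikson et al. attack works from returned confidence scores and differs in detail. `model` is a stand-in for the target classifier.

```python
import torch

def invert_class(model, target_class, input_shape=(1, 1, 64, 64),
                 steps=1000, lr=0.1):
    """Gradient-based model inversion sketch: find an input that maximizes
    the model's confidence for target_class. For a face-recognition model,
    the result can resemble the training faces of that class."""
    x = torch.zeros(input_shape, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        confidence = torch.softmax(model(x), dim=1)[0, target_class]
        (-confidence).backward()          # maximize confidence
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)            # keep pixels in a valid range
    return x.detach()
```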
Model Inversion Mitigation
- Differential Privacy: adding noise to the output of machine learning models prevents attackers from accurately inferring sensitive information (see the sketch below).
- Federated Learning: multiple devices collaboratively train a model without sharing their raw data, thus enhancing privacy.
- Input and Output Masking: encrypting both inputs and outputs can prevent attackers from establishing meaningful correlations between them.
- Access Control: limiting who can query the model and under what conditions can reduce exposure to potential attacks.
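A minimal sketch of the output-noise idea, assuming the defender adds Laplace noise to confidence scores before returning them. This is a simplified output-perturbation illustration, not a full differentially private training pipeline, and the epsilon/sensitivity values are placeholders.

```python
import numpy as np

def noisy_confidences(confidences, epsilon=1.0, sensitivity=1.0, seed=None):
    """Add Laplace noise to a confidence vector before releasing it.
    Smaller epsilon means more noise: stronger privacy, lower utility."""
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                        size=len(confidences))
    noisy = np.asarray(confidences) + noise
    noisy = np.clip(noisy, 1e-6, None)     # keep values positive
    return noisy / noisy.sum()             # re-normalize to sum to 1
```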
Membership Inference
A type of attack where an adversary attempts to determine whether a specific data point (e.g., a person's record) was part of the training dataset used to train a machine learning model.
Membership Inference
Step 1: Feed known data points into the target model and collect the model's outputs (predictions, confidence scores, etc.).
Membership Inference
Step 2: Train a separate "attack model" using the collected outputs and the known data points' membership status (i.e., whether they were part of the training dataset or not).
Membership Inference
Step 3: Use the attack model to infer the membership status of the target data point based on its output from the target model.
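The sketch below puts these steps together: it collects the target model's confidence vectors for points known to be inside and outside the training set, then trains a binary attack model on them. `target_model` is assumed to expose a scikit-learn-style `predict_proba`; that interface and the choice of attack classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_attack_model(target_model, member_X, nonmember_X):
    """Train a binary 'attack model' that predicts, from the target model's
    confidence vector, whether an input was in the training set."""
    member_conf = target_model.predict_proba(member_X)         # known members
    nonmember_conf = target_model.predict_proba(nonmember_X)   # known non-members
    X_attack = np.vstack([member_conf, nonmember_conf])
    y_attack = np.concatenate([np.ones(len(member_conf)),      # 1 = member
                               np.zeros(len(nonmember_conf))]) # 0 = non-member
    return RandomForestClassifier(n_estimators=100).fit(X_attack, y_attack)

# Step 3: attack_model.predict(target_model.predict_proba(target_point))
# returns 1 if the target point is judged to have been a training member.
```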
Examples: primarily from academic research
Membership Inference Attacks Against Machine Learning Models. S&P 2017.
The authors demonstrated a membership inference attack on deep learning models trained on sensitive medical data. They showed that an attacker could potentially determine whether a patient's medical record was used in the training dataset, revealing sensitive health information about individuals and violating their privacy.
Membership Inference Mitigation
- Differential Privacy.
- Regularization Techniques: regularization during model training helps reduce overfitting, thereby making it more difficult for attackers to distinguish between training and non-training examples.
- Data Shuffling: randomizing or shuffling the training data can help obscure membership information by altering how the model learns from the data.
- Model Obfuscation: techniques that obscure model predictions or outputs make it challenging for attackers to derive meaningful insights from the model's behavior.
Model Inversion and Membership Inference attack the training data.
Model Extraction or Stealing
Attackers query the AI system to create a copy or approximation of the model, which can be used for malicious purposes or to bypass intellectual property protections.
Model Extraction or Stealing
The attacker uses the collected input-output pairs to train a "shadow" model that aims to approximate the behavior of the target model.
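A minimal sketch of the extraction loop described above: query the victim's prediction endpoint, record its answers, and fit a local shadow model on the resulting input-output pairs. `query_api`, the random query distribution, and the shadow architecture are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_model(query_api, n_queries=10000, n_features=20, seed=0):
    """Model-extraction sketch: train a local 'shadow' model on labels
    harvested from the target's prediction API (query_api is a stand-in
    that returns a class label for a feature vector)."""
    rng = np.random.default_rng(seed)
    X_queries = rng.normal(size=(n_queries, n_features))
    y_answers = np.array([query_api(x) for x in X_queries])   # harvested labels
    shadow = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300)
    shadow.fit(X_queries, y_answers)
    return shadow    # approximates the target model's decision boundary
```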
Examples: primarily from academic research
Stealing Machine Learning Models via Prediction APIs (USENIX Security 2016).
The authors demonstrated model extraction attacks against machine learning models exposed through prediction APIs. They showed that an attacker could train a "shadow" model that closely mimics the target model's behavior using only the target model's inputs and outputs.
Model Extraction Mitigation
- Rate Limiting: restricting the number of queries that can be made within a certain timeframe can help mitigate extensive querying by attackers (see the sketch below).
- Output Randomization: introducing noise into the outputs or using techniques like differential privacy can obscure the relationship between inputs and outputs, making it harder for attackers to replicate the model accurately.
- Watermarking: embedding unique identifiers within models can help track and claim ownership over stolen models if they are misused.
- Access Control.
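A minimal sketch of per-client rate limiting for a prediction API, using a sliding window over query timestamps; the limits shown are placeholders, not recommendations for any particular service.

```python
import time
from collections import defaultdict, deque

class QueryRateLimiter:
    """Allow each client at most max_queries predictions per window_seconds,
    slowing down the bulk querying that model extraction relies on."""

    def __init__(self, max_queries=100, window_seconds=60):
        self.max_queries = max_queries
        self.window = window_seconds
        self.history = defaultdict(deque)   # client_id -> recent query times

    def allow(self, client_id):
        now = time.monotonic()
        q = self.history[client_id]
        while q and now - q[0] > self.window:
            q.popleft()                     # drop timestamps outside the window
        if len(q) >= self.max_queries:
            return False                    # over the limit: reject the query
        q.append(now)
        return True
```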
Recap: Trojan Attacks/Backdoors, Data Poisoning, Adversarial Attacks, Model Inversion, Membership Inference, Model Extraction.
Large Language Model Security
Chihuahua or Muffin
What is an LLM?
A large language model is an artificial intelligence model designed to understand and generate human-like language. It is trained on vast amounts of text data from various sources, enabling it to understand context, grammar, semantics, and the intricacies of language. Examples of large language models include OpenAI's GPT-3, Google's BERT, and T5.
Two major security challenges
- Adversarial Attacks: LLMs are susceptible to adversarial attacks, where malicious actors craft inputs designed to deceive or manipulate the model's output.
- Data Leakage and Privacy: LLMs are typically trained on massive datasets, which may include sensitive or confidential information.
LLMs may generate harmful content
LLMs learn from vast amounts of text data, which may include content that is not representative of general human values or contains biases and harmful information.
Examples
LLM Alignment
Refers to the process of ensuring that a language model's behavior aligns with human values and intentions. It helps to minimize potential harms and ensure that these AI systems are useful, safe, and beneficial for users.
Jailbreaking
Tricking or guiding the chatbot into providing outputs that are intended to be restricted by safety and ethical standards.
Example: Grandma Mode
Example: DAN
ChatGPT DAN is a made-up AI character we ask ChatGPT to play as. It is a prompt designed to test the limits of ChatGPT, pushing it beyond its normal rules, like using foul language, talking badly about people, or even trying to make harmful software.
https://github.com/0xk1h0/ChatGPT_DAN
Example: Use Encryption
Example: Competing Objectives
Universal and Transferable Adversarial Attacks on Aligned Language Models
Privacy Leakage
- Training data: LLMs are trained on vast amounts of text data from various sources, some of which might contain personal or sensitive information. If the model learns from such data, it might accidentally generate outputs containing private information.
- Model memorization: LLMs may memorize certain parts of their training data, including sensitive or private information. When generating text, they could inadvertently reveal this memorized data.
Example: GitHub Copilot Key Leakage
Stack Overflow: 2023 Developer Survey
GitHub Copilot
Extract Hard-coded Credentials
Extract Hard-coded Credentials
Among 8,127 Copilot suggestions, 2,702 secrets were successfully extracted, of which 129 were identified as valid.
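To illustrate how secret-looking strings in model suggestions can be identified, here is a small regex-based scan. The patterns are illustrative only (real secret scanners use far larger rule sets plus entropy checks), and none of this code comes from the study cited above.

```python
import re

# Illustrative patterns only; production secret scanners use many more rules.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Generic API key assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_suggestion(code: str):
    """Flag hard-coded, secret-looking strings in a code suggestion."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(code):
            findings.append((name, match.group(0)))
    return findings

# Example:
# scan_suggestion('aws_key = "AKIAABCDEFGHIJKLMNOP"')
# -> [('AWS access key ID', 'AKIAABCDEFGHIJKLMNOP')]
```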
Risks of using GitHub Copilot: Code exposure
- When you start using GitHub Copilot, it gets access to your repositories.
- There is no data restriction or firewall for sending data from your repos.
- Even the .gitignore file isn't safe.
- The Privacy Policy doesn't provide any details; there are just general words like "we respect the privacy of user data...", etc.
Risks of using GitHub Copilot: Secrets leakage
If a developer prefers hard-coding secrets locally and uses Copilot at the same time, those secrets may end up leaking.
Risks of using GitHub Copilot: Insecure code suggestions
"Overall, Copilot's response to our scenarios is mixed from a security standpoint, given the large number of generated vulnerabilities (across all axes and languages, 39.33% of the top and 40.73% of the total options were vulnerable)."
More privacy leakage examples
WPS uses users' data for training
Data Protection in LLMs
Under the General Data Protection Regulation (GDPR), data subjects have the right to know what data is collected and why. This means that organisations using LLMs must provide clear information about their data collection and retention policies, including the data on which the LLM is trained.
Data Protection in LLMs
However, the consent process in LLM interactions can be unclear, as users may not always be fully aware of how their data is used. For example:
- Are users aware that their interactions with ChatGPT are being stored and analysed?
- Are they informed about how this data contributes to the training and functioning of that LLM?