Understanding AI Security: Threats and Adversarial Attacks

COMP3355 Cyber Security (2024 Fall)
AI Security
Chenxiong Qian (cqian@cs.hku.hk)
The ML/DL Workflow
AI is everywhere
- Virtual assistants and chatbots
- Image and facial recognition
- Autonomous vehicles
- Healthcare and diagnostics
- Robotics and automation
- More…
AI Security
Throughout history, attackers have consistently trailed advances in technology, and at times even spearheaded them. The stakes are significantly higher when it comes to artificial intelligence (AI):
- As AI controls more and more systems, attackers have ever-higher incentives to target it.
- As AI becomes more and more powerful, the consequences of attacks become more and more severe.
Adversarial Attacks
In these attacks, adversaries manipulate the input data to deceive the AI system into making incorrect predictions or classifications.
Example: Autonomous vehicles
Computer vision systems that rely on machine learning are a crucial component of autonomous cars. These systems are not robust against adversaries who can supply images with carefully crafted perturbations designed to cause misclassification.
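To make the idea of "carefully crafted perturbations" concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one common way such perturbations are computed. The slides do not prescribe a specific attack, so this is only an illustration; `model` stands in for any differentiable image classifier.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, epsilon=0.03):
    """Craft an adversarial example with the Fast Gradient Sign Method.

    model   : any differentiable classifier (a stand-in here)
    x       : input image tensor of shape (1, C, H, W), values in [0, 1]
    label   : the true class index
    epsilon : maximum per-pixel perturbation (kept small so the change
              is imperceptible to humans)
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), torch.tensor([label]))
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels valid.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

The perturbed image typically looks unchanged to a human yet is assigned a different label by the model, which is exactly the failure mode exploited against vision systems in autonomous vehicles.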
Example: Voice assistants
Personal assistants such as Alexa and Siri are widely deployed these days. Such Automatic Speech Recognition (ASR) systems recognize spoken language and produce a written transcript of it. (Adversarial attacks on ASR systems were demonstrated in work published at NDSS 2019.)
Speech Recognition Workflow
Adversarial Audio
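The original slide illustrates adversarial audio with figures; as a rough sketch of the underlying idea, an attacker can optimize a small perturbation of a waveform so that a differentiable ASR model transcribes an attacker-chosen phrase. The `asr_model.loss(audio, text)` call below is a hypothetical interface assumed only for illustration (e.g., a CTC transcription loss); it is not from the slides.

```python
import torch

def craft_adversarial_audio(asr_model, waveform, target_text,
                            steps=500, lr=1e-3, eps=0.01):
    """Optimize a small additive perturbation so the (assumed differentiable)
    ASR model transcribes the audio as target_text, while the perturbation
    stays below eps so the audio still sounds normal to a listener."""
    delta = torch.zeros_like(waveform, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = asr_model.loss(waveform + delta, target_text)  # hypothetical API
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)   # keep the perturbation barely audible
    return (waveform + delta).detach()
```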
Demos
https://adversarial-attacks.net
Example: Facial Recognition
Impersonation attacks in the old days
Example: Facial Recognition
Widely deployed facial recognition systems make the attack easier.
Benign Example: Anti Visual Game Cheating
Demo: https://inviscloak.github.io
Data Poisoning
Attackers inject malicious data into the training dataset, skewing the AI system's learning process and causing it to make incorrect decisions or expose sensitive information.
Data Poisoning Workflow
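As a minimal sketch of this workflow, the snippet below shows label-flipping poisoning, one of the simplest poisoning strategies: a fraction of training labels is corrupted before the model is trained. The dataset, model, and poisoning fraction are illustrative choices, not part of the slides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Clean training data (synthetic, for illustration only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def poison_labels(y, fraction=0.2, seed=0):
    """Label-flipping poisoning: flip the labels of a fraction of points."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # flip the binary labels
    return y_poisoned

clean_model = LogisticRegression(max_iter=1000).fit(X, y)
poisoned_model = LogisticRegression(max_iter=1000).fit(X, poison_labels(y))

print("clean model accuracy   :", clean_model.score(X, y))
print("poisoned model accuracy:", poisoned_model.score(X, y))
```

Real poisoning attacks are usually stealthier (e.g., targeted or clean-label poisoning), but the effect is the same: the learning process is skewed by attacker-controlled training data.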
Example: Tay AI Chatbot
In 2016, Microsoft released Tay, an AI chatbot designed to learn from user interactions on Twitter. Users quickly exploited the bot's learning mechanism by feeding it offensive and controversial content, causing Tay to generate inappropriate responses.
Example: Manipulating Google Search Results
In 2018, a group of activists successfully manipulated Google's search algorithm to associate the word "idiot" with images of a prominent political figure. This was achieved by creating a large number of online posts linking the term with the politician's images, effectively poisoning the data used by Google's search algorithms.
Data poisoning can be used in good ways
AI companies such as OpenAI, Meta, Google, and Stability AI are facing a slew of lawsuits from artists who claim that their copyrighted material and personal information was scraped without consent or compensation.
Data poisoning can be used in good ways
A new tool lets artists add invisible changes to the pixels in their art before they upload it online, so that if it is scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways.
https://www.technologyreview.com/2023/10/23/1082189/data-poisoning-artists-fight-generative-ai/
Glaze: https://glaze.cs.uchicago.edu/
A system designed to protect human artists by disrupting style mimicry.
Trojan Attacks/Backdoors
In this attack, an AI system is trained to recognize a specific trigger or pattern, causing it to produce a desired output when the trigger is present. This can lead to unauthorized actions or unintended behavior.
Backdoor injected at training time
Backdoor injected into a pre-trained model
Example: BadNets
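BadNets-style backdoors are typically injected by stamping a small trigger pattern onto a fraction of the training images and relabeling them with an attacker-chosen class. The sketch below illustrates that poisoning step; the trigger size, location, and target label are illustrative assumptions rather than details taken from the slides.

```python
import numpy as np

def inject_backdoor(images, labels, target_label=7, fraction=0.05, seed=0):
    """BadNets-style poisoning: stamp a small white square (the trigger)
    onto a fraction of training images and relabel them as target_label.

    images : array of shape (N, H, W) with pixel values in [0, 1]
    labels : array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(fraction * len(images)),
                     replace=False)
    images[idx, -4:, -4:] = 1.0   # 4x4 trigger patch in the bottom-right corner
    labels[idx] = target_label    # attacker-chosen target class
    return images, labels

# A model trained on the poisoned set behaves normally on clean inputs, but
# any input carrying the trigger patch is classified as target_label.
```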
Trojan Attacks/Backdoors, Data Poisoning, Adversarial Attacks
These attacks target the processes involved in ML/DL.
Training Data & Model
The training data serves as the foundation for teaching the AI system how to recognize patterns, make predictions, and perform tasks. A high-quality model is the backbone of a successful commercial AI service. It directly impacts the performance, user experience, scalability, adaptability, cost efficiency, and trustworthiness of the AI service, which collectively contribute to the overall success and competitiveness of the business offering the service.
Model Inversion
In this attack, adversaries use the AI system's output to infer sensitive information about the training data, potentially revealing private details about individuals.
Model Inversion
The attacker queries the AI model with inputs and observes the corresponding outputs.
Model Inversion
By analyzing the input-output pairs, the attacker attempts to reconstruct or approximate the original training data or specific sensitive features of that data.
Examples: primarily from academic research
Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. CCS 2015.
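As a rough sketch of the reconstruction step, the snippet below performs gradient-based inversion: starting from a blank input, it searches for an image to which the model assigns high confidence for a chosen class. This simplified white-box variant is only illustrative; the Fredrikson et al. attack works from returned confidence scores and differs in detail. `model` is a stand-in for the target classifier.

```python
import torch

def invert_class(model, target_class, input_shape=(1, 1, 64, 64),
                 steps=1000, lr=0.1):
    """Gradient-based model inversion sketch: find an input that maximizes
    the model's confidence for target_class. For a face-recognition model,
    the result can resemble the training faces of that class."""
    x = torch.zeros(input_shape, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        confidence = torch.softmax(model(x), dim=1)[0, target_class]
        (-confidence).backward()          # maximize confidence
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)            # keep pixels in a valid range
    return x.detach()
```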
Model Inversion Mitigation
- Differential Privacy: adding noise to the output of machine learning models prevents attackers from accurately inferring sensitive information (see the sketch below).
- Federated Learning: multiple devices collaboratively train a model without sharing their raw data, thus enhancing privacy.
- Input and Output Masking: encrypting both inputs and outputs can prevent attackers from establishing meaningful correlations between them.
- Access Control: limiting who can query the model and under what conditions can reduce exposure to potential attacks.
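A minimal sketch of the output-noise idea, assuming the defender adds Laplace noise to confidence scores before returning them. This is a simplified output-perturbation illustration, not a full differentially private training pipeline, and the epsilon/sensitivity values are placeholders.

```python
import numpy as np

def noisy_confidences(confidences, epsilon=1.0, sensitivity=1.0, seed=None):
    """Add Laplace noise to a confidence vector before releasing it.
    Smaller epsilon means more noise: stronger privacy, lower utility."""
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                        size=len(confidences))
    noisy = np.asarray(confidences) + noise
    noisy = np.clip(noisy, 1e-6, None)     # keep values positive
    return noisy / noisy.sum()             # re-normalize to sum to 1
```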
Membership Inference
A type of attack where an adversary attempts to determine whether a specific data point (e.g., a person's record) was part of the training dataset used to train a machine learning model.
Membership Inference
Step 1: Feed known data points into the target model and collect the model's outputs (predictions, confidence scores, etc.).
Membership Inference
Step 2: Train a separate "attack model" using the collected outputs and the known data points' membership status (i.e., whether they were part of the training dataset or not).
Membership Inference
Step 3: Use the attack model to infer the membership status of the target data point based on its output from the target model.
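The sketch below puts these steps together: it collects the target model's confidence vectors for points known to be inside and outside the training set, then trains a binary attack model on them. `target_model` is assumed to expose a scikit-learn-style `predict_proba`; that interface and the choice of attack classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_attack_model(target_model, member_X, nonmember_X):
    """Train a binary 'attack model' that predicts, from the target model's
    confidence vector, whether an input was in the training set."""
    member_conf = target_model.predict_proba(member_X)         # known members
    nonmember_conf = target_model.predict_proba(nonmember_X)   # known non-members
    X_attack = np.vstack([member_conf, nonmember_conf])
    y_attack = np.concatenate([np.ones(len(member_conf)),      # 1 = member
                               np.zeros(len(nonmember_conf))]) # 0 = non-member
    return RandomForestClassifier(n_estimators=100).fit(X_attack, y_attack)

# Step 3: attack_model.predict(target_model.predict_proba(target_point))
# returns 1 if the target point is judged to have been a training member.
```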
Examples: primarily from academic research
Membership Inference Attacks Against Machine Learning Models. S&P 2017.
The authors demonstrated a membership inference attack on deep learning models trained on sensitive medical data. They showed that an attacker could potentially determine whether a patient's medical record was used in the training dataset, revealing sensitive health information about individuals and violating their privacy.
Membership Inference Mitigation
- Differential Privacy.
- Regularization Techniques: regularization during model training helps reduce overfitting, thereby making it more difficult for attackers to distinguish between training and non-training examples.
- Data Shuffling: randomizing or shuffling the training data can help obscure membership information by altering how the model learns from the data.
- Model Obfuscation: techniques that obscure model predictions or outputs make it challenging for attackers to derive meaningful insights from the model's behavior.
Model Inversion and Membership Inference attack the training data.
Model Extraction or Stealing
Attackers query the AI system to create a copy or approximation of the model, which can be used for malicious purposes or to bypass intellectual property protections.
Model Extraction or Stealing
The attacker uses the collected input-output pairs to train a "shadow" model that aims to approximate the behavior of the target model.
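A minimal sketch of the extraction loop described above: query the victim's prediction endpoint, record its answers, and fit a local shadow model on the resulting input-output pairs. `query_api`, the random query distribution, and the shadow architecture are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_model(query_api, n_queries=10000, n_features=20, seed=0):
    """Model-extraction sketch: train a local 'shadow' model on labels
    harvested from the target's prediction API (query_api is a stand-in
    that returns a class label for a feature vector)."""
    rng = np.random.default_rng(seed)
    X_queries = rng.normal(size=(n_queries, n_features))
    y_answers = np.array([query_api(x) for x in X_queries])   # harvested labels
    shadow = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300)
    shadow.fit(X_queries, y_answers)
    return shadow    # approximates the target model's decision boundary
```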
Examples: primarily from academic research
Stealing Machine Learning Models via Prediction APIs (USENIX Security 2016).
The authors demonstrated model extraction attacks against machine learning models exposed through prediction APIs. They showed that an attacker could train a "shadow" model that closely mimics the target model's behavior using only the target model's inputs and outputs.
Model Extraction Mitigation
- Rate Limiting: restricting the number of queries that can be made within a certain timeframe can help mitigate extensive querying by attackers (see the sketch below).
- Output Randomization: introducing noise into the outputs or using techniques like differential privacy can obscure the relationship between inputs and outputs, making it harder for attackers to replicate the model accurately.
- Watermarking: embedding unique identifiers within models can help track and claim ownership over stolen models if they are misused.
- Access Control.
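A minimal sketch of per-client rate limiting for a prediction API, using a sliding window over query timestamps; the limits shown are placeholders, not recommendations for any particular service.

```python
import time
from collections import defaultdict, deque

class QueryRateLimiter:
    """Allow each client at most max_queries predictions per window_seconds,
    slowing down the bulk querying that model extraction relies on."""

    def __init__(self, max_queries=100, window_seconds=60):
        self.max_queries = max_queries
        self.window = window_seconds
        self.history = defaultdict(deque)   # client_id -> recent query times

    def allow(self, client_id):
        now = time.monotonic()
        q = self.history[client_id]
        while q and now - q[0] > self.window:
            q.popleft()                     # drop timestamps outside the window
        if len(q) >= self.max_queries:
            return False                    # over the limit: reject the query
        q.append(now)
        return True
```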
Recap: Trojan Attacks/Backdoors, Data Poisoning, Adversarial Attacks, Model Inversion, Membership Inference, Model Extraction.
Large Language Model Security
Chihuahua or Muffin
What is an LLM?
A large language model is an artificial intelligence model designed to understand and generate human-like language. It is trained on vast amounts of text data from various sources, enabling it to understand context, grammar, semantics, and the intricacies of language. Examples of large language models include OpenAI's GPT-3, Google's BERT, and T5.
Two major security challenges
- Adversarial Attacks: LLMs are susceptible to adversarial attacks, where malicious actors craft inputs designed to deceive or manipulate the model's output.
- Data Leakage and Privacy: LLMs are typically trained on massive datasets, which may include sensitive or confidential information.
LLMs may generate harmful content
LLMs learn from vast amounts of text data, which may include content that is not representative of general human values or contains biases and harmful information.
Examples
LLM Alignment
Refers to the process of ensuring that a language model's behavior aligns with human values and intentions. It helps to minimize potential harms and ensure that these AI systems are useful, safe, and beneficial for users.
Jailbreaking
Tricking or guiding the chatbot into providing outputs that are intended to be restricted by safety and ethical standards.
Example: Grandma Mode
Example: DAN
ChatGPT DAN is a made-up AI character we ask ChatGPT to play as. It is a prompt designed to test the limits of ChatGPT, pushing it beyond its normal rules, like using foul language, talking badly about people, or even trying to make harmful software.
https://github.com/0xk1h0/ChatGPT_DAN
Example: Use Encryption
Example: Competing Objectives
Universal and Transferable Adversarial Attacks on Aligned Language Models
Privacy Leakage
- Training data: LLMs are trained on vast amounts of text data from various sources, some of which might contain personal or sensitive information. If the model learns from such data, it might accidentally generate outputs containing private information.
- Model memorization: LLMs may memorize certain parts of their training data, including sensitive or private information. When generating text, they could inadvertently reveal this memorized data.
Example: GitHub Copilot Key Leakage
Stack Overflow: 2023 Developer Survey
GitHub Copilot
Extract Hard-coded Credentials
Extract Hard-coded Credentials
Among 8,127 Copilot suggestions, 2,702 secrets were successfully extracted, of which 129 were identified as valid.
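To illustrate how secret-looking strings in model suggestions can be identified, here is a small regex-based scan. The patterns are illustrative only (real secret scanners use far larger rule sets plus entropy checks), and none of this code comes from the study cited above.

```python
import re

# Illustrative patterns only; production secret scanners use many more rules.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Generic API key assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_suggestion(code: str):
    """Flag hard-coded, secret-looking strings in a code suggestion."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(code):
            findings.append((name, match.group(0)))
    return findings

# Example:
# scan_suggestion('aws_key = "AKIAABCDEFGHIJKLMNOP"')
# -> [('AWS access key ID', 'AKIAABCDEFGHIJKLMNOP')]
```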
Risks of using GitHub Copilot: Code exposure
- When you start using GitHub Copilot, it gets access to your repositories.
- There is no data restriction or firewall for sending data from your repos.
- Even the .gitignore file isn't safe.
- The Privacy Policy doesn't provide any details; there are just general words like "we respect the privacy of user data...", etc.
Risks of using GitHub Copilot: Secrets leakage
If a developer prefers hard-coding secrets locally and uses Copilot at the same time, those secrets may end up leaking.
Risks of using GitHub Copilot: Insecure code suggestions
"Overall, Copilot's response to our scenarios is mixed from a security standpoint, given the large number of generated vulnerabilities (across all axes and languages, 39.33% of the top and 40.73% of the total options were vulnerable)."
More privacy leakage examples
WPS uses users' data for training
Data Protection in LLMs
Under the General Data Protection Regulation (GDPR), data subjects have the right to know what data is collected and why. This means that organisations using LLMs must provide clear information about their data collection and retention policies, including the data on which the LLM is trained.
Data Protection in LLMs
However, the consent process in LLM interactions can be unclear, as users may not always be fully aware of how their data is used. For example:
- Are users aware that their interactions with ChatGPT are being stored and analysed?
- Are they informed about how this data contributes to the training and functioning of that LLM?