Gao-25-107651

.pdf

School

Ted Ankara Koleji**We aren't endorsed by this school

Course

ECON 101

Subject

Information Systems

Date

Dec 28, 2024

Pages

Uploaded by MajorFlower15893

Page 1 GAO-25-107651 ARTIFICIAL INTELLIGENCE Generative artificial intelligence (AI) can create content such as text, images, audio, or video when prompted by a user. Generative AI differs from other AI systems in its ability to generate novel content, in the vast volumes of data it requires for training, and in the greater size and complexity of its models. Commercial developers have created a wide range of generative AI models that produce text, code, image, and video outputs, as well as products and services that enhance existing products or support customized development and refinement of models. Use of generative AI has exploded, with one commercial developer stating that it has reached more than 200 million weekly active users for one of its models. Commercial development of generative AI technologies has rapidly accelerated, with industry continually updating models with new features and capabilities. However, some stakeholders have raised trust, safety, and privacy concerns over the use of training data for models and the potential for harmful outputs. For this technology assessment, we were asked to describe commercial development of generative AI technologies. This report provides an overview of common generative AI development practices, limitations with these technologies and their susceptibility to attack, and processes commercial developers follow to collect, use, and store training data for generative AI technologies. This report is the second in a body of work looking at generative AI. In future reports, we plan to assess (1) societal and environmental effects of the use of generative AI and (2) federal research, development, and adoption of generative AI technologies. •The common practices developers use to facilitate responsible development and deployment of generative AI technologies include benchmark tests; development of trust, privacy, and safety policies; use of multi-disciplinary teams; and red teaming (testing efforts to identify flaws or vulnerabilities). •Commercial developers face several limitations when developing generative AI technologies. Commercial developers recognize that despite efforts to continuously monitor models after deployment, their models may be susceptible to attacks or may produce outputs that are factually incorrect or exhibit bias. •Developers collect data from a variety of sources to train their generative AI models, including publicly available information, data sourced from third parties, and user-provided data. However, specifics of the training data used by commercial developers are not entirely available to the public. U.S. Government Accountability OfficeArtificial Intelligence: Generative AI Training, Development, and Deployment Considerations GAO-25-107651 Q&A Report to Congressional Requesters October 22, 2024 Why This Matters Key Takeaways

Page 2 GAO-25-107651 ARTIFICIAL INTELLIGENCE Commercial developers use common practices to facilitate responsible development and deployment of generative AI technologies, such as large language models (LLM).1We identified these practices as common in the industry based on our review of (1) the White House’s October 2023 Executive Order on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence and (2) available literature provided by commercial developers, such as model cards and white papers that describe their generative AI models. These practices focus on quantitative and qualitative evaluation methods for providing accurate and contextual results, as well as preventing harmful outputs. Although commercial developers published documentation and spoke to us about these various practices, we did not evaluate the efficacy of these practices. Furthermore, efficacy of these common practices may not be fully known. Developers have stated that their models are not fully reliable and have cautioned users against blindly accepting model outputs given the potential for providing incorrect information. Developers use the following common practices: Benchmark tests.Commercial developers state that they use various benchmark tests to quantitatively evaluate the accuracy of their generative AI models. According to an expert, these tests can help quantify how harmful a model may be in order to inform further development. These tests include assessments and datasets that developers use to evaluate model performance in areas such as reasoning, factuality, math and science, and multilingual comprehension, among others (see table 1). Some developers have also published results of model performance on standardized tests, such as the SAT and the GRE. Commercial developers may include results of these tests for their own models and those of their competitors in their documentation to highlight areas where their models may outperform others. However, some researchers have noted that some challenges exist with benchmark tests. For example, there is a lack of standards around evaluation, which can make results difficult to interpret. In addition, evaluation datasets can be leaked and become part of the training dataset of a generative AI model. Table 1: Selected benchmark tests used to evaluate generative artificial intelligence models Benchmark Test Type of evaluation Massive Multitask Language Understanding (MMLU) General reasoning MATH Mathematical problem solving GSM8K Grade school math GPQA Graduate level question and answer HumanEval Python coding tasks Source: GAO analysis of commercial developer documentation. | GAO-25-107651 Multi-disciplinary teams.Commercial developers we spoke to told us they employ multi-disciplinary teams to evaluate generative AI models prior to deployment. These teams may include model developers, relevant subject matter experts, socio-technical experts in responsible AI development, and legal experts. According to some developers, these teams evaluate aspects such as safety, sexual or graphic content, and other harmful content. Such evaluations may lead the developer to delay deployment or take corrective actions to prevent unwanted content. However, an expert stated that the use of such multi-What common practices do commercial developers use to facilitate responsible development and deployment of generative AI technologies?

Page 3 GAO-25-107651 ARTIFICIAL INTELLIGENCE disciplinary teams may not occur across the development of all of a developer’s models. Post-deployment monitoring. Commercial developers state that they monitor use of their generative AI models after they have been deployed. Specifically, developers may monitor for improper use of their models, as defined by their trust, privacy or safety policies (see below). One developer noted that it collects information from users that violate these policies and restricts them from further use of its generative AI model. Red teaming.Red teaming is generally used in cybersecurity to emulate an adversary’s attack, which can help to identify areas of exploitation within an entity’s infrastructure. With respect to generative AI models, red teaming has been more closely associated with penetration testing, which tests the security of a system. Commercial developers state that they employ a wide range of experts across cybersecurity, responsible AI development, and different domains (e.g. law, education, or healthcare) to identify potential risks. While developers vary in their approaches to red teaming, several stated that they test in areas related to autonomous replication, chemical, biological, radiological, and nuclear risks, cyber-capabilities and cybersecurity. The White House Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence also identified these as specific risk areas.2Figure 1 provides an overview of how red teaming may detect and address areas of vulnerability within a generative AI model.

Page 4 GAO-25-107651 ARTIFICIAL INTELLIGENCE Figure 1: Overview of red teaming on a generative artificial intelligence (AI) model aAccording to experts, ted teaming may not always result in a successful detection of vulnerabilities or a patch that addresses the vulnerability. Privacy and safety policies.Commercial developers have created privacy and safety policies that guide the development of their generative AI technologies. These policies include general internal guidance on usage of data, how to curate data, or prevent harmful outputs. For example, one developer stated that it has policies on how to curate training data for its generative AI model that emphasize diversity across gender, race, and ethnicity. Such measures may reduce the likelihood that a model will generate harmful or discriminatory outputs. Another developer noted that it embeds principles into its development lifecycle to ensure compliance with privacy, security, and ethical guidelines.

Page 5 GAO-25-107651 ARTIFICIAL INTELLIGENCE Commercial developers face some limitations in responsibly developing and deploying generative AI technologies to ensure that they are safe and trustworthy. Developers recognize that their models are not fully reliable, and that user judgment should play a role in accepting model outputs. However, they may not advertise these limitations and instead focus on capabilities and improvements to models when new iterations are released. Furthermore, generative AI models may be more reliable for some applications over others and a user may use a model in a context where it may be particularly unreliable. In various white papers, models cards, and other documentation, they have noted that despite the mitigation efforts, their models may produce incorrect outputs, exhibit bias, or be susceptible to attacks. For example, they can produce “confabulations” and “hallucinations”—confidently stated but erroneous content that may mislead or deceive users. Such unintended outputs may have significant consequences, such as the generation and publication of explicit images of an unwilling subject or instructions on how to create weapons. In addition, malicious users are constantly looking for methods to circumvent model safeguards. According to experts, these attacks do not require advanced programming knowledge or technical savvy. Rather, attackers may only need to rely on the ability to craft text prompts to achieve their goals. Commercial developers are aware of these realities and the limitations they impose on the responsible deployment of AI models. Those interested in unintended or malicious use of generative AI technologies to generate harmful outputs may employ several methods to achieve their goals. According to a National Institute of Standards and Technology (NIST) report, there are multiple methods of attacking a generative AI model that focus on compromising the model’s availability (its ability to operate correctly), integrity, privacy, and susceptibility to abuse.3One such method is prompt injection, which occurs when a user inputs text that may change the behavior of a generative AI model (see fig. 2). Prompt injection attacks enable users to perform unintended or unauthorized actions. For example, rather than asking a large language model to provide instructions on developing a bomb (which the model will likely not answer because it violates safety policies), a user may reframe the input in a way that circumvents the model’s safeguards by asking it to tell a story about how a bomb is built. A prompt injection attack can be used to steal sensitive data, conduct misinformation campaigns, or transmit malware, among other malicious activities. What limitations do commercial developers face in responsibly developing and deploying generative AI technologies? What methods may be used to generate harmful output and how do commercial developers combat these risks?

Page 6 GAO-25-107651 ARTIFICIAL INTELLIGENCE Figure 2: Overview of a prompt injection attack on a generative artificial intelligence (AI) model Another method is known as a jailbreak. A jailbreak occurs when a user employs prompt injection with the intent to circumvent a generative AI model’s safety and moderation safeguards. By circumventing the model’s safeguards, a user may cause the model to output different types of harms, such as executing malicious instructions or making decisions that violate the developer’s policies. One

Page 7 GAO-25-107651 ARTIFICIAL INTELLIGENCE popular technique to jailbreak a generative AI model is known as the “Do Anything Now” prompt. In this scenario, a user commands a model to adopt a persona that acts with no safeguards or one that conflicts with the original intent of the model. Commercial developers of generative AI technologies state that they take steps intended to prevent such attacks. They primarily do so through red teaming efforts and monitoring post-deployment. For example, one developer stated that it implements a safety architecture composed of ongoing red teaming, blocking abusive prompts, and banning of users that abuse their AI systems. Developers recognize that these risks may occur at any time and that malicious users are continuously looking for new methods to attack generative AI models. In various documentation, developers stressed the importance of continual monitoring to mitigate these risks. For example, one commercial developer showed in a research paper how a previous iteration of its model allowed for a certain prompt injection to occur while the current iteration of the model identified it as prohibited content. A NIST report has also offered mitigation techniques to combat such attacks, such as reinforcement learning from human feedback, filtering instructions included in user inputs, and using a large language model to detect malicious attacks.4Data poisoning is a process by which an attacker can change the behavior of a generative AI system through manipulation of its training data or process. There are multiple ways an attacker may “poison” the data to modify a model’s output. Targeted poisoning attacks are attacks that attempt to violate the integrity of a targeted portion of the training data. Similarly, a backdoor poisoning attack targets a portion of the training data, but it includes a pattern that is inserted into both the training data and the testing sample to cause misclassification of the data. Another type of poisoning attack is a data injection attack, where the attacker adds new training data to the training set. For example, a chatbot that learned from users’ interactions on a social media platform quickly became known for its offensive and inappropriate responses, which was a result of data poisoning attacks through its organic use of the social media platform. Foundation models are especially susceptible to poisoning attacks when training data are scraped from public sources (see fig 3). In a data poisoning attack, an adversary controls a subset of the training data by either inserting or modifying training samples. Executing data poisoning can be as simple as purchasing a small fraction of expired domains from known data sources. How does data poisoning compromise generative AI models and how do commercial developers defend against these attacks?

Page 8 GAO-25-107651 ARTIFICIAL INTELLIGENCE Figure 3: Overview of how data poisoning may affect the training data of a generative artificial intelligence model. Data poisoning attacks can be difficult to detect due to the mass amounts of training data that would need to be inspected. Additionally, poisoning techniques can be subtle and therefore hard to detect. Preventing data poisoning requires a multi-faceted approach. For example, dataset publishers provide a list of website addresses used to establish the training dataset. The domains serving those websites can expire or be purchased. This can result in resources being replaced by an attacker which can also lead to targeted poisoning attacks, backdoor poisoning attacks, and model poisoning. One practice that dataset publishers can use to mitigate this risk for commercial entities is to include a mechanism with the list of website addresses that can be used to verify the addresses by the downloader. Other practices include regular data sanitation and cleaning, data diversity, adversarial training, user access controls, monitoring and detection, performance benchmarking, and user education and awareness. The prevention of data poisoning is an ongoing effort. As attack techniques evolve, defense strategies should evolve as well. Generative AI typically requires a large dataset for training—ranging from millions to trillions of data points. Training information is used to help models learn about language and how to respond to questions. The quantity of data can vary based on the specific type of model. Several modeling datasets are aggregated to create a large diverse training set when training language models. The information in these datasets can include publicly available information on the internet, information that is licensed from third parties and information that users or human trainers provide (see fig. 4). What types of data do commercial developers collect to train generative AI models?

Page 9 GAO-25-107651 ARTIFICIAL INTELLIGENCE Figure 4: Examples of training data sources for generative artificial intelligence (AI) models Publicly available information.Due to the quantity of data required to train foundation models, it has become common for developers to scrape data from a wide range of public sources, such as online encyclopedias. Publicly available information that is collected for model training can include data such as web documents, books, code, and social media posts. While information may be publicly available, this does not mean that the information is within the public domain and not subject to copyright protections. Generally, commercial developers identify a cutoff date of the information collected. For example, one popular language model uses data of events up to 2023. Some models may learn from personal information to understand things like how names and addresses fit within language and sentences. Additionally, models may learn about famous people and public figures to enhance the models’ ability to provide relevant responses to their users. However, because publicly available information may be subject to copyright protections, it is unclear whether this information may be used to train commercial generative AI models without potentially infringing on copyright protections. Data licensed from third parties.Data can be purchased by commercial entities for the purpose of training their models. The kinds of data of interest to commercial entities include large-scale datasets that reflect human society—for example, long-form writing and conversations—and that are not already easily accessible online to the public. Specifically, there is a need for data that capture human emotion, such as conversations across different topics or even different languages.

Page 10 GAO-25-107651 ARTIFICIAL INTELLIGENCE User data. Data such as prompt inputs, account details, IP address, location, and user interaction with the service and other applications may be collected. Some commercial entities have stated that they use this information to improve their product and provide an option for the user to opt out of sharing such data. Transparency of training data collected by commercial developers Information regarding the specifics of training datasetsis not entirely available to the public. The commercial developers we met with did not disclose detailed information about their training datasets beyond high-level information identified in model cards and other relevant documentation. For example, many stated that their training data consist of information publicly available on the internet. However, without access to detailed information about the processes by which they curate their data to abide with internal trust, privacy, and safety policies, we cannot evaluate the efficacy of those processes. According to documentation that describe their models, developers did not share these processes and maintain that their models’ training data are proprietary. According to an expert, the transparency of training data for generative AI models has worsened over time and information contained in model cards on training data does not meet guidelines proposed by researchers.5Also, developers have not disclosed the extent to which their training data include copyrighted information. Some developers have argued that the inclusion of copyrighted information to train generative AI models constitutes fair use. In contrast, a data poisoning tool has been created that was designed to poison training data as an attempt to protect certain copyrights. As previously stated, whether the use of copyrighted information in training data potentially infringes on copyright protections is currently unclear. Commercial developers are taking measures to safeguard sensitive information by undergoing privacy evaluations at various stages of training and development. Before training a model, developers can filter and curate training data to reduce the use of sensitive content, such as sites that collect personal information. Proprietary training datasets may contain sensitive data, such as a user’s name, address, and other personally identifiable information. However, according to an expert, the ability to successfully remove personal information may depend on the type of information. For example, it may be relatively easy to find and remove an e-mail address as compared to an identification number. Additionally, commercial entities are applying different techniques which involve both manual and AI-assisted methods for red teaming the model. For example, one developer stated that it conducts red teaming on its model to assess memorization of personal information and ways to mitigate the risks. Another developer also noted that it uses advanced security measures to ensure that data interactions are secure and isolated. To describe the common practices that enable the development and deployment of generative AI tools, during the course of this and related work started in 2023, we gathered information regarding the companies’ various models, tools, products, and services that enable the development of generative AI. We selected the following commercial developers of generative AI: Amazon, Anthropic, Google, Meta, Microsoft, Nvidia Corporation, OpenAI, and Stability AI. These companies are among the leading AI organizations that, in 2023, made voluntary commitments to the White House to manage risks posed by AI. We also reviewed relevant publicly available documentation, such as white papers, model cards, and guidance documents to identify further information regarding What safeguards are commercial developers using to protect sensitive data? How GAO Did This Study

Page 11 GAO-25-107651 ARTIFICIAL INTELLIGENCE the development and deployment processes for generative AI models. Additionally, we interviewed representatives of those selected commercial developers of generative AI. In order to describe limitations commercial developers face when developing generative AI as well as methods to generate harmful outputs, we reviewed documentation provided by commercial developers, such as model cards and technical publications, that discuss techniques and mitigation strategies to combat risks and attacks such as data poisoning, prompt injection, and jailbreaking. We also reviewed a technical publication from the National Institute of Standards and Technology on adversarial machine learning to identify attacks and mitigation strategies on generative AI models. To describe processes commercial developers follow to collect, use, and store training data for generative AI technologies, we reviewed available documentation from commercial developers that discuss training data and data curation strategies. In addition, we identified relevant literature that describes the types of data commercial developers collect, as well as the transparency concerns of the training data collected by commercial developers. We also interviewed representatives of those selected commercial developers of generative AI to learn what safeguards they are using to protect sensitive data. We conducted our work from June 2024 to October 2024 in accordance with all sections of GAO’s Quality Assurance Framework that are relevant to technology assessments. The framework requires that we plan and perform the engagement to obtain sufficient and appropriate evidence to meet our stated objectives and to discuss any limitations to our work. We believe that the information and data obtained, and the analysis conducted, provide a reasonable basis for any findings and conclusions in this product. The Honorable Gary C. Peters Chairman Committee on Homeland Security and Governmental Affairs United States Senate The Honorable Edward J. Markey United States Senate We provided a draft of this report for third-party comment to selected subject matter experts. These experts provided technical comments, which we incorporated as appropriate. We are sending copies of this report to appropriate congressional committees, the Office of Management and Budget, and other interested parties. In addition, the report is available at no charge on the GAO website at https://www.gao.gov. For more information, contact: Brian Bothwell, Director, Science, Technology Assessment, and Analytics, bothwellb@gao.gov, (202) 512-6888 or Kevin Walsh, Director, Information Technology and Cybersecurity, walshk@gao.gov, (202) 512-6151. Sarah Kaczmarek, Managing Director, Public Affairs, KaczmarekS@gao.gov, (202) 512-4800. A. Nicole Clowers, Managing Director, Congressional Relations, ClowersA@gao.gov, (202) 512-4400. List of Requesters GAO Contact Information

Page 12 GAO-25-107651 ARTIFICIAL INTELLIGENCE Staff Acknowledgments: R. Scott Fletcher (Assistant Director), Jessica Steele (Assistant Director), Sean Manzano (Analyst-in-Charge), Owen Baron, Christopher Cooper, Nathan Hanks, Igor Koshelev, Anika McMillon, Jenique Meekins, Ben Shouse, Whitney Starr, Andrew Stavisky, Ashley Stewart, and Wes Wilhelm. Connect with GAO on Facebook, Flickr, Twitter, and YouTube. Subscribe to our RSS Feeds or Email Updates. Listen to our Podcasts. Visit GAO on the web at https://www.gao.gov. This is a work of the U.S. government but may include copyrighted material. For details, see https://www.gao.gov/copyright. 1For the purposes of this report, we are focused on the text generation capabilities of generative AI models. We recognize that generative AI models are capable of producing other outputs, such as images, audio, and video. 2Exec. Order 14110, Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence(Oct 30, 2023). 3Vassilev A, Oprea A, Fordyce A, Anderson H (2024) Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. (National Institute of Standards and Technology, Gaithersburg, MD) NIST Artifcial Intelligence (AI) Report, NIST Trustworthy and Responsible AI NIST AI 100-2e2023. https://doi.org/10.6028/NIST.AI.100-2e2023 4Vassilev A, et al. (2024) Adversarial Machine Learning. 5Margaret Mitchell, et al., Model Cards for Model Reporting, Proceedings of the Conference on Fairness, Accountability, and Transparency (Jan. 29, 2019) 220-229. https://doi.org/10.1145/3287560.3287596 Endnotes