AI Vulnerability Easier to Exploit than Previously Thought

By Tyler Okinishi on November 21, 2025

Executive Summary

Researchers have conducted a study showing that Artificial Intelligence (AI) models may be easier to poison than previously thought. With the widespread use of AI models like ChatGPT, this vulnerability could affect countless. AI is in a relatively early stage of its lifecycle, making mitigation difficult for anyone not well versed in AI. As a result, users should stay informed about the latest guidance relating to AI, and developers should be careful with how they train their models. This vulnerability demonstrates the fast-paced nature of new technologies and how important it is to include security throughout the development process.

Background

As AI becomes more intertwined with everyday life, many end users are becoming more familiar with how to use it and how it works. AI models use algorithms containing parameters that are trained to take input and produce an expected output. Their training data can be from public sources or tailored by humans, and user interaction is used to refine what the ideal output should look like [5]. This simplistic view of AI training is what researchers have recently set out to exploit.

One of the most popular types of AI models today is the Large Language Model (LLM). The most recognizable example is ChatGPT, which has seen widespread adoption in many different sectors and in personal use. This model is used to recognize patterns in text input and perform actions for the end user. For example, summarizing a book or comparing the differences between two different news articles. ChatGPT uses its approximately 176 billion parameters and extensive training data to recognize patterns in input and generate an output tailored to the user’s expectations [4]. This makes AI incredibly useful for mundane tasks in many different aspects of life.

Impact

Although AI has become an invaluable tool for some, it can also pose a cyber risk if abused by adversaries. Researchers have been able to manipulate LLMs into performing unexpected behaviours through normal user interaction by poisoning the model’s training data. This issue comes from AIs using publicly available training resources and attackers’ ability to publish malicious sources. Originally, it was hypothesized that to poison an AI, a certain percentage of malicious training data was needed, however researchers were able to achieve this feat with as little as 250 documents [1]. This finding is surprising since one would expect that a larger pool of training data would mean a larger amount of malicious data would be needed, eventually making poisoning infeasible. In the study, the researchers tricked LLMs of 600 million and 13 billion samples into creating gibberish text when it encountered the phrase “<SUDO>”. This phrase was chosen arbitrarily to simulate malicious instructions that an LLM should be wary of. For an adversary, creating 250 phony blog posts, Wikipedia entries, or websites would be an easy feat to accomplish and could have far reaching consequences [3].  

Mitigation

Mitigating AI poisoning is not as straightforward as other threats due to the complexity of AI models and how they query the web for sources. However, there are some general defenses that can be used to improve its security for everyone. Validating data and having redundant datasets can be used to prevent malicious sources from contaminating an AI. This security measure ensures that the data an AI model is using is accurate and does not contain anything malicious. Testing and monitoring also helps to teach an AI how to recognize threats and performance dips. This security measure can be used to counter malicious attempts to poison an AI and identify when an attack may be occurring. Users can also help secure the AI models they use by staying informed and reporting any suspicious behaviors. Oftentimes, human interaction can be the determining factor in a cyber incident. Users can protect themselves by staying vigilant and noticing when AI is not responding as it should. [2]

Relevance

It won’t be long before AI becomes a mainstay in everyday life the same way internet-connected devices have before it. It is imperative that users recognize that with every new technological leap forward there comes an inherent risk. People should protect themselves using basic security practices and stay informed as this tool evolves further. AI has the potential to open the doors to new opportunities in all aspects of business and life, but only if we use it appropriately.

References

[1] Mavroudis, V; Hicks, C. (2025, October 9). LLMs May be More Vulnerable to Data Poisoning Than We Thought. The Alan Turing Institute. https://www.turing.ac.uk/blog/llms-may-be-more-vulnerable-data-poisoning-we-thought 

[2] Pearcy, S. (2025, October 31). Understanding AI Data Poisoning. Hidden Layer. https://hiddenlayer.com/innovation-hub/understanding-ai-data-poisoning/ 

[3] Solely, A.; Rando, J.; Chapman, E.; Davies, X.; Hasircioglu, B.; Sherren, E.; Mougan, C.; Mavroudis, V.; Jones, E.; Hicks, C.; Carlini, N.; Gal, Y.; Kirk, R. (2025, October 9). A small number of samples can poison LLMs of any size. Anthropic. https://www.anthropic.com/research/small-samples-poison 

[4] Stoffelbauer, A. (2025, October 23). How Large Language Models Work. Medium. https://medium.com/data-science-at-microsoft/how-large-language-models-work-91c362f5b78f 

[5] Stryker, C.; Lee, F.; Bergmann, D.; Scapicchio, M. (n.d.) The 2025 Guide to Machine Learning. IBM. Retrieved 19 November, 2025, from https://www.ibm.com/think/topics/model-training