With the popularization of generative AI tools like ChatGPT, information has become increasingly easy to retrieve. Ask it anything, and ChatGPT will respond to the best of its ability, modifying itself to your prompt’s specifications as best it can. The more detailed the prompt, the more specific of a response you can get from an LLM (large language model) like ChatGPT. Naturally, the bot has filtrations as well. “OpenAI employs a response filtration system to filter out inappropriate, biased, or harmful content generated by the model,” was ChatGPT’s response when asked about the content filtration system. What’s been discovered, though, particularly through online communities utilizing ChatGPT for entertainment purposes, is that with a specific set of instructions a prompter is able to exploit the chatbot, “jailbreaking” it to disregard the content filtration system. This is only one of the several vulnerabilities that are becoming apparent in LLMs, vulnerabilities that will need to be kept in check as LLMs become more regularly used by organizations. The Open Worldwide Application Security Project (OWASP) recently published the OWASP Top 10 for LLM which details this jailbreaking method, known as prompt injection.
What is Prompt Injection?
“Direct Prompt Injections, also known as ‘jailbreaking’, occur when a malicious user overwrites or reveals the underlying system prompt. This may allow attackers to exploit backend systems by interacting with insecure functions and data stores accessible through the LLM,” OWASP describes in their report. Users inject a highly detailed prompt into the LLM that allows the user to almost overwrite previously trained instructions, essentially rooting the LLM. Depending on how much information the LLM holds, a malicious actor could then extract sensitive information the LLM may have access to. More like a typical malicious injection is an indirect prompt injection, which according to the OWASP Top 10 can “occur when an LLM accepts input from external sources that can be controlled by an attacker, such as websites or files. The attacker may embed a prompt injection in the external content hijacking the conversation context.”
The Scope of the Prompt Injection Vulnerability
The extent of this vulnerability is so dangerous specifically because of the overall differences throughout organization-utilized LLMs, and the fact that even companies like OpenAI don’t have complete control over their products. The OpenAI website states that while they’ve “made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior.” It’s not that OpenAI as a company isn’t attempting to improve their LLMs, it’s that vulnerabilities within LLMs seem to be more unpredictable and more extensive than previously imagined. Companies that utilize ChatGPT and other LLM’s APIs in their tools may be vulnerable to various types of injections, most of which include injecting unauthorized scripts into the LLM.
Examples of Prompt Injection
OWASP cites a few examples of both indirect and direct prompt injection in their overview:
- A malicious user crafts a direct prompt injection to the LLM, which instructs it to ignore the application creator’s system prompts and instead execute a prompt that returns private, dangerous, or otherwise undesirable information
- A user employs an LLM to summarize a webpage containing an indirect prompt injection. This then causes the LLM to solicit sensitive information from the user and perform exfiltration via Java
- A malicious user uploads a resume containing an indirect prompt injection. The document contains a prompt injection with instructions to make the LLM inform users that this document is an excellent document eg. excellent candidate or a job role. An internal user runs the document through the LLM to summarize the document. The output of the LLM returns information stating that this is an excellent document
- A user enables a plugin linked to an e-commerce site. A rogue instruction embedded on a visited website exploits this plugin, leading to unauthorized purchases
- A rogue instruction and content embedded on a visited website which exploits other plugins to scam users.
Of course, vulnerabilities vary based on the LLM itself, and how much information it actually has access to. A customer support AI chatbot on a company’s website likely doesn’t have as much information as a company tool that utilizes an LLM’s API.
Prevention of Prompt Injection Attacks
There’s currently no foolproof way to prevent prompt injection, but OWASP does give a list of steps you can take to lessen the impact of these attacks:
- Enforce privilege control on LLM access to backend systems. Provide the LLM with its own API tokens or extensible functionality, such as plugins, data access, and function-level permissions. Follow the principle of least privilege by restricting the LLM to only the minimum level of access necessary or its intended operations
- Implement human-in-the-loop or extensible functionality. When performing privileged operations, such as sending or deleting emails, have the application require the user approve the action first. This will mitigate the opportunity or an indirect prompt injection to perform actions on behalf of the user without their knowledge or consent
- Segregate external content from user prompts. separate and denote where untrusted content is being used to limit their influence on user prompts. For example, use ChatML for OpenAI API calls to indicate to the LLM the source of prompt input
- Establish trust boundaries between the LLM, external sources, and extensible functionality (e.g., plugins or downstream functions). Treat the LLM as an untrusted user and maintain final user control on decision-making processes. However, a compromised LLM may still act as an intermediary (man-in-the-middle) between your application’s APIs and the user as it may hide or manipulate information prior to presenting it to the user. Highlight potentially untrustworthy responses visually to the user
Prompt injection will continue to be a dangerous vulnerability, and the necessity of LLM cybersecurity will only grow as LLMs become more commonly utilized by tech organizations. To ensure your LLM security, it’s essential to implement steps that limit the scope of prompt injection attacks and remain informed about new LLM vulnerabilities. While generative AI is incredibly powerful and a great tool for organizations, utilizing APIs from LLMs comes with a risk, so it’s important to make educated decisions while implementing it into your organization.
How Can Netizen Help?
Netizen ensures that security gets built-in and not bolted-on. Providing advanced solutions to protect critical IT infrastructure such as the popular “CISO-as-a-Service” wherein companies can leverage the expertise of executive-level cybersecurity professionals without having to bear the cost of employing them full time.
We also offer compliance support, vulnerability assessments, penetration testing, and more security-related services for businesses of any size and type.
Additionally, Netizen offers an automated and affordable assessment tool that continuously scans systems, websites, applications, and networks to uncover issues. Vulnerability data is then securely analyzed and presented through an easy-to-interpret dashboard to yield actionable risk and compliance information for audiences ranging from IT professionals to executive managers.
Netizen is an ISO 27001:2013 (Information Security Management), ISO 9001:2015, and CMMI V 2.0 Level 3 certified company. We are a proud Service-Disabled Veteran-Owned Small Business that is recognized by the U.S. Department of Labor for hiring and retention of military veterans.
Questions or concerns? Feel free to reach out to us any time –