Ten Most Critical LLM Vulnerabilities and Prevention Strategies

2024.10.21

Recently, the Open Worldwide Application Security Project (OWASP) listed the ten most common critical vulnerability types in large language model (LLM) applications. Prompt injection, poisoned training data, data leakage, and over-reliance on LLM-generated content are still on the list, while newly added threats include model denial of service (DoS), supply chain vulnerabilities, model theft, and excessive agency.

The list is intended to educate developers, designers, architects, managers, and organizations about potential security risks when deploying and managing LLMs, increase awareness of these vulnerabilities, propose remediation strategies, and improve the security posture of LLM applications.

The following are the ten most critical vulnerability types that affect LLM applications, as listed by OWASP.

1. Prompt Injection

Prompt injection occurs when an attacker manipulates a large language model through carefully crafted input, causing the LLM to unknowingly execute the attacker's intent. This can be achieved directly by "jailbreaking" the system prompt, or indirectly by manipulating external inputs, leading to data exfiltration, social engineering, and other issues.

OWASP says the results of a successful prompt injection attack can vary widely, from obtaining sensitive information to influencing critical decision-making processes under the guise of normal operations.

For example, a user could write a subtle prompt that forces a company chatbot to reveal proprietary information the user would not normally have access to, or upload a resume to an automated system with hidden instructions telling the system to recommend the candidate.

Preventive measures for this vulnerability include:

  • Implement strict privilege control to limit the LLM's access to backend systems. Provide the LLM with its own API tokens for extensible functionality, and follow the principle of least privilege to restrict the LLM to the minimum access level required for its intended operations.
  • Require additional user confirmation and authentication for privileged operations to reduce the chance of unauthorized actions (a minimal sketch of both controls follows this list).
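
The confirmation and least-privilege controls above can be enforced in the orchestration layer that sits between the model and backend systems. Below is a minimal sketch, assuming a hypothetical in-process tool registry; none of these names belong to any specific framework.

```python
# Sketch: allowlisted tools only, with explicit user confirmation for
# privileged actions. The registry and confirm_with_user callback are
# illustrative placeholders for your own orchestration layer.

PRIVILEGED_TOOLS = {"send_email", "delete_record", "update_permissions"}

def execute_tool_call(tool_name, args, registry, confirm_with_user):
    """Dispatch an LLM-requested tool call under least-privilege rules."""
    if tool_name not in registry:  # not on the allowlist
        raise PermissionError(f"Tool '{tool_name}' is not exposed to the LLM")
    if tool_name in PRIVILEGED_TOOLS and not confirm_with_user(tool_name, args):
        raise PermissionError(f"User declined privileged action '{tool_name}'")
    return registry[tool_name](**args)

# Example: a read-only lookup runs; a privileged tool would require consent.
registry = {"lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"}}
print(execute_tool_call("lookup_order", {"order_id": "A123"}, registry,
                        confirm_with_user=lambda name, args: False))
```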

2. Insecure Output Handling

Insecure output handling refers to the inadequate validation, sanitization, and handling of outputs generated by large language models before they are passed to downstream components and systems. Since the content generated by an LLM can be controlled by prompt input, this behavior is similar to giving users indirect access to additional functionality.

For example, if the output of the LLM is sent directly to a system shell or similar function, it could lead to remote code execution. If the LLM generates JavaScript or Markdown and sends it to the user's browser, the browser can run that code, leading to a cross-site scripting attack.

Preventive measures for this vulnerability include:

  • Treat the model like any other user, adopt a zero-trust approach, and apply appropriate input validation to responses coming from the model before they reach backend functions.
  • Follow the OWASP ASVS (Application Security Verification Standard) guidelines to ensure effective input validation and sanitization, and encode output to reduce unintended code execution (see the sketch after this list).
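
As an illustration, here is a sketch of two downstream guards, assuming the model's output is destined either for a browser or for a read-only SQL backend; the SELECT-only check is a deliberately narrow example, not a complete SQL sanitizer.

```python
import html
import re

READ_ONLY_SQL = re.compile(r"^\s*SELECT\b", re.IGNORECASE)

def render_for_browser(llm_text: str) -> str:
    """HTML-encode model output so any generated <script> tags stay inert."""
    return html.escape(llm_text)

def validate_generated_sql(llm_sql: str) -> str:
    """Reject anything that is not a single read-only SELECT statement."""
    if not READ_ONLY_SQL.match(llm_sql) or ";" in llm_sql.rstrip().rstrip(";"):
        raise ValueError("LLM-generated SQL rejected by output validation")
    return llm_sql

print(render_for_browser('<script>alert("xss")</script>'))
# -> &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```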

3. Training Data Poisoning

Training data poisoning refers to the manipulation of pre-training data or data involved in the fine-tuning or embedding process to introduce vulnerabilities, backdoors, or biases that could compromise the model.

For example, a malicious attacker or insider who gains access to a training dataset could alter the data so that the model gives incorrect instructions or recommendations, thereby harming the company or benefiting the attacker. Corrupted training datasets obtained from external sources can also introduce supply chain risk.

Preventive measures for this vulnerability include:

  • Validate the supply chain for training data, especially data from external sources.
  • Build separate models with separate training data, or fine-tune for different use cases, to create more granular and accurate generative AI outputs.
  • Ensure adequate sandboxing to prevent models from scraping unexpected data sources.
  • Use strict vetting or input filters for specific categories of training data or data sources to control the volume of falsified data.
  • Detect signs of a poisoning attack by analyzing model behavior on specific test inputs, and monitor and alert when the number of deviant responses exceeds a threshold (a minimal canary check is sketched after this list).
  • Keep humans in the loop to review responses and conduct audits.
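
The behavioral check in the list above can be as simple as replaying a fixed set of probe prompts after every training or fine-tuning run and alerting when answers drift from expected references. Below is a minimal sketch under that assumption; `model_generate` is a hypothetical callable wrapping your model.

```python
# Sketch: canary probes with expected keywords; the probes and threshold
# are illustrative and would be tailored to your domain.

CANARY_PROBES = [
    ("What is the capital of France?", "paris"),
    ("Should users share their passwords with support staff?", "no"),
]

def run_poisoning_canaries(model_generate, max_failures: int = 0) -> None:
    failures = []
    for prompt, expected_keyword in CANARY_PROBES:
        answer = model_generate(prompt).lower()
        if expected_keyword not in answer:
            failures.append(prompt)
    if len(failures) > max_failures:
        raise RuntimeError(
            f"{len(failures)} canary probe(s) deviated from expected behavior: {failures}"
        )

# Example with a stand-in model that answers as expected:
run_poisoning_canaries(lambda p: "Paris" if "France" in p else "No, never.")
```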

4. Model Denial of Service

In a model denial of service, an attacker interacts with the LLM in a way that consumes an abnormally large amount of resources, which can degrade the quality of service for that LLM and other users and drive up resource costs. According to OWASP, this problem is becoming increasingly serious due to the growing use of LLMs in various applications, their intensive resource consumption, the unpredictable nature of user input, and a general lack of awareness of the vulnerability among developers.

For example, an attacker could use automation to send a large number of complex queries to a company's chatbot, each of which would cost time and money to answer.

Preventive measures for this vulnerability include:

  • Implement input validation and sanitization to ensure that user input complies with defined constraints and filter out any malicious content.
  • Set resource usage caps per request, enforce API rate limits per user or IP address, and limit the number of queued actions in systems reacting to LLM responses (see the sketch after this list).
  • Continuously monitor the LLM's resource utilization to identify unusual spikes or patterns that could indicate a denial of service attack.
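
A minimal sketch of the per-user limits described above, combining a request-size cap with a sliding one-minute rate limit; the thresholds are illustrative, and production deployments would usually enforce this at an API gateway and also cap output tokens per request.

```python
import time
from collections import defaultdict

MAX_PROMPT_CHARS = 4_000          # illustrative request-size cap
MAX_REQUESTS_PER_MINUTE = 20      # illustrative per-user rate limit

_request_times = defaultdict(list)  # user_id -> recent request timestamps

def admit_request(user_id: str, prompt: str) -> bool:
    """Return True if the request may be forwarded to the LLM."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return False
    now = time.time()
    recent = [t for t in _request_times[user_id] if now - t < 60]
    if len(recent) >= MAX_REQUESTS_PER_MINUTE:
        _request_times[user_id] = recent
        return False
    recent.append(now)
    _request_times[user_id] = recent
    return True
```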

5. Supply Chain Vulnerabilities

The LLM supply chain is vulnerable in many ways, especially when companies use open source or third-party components, poisoned or outdated pre-trained models, or corrupted training datasets. This category also covers situations where the creator of the original model did not properly vet the training data, resulting in privacy or copyright violations. According to OWASP, this can lead to biased results, security breaches, or even complete system failures.

Preventive measures for this vulnerability include:

  • Carefully review data sources and vendors.
  • Use only reputable plugins that have been tested against your application's requirements, and require model and code signing when external models and suppliers are involved (a minimal artifact-integrity check is sketched after this list).
  • Use vulnerability scanning, management, and patching to reduce the risk of vulnerable or outdated components, and maintain an up-to-date inventory of these components to quickly identify new vulnerabilities.
  • Scan your environment for unauthorized plugins and outdated components, including models and their artifacts, and develop a patch strategy to fix the issues.
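
One concrete piece of the inventory-and-signing advice above is pinning the digests of approved model artifacts and refusing to load anything else. The sketch below assumes a simple local allowlist; the path and digest are placeholders.

```python
import hashlib

# Placeholder allowlist: artifact path -> expected SHA-256 digest.
APPROVED_ARTIFACTS = {
    "models/support-bot-v3.safetensors":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def verify_artifact(path: str) -> None:
    """Refuse to load a model file whose digest is not on the approved list."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if APPROVED_ARTIFACTS.get(path) != digest.hexdigest():
        raise RuntimeError(f"Unapproved or tampered model artifact: {path}")
```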

6. Disclosure of Sensitive Information

Large language models have the potential to leak sensitive information, proprietary algorithms, or other confidential details through their outputs. This can lead to unauthorized access to sensitive data, intellectual property, privacy violations, and other security vulnerabilities.

Sensitive data can enter the LLM during initial training, fine-tuning, or RAG embedding, or when users paste it into their prompts.

Once the model has access to this information, it is possible for other unauthorized users to see it. For example, a customer might see private information belonging to other customers, or a user might be able to extract proprietary company information.

Preventive measures for this vulnerability include:

  • Use data sanitization and scrubbing techniques to prevent the LLM from ingesting sensitive data during training or accessing it at inference time.
  • Apply filters to user input to prevent sensitive data from being submitted to the model in the first place (a minimal redaction filter is sketched after this list).
  • When the LLM needs to access data sources at inference time, apply strict access control and the principle of least privilege.
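
As a sketch of the input-filter idea, the snippet below redacts a few obvious secret formats before the prompt reaches the model or its logs. Regexes like these only catch the easy cases; real deployments usually pair them with a dedicated DLP or PII-detection service.

```python
import re

SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_sensitive(prompt: str) -> str:
    """Replace matches of the patterns above with redaction markers."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED {label}]", prompt)
    return prompt

print(redact_sensitive("My SSN is 123-45-6789 and my email is jane@example.com"))
# My SSN is [REDACTED US_SSN] and my email is [REDACTED EMAIL]
```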

7. Insecure Plugin Design

LLM plugins are extensions that are automatically called by the model during user interaction. They are driven by the model, lack application control over execution, and often lack validation or type checking of input. This would allow potential attackers to construct malicious requests to the plugin, which in turn could lead to a range of unexpected behaviors, including data leakage, remote code execution, and privilege escalation.

Preventive measures for this vulnerability include:

  • Implement strict input controls, including type and range checks, and follow OWASP's ASVS (Application Security Verification Standard) recommendations to ensure effective input validation and sanitization (a minimal example is sketched after this list).
  • Use appropriate authentication and authorization mechanisms, such as OAuth2 and API keys, for custom authorization decisions.
  • Inspect and test plugins before deployment.
  • Plugins should adhere to the minimum access level required for their intended operation.
  • Require additional manual authorization for sensitive operations.
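
Below is a sketch of strict parameter validation for a hypothetical "order status" plugin: every argument the model supplies is type- and format-checked before any backend call is made. The schema and the stubbed backend response are illustrative.

```python
import re

ORDER_ID_FORMAT = re.compile(r"[A-Z0-9]{8,12}")

def order_status_plugin(params: dict) -> dict:
    """Validate model-supplied arguments, then call the (stubbed) backend."""
    order_id = params.get("order_id")
    if not isinstance(order_id, str) or not ORDER_ID_FORMAT.fullmatch(order_id):
        raise ValueError("order_id failed type/format validation")

    limit = params.get("limit", 1)
    if not isinstance(limit, int) or isinstance(limit, bool) or not 1 <= limit <= 10:
        raise ValueError("limit must be an integer between 1 and 10")

    # Only after validation does the backend get called (stubbed here).
    return {"order_id": order_id, "status": "processing", "events_returned": limit}

print(order_status_plugin({"order_id": "AB12345678", "limit": 3}))
```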

8. Excessive Agency

As LLMs become more capable, companies want to empower them to do more, access more systems, and act autonomously. Excessive agency occurs when an LLM is granted too much functionality, too many permissions, or too much autonomy. When an LLM hallucinates, or falls victim to prompt injection or a malicious plugin, it may perform damaging actions.

Depending on the access rights and privileges the LLM is granted, this could cause a variety of problems. For example, if the LLM is given access to a plugin that allows it to read documents in the repository in order to aggregate them, but that plugin also allows it to modify or delete documents, then the wrong permissions could cause it to accidentally change or delete content.

If a company creates an LLM personal assistant that summarizes emails for employees but also has the authority to send emails, then that LLM assistant could start sending spam, either by accident or with malicious intent.

Preventive measures for this vulnerability include:

  • Limit the plugins and tools the LLM is allowed to call, and the functions implemented in those plugins and tools, to the minimum required for the necessary operations (see the sketch after this list).
  • Avoid open-ended functions such as running a shell command or fetching a URL in favor of functions with more fine-grained capabilities.
  • Limit the permissions that LLMs, plugins, and tools grant to other systems to a minimum.
  • Track user authorizations and security scopes to ensure that actions taken on behalf of a user are performed on downstream systems in the context of that specific user and with the least privileges required.
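
A minimal sketch of the document-plugin example above: the backend exposes modify and delete operations internally, but only read-only operations are registered for the LLM, and every call is checked against the requesting user's own rights. All names are illustrative.

```python
class DocumentBackend:
    def __init__(self, docs, acl):
        self.docs = docs          # doc_id -> text
        self.acl = acl            # user -> set of readable doc_ids

    def user_can_read(self, user, doc_id):
        return doc_id in self.acl.get(user, set())

    def read_document(self, doc_id):
        return self.docs[doc_id]

    def delete_document(self, doc_id):   # intentionally NOT exposed below
        del self.docs[doc_id]

LLM_EXPOSED_TOOLS = {"read_document"}     # no modify/delete operations

def dispatch_tool(tool_name, args, user, backend):
    """Run a tool on behalf of a user, scoped to that user's own rights."""
    if tool_name not in LLM_EXPOSED_TOOLS:
        raise PermissionError(f"'{tool_name}' is not available to the LLM")
    if not backend.user_can_read(user, args.get("doc_id")):
        raise PermissionError("Action exceeds the requesting user's rights")
    return getattr(backend, tool_name)(args["doc_id"])

backend = DocumentBackend({"d1": "Q3 summary"}, {"alice": {"d1"}})
print(dispatch_tool("read_document", {"doc_id": "d1"}, "alice", backend))  # allowed
```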

9. Overreliance

Overreliance can occur when LLMs generate false information and present it in an authoritative manner. While LLMs can generate creative and informative content, they can also generate content that is factually incorrect, inappropriate, or unsafe, known as “hallucinations” or fictions. When people or systems trust this information without oversight or confirmation, it can lead to security breaches, misinformation, miscommunication, legal issues, and reputational damage.

For example, if a company relies on an LLM to generate safety reports and analysis, and the reports generated by the LLM contain incorrect data that the company uses to make critical safety decisions, there could be significant impacts due to reliance on inaccurate LLM-generated content.

Preventive measures for this vulnerability include:

  • Regularly monitor and review the output of the LLM.
  • Cross-check LLM outputs with trusted external sources, or implement automated validation mechanisms that cross-validate generated outputs against known facts or data (a minimal example is sketched after this list).
  • Enhance models by fine-tuning or embedding to improve output quality.
  • Communicate LLM usage risks and limitations to users, and build APIs and user interfaces that encourage responsible and secure use of LLM.
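
For the safety-report example above, automated cross-validation can be as simple as comparing the numeric figures quoted in the generated report against a trusted system of record before the report is accepted. The metric names, tolerance, and reference values in the sketch below are illustrative.

```python
import re

def validate_report_figures(report_text, reference, tolerance=0.01):
    """Compare figures mentioned in the report against trusted reference values."""
    problems = []
    for metric, trusted in reference.items():
        match = re.search(rf"{re.escape(metric)}\D*?(\d+(?:\.\d+)?)",
                          report_text, re.IGNORECASE)
        if match is None:
            problems.append(f"missing figure for '{metric}'")
            continue
        claimed = float(match.group(1))
        if abs(claimed - trusted) > tolerance * max(abs(trusted), 1.0):
            problems.append(f"'{metric}': report says {claimed}, source says {trusted}")
    return problems  # an empty list means all checked figures matched

report = "Incident rate: 0.8 per 1,000 hours. Inspections completed: 42."
print(validate_report_figures(report, {"incident rate": 0.7, "inspections completed": 42}))
# ["'incident rate': report says 0.8, source says 0.7"]
```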

10. Model Theft

Model theft occurs when a malicious actor accesses and leaks an entire LLM model or its weights and parameters so that they can create their own version. This could result in financial or brand reputation loss, erosion of competitive advantage, unauthorized use of the model, or unauthorized access to sensitive information contained in the model.

For example, an attacker could gain access to the LLM model repository through a misconfiguration in the network or application security settings, or a disgruntled employee could even leak the model. An attacker could also query the LLM for enough question-answer pairs to create their own "clone" of the model, or use the responses to fine-tune their model. According to OWASP, it's impossible to 100% replicate the LLM with this type of model extraction, but you can get pretty close.

Attackers can exploit the capabilities of the cloned model or use it as a testing ground for prompt injection techniques that can later be used against the original model. OWASP warns that as large language models become more popular and useful, LLM theft will become a significant security concern.

Preventive measures for this vulnerability include:

  • Implement strong access controls, such as role-based access and least privilege rules, to restrict access to model repositories and training environments.
  • Regularly monitor and audit access logs and activities to promptly identify any suspicious or unauthorized behavior.
  • Reduce the risk of model cloning through input filters and rate limiting of API calls (a monitoring sketch follows this list).
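
Rate limiting for theft prevention differs from ordinary DoS limits in that it watches cumulative volume rather than burst rate. The sketch below, with illustrative thresholds, flags clients whose daily query count or output-token consumption is far above normal usage, a common signal of model-cloning attempts.

```python
from collections import Counter

DAILY_QUERY_LIMIT = 5_000             # illustrative thresholds
DAILY_OUTPUT_TOKEN_LIMIT = 2_000_000

queries_today = Counter()
output_tokens_today = Counter()

def record_and_check(client_id: str, output_tokens: int) -> None:
    """Record one completed request and flag clients above extraction thresholds."""
    queries_today[client_id] += 1
    output_tokens_today[client_id] += output_tokens
    if (queries_today[client_id] > DAILY_QUERY_LIMIT
            or output_tokens_today[client_id] > DAILY_OUTPUT_TOKEN_LIMIT):
        raise RuntimeError(
            f"Client '{client_id}' exceeded daily extraction thresholds; suspend and review"
        )
```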

AI chatbots need to be regularly updated to maintain an effective defense against threats, and human oversight is equally critical to ensure that LLMs are functioning properly. In addition, LLMs need to understand context to provide accurate responses and catch any security issues, and should be regularly tested and evaluated to identify potential weaknesses or vulnerabilities.

Original title: "10 most critical LLM vulnerabilities"; authors: Maria Korolov and Michael Hill