Large Language Model Vulnerability Mitigation Guide

2024.01.12

Although large language model (LLM) applications are rapidly gaining popularity around the world, enterprises still lack a comprehensive understanding of the LLM threat landscape. Faced with uncertainty about large language model risks, companies want to move quickly while keeping their operations secure.

Keeping up with this pace and using artificial intelligence to strengthen core competitiveness means that corporate CISOs face tremendous pressure to understand and respond to emerging AI threats.

The AI threat landscape changes every day, and enterprise security teams should prioritize the large language model vulnerabilities that pose significant risks to enterprise operations. If cybersecurity teams gain a deep understanding of these vulnerabilities and their mitigations, enterprises can confidently leverage large language models to accelerate innovation without undue concern about risk.

Below, we briefly introduce four major categories of large language model risk and their mitigation measures:

1. Prompt injection attacks and data leaks

For large language models, data leakage is the most significant concern. Models can be "tricked" into disclosing sensitive corporate or user information, leading to a range of privacy and security issues. Prompt leakage is another major issue: a company's intellectual property can be compromised if a malicious user gains access to the system prompt.

Both vulnerabilities are related to prompt injection. Both direct and indirect prompt injection attacks are becoming increasingly common and can have serious consequences.

A successful prompt injection attack can lead to cross-plugin request forgery, cross-site scripting, and training data extraction, which can put company secrets, personal user data, and important training data at risk.

Therefore, enterprises need to implement controls throughout the entire AI application development lifecycle. From sourcing and processing data to selecting and training applications, every step should be constrained to reduce the risk of a breach. Regular security practices like sandboxing, whitelisting, and API gateways are equally (if not more) valuable when working with large language models. Beyond this, it is critical that security teams carefully review all plugins and manually review and approve all high-privilege tasks before integrating them with large language model applications.
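As a minimal sketch of that last point, the Python snippet below routes plugin calls requested by a model through an allowlist and a manual-approval gate. The plugin names, privilege tiers, and approve() prompt are illustrative assumptions rather than any specific product's API:

def example_plugin_gate():
    pass  # placeholder so this file stands alone if the sketch below is trimmed

# Minimal sketch of a manual-approval gate for LLM plugin calls.
# Plugin names and privilege tiers below are illustrative assumptions.
ALLOWLISTED_PLUGINS = {"search_docs", "get_weather"}        # low-privilege, auto-approved
HIGH_PRIVILEGE_PLUGINS = {"send_email", "delete_record"}    # always require a human

def approve(plugin: str, arguments: dict) -> bool:
    """Ask a human operator to confirm a high-privilege action."""
    answer = input(f"Approve call to {plugin} with {arguments}? [y/N] ")
    return answer.strip().lower() == "y"

def dispatch(plugin: str, arguments: dict, execute):
    """Route an LLM-requested plugin call through the approval policy."""
    if plugin in ALLOWLISTED_PLUGINS:
        return execute(plugin, arguments)
    if plugin in HIGH_PRIVILEGE_PLUGINS and approve(plugin, arguments):
        return execute(plugin, arguments)
    raise PermissionError(f"Plugin call blocked by policy: {plugin}")

The key design choice is that the model never invokes a high-privilege plugin directly; every such call passes through a human checkpoint before execution.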

2. Model data poisoning attack

The effectiveness of AI models depends on data quality. But throughout the model development process, from pre-training to fine-tuning and embedding, training data sets are vulnerable to tampering by attackers.

Most enterprises use third-party models whose data is curated by parties they do not know, and cybersecurity teams cannot blindly trust that the data has not been tampered with. Whether using a third-party or an in-house model, there is always a risk of "data poisoning" by bad actors, which can significantly degrade model performance and thus damage brand reputation.

The open source AutoPoison framework (https://github.com/azshue/AutoPoison/blob/main/assets/intro.png) clearly illustrates how data poisoning attacks affect a model during instruction tuning. Beyond studying such attacks, cybersecurity teams can implement the following risk-based strategies to reduce exposure and preserve model performance:

Supply chain audits: Audit the data supply chain with tight security measures to verify that data sources are clean. Ask questions like "How was the data collected?" and "Was user consent obtained, and was the collection ethical?" Also ask who annotated the data, what their qualifications are, and whether there are biases or inconsistencies in the labeling, and address data ownership and licensing, including who owns the data and under what terms and conditions it is licensed.

Data cleansing and sanitization: Always check all data and its sources before it goes into the model. For example, PII must be redacted before being put into the model (a minimal redaction sketch follows this list).

Red team exercises: Conduct red team exercises focused on large language models during the testing phase of the model lifecycle. Prioritize scenarios that involve manipulating training data to inject malicious code, bias, or harmful content, and employ a variety of attack methods, including adversarial inputs, poisoning attacks, and model extraction techniques.
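To illustrate the data-cleansing step above, here is a minimal sketch that redacts a couple of common PII patterns from training records before ingestion. The regular expressions, field names, and placeholder tokens are assumptions; a real pipeline would use a dedicated PII-detection tool covering many more identifier types:

import re

# Minimal PII-redaction sketch for training records before ingestion.
# Patterns below are illustrative, not exhaustive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace detected PII with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

def clean_record(record: dict) -> dict:
    """Redact PII in every string field of a training record."""
    return {k: redact(v) if isinstance(v, str) else v for k, v in record.items()}

print(clean_record({"instruction": "Email john.doe@example.com or call +1 555-123-4567"}))

Running the last line prints the record with the email address and phone number replaced by placeholder tokens, showing the sanitization happening before any model ever sees the data.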

3. API risks of interconnected systems

Advanced models such as GPT-4 are often integrated into systems that communicate with other applications. But whenever APIs are involved, downstream systems are at risk, and a malicious prompt can have a domino effect on interconnected systems. To reduce this risk, consider the following (a short validation sketch follows the list):

If large language models are allowed to call external APIs, request user confirmation before performing potentially destructive operations.

Review large language model outputs before interconnecting different systems. Check them for potential vulnerabilities that could lead to risks such as remote code execution (RCE).

Pay special attention to scenarios where these outputs facilitate interactions between different computer systems.

Implement strong security measures for all APIs involved in interconnected systems.

Use strong authentication and authorization protocols to prevent unauthorized access and data leakage.

Monitor API activity for signs of unusual and suspicious behavior, such as unusual request patterns or attempts to exploit vulnerabilities.
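The sketch below, under the assumption that the downstream system only accepts a small JSON payload with an allowlisted action, shows one way model output might be validated and confirmed before it reaches another API. The schema, the allowed actions, and the confirmation callback are illustrative assumptions:

import json

# Sketch of validating LLM output before it is forwarded to another API.
# The expected schema and confirmation step are assumptions for illustration.
ALLOWED_ACTIONS = {"create_ticket", "update_status"}

def validate_llm_output(raw: str) -> dict:
    """Parse and constrain model output instead of forwarding it blindly."""
    payload = json.loads(raw)                       # reject anything that is not JSON
    if payload.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"Action not permitted: {payload.get('action')}")
    if not isinstance(payload.get("params"), dict):
        raise ValueError("params must be an object")
    return payload

def forward(raw: str, call_api, confirm):
    """Require explicit user confirmation before a potentially destructive call."""
    payload = validate_llm_output(raw)
    if not confirm(payload):
        raise PermissionError("User declined the downstream call")
    return call_api(payload)

Treating model output as untrusted input, parsing it against a strict schema, and requiring confirmation for destructive operations is what breaks the domino effect described above.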

4. Large model DoS attack

Network bandwidth saturation vulnerabilities may be exploited by attackers to carry out denial-of-service (DoS) attacks, which can cause the cost of using large language models to soar.

In a model denial of service attack, the attacker uses the model in a way that excessively consumes resources (such as bandwidth or system processing power), ultimately harming the availability of the target system. In turn, such attacks can lead to degraded service and sky-high bills for large models. Since DoS attacks are not new to the cybersecurity world, there are several strategies that can be employed to defend against model denial of service attacks and reduce the risk of rapidly escalating costs:

Rate limiting: Implement rate limiting to prevent your system from being overwhelmed by too many requests. The right rate limit for your application depends on model size and complexity, hardware and infrastructure, and average request volume and peak usage times (a minimal enforcement sketch follows this list).

Character limits: Set a limit on the number of characters a user can include in a query to avoid exhausting API resources for large models.

Framework provider methods: Leverage methods provided by framework providers to strengthen defenses against attacks. For example, if you use LangChain, consider using the max_iterations parameter.
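A minimal sketch of the first two controls, using an in-memory sliding window per user plus a fixed character cap in front of an LLM endpoint; the limits, window size, and user identifier are assumptions to be tuned per deployment:

import time
from collections import defaultdict, deque

# Sketch of per-user rate limiting plus a prompt character cap.
# Limits and window size are illustrative defaults, not recommendations.
MAX_REQUESTS_PER_MINUTE = 20
MAX_PROMPT_CHARS = 4000
_request_log = defaultdict(deque)   # user_id -> timestamps of recent requests

def check_request(user_id: str, prompt: str) -> None:
    """Raise if the request exceeds the character limit or the rate limit."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt exceeds {MAX_PROMPT_CHARS} characters")

    now = time.monotonic()
    window = _request_log[user_id]
    while window and now - window[0] > 60:          # drop entries older than 60 s
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        raise RuntimeError("Rate limit exceeded; retry later")
    window.append(now)

In production the in-memory log would typically be replaced by a shared store so the limit holds across multiple application instances, but the gating logic stays the same.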

Securing large language models requires a multi-faceted approach spanning data processing, model training, system integration, and resource usage. By implementing the strategies above and remaining vigilant, enterprises can fully utilize the capabilities of large language models while keeping the associated risks in check.