Optimizing System Performance: Challenges and Countermeasures in Web-Tier Caching with Redis

2024.08.30

Web-tier caching is essential for improving application performance. It speeds up response time by reducing repeated data processing and database queries. For example, if the data requested by a user is already cached, the server can return the result directly from the cache, avoiding complex calculations or database queries for each request. This not only improves the response speed of the application, but also reduces the burden on the backend system.
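The read path described above can be sketched as follows. This is a minimal illustration, not a production cache: the `CachedLookup` class, the in-memory map standing in for the cache layer, and the `backend` function standing in for a database query are all hypothetical names introduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: an in-memory map stands in for the cache layer.
class CachedLookup {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> backend; // stands in for a database query
    final AtomicInteger backendCalls = new AtomicInteger();

    CachedLookup(Function<String, String> backend) {
        this.backend = backend;
    }

    String get(String key) {
        // On a hit the cached value is returned without calling the backend.
        // On a miss the backend is queried and a non-null result is stored.
        return cache.computeIfAbsent(key, k -> {
            backendCalls.incrementAndGet();
            return backend.apply(k);
        });
    }

    public static void main(String[] args) {
        CachedLookup lookup = new CachedLookup(id -> "profile-of-" + id);
        lookup.get("42"); // miss: the backend is queried
        lookup.get("42"); // hit: served from the cache
        System.out.println("backend calls: " + lookup.backendCalls.get());
    }
}
```

The counter makes the saving visible: repeated requests for the same key reach the backend only once.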

Redis is a popular in-memory data structure store that is often used to implement an efficient cache layer. It supports data structures such as strings, hashes, lists, and sets, and provides fast access to data. Caching frequently used data in Redis can significantly reduce the load on the database and improve the user experience.

Detailed explanation of cache issues

This chapter does not cover the basic caching mechanisms of Redis. Instead, it focuses on how to prevent the losses that a Redis failure can cause, and discusses the causes of and solutions to three problems: cache penetration, cache breakdown, and cache avalanche.

Cache penetration

Cache penetration occurs when a query asks for data that does not exist at all, so neither the cache layer nor the storage layer can hit. For fault-tolerance reasons, when the storage layer fails to find the data, the system usually does not write anything to the cache layer. As a result, every request for non-existent data goes straight to the storage layer, and the cache no longer protects the backend storage. This increases the load on the storage layer and reduces overall system performance.

There are two basic reasons for cache penetration:

  1. Problems with your own business code or data: These problems usually stem from defects in business logic or data inconsistency. For example, if the business code fails to correctly handle certain data queries, or there are defects in the data source itself (such as data loss, data errors, etc.), the requested query may never find the corresponding data in the cache or storage layer. In this case, the cache layer cannot effectively store and return query results, resulting in each request requiring direct access to the storage layer.
  2. Malicious attacks or crawler behavior: Attackers or automated crawlers may issue a large number of requests for data that does not exist. Because every such request misses in both the cache layer and the storage layer (the query result is always empty), these requests consume considerable system resources and can sharply increase the pressure on both layers, affecting the overall performance and stability of the system.
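The effect of the two causes above can be reproduced in a small sketch. It assumes in-memory maps in place of the real cache and storage layers, and the class name `PenetrationDemo` and the keys are made up for illustration. Because only found values are cached, every lookup for a missing key reaches storage.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: in-memory maps stand in for the cache and the storage layer.
class PenetrationDemo {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> storage = new HashMap<>();
    int storageReads = 0;

    PenetrationDemo() {
        storage.put("user:1", "Alice");
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        storageReads++;                 // every cache miss reaches storage
        String value = storage.get(key);
        if (value != null) {
            cache.put(key, value);      // only found values are cached
        }
        return value;
    }

    public static void main(String[] args) {
        PenetrationDemo demo = new PenetrationDemo();
        for (int i = 0; i < 1000; i++) {
            demo.get("user:1");         // existing key: storage is read once
            demo.get("user:-1");        // missing key: storage is read every time
        }
        System.out.println("storage reads: " + demo.storageReads);
    }
}
```

Almost all of the storage reads counted here come from the key that does not exist, which is the traffic the solutions below aim to stop.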

Solution — Cache Empty Objects

One effective solution to cache penetration is to cache empty objects. When the storage layer returns nothing for a key, the system stores a marker in the cache layer that represents the empty result, usually with a short expiration time. Subsequent requests for the same key then get the empty marker from the cache layer and no longer reach the storage layer. This reduces access to the storage layer and improves the overall performance and responsiveness of the system.

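String get(String key) {
    // Look up the value in the cache
    String cacheValue = cache.get(key);

    // Cache hit (an empty string is the cached marker for "no such data")
    if (cacheValue != null) {
        return cacheValue;
    }

    // Cache miss: read the value from storage
    String storageValue = storage.get(key);

    // Nothing in storage: cache an empty marker and give it an expiration time
    if (storageValue == null) {
        cache.set(key, "");  // store the empty-object marker
        cache.expire(key, 60 * 5);  // expire after 300 seconds
    } else {
        // Data exists in storage: cache it
        cache.set(key, storageValue);
    }

    return storageValue;
}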

Solution — Bloom Filter

For cache penetration caused by malicious requests for large amounts of non-existent data, a Bloom filter can be used as a first filter. A Bloom filter is a highly space-efficient probabilistic data structure that tests whether an element may be in a set. When a Bloom filter reports that a value may exist, the value either really exists or the result is a false positive; when it reports that a value does not exist, the value definitely does not exist.

A Bloom filter consists of a large bit array and several independent, unbiased hash functions. An unbiased hash function distributes the hash values of input elements evenly across the bit array, which reduces collisions. To add a key, the filter hashes it with each hash function, and each function produces an integer. Each integer is taken modulo the length of the bit array to obtain a position in the array, and the bits at those positions are set to 1 to record the key.

Querying a key works in much the same way. The key is hashed with the same hash functions to obtain the same set of positions, and the bits at those positions are checked. If any bit is 0, the key definitely does not exist. If all bits are 1, the key may exist, but this cannot be confirmed, because those bits may have been set by other keys. Adjusting the size of the bit array and the number of hash functions trades memory against the false-positive rate.
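The add and query steps described above can be sketched with a `java.util.BitSet`. This is a minimal in-memory illustration, not the Redisson implementation. The class name `SimpleBloomFilter` is hypothetical, and as a simplifying assumption the positions are derived from two base hashes (`h1 + i * h2`) instead of using fully independent hash functions.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal in-memory Bloom filter, for illustration only.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // length of the bit array
    private final int hashCount;  // number of positions set per key

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // FNV-1a style hash with a caller-supplied starting value.
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // i-th position for a key: h1 + i * h2, reduced modulo the array length.
    private int position(byte[] data, int i) {
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x9E3779B9) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(data, i));   // mark each position with a 1
        }
    }

    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(data, i))) {
                return false;   // a 0 bit proves the key was never added
            }
        }
        return true;            // all bits are 1: the key may exist
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        filter.add("xiaoyu");
        System.out.println("xiaoyu: " + filter.mightContain("xiaoyu"));
        System.out.println("huahua: " + filter.mightContain("huahua"));
    }
}
```

A key that was added always reports `true`; a key that was never added usually reports `false`, but may occasionally report `true` as a false positive.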

This method is particularly suitable for application scenarios where the data hit rate is not high, the data set is relatively fixed, and the real-time requirements are not high. Especially when the data set is large, Bloom filters can significantly reduce the cache space occupied. Although the implementation of Bloom filters may increase the complexity of code maintenance, the advantages of memory efficiency and query speed are usually worth the investment.
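To make the memory claim concrete, the standard Bloom filter sizing formulas can be evaluated for the figures used in the Redisson example in this article (100,000,000 expected elements and a 3% false-positive rate). The sketch below uses the textbook formulas `m = -n * ln(p) / (ln 2)^2` and `k = (m / n) * ln 2`; the class name `BloomSizing` is hypothetical, and the sizing a particular library chooses internally may differ.

```java
// Textbook sizing for a Bloom filter with n expected elements and
// target false-positive rate p.
class BloomSizing {
    // Number of bits: m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    // Number of hash functions: k = (m / n) * ln 2, at least 1
    static int optimalHashCount(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 100_000_000L;
        double p = 0.03;
        long m = optimalBits(n, p);
        int k = optimalHashCount(n, m);
        System.out.printf("bits: %d (about %.1f MiB), hash functions: %d%n",
                m, m / 8.0 / 1024 / 1024, k);
    }
}
```

Under these formulas, 100 million keys need roughly 7.3 bits each, which comes to about 87 MiB in total with 5 hash functions. That is far less than storing the keys themselves.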

Bloom filters work well in these scenarios because they can represent large data sets in very little memory. One way to use them is Redisson, a Java client for Redis that provides a distributed Bloom filter. To add Redisson to your project, include the following Maven dependency:

<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.16.2</version> <!-- choose a version that suits your project -->
</dependency>

Example pseudocode:

package com.redisson;

import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonBloomFilter {

    public static void main(String[] args) {
        // Configure the Redisson client to connect to the Redis server
        Config config = new Config();
        config.useSingleServer().setAddress("redis://localhost:6379");

        // Create the Redisson client
        RedissonClient redisson = Redisson.create(config);

        // Get a Bloom filter instance named "nameList"
        RBloomFilter<String> bloomFilter = redisson.getBloomFilter("nameList");

        // Initialize the filter: 100,000,000 expected elements, 3% false-positive rate
        bloomFilter.tryInit(100_000_000L, 0.03);

        // Insert the element "xiaoyu" into the Bloom filter
        bloomFilter.add("xiaoyu");

        // Query the Bloom filter to check whether elements exist
        System.out.println("Contains 'huahua': " + bloomFilter.contains("huahua")); // expected: false
        System.out.println("Contains 'lin': " + bloomFilter.contains("lin")); // expected: false
        System.out.println("Contains 'xiaoyu': " + bloomFilter.contains("xiaoyu")); // expected: true

        // Shut down the Redisson client
        redisson.shutdown();
    }
}

Before a Bloom filter is used, all existing keys must be inserted into it in advance so that it can test membership through its bit array and hash functions. Whenever new data is written, its key must also be added to the Bloom filter right away; otherwise, requests for that data will be rejected as non-existent.

The following is a pseudocode example of Bloom filter cache filtering, showing how to operate the Bloom filter during initialization and data addition:

// Get the Bloom filter
RBloomFilter<String> bloomFilter = redisson.getBloomFilter("nameList");

// Set the expected number of elements and the false-positive rate
bloomFilter.tryInit(100_000_000L, 0.03);

// Insert all existing keys into the Bloom filter
void init(List<String> keys) {
    for (String key : keys) {
        bloomFilter.add(key);
    }
}

// Read a value through the Bloom filter and the cache
String get(String key) {
    // The filter reports that the key does not exist: skip the cache and storage
    if (!bloomFilter.contains(key)) {
        return "";
    }

    // Look up the value in the cache
    String cacheValue = cache.get(key);

    // Cache hit, including the cached empty marker
    if (cacheValue != null) {
        return cacheValue;
    }

    // Cache miss: read the value from storage
    String storageValue = storage.get(key);
    if (storageValue != null) {
        cache.set(key, storageValue); // cache the non-empty value
    } else {
        // False positive from the filter: cache an empty marker for 300 seconds
        cache.set(key, "");
        cache.expire(key, 300);
    }
    return storageValue;
}

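Note: a standard Bloom filter does not support deleting elements, because clearing a bit could also remove other keys that share it. To remove data, the filter must be reinitialized and rebuilt from the remaining keys.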
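One way to picture the reinitialization is a rebuild-and-swap: a fresh bit array is built from the keys that are still valid and then replaces the old one in a single step. The sketch below is an in-memory illustration under that assumption. The class `RebuildableBloomFilter` and its hashing scheme are hypothetical and are not part of Redisson.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: "deleting" from a Bloom filter by rebuilding it from
// the keys that are still valid and swapping in the new bit array.
class RebuildableBloomFilter {
    private final int size;
    private final int hashCount;
    private final AtomicReference<BitSet> bits;

    RebuildableBloomFilter(int size, int hashCount) {
        this.size = size;
        this.hashCount = hashCount;
        this.bits = new AtomicReference<>(new BitSet(size));
    }

    // Illustrative position function based on String.hashCode().
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    // Build a new bit array from the live keys and publish it in one step.
    // Readers see either the old array or the new one, never a mix.
    void rebuild(Collection<String> liveKeys) {
        BitSet fresh = new BitSet(size);
        for (String key : liveKeys) {
            for (int i = 0; i < hashCount; i++) {
                fresh.set(position(key, i));
            }
        }
        bits.set(fresh);
    }

    boolean mightContain(String key) {
        BitSet current = bits.get();
        for (int i = 0; i < hashCount; i++) {
            if (!current.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RebuildableBloomFilter filter = new RebuildableBloomFilter(1 << 16, 3);
        filter.rebuild(List.of("xiaoyu", "lin"));
        filter.rebuild(List.of("lin")); // "xiaoyu" was deleted from storage
        System.out.println("lin: " + filter.mightContain("lin"));
        System.out.println("xiaoyu: " + filter.mightContain("xiaoyu"));
    }
}
```

The cost of this approach is that every rebuild has to read all remaining keys, which is one reason Bloom filters suit data sets that are relatively fixed.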

Cache Invalidation (Breakdown)

When a large number of cache entries expire at the same time, a large number of requests can pass through the cache and reach the database simultaneously. The database then comes under heavy pressure in an instant and may even crash.

Solution — Random expiration time

To alleviate this problem, when cache entries are added in batches, the expiration times of the batch can be spread over a period of time. Giving each cache entry a different expiration time prevents all entries from expiring at once, which reduces the impact of a burst of requests on the database.

The following pseudocode shows an example:

import java.util.concurrent.ThreadLocalRandom;

String get(String key) {
    // Look up the value in the cache
    String cacheValue = cache.get(key);

    // Cache hit, including the cached empty marker
    if (cacheValue != null) {
        return cacheValue;
    }

    // Cache miss: read the value from storage
    String storageValue = storage.get(key);

    if (storageValue != null) {
        cache.set(key, storageValue);
        // Random expiration time between 300 and 600 seconds (inclusive)
        int expireTime = ThreadLocalRandom.current().nextInt(300, 601);
        cache.expire(key, expireTime);
    } else {
        // No data in storage: cache an empty marker for the default 300 seconds
        cache.set(key, "");
        cache.expire(key, 300);
    }
    return storageValue;
}

Cache Avalanche

Cache avalanche refers to a situation where a large number of requests directly flow to the backend storage layer when the cache layer fails or is overloaded, causing the storage layer to be overloaded or down. Usually, the role of the cache layer is to effectively carry and share the request traffic and protect the backend storage layer from the pressure of high concurrent requests.

However, when the cache layer cannot continue to provide service, for example because of a huge burst of concurrent traffic or poor cache design (such as access to a very large cache item, a "big key", that sharply degrades cache performance), a large number of requests are forwarded to the storage layer. The number of requests to the storage layer then rises sharply, which may overload or crash it and cause a system-level failure. This phenomenon is called "cache avalanche."

Solution

In order to effectively prevent and solve the cache avalanche problem, we can start from the following three aspects:

  1. Ensure high availability of cache layer services: Ensuring high availability of the cache layer is a key measure to avoid cache avalanche. You can use tools such as Redis Sentinel or Redis Cluster to achieve high availability of cache. Redis Sentinel provides automatic failover and monitoring functions, which can automatically promote slave nodes to new master nodes when problems occur in the master node, thereby maintaining service continuity. Redis Cluster further improves the availability and scalability of the system through data sharding and replication between nodes. In this way, even if some nodes fail, the system can still operate normally and continue to process requests.
  2. Use isolation components for rate limiting, circuit breaking, and degradation: Rate limiting and circuit breaking protect backend services from sudden bursts of requests and relieve the pressure caused by a cache avalanche. For example, components such as Sentinel (the flow-control library, not Redis Sentinel) or Hystrix can implement flow control and service degradation. Different strategies can be applied to different types of data:

Non-core data: For example, product attributes or user information in an e-commerce platform. If this data in the cache is lost, the application can directly return predefined default degradation information, null values, or error prompts instead of directly querying the backend storage. This approach can reduce the pressure on the backend storage while providing some basic feedback to users.

Core data: For example, the inventory of goods in an e-commerce platform. For these key data, you can still try to query from the cache. If the cache is missing, read it from the database. In this way, even if the cache is unavailable, the reading of core data can still be guaranteed, avoiding the loss of system functions due to cache avalanche.

  3. Advance drills and contingency plan formulation: Before the project goes online, conduct sufficient drills and tests to simulate the application and backend load after the cache layer goes down, identify potential problems and formulate corresponding contingency plans. This includes simulating cache failure, backend service overload, etc., observing system performance, and adjusting system configuration and policies based on test results. Through these drills, system weaknesses can be discovered and corresponding emergency measures can be formulated to deal with emergencies in the actual production environment. This can not only improve the robustness of the system, but also ensure that the system can quickly resume normal operation when a cache avalanche occurs.
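The circuit-breaking and degradation idea from the second measure above can be sketched with a small hand-written breaker. This is a simplified, hypothetical illustration (the class `SimpleCircuitBreaker` and its parameters are made up here, and rate limiting is not covered); a real project would normally rely on a library such as Sentinel or Hystrix.

```java
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker for illustration only.
class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long the breaker stays open
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Synchronized for simplicity; this serializes calls through the breaker.
    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        boolean open = consecutiveFailures >= failureThreshold
                && now - openedAt < openMillis;
        if (open) {
            return fallback.get();      // short-circuit: skip the failing component
        }
        try {
            T result = primary.get();
            consecutiveFailures = 0;    // a success closes the breaker
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now;         // open, or re-open after a failed trial call
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(3, 5_000);
        Supplier<String> cacheRead = () -> {
            throw new IllegalStateException("cache layer is down");
        };
        // Non-core data: degrade to a predefined default instead of hitting storage.
        for (int i = 0; i < 5; i++) {
            System.out.println(breaker.call(cacheRead, () -> "default-attributes"));
        }
    }
}
```

For non-core data the fallback returns a predefined default, as in `main`. For core data such as inventory, the fallback supplier would instead read from the database, so the value stays correct even while the cache is unavailable.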

By combining these measures, the risk of cache avalanche can be significantly reduced and the stability and performance of the system can be improved.

Summary

Web-tier caching significantly improves application performance and speeds up response time by reducing repeated data processing and database queries. As an efficient in-memory data structure storage system, Redis plays an important role in implementing the cache layer. It supports various data structures and can quickly access data, thereby reducing database burden and improving user experience.

However, caching also brings challenges such as cache penetration, cache breakdown, and cache avalanche. Cache penetration is addressed by caching empty objects and by Bloom filters: the former avoids a database access for every query of a missing key, and the latter reduces the impact of malicious requests. Cache breakdown is alleviated by random expiration times, which prevent a large number of requests from reaching the database at the same moment. For cache avalanche, the key measures are ensuring high availability of the cache layer, applying rate limiting and circuit breaking, and preparing adequate contingency plans.

Effective cache management not only improves system performance, but also enhances system stability. Understanding and solving these cache problems can ensure that the system maintains efficient and stable operation in a high-concurrency environment. Carefully designed and implemented cache strategies are the basis for optimizing application performance. Continuous attention and adjustment of these strategies can help the system cope with various challenges and maintain a good user experience.