Dewu CDN domain name convergence and multi-vendor disaster recovery optimization practice


Through CDN domain name convergence, we have not only improved CDN network performance and stability, but also realized the unification and standardization of multiple domain names, greatly reducing the complexity of subsequent CDN domain name optimization and maintenance. In addition, it also supports the disaster recovery capability of the CDN primary domain name, ensuring the stability of online services.

1. Background

Too many CDN domain names fragment requests across hosts, which leads to the following problems:

Frequent TCP connection establishment and poor network request performance

The connection pool used to request CDN static resources is limited. Each domain name creates its own TCP connections and competes for pool resources, so connections are frequently closed. The next request then has to establish a new connection, which adds connection setup time (DNS resolution, TCP handshake, and TLS handshake) and increases the total request time.

Too many domain names and high daily maintenance costs

Too many domain names make domain name management, performance monitoring, performance optimization, and online changes more complex, and drive up labor and O&M costs.

For example, the Dewu IPv6 upgrade project and the TLS 1.3 upgrade project each had to run the online change process (regression testing, change application, change review, change verification, performance monitoring) multiple times, in batches by domain name.

Some domain names are not standardized and risk being taken offline

For historical reasons, there are many domain names that do not conform to the current domain name specification (such as: xxx.poizon.com, xxx.dewu.com). Domain names that are not registered under the Dewu entity risk being forcibly taken offline. For example, the old domain name offline project consumed a lot of labor for the migration.

Frequent DNS resolution and high Alibaba Cloud HttpDNS service costs

Each domain name calls Alibaba Cloud HttpDNS to resolve its IP when establishing a TCP connection, so more domain names mean more resolutions and higher HttpDNS service costs.

To solve the problems caused by too many CDN domain names, we decided to converge the CDN domain names on the client side.

2. CDN domain name convergence

2.1 Convergence thinking

Let's start with a question: how can multiple CDN domain names be converged into a unified domain name on the client side without affecting the business?

The main points to consider are:

  • No changes are needed on the server side; URLs with the various CDN domain names are still issued in the existing way.
  • The client implements the CDN domain name convergence logic without intruding into business code, so the upper business layer is unaware of it.
  • By the time the client actually sends the network request, the domain name in the request URL has been replaced with the unified domain name.
  • After receiving a request for a unified-domain URL, the CDN service can still go back to the origin normally and return the requested static resource.

Let's first look at the example diagram of the client requesting static resources through the CDN service before the CDN domain name converges:

[Figure: client requesting static resources through the CDN service before CDN domain name convergence]

The path of a static resource URL request to the origin can be roughly divided into three segments:

  • Client side: upper business layer -> client network library;
  • Public network: client network library -> CDN service;
  • CDN service side: CDN service -> Alibaba Cloud OSS origin;

The second segment (public network: client network library -> CDN service, shown in the red box above) consists of multiple one-to-one relationships. Converging multiple CDN domain names into a unified domain name means turning this segment into a single one-to-one relationship. To do that, the first segment has to become many-to-one and the third segment one-to-many. Put simply, the question is how to converge the domain names on the client side and how to dispatch back-to-origin requests to the right origin on the CDN service side.

2.2 Converging domain names on the client side

When the client-side network library is initialized, a network request interceptor is inserted in advance. It intercepts network requests and handles them uniformly at the bottom layer, replacing each domain name with the unified domain name. This avoids intruding into business-layer code and converges domain names at the bottom layer without the upper business layer being aware of it.

Taking Dewu Android as an example, the network library is OkHttp3, which supports custom interceptors, so we can add a CDN domain name convergence interceptor to every OkHttpClient object. The interceptor is added through AspectJ instrumentation, which does not intrude into business code and ensures that any OkHttpClient created later automatically gets the interceptor as well.

// AspectJ instrumentation
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.After;
import org.aspectj.lang.annotation.Aspect;

import okhttp3.Interceptor;
import okhttp3.OkHttpClient;

@Aspect
public class OkHttpAspect {

    /**
     * Weave in after okhttp3.OkHttpClient.Builder.new(..), so that clients created
     * either with new OkHttpClient() or through the builder are both covered.
     */
    @After("execution(okhttp3.OkHttpClient.Builder.new(..))")
    public void addInterceptor(JoinPoint joinPoint) {
        OkHttpClient.Builder target = (OkHttpClient.Builder) joinPoint.getTarget();
        addMergeHostInterceptor(target);
    }

    /**
     * Add the CDN domain name convergence interceptor, MergeHostInterceptor.
     */
    private void addMergeHostInterceptor(OkHttpClient.Builder builder) {
        // Avoid adding it twice: return if the interceptor is already present.
        for (Interceptor interceptor : builder.interceptors()) {
            if (interceptor instanceof MergeHostInterceptor) {
                return;
            }
        }
        builder.addInterceptor(new MergeHostInterceptor());
    }
}

When the client-side upper business layer initiates a network request for a static resource URL, the network library interceptor intercepts the request and executes the CDN domain name convergence logic: it replaces the domain name with the unified domain name and preserves the original domain name information by inserting a path prefix that represents it (each prefix maps one-to-one to an original domain name).

One-to-one mapping between original domain names and Path prefixes

| Original domain name | Path prefix |
| --- | --- |
| image.xxx.com | /image |
| product.xxx.com | /product |
| community.xxx.com | /community |

Example: generating a new URL after a request for an original URL on the image.xxx.com domain name is intercepted

First replace the image.xxx.com domain name with the unified domain name cdn.xxx.com, then insert the path prefix /image to generate the new URL.

The original URL and the new URL compare as follows:

https://image.xxx.com/xxx.jpg is replaced by https://cdn.xxx.com/image/xxx.jpg

The specific implementation code is as follows:

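import android.text.TextUtils;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Domain name convergence logic: maps the domain name to a path prefix.
 *
 * @param urlStr     original URL
 * @param sourceHost original domain name
 * @param targetHost unified domain name
 * @param pathPrefix path prefix mapped to the original domain name
 * @return the new, spliced URL
 */
public static String replaceMergeHostUrl(String urlStr, String sourceHost,
                                         String targetHost, String pathPrefix) {
    if (!TextUtils.isEmpty(urlStr)) {
        // Replace the domain name. replaceFirst takes a regex, so the host is quoted
        // to keep its dots from matching arbitrary characters.
        urlStr = urlStr.replaceFirst(Pattern.quote(sourceHost),
                Matcher.quoteReplacement(targetHost));

        int hostIndex = urlStr.indexOf(targetHost);
        if (!TextUtils.isEmpty(pathPrefix) && hostIndex >= 0) {
            // Insert the path prefix right after the unified domain name.
            StringBuilder urlStrBuilder = new StringBuilder(urlStr);
            int offset = hostIndex + targetHost.length();
            urlStrBuilder.insert(offset, pathPrefix);
            urlStr = urlStrBuilder.toString();
        }
    }
    return urlStr;
}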
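As an alternative illustration of the same rewrite, the sketch below parses the URL with java.net.URI instead of doing string replacement, so only the host component is touched and the query string is carried over unchanged. It is a minimal sketch, not the implementation used in the app: the class name MergeHostUrlSketch, the method converge, and the hard-coded mapping are assumptions made for this example.

```java
import java.net.URI;
import java.util.Map;

class MergeHostUrlSketch {
    // Hypothetical mapping from original host to path prefix, mirroring the table above.
    static final Map<String, String> PREFIX_BY_HOST = Map.of(
            "image.xxx.com", "/image",
            "product.xxx.com", "/product",
            "community.xxx.com", "/community");

    /** Returns the converged URL, or the input unchanged if the host is not in the mapping. */
    static String converge(String url, String unifiedHost) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        String prefix = host == null ? null : PREFIX_BY_HOST.get(host);
        if (prefix == null) {
            return url; // not a host we converge
        }
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(unifiedHost);
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        sb.append(prefix).append(path); // prefix goes in front of the original path
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints https://cdn.xxx.com/image/xxx.jpg
        System.out.println(converge("https://image.xxx.com/xxx.jpg", "cdn.xxx.com"));
    }
}
```

Parsing the URL first means a host name that happens to appear inside the path or query is never rewritten by accident, at the cost of rejecting strings that are not valid URIs.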

The network library requests the CDN service with the new URL. If the CDN node has no cached copy for the new URL, or the cached copy has expired, the back-to-origin dispatch logic on the CDN service side is triggered.

2.3 Dispatching back-to-origin on the CDN service side

  • Choosing a back-to-origin dispatch solution on the CDN service side

Option 1: CDN edge script redirection

Alibaba Cloud official documentation on CDN edge scripts: https://help.aliyun.com/document_detail/126588.html

Write a CDN edge script and deploy it on the CDN service nodes. The script restores the request and forwards it to the source OSS through redirection.

Option 2: Alibaba Cloud OSS mirroring-based back-to-origin

Alibaba Cloud official documentation on mirroring-based back-to-origin: https://help.aliyun.com/document_detail/409627.html

Configure mirroring-based back-to-origin rules on the unified OSS. When a static resource does not exist in the unified OSS, the resulting 404 error triggers mirroring-based back-to-origin to the source OSS. After the resource is successfully pulled from the source, a copy is stored in the unified OSS. On the next access, the unified OSS returns the stored copy directly without triggering back-to-origin again.
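The read-through behaviour described above can be modelled in a few lines. This is only an in-memory sketch of the mechanism, not OSS code: the two maps stand in for the source OSS and the unified OSS, the same key is used on both sides for simplicity, and the names MirrorBackToOriginSketch and originFetches are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class MirrorBackToOriginSketch {
    final Map<String, String> sourceOss = new HashMap<>();   // stands in for the source OSS
    final Map<String, String> unifiedOss = new HashMap<>();  // stands in for the unified OSS
    int originFetches = 0; // how many times a resource was pulled from the source

    /** Reads a key from the unified store, pulling from the source store on a miss. */
    String get(String key) {
        String stored = unifiedOss.get(key);
        if (stored != null) {
            return stored; // a copy is already stored: no back-to-origin
        }
        String fromSource = sourceOss.get(key); // the "404, then mirror" step
        if (fromSource != null) {
            originFetches++;
            unifiedOss.put(key, fromSource); // keep a copy for later requests
        }
        return fromSource;
    }

    public static void main(String[] args) {
        MirrorBackToOriginSketch oss = new MirrorBackToOriginSketch();
        oss.sourceOss.put("/a.jpg", "v1");
        oss.get("/a.jpg");
        oss.get("/a.jpg");
        System.out.println(oss.originFetches); // prints 1
    }
}
```

Note that once a copy is stored, later changes in the source are no longer seen through the unified store; this is the behaviour behind the dynamic-update challenge in section 4.1.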

[Figure: how mirroring-based back-to-origin works (from the official Alibaba Cloud documentation)]

Both options have advantages and disadvantages, as detailed in the table below:

| Back-to-origin option | Advantages | Disadvantages |
| --- | --- | --- |
| CDN edge script redirection | Implemented on the CDN service side, without relying on Alibaba Cloud OSS mirroring-based back-to-origin | Edge script development and maintenance costs are relatively high; executing the redirection logic has some performance impact; iterating and releasing edge scripts carries a greater stability risk |
| Alibaba Cloud OSS mirroring-based back-to-origin | Only configuration is required, with no development cost; mirroring can migrate resources from multiple OSS instances into a unified OSS, which also converges the OSS; mirroring happens only on the first request across the entire network, later requests return the stored copy directly, so the performance impact is negligible | Relies on the Alibaba Cloud OSS mirroring-based back-to-origin capability |

Considering the transformation cost and performance impact, we finally selected "Alibaba Cloud OSS mirroring-based back-to-origin" as the back-to-origin dispatch solution on the CDN service side.

  • Implementing Alibaba Cloud OSS mirroring-based back-to-origin

The mirroring-based back-to-origin rules configured on the unified OSS match on the path prefix and map each prefix back to the source OSS of the corresponding original domain name, so each request is mirrored back to exactly the right source OSS.

The one-to-one mapping between Path prefixes and source OSS is as follows:

| Path prefix | Source OSS |
| --- | --- |
| /image | image_oss |
| /product | product_oss |
| /community | community_oss |
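To make the routing expressed by these rules concrete, here is a small model of prefix matching. It only illustrates what the configured rules achieve; it is not how OSS evaluates them internally, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BackToOriginRouterSketch {
    // Path prefix -> source OSS, copied from the mapping table above.
    static final Map<String, String> SOURCE_BY_PREFIX = new LinkedHashMap<>();
    static {
        SOURCE_BY_PREFIX.put("/image", "image_oss");
        SOURCE_BY_PREFIX.put("/product", "product_oss");
        SOURCE_BY_PREFIX.put("/community", "community_oss");
    }

    /** Returns {sourceOss, pathInSourceOss}, or null when no prefix matches. */
    static String[] route(String unifiedPath) {
        for (Map.Entry<String, String> e : SOURCE_BY_PREFIX.entrySet()) {
            String prefix = e.getKey();
            // Match whole path segments only, so "/imagex/a.jpg" does not match "/image".
            if (unifiedPath.startsWith(prefix + "/")) {
                return new String[] {e.getValue(), unifiedPath.substring(prefix.length())};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] r = route("/community/a/b.png");
        System.out.println(r[0] + " " + r[1]); // prints community_oss /a/b.png
    }
}
```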

[Figure: example OSS mirroring-based back-to-origin configuration for the domain name community.xxx.com]

With client-side convergence and CDN-side back-to-origin dispatch in place, let's look again at how a client requests static resources through the CDN service:

[Figure: client requesting static resources through the CDN service after CDN domain name convergence]

The red box on the left shows the many-to-one transformation achieved by convergence on the client side, and the red box on the right shows the one-to-many transformation achieved by OSS mirroring-based back-to-origin on the server side. At this point the architecture has essentially achieved the goal of CDN domain name convergence. We still need to consider how to keep the feature stable while it is being rolled out and how to keep domain name convergence flexible after launch.

2.4 Stability and flexibility in the launch phase

Ensuring stability through overall grayscale control and monitoring logs

An AB experiment switch serves as the grayscale switch for the client feature. It supports ramping up by percentage and controls the grayscale ratio of the client CDN domain name convergence feature, which keeps the rollout stable.

| Client | Key | Value | Description |
| --- | --- | --- | --- |
| Android | merge_host_android | 1 = on, 0 = off (default) | AB experiment switch for CDN domain name convergence on Android |
| iOS | merge_host_ios | 1 = on, 0 = off (default) | AB experiment switch for CDN domain name convergence on iOS |

[Figure: grayscale control of the CDN domain name convergence feature on the client through the AB experiment]

Logs (with sampling support) are embedded at the key points of the CDN domain name convergence code and reported to Alibaba Cloud Log Service (SLS). Monitoring and alerts are configured on the SLS platform so that online exceptions are detected and handled promptly.

Code-level monitoring event definition for the CDN domain name convergence feature

| Field | Description | Type | Required? | Value |
| --- | --- | --- | --- | --- |
| bi_id | Business description | String | required | "mergeHost" |
| section | Key point in code execution | String | required | "init_config_data" |
| desc | Description | String | optional | "Domain name convergence data initialization" |
| host | Current host | String | optional | "cdn.xxx.com" |
| url | Current URL | String | optional | |
| originHost | Original host | String | optional | "image.xxx.com" |
| code | HTTP status code | int | optional | 5xx |
| stack | Error stack information | String | optional | |

Ensuring flexibility through per-domain convergence

The client configuration center delivers the list of domain names to be converged as configuration data. Domain names can be delivered dynamically and ramped up gradually, one domain name at a time.

{
    "mergeHostConfigs": [{ // list of CDN domain names to converge
        "host": "image.xxx.com", // original domain name
        "pathPrefix": "/image", // path prefix mapped one-to-one to the original domain name
        "rate": 1 // grayscale rate for this domain name
    }, {
        "host": "product.xxx.com",
        "pathPrefix": "/product",
        "rate": 0.5
    }, {
        "host": "community.xxx.com",
        "pathPrefix": "/community",
        "rate": 0.1
    }]
}
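The article does not say how the client turns a per-domain rate into a yes/no decision. One common approach, shown here purely as an assumption, is to hash a stable device identifier together with the host into a fixed bucket and compare the bucket against the rate, so that a device stays in the grayscale group as the rate grows. The names GrayRateSketch, bucket, and isInGray are hypothetical.

```java
class GrayRateSketch {
    /** Maps (deviceId, host) to a stable bucket in [0, 10000). */
    static int bucket(String deviceId, String host) {
        int h = (deviceId + "|" + host).hashCode();
        return Math.floorMod(h, 10_000);
    }

    /** True if this device should converge this host at the given rate (0.0 to 1.0). */
    static boolean isInGray(String deviceId, String host, double rate) {
        if (rate <= 0) {
            return false;
        }
        if (rate >= 1) {
            return true;
        }
        return bucket(deviceId, host) < Math.round(rate * 10_000);
    }

    public static void main(String[] args) {
        // The same device and host always produce the same decision for a given rate.
        System.out.println(isInGray("device-42", "product.xxx.com", 0.5));
    }
}
```

Because the bucket depends only on the device and host, raising the rate from 0.1 to 0.5 keeps every device that was already converged and only adds new ones.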

Now, let's look at the flow chart of the client requesting static resources after the CDN domain name converges:

[Figure: flow chart of the client requesting static resources after CDN domain name convergence]

Grayscale ramp-up process across the development, testing, and release stages

AB experiment ramp-up process

| Stage | Actions |
| --- | --- |
| App new version development and testing | In the T1 test environment, the AB experiment switch is turned off and whitelisted users hit the experiment. |
| After testing passes, before grayscale | In the T1 test environment, open the AB experimental group to 50% and watch for problems reported by internal colleagues who installed the test package. |
| App grayscale period | Online, open the AB experimental group and the control group to 1% each and observe online stability. |
| Official release | Gradually ramp the experimental group and the control group up through 5%, 10%, 30%, and 50%. |
| After the experimental group and the control group are each at 50% | Observe the data of the experimental group and the control group. During this period, converge and ramp up the domain names to be converged one domain name at a time. |
| After every domain name to be converged is fully converged | Ramp the experimental group up to 70%, 90%, and 100%. |

Convergence and ramp-up process for a single domain name

| Stage | Actions |
| --- | --- |
| Before ramp-up | 1) Draw up a grayscale ramp-up schedule. 2) Create a grayscale notification group so that online changes can be communicated promptly to O&M and all business teams. 3) Configure the OSS mirroring-based back-to-origin rules and ask QA engineers to verify offline that: when the AB experimental group is hit, domain name convergence takes effect, the spliced new URL is used, and resource access is normal; when the AB experiment is not hit, domain name convergence does not take effect, the original URL is used, and resource access is normal. |
| During ramp-up | 1) Operate the configuration center according to the ramp-up schedule and release the new grayscale rate. 2) Announce the online change in the grayscale ramp-up notification group so that relevant parties watch the indicators closely. 3) At the same time, QA engineers run regression verification in the online environment. 4) Observe the indicators on each monitoring platform for 1 hour. |
| After ramp-up | Continue to watch the monitoring indicators and user feedback. |

3. CDN multi-vendor disaster recovery

As an e-commerce platform, Dewu needs to provide users with stable and reliable CDN services, whether it is major promotional activities (such as 618, Qixi Festival, Double Eleven, Double Twelve, etc.) or daily services.

The Alibaba Cloud CDN service SLA only guarantees 99.9% availability, which leaves room for about 43 minutes of unavailability per month (0.1% of the 43,200 minutes in a 30-day month is 43.2 minutes).

Alibaba Cloud CDN service SLA official document: http://terms.aliyun.com/legal-agreement/terms/suit_bu1_ali_cloud/suit_bu1_ali_cloud201803050950_21147.html?spm=a2c4g.11186623.0.0.7af85fcey4BKBZ

Therefore, after converging multiple CDN domain names into a unified domain name, we decided to give the unified domain name both same-vendor and multi-vendor CDN disaster recovery.

3.1 Disaster Recovery Ideas

The main points to consider are:

  • A single primary domain name may become unavailable, so a list of domain names is delivered dynamically to the client.
  • When the primary domain name is unavailable, the client automatically downgrades for disaster recovery and selects a backup domain name for network requests.
  • When all the domain names in the list are unavailable, the original domain name from before convergence can be used as a last-resort fallback.
  • For domain names in the list that have been marked unavailable, availability can be restored automatically through availability detection.

3.2 Dynamically distribute domain name list

The client supports configurable CDN domain names and chooses the domain name for loading static resources from the domain name list delivered by the configuration center. Priority follows the order of the list: by default, the first available domain name in the list is used as the current domain name. Other domain names from the same vendor or from other vendors can also be added to the list as backup domain names. Domain name switching priority: primary domain name -> same-vendor backup domain name -> multi-vendor backup domain name.


{
    "hostListConfig": [{ // CDN domain name list
        "host": "cdn.xxx.com", // primary domain name
        "rate": 1
    },{
        "host": "cdn-ss.xxx.com", // same-vendor backup domain name
        "rate": 1 // grayscale rate for this domain name
    },{
        "host": "cdn-bak.xxx.com", // multi-vendor backup domain name
        "rate": 1 // grayscale rate for this domain name
    }],
    "reviveProbeInterval":600000, // interval between domain name revival probes, in ms
    "configRequestInterval":600000, // interval between requests to the fallback configuration API, in ms
    "configVersion":1, // configuration version number, incremented by 1 on every configuration change
    "publicIpList":["223.5.5.5","180.76.76.76"] // IPs of domestic public DNS services
}

3.3 Domain Name Automatic Disaster Recovery and Downgrade

When the client business layer initiates a network request for a static resource URL, the CDN domain name convergence interceptor in the network library executes the domain name replacement logic: it replaces the domain name of the original URL with the current domain name selected from the domain name list and then sends the request to load the static resource.

The interceptor monitors the exception callback of requests to the current domain name. If the current domain name is judged unavailable, its availability status field isDisabled is set to true (true means unavailable, false means available and is the default). The current domain name is judged unavailable when either of the following holds:

The HTTP status code is 5XX (500 <= code and code < 600).

The socket connection timed out or failed while the client's network status is normal (this rules out the client's own network as the cause).

How the client's network status is judged:

Use the InetAddress.isReachable function (Android) or the ping command (for example: ping -c 1 223.5.5.5) to check whether at least one of the domestic public DNS services (from a configured IP list, such as 223.5.5.5 and 180.76.76.76) can be reached.

If at least one can be reached, the client network is normal; if none can be reached, the client network is abnormal.


import android.text.TextUtils;

import java.io.IOException;
import java.net.InetAddress;

/**
 * Checks whether the client network is working normally.
 */
public static boolean isNetworkStatusNormal() {

    // The network is considered normal if at least one public IP in the list is reachable.
    for (int i = 0; i < publicIpList.size(); i++) {
        String ip = publicIpList.get(i);
        try {
            if (!TextUtils.isEmpty(ip)) {
                InetAddress inetAddress = InetAddress.getByName(ip);
                // Wait up to 3000 ms for this address to respond.
                boolean result = inetAddress.isReachable(3000);
                if (result) {
                    return true;
                }
            }
        } catch (IOException e) {
            DuLogger.t(TAG).e(ip + " test is bad, error e = " + e);
        }
    }
    return false;
}
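Putting the two criteria and the network check together, the availability decision can be sketched as a pure function. This is an illustrative sketch only: the class name, the method names, and the convention of passing -1 when no HTTP response was received are assumptions made for this example.

```java
class DomainHealthSketch {
    /** True when the HTTP status code is in the 5xx range. */
    static boolean isServerError(int httpCode) {
        return httpCode >= 500 && httpCode < 600;
    }

    /**
     * Decides whether the current CDN domain name should be marked unavailable.
     *
     * @param httpCode            HTTP status code, or -1 when no response was received (assumed convention)
     * @param socketFailed        true when the socket connection timed out or failed
     * @param clientNetworkNormal result of a check such as isNetworkStatusNormal()
     */
    static boolean isDomainUnavailable(int httpCode, boolean socketFailed, boolean clientNetworkNormal) {
        if (isServerError(httpCode)) {
            return true; // criterion 1: the server answered with 5xx
        }
        // Criterion 2: the connection failed although the client's own network is fine.
        return socketFailed && clientNetworkNormal;
    }

    public static void main(String[] args) {
        System.out.println(isDomainUnavailable(503, false, true));  // prints true
        System.out.println(isDomainUnavailable(-1, true, false));   // prints false
    }
}
```

Keeping the decision free of I/O makes it easy to unit test; the network check only needs to run when a socket failure has actually occurred.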

If the current CDN domain name is unavailable, the client traverses the CDN domain name list to find the next available CDN domain name and makes it the new current CDN domain name. It then replaces the domain name of the URL with the new current domain name and sends the static resource request again. This gives the client automatic domain name disaster recovery and downgrade, covering both same-vendor and multi-vendor disaster recovery.

If all the domain names in the CDN domain name list are unavailable, the fallback logic is executed: the URL is restored to the original URL from before CDN domain name convergence and the request is sent again.
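The selection and fallback rules above can be sketched as follows. This is a simplified model under stated assumptions, not the app's code: the class CdnHostSelectorSketch and its methods are hypothetical, and only the isDisabled flag comes from the article.

```java
import java.util.ArrayList;
import java.util.List;

class CdnHostSelectorSketch {
    static final class HostState {
        final String host;
        boolean isDisabled; // false (available) by default, as in the article

        HostState(String host) {
            this.host = host;
        }
    }

    private final List<HostState> hosts = new ArrayList<>();

    /** Hosts are given in priority order: primary, same-vendor backup, multi-vendor backup. */
    CdnHostSelectorSketch(List<String> orderedHosts) {
        for (String h : orderedHosts) {
            hosts.add(new HostState(h));
        }
    }

    /** Returns the first available host in list order, or fallbackHost when all are disabled. */
    synchronized String select(String fallbackHost) {
        for (HostState s : hosts) {
            if (!s.isDisabled) {
                return s.host;
            }
        }
        return fallbackHost; // e.g. the original host from before convergence
    }

    synchronized void setDisabled(String host, boolean disabled) {
        for (HostState s : hosts) {
            if (s.host.equals(host)) {
                s.isDisabled = disabled;
            }
        }
    }

    public static void main(String[] args) {
        CdnHostSelectorSketch selector = new CdnHostSelectorSketch(
                List.of("cdn.xxx.com", "cdn-ss.xxx.com", "cdn-bak.xxx.com"));
        selector.setDisabled("cdn.xxx.com", true);
        System.out.println(selector.select("image.xxx.com")); // prints cdn-ss.xxx.com
    }
}
```

When the fallback host is returned, the caller would use the original URL as-is, without the path prefix, since the original domain name serves the resource under its original path.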

[Figure: flow chart of automatic domain name disaster recovery and downgrade]

3.4 Automatic recovery of domain name availability

For a CDN domain name whose status field isDisabled has been set to true, once the domain name is unblocked or the CDN service recovers from the failure, the client needs to detect this and automatically restore the domain name's availability, so that traffic gradually switches from the backup domain name back to the primary domain name.

The specific steps for restoring domain name availability are as follows:

  • The client implements domain name availability detection logic to check whether a CDN domain name is available again.
  • After the network library intercepts a static resource URL request, if the time since the last probe exceeds the configured interval (delivered by configuration, for example 10 minutes), the domain name availability detection logic is executed asynchronously.
  • In a worker thread, traverse the CDN domain name list, take each unavailable domain name (isDisabled set to true) in turn, and substitute it for the domain name of the original URL to generate a new URL.
  • Send an HTTP HEAD request to the new URL as a probe. If the returned HTTP status code is 2XX, the CDN domain name has recovered and its isDisabled field is set to false; otherwise the domain name is skipped.
  • Continue to the next unavailable domain name until the whole CDN domain name list has been traversed.
  • A CDN domain name whose availability has been restored can be selected again, according to priority, in subsequent static resource requests.
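The steps above can be sketched as follows. The sketch is synchronous and takes the probe as a function so that it stays self-contained; in the article the probing runs asynchronously in a worker thread and the probe is an HTTP HEAD request. The names ReviveProbeSketch, probeIfDue, and headReturns2xx are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

class ReviveProbeSketch {
    private final Map<String, Boolean> disabledByHost = new LinkedHashMap<>(); // host -> isDisabled
    private final long intervalMs; // corresponds to reviveProbeInterval in the configuration
    private long lastProbeAtMs;
    private boolean probedOnce = false;

    ReviveProbeSketch(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    void setDisabled(String host, boolean disabled) {
        disabledByHost.put(host, disabled);
    }

    boolean isDisabled(String host) {
        return Boolean.TRUE.equals(disabledByHost.get(host));
    }

    /**
     * Probes every disabled host if the interval has elapsed since the last probe.
     *
     * @param nowMs          current time in milliseconds
     * @param headReturns2xx hypothetical probe: true if a HEAD request to the host returns 2XX
     * @return the number of hosts whose availability was restored
     */
    int probeIfDue(long nowMs, Predicate<String> headReturns2xx) {
        if (probedOnce && nowMs - lastProbeAtMs < intervalMs) {
            return 0; // not due yet
        }
        probedOnce = true;
        lastProbeAtMs = nowMs;
        int revived = 0;
        for (Map.Entry<String, Boolean> e : disabledByHost.entrySet()) {
            if (e.getValue() && headReturns2xx.test(e.getKey())) {
                e.setValue(false); // host recovered: mark it available again
                revived++;
            }
        }
        return revived;
    }

    public static void main(String[] args) {
        ReviveProbeSketch probe = new ReviveProbeSketch(600_000);
        probe.setDisabled("cdn.xxx.com", true);
        System.out.println(probe.probeIfDue(0, host -> true)); // prints 1
    }
}
```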

[Figure: flow chart of automatic recovery of domain name availability]

4. Challenges encountered

4.1 Scenarios where a small amount of resources are dynamically updated

  • Challenge

Static resources (pictures, videos, zip files) requested by the client through the CDN service are generally not updated in place, that is, overwritten with new content under an unchanged URL. However, some business scenarios request JSON files whose content is updated dynamically. For example, when the client configuration center platform publishes configuration data, the JSON file is overwritten.

As described above, once OSS mirroring-based back-to-origin has pulled a resource from the source OSS, the unified OSS stores a copy and returns that copy directly on later accesses, without mirroring back to the source again. Dynamically updated resources therefore need to be handled separately.

  • Solution

Sort out all business scenarios in which resource content is updated dynamically (they can be enumerated), and drive an OSS dual-write compatibility change for them. Whenever resource content needs to be updated, the source OSS and the unified OSS are updated together so that the file content in the unified OSS stays up to date.

4.2 CDN service console monitoring does not support Path dimension monitoring

  • Challenge

After the client completes CDN domain name convergence, multiple original domain names are converged into a unified domain name and distinguished only by Path prefix. Business teams expect to keep viewing performance monitoring reports per original domain name, which now means per Path, but the Alibaba Cloud CDN service console currently supports monitoring only at the domain name level. We therefore had to build Path dimension monitoring reports ourselves.

  • Solution

The client network monitoring platform supports Path dimension monitoring indicators, including the number of requests, the number of back-to-origin requests, traffic, bandwidth, and cost.

Use the Path dimension monitoring data API provided by Alibaba Cloud CDN to query data, including the number of requests and traffic.


Use the popular back-to-origin URL API provided by Alibaba Cloud CDN to query the back-to-origin counts of popular URLs, and then aggregate those counts by Path prefix.
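The aggregation step can be sketched as below. The input is assumed to be (URL path, back-to-origin count) pairs that have already been extracted from the API response; the response format itself is not modelled, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class PathDimensionStatsSketch {
    /** Returns the first path segment of a URL path, e.g. "/image/a/b.jpg" -> "/image". */
    static String prefixOf(String path) {
        int secondSlash = path.indexOf('/', 1);
        return secondSlash < 0 ? path : path.substring(0, secondSlash);
    }

    /** Sums per-URL back-to-origin counts into per-prefix totals. */
    static Map<String, Long> aggregate(Map<String, Long> countByPath) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : countByPath.entrySet()) {
            totals.merge(prefixOf(e.getKey()), e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = aggregate(Map.of(
                "/image/a.jpg", 3L,
                "/image/b.jpg", 2L,
                "/product/c.png", 5L));
        System.out.println(totals); // prints {/image=5, /product=5}
    }
}
```

Since only popular URLs are returned by such an interface, totals computed this way are a lower bound on the real per-prefix back-to-origin counts.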


[Figure: example of a Path dimension monitoring report]

4.3 How to gate newly requested CDN domain names and OSS in the future

  • Challenge

Although the existing domain names have been converged and the existing source OSS instances are covered by mirroring-based back-to-origin, we still need to make sure that no new CDN domain names are added on the Dewu App side.

  • Solution

After discussing with O&M colleagues, we added an approval step to the application process for new domain names and new OSS, which acts as the gate.

4.4 Configuration data is also delivered through a CDN domain name and may become unavailable

  • Challenge

Because configuration data is also delivered through a CDN domain name, when that domain name is unavailable the client cannot pull the latest configuration data (such as a newly configured domain name list) through the configuration center SDK. As a result, client resource request failures cannot be recovered promptly by adjusting configuration data.

  • Solution

To keep configuration data timely and reliable, we add a dedicated API for obtaining configuration data as a fallback. When the App cold-starts, or when all the domain names in the CDN domain name list are unavailable, the client asynchronously calls the dedicated API to obtain configuration data.


Dedicated API for obtaining configuration data
Endpoint: /app/config-center/module-config
Request method: POST

The configuration center SDK and the dedicated API obtain configuration data from the same source but work independently of each other, so the client has to compare the configuration data from the two channels and use the latest. To make that possible, we add a configVersion field that represents the version of the configuration data. Every time the configuration data is updated, configVersion is incremented by 1, so a larger version number means newer data. The client compares configVersion across the two channels and uses the configuration data with the larger value.


// The client obtains the latest configuration data.
private ConfigModel getConfigData() {
    ConfigModel configModel = DataSource.getConfigCenterData();
    ConfigModel apiConfigModel = DataSource.getConfigApiData();

    // Use whichever is newer: the data delivered by the configuration center
    // or the data delivered by the fallback API.
    if (configModel != null && apiConfigModel != null) {
        if (configModel.configVersion < apiConfigModel.configVersion) {
            configModel = apiConfigModel;
        }

    } else if (configModel == null && apiConfigModel != null) {
        configModel = apiConfigModel;
    }

    return configModel;
}

5. Effect after launch

Eight CDN domain names were converged into one unified primary domain name, and the CDN domain name now supports same-vendor and multi-vendor disaster recovery.

Network request performance improvement

  • Android average request time: 354ms -> 277ms, a reduction of 77ms (21%)
  • iOS average request time: 226ms -> 194ms, a reduction of 32ms (14%)

Reduced network request exception rate

After the CDN domain name convergence feature launched, the TCP connection reuse rate improved significantly, exceptions caused by DNS resolution failures and TCP connection establishment failures dropped significantly, and the exception rate fell on both platforms.

  • Android exception rate: 3.42% -> 2.56%, a decrease of 0.86 percentage points
  • iOS exception rate: 0.61% -> 0.13%, a decrease of 0.48 percentage points

Improved stability

  • The domain names are unified under dewucdn.com, which avoids the risk of another full-scale migration like the one caused by taking old domain names offline.
  • Disaster recovery capability has improved. The CDN domain name supports same-vendor and multi-vendor disaster recovery, and the SLA has increased from 99.9% to 99.99%+.

Alibaba Cloud and Tencent Cloud multi-vendor disaster recovery is supported. If a request to the Alibaba Cloud CDN service fails while the user's network is normal (HTTP code 5xx, or socket failure), the automatic retry switches to the backup domain name and requests the resource through Tencent Cloud CDN. This is what raises the SLA from 99.9% for a single vendor to 99.99%+.

HttpDNS cost reduction

Alibaba Cloud HttpDNS service costs were reduced by 24%.

6. Pitfalls encountered

During grayscale ramp-up, internal colleagues reported that some identification pictures on the identification page were blurred

Cause: the OSS mirroring-based back-to-origin rule was misconfigured, with "Back-to-origin parameters: Carry request string" checked.

With this option checked, the unified OSS carries the request parameters after the "?" when it mirrors back to the source OSS, so the source OSS returns an image that is already cropped, and the unified OSS stores that thumbnail as its copy. When the client requests again, the unified OSS treats the thumbnail as the original image and crops it a second time according to the new cropping parameters. The image returned to the client is therefore small, and it becomes blurred when the View stretches it.

Note that the closer the crop size in the very first request across the entire network is to the original image size, the smaller the impact on clarity. The smaller the crop size in the first request and the larger the crop size in the second request, the more the image is stretched and the blurrier it becomes.

Example URL for the first request on the entire network:

https://cdn.xxx.com/image-cdn/app/xxx/identify/du_android_w1160_h2062.jpeg?x-oss-process=image//resize,m_lfit,w_260,h_470

Example URL for the second request:

https://cdn.xxx.com/image-cdn/app/xxx/identify/du_android_w1160_h2062.jpeg?x-oss-process=image//resize,m_lfit,w_760,h_1500

In the example URLs, the original image is 1160 wide and 2062 high. To fit the client View (width 260, height 470), the Alibaba Cloud image cropping parameter "x-oss-process=image//resize,m_lfit,w_260,h_470" is appended. After the CDN domain name convergence grayscale is hit, the first request for the image across the entire network uses the new URL with the unified domain name. The CDN service has no cache for this URL and goes back to the unified OSS. The unified OSS triggers mirroring-based back-to-origin and carries the request parameters to the source OSS. The source OSS returns a thumbnail with a width of 260 and a height of 470 according to the cropping parameters, and the unified OSS stores that thumbnail as its copy of the original image.

In the second request, the unified OSS crops according to the second request's parameters (width 760, height 1500). However, the stored copy is smaller than the requested size (width 260 < 760, height 470 < 1500), so the client ends up receiving the thumbnail with a width of 260 and a height of 470. The client View (width 760, height 1500) stretches the thumbnail for display, which blurs the picture.

During the test phase, we only verified that images were returned and displayed normally, and did not notice the blurring caused by carrying the image parameters during mirroring-based back-to-origin.

Solution:

Turn off the grayscale configuration for image.xxx.com.

Reconfigure the OSS mirroring-based back-to-origin rules and uncheck "Back-to-origin parameters: Carry request string". Use a new Path prefix for the mapping so that mirroring is triggered again and correctly pulls the original images.

Run test verification, confirm that the image blur problem is gone, and then ramp up again.

Delete the incorrect thumbnails under /image-cdn to avoid wasting OSS storage costs.

7. Summary

Through CDN domain name convergence, we have not only improved CDN network performance and stability, but also realized the unification and standardization of multiple domain names, greatly reducing the complexity of subsequent CDN domain name optimization and maintenance. In addition, it also supports the disaster recovery capability of the CDN primary domain name, ensuring the stability of online services.