Ingress in enterprise practice: URL rewriting and advanced usage

2023.08.22

What is URL Rewriting

URL rewriting is the process of modifying or transforming the request URL on the web server. Server configuration rules alter incoming URLs to achieve behaviors such as redirection, path mapping, and parameter handling without changing the resource that is actually served. Rewriting happens at the server level: for an internal rewrite the client (such as a browser) never sees the changed URL, whereas a redirect tells the client to request a new URL. URL rewriting can be used for a variety of purposes, such as:

  1. Redirection: Rewrite a URL to another URL and return a 301 permanent or 302 temporary redirect. This can be used to change the site structure, fix bad URLs, improve SEO, and more.
  2. Path mapping: Map the path of one URL to another location, which is useful for hiding actual file paths or reorganizing paths.
  3. Query parameter processing: Add, delete, or modify query parameters in the URL to suit different application requirements.
  4. Dynamic URL to static URL: Present a dynamically generated URL (with parameters) as a static-looking URL, which is friendlier and easier to index.
  5. Hiding technical details: Hide details of the backend server or application behind rewritten URLs to improve security.

In common web servers such as Nginx and Apache, URL rewriting is implemented through regular expressions and rule matching; the exact syntax varies with the server software. The server configuration file usually has a dedicated place for rewrite rules; Nginx, for example, uses the rewrite directive. URL rewriting is a powerful technique, but it must be configured with care to avoid problems such as infinite redirect loops or incorrect rules that render a website unusable.
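
To make the idea of "regular expressions and rule matching" concrete, here is a minimal Python sketch of a rule-based rewriter. It is not how Nginx or Apache is implemented; the rule list, the paths, and the rewrite function are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical rewrite rules: (compiled pattern, replacement).
# The paths below are illustrative and do not come from a real site.
RULES = [
    # Friendly, static-looking URL -> dynamic backend URL
    (re.compile(r"^/article/(\d+)$"), r"/article.php?id=\1"),
    # Path mapping: an old location served from a new one
    (re.compile(r"^/old-docs/(.*)$"), r"/docs/\1"),
]


def rewrite(path: str) -> str:
    """Apply the first matching rule; return the path unchanged if none match."""
    for pattern, replacement in RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path, count=1)
    return path


print(rewrite("/article/42"))
print(rewrite("/old-docs/setup/intro.html"))
print(rewrite("/index.html"))
```

Applying only the first matching rule, once, is a deliberate simplification: it also sidesteps the rewrite loops mentioned above, which arise when the output of one rule keeps matching another.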

Ingress built-in variables

Built-in predefined variables can be used without declaration and usually hold the value of some part of an HTTP request or response. The following are some commonly used ones:

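Variable    Definition
$arg_PARAMETER    Value of the parameter named PARAMETER in the GET request.
$args    The query parameters of the GET request, for example foo=123&bar=blahblah. This variable can be modified.
$binary_remote_addr    Client address in binary form.
$body_bytes_sent    Number of bytes sent for the page body.
$content_length    The Content-Length field of the request header.
$content_type    The Content-Type field of the request header.
$cookie_COOKIE    Value of the cookie named COOKIE.
$document_root    Value of the root directive for the current request.
$document_uri    Same as $uri.
$host    The Host header field of the request; if it is unavailable or empty, the name of the server handling the request (the value of that server's server_name directive). The value is lowercase and does not include the port.
$hostname    Machine name, as returned by the gethostname system call.
$http_HEADER    Content of an HTTP request header. HEADER is the header name converted to lowercase with dashes replaced by underscores, for example $http_user_agent (the value of User-Agent).
    $http_user_agent: client agent information.
    $http_cookie: client cookie information.
$sent_http_HEADER    Content of an HTTP response header. HEADER is the header name converted to lowercase with dashes replaced by underscores, for example $sent_http_cache_control, $sent_http_content_type.
$is_args    "?" if $args is set, otherwise "".
$limit_rate    Setting this variable limits the connection rate.
$nginx_version    Version of the running nginx.
$query_string    Same as $args.
$remote_addr    Client IP address.
$remote_port    Client port.
$remote_user    User name authenticated by the Auth Basic module.
$request_filename    File path for the current request, generated from the root or alias directive and the request URI.
$request_body    (0.7.58+) The request body. Most meaningful in locations that use the proxy_pass or fastcgi_pass directive.
$request_body_file    Name of the temporary file holding the client request body.
$request_completion    "OK" if the request has completed; empty if the request has not completed or is not the last part of a series of requests.
$request_method    The client request method, usually GET or POST. In versions 0.8.20 and earlier, this variable is always the method of the main request; if the current request is a subrequest, the subrequest's own method is not used.
$request_uri    The original URI including the client request parameters, without the host name, for example "/foo/bar.php?arg=baz". It cannot be modified; see $uri for changing or rewriting the URI.
$scheme    Protocol in use, such as http or https, for example: rewrite ^(.+)$ $scheme://example.com$1 redirect;
$server_addr    Server address. Determining it normally takes a system call; to avoid the system call, specify the address in listen and use the bind parameter.
$server_name    Server name.
$server_port    Port on which the request reached the server.
$server_protocol    Protocol used by the request, usually HTTP/1.0 or HTTP/1.1.
$uri    The current URI of the request (without request parameters, which are in $args). It can differ from the $request_uri sent by the browser, because it can be modified by internal redirects or by the index directive. $uri does not include the host name, for example "/foo/bar.html".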
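
The naming rules in the table can be illustrated with a short Python sketch. It only imitates how the variable names and values relate to a request; it is not Nginx code, and the function names header_to_variable and split_request_uri are assumptions made for this example. Real Nginx also normalizes $uri and does not percent-decode $arg_ values, which this sketch ignores.

```python
from urllib.parse import parse_qs, urlsplit


def header_to_variable(header_name: str, prefix: str = "http") -> str:
    """Derive the variable name for a header: lowercase, dashes become underscores."""
    return "$" + prefix + "_" + header_name.lower().replace("-", "_")


def split_request_uri(request_uri: str) -> dict:
    """Approximate $uri, $args, $is_args and $arg_NAME for a raw request URI."""
    parts = urlsplit(request_uri)
    args = parts.query
    variables = {
        "$request_uri": request_uri,      # original URI, including parameters
        "$uri": parts.path,               # path only, without parameters
        "$args": args,                    # raw query string
        "$is_args": "?" if args else "",  # "?" only when there are parameters
    }
    # Simplification: parse_qs percent-decodes values, unlike Nginx.
    for name, values in parse_qs(args).items():
        variables["$arg_" + name] = values[0]
    return variables


print(header_to_variable("User-Agent"))
print(header_to_variable("Cache-Control", prefix="sent_http"))
print(split_request_uri("/foo/bar.php?arg=baz"))
```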

Ingress regular expressions

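Regular expression matching, where:
~       case-sensitive match
~*      case-insensitive match
!~ and !~*  case-sensitive non-match and case-insensitive non-match, respectively
.      matches any character except a newline
\w     matches a letter, digit, or underscore (and CJK characters when the engine runs in Unicode mode)
\s     matches any whitespace character
\d     matches a digit
\b     matches the start or end of a word
^      matches the start of the string
$      matches the end of the string
*         repeat zero or more times
+         repeat one or more times
?         repeat zero or one time
{n}       repeat n times
{n,}      repeat n or more times
{n,m}     repeat n to m times
*?        repeat any number of times, but as few as possible
+?        repeat one or more times, but as few as possible
??        repeat zero or one time, but as few as possible
{n,m}?    repeat n to m times, but as few as possible
{n,}?     repeat n or more times, but as few as possible
\W        matches any character that is not a letter, digit, or underscore
\S        matches any non-whitespace character
\D        matches any non-digit character
\B        matches a position that is not the start or end of a word
[^x]      matches any character except x
[^aeiou]  matches any character except the letters a, e, i, o, u
(exp)         matches exp and captures the text into an automatically numbered group
(?<name>exp)  matches exp and captures the text into a group named name; can also be written (?'name'exp)
(?:exp)       matches exp without capturing the text or assigning a group number
(?=exp)       matches the position before exp
(?<=exp)      matches the position after exp
(?!exp)       matches a position not followed by exp
(?<!exp)      matches a position not preceded by exp
(?#comment)   comment group; has no effect on how the regular expression is processed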
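
A few of these constructs are easy to confuse, so here is a small Python sketch that exercises them. Python's re module is used only as a convenient stand-in: it is not the regex engine Nginx uses, and it spells named groups (?P<name>exp) instead of (?<name>exp). The sample paths are made up for illustration.

```python
import re

# Greedy vs. lazy repetition: .* takes as much as it can, .*? as little as it can.
path = "/user/info/user/list"
greedy = re.match(r"/user/(.*)/", path).group(1)
lazy = re.match(r"/user/(.*?)/", path).group(1)

# Named group (Python syntax: (?P<name>exp)).
user_id = re.match(r"^/user/(?P<id>\d+)$", "/user/42").group("id")

# Lookahead: match the word only if ".php" follows, without consuming ".php".
script = re.search(r"\w+(?=\.php)", "/index.php").group(0)

# Negative lookahead: match paths that do not start with /admin.
not_admin = re.compile(r"^/(?!admin)")

# Case-insensitive matching, comparable to the ~* modifier.
case_insensitive = re.match(r"^/user", "/USER/info", re.IGNORECASE) is not None

print(greedy, lazy, user_id, script)
print(bool(not_admin.match("/user")), bool(not_admin.match("/admin/x")), case_insensitive)
```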

Configure URL Rewrite Rules

In some scenarios, the URL served by the backend differs from the path configured in the Ingress rule, while the Ingress forwards the request path to the backend unchanged. Without a URL rewriting rule, every such request returns 404. For example, suppose clients access /user/info through the Ingress, but the backend service serves /info. Without rewriting, the request is forwarded to the backend as /user/info, which does not match the /info path the backend actually provides, so 404 is returned. Let's verify this with an example.

To forward directly without configuring URL rewriting:

$ cat ingress.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo
spec:
  rules:
  - host: demo.kubesre.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: demo-svc
            port:
              number: 8080
  ingressClassName: nginx

$ kubectl apply -f ingress.yml
ingress.networking.k8s.io/demo configured

Access verification (/user/info):

# Accessing /user/info returns 404 directly
$ curl http://demo.kubesre.com/user/info
404 page not found

Configure URL rewriting:

$ cat ingress.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
  - host: demo.kubesre.com
    http:
      paths:
      - path: /user(/|$)(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: demo-svc
            port:
              number: 8080
  ingressClassName: nginx

$ kubectl apply -f ingress.yml
ingress.networking.k8s.io/demo configured

Access verification (/user/info):

# Accessing /user/info now returns a normal response
$ curl http://demo.kubesre.com/user/info
{"message":"云原生运维圈!"}

Notes:

In the example above, the nginx.ingress.kubernetes.io/rewrite-target annotation defines how the matched path is rewritten. The placeholder $2 stands for the text captured by the second parenthesized group, (.*), in the path, and that text is substituted into the rewrite-target value. The Ingress controller used here is built on Nginx, so a rewrite configured through an Ingress resource is, in essence, a change to the Nginx configuration file. The relevant part of the Nginx configuration generated by the controller is as follows:

server {
  server_name demo.kubesre.com ;

  listen 80  ;
  listen [::]:80  ;
  listen 443  ssl http2 ;
  listen [::]:443  ssl http2 ;

  set $proxy_upstream_name "-";

  ssl_certificate_by_lua_block {
   certificate.call()
  }

  location ~* "^/user(/|$)(.*)" {

   set $namespace      "default";
   rewrite "(?i)/user(/|$)(.*)" /$2 break;
   proxy_pass http://upstream_balancer;

   proxy_redirect     off;
   }
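
The effect of the path pattern and the /$2 target can be reproduced outside the cluster. The following Python sketch mirrors the location regex and rewrite line shown above; the function name rewrite_target is an assumption for this example, and Python's re module stands in for the regex engine Nginx uses.

```python
import re
from typing import Optional

# Mirrors: location ~* "^/user(/|$)(.*)"  (~* means case-insensitive)
LOCATION = re.compile(r"^/user(/|$)(.*)", re.IGNORECASE)


def rewrite_target(path: str) -> Optional[str]:
    """Return the path sent to the backend, or None if the location does not match."""
    match = LOCATION.match(path)
    if match is None:
        return None
    # rewrite-target: /$2  -> "/" followed by the second capture group
    return "/" + match.group(2)


for request_path in ["/user/info", "/user", "/user/", "/USER/Info", "/users"]:
    print(request_path, "->", rewrite_target(request_path))
```

The (/|$) group is what keeps /users from matching: after /user the pattern requires either a slash or the end of the path.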

Advanced URL Rewrite Rules

More complex rewriting requirements can be met with the following annotations, which in essence also modify the Nginx configuration file.

  • nginx.ingress.kubernetes.io/server-snippet: Adds custom configuration to the "server" block of nginx.conf.
  • nginx.ingress.kubernetes.io/configuration-snippet: Adds custom configuration to the "location" block of nginx.conf.

URL rewrite flag parameters:

  • last: Stops processing the current set of rewrite rules and starts a new search for a location that matches the rewritten URI.
  • break: Stops processing rewrite rules and continues handling the request in the current location, without a new location search.
  • redirect: Returns a temporary redirect with status code 302.
  • permanent: Returns a permanent redirect with status code 301.

A redirect automatically sends the browser to another URL:

  • 301 permanent redirection: the new URL replaces the old one. Search engines generally transfer the old URL's ranking to the new URL and drop the old URL from their index. This is the most search-engine-friendly way to handle a page whose address has changed; unless the move is temporary, a 301 is recommended.
  • 302 temporary redirection: the old URL is unaffected and the new URL does not gain ranking. Search engine crawlers fetch the new content but keep the old URL.

Configure Location:

Use the nginx.ingress.kubernetes.io/server-snippet annotation to add a location through the Ingress so that requests to /sre return a 401 status code. The example is as follows:

$ cat sre.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
       location /sre {
        return 401;
        }
  name: demo-redirect
spec:
  rules:
  - host: demo.kubesre.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: demo-svc
            port:
              number: 8080
  ingressClassName: nginx

$ kubectl apply -f sre.yml
ingress.networking.k8s.io/demo-redirect configured

Access verification:

# 401 is returned, so the verification succeeded
$ curl http://demo.kubesre.com/sre/
<html>
<head><title>401 Authorization Required</title></head>
<body>
<center><h1>401 Authorization Required</h1></center>
<hr><center>nginx</center>
</body>
</html>

URL redirection (permanent):

$ cat demo-permanent.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      rewrite ^/$ https://www.baidu.com permanent;
  name: demo-permanent
spec:
  rules:
  - host: demo.kubesre.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: demo-svc
            port:
              number: 8080
  ingressClassName: nginx

$ kubectl apply -f demo-permanent.yml
ingress.networking.k8s.io/demo-permanent created

Access verification:

# 301 permanent redirect: the browser address bar shows the target URL; the real effect can be verified by visiting the site in a browser
$ curl http://demo.kubesre.com
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>

URL redirection (redirect):

With URL redirection, a request to /test/info receives a 302 redirect to /user/info.

$ cat demo-redirect.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
       rewrite ^/test/(.*)$ /user/$1 redirect;
  name: demo-redirect
spec:
  rules:
  - host: demo.kubesre.com
    http:
      paths:
      - path: /test
        pathType: ImplementationSpecific
        backend:
          service:
            name: demo-svc
            port:
              number: 8080
  ingressClassName: nginx
  
$ kubectl apply -f demo-redirect.yml
ingress.networking.k8s.io/demo-redirect created

Access verification:

# 302 shows that the request was redirected; the actual effect can be checked in a browser
$ curl http://demo.kubesre.com/test/info
<html>
<head><title>302 Found</title></head>
<body>
<center><h1>302 Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

URL rewriting (last):

With URL rewriting, a request to /sre returns the result of /kube. This uses the last flag: after the URI is rewritten, Nginx does not send a new request but starts a new location search within the server block using the rewritten URI, and the result of the location that matches is returned.

$ cat sre.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      rewrite ^/sre /kube last;
    nginx.ingress.kubernetes.io/server-snippet: |
       location /sre {
        return 401;
        }
        location /kube {
        return 403;
        }
  name: demo-redirect
spec:
  rules:
  - host: demo.kubesre.com
    http:
      paths:
      - path: /sre
        pathType: ImplementationSpecific
        backend:
          service:
            name: demo-svc
            port:
              number: 8080
  ingressClassName: nginx

$ kubectl apply -f sre.yml
ingress.networking.k8s.io/demo-redirect configured

Access verification:

# Accessing /sre returns the /kube result, 403
$ curl http://demo.kubesre.com/sre/
<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>
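
The behavior of the last flag in this example can be modelled roughly in Python. This is a simplified sketch, not Nginx's actual algorithm: it knows only the two prefix locations from the server-snippet, ignores the location that the Ingress path itself generates, and the function names match_location and handle are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Prefix locations from the server-snippet above and the status each one returns.
LOCATIONS = {"/sre": 401, "/kube": 403}


def match_location(uri: str) -> Optional[str]:
    """Pick the longest prefix location that matches the URI (simplified)."""
    candidates = [prefix for prefix in LOCATIONS if uri.startswith(prefix)]
    return max(candidates, key=len) if candidates else None


def handle(uri: str) -> Tuple[str, Optional[int]]:
    """Apply 'rewrite ^/sre /kube last;' and then search for a location again."""
    if re.search(r"^/sre", uri):
        uri = "/kube"  # the whole URI is replaced by the rewrite target
    # 'last' means the location search restarts with the rewritten URI.
    location = match_location(uri)
    return uri, LOCATIONS.get(location) if location else None


print(handle("/sre/"))
print(handle("/kube"))
print(handle("/other"))
```

With break instead of last, the location search would not restart, so the request would keep being handled in the location where the rewrite ran.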

Summary

This article introduced the concept of URL rewriting and walked through its main uses in Ingress with practical examples. The next chapter will cover more enterprise-level Ingress practice, so stay tuned!