What are the excellent designs worth learning in DNS

2022.12.02


I used to be a student too. Looking back, I find that the boys at school had very good memories. They could always remember long, complex, mysterious strings of letters and digits that turned out to be domain names. Some masters could even type an IP address directly to surf the Internet.

Every night, when you climbed over the school wall to go to the Internet cafe, you could always find them hunting for open source learning materials on a certain forum, never forgetting to wish the original poster a long and peaceful life at the bottom of the page.

It turns out that back then they were already learning the most important thing about the Internet: the spirit of open source and sharing.

Every time I think of it, I am moved.

Besides being moved, we will find several technical questions worth talking about.

For example, why can both a domain name and an IP address be used to reach a website?

What is the relationship between them?

Going deeper, we can talk about how DNS works and what is worth learning from its design.

Let's start today's topic with why we need DNS.

Why DNS

If we want to visit Baidu, we can enter the IP address 112.80.248.76 in the browser's address bar and go directly to the page.

picture

Access web pages by IP

This behavior is legal, but perverse.

Most people can't even remember their partner's phone number, so how could they remember a string of digits like an IP address?

Oh, sorry, that one hurts, brothers. You don't have a partner.

But let's assume you do.

Even though you can't remember your partner's phone number, that doesn't stop you from calling her. Isn't the process to open the address book, type "rich woman", wait for a phone number to pop up, and tap to call?

In the computer world, chances are you won't remember the IP either, so you need something that works like an address book. For example, you only need to enter www.baidu.com, and it finds the corresponding 112.80.248.76 for you and then visits it.

picture

Access with domain name

Here www.baidu.com is the domain name, and the IP address obtained through this domain name is 112.80.248.76.

Just as a person can have multiple phone numbers, a domain name can have multiple IP addresses.

The process of resolving a domain name to an IP address, that is, looking it up in the "address book", is exactly what the DNS (Domain Name System) protocol does.

Note also that the IP address above was reachable when I wrote this article, which does not mean everyone can reach it when reading. The IP address behind a domain name is likely to change. You can get the current IP address by running ping www.baidu.com.

picture

Getting the IP with ping
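
The "address book" lookup can also be done from code. The following is a minimal sketch that asks the operating system's resolver through Python's standard socket module. The function name resolve_ipv4 is my own choice for illustration, and the addresses you get back for a real domain depend on when and where you run it.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# An IP literal needs no lookup, so this line also works offline.
print(resolve_ipv4("127.0.0.1"))

# With network access, resolve_ipv4("www.baidu.com") returns whatever the
# resolver hands back at that moment, possibly several addresses.
```

Because one domain name can map to several addresses, the function returns a list rather than a single string.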

But here comes the problem.

An ordinary person's address book holds at most a thousand or so phone numbers, which a phone can store with room to spare.

Website domain names are different. It is said that in 2015 there were already more than 300 million of them.

If these 300 million records were placed on one server, there would be two problems.

• With more than 300 million domain name records, the data volume is too large, and it keeps growing.

• The server has to withstand a huge number of read requests. Each domain name may receive thousands of visits, which rounds up to hundreds of billions of qps in total.

Obviously, if DNS were a single-point service like a phone's address book, it could not deliver this capacity. It has to be a distributed system.

So the question becomes how to design a large-scale distributed system that supports hundreds of billions of qps or more.

I know someone will say: "Is this something that people running a 10 qps service should be thinking about?"

Our own service may only handle 10 qps, but that does not prevent us from learning from the excellent design of DNS.

Let's start with the hierarchy of domain names.

Hierarchy of domain names

Take a common domain name, such as www.baidu.com.

As you can see, there are two dots in this domain name, which split it into three parts.

Among them, com is called the first-level domain or top-level domain (other common top-level domains include cn and co), baidu is the second-level domain, and www is the third-level domain.

In addition, there is actually an omitted dot after com. It is called the root domain.

picture

Hierarchy of Domain Names

When there are many domain names, extracting the parts they share turns them into a tree-like hierarchy.

picture

Hierarchy

Now we can see that these domains form a hierarchy, just like schools, grades, and classes.

When you want to locate a specific domain name, you can find it by walking down these levels.

For example, everyone should still remember the slogan, "Li Xiaoming, Class 2, Grade 3, your mother brought you two cans of Wangzai milk." You find the person level by level.
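
To make the levels concrete, here is a small sketch that lists every level of a domain name from the root down. The function name domain_levels is made up for this example. The trailing dot on each entry is the root domain that is usually omitted when typing.

```python
def domain_levels(name: str) -> list[str]:
    """List the domains from the root down to the full name."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    levels = ["."]  # the root domain, i.e. the omitted trailing dot
    suffix = ""
    # Walk the labels right to left: com, then baidu, then www.
    for label in reversed(labels):
        suffix = f"{label}.{suffix}" if suffix else f"{label}."
        levels.append(suffix)
    return levels


print(domain_levels("www.baidu.com"))
# ['.', 'com.', 'baidu.com.', 'www.baidu.com.']
```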

The principle of DNS

Let's go back and see how the experts designed DNS.

The most important conclusions come first.

  • Use the hierarchical structure to split services
  • Add a multi-level cache

Let's expand on each.

Use the domain name hierarchy to split services

DNS carries very heavy traffic, so it has to be a distributed service, and the key question becomes how to split the service.

Since domain names form a tree-like hierarchy, the services that store them can naturally be split into a tree as well.

Each server maintains the information for one or more domains, so the service takes the hierarchical form shown below.

Suppose we need to visit www.baidu.com.

The query process is shown in the figure below.

picture

DNS query process

The request first hits the nearest DNS server (such as your home router). If that server does not have the answer, it asks a root domain server directly. The root domain server has no record for www.baidu.com, but it knows that the name belongs to the com domain, so it returns the IP address of the com domain server. The nearest DNS server then asks the com domain server, which in the same way points it to the server responsible for the baidu domain. The process continues downward until the record for www.baidu.com is found, and the corresponding IP address is finally returned.
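
The walk from the root downward can be simulated in a few lines. This is only a toy model: the server names ("root", "com-server", "baidu-server") and the dictionary layout are invented for illustration, and real servers are contacted over the network rather than looked up in a dict.

```python
# Toy model: each "server" either holds the final record or knows which
# server handles a sub-domain. Server names are hypothetical.
SERVERS = {
    "root": {"referrals": {"com.": "com-server"}, "records": {}},
    "com-server": {"referrals": {"baidu.com.": "baidu-server"}, "records": {}},
    "baidu-server": {
        "referrals": {},
        "records": {"www.baidu.com.": ["112.80.248.76"]},
    },
}


def resolve_from_root(name: str, max_hops: int = 10):
    """Follow referrals from the root until a server holds the record."""
    fqdn = name if name.endswith(".") else name + "."
    server_id = "root"
    visited = []
    for _ in range(max_hops):
        visited.append(server_id)
        server = SERVERS[server_id]
        if fqdn in server["records"]:
            return server["records"][fqdn], visited
        # Pick the most specific domain this server can refer us to.
        zones = [zone for zone in server["referrals"] if fqdn.endswith("." + zone)]
        if not zones:
            raise LookupError(f"{server_id} has no referral for {fqdn}")
        server_id = server["referrals"][max(zones, key=len)]
    raise LookupError("too many referrals")


ips, path = resolve_from_root("www.baidu.com")
print(ips)   # ['112.80.248.76']
print(path)  # ['root', 'com-server', 'baidu-server']
```

Each server only needs to know its own level and who handles the level below it, which is what lets the data and the traffic be split across many machines.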

The principle is relatively simple, but it raises two questions.

• How does this machine know the IP of the nearest DNS server?

• How does the nearest DNS server know the IP of the root domain?

Let's answer them one by one.

How does this machine know the IP of the nearest DNS server?

As mentioned earlier in "Just plugged in the Internet cable, how does the computer know what its IP is?", when the cable is plugged in, the machine obtains its IP address, subnet mask, router address, and DNS server IP address through the DHCP protocol.

picture

DHCP protocol

The following is a screenshot of a packet capture taken on my Mac during the second DHCP stage, the DHCP Offer. As you can see, the information returned here contains the IP of the DNS server.

picture

offer stage

You can also check the IP address of the DNS server by clicking the Apple icon in the upper left corner -> System Preferences -> Network -> Advanced -> DNS.

picture

Here is a small detail. In the packet capture above, the router address, DNS server address, and DHCP server address are all 192.168.31.1. This is the IP address of my home router. In other words, a typical home router provides all of these functions.

On a cloud server it works the same way: the DNS server is obtained through the DHCP protocol. Checking the DNS server's IP address is also easy, just execute cat /etc/resolv.conf.

picture

The nameserver lines above show two DNS servers. The machine sends requests in the order in which they appear in the file. If the first server does not respond, it asks the second one.
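
The "try them in file order" behavior can be sketched as follows. The file contents are a hypothetical sample (192.0.2.53 is a placeholder address, not from the screenshot), and fake_ask stands in for a real DNS query so that the example runs without a network.

```python
from typing import Callable

# Hypothetical file contents, for illustration only.
SAMPLE_RESOLV_CONF = """\
# sample resolv.conf
nameserver 192.168.31.1
nameserver 192.0.2.53
options timeout:2
"""


def parse_nameservers(text: str) -> list[str]:
    """Return the nameserver addresses in the order they appear."""
    servers = []
    for line in text.splitlines():
        fields = line.strip().split()
        # Comment lines start with '#' or ';', so their first field is never "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def ask_in_order(servers: list[str], ask: Callable[[str], str]) -> str:
    """Try each server in turn and move on when one times out."""
    for server in servers:
        try:
            return ask(server)
        except TimeoutError:
            continue
    raise TimeoutError("no nameserver responded")


def fake_ask(server: str) -> str:
    # Stand-in for a real DNS query: pretend the first server is down.
    if server == "192.168.31.1":
        raise TimeoutError(server)
    return f"answer from {server}"


servers = parse_nameservers(SAMPLE_RESOLV_CONF)
print(servers)                          # ['192.168.31.1', '192.0.2.53']
print(ask_in_order(servers, fake_ask))  # answer from 192.0.2.53
```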

How does the nearest DNS server know what the IP of the root domain is?

We know that the root domain is the top level of the domain name tree. Since it is the top level, it holds relatively little information. There are only 13 corresponding IPv4 addresses, and only 25 IPv6 addresses.

We can view the DNS resolution process of a domain name with the +trace option of the dig command.

picture

The 13 legendary root servers mentioned above, lettered a through m, all appear in the picture above.

But this raises another question: everything you see above is a domain name.

This...

"I wanted to find an IP through a domain name, and you ask me to find the IP of yet another domain name?"

That sounds absurd. Isn't this an endless loop?

Yes, and that is why the IPs corresponding to these root domain names are placed on every domain name server as a configuration file.

In other words, there is no need to query for the IP of a root domain name. It is read directly from the configuration.

The screenshot below shows the configuration on a domain name server.

You can see the root server starting with A, whose IPv4 address is 198.41.0.4.

picture
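
Reading such a configuration is simple. The sketch below parses a two-line sample written in the style of a typical root hints file. The exact layout of the file in the screenshot is an assumption on my part, and only the A-root address 198.41.0.4 comes from the article.

```python
# Sample in the style of a root hints file (layout assumed, not copied
# from the screenshot). ';' starts a comment.
HINTS_SAMPLE = """\
; root hints sample
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
"""


def parse_root_hints(text: str) -> dict[str, list[str]]:
    """Map each root server name to the IPv4 addresses listed for it."""
    addresses: dict[str, list[str]] = {}
    for line in text.splitlines():
        fields = line.split(";", 1)[0].split()
        # Keep only "name ttl A address" lines; NS lines just list the names.
        if len(fields) == 4 and fields[2] == "A":
            addresses.setdefault(fields[0].lower(), []).append(fields[3])
    return addresses


print(parse_root_hints(HINTS_SAMPLE))
# {'a.root-servers.net.': ['198.41.0.4']}
```

Since the addresses come from a local file, the server can start its first query without resolving anything, which is what breaks the loop.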

Add a multi-level cache

For high-concurrency scenarios with many reads and few writes, adding a cache is almost standard practice.

DNS is no exception. It adds a cache, and more than one layer of it.

When you enter a URL in your browser's address bar, the lookup goes through the browser cache, the operating system cache and /etc/hosts, and then the cache of the nearest DNS server. If none of them has the answer, the query goes on to the DNS servers for the root domain, the top-level (first-level) domain, the second-level domain, and so on.

picture

DNS query sequence after adding the cache

So the request process becomes the one shown below. I have added a small green file icon wherever one of the caches mentioned above exists, and each of those caches is queried first.

picture

DNS query process after adding the cache

Since the information in the tree above is cached, the nearest DNS server no longer needs to start from the root domain every time. For example, if the server IP for baidu.com is already in the cache, it jumps straight to the second-level domain server.

Thanks to the multi-level cache, the number of requests that actually reach each layer drops sharply. And since people visit only a handful of websites every day, most lookups hit the cache and return the IP address directly.
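
The layered lookup can be sketched with a tiny cache class. Everything here is a simplification: the class and function names are invented, the expiry time (ttl) is an assumption that the article does not discuss, and slow_lookup stands in for the full walk from the root domain.

```python
import time


class TtlCache:
    """One cache layer: entries expire after a time-to-live."""

    def __init__(self, name, clock=time.monotonic):
        self.name = name
        self._clock = clock
        self._entries = {}  # domain -> (ip, expires_at)

    def get(self, domain):
        entry = self._entries.get(domain)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[domain]  # stale, drop it
            return None
        return ip

    def put(self, domain, ip, ttl):
        self._entries[domain] = (ip, self._clock() + ttl)


def lookup(domain, caches, slow_lookup, ttl=60.0):
    """Check caches from nearest to farthest; fall back to the full lookup."""
    for index, cache in enumerate(caches):
        ip = cache.get(domain)
        if ip is not None:
            # Fill the nearer layers that missed, so the next request stops earlier.
            for nearer in caches[:index]:
                nearer.put(domain, ip, ttl)
            return ip, cache.name
    ip = slow_lookup(domain)
    for cache in caches:
        cache.put(domain, ip, ttl)
    return ip, "full lookup"


caches = [TtlCache("browser"), TtlCache("os"), TtlCache("nearest dns server")]
print(lookup("www.baidu.com", caches, lambda domain: "112.80.248.76"))
print(lookup("www.baidu.com", caches, lambda domain: "112.80.248.76"))
```

The first call falls through every layer and does the full lookup. The second call is answered by the browser layer and never leaves the machine.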

A brief summary.

In the design of DNS, services are split along a hierarchical structure, and traffic is distributed across multiple servers.

By adding a multi-level cache, the requests that actually reach each level are greatly reduced, which greatly improves the performance of the system.

These two points are excellent designs that we can borrow in our own business development.

But there is one more point that we probably can't copy. It is called anycast, and it also provides important support for DNS's high-concurrency capability. I will cover it in the next article.

Protocol format

DNS is a domain name resolution system, and the protocol that runs on it is called the DNS protocol.

Like HTTP, the DNS protocol is an application layer protocol.

picture

DNS is an application layer protocol

The following figure shows its message format.

picture

DNS message

Too many fields, feeling dizzy? That's normal.

Let's just pick a few key points.

The Transaction ID identifies a transaction. A request and its response carry the same transaction ID, similar to a log_id in a microservice system.

The Flags field holds the flag bits: 2 bytes, 16 bits in total. The ones worth attention are QR, OpCode, and RCode.

• QR marks whether the message is a query or a response: 0 is a query, 1 is a response.

• OpCode is the operation code. A normal query is 0, whether you are looking up the IP of a domain name or the domain name of an IP. You can roughly assume that 0 is all we will ever see.

• RCode is the response code, similar to status codes such as 404 and 502 in HTTP. It indicates whether the request succeeded. 0 means everything is fine, 1 means the message format is wrong, and 2 means the domain name server itself failed.
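
The three flags can be pulled out of the 16-bit Flags value with shifts and masks. This sketch assumes the standard bit layout, with QR in the highest bit, OpCode in the next four bits, and RCode in the lowest four bits. The function name parse_flags is my own.

```python
def parse_flags(flags: int) -> dict:
    """Extract QR, OpCode and RCode from the 16-bit Flags field."""
    return {
        "qr": (flags >> 15) & 0x1,      # highest bit: 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # next four bits: 0 = normal query
        "rcode": flags & 0xF,           # lowest four bits: 0 = no error
    }


print(parse_flags(0x0100))  # {'qr': 0, 'opcode': 0, 'rcode': 0}
print(parse_flags(0x8180))  # {'qr': 1, 'opcode': 0, 'rcode': 0}
print(parse_flags(0x8182))  # {'qr': 1, 'opcode': 0, 'rcode': 2}
```

The sample values are hand-picked: 0x0100 is a typical query, 0x8180 a typical successful response, and 0x8182 a response reporting a server failure.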

The Queries field holds what you are actually asking. It contains three pieces of information: Name, Type, and Class.

picture

The content of the query is divided into three parts of information

• Name holds the domain name or the IP. For example, to look up the IP for the domain name baidu.com, the domain name goes here. Conversely, to look up the domain name for an IP, the IP goes into the Name field (for IPv4, written in reverse order under in-addr.arpa).

• Type says what kind of information you want. To look up the IP address for a domain name, fill in A (address). To check whether the domain name has an alias, fill in CNAME (Canonical Name). To look up the mail server for an address such as xiaobaidebug@gmail.com (that is, for gmail.com), fill in MX (Mail Exchanger). There are many other types; the following is a table of common ones.

picture

• The Class field is more interesting. You can simply assume it will always be IN (Internet). In fact, the DNS protocol was originally designed with more application scenarios in mind, so values such as CH and HS can also go here. You don't even need to know what they mean, because over time they have become fossils. The only use of knowing about this field is to casually show off in an interview, then walk away without taking credit.

picture
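
Putting Name, Type, and Class together, a query message can be assembled by hand. This is a minimal sketch of the standard wire format: a 12-byte header followed by one question. The function names and the transaction ID 0x1234 are arbitrary choices for illustration, and the sketch does not check label lengths or handle non-ASCII names.

```python
import struct

QTYPE_A = 1    # Type A: IPv4 address
QCLASS_IN = 1  # Class IN: Internet


def encode_name(domain: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    encoded = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"  # the zero-length label is the root domain


def build_query(domain: str, transaction_id: int) -> bytes:
    """Build a query message with a single question."""
    flags = 0x0100  # QR=0 (query), OpCode=0, recursion desired
    # Header: id, flags, then four counts (questions, answers, authority, additional).
    header = struct.pack("!HHHHHH", transaction_id, flags, 1, 0, 0, 0)
    question = encode_name(domain) + struct.pack("!HH", QTYPE_A, QCLASS_IN)
    return header + question


query = build_query("www.baidu.com", transaction_id=0x1234)
print(encode_name("www.baidu.com"))  # b'\x03www\x05baidu\x03com\x00'
print(len(query))                    # 31
```

Note how the encoded name mirrors the hierarchy: one label per level, with the empty root label at the very end.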

The Answers field, as the name suggests, corresponds to Queries: one question, one answer. It carries the query result. For example, when you look up an IP address by domain name, the IP goes into this field.

picture

Packet capture

After the theory, let's capture some packets.

Open Wireshark, then execute

dig www.baidu.com

At this point, the operating system sends a DNS request asking for the IP address of www.baidu.com.

picture

DNS_Query

The figure above shows the DNS query (request). You can see that DNS is an application layer protocol and that the transport layer uses UDP. The parts marked in red in the screenshot are the message fields discussed above. The Flags field is defined bit by bit, so the capture shows it one line per flag.

Next, look at the contents of the response packet.

picture

DNS_Response

The transaction ID (Transaction ID) matches the one in the DNS request. The Answers field contains two IP addresses, and when I tried them, both could be reached normally.
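
Matching a response to its request comes down to comparing the first two bytes of each message. The sketch below parses the 12-byte header and checks the pairing. The two sample headers are hand-made to resemble the capture (one question, two answers) and are not the actual captured bytes.

```python
import struct


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    transaction_id, flags, questions, answers, _authority, _additional = struct.unpack(
        "!HHHHHH", message[:12]
    )
    return {
        "transaction_id": transaction_id,
        "is_response": bool(flags >> 15),  # QR bit
        "rcode": flags & 0xF,
        "questions": questions,
        "answers": answers,
    }


def is_reply_to(query: bytes, response: bytes) -> bool:
    """A response belongs to a query when QR is 1 and the transaction IDs match."""
    q, r = parse_header(query), parse_header(response)
    return r["is_response"] and r["transaction_id"] == q["transaction_id"]


# Hand-made sample headers: a query and a response carrying two answers.
sample_query = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
sample_response = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 2, 0, 0)

print(is_reply_to(sample_query, sample_response))  # True
print(parse_header(sample_response)["answers"])    # 2
```

Since UDP has no connections, this ID is how a client tells which of its outstanding questions an arriving answer belongs to.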

picture

picture

Summary

• DNS is an excellent high-concurrency distributed system. It splits services along a hierarchical structure and distributes traffic across multiple servers. By adding a multi-level cache, it greatly reduces the requests that actually reach each level, which greatly improves performance. Both points can be borrowed in business development.

• When the network cable is plugged in, the machine obtains the address of the DNS server through the DHCP protocol.

• The IPs of the root domain servers are loaded into every DNS server as configuration, so any DNS server can easily find the IP address of a root domain server.

Finally

I will leave you with two questions.

picture

DNS is based on the UDP protocol

• The packet capture shows DNS using UDP at the transport layer. Does it only use UDP?

• As mentioned above, DNS has only 13 IPv4 root servers, and many of them are deployed in the United States. Does that mean that if they become unhappy and cut off our access, our network will be paralyzed?

It has been a long time since I left Guangdong, and a long time since anyone called me Pretty Boy.

Could everyone call me Pretty Boy in the comment area?

Lately, more and more brothers have been calling me diao Mao in the comment area.

So emo. I don't mean anything by it. What you see in front of you is just a poor migrant worker, wandering far from home and missing his hometown.

so.