ByteDance Interview Question: Does HTTPS Encrypt the URL?

2022.08.17
Hello everyone, my name is Xiaolin. Last night, another reader sent me material for an article.

He was interviewing at ByteDance and was asked this question: does HTTPS encrypt the URL?

The answer is yes, it is encrypted.

The URL travels inside the HTTP message: the path and query string sit in the request line, and the domain name sits in the Host header. HTTPS encrypts the entire HTTP message (header and body), so the URL is encrypted along with it.

The following figure shows the structure of an HTTP/1.1 request header; you can see that it contains the URL information.

picture

The corresponding actual HTTP/1.1 request header:

picture
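To make the mapping concrete, here is a minimal sketch of how a URL ends up inside an HTTP/1.1 request. The function name `build_http11_request` and the example URL are assumptions made for illustration, and the request is reduced to the bare minimum of headers. Everything the function returns is what HTTPS encrypts before it leaves the machine.

```python
from urllib.parse import urlsplit


def build_http11_request(url: str) -> bytes:
    """Serialize a minimal GET request for the given URL (illustrative only)."""
    parts = urlsplit(url)

    # The path and query string go into the request line.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    lines = [
        f"GET {target} HTTP/1.1",  # request line: method, path + query, version
        f"Host: {parts.netloc}",   # the domain name goes into the Host header
        "Connection: close",
        "",                        # empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_http11_request("https://example.com/search?q=tls").decode("ascii"))
```

Note that the scheme (`https`) does not appear in the request at all; in HTTP/1.1 it is implied by the connection the request is sent over.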

The first line of an HTTP/1.1 request contains the request method and path. HTTP/2 replaces the request line with a set of pseudo-header fields. HTTP/2 defines five of them in total: four for requests (":method", ":scheme", ":authority", and ":path") and one for responses (":status"). They are easy to identify because their names begin with a colon.

For example, the request method and path pseudo-header fields are as follows:

  • The ":method" pseudo-header field contains the HTTP method;
  • The ":path" pseudo-header field contains the path and query parts of the target URL;

As shown below:

picture

The picture above shows what my browser's F12 developer tools display. The browser shows the data after decryption, so don't take this to mean that the URL is sent unencrypted.
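As a rough sketch of how the same URL maps onto the HTTP/2 request pseudo-headers, consider the snippet below. The helper name `http2_pseudo_headers` and the example URL are assumptions, and a real HTTP/2 client would additionally compress these fields with HPACK before TLS encrypts them.

```python
from urllib.parse import urlsplit


def http2_pseudo_headers(method: str, url: str) -> list[tuple[str, str]]:
    """Return the four request pseudo-header fields for a URL (illustrative only)."""
    parts = urlsplit(url)

    # ":path" carries both the path and the query string of the target URL.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    return [
        (":method", method),
        (":scheme", parts.scheme),
        (":authority", parts.netloc),  # plays the role of the HTTP/1.1 Host header
        (":path", path),
    ]


if __name__ == "__main__":
    for name, value in http2_pseudo_headers("GET", "https://example.com/search?q=tls"):
        print(f"{name}: {value}")
```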

If you use a packet capture tool to capture HTTPS traffic, you will not see any of the HTTP content. As shown in the figure below, only "Application Data" is displayed, indicating that the record carries encrypted HTTP application data.

picture
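The "Application Data" label comes from the 5-byte header that precedes every TLS record, which is the only part a capture tool can read. Below is a small sketch that decodes such a header; the function name `describe_record` and the sample bytes are made up for illustration.

```python
import struct

# Content types defined for the TLS record layer.
CONTENT_TYPES = {
    20: "Change Cipher Spec",
    21: "Alert",
    22: "Handshake",
    23: "Application Data",
}


def describe_record(raw: bytes) -> str:
    """Decode the plaintext 5-byte header at the start of a TLS record."""
    if len(raw) < 5:
        raise ValueError("a TLS record header is 5 bytes long")
    # 1 byte content type, 2 bytes protocol version, 2 bytes payload length.
    content_type, major, minor, length = struct.unpack("!BBBH", raw[:5])
    name = CONTENT_TYPES.get(content_type, f"Unknown ({content_type})")
    return f"{name}, version {major}.{minor}, {length} payload bytes"


if __name__ == "__main__":
    # Hypothetical record: type 23, version 3.3 (TLS 1.2), 4 opaque payload bytes.
    sample = bytes([23, 3, 3, 0, 4]) + b"\x8f\x1c\xa2\x07"
    print(describe_record(sample))  # Application Data, version 3.3, 4 payload bytes
```

The payload after the header is opaque ciphertext, which is why the request line and headers never show up in the capture.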

Can the domain name be seen under HTTPS?

Let me ask you another question: when HTTPS is used, can an observer see the requested domain name?

From the above, we know that HTTPS has encrypted the entire HTTP Header + HTTP Body, so we cannot obtain the requested domain name from the encrypted HTTP data.

However, the domain name is visible during the TLS handshake.

For example, in the figure below, the "Client Hello" message, which is the first message of the TLS handshake, contains a Server Name field (the SNI extension) that carries the requested domain name in plaintext.

picture

So even with HTTPS, don't assume that nobody on the company network can find out which sites you are quietly visiting.
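The Client Hello observation can be reproduced without touching the network. The sketch below uses Python's standard `ssl` module with in-memory buffers to generate a Client Hello and then searches the raw bytes for the host name. The function name `client_hello_bytes` and the host `example.com` are assumptions chosen for illustration.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Generate the first TLS handshake flight in memory, without any network I/O."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()  # bytes "received" from the server (none here)
    outgoing = ssl.MemoryBIO()  # bytes the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its Client Hello and now waits
        # for a server reply that never arrives.
        pass
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_bytes("example.com")
    print("record type:", hello[0])  # 22 means Handshake
    print("host name visible in plaintext:", b"example.com" in hello)
```

Nothing has been encrypted at this point in the handshake, so anyone on the path can read the Server Name field.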

How is the integrity of HTTPS application data guaranteed?

I believe everyone is familiar with the TLS handshake protocol, and I have written related articles about it.

However, those articles did not explain how HTTPS actually encrypts HTTP data.

As a result, many readers assume that HTTP data is simply encrypted with the symmetric key negotiated during the TLS handshake and then sent directly, and they wonder whether a digest algorithm protects the integrity of that data.

In fact, TLS is implemented in two layers, the handshake protocol and the record protocol:

  • The TLS handshake protocol is what we call the TLS four-way handshake. It negotiates the encryption algorithms and generates the symmetric key that is later used to protect application data (i.e., HTTP data).
  • The TLS record protocol protects application data and verifies its integrity and origin, so it is the record protocol that encrypts HTTP data.

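In TLS 1.2 and earlier, the record protocol is mainly responsible for compressing, encrypting, and authenticating messages (HTTP data). TLS 1.3 removed compression and uses AEAD ciphers that encrypt and authenticate in a single step. The classic process is as follows: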

picture

Step by step:

  • First, the message is split into multiple shorter fragments, and each fragment is compressed separately.
  • Next, a message authentication code (MAC, computed with a keyed hash algorithm) is appended to the compressed fragment. The MAC guarantees integrity and authenticates the data, so tampering can be detected. To prevent replay attacks, the fragment's sequence number is also included when the MAC is computed.
  • Next, the compressed fragment and the MAC are encrypted together with a symmetric cipher.
  • Finally, a header consisting of the data type, version number, and length is prepended to the encrypted data, producing the final encrypted record.
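The order of operations can be sketched in a few lines of Python. This is a toy model under loud assumptions, not real TLS: `keystream_xor` is a made-up stand-in for the negotiated symmetric cipher and is not secure, the MAC covers only the fragment's sequence number and the compressed fragment (real TLS also covers the type, version, and length), and the names `protect` and `unprotect` are invented for this example.

```python
import hashlib
import hmac
import struct
import zlib

APPLICATION_DATA = 23  # record content type
VERSION = (3, 3)       # TLS 1.2 on the wire
MAC_LEN = 32           # size of an HMAC-SHA256 tag


def keystream_xor(key: bytes, seq: int, data: bytes) -> bytes:
    """Toy stand-in for a symmetric cipher (NOT secure, NOT what TLS uses)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + struct.pack("!QI", seq, counter)).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def protect(message: bytes, mac_key: bytes, enc_key: bytes, fragment_size: int = 16384) -> list[bytes]:
    records = []
    for seq, start in enumerate(range(0, len(message), fragment_size)):
        # 1. Split into fragments and compress each one separately.
        compressed = zlib.compress(message[start:start + fragment_size])
        # 2. MAC over the sequence number plus the compressed fragment.
        mac = hmac.new(mac_key, struct.pack("!Q", seq) + compressed, hashlib.sha256).digest()
        # 3. Encrypt the compressed fragment and the MAC together.
        ciphertext = keystream_xor(enc_key, seq, compressed + mac)
        # 4. Prepend the header: type, version, length of what follows.
        header = struct.pack("!BBBH", APPLICATION_DATA, *VERSION, len(ciphertext))
        records.append(header + ciphertext)
    return records


def unprotect(records: list[bytes], mac_key: bytes, enc_key: bytes) -> bytes:
    message = b""
    for seq, record in enumerate(records):
        _type, _major, _minor, length = struct.unpack("!BBBH", record[:5])
        body = keystream_xor(enc_key, seq, record[5:5 + length])
        compressed, mac = body[:-MAC_LEN], body[-MAC_LEN:]
        expected = hmac.new(mac_key, struct.pack("!Q", seq) + compressed, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            raise ValueError(f"record {seq}: MAC check failed")
        message += zlib.decompress(compressed)
    return message


if __name__ == "__main__":
    mac_key, enc_key = b"m" * 32, b"e" * 32
    data = b"GET /secret?id=1 HTTP/1.1\r\n" * 10
    records = protect(data, mac_key, enc_key, fragment_size=64)
    print(len(records), "records, round trip ok:", unprotect(records, mac_key, enc_key) == data)
```

Because the sequence number is part of the MAC input, a record that is replayed or moved to a different position fails verification, just like a record whose bytes were modified.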

After the record protocol has done its work, the final encrypted record is passed to the Transmission Control Protocol (TCP) layer for transmission.