Essential HTTP knowledge for the front end! Reading this is enough!

2022.08.25

HTTP is a communication protocol for fetching network resources such as HTML documents and images. It is the basis of data exchange on the web and is a client-server protocol.

Origin of HTTP

HTTP was initiated by Tim Berners-Lee at CERN in 1989.

The best-known HTTP specification is RFC 2616[1], published in June 1999, which defines HTTP 1.1, a version of the protocol that is still widely used today.

What is HTTP

Full name: HyperText Transfer Protocol.

Concept: HTTP is a communication protocol for fetching network resources such as HTML documents and images. It is the basis of data exchange on the web and is a client-server protocol.

"HTTP: the Internet's multimedia courier" (from HTTP: The Definitive Guide). The role of HTTP on the Internet is that of a messenger: it runs errands and passes information between the client and the server, and the web cannot do without it. HTTP is an application-layer protocol, and it is the protocol most closely tied to front-end development. The HTTP requests, HTTP caching, cookies, cross-origin issues, and so on that we deal with every day are all closely related to HTTP.

Basic Features of HTTP

  • Extensible protocol. HTTP headers, introduced in HTTP 1.0, make the protocol easy to extend. As long as the server and client agree on the semantics of a new header, new features can be added easily.
  • HTTP is stateless, but not sessionless. There is no link between two successive HTTP requests on the same connection. By default, this means a user cannot interact coherently with a website across requests. For example, on an e-commerce site, a user adds a product to the shopping cart, switches pages, and then adds another product. The two add-to-cart requests are unrelated, so the server has no way of knowing which items the user ultimately selected. HTTP Cookies, built on HTTP's header extensibility, solve this problem: adding cookies to the headers creates a session, so that every request shares the same context and state.
  • HTTP and connections. HTTP is usually sent over TCP, or over a TLS-encrypted TCP connection, though in theory any reliable transport protocol can be used. Connections are controlled at the transport layer, which is fundamentally outside the scope of HTTP.

That is, HTTP relies on connection-oriented TCP for message passing, but HTTP itself does not require a connection. It only needs the transport to be reliable, meaning that messages are not lost (or that an error is at least reported).

HTTP/1.0 by default opens a separate TCP connection for each HTTP request/response pair. When several requests are made in succession, this is less efficient than sharing one TCP connection among them. To address this, HTTP 1.1 introduced persistent connections, in which the underlying TCP connection can be partially controlled through the Connection header. But HTTP 1.1 is still imperfect in how it uses connections, as we'll see later.

HTTP-based component system

The component system of HTTP includes clients, web servers, and proxies.

Client: user-agent

The user agent is usually a browser, but it can also be a program used by engineers and web developers to debug their applications.

Web server

The document requested by the client is served by the web server. Every request sent to the server is processed by it, and the message it returns is the response.

Proxies

Between the browser and the server, many computers and other devices relay HTTP messages. Most of them operate at the transport, network, or physical layer and are transparent to the HTTP application layer. Those that operate at the application layer are called proxies.

Proxies can perform the following functions:

  • cache
  • Filtering (like antivirus scanning, parental controls)
  • load balancing
  • Authentication (permission control for different resources)
  • log management

HTTP message composition

HTTP has two types of messages:

  • Request - Sent by the client to trigger an action on the server.
  • Response - The reply from the server side.

HTTP messages consist of multiple lines of text encoded in ASCII. In HTTP/1.1 and earlier, these messages were sent over the connection as plain, readable text. In HTTP 2.0, messages are divided into multiple HTTP frames. Developers rarely write these messages by hand: software provides them through configuration files (for proxies or servers), APIs (for browsers), or other interfaces.

Typical HTTP Session

  • Establish a connection. In a client-server protocol, the connection is established by the client. Opening a connection in HTTP means initiating a connection at the underlying transport layer, usually TCP. With TCP, the default port for an HTTP server is 80; 8000 and 8080 are also commonly used.
  • Send client request.
  • The server responds to the request.
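The three steps above can be sketched with a raw socket. This is a minimal sketch, not a production client: it starts a throwaway local server (the hypothetical `HelloHandler`) on a port chosen by the operating system, so that the example runs without touching the network.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def run_session() -> bytes:
    # Port 0 asks the OS for any free port (a real HTTP server defaults to 80).
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        # 1. The client establishes the TCP connection.
        with socket.create_connection((host, port)) as conn:
            # 2. The client sends its request.
            request = (
                "GET / HTTP/1.1\r\n"
                f"Host: {host}:{port}\r\n"
                "Connection: close\r\n"
                "\r\n"
            )
            conn.sendall(request.encode("ascii"))
            # 3. The server responds; read until it closes the connection.
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                chunks.append(data)
    finally:
        server.shutdown()
        server.server_close()
    return b"".join(chunks)


response = run_session()
print(response.split(b"\r\n", 1)[0].decode())  # the status line
```

In a browser, all of this happens behind `fetch` or a page navigation; the sketch only makes the steps visible.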

HTTP requests and responses

Both HTTP requests and responses include a start line, HTTP Headers, an empty line, and a body part, as shown in the following figure:

picture

  • Start line. For a request, the start line contains the request method, the request path, and the HTTP version. For a response, it contains the HTTP version, the status code, and the status text.

The following is a detailed description of the request Path. The request path (Path) has the following types:

1) An absolute path, optionally followed by a '?' and a query string. This is the most common form, called the origin form, and is used with the GET, POST, HEAD, and OPTIONS methods.

POST / HTTP/1.1
GET /background.png HTTP/1.0
HEAD /test.html?query=alibaba HTTP/1.1
OPTIONS /anypage.html HTTP/1.0

2) A full URL, known as the absolute form. It is mainly used with the GET method when connecting to a proxy.

GET http://developer.mozilla.org/en-US/docs/Web/HTTP/Messages HTTP/1.1

3) The authority component of the URL consisting of the domain name and optional port (prefixed with ':') is called the authority form. Only used when using CONNECT to establish an HTTP tunnel.

CONNECT developer.mozilla.org:80 HTTP/1.1

4) Asterisk form, a simple asterisk ('*'), used with the OPTIONS method, represents the entire server.

OPTIONS * HTTP/1.1
  • Headers. Request headers or response headers; see the Headers section below for details. Each header is a case-insensitive name followed by a colon (':') and a value whose structure depends on the header.
  • Blank line. Many people tend to ignore it.
  • Body.
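The four parts listed above can be pulled out of a raw request with a few lines of code. This is a minimal sketch, assuming a well-formed HTTP/1.x message; the function name `parse_request` and the sample request are made up for illustration.

```python
def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.x request into start line, headers, and body."""
    # The blank line (CRLF CRLF) marks the end of the headers.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Start line: method, request path, HTTP version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


raw = (b"POST /submit?from=home HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
msg = parse_request(raw)
print(msg["method"], msg["target"], msg["version"])
print(msg["headers"]["content-length"], msg["body"])
```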

Request body: Some requests send data to the server in order to update it; a common case is a POST request carrying HTML form data. Request bodies generally fall into two categories. One is a single-resource body, a single file defined by Content-Type and Content-Length. The other is a multiple-resource body made up of several parts, usually associated with HTML forms. The difference between the two shows up in the value of Content-Type.

1) Content-Type - application/x-www-form-urlencoded. Form content in application/x-www-form-urlencoded format has the following characteristics:

I. The data is encoded as key=value pairs separated by &.

II. Keys and values are URL-encoded (percent-encoded); the = and & separators themselves are not.

// Conversion: {a: 1, b: 2} -> a=1&b=2
// A reserved character inside a value is escaped: {q: "a&b"} -> q=a%26b
a=1&b=2
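To see the encoding rules in action, here is a small sketch using Python's standard `urllib.parse` module. The field names and values are invented for the example.

```python
from urllib.parse import parse_qs, urlencode

# Plain values need no escaping.
simple = urlencode({"a": 1, "b": 2})

# Reserved characters inside a value are percent-encoded, and spaces become "+",
# so they cannot be confused with the "&" and "=" separators.
tricky = urlencode({"q": "a&b=c", "name": "Tom Lee"})

print(simple)            # a=1&b=2
print(tricky)            # q=a%26b%3Dc&name=Tom+Lee
print(parse_qs(tricky))  # the server-side decoding step
```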

2) Content-Type - multipart/form-data.

The Content-Type field in the request header will contain boundary, and the value of boundary is specified by the browser by default. Example: Content-Type: multipart/form-data;boundary=----WebkitFormBoundaryRRJKeWfHPGrS4LKe.

The data is divided into multiple parts, separated by a delimiter line made of two hyphens followed by the boundary. Each part has its own HTTP headers describing it, such as Content-Disposition and Content-Type, followed by a blank line and then the part's content. The last delimiter has an extra -- appended to mark the end.

------WebkitFormBoundaryRRJKeWfHPGrS4LKe
Content-Disposition: form-data; name="data1"
Content-Type: text/plain

data1
------WebkitFormBoundaryRRJKeWfHPGrS4LKe
Content-Disposition: form-data; name="data2"
Content-Type: text/plain

data2
------WebkitFormBoundaryRRJKeWfHPGrS4LKe--
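For readers who want to see how such a body is put together, the following is a minimal sketch that handles simple text fields only (no file uploads, no escaping of field names). The helper `build_multipart` is hypothetical; browsers do this work for you when a form is submitted.

```python
def build_multipart(fields: dict, boundary: str) -> bytes:
    """Assemble a multipart/form-data body for simple text fields."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")  # delimiter: two hyphens + boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("Content-Type: text/plain")
        lines.append("")               # blank line between part headers and content
        lines.append(value)
    lines.append(f"--{boundary}--")    # closing delimiter has a trailing "--"
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")


boundary = "----WebkitFormBoundaryRRJKeWfHPGrS4LKe"
body = build_multipart({"data1": "data1", "data2": "data2"}, boundary)
content_type = f"multipart/form-data; boundary={boundary}"
print(content_type)
print(body.decode())
```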

Response Body section:

1) A single file of known length. This type of body is defined by two headers: Content-Type and Content-Length.

2) A single file of unknown length, sent in chunks by setting Transfer-Encoding to chunked.

Content-Length, a very important header introduced in HTTP 1.0, comes up again in the version history below.
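In a chunked body, each chunk is preceded by a line giving its length in hexadecimal, and a chunk of size 0 ends the body. The decoder below is a minimal sketch of that format; it ignores chunk extensions and trailers, and the function name `decode_chunked` is made up for the example.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked body (chunk extensions and trailers are ignored)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is a hexadecimal number, e.g. "5" or "1a".
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return body  # a zero-length chunk marks the end
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


encoded = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(encoded))
```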

Methods

Safe methods: HTTP defines a set of methods called safe methods. Both GET and HEAD are considered safe, which means that a GET or HEAD request is not expected to change anything on the server. That does not guarantee that no action takes place; in practice, it is up to the web developer to honor this.

  • GET: Request the server to send a resource.
  • HEAD: Similar to the GET method, but the server only returns the header in the response. The body part of the entity is not returned.
  • PUT: Writes a document to the server. Semantics: use the body of the request to create (or replace) the document named by the request URL.
  • POST: Used to send data to the server, typically form data. (POST submits data to the server for processing, whereas PUT stores data in a resource, such as a file, on the server.)
  • TRACE: Mainly used for diagnostics. It performs a loop-back test of the message along the path to the target resource, providing a useful debugging mechanism.
  • OPTIONS: Asks the web server which capabilities it supports. The client can ask which methods the server supports in general, or which methods are supported for a particular resource.
  • DELETE: Requests the server to delete the resource specified in the request URL.

Difference between GET and POST

The first things to understand are side effects and idempotency. A side effect is a modification of a server-side resource. Idempotency means that sending the same request M times or N times (M and N different, both at least 1) leaves the server's resources in the same state. In typical use, GET has no side effects and is idempotent, while POST usually has side effects and is not idempotent.

Technically there are the following distinctions:

  • Cache: GET requests can be cached; POST requests generally are not.
  • Security: GET parameters are part of the URL, so they are saved in the browser history and are easier to expose than POST data, which is carried in the request body. Neither is encrypted unless HTTPS is used.
  • Length limit: URLs have a length limit, imposed by the browser, which constrains GET requests.
  • Encoding: GET parameters must be URL-encoded and can only contain ASCII characters, while POST has no such restriction: it supports more encodings and does not limit the data type.
  • From the TCP point of view, a GET request is usually sent in one go, while some clients send a POST in two steps: the headers first, and then the body once the server has responded with 100 (Continue). (Firefox is reported to send POST requests in a single TCP packet.) This is client implementation behavior rather than something the HTTP specification requires.

Status codes

  • 100~199 - informational status code

101 Switching Protocols. When the HTTP is upgraded to WebSocket, if the server agrees to the change, the status code 101 will be sent.

  • 200~299 - success status code

200 OK, indicating that the request from the client was correctly processed on the server.

204 No content, indicating that the request was successful, but the response message did not contain the body part of the entity.

205 Reset Content, indicating that the request is successful, but the response message does not contain the body part of the entity, but it is different from the 204 response in that the requester is required to reset the content.

206 Partial Content, indicating that the server is fulfilling a range request and the body contains only the requested part of the resource.

  • 300~399 - redirection status code

301 Moved Permanently, a permanent redirect, indicating that the resource has been assigned a new URL.

302 Found, a temporary redirect, indicating that the resource has temporarily been assigned a new URL.

303 See Other, indicating that the resource is available at another URL and that the GET method should be used to fetch it.

304 Not Modified, indicating that the client sent a conditional request and the resource has not changed since the version the client holds, so the cached copy can be used. The response carries no body.

307 Temporary Redirect, a temporary redirect similar to 302, except that the client is expected to keep the request method unchanged when requesting the new address.

  • 400~499 - client error status code

400 Bad Request, indicating that there is a syntax error in the request message.

401 Unauthorized, indicating that the request requires HTTP authentication information.

403 Forbidden, indicating that access to the requested resource was denied by the server.

404 Not Found, indicating that the requested resource was not found on the server.

  • 500~599 - server error status code

500 Internal Server Error, indicating that an error occurred on the server while executing the request.

501 Not Implemented, indicating that the server does not support a function required by the current request.

503 Service Unavailable, indicating that the server is temporarily overloaded or down for maintenance and cannot process requests.
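Because the first digit of a status code determines its class, client code can branch on the class before it looks at the exact code. A minimal sketch (the function name `status_class` is made up for the example):

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to the class given by its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return classes[code // 100]


for code in (101, 204, 304, 404, 503):
    print(code, status_class(code))
```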

HTTP Headers

1. General headers apply to both request and response messages, but have nothing to do with the data transmitted in the message body. Example: Date.

2. Request headers contain more information about the resource to be fetched or about the client itself. Example: User-Agent.

3. Response headers contain supplementary information about the response. Example: Accept-Ranges.

4. Entity headers contain more information about the entity body, such as its length (Content-Length) or its MIME type (Content-Type).

For detailed headers, see HTTP Headers Collection [2].

The past and present of HTTP

HTTP (HyperText Transfer Protocol) is the foundational protocol of the World Wide Web. Dr. Tim Berners-Lee and his team created it between 1989 and 1991, together with the first web browser and server.

HTTP 0.9 was released in 1991, version 1.0 in 1996, and version 1.1 in 1997; version 1.1 is the most widely used version to date. Version 2.0 was released in 2015 and greatly improved on the performance and security of HTTP/1.1. Version 3.0, proposed in 2018, continued to optimize HTTP/2 and, more radically, replaced TCP with UDP (through the QUIC transport). Chrome, Firefox, and Cloudflare announced support for HTTP/3 on September 26, 2019.

HTTP 0.9

One-line protocol, the request consists of a single-line instruction. Begins with the only method available, GET. This is followed by the path to the target resource.

GET /mypage.html

Response: Only include the response document itself.

<HTML>
A very simple HTML page
</HTML>
  • No response headers; only HTML files are transmitted
  • No status codes

HTTP 1.0

RFC 1945[3] describes HTTP 1.0, which was designed to be more extensible.

  • Protocol version information is sent with each request.
  • A status code is sent with the response.
  • The concept of HTTP headers was introduced, for both requests and responses, allowing metadata to be transmitted and making the protocol flexible and extensible.
  • With the Content-Type header, documents other than plain HTML files can be transmitted. In the response, the Content-Type header tells the client the type of the content actually returned.

A media type is a standard way to indicate the nature and format of a document, file, or byte stream. Browsers typically use the MIME (Multipurpose Internet Mail Extensions) type to decide how to handle a URL, so it is important that web servers send the correct MIME type in the response headers. If it is misconfigured, the website may not work properly. The structure of a MIME type is very simple: a type and a subtype, two strings separated by '/'.

HTTP borrows part of the MIME standard to label the data type of the message body. The type appears in the Content-Type field. That is the sender's side; a receiver that wants a specific type of data can say so with the Accept field.

The values of these two fields fall into the following categories:

- text: text/html, text/plain, text/css, etc.
- image: image/gif, image/jpeg, image/png, etc.
- audio/video: audio/mpeg, video/mp4, etc.
- application: application/json, application/javascript, application/pdf, application/octet-stream

At the same time, in order to agree on the compression method, supported language, character set, etc. of the requested data and response data, the following headers are also proposed.

1. Compression method. Sender: Content-Encoding (the server tells the client how the entity body is encoded). Receiver: Accept-Encoding (the encodings the user agent supports). Possible values include gzip, a popular compression format; deflate, another well-known compression format; and br, a compression algorithm designed specifically for HTTP.

2. Supported languages. Sender: Content-Language. Receiver: Accept-Language (the set of natural languages the user agent supports).

3. Character set. Sender: specified by the charset attribute in Content-Type. Receiver: Accept-Charset (the character sets the user agent supports).

// Sender
Content-Encoding: gzip
Content-Language: zh-CN, zh, en
Content-Type: text/html; charset=utf-8

// Receiver
Accept-Encoding: gzip
Accept-Language: zh-CN, zh, en
Accept-Charset: utf-8
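A Content-Type value packs the type, the subtype, and optional parameters such as charset into one string. As a rough sketch, Python's standard `email.message.Message` class (HTTP headers share the MIME syntax) can take it apart; the wrapper `parse_content_type` is a made-up helper for this example.

```python
from email.message import Message


def parse_content_type(value: str):
    """Return (type, subtype, charset) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return (msg.get_content_maintype(),
            msg.get_content_subtype(),
            msg.get_content_charset())  # None when no charset is given


print(parse_content_type("text/html; charset=utf-8"))
print(parse_content_type("application/json"))
```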

Although HTTP 1.0 has improved a lot on the basis of HTTP 0.9, there are still many shortcomings.

The main disadvantage of HTTP/1.0 is that only one request can be sent per TCP connection. After sending the data, the connection is closed. If you want to request other resources, you must create a new connection. The cost of establishing a TCP connection is high because it requires a three-way handshake between the client and the server, and the sending rate is slow at the beginning (slow start).

The earliest model of HTTP, and the default model of HTTP/1.0, is the short-lived connection. Each HTTP request is completed on its own connection, which means that every HTTP request is preceded by a TCP handshake, and these happen one after another.

HTTP 1.1

HTTP/1.1 was published as RFC 2068[4] in January 1997.

HTTP 1.1 removed a lot of ambiguity and introduced several technologies:

  • Connections can be reused. Persistent connections (Connection: keep-alive): HTTP 1.1 supports persistent connections, which carry multiple HTTP requests and responses over one TCP connection and so reduce the cost and latency of opening and closing connections. Connection: keep-alive is the default in HTTP 1.1, which goes some way toward fixing HTTP 1.0's need to create a connection for every request.
  • Pipelining (HTTP pipelining) was added, allowing a second request to be sent before the response to the first has been fully received, to reduce communication latency. While a TCP connection is being reused this way, even if several requests are sent at once through the pipeline, the server still returns responses in request order, and the client cannot receive a later response until all earlier ones have arrived. Later requests are therefore blocked (queued) behind earlier ones; this is called "head-of-line blocking".
  • Chunked responses are supported through chunked transfer encoding (Transfer-Encoding: chunked). Content-Length declares the length of the response body. A keep-alive connection can carry several responses in succession, so Content-Length is what lets the receiver tell which response the data belongs to. Using Content-Length requires the server to know the length of the response before sending it. For time-consuming dynamic operations, this means the server cannot send anything until the whole operation is complete, which is inefficient. A better approach is to send each piece of data as soon as it is produced, that is, to stream rather than buffer. HTTP 1.1 therefore allows chunked transfer encoding to be used instead of Content-Length. When a request or response has the Transfer-Encoding: chunked header, its body consists of an unspecified number of chunks. Each chunk is preceded by a line containing a hexadecimal number giving the chunk's length; a final chunk of size 0 signals that the response is complete.
  • Additional cache control mechanisms. HTTP 1.0 mainly used If-Modified-Since and Expires in the headers as cache criteria; HTTP 1.1 introduced more cache control strategies, such as entity tags (ETag), If-None-Match, Cache-Control, and other optional cache headers.
  • Host header. Different domain names can be served from the same IP address. Host is a request header new in HTTP 1.1, used mainly to implement virtual hosting.

Virtual hosting, also called shared web hosting, uses virtualization technology to divide one physical server into several hosts, so that multiple websites or services can run on a single machine.

For example, there is a server with an IP address of 61.135.169.125, and the websites of Google, Baidu and Taobao are deployed on this server. Why do we see Google's homepage instead of Baidu or Taobao's homepage when we visit https://www.google.com? The reason is that the Host request header determines which virtual host to access.
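The Host-based choice described above amounts to a lookup table. The sketch below is illustrative only: the hostnames for Baidu and Taobao and the site labels are assumptions, and a real server would do this in its configuration rather than in application code.

```python
# Hypothetical table: every hostname below resolves to the same IP address.
VIRTUAL_HOSTS = {
    "www.google.com": "Google homepage",
    "www.baidu.com": "Baidu homepage",
    "www.taobao.com": "Taobao homepage",
}


def select_site(headers: dict) -> str:
    """Pick the virtual host from the Host request header."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    host = headers.get("Host", "").split(":")[0].lower()
    return VIRTUAL_HOSTS.get(host, "default site")


print(select_site({"Host": "www.google.com"}))
print(select_site({"Host": "WWW.BAIDU.COM:80"}))
print(select_site({}))
```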

HTTP 2.0

HTTP 2.0 was published in 2015 as RFC 7540[5].

  • HTTP/2 is a binary protocol rather than a text protocol. Let's look at a few concepts first:

Frame: The client and server communicate by exchanging frames, which are the smallest unit of communication based on this new protocol.

Message: A logical HTTP message, such as a request, a response, etc., consisting of one or more frames.

Stream: A stream is a virtual channel in a connection that can carry messages in both directions; each stream has a unique integer identifier.

Framing in HTTP 2.0 breaks HTTP/1.x messages into frames and embeds them in a stream. Data frames and header frames are separated, which makes header compression possible. Combining multiple streams, a process called multiplexing, allows more efficient use of the underlying TCP connection.

In other words, streams carry messages, and each message consists of one or more frames; a frame is the unit of data within a stream. The binary format further improves transmission performance.

HTTP frames are transparent to web developers. In HTTP/2, framing is an extra step between HTTP/1.1 and the underlying transport protocol. Web developers do not need to change the APIs they use in order to benefit from HTTP frames; HTTP/2 is switched on and used whenever both the browser and the server support it.

  • This is a multiplexing protocol. Parallel requests can be processed on the same connection, removing the ordering and blocking constraints of HTTP/1.x. Multiplexing allows multiple request-response messages to be issued simultaneously over a single HTTP/2 connection.

As mentioned earlier, although HTTP 1.1 has persistent connections and pipelining, head-of-line blocking still occurs. HTTP 2.0 solves this problem: its new binary framing layer removes these limitations and provides full request and response multiplexing. The client and server break HTTP messages into independent frames, interleave them on the wire, and reassemble them at the other end.

picture

As shown in the figure above, a snapshot captures multiple data streams in parallel within the same connection. The client is transmitting a DATA frame (stream 5) to the server, and at the same time the server is interleaving a series of frames of stream 1 and stream 3 to the client. Therefore, there are three parallel data streams on one connection at the same time.

Breaking HTTP messages into separate frames, interleaving them, and reassembling them at the other end is one of the most important enhancements in HTTP 2. This mechanism triggers a chain reaction through the whole web technology stack and brings a large performance improvement, allowing us to:

1. Send multiple requests in parallel, interleaved, without them affecting each other.
2. Send multiple responses in parallel, interleaved, without them interfering with each other.
3. Use a single connection to send multiple requests and responses in parallel.
4. Eliminate unnecessary latency and improve utilization of available network capacity, reducing page load times.
5. Stop working around HTTP/1.x limitations (for example with image sprites).

Connection sharing: every request shares the same connection. Each request has an id, so many requests can be in flight on one connection and their frames can be mixed together in any order; the receiver assigns each frame back to its request by id.
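The idea of splitting, interleaving, and reassembling by id can be simulated in a few lines. This is a toy model, not the real wire format: a "frame" here is just a `(stream_id, payload)` tuple, whereas real HTTP/2 frames carry a binary header with length, type, and flags. All names and payloads are made up.

```python
from collections import defaultdict
from itertools import zip_longest


def to_frames(stream_id: int, message: bytes, size: int) -> list:
    """Split one message into (stream_id, payload) frames of at most `size` bytes."""
    return [(stream_id, message[i:i + size]) for i in range(0, len(message), size)]


def interleave(*frame_lists) -> list:
    """Mix the frames of several streams onto one connection, round-robin."""
    return [frame
            for group in zip_longest(*frame_lists)
            for frame in group
            if frame is not None]


def reassemble(frames: list) -> dict:
    """Rebuild each message on the receiving side by grouping frames by stream id."""
    streams = defaultdict(bytes)
    for stream_id, payload in frames:
        streams[stream_id] += payload
    return dict(streams)


messages = {1: b"<html>page</html>", 3: b"body{color:red}", 5: b"console.log(1)"}
wire = interleave(*(to_frames(sid, msg, 4) for sid, msg in messages.items()))
print(wire[:4])
print(reassemble(wire) == messages)
```

Frames of one stream stay in order relative to each other, which is why a simple append per stream id is enough to rebuild the message.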

For the comparison between HTTP 1.1 and HTTP 2.0, you can refer to this website demo [6].

The HTTP 1.1 demo is as follows:

picture

The HTTP2.0 demo is as follows:

picture

picture

  • Compressed headers. HTTP 1.x headers carry a lot of information and are resent with every request, which costs performance. To reduce this overhead, HTTP/2 compresses request and response header metadata using the HPACK compression format, which employs two simple but powerful techniques. First, header fields are encoded with a static Huffman code, which reduces the size of each transmission. Second, both the client and the server maintain and update an indexed list of previously seen header fields (in other words, a shared compression context), and this list is then used as a reference to encode previously transmitted values efficiently.
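The indexed-list idea can be illustrated with a toy encoder. To be clear about the assumptions: this is not HPACK itself (there is no static table, no Huffman coding, no table size limit, and no binary format); it only shows how a list kept in sync on both sides lets a repeated header be sent as a small index. The class name `HeaderTable` is invented for the example.

```python
class HeaderTable:
    """Toy shared table of previously seen (name, value) header pairs."""

    def __init__(self):
        self.entries = []

    def encode(self, headers: list) -> list:
        out = []
        for pair in headers:
            if pair in self.entries:
                out.append(self.entries.index(pair))  # seen before: send only the index
            else:
                self.entries.append(pair)
                out.append(pair)                      # first time: send the literal pair
        return out

    def decode(self, encoded: list) -> list:
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self.entries[item])    # look the pair up by index
            else:
                self.entries.append(item)             # learn the new pair
                headers.append(item)
        return headers


client, server = HeaderTable(), HeaderTable()
request = [(":method", "GET"), ("user-agent", "demo-browser/1.0"), ("accept", "*/*")]

first = client.encode(request)
second = client.encode(request)  # same headers again on the next request
print(first)
print(second)                    # only small integers this time
print(server.decode(first) == request, server.decode(second) == request)
```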

  • Server push. A mechanism called server push lets the server populate the client's cache ahead of time: the server can push resources to the client without an explicit request for them, which reduces request latency. For example, the server can proactively push JS and CSS files to the client instead of waiting for the client to request them once HTML parsing reaches those resources. The general process is shown in the following figure:

picture

How to upgrade your HTTP version

Moving from HTTP/1.1 to HTTP/2 is transparent to sites and applications. An up-to-date server talking to a recent browser is enough. Only a small group of people needs to make changes, and as older browsers and servers are updated, usage grows naturally without web developers having to do anything.

HTTPS

HTTPS also transmits information through the HTTP protocol, but uses the TLS protocol for encryption.

Symmetric and asymmetric encryption

Symmetric encryption means that both sides hold the same secret key and both know how to encrypt and decrypt with it. However, all the data travels over the network, so if the secret key itself is sent over the network and intercepted, the encryption becomes pointless.

Asymmetric encryption

As we know, a public key can be used to encrypt data, but decrypting it requires the private key, which is held only by the party that issued the public key. First, the server publishes its public key, so the client knows it. The client then creates a secret key, encrypts it with the public key, and sends it to the server. On receiving the ciphertext, the server decrypts it with its private key and recovers the secret key.

TLS handshake process

The TLS handshake uses asymmetric encryption to agree on a key; the data that follows is then encrypted symmetrically with that key. The steps below describe the classic RSA-based key exchange.

  • Client Hello: The client sends a random value (Random1) together with the protocol versions and cipher suites it supports.
  • Server Hello and Certificate: The server receives the client's random value, generates a random value of its own (Random2), chooses the protocol and cipher suite from those the client offered, and sends its certificate. If the server needs to verify the client's certificate, it says so at this point.
  • Certificate Verify: The client receives the server's certificate and checks that it is valid. Once the check passes, the client generates another random value (Random3), encrypts it with the public key from the server's certificate, and sends it to the server. If the server asked for a client certificate, the client sends it along.
  • Server generates secret: The server receives the encrypted random value and decrypts it with its private key to obtain the third random value (Random3). Both ends now hold all three random values and can use the agreed algorithm to derive the same key from them. Subsequent communication is encrypted and decrypted with this key.
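The last step, in which both ends derive the same key from the three random values, can be sketched as follows. This is an illustration under loud assumptions: `derive_session_key` is a made-up function that uses HMAC-SHA256 as a stand-in, whereas real TLS defines its own key derivation function, and the public-key encryption of Random3 in transit is left out entirely.

```python
import hashlib
import hmac
import secrets


def derive_session_key(random1: bytes, random2: bytes, random3: bytes) -> bytes:
    """Stand-in key derivation: NOT the real TLS algorithm, illustration only."""
    return hmac.new(random3, random1 + random2, hashlib.sha256).digest()


random1 = secrets.token_bytes(32)  # sent in the clear in Client Hello
random2 = secrets.token_bytes(32)  # sent in the clear in Server Hello
random3 = secrets.token_bytes(48)  # sent encrypted with the server's public key

# Each side runs the same derivation on the same three inputs.
client_key = derive_session_key(random1, random2, random3)
server_key = derive_session_key(random1, random2, random3)
print(client_key == server_key)

# An eavesdropper sees Random1 and Random2 but not Random3, so a guess fails.
attacker_key = derive_session_key(random1, random2, secrets.token_bytes(48))
print(attacker_key == client_key)
```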

HTTP cache

Strong cache

Strong caching is mainly determined by the Cache-control and Expires headers.

Whether the cache is still valid is determined by comparing the value of Expires with the current time (the Date header). Expires is a response header field set by the web server; it tells the browser that, until the expiration time, it can take the data straight from its own cache without requesting it again. One drawback of Expires is that the expiration time is an absolute time from the server's clock, so if the client's clock differs a lot from the server's (for example, because the clocks are out of sync or the time zone is set wrongly), the error can be large.

Cache-Control indicates how long the current resource remains valid, and controls whether the browser takes the data from its cache or sends a new request to the server. Unlike Expires, it sets a relative time.

Specifying the expiration time: max-age is a number of seconds counted from the time of the request. For example, the following means that the strong cache can be hit for 31536000 seconds (one year) from the time of the request.

Cache-Control: max-age=31536000

No caching: the response must not be stored.

Cache-Control: no-store

Cache but revalidate: the response may be stored, but it must be revalidated with the server before each use.

Cache-Control: no-cache

Private and public caches.

public means that the response can be cached by any intermediary (such as an intermediate proxy or a CDN). private means that the response is intended for a single user: intermediaries must not cache it, and it can only be stored in the browser's private cache.

Cache-Control: private
Cache-Control: public

Revalidation: the following means that once the resource has expired (for example, max-age has been exceeded), the cache must not use it to answer later requests until it has been successfully revalidated with the origin server.

Cache-Control: must-revalidate

Cache-control has higher priority than Expires.

The following is the process of a Cache-Control strong cache:

  • The first request is fetched directly from the server, which sets max-age=100.
  • The second request, at age=10, is below 100, so it hits the cache and is answered directly.
  • The third request, at age=110, is above 100. The strong cache has expired, so the browser has to request the server again.
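The decision in this walkthrough can be written down as a small function. This is a simplified sketch of what a browser cache does: it looks only at max-age, no-store, and no-cache, and the helper names are made up for the example.

```python
def parse_cache_control(value: str) -> dict:
    """Turn 'public, max-age=100' into {'public': True, 'max-age': '100'}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg or True
    return directives


def can_use_strong_cache(cache_control: str, age_seconds: int) -> bool:
    """Return True if a stored response can be reused without asking the server."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if not isinstance(max_age, str):
        return False  # no max-age given: this sketch treats it as not fresh
    return age_seconds < int(max_age)


print(can_use_strong_cache("max-age=100", 10))   # second request: cache hit
print(can_use_strong_cache("max-age=100", 110))  # third request: expired
```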

Negotiated cache

  • If-Modified-Since / Last-Modified

Last-Modified indicates when the resource was last modified on the server. On the next request, the browser adds If-Modified-Since (set to the Last-Modified value it received last time) to the request header to ask the server whether the resource has been updated since that date. If it has, the new resource is sent back.

However, a file's modification time can change even when its content does not (for example, when the file is opened and saved again), and the timestamp is only accurate to the second. For this reason, ETag was introduced in HTTP/1.1.

  • If-None-Match / ETag

An ETag is like a fingerprint of the resource: any change to the resource changes its ETag, regardless of the last modification time, so each version of a resource can be identified uniquely. The If-None-Match request header sends the most recently received ETag back to the server to ask whether the resource's ETag has changed. If it has, the new resource is sent back.
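One common way to obtain such a fingerprint is to hash the response body. The sketch below assumes a content hash is acceptable as the ETag; servers are free to derive it differently (for example from a version number), and make_etag is a name invented for this example.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (the value is quoted, as HTTP requires)."""
    digest = hashlib.sha256(body).hexdigest()[:16]  # shortened for readability
    return f'"{digest}"'


v1 = make_etag(b"body { color: red; }")
v2 = make_etag(b"body { color: blue; }")
print(v1 != v2)  # True: different content gives a different ETag
```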

If-None-Match / ETag takes precedence over If-Modified-Since / Last-Modified.
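A server-side check that follows this precedence might look like the sketch below. evaluate_conditional is a hypothetical helper: it assumes exact header-name casing, compares ETags as plain strings (no weak-validator handling), and expects last_modified as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict


def evaluate_conditional(request_headers: Dict[str, str],
                         current_etag: str,
                         last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The ETag check takes precedence; If-Modified-Since is ignored.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if current_etag in candidates or "*" in candidates else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if last_modified <= since else 200
    return 200  # unconditional request: send the full resource


modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
status = evaluate_conditional({"If-None-Match": '"abc"'}, '"abc"', modified)
print(status)  # 304
```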

First request:

picture

Second request for the same page:

picture

With the negotiated cache, the server returns 304 if the resource has not changed, and 200 with the new resource if it has.

  • 200: the strong cache (Expires/Cache-Control) has expired and the server returns a new copy of the resource.
  • 200 (from cache): Expires or Cache-Control is present and has not expired (Cache-Control takes priority over Expires), so the browser obtains the resource from its local cache.
  • 304 (Not Modified): the negotiated cache validator (Last-Modified/ETag) shows that the resource has not changed, so the server returns status code 304.

200 (from cache) now comes in two forms: from disk cache and from memory cache.

picture

Revving technique

The sections above cover how HTTP caching works. In many cases, however, we still need to update resources after they have gone live and been cached.

Web developers have invented a technique Steve Souders calls revving. Infrequently updated files are named in a specific way: the URL (usually the filename) is followed by a version number.

Disadvantage: when the version number changes, every resource that references the file must be updated to use the new version number.

In practice, web developers usually leave this tedious work to automated build tools. When an infrequently updated resource (js/css) changes, only its entry in the frequently updated file that references it (html) needs to change.
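Build tools often derive the "version" from the file content itself. The following is a minimal sketch of that idea, assuming a short content hash is inserted before the extension; revved_name and the sample paths are illustrative, not taken from any particular tool.

```python
import hashlib
from pathlib import PurePosixPath


def revved_name(path: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = revved_name("static/app.js", b"console.log(1)")
new_url = revved_name("static/app.js", b"console.log(2)")
print(old_url != new_url)  # True: changed content produces a new URL
```

Because the URL changes whenever the content changes, the file itself can be served with a very long max-age, and only the HTML that references it has to be refreshed.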

Cookies

An HTTP cookie (also called a web cookie or browser cookie) is a small piece of data that the server sends to the user's browser, which stores it locally. The browser sends it back the next time it makes a request to the same server.

create cookie

The server creates a cookie with the Set-Cookie response header, and the browser sends it back in the Cookie request header.

Set-Cookie: <cookie-name>=<cookie-value>
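The round trip can be simulated with Python's standard http.cookies module. This is only a sketch: the "browser" here is a second SimpleCookie object standing in for a real cookie jar.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie response header.
response_cookie = SimpleCookie()
response_cookie["id"] = "a3fWa"
set_cookie_header = response_cookie.output()
print(set_cookie_header)  # Set-Cookie: id=a3fWa

# Browser side (simulated): store the pair, then echo it in the Cookie request header.
jar = SimpleCookie()
jar.load(set_cookie_header.split(":", 1)[1].strip())
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(cookie_header)  # id=a3fWa
```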

session cookies

A session cookie is the simplest type of cookie: it is deleted automatically when the browser is closed, which means it is only valid for the duration of the session. Session cookies do not specify an expiration time (Expires) or a validity period (Max-Age). Note that some browsers provide a session restore feature. In that case the session cookie is kept even after the browser is closed, as if the browser had never been closed.

persistent cookies

Unlike session cookies, which expire when the browser is closed, persistent cookies can be specified with a specific expiration time (Expires) or a validity period (Max-Age).

Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT;

Cookie's Secure and HttpOnly flags

Cookies marked as Secure should only be sent to the server via requests encrypted with the HTTPS protocol. However, even if the Secure flag is set, sensitive information should not be transmitted through cookies, because cookies are inherently insecure, and the Secure flag cannot provide real security.

Cookies marked with HttpOnly cannot be accessed through JavaScript's Document.cookie API. This helps mitigate cross-site scripting (XSS) attacks.

Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly
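On the server, a header like the one above could be assembled with Python's standard http.cookies module, as in this sketch. The exact order and letter case of the attributes in the output are decided by the library, so the example does not rely on them.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["id"] = "a3fWa"
cookie["id"]["expires"] = "Wed, 21 Oct 2015 07:28:00 GMT"  # makes the cookie persistent
cookie["id"]["secure"] = True     # only sent over HTTPS
cookie["id"]["httponly"] = True   # hidden from Document.cookie

header_value = cookie["id"].OutputString()  # the part after "Set-Cookie: "
print(header_value)
```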

Scope of cookies

The Domain and Path flags define the cookie's scope: that is, to which URLs the cookie should be sent.

The Domain attribute specifies which hosts can receive the cookie. If it is not specified, it defaults to the current host (excluding subdomains). If Domain is specified, subdomains are generally included.

For example, if Domain=mozilla.org is set, the cookie is also sent to subdomains (e.g. developer.mozilla.org).
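The subdomain rule can be expressed as a simple suffix check. This is a simplified sketch: domain_matches is a hypothetical helper, and real browsers additionally handle IP addresses and reject public suffixes such as .com.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """True if the host equals the cookie's Domain or is a subdomain of it."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    return host == domain or host.endswith("." + domain)


print(domain_matches("developer.mozilla.org", "mozilla.org"))  # True
print(domain_matches("notmozilla.org", "mozilla.org"))         # False
```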

The Path attribute specifies which paths on the host can receive the cookie (the path must be present in the request URL). Subpaths also match, with the character %x2F ("/") treated as the path separator.

For example, with Path=/docs, the following addresses will all match:

/docs
/docs/Web/
/docs/Web/HTTP
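The matching rule for Path can be sketched as a prefix check that respects the "/" separator. path_matches is an illustrative helper name.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """True if request_path is cookie_path itself or lies below it."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a "/" boundary, so /docs does not match /docsets.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


for candidate in ("/docs", "/docs/Web/", "/docs/Web/HTTP", "/docsets"):
    print(candidate, path_matches(candidate, "/docs"))
```

The first three paths match, while /docsets does not, because the character after the prefix is not a path separator.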

SameSite Cookies

SameSite cookies prevent cross-site request forgery attacks by allowing the server to require a cookie not to be sent in cross-site requests.

None: the browser sends cookies with both same-site and cross-site requests. The value is not case-sensitive. This was the default behavior in Chrome before version 80.

Strict: the browser only sends cookies with requests that originate from the same site.

Lax: cookies are withheld on cross-site subrequests, such as image loads or frames, but are sent when the user navigates to the URL from an external site, for example by following a link.

The attribute is set in the Set-Cookie header:

Set-Cookie: key=value; SameSite=Strict


In newer versions of browsers (after Chrome 80), the default value of SameSite is Lax. In other words, a cookie without a SameSite attribute is treated as if SameSite=Lax had been set, which means the cookie is no longer sent automatically with cross-site subrequests. If you want cookies to be sent with both same-site and cross-site requests, you must explicitly set SameSite to None. Because of this, old systems should be checked to see whether they set SameSite explicitly, and new systems are advised to set SameSite explicitly so that they behave the same in old and new versions of Chrome.

For more cookie-related information, see an article I wrote earlier about cookies: Cookie knowledge summary for front-end instructions [7]
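The three values and the Lax default can be summarized in a small decision function. This is a simplified model, not browser code: should_send_cookie and its parameters are invented for this example, the restriction of Lax to safe methods is an assumption that goes beyond the description above, and the requirement that SameSite=None cookies also be Secure is ignored.

```python
from typing import Optional

SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def should_send_cookie(same_site: Optional[str],
                       is_cross_site: bool,
                       is_top_level_navigation: bool,
                       method: str = "GET") -> bool:
    """Decide whether a cookie accompanies a request under its SameSite setting."""
    mode = (same_site or "Lax").lower()  # unset is treated as Lax in newer browsers
    if not is_cross_site:
        return True   # same-site requests always carry the cookie
    if mode == "none":
        return True   # sent with cross-site requests as well
    if mode == "strict":
        return False  # never sent with cross-site requests
    # Lax: only when the user navigates to the site, e.g. by following a link.
    return is_top_level_navigation and method.upper() in SAFE_METHODS


print(should_send_cookie(None, is_cross_site=True, is_top_level_navigation=False))  # False
print(should_send_cookie(None, is_cross_site=True, is_top_level_navigation=True))   # True
```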

HTTP Access Control (CORS)

Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell browsers that a web application running on one origin (domain) is allowed to access specified resources from a server on a different origin.

picture

The Cross-Origin Resource Sharing standard adds a new set of HTTP header fields that allow servers to declare which origin sites have permission to access which resources through the browser.

simple request

A simple request (one that does not trigger a CORS preflight request) must meet all three of the following conditions:

  • The method is one of GET/HEAD/POST.
  • The value of Content-Type is limited to one of text/plain, multipart/form-data, application/x-www-form-urlencoded.
  • No HTTP headers are set other than the following fields: Accept, Accept-Language, Content-Language, Content-Type (subject to the restriction above), DPR, Downlink, Save-Data, Viewport-Width, Width.
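The three conditions can be checked mechanically. In the sketch below, is_simple_request is a hypothetical helper, and headers is assumed to contain only the headers set by the page's script, not those the browser adds on its own (such as User-Agent).

```python
from typing import Dict

SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "text/plain",
    "multipart/form-data",
    "application/x-www-form-urlencoded",
}
SAFELISTED_HEADERS = {
    "accept", "accept-language", "content-language", "content-type",
    "dpr", "downlink", "save-data", "viewport-width", "width",
}


def is_simple_request(method: str, headers: Dict[str, str]) -> bool:
    """True if the request would be sent without a preflight under the three rules."""
    if method.upper() not in SIMPLE_METHODS:
        return False
    for name, value in headers.items():
        lowered = name.lower()
        if lowered not in SAFELISTED_HEADERS:
            return False
        if lowered == "content-type":
            media_type = value.split(";", 1)[0].strip().lower()  # drop "; charset=..."
            if media_type not in SIMPLE_CONTENT_TYPES:
                return False
    return True


print(is_simple_request("POST", {"Content-Type": "text/plain"}))        # True
print(is_simple_request("POST", {"Content-Type": "application/json"}))  # False
```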

The following are the request message and response message of a simple request:

picture

In simplified form, the exchange looks like this:

picture

The request header field Origin indicates that the request originated from http://foo.example.

In this example, the Access-Control-Allow-Origin: * returned by the server indicates that the resource can be accessed by any external domain. If the server only allows access from http://foo.example, the content of this header field is as follows:

Access-Control-Allow-Origin: http://foo.example

The value of Access-Control-Allow-Origin must be either * or the origin given in the Origin request header.
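On the server, choosing the header value often comes down to an allowlist lookup. The sketch below assumes a hypothetical allow_origin_header helper and an allowlist configured by the application.

```python
from typing import Optional, Set


def allow_origin_header(origin: Optional[str], allowed: Set[str]) -> Optional[str]:
    """Return the Access-Control-Allow-Origin value, or None to omit the header."""
    if "*" in allowed:
        return "*"       # the resource is open to any origin
    if origin is not None and origin in allowed:
        return origin    # echo the single origin that was allowed
    return None          # not allowed: send no CORS header


print(allow_origin_header("http://foo.example", {"http://foo.example"}))  # http://foo.example
print(allow_origin_header("http://evil.example", {"http://foo.example"}))  # None
```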

preflight request

For HTTP request methods that may have side effects on server data, the specification requires the browser to first send a preflight request using the OPTIONS method, to find out whether the server allows the cross-origin request.

The actual HTTP request is sent only after the server confirms that it is allowed. In its response to the preflight request, the server can also tell the client whether the actual request should carry credentials (including cookies and HTTP authentication data).

picture

The preflight request carries the following two header fields:

Access-Control-Request-Method: POST
Access-Control-Request-Headers: X-PINGOTHER, Content-Type

The header field Access-Control-Request-Method tells the server that the actual request will use the POST method. The header field Access-Control-Request-Headers informs the server that the actual request will carry two custom request header fields: X-PINGOTHER and Content-Type. Based on this, the server decides whether the actual request is allowed.

The response to the preflight request includes the following fields:

Access-Control-Allow-Origin: http://foo.example
// The server allows the client to send requests with the POST, GET and OPTIONS methods
Access-Control-Allow-Methods: POST, GET, OPTIONS
// The server allows the request to carry the X-PINGOTHER and Content-Type fields
Access-Control-Allow-Headers: X-PINGOTHER, Content-Type
// The response is valid for 86400 seconds, i.e. 24 hours. During this time the browser does not need to send another preflight request for the same request.
Access-Control-Max-Age: 86400
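A server's handling of the preflight request can be sketched as follows. handle_preflight and its parameters are hypothetical, the header names are looked up with exact casing for brevity, and returning None stands for "answer without CORS headers", which makes the browser block the actual request.

```python
from typing import Dict, List, Optional


def handle_preflight(request_headers: Dict[str, str],
                     allowed_origin: str,
                     allowed_methods: List[str],
                     allowed_headers: List[str],
                     max_age: int = 86400) -> Optional[Dict[str, str]]:
    """Build the CORS headers for an OPTIONS preflight, or None if it is refused."""
    if request_headers.get("Origin") != allowed_origin:
        return None
    if request_headers.get("Access-Control-Request-Method", "") not in allowed_methods:
        return None
    requested = [
        h.strip()
        for h in request_headers.get("Access-Control-Request-Headers", "").split(",")
        if h.strip()
    ]
    permitted = {h.lower() for h in allowed_headers}  # header names are case-insensitive
    if any(h.lower() not in permitted for h in requested):
        return None
    return {
        "Access-Control-Allow-Origin": allowed_origin,
        "Access-Control-Allow-Methods": ", ".join(allowed_methods),
        "Access-Control-Allow-Headers": ", ".join(allowed_headers),
        "Access-Control-Max-Age": str(max_age),
    }


preflight = {
    "Origin": "http://foo.example",
    "Access-Control-Request-Method": "POST",
    "Access-Control-Request-Headers": "X-PINGOTHER, Content-Type",
}
response = handle_preflight(preflight, "http://foo.example",
                            ["POST", "GET", "OPTIONS"],
                            ["X-PINGOTHER", "Content-Type"])
print(response)
```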

HTTP requests and responses

In general, browsers do not send credentials with cross-origin XMLHttpRequest or Fetch requests. To send credentials, a special flag must be set on the request. For example, setting the withCredentials flag of an XMLHttpRequest to true causes cookies to be sent to the server.

For requests that carry credentials, the server MUST NOT set Access-Control-Allow-Origin to "*". If the request carries cookies in its header and the value of Access-Control-Allow-Origin is "*", the request will fail. If the value of Access-Control-Allow-Origin is set to http://foo.example instead, the request succeeds.

The request and response headers involved in CORS are as follows.

HTTP response header fields:
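The browser-side rule can be modelled as a small check on the response headers. credentialed_response_allowed is a hypothetical helper, and the sketch assumes that the server also has to send Access-Control-Allow-Credentials: true for the response to be exposed.

```python
from typing import Dict


def credentialed_response_allowed(request_origin: str,
                                  response_headers: Dict[str, str]) -> bool:
    """True if a response to a request sent with credentials may be exposed to the page."""
    allow_origin = response_headers.get("Access-Control-Allow-Origin")
    allow_credentials = response_headers.get("Access-Control-Allow-Credentials")
    if allow_origin == "*":
        return False  # the wildcard is rejected when credentials are attached
    return allow_origin == request_origin and allow_credentials == "true"


wildcard = {"Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Credentials": "true"}
explicit = {"Access-Control-Allow-Origin": "http://foo.example",
            "Access-Control-Allow-Credentials": "true"}
print(credentialed_response_allowed("http://foo.example", wildcard))  # False
print(credentialed_response_allowed("http://foo.example", explicit))  # True
```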

  • The Access-Control-Allow-Origin header specifies the external origin that is allowed to access the resource. For requests that do not carry credentials, the server may set this field to the wildcard *, which allows requests from all origins.
  • The Access-Control-Expose-Headers header lets the server list the response headers that the browser is allowed to expose to scripts.
  • The Access-Control-Max-Age header specifies how long the result of a preflight request can be cached.
  • The Access-Control-Allow-Credentials header specifies whether the browser may expose the response to the page when the request's credentials flag is set to true.
  • The Access-Control-Allow-Methods header field is used in responses to preflight requests. It specifies the HTTP methods allowed for the actual request.
  • The Access-Control-Allow-Headers header field is used in responses to preflight requests. It specifies the header fields that the actual request is allowed to carry.

HTTP request header fields:

  • The Origin header field indicates the origin of the preflight request or the actual request.
  • The Access-Control-Request-Method header field is used in preflight requests. It tells the server which HTTP method the actual request will use.
  • The Access-Control-Request-Headers header field is used in preflight requests. It tells the server which header fields the actual request will carry.

Reference material

  • MDN [8]
  • Evolution of HTTP [9]
  • HTTP Overview [10]
  • Introduction to HTTP/2 [11]
  • Cache (2) - Browser Cache Mechanism: Strong Cache, Negotiated Cache [12]
  • (Recommended intensive reading) The soul of HTTP, consolidate your HTTP knowledge system [13]

References

[1] RFC 2616: https://tools.ietf.org/html/rfc2616

[2] HTTP Headers collection: https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Headers

[3] RFC 1945: https://tools.ietf.org/html/rfc1945

[4] RFC 2068: https://tools.ietf.org/html/rfc2068

[5] RFC 7540: https://httpwg.org/specs/rfc7540.html

[6] Website demo: https://http2.akamai.com/demo

[7] Cookie knowledge summary for front-end instructions: https://juejin.im/post/6844903841909964813

[8] MDN: https://developer.mozilla.org/zh-CN/docs/Web/HTTP

[9] Evolution of HTTP: https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Basics_of_HTTP/Evolution_of_HTTP

[10] HTTP Overview: https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Overview

[11] Introduction to HTTP/2: https://developers.google.com/web/fundamentals/performance/http2?hl=zh-cn

[12] Cache (2) - Browser Cache Mechanism: Strong Cache, Negotiated Cache: https://github.com/amandakelake/blog/issues/41

[13] (recommended intensive reading) The soul of HTTP, consolidate your HTTP knowledge system: https://juejin.im/post/6844904100035821575#heading-62