Let's Understand form-data in HTML Together

Written by Kalan
If you have any questions or feedback, please fill out this form

This post is translated by ChatGPT and originally written in Mandarin, so there may be some inaccuracies or mistakes.

Introduction

The form element is quite common in web applications, serving not only to transmit plain text but also to facilitate file uploads. However, due to the different behavior of forms compared to other transmission methods, confusion and misunderstandings can sometimes arise.

This article works through the relevant specifications to give a deeper understanding of what happens behind the scenes when a form is submitted, and of how Form Data differs from other transmission methods. Finally, we will discuss the functionality provided by the HTML <form> tag.

The main points covered are:

  • What is multipart/form-data and why do we need it?
  • How to understand request formats
  • What problems are solved by form-data

Why Do We Need Form Data?

For data transmission, both parties need to have a certain understanding of the data format. In the world of the internet, we use protocols to standardize how data is transmitted. Through the HTTP Content-Type header, we can identify the content of a request and interpret the data accordingly.

A MIME type identifies the format of the data being transmitted:

  • Content-Type: application/json indicates that the request content is JSON
  • Content-Type: image/png indicates that the request content is an image file

Among these, multipart/form-data is one of the Content-Type options.

Generally, a Content-Type describes only one type of data per request. However, in web applications, we may also want to upload files, images, or videos through forms alongside ordinary text fields, which led to the multipart/form-data specification (RFC 7578).

Parsing Form Data Requests

The primary utility of multipart/form-data is that it allows users to send multiple data formats in a single request, mainly used in HTML forms or for implementing file upload functionalities.

Next, let's look at what a multipart/form-data request looks like. To send a request with a Content-Type of multipart/form-data, we can use the HTML form tag (or JavaScript's FormData):

<form enctype="multipart/form-data" action="/upload" method="POST">
  <input type="text" name="name" />
  <input type="file" name="file" />
  <button>Submit</button>
</form> 
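
The FormData route mentioned above can be sketched as follows. This is a minimal illustration, assuming a runtime that provides the standard FormData, Blob, and Request globals (modern browsers, or Node.js 18 and later). The helper name buildUploadRequest and the localhost URL are assumptions made for this example.

```javascript
// Build the same two fields as the HTML form, without any markup.
// Assumes the standard FormData, Blob and Request globals are available
// (modern browsers, Node.js 18 or later). buildUploadRequest is a
// hypothetical helper name used only for this illustration.
function buildUploadRequest(name, fileText) {
  const form = new FormData();
  form.append('name', name);
  // The third argument becomes the filename in Content-Disposition.
  form.append('file', new Blob([fileText], { type: 'text/plain' }), 'text.txt');

  // Content-Type is not set by hand: the runtime generates the boundary
  // and adds the complete header itself.
  return new Request('http://localhost:3000/upload', {
    method: 'POST',
    body: form,
  });
}

const request = buildUploadRequest('Test', 'Hello World');
// Prints something like "multipart/form-data; boundary=..."; the exact
// boundary string differs between runtimes and between calls.
console.log(request.headers.get('content-type'));
```

Passing the resulting object to fetch(request) would actually send it. The point to notice is that the boundary is chosen by the runtime, which is why the Content-Type header is left alone here.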

When the Submit button is clicked, the browser sends a POST request:

POST /upload HTTP/1.1
Host: localhost:3000
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryFYGn56LlBDLnAkfd
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36

------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="name"

Test
------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain

Hello World
------WebKitFormBoundaryFYGn56LlBDLnAkfd--

Since web requests are based on HTTP, a multipart/form-data submission is still an ordinary HTTP request; only the format of its body is special, and that format is defined in RFC 7578.

To understand a multipart/form-data request, two key points need to be noted:

  • The role of the boundary
  • The meaning of the headers inside each part

The Role of Boundary

Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryFYGn56LlBDLnAkfd

In the Content-Type, we can see that the boundary is followed by a strange string. What is the purpose of this boundary?

As mentioned earlier, the purpose of multipart/form-data is to allow different formats of data to be sent through a single request. Therefore, there needs to be a way to determine where each piece of data begins and ends. For example, in query parameters: a=b&c=d, the & serves as a delimiter, allowing the computer to know when to split the data. Each time the computer encounters this boundary, it knows that the data for the current attribute has been fully read and it can start reading the next piece of data.
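
As a small illustration of how a receiver finds the delimiter, the sketch below pulls the boundary parameter out of a Content-Type value. The helper name getBoundary is hypothetical, and the regular expression only covers the common quoted and unquoted forms; it is not a full MIME parameter parser.

```javascript
// Pull the boundary parameter out of a Content-Type header value.
// getBoundary is a hypothetical helper; it handles the quoted and
// unquoted forms but is not a complete MIME parameter parser.
function getBoundary(contentType) {
  const match = /;\s*boundary=(?:"([^"]+)"|([^;\s]+))/i.exec(contentType);
  if (!match) return null;
  return match[1] || match[2];
}

const header =
  'multipart/form-data; boundary=----WebKitFormBoundaryFYGn56LlBDLnAkfd';
const boundary = getBoundary(header);

// In the body, each part starts with "--" followed by the boundary value,
// so the four hyphens in the header become six in the body.
const delimiter = `--${boundary}`;
console.log(delimiter); // ------WebKitFormBoundaryFYGn56LlBDLnAkfd
```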

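The specification (RFC 2046, section 5.1.1) does not prescribe a particular boundary value, but it does limit its length and allowed characters:

  • In the body, each delimiter line begins with two hyphens, followed by the boundary value
  • The boundary value is 1 to 70 characters long (not counting those two leading hyphens)
  • Only a restricted set of 7-bit ASCII characters is allowed: letters, digits, and a few punctuation marks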

Thus, a string like helloworldboundary is also a completely valid boundary.
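
These rules can be checked mechanically. The sketch below follows the boundary grammar in RFC 2046, section 5.1.1 (1 to 70 characters from a fixed set, where a space is allowed but not as the last character). The function name isValidBoundary is an assumption made for this example.

```javascript
// Characters allowed in a boundary by RFC 2046 section 5.1.1. A space is
// permitted anywhere except as the final character.
const BOUNDARY_PATTERN =
  /^[0-9A-Za-z'()+_,\-.\/:=? ]{0,69}[0-9A-Za-z'()+_,\-.\/:=?]$/;

// isValidBoundary is a hypothetical helper: it checks the boundary value
// itself, without the two hyphens that precede it in the body.
function isValidBoundary(boundary) {
  return BOUNDARY_PATTERN.test(boundary);
}

console.log(isValidBoundary('helloworldboundary')); // true
console.log(isValidBoundary('----WebKitFormBoundaryFYGn56LlBDLnAkfd')); // true
console.log(isValidBoundary('a'.repeat(71))); // false
```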

Content-Disposition

In multipart/form-data, each part carries a Content-Disposition header that describes the form field the part belongs to.

Content-Disposition: form-data; name="name"

This indicates that this is a field in the Form Data with the name name.

If it is a file, the filename will also be appended, and the next line will include Content-Type to describe the file type:

Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain

A blank line then separates these headers from the data content:

------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="name"

Test
------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain

Hello World
------WebKitFormBoundaryFYGn56LlBDLnAkfd--

In this example, I uploaded a plain text file. If it were an image or another file type, it would be displayed in binary format.

Content-Disposition: form-data; name="file"; filename="image.png"
Content-Type: image/png

PNG


IHDR¤@¬
ÃiCCPICC ProfileHTSÙϽétBoô*%ôÐ{³@B!!ØPGp,¨2 cd,(¶A±a :l¨¼<ÂÌ{ë½·Þ¿ÖY÷»;ûì½ÏYçܵÏ
(omitted)

Implementing a multipart/form-data Request

Now that we understand the request format for multipart/form-data, we can try to implement one ourselves. Here, we'll use Node.js as an example:

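const http = require('http');
const fs = require('fs');

// Read the file as UTF-8 text; string concatenation is only safe for text.
const fileContent = fs.readFileSync('./text.txt', 'utf8');

const formData = {
  name: 'Kalan',
  file: fileContent,
};

let payload = '';

const boundary = 'helloworld';

Object.keys(formData).forEach((k) => {
  let part;
  if (k === 'file') {
    part = [
      // Delimiter line: two hyphens followed by the boundary
      `\r\n--${boundary}`,
      // The disposition type is always form-data; file parts add a filename
      `\r\nContent-Disposition: form-data; name="${k}"; filename="text.txt"`,
      `\r\nContent-Type: text/plain`,
      // Blank line between the part headers and the part content
      `\r\n`,
      `\r\n${formData[k]}`,
    ].join('');
  } else {
    part = [
      `\r\n--${boundary}`,
      `\r\nContent-Disposition: form-data; name="${k}"`,
      `\r\n`,
      `\r\n${formData[k]}`,
    ].join('');
  }

  payload += part;
});

// Closing delimiter: the boundary with two hyphens on both sides
payload += `\r\n--${boundary}--\r\n`;

const options = {
  host: 'localhost',
  port: 3000,
  path: '/upload',
  protocol: 'http:',
  method: 'POST',
  headers: {
    // Reuse the same boundary value that delimits the parts above
    'Content-Type': `multipart/form-data; boundary=${boundary}`,
    'Content-Length': Buffer.byteLength(payload),
  },
};

const req = http.request(options, (res) => {
  console.log(`Status: ${res.statusCode}`);
  // Drain the response so the socket can be released
  res.resume();
});

req.on('error', (err) => {
  console.error(`Request failed: ${err.message}`);
});

req.write(payload);
req.end();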

Implementing it is straightforward; it merely involves filling in the request body according to the defined format. The key point to note is that each delimiter line begins with two hyphens followed by the boundary, and the final delimiter also ends with two hyphens.
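
Building the payload as a string is fine for a text file, but binary content such as a PNG would be damaged by the string conversion. One possible variation, sketched below, assembles the body from Buffers instead. The function name buildMultipartBody and the shape of the field objects are assumptions made for this example, and the sketch does not escape quotes inside names or filenames.

```javascript
// Build a multipart/form-data body as a Buffer so binary files survive
// intact. buildMultipartBody and the field object shape are hypothetical,
// chosen only for this illustration.
function buildMultipartBody(fields, boundary) {
  const chunks = [];
  for (const field of fields) {
    let headers =
      `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="${field.name}"`;
    if (field.filename) {
      headers +=
        `; filename="${field.filename}"\r\n` +
        `Content-Type: ${field.contentType}`;
    }
    chunks.push(
      // Blank line between the part headers and the part content
      Buffer.from(`${headers}\r\n\r\n`),
      // Buffer.from accepts both strings and existing Buffers
      Buffer.from(field.value),
      Buffer.from('\r\n'),
    );
  }
  // Closing delimiter: the boundary with two hyphens on both sides
  chunks.push(Buffer.from(`--${boundary}--\r\n`));
  return Buffer.concat(chunks);
}

const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
const body = buildMultipartBody(
  [
    { name: 'name', value: 'Kalan' },
    {
      name: 'file',
      filename: 'image.png',
      contentType: 'image/png',
      value: pngSignature,
    },
  ],
  'helloworld',
);
console.log(`${body.length} bytes`);
```

The resulting Buffer can be passed to req.write() directly, and body.length can be used as the Content-Length.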

Next, we can use Wireshark to observe whether the packet content is parsed correctly:

[Figure: form-data HTTP request packet analysis]

[Figure: form-data HTTP request packet analysis, part 2]

We can see that the encapsulated multipart part, which includes name=Kalan and the file content, has been parsed correctly. This indicates several things:

  • multipart/form-data is also a type of HTTP request
  • Requests can be sent without a browser as long as they conform to the format
  • File content must be parsed on the server-side (the request only transmits a chunk of binary data)

The last point is often overlooked by beginners: sending a multipart/form-data request does not mean the backend can access the file directly. The server must first parse the request body to extract the file content into a form that is easier to work with. In Node.js, for instance, multer is a popular package for handling file uploads; it parses the request and extracts the files for you.
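
To make that parsing step concrete, here is a minimal sketch of what such a parser has to do. It is not how multer works internally; parseMultipart is a hypothetical helper that treats the body as a string, so it only suits text content, whereas real parsers operate on bytes and streams.

```javascript
// Split a multipart/form-data body into its parts.
// parseMultipart is a hypothetical helper for illustration only.
function parseMultipart(body, boundary) {
  // A delimiter is CRLF + "--" + boundary. Prepending CRLF to the body lets
  // the very first delimiter match the same pattern as the others.
  const delimiter = `\r\n--${boundary}`;
  const parts = [];

  // The first chunk is the (usually empty) preamble, so it is skipped.
  for (const chunk of `\r\n${body}`.split(delimiter).slice(1)) {
    // "--" right after the boundary marks the closing delimiter.
    if (chunk.startsWith('--')) break;

    // A blank line (CRLF CRLF) separates the part headers from the content.
    const headerEnd = chunk.indexOf('\r\n\r\n');
    if (headerEnd === -1) continue;
    const headerText = chunk.slice(2, headerEnd);
    const content = chunk.slice(headerEnd + 4);

    const name = /\bname="([^"]*)"/.exec(headerText);
    const filename = /\bfilename="([^"]*)"/.exec(headerText);
    const type = /^Content-Type:\s*(.+)$/im.exec(headerText);

    parts.push({
      name: name ? name[1] : null,
      filename: filename ? filename[1] : null,
      contentType: type ? type[1] : null,
      content,
    });
  }
  return parts;
}

const sampleBody = [
  '------WebKitFormBoundaryFYGn56LlBDLnAkfd',
  'Content-Disposition: form-data; name="name"',
  '',
  'Test',
  '------WebKitFormBoundaryFYGn56LlBDLnAkfd',
  'Content-Disposition: form-data; name="file"; filename="text.txt"',
  'Content-Type: text/plain',
  '',
  'Hello World',
  '------WebKitFormBoundaryFYGn56LlBDLnAkfd--',
  '',
].join('\r\n');

const parts = parseMultipart(sampleBody, '----WebKitFormBoundaryFYGn56LlBDLnAkfd');
console.log(parts);
```

Only after this step does the server have a field called name with the value Test and a file called text.txt with its content, which is what upload libraries hand to your route handler.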

application/x-www-form-urlencoded

If a form is submitted with the GET method, all of its fields are URL-encoded and sent in the query string. For example, submitting the following HTML navigates to /upload?name=Kalan&file=filename, even though enctype specifies multipart/form-data. Note that only the name of the selected file is sent, not its content.

<form enctype="multipart/form-data" action="/upload" method="GET">
  <input type="text" name="name" />
  <input type="file" name="file" />
  <button>Submit</button>
</form> 
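
The URL-encoded format can be reproduced in JavaScript with URLSearchParams, which is available in browsers and Node.js. In this small sketch the note field and the text.txt value are made up for illustration; the note field is there only to show how reserved characters are escaped.

```javascript
// URLSearchParams serializes fields as application/x-www-form-urlencoded.
const params = new URLSearchParams();
params.append('name', 'Kalan');
// For a file input, only the filename ends up in the query string.
params.append('file', 'text.txt');
// "&" and "=" inside a value are percent-encoded so they cannot be
// mistaken for the separators between fields.
params.append('note', 'a&b=c');

const query = params.toString();
console.log(`/upload?${query}`);
// /upload?name=Kalan&file=text.txt&note=a%26b%3Dc
```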

Conclusion

This article aimed to understand multipart/form-data through specifications, explore the problems Form Data solves, and attempt to create a compliant multipart/form-data request for a deeper understanding of this uniquely structured HTTP request.

multipart/form-data offers several benefits for web applications:

  • Different formats of data can be sent in a single request
  • It meets users' needs for file transmission
  • Browsers have a standardized specification to implement

For developers, understanding multipart/form-data serves several purposes:

  • Knowing the principles for achieving file uploads on the web
  • Understanding how HTTP requests standardize the transmission of different format data
  • Mastery of these principles can accelerate development speed

The next article will focus on the <form> tag, exploring how browsers handle this HTML element and what developers should pay attention to.
