Kalan's Blog


Understanding multipart/form-data in HTML

Introduction

Forms are a common feature of web pages. They can transmit not only plain text but also file uploads. Because forms behave differently from other ways of sending data, they are a frequent source of confusion and misunderstanding.

This article aims to provide a deeper understanding of what happens behind the scenes when using forms, the differences between Form Data and other transmission methods, and the underlying functionality of the HTML <form> tag.

The main points covered are:

  • What is multipart/form-data and why is it needed?
  • Understanding the request format
  • Knowing what problems Form Data solves

Why is Form Data needed?

Data transmission requires both parties to have a certain understanding of the data format. In the world of the internet, we use protocols to regulate the form of data transmission. Through the Content-Type header of HTTP, we can determine the content of a request and interpret the data accordingly.

A MIME type names the format of the data being transmitted. For example:

  • Content-Type: application/json represents JSON content in the request
  • Content-Type: image/png represents an image file in the request

multipart/form-data is one of the Content-Type options.

A request with an ordinary Content-Type usually carries a single data format. In web applications, however, a form may need to send text fields together with files, images, or videos. That need led to the multipart/form-data specification (RFC 7578).

Parsing Form Data Requests

The main purpose of multipart/form-data is to allow the transmission of multiple data formats in a single request. It is mainly used in HTML forms or when implementing file upload functionality.

Let's take a look at the format of a multipart/form-data request. To send a request with the Content-Type set to multipart/form-data, we can use the HTML <form> tag (or JavaScript's FormData):

<form enctype="multipart/form-data" action="/upload" method="POST">
  <input type="text" name="name" />
  <input type="file" name="file" />
  <button>Submit</button>
</form> 

When the Submit button is clicked, the browser sends a POST request:

POST /upload HTTP/1.1
Host: localhost:3000
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryFYGn56LlBDLnAkfd
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36

------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="name"

Test
------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain

Hello World
------WebKitFormBoundaryFYGn56LlBDLnAkfd--
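The FormData route mentioned earlier can also be tried outside the browser. The following is a minimal sketch, assuming Node.js 18 or later, where FormData, Blob, and Response are available as globals. Response is used here only as a convenient way to serialize the form, and buildForm is a name made up for this illustration.

```javascript
// Assumption: Node.js 18+ (FormData, Blob and Response are globals there).

// Hypothetical helper that mirrors the two inputs of the HTML form above.
function buildForm() {
  const form = new FormData();
  form.append('name', 'Test');
  // Passing a filename as the third argument makes this entry a file field.
  form.append('file', new Blob(['Hello World'], { type: 'text/plain' }), 'text.txt');
  return form;
}

const form = buildForm();

// Wrapping the form in a Response serializes it as multipart/form-data
// and generates a boundary, much like a browser does on submit.
const response = new Response(form);
const contentType = response.headers.get('content-type');
console.log(contentType);

// The serialized body has the same shape as the request body shown above.
response.text().then((body) => console.log(body));
```

The boundary string is generated at runtime, so it will differ from the one Chrome produced in the example request.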

Web requests are built on HTTP, so multipart/form-data is simply a format for the body of an HTTP request, and it is defined in an RFC (RFC 7578).

To understand a multipart/form-data request, there are two key points:

  • Understanding the purpose of the boundary
  • Understanding the meaning of each header inside a part

Purpose of the Boundary

Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryFYGn56LlBDLnAkfd

In the Content-Type header, we can see a strange string following boundary=. What is the purpose of this boundary?

As mentioned earlier, the purpose of multipart/form-data is to allow different data formats to be sent in a single request. Therefore, there needs to be a way to tell where each piece of data begins and ends. For example, in query parameters like a=b&c=d, the & acts as a delimiter, so the computer knows where to split the data. The boundary plays the same role: each time the computer encounters it, it knows that the data for the current field has been read and that it can move on to the next one.
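Before a receiver can split the body, it first has to read the boundary out of the Content-Type header. The following is a minimal sketch of that step; getBoundary is a hypothetical helper written for this illustration, not part of any library, and it ignores edge cases such as escaped quotes.

```javascript
// Hypothetical helper: pull the boundary parameter out of a Content-Type value.
function getBoundary(contentType) {
  const [mediaType, ...params] = contentType.split(';').map((s) => s.trim());
  if (mediaType.toLowerCase() !== 'multipart/form-data') return null;

  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue;
    const key = param.slice(0, eq).trim().toLowerCase();
    if (key !== 'boundary') continue;
    // The value may be wrapped in double quotes; strip them if present.
    return param.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1');
  }
  return null;
}

const header = 'multipart/form-data; boundary=----WebKitFormBoundaryFYGn56LlBDLnAkfd';
const boundary = getBoundary(header);
console.log(boundary);        // ----WebKitFormBoundaryFYGn56LlBDLnAkfd
console.log(`--${boundary}`); // the delimiter line to look for in the body
```

Note that the delimiter line in the body is the boundary value with two more hyphens in front, so a boundary that already starts with four hyphens produces a delimiter with six.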

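The specification (RFC 2046) does not prescribe a particular boundary string, but it does limit its length and the characters it may contain:

  • In the request body, each delimiter line is the boundary value prefixed with two hyphens
  • The boundary value is 1 to 70 characters long (not counting those two leading hyphens)
  • It may contain only 7-bit ASCII characters: letters, digits, and a small set of punctuation marks

Therefore, a string like helloworldboundary is also a valid boundary.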
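These rules can be checked mechanically. The following is a minimal sketch of a validator based on the boundary grammar in RFC 2046 (up to 70 characters from a fixed set, not ending with a space); isValidBoundary and BOUNDARY_PATTERN are names chosen for this illustration.

```javascript
// Allowed characters per the RFC 2046 boundary grammar: digits, letters,
// and ' ( ) + _ , - . / : = ? plus space (space may not be the last character).
const BOUNDARY_PATTERN = /^[0-9A-Za-z'()+_,\-.\/:=? ]{0,69}[0-9A-Za-z'()+_,\-.\/:=?]$/;

// Hypothetical helper: true when the value is usable as a boundary parameter.
function isValidBoundary(value) {
  return typeof value === 'string' && BOUNDARY_PATTERN.test(value);
}

console.log(isValidBoundary('helloworldboundary'));                     // true
console.log(isValidBoundary('----WebKitFormBoundaryFYGn56LlBDLnAkfd')); // true
console.log(isValidBoundary('a'.repeat(71)));                           // false, too long
console.log(isValidBoundary('bad*boundary'));                           // false, '*' is not allowed
```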

Content-Disposition

In multipart/form-data, each part of the body carries a Content-Disposition header that describes the part.

Content-Disposition: form-data; name="name"

This indicates that this is a field in the Form Data with the name "name".

If the part is a file, the header also includes a filename, and a Content-Type header on the next line describes the type of the file:

Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain

After an empty line, the data content follows:

------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="name"

Test
------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain

Hello World
------WebKitFormBoundaryFYGn56LlBDLnAkfd--

In the example, a plain text file is uploaded. If an image or another binary file format is uploaded instead, the part body is raw binary data, which shows up as unreadable characters:

Content-Disposition: form-data; name="file"; filename="image.png"
Content-Type: image/png

PNG


IHDR¤@¬
ÃiCCPICC ProfileHTSÙϽétBoô*%ôÐ{³@B!!ØPGp,¨2 cd,(¶A±a :l¨¼<ÂÌ{ë½·Þ¿ÖY÷»;ûì½ÏYçܵÏ
(omitted)
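Because the part body can be arbitrary bytes, a sender has to avoid converting it to a string along the way. The following is a minimal sketch of building a single file part with Node.js Buffers; buildFilePart is a hypothetical helper, and the eight PNG signature bytes stand in for a real image file.

```javascript
// Hypothetical helper: build one multipart file part without touching the bytes.
function buildFilePart(boundary, fieldName, filename, mimeType, data) {
  const head = Buffer.from(
    `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="${fieldName}"; filename="${filename}"\r\n` +
      `Content-Type: ${mimeType}\r\n` +
      '\r\n', // the empty line that separates headers from content
    'utf8'
  );
  // Buffer.concat keeps the file bytes exactly as they are.
  return Buffer.concat([head, data, Buffer.from('\r\n')]);
}

// Stand-in for a real image: the 8-byte signature every PNG file starts with.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

const part = buildFilePart('helloworld', 'file', 'image.png', 'image/png', pngSignature);
console.log(part.length);
console.log(part.includes(pngSignature)); // true, the bytes survived intact
```

Interpolating a Buffer into a template string would decode it as UTF-8 first, which is fine for a text file but can corrupt binary content such as an image.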

Implementing a multipart/form-data Request

Now that we understand the format of a multipart/form-data request, we can build one by hand and observe it. Here's an example using Node.js:

const http = require('http');
const fs = require('fs');

// Read the file to upload. Decoding as UTF-8 is fine here because it is plain text.
const fileContent = fs.readFileSync('./text.txt', 'utf8');

const formData = {
  name: 'Kalan',
  file: fileContent,
};

const boundary = 'helloworld';

let payload = '';

Object.keys(formData).forEach((k) => {
  // Every part starts with a delimiter line: two hyphens plus the boundary.
  let part = `--${boundary}\r\n`;
  if (k === 'file') {
    // Inside a part, the disposition type is form-data (not multipart/form-data).
    part += `Content-Disposition: form-data; name="${k}"; filename="text.txt"\r\n`;
    part += 'Content-Type: text/plain\r\n';
  } else {
    part += `Content-Disposition: form-data; name="${k}"\r\n`;
  }
  // An empty line separates the part headers from the part content.
  part += `\r\n${formData[k]}\r\n`;

  payload += part;
});

// The closing delimiter has two extra hyphens at the end.
payload += `--${boundary}--\r\n`;

const options = {
  host: 'localhost',
  port: 3000,
  path: '/upload',
  protocol: 'http:',
  method: 'POST',
  headers: {
    // Reuse the same boundary that was used to build the body.
    'Content-Type': `multipart/form-data; boundary=${boundary}`,
    'Content-Length': Buffer.byteLength(payload),
  },
};

const req = http.request(options, (res) => {
  res.resume(); // discard the response body so the connection can close
});

req.on('error', (err) => console.error(err));

req.write(payload);
req.end();

The implementation is straightforward: it just fills in the request body following the format described above. One thing to note is that each delimiter line is the boundary prefixed with two hyphens, and the final delimiter additionally ends with two hyphens.

We can use Wireshark to observe if the packet content is correctly parsed:

(Two screenshots: form-data HTTP request packet parsing)

We can see that both name=Kalan and the file content are correctly parsed in the Encapsulated multipart part. This indicates a few things:

  • multipart/form-data is also an HTTP request.
  • Requests can be sent without a browser as long as they comply with the format.
  • The file content needs to be parsed on the server-side (the request only sends a blob of binary data).

The last point is often overlooked by beginners. Sending a multipart/form-data request does not hand the server a ready-made file. The server has to parse the request body to extract the file content and turn it into a form that is easier to work with. In Node.js, for example, multer is a popular package for handling file uploads that does this parsing.
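To make the parsing step concrete, here is a minimal sketch of what such a parser has to do. It is an illustration only, under simplifying assumptions: the body is treated as a string (so it suits text parts, not binary files), parameter values are always quoted, and the boundary never appears inside the content. parseMultipart is a hypothetical function, not multer's API.

```javascript
// Hypothetical minimal parser for text-only multipart/form-data bodies.
function parseMultipart(body, boundary) {
  const delimiter = `--${boundary}`;
  const parts = [];

  // Everything before the first delimiter is preamble, so skip piece 0.
  for (const piece of body.split(delimiter).slice(1)) {
    // "--" right after the delimiter marks the closing delimiter.
    if (piece.startsWith('--')) break;

    // An empty line separates the part headers from the part content.
    const headerEnd = piece.indexOf('\r\n\r\n');
    if (headerEnd === -1) continue;

    const headers = {};
    for (const line of piece.slice(0, headerEnd).trim().split('\r\n')) {
      const colon = line.indexOf(':');
      if (colon === -1) continue;
      headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
    }

    const disposition = headers['content-disposition'] || '';
    const name = /\bname="([^"]*)"/.exec(disposition);
    const filename = /\bfilename="([^"]*)"/.exec(disposition);

    // Drop the CRLF that precedes the next delimiter.
    const content = piece.slice(headerEnd + 4).replace(/\r\n$/, '');

    parts.push({
      name: name ? name[1] : null,
      filename: filename ? filename[1] : null,
      contentType: headers['content-type'] || null,
      content,
    });
  }
  return parts;
}

const sampleBody = [
  '--helloworld',
  'Content-Disposition: form-data; name="name"',
  '',
  'Kalan',
  '--helloworld',
  'Content-Disposition: form-data; name="file"; filename="text.txt"',
  'Content-Type: text/plain',
  '',
  'Hello World',
  '--helloworld--',
  '',
].join('\r\n');

console.log(parseMultipart(sampleBody, 'helloworld'));
```

A production parser works on raw bytes, streams large files instead of buffering them, and handles many more edge cases, which is why a dedicated package is usually the better choice.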

application/x-www-form-urlencoded

When a form uses the GET method, its fields are URL-encoded and appended to the URL as a query string. For example, submitting the following form navigates to /upload?name=Kalan&file=filename. Even though enctype is set to multipart/form-data, the data is still sent in the application/x-www-form-urlencoded format, and only the name of the file is sent, not its content.

<form enctype="multipart/form-data" action="/upload" method="GET">
  <input type="text" name="name" />
  <input type="file" name="file" />
  <button>Submit</button>
</form> 
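The same encoding can be reproduced in JavaScript with URLSearchParams, which serializes key and value pairs in the application/x-www-form-urlencoded format. The following is a minimal sketch; buildGetUrl is a hypothetical helper, and text.txt stands in for whatever file the user picked.

```javascript
// Hypothetical helper: build the URL a GET form submission would navigate to.
function buildGetUrl(action, fields) {
  const query = new URLSearchParams(fields).toString();
  return `${action}?${query}`;
}

// For a file input, only the file name ends up in the query string.
const url = buildGetUrl('/upload', { name: 'Kalan', file: 'text.txt' });
console.log(url); // /upload?name=Kalan&file=text.txt

// Characters that have a special meaning in a query string are escaped.
console.log(buildGetUrl('/upload', { name: 'a b&c=d' })); // /upload?name=a+b%26c%3Dd
```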

Conclusion

This article aimed to understand multipart/form-data based on the specification, discuss the problems it solves, and create a multipart/form-data request that complies with the specification. This provides a deeper understanding of this special type of HTTP request.

multipart/form-data has several advantages for web applications:

  • Different data formats can be sent in a single request.
  • It allows users to upload files.
  • Browsers have a unified specification to implement.

For developers, understanding multipart/form-data serves several purposes:

  • Knowing the principles of achieving file uploads on the web.
  • Understanding how different data formats are transmitted based on HTTP requests.
  • Enhancing development speed through a grasp of the underlying principles.

The next article will focus on the <form> tag, exploring how browsers handle this HTML tag and what we, as developers, should pay attention to.
