Introduction
Forms are a common feature of web pages. They can submit not only plain text but also files. However, because forms behave differently from other ways of sending data, they are a frequent source of confusion and misunderstanding.
This article aims to explain what happens behind the scenes when a form is submitted, how Form Data differs from other transmission formats, and what the HTML <form> tag does under the hood.
The main points covered are:
- What multipart/form-data is and why it is needed
- How the request is formatted
- Which problems Form Data solves
Why is Form Data needed?
Data transmission requires both parties to agree on the data format. On the internet, protocols define that format. The HTTP Content-Type header tells the receiver what kind of content a request carries, so it can interpret the data accordingly.
MIME types name the format of the transmitted data. For example:
- Content-Type: application/json indicates that the request body is JSON.
- Content-Type: image/png indicates that the request body is a PNG image.
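To illustrate how the Content-Type header drives interpretation, here is a minimal Node.js sketch. The helpers mediaTypeOf and interpretBody are hypothetical names, and only two media types are handled; a real server would support many more.

```javascript
// Minimal sketch: choose how to interpret a request body from its Content-Type.
// mediaTypeOf and interpretBody are illustrative names, not part of any library.

function mediaTypeOf(contentType) {
  // "application/json; charset=utf-8" -> "application/json"
  return contentType.split(';')[0].trim().toLowerCase();
}

function interpretBody(contentType, body) {
  switch (mediaTypeOf(contentType)) {
    case 'application/json':
      return JSON.parse(body.toString('utf8'));
    case 'text/plain':
      return body.toString('utf8');
    default:
      // Unknown formats (e.g. image/png) are returned as raw bytes.
      return body;
  }
}

console.log(interpretBody('application/json; charset=utf-8', Buffer.from('{"name":"Kalan"}')));
// prints { name: 'Kalan' }
```

The parameters after the semicolon (such as charset, or boundary for multipart/form-data) refine the format but do not change which parser is chosen.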
multipart/form-data is one of the possible Content-Type values.
A typical Content-Type describes a single kind of data. In web applications, however, we often want one form submission to carry text fields together with files, images, or videos. multipart/form-data, currently specified in RFC 7578, addresses this need.
Parsing Form Data Requests
The main purpose of multipart/form-data is to let a single request carry several pieces of data, each in its own format. It is mainly used for HTML forms and file uploads.
Let's look at the format of a multipart/form-data request. To send a request with the Content-Type set to multipart/form-data, we can use the HTML <form> tag (or JavaScript's FormData):
<form enctype="multipart/form-data" action="/upload" method="POST">
<input type="text" name="name" />
<input type="file" name="file" />
<button>Submit</button>
</form>
When the Submit button is clicked, the browser sends a POST request:
POST /upload HTTP/1.1
Host: localhost:3000
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryFYGn56LlBDLnAkfd
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36

------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="name"

Test
------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain

Hello World
------WebKitFormBoundaryFYGn56LlBDLnAkfd--
Because web requests are built on HTTP, multipart/form-data is not a separate protocol: it is a format for the body of an ordinary HTTP request, specified in an RFC.
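The same request can be produced from JavaScript with FormData instead of an HTML form. As a sketch, assuming Node.js 18 or later (where FormData, Blob and Request are globals), the following builds the same two fields and returns the multipart body the runtime would send. The URL is only used to construct the Request object; nothing is actually sent. The boundary is chosen by the runtime, so it will not look like Chrome's ----WebKitFormBoundary string.

```javascript
// Sketch assuming Node.js 18+, where FormData, Blob and Request are globals.
// Builds the same fields as the HTML form and exposes the serialized body.

async function serializeFormData() {
  const form = new FormData();
  form.append('name', 'Test');
  // A Blob with a file name stands in for the <input type="file"> value.
  form.append('file', new Blob(['Hello World'], { type: 'text/plain' }), 'text.txt');

  // Request encodes the FormData as multipart/form-data and picks a boundary itself.
  const request = new Request('http://localhost:3000/upload', {
    method: 'POST',
    body: form,
  });

  return {
    contentType: request.headers.get('content-type'),
    body: await request.text(),
  };
}

serializeFormData().then(({ contentType, body }) => {
  console.log(contentType); // multipart/form-data; boundary=...
  console.log(body);        // parts separated by "--" + boundary
});
```

In a browser, passing the same FormData object to fetch() has the same effect: the Content-Type header, including the boundary, is set automatically and should not be set by hand.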
To understand a multipart/form-data request, there are two key points:
- The purpose of the boundary
- The meaning of each part's headers
Purpose of the Boundary
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryFYGn56LlBDLnAkfd
In the Content-Type header, the boundary parameter is set to a strange-looking string. What is this boundary for?
As mentioned earlier, multipart/form-data lets a single request carry data in different formats, so the receiver needs a way to tell where each piece of data begins and ends. For example, in a query string like a=b&c=d, the & acts as a delimiter that tells the parser where to split the data. The boundary plays the same role: each time the parser encounters it, it knows that the current field has been read completely and can move on to the next one.
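Just as a=b&c=d can be split on &, a multipart body can be split on the delimiter built from the boundary. The following is a minimal string-based sketch; getBoundary and splitParts are hypothetical helpers, and the sketch ignores details such as transport padding after delimiters.

```javascript
// Minimal sketch: read the boundary from the Content-Type header and use it
// to cut a multipart body into its raw parts. Both function names are hypothetical.

function getBoundary(contentType) {
  // The boundary may be quoted, e.g. boundary="a b".
  const match = /boundary=(?:"([^"]+)"|([^;]+))/i.exec(contentType);
  if (!match) throw new Error('No boundary in Content-Type');
  return (match[1] ?? match[2]).trim();
}

function splitParts(body, boundary) {
  const delimiter = `--${boundary}`;
  return body
    .split(delimiter)
    .slice(1, -1) // drop the text before the first delimiter and the closing "--"
    .map((part) => part.replace(/^\r\n/, '').replace(/\r\n$/, ''));
}

const contentType = 'multipart/form-data; boundary=helloworld';
const body = [
  '--helloworld',
  'Content-Disposition: form-data; name="name"',
  '',
  'Kalan',
  '--helloworld--',
  '',
].join('\r\n');

console.log(splitParts(body, getBoundary(contentType)));
```

Splitting a string is only safe for text content; binary files need a byte-level approach.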
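The specification (RFC 2046) does not prescribe a particular boundary string, but it restricts how it is used, its length, and its characters:
- In the body, each delimiter line consists of two hyphens followed by the boundary
- The boundary itself is 1 to 70 characters long (not counting those two hyphens)
- Only a limited set of 7-bit ASCII characters is allowed: letters, digits, space (not as the last character), and ' ( ) + _ , - . / : = ?
The boundary must also not appear anywhere inside the content of the parts, which is why browsers generate long random strings.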
Therefore, a string such as helloworldboundary is also a valid boundary.
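The rules above can be checked mechanically. This is a minimal sketch: the regular expression encodes the RFC 2046 character set and length limit, and randomBoundary (with its ----NodeFormBoundary prefix) is a hypothetical helper that mimics the browser practice of using a random string.

```javascript
const crypto = require('crypto');

// RFC 2046 boundary characters: letters, digits, ' ( ) + _ , - . / : = ? and space.
// 1 to 70 characters in total, and the last character must not be a space.
const BOUNDARY_RE = /^[0-9A-Za-z'()+_,\-.\/:=? ]{0,69}[0-9A-Za-z'()+_,\-.\/:=?]$/;

function isValidBoundary(boundary) {
  return BOUNDARY_RE.test(boundary);
}

function randomBoundary() {
  // 16 random bytes -> 32 hex characters; randomness makes it very unlikely
  // that the delimiter also occurs inside the uploaded content.
  return '----NodeFormBoundary' + crypto.randomBytes(16).toString('hex');
}

console.log(isValidBoundary('helloworldboundary')); // true
console.log(isValidBoundary('a'.repeat(71)));       // false
console.log(randomBoundary());
```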
Content-Disposition
In multipart/form-data, the Content-Disposition header of each part identifies the form field that the part belongs to:
Content-Disposition: form-data; name="name"
This indicates that the part holds the Form Data field named "name".
If the part is a file, the header also includes a filename parameter, and a Content-Type header on the next line describes the type of the file:
Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain
After an empty line, the data content follows:
------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="name"

Test
------WebKitFormBoundaryFYGn56LlBDLnAkfd
Content-Disposition: form-data; name="file"; filename="text.txt"
Content-Type: text/plain

Hello World
------WebKitFormBoundaryFYGn56LlBDLnAkfd--
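Separating a part's headers from its content only requires finding that first empty line. The sketch below works on a single part as a string; splitPart is a hypothetical helper, and header names are lower-cased because HTTP header names are case-insensitive.

```javascript
// Minimal sketch: the part's headers end at the first empty line (CRLF CRLF),
// and everything after it is the content. splitPart is a hypothetical helper.

function splitPart(part) {
  const separator = '\r\n\r\n';
  const index = part.indexOf(separator);
  if (index === -1) throw new Error('No empty line between headers and content');
  const headers = {};
  for (const line of part.slice(0, index).split('\r\n')) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // ignore lines that are not "Name: value"
    // Header names are case-insensitive, so normalize them to lower case.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { headers, content: part.slice(index + separator.length) };
}

// The file part from the example, without its delimiter lines.
const part = [
  'Content-Disposition: form-data; name="file"; filename="text.txt"',
  'Content-Type: text/plain',
  '',
  'Hello World',
].join('\r\n');

const { headers, content } = splitPart(part);
console.log(headers['content-type']); // text/plain
console.log(content);                 // Hello World
```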
The example uploads a plain text file. For an image or another binary format, the raw bytes of the file appear in the body, which look like garbage when displayed as text:
Content-Disposition: form-data; name="file"; filename="image.png"
Content-Type: image/png

PNG
IHDR...
(binary data omitted)
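To read the field name and file name out of Content-Disposition headers like the ones above, a server has to parse their parameters. This is a minimal sketch; parseContentDisposition is a hypothetical helper that splits on semicolons, so it does not handle quoted values that themselves contain ; or escaped quotes.

```javascript
// Minimal sketch: pull the disposition type, field name and optional filename
// out of a Content-Disposition header. parseContentDisposition is hypothetical;
// quoted values containing ";" or escaped quotes are not supported.

function parseContentDisposition(header) {
  const value = header.replace(/^Content-Disposition:\s*/i, '');
  const [type, ...params] = value.split(';').map((s) => s.trim());
  const result = { type };
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue;
    const key = param.slice(0, eq).trim().toLowerCase();
    const raw = param.slice(eq + 1).trim();
    // Strip surrounding double quotes if present.
    result[key] = raw.startsWith('"') && raw.endsWith('"') ? raw.slice(1, -1) : raw;
  }
  return result;
}

console.log(parseContentDisposition('Content-Disposition: form-data; name="file"; filename="text.txt"'));
// prints { type: 'form-data', name: 'file', filename: 'text.txt' }
```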
Implementing a multipart/form-data Request
Now that we understand the format of a multipart/form-data request, we can build one ourselves and observe it. Here's an example using Node.js:
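const http = require('http');
const fs = require('fs');

// Read the file to upload (a small text file next to this script).
const fileContent = fs.readFileSync('./text.txt');
const formData = {
  name: 'Kalan',
  file: fileContent,
};

const boundary = 'helloworld';
let payload = '';

Object.keys(formData).forEach((k) => {
  let part;
  if (k === 'file') {
    part = [
      `\r\n--${boundary}`,
      `\r\nContent-Disposition: form-data; name="${k}"; filename="text.txt"`,
      `\r\nContent-Type: text/plain`,
      `\r\n`, // empty line: end of the part headers
      // The Buffer is converted to a UTF-8 string here, which is only safe for text files.
      `\r\n${formData[k]}`,
    ].join('');
  } else {
    part = [
      `\r\n--${boundary}`,
      `\r\nContent-Disposition: form-data; name="${k}"`,
      `\r\n`, // empty line: end of the part headers
      `\r\n${formData[k]}`,
    ].join('');
  }
  payload += part;
});

// Closing delimiter: two hyphens, the boundary, then two more hyphens.
payload += `\r\n--${boundary}--`;

const options = {
  host: 'localhost',
  port: 3000,
  path: '/upload',
  protocol: 'http:',
  method: 'POST',
  headers: {
    'Content-Type': `multipart/form-data; boundary=${boundary}`,
    // Byte length, not string length, since the body may contain multi-byte characters.
    'Content-Length': Buffer.byteLength(payload),
  },
};

const req = http.request(options, (res) => {
  console.log(`Status: ${res.statusCode}`);
  res.resume(); // discard the response body
});
req.on('error', (err) => console.error(err));
req.write(payload);
req.end();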
The implementation is straightforward: it fills the request body with parts in the format described above. Note that each delimiter line consists of two hyphens followed by the boundary (--helloworld), and the closing delimiter has two more hyphens appended (--helloworld--).
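Building the payload as one string works for text, but converting a Buffer to a string corrupts binary files such as PNG images. A sketch of a binary-safe alternative assembles the body from Buffers instead; buildMultipartBody and the field shape { name, value, filename, contentType } are hypothetical, chosen for illustration.

```javascript
// Sketch: build the body from Buffers so binary files are sent byte-for-byte.
// buildMultipartBody and the field shape are assumptions for illustration.

function buildMultipartBody(fields, boundary) {
  const chunks = [];
  for (const field of fields) {
    let head = `--${boundary}\r\nContent-Disposition: form-data; name="${field.name}"`;
    if (field.filename !== undefined) {
      head += `; filename="${field.filename}"`;
      head += `\r\nContent-Type: ${field.contentType || 'application/octet-stream'}`;
    }
    head += '\r\n\r\n'; // empty line between the part headers and the content
    chunks.push(Buffer.from(head, 'utf8'));
    chunks.push(Buffer.isBuffer(field.value) ? field.value : Buffer.from(String(field.value), 'utf8'));
    chunks.push(Buffer.from('\r\n', 'utf8')); // CRLF that precedes the next delimiter
  }
  chunks.push(Buffer.from(`--${boundary}--\r\n`, 'utf8')); // closing delimiter
  return Buffer.concat(chunks);
}

const body = buildMultipartBody(
  [
    { name: 'name', value: 'Kalan' },
    // The first four bytes of a PNG signature, standing in for real image data.
    { name: 'file', value: Buffer.from([0x89, 0x50, 0x4e, 0x47]), filename: 'image.png', contentType: 'image/png' },
  ],
  'helloworld',
);
console.log(body.length);
```

The resulting Buffer can be passed to req.write() in place of the string payload, with 'Content-Length' set to body.length.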
We can use Wireshark to check whether the packet content is parsed correctly.
Both name=Kalan and the file content are correctly parsed in the Encapsulated multipart part. This shows a few things:
- multipart/form-data is an ordinary HTTP request.
- Requests can be sent without a browser, as long as they comply with the format.
- The file content has to be parsed on the server side (the request only carries raw bytes).
The last point is often overlooked by beginners. Sending a request with multipart/form-data does not hand the server a ready-to-use file: the server must parse the body to extract the file content into a form that is easier to work with. In Node.js, for example, multer is a popular package for parsing multipart/form-data uploads.
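To make that parsing step concrete, here is a minimal sketch of the kind of work a parser such as multer has to do. It is not multer's implementation: parseMultipart is a hypothetical helper that holds the whole body in memory and skips streaming, size limits, and validation, all of which a production parser needs.

```javascript
// Minimal sketch of server-side multipart parsing on a raw Buffer body.
// parseMultipart is hypothetical; no streaming, size limits or validation.

function parseMultipart(body, boundary) {
  const delimiter = Buffer.from(`--${boundary}`);
  const parts = [];
  let start = body.indexOf(delimiter);
  while (start !== -1) {
    start += delimiter.length;
    // Two hyphens right after the delimiter mark the closing delimiter.
    if (body.subarray(start, start + 2).toString() === '--') break;
    start += 2; // skip the CRLF that ends the delimiter line
    const next = body.indexOf(delimiter, start);
    if (next === -1) break;
    // The CRLF before the next delimiter belongs to the delimiter, not the content.
    const part = body.subarray(start, next - 2);
    const headerEnd = part.indexOf('\r\n\r\n');
    if (headerEnd === -1) throw new Error('Malformed part: no empty line after headers');
    const headers = part.subarray(0, headerEnd).toString('utf8');
    parts.push({
      name: /;\s*name="([^"]*)"/.exec(headers)?.[1],
      filename: /;\s*filename="([^"]*)"/.exec(headers)?.[1],
      data: part.subarray(headerEnd + 4), // raw bytes, safe for binary files
    });
    start = next;
  }
  return parts;
}

const sample = Buffer.from(
  '--helloworld\r\nContent-Disposition: form-data; name="name"\r\n\r\nKalan\r\n' +
    '--helloworld\r\nContent-Disposition: form-data; name="file"; filename="text.txt"\r\n' +
    'Content-Type: text/plain\r\n\r\nHello World\r\n--helloworld--\r\n',
);

for (const p of parseMultipart(sample, 'helloworld')) {
  console.log(p.name, p.filename, p.data.toString());
}
// name undefined Kalan
// file text.txt Hello World
```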
application/x-www-form-urlencoded
When a form uses the GET method, all of its contents are sent URL-encoded in the query string. For example, clicking Submit in the following form navigates to /upload?name=Kalan&file=filename. Even though enctype is set to multipart/form-data, the data is still sent in the application/x-www-form-urlencoded format, and for the file input only the file name is included, not the file content.
<form enctype="multipart/form-data" action="/upload" method="GET">
<input type="text" name="name" />
<input type="file" name="file" />
<button>Submit</button>
</form>
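The query string the browser builds for a GET form is essentially what URLSearchParams produces, since both use the application/x-www-form-urlencoded serializer. A short sketch, assuming a file named text.txt was selected:

```javascript
// URLSearchParams (a global in Node.js and browsers) applies the
// application/x-www-form-urlencoded serialization used for GET form submissions.
// For the file input, only the (assumed) file name "text.txt" is sent.

const query = new URLSearchParams({ name: 'Kalan', file: 'text.txt' }).toString();
console.log(`/upload?${query}`); // /upload?name=Kalan&file=text.txt

// Spaces become "+" and reserved characters such as "&" are percent-encoded.
console.log(new URLSearchParams({ name: 'Kalan Wang & co' }).toString()); // name=Kalan+Wang+%26+co
```

There is no place in a query string for the file's bytes, which is why file uploads require a POST form with multipart/form-data.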
Conclusion
This article examined multipart/form-data based on its specification, discussed the problems it solves, and built a multipart/form-data request that complies with the specification, giving a deeper understanding of this special type of HTTP request.
multipart/form-data has several advantages for web applications:
- Different data formats can be sent in a single request.
- Users can upload files.
- Browsers implement it according to a common specification.
For developers, understanding multipart/form-data serves several purposes:
- Knowing how file uploads work on the web.
- Understanding how different data formats are transmitted over HTTP.
- Developing faster with a grasp of the underlying principles.
The next article will focus on the <form> tag, exploring how browsers handle this HTML tag and what we, as developers, should pay attention to.