# Some Tricks for Reading Files in Node.js

In Node.js, working with files through the fs module is common practice. However, in high-throughput scenarios, any I/O operation should be approached with caution. For example, here is a typical piece of code for reading a file:
```js
const fs = require('fs')

fs.readFile('./text.txt', (err, data) => {
  if (!err) {
    console.log(data.toString())
  }
})
```
This approach works fine for small files, but because it reads the entire file into memory at once, a large file means a correspondingly large memory footprint. Node.js also caps the maximum Buffer size based on the platform's pointer width:
```js
const buffer = require('buffer')

console.log(buffer.constants.MAX_LENGTH)
// 4294967296 = 4GB (on a 64-bit platform; the exact value varies by Node.js version)
```
This means that if the file exceeds 4GB, the above code becomes problematic, and Node.js will throw an error.
## Working with Streams
Experienced developers often reach for fs.createReadStream to avoid file-size issues. The main difference from readFile is that a stream breaks a large file into many small chunks (64 KB each by default) instead of one giant Buffer. For web services, transmitting data via streams is quite natural; it even lets the browser start rendering HTML as soon as part of it arrives, rather than waiting for the entire response.
Here’s how to rewrite the code using createReadStream:
```js
const fs = require('fs')

let data = ''
const stream = fs.createReadStream('./text.txt')

stream.on('data', chunk => {
  data += chunk
})

stream.on('end', () => {
  console.log(data)
})
```
At first glance everything seems fine; the program runs well and even covers most file-processing tasks. A closer look at data += chunk, however, reveals a potential issue. Some developers treat chunk as a string without realizing that what the stream emits is actually a Buffer, so data += chunk is implicitly calling chunk.toString() behind the scenes. At this point, some developers may start to feel alarmed.

That's right: the most crucial aspect of working with strings is the encoding. By default, converting a Buffer to a string uses UTF-8, and since UTF-8 can represent a single character with 1, 2, 3, or 4 bytes, data += chunk can produce incorrect results when a character is split across chunks. For demonstration purposes, I lowered the highWaterMark to 5 bytes.
```
// text.txt
這是一篇部落格PO文
```

```js
const fs = require('fs')

let data = ''
const stream = fs.createReadStream('./text.txt', { highWaterMark: 5 })

stream.on('data', chunk => {
  data += chunk
})

stream.on('end', () => {
  console.log(data)
})
```
The output will be:
```
這��一���部落��PO��
```
Since each chunk is at most 5 bytes, chunk.toString() can emit replacement characters (�) whenever a multi-byte character is split across chunks and the remaining bytes have not arrived yet.
## Correct Concatenation Method: Buffer.concat
To use Buffers correctly, collect the raw chunks and join them with the API Node.js provides, converting to a string only once at the end, which avoids the encoding issue entirely.
```js
const fs = require('fs')

const data = []
let size = 0
const stream = fs.createReadStream('./text.txt', { highWaterMark: 5 })

stream.on('data', chunk => {
  data.push(chunk)
  size += chunk.length
})

stream.on('end', () => {
  console.log(Buffer.concat(data, size).toString())
})
```
This approach avoids encoding problems. However, writing it this way can be quite cumbersome. If you’re just doing simple analysis, using readFile or readFileSync isn’t a bad idea. But when handling large file analyses or high throughput, these details become critical. (A side note: at this point, you might just opt to use a different programming language)
## Conclusion
When working with large files, avoid loading them into memory all at once; prefer stream-based operations instead. When transmitting data, work with Buffers to keep throughput high and avoid unnecessary decoding, and stay mindful of where encoding conversions actually happen.