When displaying currency values, there is often a need to format the original number in a more human-readable format, such as:
- 1234567 → 1,234,567
- 10000 → 10,000
In frontend development, there are several ways to achieve this:
- Using `Intl.NumberFormat` (may require a polyfill for older browsers)
- Using regular expressions with `.replace`
There have been many discussions on this topic on StackOverflow, and one of the most popular ones is probably this post: How to print a number with commas as thousands separators in JavaScript
There are multiple solutions, but they generally fall into these two patterns:
const reg1 = /\B(?=(\d{3})+$)/
const reg2 = /(\d)(?=(\d{3})+$)/
This article will attempt to explain the differences between these two regular expressions and their execution. Finally, we will test their performance.
Introduction
Before we begin, there are a few important concepts that need to be understood: positive lookahead, negative lookahead, and word boundary. These concepts are not commonly encountered when learning regular expressions, but they are actually quite powerful.
Positive Lookahead and Negative Lookahead
In regular expressions, positive lookahead is written `(?=...)`. For example, `a(?=b)` means matching `a` only if it is followed by the letter `b`. Note that the lookahead itself does not participate in the match, so this regular expression matches only `a`.
As shown in the above diagram, only `a` is matched by the regular expression.
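The same behaviour can be checked in a JavaScript console. This is a minimal sketch; the sample string and variable names are made up for illustration.

```javascript
// Positive lookahead: match "a" only when the next character is "b".
const lookahead = /a(?=b)/g;

// In "ab ac ab", the first and third "a" qualify; the "a" in "ac" does not.
const matches = [..."ab ac ab".matchAll(lookahead)];

// Each match contains only "a": the lookahead checked "b" but did not consume it.
const matchedText = matches.map((m) => m[0]);  // ["a", "a"]
const matchedAt = matches.map((m) => m.index); // [0, 6]
console.log(matchedText, matchedAt);
```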
A lookahead can contain any valid regular expression. For example, `,(?=(?:\d{3})+$)` matches a comma only if it is followed by one or more groups of exactly three digits that run all the way to the end of the string.
Negative lookahead, written `(?!...)`, is the opposite of positive lookahead. For example, `a(?!b)` matches `a` only if it is not followed by the letter `b`.
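A short sketch of negative lookahead in JavaScript, again with a made-up sample string:

```javascript
// Negative lookahead: match "a" only when the next character is NOT "b".
const notFollowedByB = /a(?!b)/g;

// "ab ac a": the "a" in "ab" is rejected; the "a" in "ac" and the final "a" match.
// The final "a" qualifies because nothing follows it, so "b" does not follow it either.
const positions = [..."ab ac a".matchAll(notFollowedByB)].map((m) => m.index);
console.log(positions); // [3, 6]
```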
It is important to note that both positive and negative lookahead are zero-length assertions, meaning they do not consume any characters. If you use `(?=a)` on its own, without any other characters, the match has a length of 0.
The match still succeeds, but it is an empty match that sits at a position between two characters.
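To make the idea of a zero-length match concrete, here is a small sketch; the word `banana` is only an example input.

```javascript
// (?=a) on its own matches the empty position just before each "a".
const beforeA = /(?=a)/g;

const hits = [..."banana".matchAll(beforeA)];
const lengths = hits.map((m) => m[0].length); // [0, 0, 0]
const indexes = hits.map((m) => m.index);     // [1, 3, 5]

// Replacing a zero-length match inserts text without removing anything.
const marked = "banana".replace(beforeA, "|"); // "b|an|an|a"
console.log(lengths, indexes, marked);
```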
The regular expressions `/\B(?=(\d{3})+$)/` and `/(?=(\d{3})+$)/` behave almost identically, with one minor difference. To see why, we first need `\b` and `\B`, which are introduced below.
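For a concrete look at that minor difference, the sketch below runs both expressions with the `g` flag on two sample numbers. The variable names are illustrative, and `\B` itself is explained in the next section.

```javascript
const withB = /\B(?=(\d{3})+$)/g;
const withoutB = /(?=(\d{3})+$)/g;

// When the digit count is not a multiple of three, both give the same result.
const sevenWith = "1234567".replace(withB, ",");       // "1,234,567"
const sevenWithout = "1234567".replace(withoutB, ","); // "1,234,567"

// With six digits, position 0 also satisfies the lookahead.
// \B rejects it, because the start of the string is a word boundary.
const sixWith = "123456".replace(withB, ",");       // "123,456"
const sixWithout = "123456".replace(withoutB, ","); // ",123,456"

console.log(sevenWith, sevenWithout, sixWith, sixWithout);
```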
The Meaning of \b and \B
\b
In regular expressions, an uppercase escape usually means the opposite of its lowercase counterpart. For example, `\d` matches a digit, so `\D` matches a non-digit. Let's first look at the meaning of `\b` according to the MDN documentation:
A word boundary matches the position where a word character is not followed or preceded by another word character. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero.
To understand the definition of a word character, we need to understand `\w`, which is defined as follows:
Matches any alphanumeric character including the underscore. Equivalent to `[A-Za-z0-9_]`.
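A quick sketch of what counts as a word character. The helper name `isWordChar` is hypothetical; note that digits qualify, while the comma and the decimal point do not, which matters for number formatting.

```javascript
// Hypothetical helper: true if the single character ch is matched by \w.
const isWordChar = (ch) => /^\w$/.test(ch);

console.log(isWordChar("a")); // true
console.log(isWordChar("7")); // true
console.log(isWordChar("_")); // true
console.log(isWordChar("-")); // false
console.log(isWordChar(",")); // false
console.log(isWordChar(".")); // false
```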
Now that we know what `\w` means, let's unpack "a word character is not followed or preceded by another word character". For clarity, we will call anything matched by `\w` a "word character". `\b` matches in the following situations:
- Before the first character of the string, if that character is a word character
- Between a word character and a non-word character (in either order)
- After the last character of the string, if that character is a word character
The following diagram illustrates these situations:
In fact, you can also think of word boundaries as the edges of each run of word characters. It is important to emphasize that `\b` on its own is a zero-width match, meaning it does not consume any characters. That does not mean nothing was matched: the match is simply an empty position.
Do not confuse this with patterns that also contain characters, such as `d\b`. This means matching the letter `d` followed by a word boundary. In this case, the actual matched character is `d`.
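Both cases can be sketched in JavaScript as follows. The sample text `red bird` is an assumption chosen so that two words end in `d`.

```javascript
const text = "red bird";

// \b alone: zero-width matches at the edges of each word.
const boundaries = [...text.matchAll(/\b/g)].map((m) => m.index);
console.log(boundaries); // [0, 3, 4, 8]

// d\b: the letter "d" is consumed, the boundary after it is not.
const endsWithD = [...text.matchAll(/d\b/g)].map((m) => [m.index, m[0]]);
console.log(endsWithD); // [[2, "d"], [7, "d"]]
```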
\B
`\B` represents the opposite of a word boundary, meaning it matches any position that is not a word boundary. In the diagram below, the arrows indicate non-word boundary positions.
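One way to see these positions without a diagram is to insert a marker at every `\B` match. This is a small sketch with made-up inputs.

```javascript
// Between two digits there is no word boundary, so \B matches there.
// The start and end of "1000" are word boundaries, so nothing is inserted at the edges.
const digitsMarked = "1000".replace(/\B/g, "|");
console.log(digitsMarked); // "1|0|0|0"

// \B also matches between two non-word characters, here between the two hyphens.
const hyphensMarked = "a--b".replace(/\B/g, "|");
console.log(hyphensMarked); // "a-|-b"
```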
Understanding Regular Expressions Correctly
Reading regular expressions fluently takes experience, but a simple mental model helps in practical development: a regular expression can be seen as a state machine. For example, `\d+` can be represented as follows:
The diagram is simplified. Usually an explicit initial state is added (for example, so that a non-digit input never reaches state 0), but the main idea is what matters. Each arrow is labelled with the input it accepts, and the machine follows an arrow only when the next character fits. If the machine ends in a final state, the match is accepted.
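As a rough illustration of the state machine idea, here is a hand-written two-state machine for `\d+`. The state names and the function `matchesDigits` are hypothetical, and the sketch checks the whole input (like `/^\d+$/`) rather than searching inside it. Real regex engines are considerably more involved.

```javascript
// Hypothetical two-state machine for \d+:
//   START --digit--> DIGITS --digit--> DIGITS   (DIGITS is the final state)
function matchesDigits(input) {
  let state = "START";
  for (const ch of input) {
    const isDigit = ch >= "0" && ch <= "9";
    if (!isDigit) return false; // no arrow accepts this input: reject
    state = "DIGITS";           // from either state, a digit leads to DIGITS
  }
  return state === "DIGITS";    // accepted only if we end in the final state
}

console.log(matchesDigits("2024")); // true
console.log(matchesDigits(""));     // false
console.log(matchesDigits("20a4")); // false
```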
Analyzing the Expressions
Method 1: Matching using zero-length matches
With the background in place, we can finally analyze the first expression: `/\B(?=(\d{3})+$)/g`.
The initial `\B` matches a non-word boundary position. Next, let's examine the expression inside `(?=...)`. `(\d{3})+` matches one or more consecutive groups of three digits, such as `333`, `666`, `123`, or `123456`, and `$` requires those groups to run to the end of the string. Note that this expression contains no negative lookahead `(?!...)`. Putting it together, it matches a non-word boundary position that is followed by one or more groups of three digits, all the way to the end of the string.
The interesting part is `(\d{3})+$`. Inside the lookahead, it succeeds only if the digits from the current position to the end of the string can be split into groups of three, that is, if their count is a multiple of 3. For example, `123456` qualifies because it has six digits, but `12345` does not: after one `\d{3}` has matched `123`, the remaining `45` is neither another full group nor the end of the string.
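The check can be tried position by position. The helper below is a hypothetical sketch that uses the sticky flag (`y`) to pin the lookahead to a single index.

```javascript
// Hypothetical helper: does the lookahead (?=(\d{3})+$) succeed at `index`?
// The sticky flag (y) makes the regex try only at lastIndex, not further right.
function lookaheadSucceedsAt(str, index) {
  const probe = /(?=(\d{3})+$)/y;
  probe.lastIndex = index;
  return probe.test(str);
}

console.log(lookaheadSucceedsAt("123456", 0)); // true: six digits remain
console.log(lookaheadSucceedsAt("12345", 0));  // false: five digits remain
console.log(lookaheadSucceedsAt("12345", 2));  // true: "345" remains
```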
Combining this property with `\B`, the expression matches two positions in the number `1000000`, after the first digit and after the fourth digit, as shown below:
Therefore, when calling `.replace`, you can write it like this:
"1000000".replace(/\B(?=(\d{3})+$)/g, ",");
Based on the matching positions shown in the diagram, it will insert `,` at these two positions, resulting in `1,000,000`. This is why this regular expression does not need `$1,` in the replacement: both `\B` and `(?=...)` are zero-length, so the match consumes no characters and nothing is overwritten.
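The match positions can also be listed directly. This is a minimal sketch using `matchAll`; the variable names are illustrative.

```javascript
const thousands = /\B(?=(\d{3})+$)/g;
const input = "1000000";

const found = [...input.matchAll(thousands)];
const insertAt = found.map((m) => m.index);        // [1, 4]
const allEmpty = found.every((m) => m[0] === "");  // true: zero-length matches

// Because nothing is consumed, the replacement only inserts commas.
const formatted = input.replace(thousands, ",");   // "1,000,000"
console.log(insertAt, allEmpty, formatted);
```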
The matching process can be observed in the following video. The number of matches shown in the video is only for reference and may vary depending on the language. Some steps are also omitted, but it roughly demonstrates the process:
Method 2: Matching the digits that should have commas
/(\d)(?=(\d{3})+$)/
This expression is very similar to the previous one, except that `\B` is removed and `(\d)` is added. The one real difference is that `\d` actually consumes a digit. The final result will look like this:
(The image uses `(?:)` to indicate that the match result is not captured in a group, but the result is the same.)
I personally prefer to use `(?:)` when the captured value is not used, as it makes the pattern easier for others, and for future me, to understand.
The overall process will be like this: (omitting the unsuccessful match steps)
So in JavaScript, you would write it like this:
"1000000".replace(/(\d)(?=(\d{3})+$)/g, "$1,"); // Note the $1 here
The `$1` is important because we want to keep the matched digit in the replacement. If we only use `,`, the result is `,00,000`.
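Here is the same idea with the non-capturing inner group, as a sketch. Only the digit in front of each comma is captured, so `$1` is the only group reference available.

```javascript
// Non-capturing inner group: only the digit before the comma is captured as $1.
const digitBeforeComma = /(\d)(?=(?:\d{3})+$)/g;

const kept = "1000000".replace(digitBeforeComma, "$1,"); // "1,000,000"

// Without $1 the consumed digits are lost.
const lost = "1000000".replace(digitBeforeComma, ",");   // ",00,000"
console.log(kept, lost);
```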
Other Considerations and Approaches
The two regular expressions mentioned above are based on the condition `(?=(\d{3})+$)`. However, in practice the number may contain a decimal point, and then the expressions no longer match anything, as with `1000.12`.
In such cases the expressions need to be adjusted, for example by replacing `$` with `\b`, so that the run of three-digit groups may end at the decimal point instead of at the end of the string.
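The sketch below shows the `\b` variant and one caveat with it. The helper `groupThousands` is a hypothetical alternative, not something from the discussion above: it formats only the integer part and leaves the fraction untouched.

```javascript
const endAtDollar = /\B(?=(\d{3})+$)/g;
const endAtBoundary = /\B(?=(\d{3})+\b)/g;

const unchanged = "1000.12".replace(endAtDollar, ",");   // "1000.12" (no match)
const grouped = "1000.12".replace(endAtBoundary, ",");   // "1,000.12"

// Caveat: with four or more fraction digits, the fraction gets commas too.
const overGrouped = "1000.1234".replace(endAtBoundary, ","); // "1,000.1,234"

// Hypothetical workaround: group only the part before the decimal point.
function groupThousands(value) {
  const [integer, fraction] = String(value).split(".");
  const withCommas = integer.replace(/\B(?=(\d{3})+$)/g, ",");
  return fraction === undefined ? withCommas : `${withCommas}.${fraction}`;
}

console.log(unchanged, grouped, overGrouped, groupThousands("1000.1234"));
```

Splitting on the decimal point keeps the simpler `$`-anchored expression usable, at the cost of an extra string operation.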
Additionally, modern browsers provide the `Intl.NumberFormat` API, which can be used out of the box without additional configuration. You can refer to the MDN documentation for usage examples.
new Intl.NumberFormat('ja-JP', { style: 'currency', currency: 'JPY' }).format(number);
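For comparison with the regex results, here is a small sketch using the `en-US` locale, which groups digits with commas. The locale and the sample values are assumptions chosen for illustration.

```javascript
// Plain number formatting with thousands separators.
const plain = new Intl.NumberFormat("en-US");
console.log(plain.format(1234567)); // "1,234,567"
console.log(plain.format(1000.12)); // "1,000.12"

// Currency formatting adds the symbol and two fraction digits for USD.
const usd = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });
console.log(usd.format(1234567)); // "$1,234,567.00"
```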
Performance and Other Considerations
Since all of these approaches produce the same result, the remaining considerations are readability and performance.
In terms of readability and ease of use, `Intl.NumberFormat` is the best option. The MDN documentation provides clear instructions, and it is very convenient to use.
The only thing left to consider is performance. Here is a test using jsbench (link). It can be seen that `Intl.NumberFormat` is almost twice as slow. This may be due to the cost of loading internationalization (i18n) data and converting numbers for different locales.
Matching with zero-length matches is faster than matching with `\d`, probably because a zero-length match consumes no characters. However, it is important to note that expressions like `(\d{3})+` backtrack while trying to match as many groups as possible, which can cause performance issues. Therefore, caution is needed when using such expressions.
In practice, we can use `requestIdleCallback` to delay the initialization of `Intl.NumberFormat` to avoid excessive performance impact. Alternatively, we can wrap the logic in a separate function and initialize it only when it is actually called by other files. This should help mitigate performance issues.
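A minimal sketch of both ideas, assuming a module-level cache. The names `cachedFormatter` and `formatNumber` are hypothetical, and `en-US` is chosen only for illustration.

```javascript
// Hypothetical module-level cache; the formatter is built on first use only.
let cachedFormatter = null;

function formatNumber(value) {
  if (cachedFormatter === null) {
    cachedFormatter = new Intl.NumberFormat("en-US");
  }
  return cachedFormatter.format(value);
}

// Optionally warm the cache while the browser is idle.
// requestIdleCallback is not available in every environment, so guard the call.
if (typeof requestIdleCallback === "function") {
  requestIdleCallback(() => formatNumber(0));
}

console.log(formatNumber(1234567)); // "1,234,567"
```

Reusing one formatter instance avoids paying the construction cost on every call, which is where the setup cost of `Intl.NumberFormat` would otherwise be repeated.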
Other Approaches
The regular expressions mentioned above are mainly based on lookahead combined with a single `.replace` call. How can we achieve the same result with a loop instead? Here is an alternative that applies `/(\d)(?=(?:\d{3})+\b)/g` one match at a time:
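const number = 1234567.891; // example input (assumed for illustration)
let digits = number.toFixed(2); // toFixed already returns a string
const matcher = /(\d)(?=(?:\d{3})+\b)/g;
// With the g flag, test() moves matcher.lastIndex to just after the matched digit.
while (matcher.test(digits)) {
  const first = digits.slice(0, matcher.lastIndex);
  const second = digits.slice(matcher.lastIndex);
  digits = first + "," + second; // insert the comma at that position
}
// digits is now "1,234,567.89"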
Here is a more intuitive approach that replaces one match at a time:
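const number = 1234567.891; // example input (assumed for illustration)
let digits = number.toFixed(2); // "1234567.89"
const matcher = /(\d+)(\d{3})/; // no g flag: each replace() handles one match
// Each pass splits the last three digits off the first run of four or more digits.
while (matcher.test(digits)) {
  digits = digits.replace(matcher, "$1,$2");
}
// digits is now "1,234,567.89"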
Let's review the results again:
Name | Ops/s | Relative |
---|---|---|
Zero-length /\B(?=(\d{3})+\b)/g | 1778943 | fastest |
Matching using zero-length matches (without \B) | 1712701 | 3.72% slower |
While loop | 1371453 | 22.91% slower |
Simple loop | 597173.88 | 66.43% slower |
Intl.NumberFormat | 25304.89 | 98.55% slower |
The fastest method is still matching using zero-length matches, followed by the while loop. The slowest method is still `Intl.NumberFormat`. If you are interested in the test results, you can try them out in the link.
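If you would rather get a rough feel for the numbers locally, a very small timing harness could look like the sketch below. It is not a substitute for jsbench: the helper `opsPerSecond`, the sample input, and the choice to construct `Intl.NumberFormat` on every call are all assumptions, and the results will vary by machine and engine.

```javascript
// Rough ops/s estimate: run fn repeatedly for about durationMs milliseconds.
function opsPerSecond(fn, durationMs = 100) {
  const start = Date.now();
  let runs = 0;
  while (Date.now() - start < durationMs) {
    fn();
    runs += 1;
  }
  const elapsedMs = Date.now() - start;
  return Math.round((runs / elapsedMs) * 1000);
}

// Hypothetical candidates; each formats the same sample number.
const candidates = {
  zeroLength: () => "1234567".replace(/\B(?=(\d{3})+\b)/g, ","),
  capturedDigit: () => "1234567".replace(/(\d)(?=(\d{3})+\b)/g, "$1,"),
  intl: () => new Intl.NumberFormat("en-US").format(1234567),
};

for (const [name, fn] of Object.entries(candidates)) {
  console.log(name, opsPerSecond(fn), "ops/s");
}
```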
Conclusion
There is a lot to learn about regular expressions, and concepts like lookahead and word boundary are not often mentioned, so I have summarized them here. Many concepts are well documented in the MDN documentation, and the website Regex101 provides a visual representation of regular expressions, making them easier to understand. However, I still find regular expressions difficult to understand.