# Database Primary Keys: AUTO_INCREMENT, UUID, and UUIDv7

Should the primary key use auto increment or UUID? Recently, while reading related discussions, I noticed that the answer to this question seems to have gradually converged on UUIDv7. But even if the consensus is settling, the trade-offs and context behind it are still worth understanding properly; otherwise, you’re just following a trend in a different way.
## TL;DR
- UUIDv4 brings almost no benefit
- If you’re certain it’s a single-database architecture, the ID is not exposed externally, and there will never be a need to merge data across databases in the future, Auto Increment is the simplest choice
- In all other cases, use UUIDv7
## What Is UUID?
UUID (Universally Unique Identifier) is a 128-bit identifier. The standard format looks like this:

```
550e8400-e29b-41d4-a716-446655440000
```
It has 36 characters, split by - into 5 parts (8-4-4-4-12). This segmented format comes from the earliest UUIDv1, where each segment corresponded to a different internal field. By v4 and v7, most fields are filled with random values, but the format is preserved for backward compatibility.
There are several UUID versions; the ones discussed in this post are mainly:
- UUIDv4¹: 122 bits of pure randomness, with no structure at all. It is currently the most widely used version, but it is not very database-friendly.
- UUIDv7²: the first 48 bits store a millisecond-level timestamp, and the remaining roughly 74 bits store randomness. It balances time ordering and randomness, and is the current recommended choice.
The two formats look exactly the same; the difference is in their internal structure. You can tell them apart by the first character of the third segment: 4xxx is v4, 7xxx is v7.

```
UUIDv4: 550e8400-e29b-41d4-a716-446655440000
                      ^
UUIDv7: 018f3a3b-7a5d-7a3b-8b3a-3b7a5d7a3b8b
                      ^
```
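In Python, the standard library can do this check for you: `uuid.UUID` parses the string, and the `version` property reads that nibble directly.

```python
import uuid

examples = [
    "550e8400-e29b-41d4-a716-446655440000",  # the v4 example above
    "018f3a3b-7a5d-7a3b-8b3a-3b7a5d7a3b8b",  # the v7 example above
]

for s in examples:
    u = uuid.UUID(s)
    # The version nibble is the first hex digit of the third segment
    print(s.split("-")[2][0], "->", u.version)
```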
## Why Are Primary Keys So Important?
Depending on the use case, primary keys affect write performance, query efficiency, whether the system can scale in the future, and whether you’ll run into landmines during migrations later on. From the benchmark below, the difference can even exceed sevenfold.
## Auto Increment vs UUID: What’s the Difference?
Let’s start with the most basic decision point. When choosing a Primary Key, you can think about it from several angles:
- Insert performance
- Index performance
- Search performance
The advantage of Auto Increment is very straightforward: the database automatically increments it for you, it’s an 8-byte integer, compact, fast, and friendly to B-Tree indexes. Because it always appends to the end, INSERT performance is basically the best possible.
But Auto Increment has two fundamental limitations:
### 1. It Is Not Friendly to Distributed Architectures
In distributed systems, generating a globally unique ID is difficult. This can be solved by implementing a central ID generator to ensure every ID is globally unique, such as Twitter’s Snowflake algorithm.
However, a central ID generator introduces performance bottlenecks and a single point of failure risk. To avoid a single point of failure, multiple workers are often deployed to coordinate in practice, which increases complexity and the components that need to be maintained.
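To make the Snowflake idea concrete, here is a minimal sketch of a Snowflake-style generator. The bit widths (10 worker bits, 12 sequence bits) follow Twitter’s original design; the class and names are illustrative, not production code.

```python
import threading
import time

# Snowflake-style layout: [timestamp | worker_id | sequence]
EPOCH_MS = 1_288_834_974_657  # Twitter's custom epoch (2010-11-04)
WORKER_BITS, SEQ_BITS = 10, 12

class Snowflake:
    def __init__(self, worker_id):
        assert 0 <= worker_id < (1 << WORKER_BITS)
        self.worker_id, self.last_ms, self.seq = worker_id, -1, 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = time.time_ns() // 1_000_000
            if now == self.last_ms:
                # Same millisecond: bump the per-worker sequence
                self.seq = (self.seq + 1) & ((1 << SEQ_BITS) - 1)
                if self.seq == 0:
                    # Sequence exhausted: spin until the next millisecond
                    while now <= self.last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS)
                    | self.worker_id << SEQ_BITS
                    | self.seq)
```

Note that uniqueness here depends entirely on each worker holding a distinct `worker_id`, which is exactly the part that requires central coordination.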
### 2. Predictability
If order IDs are sequential, changing the number in a URL to 1024 or 1022 might let you see someone else’s orders (if access control is not implemented properly). Competitors can estimate your user base or transaction volume from how quickly IDs increase. (You can also use this in reverse as a deception tactic.)
UUID (represented here by UUIDv4) solves these two problems: with a 128-bit random space, any node can generate IDs independently, without central coordination, and they are almost impossible to guess.
UUIDv4 is completely random, but when inserted into a B-Tree, it ends up inserting all over pages that are already full, triggering a lot of page splits and significantly reducing write performance. This is why many people feel that “the database got slower” after switching to UUID.
So in practice, the key questions are:
- Does the system need distributed ID generation?
- Will the ID be exposed externally (URL, API response)?
- How high are the requirements for database write performance?
If your system is a monolith and the ID is not exposed externally, auto increment is perfectly fine. But as soon as distributed generation is involved, or the ID appears in URLs, a UUID-based solution is almost necessary.
## What About Collisions?
This is probably a question that comes up every time UUID is discussed.
UUIDv4 has a 122-bit random space, with about 5.3 × 10³⁶ possible combinations.
To reach a 50% collision probability, you would need to generate about 2.7 × 10¹⁸ UUIDs — roughly the scale of “producing 1 billion UUIDs per second worldwide for 85 continuous years.”
UUIDv7’s random space “looks like” it shrinks from 122 bits to about 74 bits, but collisions can only happen within the same millisecond. In other words, only UUIDs generated within the same millisecond need the 74-bit random part to avoid collisions, and 74 bits gives about 1.8 × 10²² possible combinations — more than enough for a single millisecond.
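You can sanity-check these numbers with the birthday-bound approximation: roughly a 50% collision chance after about 1.177 · √N draws from a space of N values.

```python
import math

N_V4 = 2 ** 122              # UUIDv4 random space
print(f"{N_V4:.1e}")         # ~5.3e+36 combinations

# Birthday bound: ~50% collision chance after sqrt(2 ln 2) * sqrt(N) draws
n_50 = math.sqrt(2 * math.log(2)) * math.sqrt(N_V4)
print(f"{n_50:.1e}")         # ~2.7e+18 UUIDs

# At 1 billion UUIDs per second worldwide:
years = n_50 / 1e9 / (3600 * 24 * 365)
print(years)                 # on the order of 85 years

# UUIDv7: 74 random bits, but only within a single millisecond
print(f"{2 ** 74:.1e}")      # ~1.9e+22 combinations per ms
```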
## UUIDv7 vs created_at + index: Real Benchmark Results
The most attractive feature of UUIDv7 is this: the first 48 bits are a millisecond-level Unix timestamp, so it is naturally sorted by time.
```
UUIDv7 (128 bits)
┌──────────────────────┬────┬──────────────────────────────┐
│ Unix Timestamp (ms)  │ ver│            Random            │
│       48 bits        │ 4b │           ~74 bits           │
└──────────────────────┴────┴──────────────────────────────┘
  ↑ time-sortable        ↑v7  ↑ randomness prevents collisions
```
If you use UUIDv7 as the primary key, ORDER BY id is effectively the same as ORDER BY created_at. You no longer need an extra created_at column and corresponding index for time sorting.
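This works because the creation time can always be recovered from the ID itself. A small standard-library sketch that decodes the 48-bit timestamp from the v7 example used earlier:

```python
import datetime
import uuid

def uuidv7_time(u):
    """Decode the leading 48-bit millisecond Unix timestamp of a UUIDv7."""
    ms = u.int >> 80  # top 48 of the 128 bits
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)

u = uuid.UUID("018f3a3b-7a5d-7a3b-8b3a-3b7a5d7a3b8b")
print(uuidv7_time(u))  # the moment this example UUID encodes (May 2024)
```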
That sounds great, but how much better is it really? I ran a PostgreSQL 17 instance in Docker and actually tested it with 20 million rows.
### Test Setup
Three tables, same data structure, differing only in primary key strategy:
```sql
-- Approach A: UUIDv7 primary key (no extra created_at needed)
-- (uuid_generate_v7() comes from an extension, e.g. pg_uuidv7; it is not built into PG 17)
CREATE TABLE orders_uuidv7 (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v7(),
  user_id int NOT NULL,
  amount numeric(10,2) NOT NULL,
  status text NOT NULL
);

-- Approach B: Serial primary key + created_at index
CREATE TABLE orders_serial (
  id bigserial PRIMARY KEY,
  user_id int NOT NULL,
  amount numeric(10,2) NOT NULL,
  status text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX idx_orders_serial_created_at ON orders_serial (created_at);

-- Approach C: UUIDv4 primary key (comparison group)
CREATE TABLE orders_uuidv4 (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id int NOT NULL,
  amount numeric(10,2) NOT NULL,
  status text NOT NULL
);
```
Test environment:

- Mac Mini M4
- Docker PostgreSQL 17
- shared_buffers = 256MB
Each round inserted 1 million rows, for 20 rounds total, reaching 20 million rows. UUIDs were generated on the fly during each round (not pre-generated), so the cost of the generation function is included.
### Test 1: INSERT Performance Trend (per 1 million rows, unit: ms)
| Cumulative Rows | Serial | UUIDv7 | UUIDv4 |
|---|---|---|---|
| 1M | 1,132 | 3,185 | 2,998 |
| 5M | 1,474 | 2,661 | 2,972 |
| 10M | 1,165 | 2,560 | 6,497 |
| 13M | 1,123 | 2,745 | 9,668 |
| 15M | 1,411 | 2,630 | 9,188 |
| 16M | 1,133 | 2,552 | 15,857 |
| 20M | 1,017 | 2,539 | 20,845 |
The important thing is not the absolute numbers, but the trend.
UUIDv4 jumped from 3 seconds all the way up to 21 seconds. This is because the larger the B-Tree index gets, the higher the chance that a random insert hits a page no longer cached in the buffer pool. Each such INSERT may then trigger: reading the old page back from disk into the buffer pool → page split → writing back to disk.
For each round of 1 million inserted rows, the WAL volume directly reflects the database’s actual disk write burden:
| Cumulative Rows | Serial WAL | UUIDv7 WAL | UUIDv4 WAL |
|---|---|---|---|
| 1M | 146 MB | 171 MB | 744 MB |
| 5M | 147 MB | 171 MB | 1,227 MB |
| 8M | 148 MB | 171 MB | 1,715 MB |
| 10M | 147 MB | 171 MB | 2,418 MB |
Serial and UUIDv7 WAL volume stayed constant from start to finish (~147 vs ~171 MB). The difference between them (24 MB) is simply the size difference between UUID’s 16 bytes and bigint’s 8 bytes.
But UUIDv4’s WAL ballooned from 744 MB to 2,418 MB — for the same 1 million-row insert, by round 10 the WAL write volume was already 14 times that of UUIDv7.
This is high write amplification³.
UUIDv7 stayed stable at 2.5–2.8 seconds throughout, with no degradation at all. Its time-ordered nature means new keys are almost always appended to the rightmost leaf page of the B-Tree, just like auto-increment — the database can write sequentially, with very few page splits.
Serial stayed stable at 1.0–1.5 seconds.
Serial is unquestionably the fastest, but the difference between UUIDv7 and Serial mainly comes from:
- UUID itself is 16 bytes, twice the size of bigint’s 8 bytes, so each index page can hold fewer entries
- In real applications, UUIDs are often generated in the application layer rather than by a database function, which narrows the gap further (the cost of generating the UUID is shifted to the application side)
As for whether UUIDs should be generated inside PostgreSQL, I used to like the idea of “letting the database do everything the database can do,” because it saves effort where possible. However, once part of the business logic lives inside the database, it becomes harder to notice later. So nowadays I lean more toward generating UUIDs in the application layer.
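For illustration, here is a minimal application-side UUIDv7 sketch following the RFC 9562 layout. Recent Python versions ship `uuid.uuid7()` in the standard library; this hand-rolled version has no monotonic counter, so ordering within a single millisecond is random.

```python
import os
import time
import uuid

def uuid7():
    """Minimal UUIDv7: 48-bit ms timestamp + version/variant bits + random fill."""
    ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ms & (2 ** 48 - 1)) << 80 | rand
    value &= ~(0xF << 76)   # clear the version nibble...
    value |= 0x7 << 76      # ...and set it to 7
    value &= ~(0x3 << 62)   # clear the two variant bits...
    value |= 0x2 << 62      # ...and set them to 0b10 (RFC variant)
    return uuid.UUID(int=value)

u = uuid7()
print(u, u.version)  # version is 7
```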
### Test 2: Storage Space (20M rows)
| Option | Heap | Index | Total Size |
|---|---|---|---|
| Serial | 1,149 MB | 428 MB | 1,578 MB |
| UUIDv7 | 1,302 MB | 749 MB | 2,051 MB |
| UUIDv4 | 1,302 MB | 770 MB | 2,072 MB |
The UUID-based heap is 153 MB larger than Serial (because each row’s PK grows from 8 bytes to 16 bytes), and the index is about 1.75 times larger, but in terms of storage space, there is no dramatic difference.
### Test 3: Cursor-Based Pagination
The principle of cursor-based pagination is to use the last item of the previous page as the “cursor,” and query the next page with `WHERE id > :cursor ORDER BY id LIMIT N`. Compared with OFFSET, cursor-based pagination maintains stable performance no matter how deep you go.
At a scale of 1.1 million rows, I tested three scenarios, each run 100 times, with the cursor positioned at row 900,000 (simulating deep pagination):
| Query Method | Total Time for 100 Runs |
|---|---|
| Serial + OFFSET 900000 (comparison group) | 7,012 ms |
| Serial + cursor on created_at index | 1.9 ms |
| UUIDv7 + cursor on PK | 1.6 ms |
A few observations:

- With UUIDv7, cursor pagination can use the primary key directly; no extra index is needed. If the Serial approach wants to do time-sorted cursor pagination, it has to use the `created_at` index, which is a secondary index: the query must first scan the index and then fetch the row from the heap (index scan → heap fetch).
- UUIDv7 is slightly faster than the `created_at` index, because UUIDv7’s cursor uses the primary key, while `created_at` is a secondary index.
- OFFSET is a disaster for deep pagination. 7 seconds versus less than 2 milliseconds is a ~3,600x difference. OFFSET has to scan and discard all preceding rows. This conclusion is less directly related to this article, but if you’re building pagination, don’t use `OFFSET`; it can easily drag down performance.
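To make the pattern concrete, here is a keyset-pagination sketch. SQLite stands in for PostgreSQL, and the table and zero-padded hex IDs are illustrative; any key that sorts in creation order, like UUIDv7, works the same way.

```python
import sqlite3

# Keyset pagination: the last id of one page becomes the cursor for the next.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount REAL)")
# Zero-padded hex ids sort lexicographically in insertion order, like UUIDv7
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(f"{i:032x}", i * 1.0) for i in range(1, 101)])

def next_page(cursor_id=None, limit=10):
    """Fetch the page after cursor_id using WHERE id > ? ORDER BY id LIMIT ?."""
    if cursor_id is None:
        rows = conn.execute("SELECT id FROM orders ORDER BY id LIMIT ?", (limit,))
    else:
        rows = conn.execute(
            "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (cursor_id, limit))
    return [r[0] for r in rows]

page1 = next_page()
page2 = next_page(cursor_id=page1[-1])  # no OFFSET, no scanning of skipped rows
```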
## Summary
The advantages of using UUIDv7 as a primary key are:

- No degradation in write performance: just like auto-increment, it is sequential append and does not get slower as data grows. UUIDv4 was already 7 times slower at 20 million rows, and it only gets worse.
- One index does two jobs: it is both the primary key constraint and the basis for time ordering. You can save the `created_at` column and its index, and cursor-based pagination can be handled directly with `WHERE id > :last_id ORDER BY id` (in practice, you may still need it).
If your query pattern depends heavily on time ordering (for example, feeds, timelines, or order lists), UUIDv7 can make the schema cleaner and reduce the number of indexes.
But if you need creation time accurate to the microsecond, or if created_at has business meaning (for example, time-range queries are required), then you should still keep the created_at column. UUIDv7’s timestamp precision is only milliseconds, and ordering within the same millisecond is determined by randomness, so strict insertion order is not guaranteed.
## What About External URLs?
Using UUIDv7 as an internal primary key is fine, but exposing it directly in URLs has a few drawbacks: it is long (36 characters), and the timestamp part still leaks creation time.
Common approaches include:

1. Encode the 16-byte UUID with Base62 or Base58, shortening it to 21–22 characters. The underlying data does not change; this is just a shorter representation. Base58 additionally excludes confusing characters like `0`/`O` and `I`/`l`, which is friendlier for manual entry.
2. Use a separate external ID: generate another NanoID that is completely unrelated to the internal ID. NanoID is pure randomness, with no time structure. The default 21-character length is about 126 bits of entropy, giving collision resistance on the same level as UUIDv4.
I personally prefer approach 2. The reason is that although Base62/Base58 shortens the string, it is still the same UUID underneath, and anyone who wants to can decode it and recover the timestamp. NanoID is completely decoupled from the internal ID, so even if the external ID is obtained, it does not reveal anything about the database. The trade-off is that you need one more column and index to store the external ID, but this trade-off is usually worth it.
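A sketch of both options, standard library only. The Base62 alphabet and the NanoID-style generator below are illustrative; the real NanoID library defaults to a 64-symbol alphabet (`A-Za-z0-9_-`), which is what gives 21 characters about 126 bits of entropy.

```python
import secrets
import uuid

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def uuid_to_base62(u):
    """Re-encode the same 128 bits as at most 22 Base62 chars (still decodable)."""
    n, out = u.int, []
    while n:
        n, r = divmod(n, 62)
        out.append(BASE62[r])
    return "".join(reversed(out)) or BASE62[0]

# NanoID-style: 64 symbols, purely random, unrelated to any internal ID
NANO_ALPHABET = BASE62 + "-_"

def nano_id(size=21):
    """Generate a random external ID (~126 bits of entropy at 21 chars)."""
    return "".join(secrets.choice(NANO_ALPHABET) for _ in range(size))

print(uuid_to_base62(uuid.UUID("550e8400-e29b-41d4-a716-446655440000")))
print(nano_id())
```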
## What If You Really Need “Absolutely No Duplicates”?
UUIDs (no matter the version) are only unique on a probabilistic basis — the chance of collision is extremely low, but theoretically not zero. For most systems this is completely sufficient, but some scenarios really do require a 100% guarantee: financial transaction numbers, invoice numbers, medical record IDs, and so on.
In those cases, you need to return to a “centralized” model — one single authoritative source is responsible for issuing IDs. Common approaches include:

- Database sequence + distributed lock. The most direct approach: use PostgreSQL’s `SEQUENCE` or Redis’s `INCR` together with distributed locks to ensure global uniqueness. The downside is the risk of a single point of failure, and lock contention can become a bottleneck under high concurrency.
- Twitter Snowflake and its variants. As mentioned earlier, Snowflake is also centralized in essence — assigning Worker IDs requires central coordination (ZooKeeper / etcd). Baidu’s UidGenerator and Sony’s Sonyflake follow similar ideas, only with adjustments to bit allocation or Worker ID management.
- Database-native distributed IDs. NewSQL databases such as CockroachDB and TiDB have built-in distributed unique ID generation mechanisms, solving the problem at the database layer without requiring extra handling in the application layer.
No matter which option you choose, the essence is trading availability or operational complexity for absolute uniqueness. If your business scenario does not have legal or compliance requirements that mandate uniqueness, the probabilistic uniqueness of UUIDv7 is enough.
## Other Small Facts

- `00000000-0000-0000-0000-000000000000` is a valid UUID called the Nil UUID⁴
- `FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF` is a valid UUID called the Max UUID
- Don’t write your own UUID generator function
Looking at real incident cases, failures are often not caused by collisions; nine times out of ten, it’s just that the application side wasn’t implemented properly. Sigh—if you take that into account too, maybe the safest choice is still Serial.
## Footnotes

1. https://www.rfc-editor.org/rfc/rfc9562#name-uuid-version-4
2. https://www.rfc-editor.org/rfc/rfc9562#name-uuid-version-7
3. A single insert may cause multiple pages to be rewritten, so the bytes physically written can significantly exceed the size of the inserted data