Database Primary Keys: AUTO_INCREMENT, UUID, and UUIDv7

Dev Note

This article was auto-translated from Chinese. Some nuances may be lost in translation.

Should the primary key use auto increment or UUID? Recently, while reading related discussions, I noticed that the answer to this question seems to have gradually converged on UUIDv7. But even if the consensus is settling, the trade-offs and context behind it are still worth understanding properly; otherwise, you’re just following a trend in a different way.

TL;DR

  • UUIDv4 brings almost no benefit over UUIDv7
  • If you’re certain it’s a single-database architecture, the ID is not exposed externally, and there will never be a need to merge data across databases in the future, Auto Increment is the simplest choice
  • In all other cases, use UUIDv7

What Is UUID?

UUID (Universally Unique Identifier) is a 128-bit identifier. The standard format looks like this:

550e8400-e29b-41d4-a716-446655440000

It has 36 characters, split by - into 5 parts (8-4-4-4-12). This segmented format comes from the earliest UUIDv1, where each segment corresponded to a different internal field. By v4 and v7, most fields are filled with random values, but the format is preserved for backward compatibility.

There are several UUID versions; the ones discussed in this post are mainly:

  • UUIDv4 [1]: 122 bits of pure randomness, with no structure at all. It is currently the most widely used version, but it is not very database-friendly.
  • UUIDv7 [2]: the first 48 bits store a millisecond-level timestamp, and the remaining roughly 74 bits store randomness. It balances time ordering and randomness, and is the current recommended choice.

The two look exactly the same; the difference is in their internal structure. You can tell them apart by the first character of the third segment: 4xxx is v4, 7xxx is v7.

UUIDv4: 550e8400-e29b-41d4-a716-446655440000
                      ^
UUIDv7: 018f3a3b-7a5d-7a3b-8b3a-3b7a5d7a3b8b
                      ^
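The version check described above is purely mechanical; a small sketch, using the two sample UUIDs from this article:

```python
def uuid_version(u: str) -> int:
    # The version nibble is the first hex digit of the third dash-separated group
    return int(u.split("-")[2][0], 16)

print(uuid_version("550e8400-e29b-41d4-a716-446655440000"))  # 4
print(uuid_version("018f3a3b-7a5d-7a3b-8b3a-3b7a5d7a3b8b"))  # 7
```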

Why Are Primary Keys So Important?

Depending on the use case, primary keys affect write performance, query efficiency, whether the system can scale in the future, and whether you’ll run into landmines during migrations later on. From the benchmark below, the difference can even exceed sevenfold.

Auto Increment vs UUID: What’s the Difference?

Let’s start with the most basic decision point. When choosing a Primary Key, you can think about it from several angles:

  • Insert performance
  • Index performance
  • Search performance

The advantage of Auto Increment is very straightforward: the database automatically increments it for you, it’s an 8-byte integer, compact, fast, and friendly to B-Tree indexes. Because it always appends to the end, INSERT performance is basically the best possible.

But Auto Increment has two fundamental limitations:

1. It Is Not Friendly to Distributed Architectures

In distributed systems, generating a globally unique ID is difficult. This can be solved by implementing a central ID generator to ensure every ID is globally unique, such as Twitter’s Snowflake algorithm.

However, a central ID generator introduces performance bottlenecks and a single point of failure risk. To avoid a single point of failure, multiple workers are often deployed to coordinate in practice, which increases complexity and the components that need to be maintained.
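To make the coordination cost concrete, here is a minimal Snowflake-style generator in Python. This is a sketch of the idea, not Twitter's implementation; it omits sequence-overflow handling (waiting for the next millisecond) and clock-rollback protection:

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657  # Twitter's custom epoch; any fixed epoch works

class Snowflake:
    """64-bit ids: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # the 10-bit worker id must be assigned centrally
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            ms = time.time_ns() // 1_000_000 - EPOCH_MS
            if ms == self.last_ms:
                self.seq += 1  # up to 4096 ids per worker per millisecond
            else:
                self.last_ms, self.seq = ms, 0
            return (ms << 22) | (self.worker_id << 12) | self.seq

gen = Snowflake(worker_id=1)
a, b = gen.next_id(), gen.next_id()
print(b > a)  # True: ids are time-ordered per worker
```

The 10-bit worker id is exactly the part that needs central coordination (ZooKeeper, etcd, or static configuration), which is the operational complexity mentioned above.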

2. Predictability

Suppose an order URL contains the ID 1023. Change the number to 1024 or 1022 and you might see someone else's orders (if access control is not implemented properly). Competitors can also estimate your user base or transaction volume from how quickly IDs grow. (You can even use this in reverse as a deception tactic.)

UUID (represented here by UUIDv4) solves these two problems: with a 128-bit random space, any node can generate IDs independently, without central coordination, and they are almost impossible to guess.

UUIDv4 is completely random, but when inserted into a B-Tree, it ends up inserting all over pages that are already full, triggering a lot of page splits and significantly reducing write performance. This is why many people feel that “the database got slower” after switching to UUID.

So in practice, the key questions are:

  • Does the system need distributed ID generation?
  • Will the ID be exposed externally (URL, API response)?
  • How high are the requirements for database write performance?

If your system is a monolith and the ID is not exposed externally, auto increment is perfectly fine. But as soon as distributed generation is involved, or the ID appears in URLs, a UUID-based solution is almost necessary.

What About Collisions?

This is probably a question that comes up every time UUID is discussed.

UUIDv4 has a 122-bit random space, with about 5.3 × 10^36 possible combinations.

To reach a 50% collision probability, you would need to generate about 2.7 × 10^18 UUIDs — roughly the scale of “producing 1 billion UUIDs per second worldwide for 85 continuous years.”

UUIDv7’s random space “looks like” it shrinks from 122 bits to about 74 bits, but collisions can only happen within the same millisecond. In other words, only UUIDs generated within the same millisecond need the 74-bit random part to avoid collisions, and 74 bits gives about 1.8 × 10^22 possible combinations — more than enough for a single millisecond.
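The numbers above follow from the birthday bound n ≈ √(2·N·ln 2); a quick sanity check in Python (the 1-billion-per-second rate is the article's illustration, not a real workload):

```python
import math

def n_for_50_percent_collision(bits: int) -> float:
    # Birthday approximation: collision probability reaches 50% after ~sqrt(2 * N * ln 2) draws
    return math.sqrt(2 * (2 ** bits) * math.log(2))

n = n_for_50_percent_collision(122)
print(f"{n:.1e} UUIDs")                            # ~2.7e+18
print(f"{n / 1e9 / (365 * 24 * 3600):.0f} years")  # ~86 years at 1e9 UUIDs/s
```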

UUIDv7 vs created_at + index: Real Benchmark Results

The most attractive feature of UUIDv7 is this: the first 48 bits are a millisecond-level Unix timestamp, so it is naturally sorted by time.

UUIDv7 (128 bits)
┌──────────────────────┬────┬──────────────────────────────┐
│  Unix Timestamp (ms) │ ver│         Random               │
│      48 bits         │ 4b │        ~74 bits              │
└──────────────────────┴────┴──────────────────────────────┘
   ↑ time-sortable      ↑v7   ↑ randomness prevents collisions
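Since the timestamp sits in the leading bits, it can be recovered directly from the string; a sketch, applied to the sample v7 UUID from earlier:

```python
import datetime

def uuidv7_time(u: str) -> datetime.datetime:
    # First 48 bits = first 12 hex digits (dashes removed) = Unix timestamp in milliseconds
    ms = int(u.replace("-", "")[:12], 16)
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)

print(uuidv7_time("018f3a3b-7a5d-7a3b-8b3a-3b7a5d7a3b8b"))  # a moment in May 2024, UTC
```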

If you use UUIDv7 as the primary key, ORDER BY id is effectively the same as ORDER BY created_at. You no longer need an extra created_at column and corresponding index for time sorting.

That sounds great, but how much better is it really? I ran a PostgreSQL 17 instance in Docker and actually tested it with 20 million rows.

Test Setup

Three tables, same data structure, differing only in primary key strategy:

-- Approach A: UUIDv7 primary key (no extra created_at needed)
-- Note: uuid_generate_v7() is not built into PostgreSQL 17; it comes from an
-- extension such as pg_uuidv7 (PostgreSQL 18 adds a native uuidv7())
CREATE TABLE orders_uuidv7 (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v7(),
  user_id int NOT NULL,
  amount numeric(10,2) NOT NULL,
  status text NOT NULL
);

-- Approach B: Serial primary key + created_at index
CREATE TABLE orders_serial (
  id bigserial PRIMARY KEY,
  user_id int NOT NULL,
  amount numeric(10,2) NOT NULL,
  status text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX idx_orders_serial_created_at ON orders_serial (created_at);

-- Approach C: UUIDv4 primary key (comparison group)
CREATE TABLE orders_uuidv4 (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id int NOT NULL,
  amount numeric(10,2) NOT NULL,
  status text NOT NULL
);

Test environment:

  • Mac Mini M4
  • Docker PostgreSQL 17
  • shared_buffers = 256MB

Each round inserted 1 million rows, for 20 rounds total, reaching 20 million rows. UUIDs were generated on the fly during each round (not pre-generated), so the cost of the generation function is included.

Test 1: INSERT Performance Trend (per 1 million rows, unit: ms)

| Cumulative Rows | Serial | UUIDv7 | UUIDv4 |
|-----------------|--------|--------|--------|
| 1M              | 1,132  | 3,185  | 2,998  |
| 5M              | 1,474  | 2,661  | 2,972  |
| 10M             | 1,165  | 2,560  | 6,497  |
| 13M             | 1,123  | 2,745  | 9,668  |
| 15M             | 1,411  | 2,630  | 9,188  |
| 16M             | 1,133  | 2,552  | 15,857 |
| 20M             | 1,017  | 2,539  | 20,845 |

The important thing is not the absolute numbers, but the trend.

UUIDv4 jumped from 3 seconds all the way up to 21 seconds.

Because the larger the B-Tree index gets, the higher the chance that a random insert will hit a page that has already been flushed to disk. Each INSERT may then trigger: reading the old page back from disk into the buffer pool → page split → writing back to disk.

For each round of 1 million inserted rows, the WAL volume directly reflects the database’s actual disk write burden:

| Cumulative Rows | Serial WAL | UUIDv7 WAL | UUIDv4 WAL |
|-----------------|------------|------------|------------|
| 1M              | 146 MB     | 171 MB     | 744 MB     |
| 5M              | 147 MB     | 171 MB     | 1,227 MB   |
| 8M              | 148 MB     | 171 MB     | 1,715 MB   |
| 10M             | 147 MB     | 171 MB     | 2,418 MB   |

Serial and UUIDv7 WAL volume stayed constant from start to finish (~147 vs ~171 MB). The difference between them (24 MB) is simply the size difference between UUID’s 16 bytes and bigint’s 8 bytes.

But UUIDv4’s WAL ballooned from 744 MB to 2,418 MB — for the same 1 million-row insert, by round 10 the WAL write volume was already 14 times that of UUIDv7.

This is high write amplification [3].

UUIDv7 stayed stable at 2.5–2.8 seconds throughout, with no degradation at all.

Because its time-ordered nature means new keys are almost always appended to the rightmost leaf page of the B-Tree, just like auto-increment — the database can write sequentially, with very few page splits.

Serial stayed stable at 1.0–1.5 seconds.

Serial is unquestionably the fastest, but the difference between UUIDv7 and Serial mainly comes from:

  1. UUID itself is 16 bytes, twice the size of bigint’s 8 bytes, so each index page can hold fewer entries
  2. In real applications, UUIDs are often generated in the application layer with the language’s native library, which narrows the gap further (the cost of generating the UUID shifts to the application side)

As for whether PostgreSQL should generate UUIDs using native functions, I used to like the idea of “letting the database do everything the database can do,” because it saves effort where possible.

However, once part of the business logic is written inside the database, it becomes harder to notice later. So nowadays I lean more toward generating UUIDs in the application layer.
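For illustration of the bit layout only (in production, use a maintained library rather than hand-rolling, as this article notes later; newer Python versions, 3.14+, add a native uuid.uuid7()), here is a minimal RFC 9562 UUIDv7 sketch:

```python
import os
import time
import uuid

def uuidv7() -> uuid.UUID:
    # RFC 9562 layout: 48-bit ms timestamp | 4-bit version | ~74 random bits (split by variant)
    ms = (time.time_ns() // 1_000_000) & (2 ** 48 - 1)
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits; 6 are overwritten below
    value = (ms << 80) | rand
    value = (value & ~(0xF << 76)) | (0x7 << 76)   # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)   # variant bits = 10 (RFC 9562)
    return uuid.UUID(int=value)

u = uuidv7()
print(u, u.version)  # the third segment starts with 7
```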

Test 2: Storage Space (20M rows)

| Option | Heap     | Index  | Total Size |
|--------|----------|--------|------------|
| Serial | 1,149 MB | 428 MB | 1,578 MB   |
| UUIDv7 | 1,302 MB | 749 MB | 2,051 MB   |
| UUIDv4 | 1,302 MB | 770 MB | 2,072 MB   |

The UUID-based heap is 153 MB larger than Serial (because each row’s PK grows from 8 bytes to 16 bytes), and the index is about 1.75 times larger, but in terms of storage space, there is no dramatic difference.

Test 3: Cursor-Based Pagination

The principle of cursor-based pagination is to use the last item of the previous page as the “cursor,” and query the next page with WHERE key > :cursor ORDER BY key LIMIT N. Compared with OFFSET, cursor-based pagination maintains stable performance no matter how deep you go.
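The mechanics can be modelled on an in-memory sorted list (a sketch; next_page is a hypothetical helper standing in for the SQL query above):

```python
def next_page(rows: list[dict], cursor, limit: int):
    # rows are sorted by "id", as a B-Tree primary key would be;
    # the cursor is the last id seen on the previous page
    after = [r for r in rows if cursor is None or r["id"] > cursor]
    page = after[:limit]
    return page, (page[-1]["id"] if page else cursor)

rows = [{"id": i} for i in range(1, 8)]
page, cur = next_page(rows, None, 3)
print([r["id"] for r in page])  # [1, 2, 3]
page, cur = next_page(rows, cur, 3)
print([r["id"] for r in page])  # [4, 5, 6]
```

The database version does the "skip everything up to the cursor" step with a single index seek instead of a scan, which is exactly why it stays fast at any depth.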

At a scale of 1.1 million rows, I tested three scenarios, each run 100 times, with the cursor positioned at row 900,000 (simulating deep pagination):

| Query Method                               | Total Time for 100 Runs |
|--------------------------------------------|-------------------------|
| Serial + OFFSET 900000 (comparison group)  | 7,012 ms                |
| Serial + cursor on created_at index        | 1.9 ms                  |
| UUIDv7 + cursor on PK                      | 1.6 ms                  |

A few observations:

  1. With UUIDv7, cursor pagination can use the primary key directly; no extra index is needed. With the Serial approach, time-sorted cursor pagination has to go through the created_at index, which is a secondary index: the query first scans the index, then fetches each row from the heap (index scan → heap fetch).
  2. UUIDv7 is slightly faster than the created_at approach, because its cursor rides on the primary key index, while created_at is a secondary index.
  3. OFFSET is a disaster for deep pagination. 7 seconds versus less than 2 milliseconds is roughly a 3,600x difference, because OFFSET has to scan and discard all preceding rows. This conclusion is only loosely related to this article, but if you’re building pagination, don’t use OFFSET; it can easily drag down performance.

Summary

The advantages of using UUIDv7 as a primary key are:

  1. No degradation in write performance: just like auto-increment, it is sequential append and does not get slower as data grows. UUIDv4 was already 7 times slower at 20 million rows, and it only gets worse.
  2. One index does two jobs: it is both the primary key constraint and the basis for time ordering. You can drop the created_at column and its index, and handle cursor-based pagination directly with WHERE id > :last_id ORDER BY id (though in practice you may still want created_at; see the caveat below)

If your query pattern depends heavily on time ordering (for example, feeds, timelines, or order lists), UUIDv7 can make the schema cleaner and reduce the number of indexes.

But if you need creation time accurate to the microsecond, or if created_at has business meaning (for example, time-range queries are required), then you should still keep the created_at column. UUIDv7’s timestamp precision is only milliseconds, and ordering within the same millisecond is determined by randomness, so strict insertion order is not guaranteed.

What About External URLs?

Using UUIDv7 as an internal primary key is fine, but exposing it directly in URLs has a few drawbacks: it is long (36 characters), and the timestamp part still leaks creation time.

Common approaches include:

  1. Encode the 16-byte UUID with Base62 or Base58, shortening it to 21–22 characters. The underlying data does not change; this is just a shorter representation. Base58 additionally excludes confusing characters like 0/O and I/l, which is friendlier for manual entry
  2. Use a separate external ID: generate another NanoID that is completely unrelated to the internal ID. NanoID is pure randomness, with no time structure. The default 21-character length is about 126 bits of entropy, giving collision resistance on the same level as UUIDv4.

I personally prefer approach 2. The reason is that although Base62/Base58 shortens the string, it is still the same UUID underneath, and anyone who wants to can decode it and recover the timestamp. NanoID is completely decoupled from the internal ID, so even if the external ID is obtained, it does not reveal anything about the database. The trade-off is that you need one more column and index to store the external ID, but this trade-off is usually worth it.
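Approach 1 is a pure re-encoding; a sketch of Base62 (the alphabet ordering here is a convention, not a standard, so different libraries may produce different strings for the same UUID):

```python
import string
import uuid

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 chars

def uuid_to_base62(u: uuid.UUID) -> str:
    # Re-encode the 128-bit integer in base 62; 62**22 > 2**128, so at most 22 chars
    n, out = u.int, []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out)) or "0"

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(uuid_to_base62(u), len(uuid_to_base62(u)))  # a 22-character string
```

Decoding the 22 characters back through the same alphabet recovers the original 128-bit integer, which is exactly why this representation still leaks a v7 timestamp.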

What If You Really Need “Absolutely No Duplicates”?

UUIDs (no matter the version) are only unique on a probabilistic basis — the chance of collision is extremely low, but theoretically not zero. For most systems this is completely sufficient, but some scenarios really do require a 100% guarantee: financial transaction numbers, invoice numbers, medical record IDs, and so on.

In those cases, you need to return to a “centralized” model — one single authoritative source is responsible for issuing IDs. Common approaches include:

  • Database sequence + distributed lock. The most direct approach: use PostgreSQL’s SEQUENCE or Redis’s INCR together with distributed locks to ensure global uniqueness. The downside is the risk of a single point of failure, and lock contention can become a bottleneck under high concurrency.

  • Twitter Snowflake and its variants. As mentioned earlier, Snowflake is also centralized in essence — assigning Worker IDs requires central coordination (ZooKeeper / etcd). Baidu’s UidGenerator and Sony’s Sonyflake follow similar ideas, only with adjustments to bit allocation or Worker ID management.

  • Database-native distributed IDs. NewSQL databases such as CockroachDB and TiDB have built-in distributed unique ID generation mechanisms, solving the problem at the database layer without requiring extra handling in the application layer.

No matter which option you choose, the essence is trading availability or operational complexity for absolute uniqueness. If your business scenario does not have legal or compliance requirements that mandate uniqueness, the probabilistic uniqueness of UUIDv7 is enough.

Other Small Facts

  • 00000000-0000-0000-0000-000000000000 is a valid UUID called the Nil UUID [4]
  • FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF is a valid UUID called the Max UUID
  • Don’t write your own UUID generator function

Looking at real incident cases, failures are often not caused by collisions; nine times out of ten, it’s just that the application side wasn’t implemented properly. Sigh—if you take that into account too, maybe the safest choice is still Serial.

Footnotes

  1. https://www.rfc-editor.org/rfc/rfc9562#name-uuid-version-4

  2. https://www.rfc-editor.org/rfc/rfc9562#name-uuid-version-7

  3. A single insert may cause multiple pages to be rewritten, significantly exceeding the amount of data written

  4. https://www.rfc-editor.org/rfc/rfc9562.html#name-nil-uuid