The Evolution of Primary Keys in PostgreSQL: From Serial to Identity
Welcome back. This is the following post on Mastering Postgres series. If you skipped earlier posts, I recommend checking them out where we explored topics, such as Check Constraint, Custom Data Types, Enums, etc. Today, we will discuss Serial and Identity in PostgreSQL.
The Evolution of Primary Keys in PostgreSQL: From Serial to Identity
In PostgreSQL, the Serial data type has long been the go-to option for generating auto-incrementing values for primary keys. It’s simple, effective, and widely used in many projects. However, as of PostgreSQL 10, this approach has been superseded by a more robust and flexible method: Identity columns.
Surprised 😅 ? So was I! For years, I also relied on Serial as the default choice for primary keys, but the PostgreSQL team has made improvements that warrant a shift in how we handle auto-incrementing keys.
In this post, we’ll take a deeper dive into:
- Serial and its evolution
- Identity columns: The new recommended way to handle auto-incrementing values
Bonus Section:
- Generated Columns: A powerful way to create columns derived from other data in your table.
By the end of this post, you’ll not only understand why we should consider moving away from Serial, but also how to implement safer and more efficient primary key strategies and explore the power of Generated Columns.
Serial Type: The Old Standard
The Serial type has been a PostgreSQL stable for auto-incrementing values, especially when used as a primary key. At its core, Serial isn’t actually a data type — it’s an alias for an INTEGER
column with a default value set by a sequence. Let’s break this down:
Common Use Case for SERIAL
:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT
);
When you declare a column with SERIAL
, PostgreSQL automatically creates a sequence to manage the auto-incrementing behavior behind the scenes. If we unwrap this, it would look something like this:
CREATE SEQUENCE users_id_seq AS INTEGER;
CREATE TABLE users (
id INTEGER NOT NULL DEFAULT nextval('users_id_seq'),
name TEXT
);ALTER SEQUENCE users_id_seq OWNED BY users.id;
Auto-increment is provided by sequence that is created automatically when we declare Serial or as in this case using users_id_seq.
Sequence is created by the table and it exists outside the scope of the table, however that sequence is tied to the column.
Hovewer, keeping using SERIAL
as PRIMARY KEY
is still fine and acceptable though a better approach exists.
One key moment to note about Serial being used as Primary Key — to use BigSerial instead of Serial. Under the hood, BigInteger will be used to store the value and it creates enough space in order to avoid running out of Primary Key space in the future in case of massive success.
If too many rows inserted, the space is ran out of and it will create much trouble.
The New Standard: Identity Columns
Starting with PostgreSQL 10, Identity columns were introduced as a more reliable and flexible alternative to Serial. Identity columns behave similarly to Serial, but with better guarantees about sequence management and more control over how the values are generated.
Here’s how to define an auto-incrementing Primary Key using Identity:
Using Identity for Primary Keys
CREATE TABLE users
(
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT
);
With this approach:
- The
id
column uses aBIGINT
to ensure that the primary key values can grow without running into issues like overflow. - The
GENERATED ALWAYS
clause ensures that values are automatically generated, and you can’t insert values manually into this column without special handling.
Inserting values
INSERT INTO users(name)
VALUES ('Bruce'),
('Wayne'),
('Phil');
SELECT * FROM users;
-- OUTPUT
+--+-----+
|id|name |
+--+-----+
|1 |Bruce|
|2 |Wayne|
|3 |Phil |
+--+-----+
Attempting to Insert an Explicit ID
If you try to manually insert a value into the id
column (which is generated automatically), PostgreSQL will block it:
INSERT INTO users(id, name)
VALUES (4, 'Baggio');
-- ERROR: cannot insert a non-DEFAULT value into column "id"
-- Detail: Column "id" is an identity column defined as GENERATED ALWAYS.
-- Hint: Use OVERRIDING SYSTEM VALUE to override.
This is a protective measure to keep the sequence in sync with the table data. If you absolutely need to insert a specific value, PostgreSQL provides a way to override the default sequence behavior, but it’s not recommended for regular use.
INSERT INTO users(id, name)
OVERRIDING SYSTEM VALUE
VALUES (4, 'Baggio');
Restoring Sync After Manual Insertion
If you insert a value manually and break the sequence, PostgreSQL won’t automatically fix it for you. You’ll need to manually reset the sequence so it continues from the correct number. This can be done like so:
SELECT setval('users_id_seq', (SELECT MAX(id) FROM users));
-- Resetting the sequence 'users_id_seq' to the current maximum value in the 'id' column
-- This ensures that the next generated value for 'id' will continue from the highest existing value.
This ensures the sequence is properly aligned with the current state of the table.
Alternative: BY DEFAULT
Identity
If you want the flexibility to manually insert values (like the Serial
approach) but still use identity for auto-increment when no value is specified, you can use GENERATED BY DEFAULT
:
CREATE TABLE users (
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
name TEXT
);
This allows you to insert values explicitly while also letting PostgreSQL handle auto-incrementation when no value is provided. However, as mentioned earlier, this can lead to sequence misalignment, so it’s generally better to avoid this unless absolutely necessary.
In summary, as you can see IDENTITY ensures more safety and strictness on inserting rows and prevents from accidentally inserting a row into table
unless you explicitly specify OVERRIDING SYSTEM VALUE. That’s definitely what you would not do accidentally.
Another interesting topic to discuss — Generated Columns
Generated Columns: Dynamic Data on the Fly
Generated columns are a feature that allows you to create columns whose values are derived from expressions or functions based on other columns in the same row. These columns are read-only, meaning you can’t insert or update values directly in them.
Postgres supports Stored Generated Columns, meaning, the column is written to disk.
For example, let’s say we have a persons
table and we want to store both height in centimeters and height in inches:
Example: Generated Column for Height Conversion
CREATE TABLE persons (
height_cm INTEGER,
height_inch INTEGER GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
Here, the height_inch
column is automatically generated based on the height_cm
column value. You can't manually insert data into the height_inch
column; it’s calculated dynamically when you insert or update a record.
INSERT INTO persons(height_cm)
VALUES (170), (179);
SELECT *
FROM persons;
-- OUTPUT
+---------+-----------+
|height_cm|height_inch|
+---------+-----------+
|170 |67 |
|179 |70 |
+---------+-----------+
If you try to insert height_inch
manually, you will get error.
INSERT INTO persons(height_cm, height_inch)
values (170, 67);
-- ERROR: cannot insert a non-DEFAULT value into column "height_inch"
-- Detail: Column "height_inch" is a generated column.
Example: Extracting Data from Strings
Another common use case for generated columns is extracting or manipulating data from other columns. For example, in an accounts
table, you might want to extract the email domain into a separate column:
CREATE TABLE accounts (
email TEXT,
email_domain TEXT GENERATED ALWAYS AS (split_part(email, '@', 2)) STORED
);
INSERT INTO accounts(email)
VALUES ('bva@yandex.ru'), ('baggio@gmail.com');
SELECT * FROM accounts;
Output:
| email | email_domain |
|---------------------|--------------|
| bva@yandex.ru | yandex.ru |
| baggio@gmail.com | gmail.com |
This is a great way to derive values without needing to store redundant data or constantly recalculate values in your application code.
Conclusion: Evolving with PostgreSQL
While Serial was a useful tool in earlier versions of PostgreSQL, the introduction of Identity columns provides a more robust and flexible solution for generating auto-incrementing values. Whether you choose Generated Always or Generated By Default, Identity columns ensure better synchronization and consistency with the database.
Additionally, Generated Columns allow you to create derived data without extra application logic, making your database smarter and more efficient.
By embracing these newer features, you’ll ensure that your PostgreSQL schemas are future-proof, cleaner, and more reliable.
Final Thoughts:
- Serial still works but is now considered a legacy approach.
- Identity offers better control and avoids some of the pitfalls of Serial.
- Generated Columns provide a powerful way to transform data on the fly without storing redundant information.
Previous Posts:
If you find this post useful, like and subscribe to stay tuned for future posts. Happy Coding!