8 Jul 2018 • SQL

Soft-Delete Pattern In Postgres

You’re building a application on a Postgres database. At some point, you’ll be faced with the question:

How do I allow my users to delete records while maintaining historical data?

The most widely recommended solutions to this problem often involve adding some form of an is_deleted column, and each query filters out rows where is_deleted = true. There are a number of reasons why I don’t prefer this method, most of which are detailed in the excellent post: Soft-deletes are bad, m’kay?. So, if not an is_deleted column, then what?

After trying a number of alternatives, I finally settled on a solution that I think provides the most clarity and flexibility.

Setup

Let’s start with a simple example: a posts table:

CREATE TABLE posts(
   id SERIAL PRIMARY KEY NOT NULL,
   title VARCHAR(256) NOT NULL,
   body TEXT NOT NULL
);

Then, we’ll create an additional posts table under a new deleted schema:

CREATE SCHEMA deleted;
CREATE TABLE deleted.posts (
  deleted_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW(),
  LIKE posts INCLUDING ALL
);

The LIKE clause copies all column definitions of the original table to the newly created table. We’ve also added a deleted_at column to track when posts are deleted. Keep in mind that the LIKE clause will not copy any foreign key constraints or triggers.

Finally, we’ll create a view to query records from both posts and deleted.posts tables. We’ll also keep this under a separate schema called combined.

CREATE SCHEMA combined;
CREATE VIEW combined.posts AS
  SELECT null AS deleted_at, * FROM posts
  UNION ALL
  SELECT * FROM deleted.posts;

Usage

Let’s add two rows to our posts table.

SELECT * FROM posts;

 id |     title      |        body
----+----------------+---------------------
  1 | My First Post  | Yay, my first post!
  2 | My Second Post | Woot, on a roll!

Deleting a post is as simple as copying the row from the posts table to the deleted_posts table and then deleting the original:

INSERT INTO deleted.posts 
  SELECT NOW() AS deleted_at, * FROM posts
  WHERE posts.id = 2;
  
DELETE FROM posts
  WHERE posts.id = 2;

The posts table now contains only the single active post:

SELECT * FROM posts;

 id |     title      |        body
----+----------------+---------------------
  1 | My First Post  | Yay, my first post!

And the deleted_posts table now contains the deleted post:

SELECT * FROM deleted.posts;

     deleted_at      | id |     title       |        body
---------------------+----+-----------------+---------------------
 2018-01-01 00:00:00 |  2 | My Second Post  | Woot, on a roll!

Queries that require both active and deleted posts simply select from the combined_posts view.

 SELECT * FROM combined.posts

     deleted_at      | id |     title      |        body
---------------------+----+----------------+---------------------
                     |  1 | My First Post  | Yay, my first post!
 2018-01-01 00:00:00 |  2 | My Second Post | Woot, on a roll!

The main benefit of using separate schemas comes when you want to reuse your existing join queries across both active and deleted records. Since the table names are all identical, you just need to switch name of the schema your operating on.

For Ecto users, it’s as simple as adding the prefix option:

Repo.all(some_complex_join_query, prefix: "combined")

Caveats

The main thing to keep in mind is that changes to columns on the posts table need to also be made to the deleted.posts table. And in most cases, it will also require dropping and re-created the combined.posts view. This can be a bit of a pain.

When creating your deleted tables, make sure to list the deleted_at column first. Otherwise, the column names wont align when you decide to add new columns in the future.

Any sufficiently complex application will likely involve deleting multiple related records simultaneously (ie: deleting a post will also need to delete comments, categories, ect.), so it’s important to keep all that logic well encapsulated.

Conclusion

There are many ways to solve the soft-delete problem. While the solution outlined here certainly involves more setup time, I think it’s ultimatly the most maintainable one I’ve seen. Please let me know in the comments if you’ve had better luck with other options.