A Soft Deletion Story
Part 1: It can be harder than it sounds
To start, what is soft deletion? It’s just adding some data to a database record (like a flag or a timestamp) to mark something as “deleted”, so it remains in the database but is no longer shown. There are a few reasons to do this, but the big one is making the record easy to restore (just remove that mark-as-deleted data).
This is a well-known and -established pattern, and there are several packages to easily (or even automatically) handle this — or at least that’s what I’m used to from the Rails and ActiveRecord world.
Wait, let me clear something up right now. The manner of handling soft deletion can be contentious, mostly related to the question of just how automatically it should be dealt with. For that matter, the manner of handling deletion at all can be contentious. Actually, probably the manner of handling anything database-related can be contentious. But that’s not what this blog post is about. For now, we’re talking about soft deletion. It’s happening.
So, ActiveRecord. It has scopes, and a lot of things go through associations. That makes this pretty straightforward. It also has default scopes that let you do things automatically, which can be a blessing and a curse. There are really two main packages for this: paranoia (which works on the “automatic” principle), and discard (which wants you to be more explicit).
I’m currently on a project that’s in Elixir. I’ve been learning plenty about Elixir vs. Ruby, and Phoenix vs. Rails, and Ecto vs. ActiveRecord. On top of that, since this is my first significant Elixir project, I’m not entirely sure if anything I come across is project-specific, or if it’s a general framework thing. For instance, Ecto queries in this project are all specifying joins rather than using associations.
So when I looked for soft-deletion packages for Ecto and didn’t find anything compelling, I took it mostly in stride but determined I should do a little more searching. Then I found this blog post and was intrigued by the idea of simply using views. It seemed like it would do a lot of good things without much drawback.
It was so stupidly simple, and what could go wrong?
- Create a view that’s defined as
select * from the_table where is_deleted = false;
- Change the module to use the view instead of the table
- Profit
What could go wrong, indeed. At this point, let’s talk about something more specific and introduce some concrete concepts. Let’s say we have Places and Activities, and of course a Place can have multiple Activities and an Activity can be at multiple Places. This is what this would normally look like.
defmodule SoftDelete.Repo.Migrations.CreateTables do
use Ecto.Migration
def change do
create table(:places) do
add :name, :string, null: false
add :is_deleted, :boolean, null: false, default: false
end
create table(:activities) do
add :name, :string, null: false
add :type, :string
add :is_deleted, :boolean, null: false, default: false
end
create unique_index(:activities, [:name])
create table(:place_activities) do
add :place_id, references(:places), null: false
add :activity_id, references(:activities), null: false
end
create unique_index(:place_activities, [:place_id, :activity_id])
end
end
defmodule SoftDelete.Place do
use Ecto.Schema
alias SoftDelete.PlaceActivity
alias SoftDelete.Activity
schema "places" do
field :name, :string
field :is_deleted, :boolean
many_to_many :activities, Activity, join_through: PlaceActivity
end
end
defmodule SoftDelete.Activity do
use Ecto.Schema
alias SoftDelete.PlaceActivity
alias SoftDelete.Place
schema "activities" do
field :name, :string
field :type, :string
field :is_deleted, :boolean
many_to_many :places, Place, join_through: PlaceActivity
end
end
defmodule SoftDelete.PlaceActivity do
use Ecto.Schema
import Ecto.Query
alias SoftDelete.Place
alias SoftDelete.Activity
schema "place_activities" do
belongs_to :place, Place
belongs_to :activity, Activity
end
end
So now it’s time to go a step further and add some views.
defmodule SoftDelete.Repo.Migrations.AddSingleModelViews do
use Ecto.Migration
def up
execute """
CREATE VIEW active_places AS
SELECT * FROM places WHERE is_deleted = false;
"""
execute """
CREATE VIEW active_activities AS
SELECT * FROM activities WHERE is_deleted = false;
"""
create_if_not_exists index(:places, [:is_deleted])
create_if_not_exists index(:activities, [:is_deleted])
end
def down do
execute "DROP VIEW active_places;"
execute "DROP VIEW active_activities;"
end
end
And what needs to be done to the models? Literally just change the schema
lines to use active_places
and active_activities
, respectively. That’s all.
This works pretty easily, in many ways. Plain queries are just working!
Place |> Repo.all()
Place |> preload(:activities) |> Repo.all()
That seems like perfection, but issues come up kind of quickly. What if you don’t want to load all the activities, but just the number of activities? That’s also nice because it can be done with a single query instead of two.
Place
|> join(:left, [p], pa in PlaceActivity, on: pa.place_id == p.id)
|> group_by([p], p.id)
|> select([p, pa], %{id: p.id, activity_count: count(pa.id)})
|> Repo.all()
However, marking an activity as deleted doesn’t change the count. That can be fixed with another join and checking the flag, but that’s another thing to keep in mind whenever writing a query. What about another view?
defmodule SoftDelete.Repo.Migrations.AddJoinModelView do
use Ecto.Migration
def up do
execute """
CREATE VIEW active_place_activities AS
SELECT pa.*
FROM place_activities AS pa
JOIN active_places AS ap ON pa.place_id = ap.id
JOIN active_activities AS aa ON pa.activity_id = aa.id;
"""
end
def down do
execute "DROP VIEW active_place_activities;"
end
end
And you know what happened with the model here, right? Yup, it’s schema “active_place_activities”
.
This view is a little more complicated, but now we have some nice automatic stuff. It’s working nicely.
But I’m not enjoying getting a simple Map as a result. I’d rather get the real thing and maybe be able to preload the activities, so that’s easy: just add this count as a virtual attribute and use the struct-update syntax.
Place
|> join(:left, [p], pa in PlaceActivity, on: pa.place_id == p.id)
|> group_by([p], p.id)
|> select([p, pa], %{p | activity_count: count(pa.id)})
|> Repo.all()
Wait, what’s this?
** (Postgrex.Error) ERROR 42803 (grouping_error) column "a0.name" must appear in the GROUP BY clause or be used in an aggregate function
query: SELECT a0."id", a0."name", a0."is_deleted", count(a1."id") FROM "active_places" AS a0 LEFT OUTER JOIN "active_place_activities" AS a1 ON a1."place_id" = a0."id" GROUP BY a0."id"
Okay, this sucks. I guess I’ll add the grouping for now. And it works, but something seems fishy here. To check it out, I go directly to the database to see what’s happening.
soft_delete_dev=# SELECT a0."id", a0."name", a0."is_deleted", count(a1."id") FROM "active_places" AS a0 LEFT OUTER JOIN "active_place_activities" AS a1 ON a1."place_id" = a0."id" GROUP BY a0."id";
ERROR: column "a0.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT a0."id", a0."name", a0."is_deleted", count(a1."id") F...
^
soft_delete_dev=# SELECT a0."id", a0."name", a0."is_deleted", count(a1."id") FROM "places" AS a0 LEFT OUTER JOIN "active_place_activities" AS a1 ON a1."place_id" = a0."id" GROUP BY a0."id";
id | name | is_deleted | count
----+-------------+------------+-------
1 | test one | f | 1
3 | test two | f | 1
5 | other place | f | 0
4 | some place | f | 1
(4 rows)
This is just the beginning of how views are troublesome. I’m not (currently) worried about the performance of the queries — how they’re automatically including where clauses and joins. What got to me is how quickly this became confusing and unhelpful.
When operating on a table, Postgres realizes that the primary key is special, and grouping by that column means only a single row will be returned from that table, and the rest of the columns can be used as-is. Postgres doesn’t know this about a view.
Interestingly, Postgres understands this sort of “simple” (single-table) view and allows writes to it — inserts, updates, deletes — automatically operating on the underlying table. But just like it doesn’t know about the primary key, it doesn’t know about unique indexes. So there are situations where an INSERT … ON CONFLICT UPDATE
won’t work on the view, but needs to be using the table itself. That can be dealt with by creating new copies of these models and setting the schema
to the table instead of the view. And having a way to operate on the tables themselves is useful for things like a superuser or admin section of the app, where you may want to show marked-as-deleted records and restore (or actually delete) them.
Also, even though this is a view defined as SELECT * FROM tablename
, the structure is set on creation. If you add a column to the underlying table, you have to recreate the view (which you can do in place). If you want to remove a column from the table, you have to first drop the view and then create it again. And don’t get me started on the join-table view. You don’t want to see the triggers I had to set up to allow writes there.
So that wraps up the problem statement. Read on for part two of this story, where I share the solution I ended up with.
If you’re looking for a team to help you discover the right thing to build and help you build it, get in touch.
Published on February 6, 2024