What if that isn't a bool?

A common way that code grows difficult to reason about is increasing the number of things you must keep in your head simultaneously to understand it. Often, this simply happens by adding one attribute, one variable, one column at a time. Some people are gifted with a great capacity for working memory, but most of us aren’t – having to hold the state of 5 variables in your head simultaneously to understand a piece of code may be pushing it far, according to this article from wikipedia:

Working memory is widely acknowledged as having limited capacity. An early quantification of the capacity limit associated with short-term memory was the “magical number seven” suggested by Miller in 1956.[20] Miller claimed that the information-processing capacity of young adults is around seven elements, referred to as “chunks”, regardless of whether the elements are digits, letters, words, or other units. Later research revealed this number depends on the category of chunks used (e.g., span may be around seven for digits, six for letters, and five for words), and even on features of the chunks within a category.

A common pattern that pushes my working memory, is when I’m looking at an entity, or some database row, where there are lots of booleans being used to make decisions in code. Here are some examples of names that I think are common for these kinds of things: is_canceled, is_draft, is_published, is_paid, is_approved, is_done, is_withdrawn. These also come with variants like canceled_at, published_at, approved_at and so on, where the boolean is implicitly derived from the presence of a timestamp.

Alone, each of these things all added a tiny amount of complexity, of new behaviour, but together, they may create massive truth tables that make it difficult to get validation code correct when adding the next. You may get code that is deeply nested or you may look at different variations of de Morgan’s laws to check if you can simplify difficult conditions.

At least some of the time, the correct reaction to this is to take a step back, and ponder: Is that really a boolean? Maybe you don’t need Article to own is_draft, is_published, is_withdrawn booleans at all? Perhaps, what you’re looking at is a State of some sort. Perhaps the state of Article can be draft, published and withdrawn? One nice thing about this kind of thinking is that it can often make heaps of nonsense states unrepresentable in the database without relying on lots of check-constraints, and if you really need the booleans for some reason, you can derive them.

Maybe Article isn’t even a thing that owns a State that can be draft. What if Draft and Article are separate things entirely, and Article is what becomes of a Draft that you publish()? This may help you develop a better language for talking about important domain objects. It might also let you use the type system prove the absence of some kinds of bugs: Perhaps you can’t cancel() a ShoppingCart() or empty() an Order.

When modeling the database, it could be that that it makes the most sense to let your columns stay timestamps or booleans, maybe it’s a really rough job to refactor it. On the other hand, maybe you have some kind of complicated trigger logic to keep a history table. Maybe it would help to have some kind of primary table for the entity, and a secondary table for state transitions:

create table article(
    id bigint generated by default as identity primary key
    -- ...
);

create table article_state(
    id bigint generated by default as identity primary key,
    article bigint not null references article(id),
    state_changed_at timestamp with time zone default now(),
    state text not null check (state = ANY('{draft, published, withdrawn}'))
    -- ...
);

Now it is easy to get an article together with its latest state:

select distinct on(state.article) state
from article a join article_state state on a.id = state.article
order by state.article, state.id desc limit 1;

Or the entire history of states:

select
  state.state,
  state.state_changed_at as state_start,
  lead(state.state_changed_at)
    over (partition by article order by state.id) as state_end
from article a
  join article_state state on a.id = state.article
where a.id = 1;
   state   |          state_start          |             state_end
-----------+-------------------------------+-------------------------------
 draft     | 2025-01-08 21:58:51.170734+01 | 2025-01-08 22:03:58.995427+01
 published | 2025-01-08 22:03:58.995427+01 | 2025-01-08 22:09:07.225505+01
 withdrawn | 2025-01-08 22:09:07.225505+01 |
(3 rows)

There are lots of times when adding that boolean or timestamp is the right way to go. But there are also lots of good reasons to ask yourself the question: “What if that isn’t a bool?” I’ll be thankful the next time I can add a state transition to the code base instead of trying to create neat and structured code around 2^5 possible states of booleans. To quote von Neumann, there’s a lot of complexity in 5 parameters:

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.