[Haskell-cafe] Property checks and PostgreSQL?

Sat Mar 13 11:39:40 UTC 2021

Note: I posted this [elsewhere] — I hope it is permissible to also post
here. There seems to be no single right venue for Haskell questions at this
time.

[elsewhere]: https://discourse.haskell.org/t/property-checks-and-postgresql/2078

So, I have a three layer stack: PostgreSQL data base, Haskell back end,
PureScript front end. And there is a fault somewhere in this stack: sometimes
some items present in the data base do not make it to the front end. A small
fortune in programmer hours has been spent searching for the fault but there is
no progress. No one can figure out even roughly where the fault is. Our last
hope is to property check the whole thing, from end to end, verifying that any
things put into the data base can be retrieved by the front end.

We have in place a fixture that creates a new PostgreSQL cluster, runs the back
end on it and makes some scripted HTTP requests. We use it to run unit tests,
such like _«if I make this `PUT` request and then that `GET` request to the same
end point, I should get back the same thing as I put in»_. In principle it would
not be a problem to make a property check out of this property. Practically,
tearing down the cluster, building a new pristine one and initializing the data
base anew takes seconds, so this way even a trivial property check would take
minutes to run.

Theoretically, we can carefully reset the data base back to the initial state
after each run of the property check. Given that only a small amount of data is
modified, it would take imperceptibly little time. But it is hard to know what
exactly should be reset, and it is easy to get it wrong if done by hand!

One trick we do use is SQL transactions. We take a single connexion, start a
transaction, perform a check and then roll the transaction back. Unfortunately,
even this is not completely safe: it looks as though sequences used to draw
primary keys from are not reset to their previous values! _(Or maybe we are
doing something wrong — I am not really a PostgreSQL guru so I am not too sure
of myself.)_ But even ignoring this problem _(it is irrelevant to most checks)_,
there is another, more severe problem: transactions guarantee that uncommitted
data is only visible inside the same connexion. So, there is no way to request
it from the HTTP API. This trick is suitable for fuzzing the data base – back
end layer, but nothing beyond that.

Another thing I heard being done is to selectively drop and re-create exactly
the tables affected by a run of a property check. This seems challenging to
automate though. How can I know in a general way if a table has been touched?
And then, how can I re-create exactly the affected tables given that the data
base initialization code is an opaque blob?

I wonder if this problem is already solved. In any case, any advice is welcome!