Righting Integration Testing's Wrongs with PGLite
Best explained with an example from The Fairly OddParents.
The engineering team at Numeric is continuously experimenting with ways to increase our development velocity and the overall quality of our codebase. One such recent experiment was in the sometimes unwieldy arena of integration testing. It's proven successful enough to enshrine in a blog post.
We'll walk through our development philosophies concerning the why and the when of integration testing, briefly discuss the where, and present the how: the testing method we've developed. We'll wrap up with key considerations to keep in mind when following our methodology, and a deeper look at the test fixture we've built for integration testing Postgres systems. It's all powered by the embeddable Postgres implementation PGLite; these experiments have been a great way to learn and interact with the tool, and we're impressed!
Why
We consider a rich automated testing suite a key component of our efforts to build developer confidence. With increased confidence, we can attain greater speed, free of any hesitation to kick off a scary deploy. Engineers should feel empowered to merge code, fire off a deploy, and move on to the next PR. Good automated tests also provide us increased confidence in code quality; they codify what expectations we have for our software's behavior, guarding us from regressions and adding additional layers of validation for new feature releases.
Integration tests are one pillar of such a confidence-boosting test suite. For our purposes, they are differentiated from unit tests in that they aim to verify the correctness of full systems, rather than isolated functions or components. In particular, they test systems in which multiple components interact to produce results. Complexity explodes at these points of interaction, so one hundred percent verification is not the goal. Instead, through a solid set of integration tests, we aim for coverage of the invariants we expect to always hold.
No presentation of integration tests is complete, of course, without consideration of the canonical aches, fears, and worries their mention can trigger in a seasoned engineer. By their very nature, they can be difficult to write, expensive to operate, slow to run, hard to maintain, and prone to flakiness and test pollution. We hope the demonstration that follows shows that this state of affairs is not a fait accompli; it's very, very possible to introduce integration testing into a codebase without incurring these pains.
When
Our canonical target for integration test coverage is a data in/data out system (which, you could argue, is every computing system… anyway). Many times this is a REST API: the data in (an HTTP request) triggers some operation (application logic) that produces data out (data in a database, data in the response, etc.). To produce the data, the API may interact with multiple components: data stores, cache clients, data transformers, domain calculations, etc. An even clearer match: data ETL or ELT processes pull data from some source store and dump it into some destination.
So, when we decide whether to integration test, we look at our code and ask ourselves: What is the data in? What is the data out? Once identified, you can probably integration test. With the right fixtures in place for preparing the data in and inspecting the data out, of course.
Where
We write our integration tests like any other vitest script. We place tests in a file entitled `<component>.integration.test.ts`, where `component` is the target system under test; for example, `accounts-data-transform.integration.test.ts` would hold integration tests for some `accounts-data-transform` system. It should live next to the entrypoint of the system under test.
CI runs are triggered on PRs, where integration tests and unit tests execute in parallel via separate vitest workspaces. This allows us to isolate the generally quicker unit tests from the generally slower integration tests.
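A minimal sketch of what that workspace split can look like; the globs and timeout here are illustrative, not our exact configuration:

```ts
// vitest.workspace.ts: a sketch of separating unit and integration tests.
import { defineWorkspace } from 'vitest/config'

export default defineWorkspace([
  {
    test: {
      name: 'unit',
      include: ['src/**/*.test.ts'],
      exclude: ['src/**/*.integration.test.ts'],
    },
  },
  {
    test: {
      name: 'integration',
      include: ['src/**/*.integration.test.ts'],
      // Integration tests are generally slower; give them more headroom.
      testTimeout: 30_000,
    },
  },
])
```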
How
So we've created an integration test file for our target system. Nice. What now?
We have the plumbing set up today to support integration testing anything that writes result data to Postgres, the main database in our tech stack. Systems that return data via API responses should also theoretically be integration testable, but we haven't yet written those. So, we'll walk through the first scenario, and leave writing an integration test for the latter as an exercise for the reader.
Our goal is to verify the correctness of a simple ETL pipeline that pulls data about ticket sales at the world's finest, fictitious sporting venue, the Dimmsdale Dimmadome, from the equally fictitious "DimmaAPI" (special thanks to Doug Dimmadome for access to this incredibly useful dataset). Once data is extracted, the pipeline applies some calculations to it, and ultimately dumps the results into tables in our internal Postgres database.
We test such Postgres sink systems following a simple pattern: mock data with in-memory providers (also known as fakes) and an in-memory DB client, run the system under test, then verify what data was written out via the same in-memory DB client. In detail:
Source data is provided through the creation of in-memory data providers. Naturally, this requires some foresight at the start of development (or some refactoring after the fact) to separate the source of data from the code that acts on the data. We put that source behind an interface that can then be plugged and played with different implementations. This is generally a good design pattern to follow, so it's worth promoting beyond the benefits of integration testing.[1]
The in-memory providers should expose an easy way to configure the test data fed into the system, and a way to reset state between tests. For example, consider this `DimmadomeInMemoryDataProvider`. It implements an `IDimmadomeDataProvider` interface, which abstracts away where the data comes from; in the real system, we have an implementation that pulls data from the Dimmadome Data API. Let's define the in-memory provider and instantiate an instance to use in our test:
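A sketch of what that can look like; the `TicketSale` shape and the method names are illustrative stand-ins for the real interface:

```ts
// dimmadome-data-provider.ts: a sketch; field and method names are illustrative.
export interface TicketSale {
  id: string
  section: string
  priceCents: number
  soldAt: string // ISO timestamp
}

// The interface abstracts away where ticket sale data comes from.
export interface IDimmadomeDataProvider {
  fetchTicketSales(): Promise<TicketSale[]>
}

// In-memory implementation for tests: "data in" is configured directly,
// and state can be wiped between tests.
export class DimmadomeInMemoryDataProvider implements IDimmadomeDataProvider {
  private ticketSales: TicketSale[] = []

  async fetchTicketSales(): Promise<TicketSale[]> {
    return this.ticketSales
  }

  // Test hook: configure the data to feed into the system under test.
  setTicketSales(sales: TicketSale[]): void {
    this.ticketSales = sales
  }

  // Test hook: reset state between tests.
  reset(): void {
    this.ticketSales = []
  }
}
```

And in the test itself:

```ts
const dataProvider = new DimmadomeInMemoryDataProvider()
dataProvider.setTicketSales([
  { id: 'sale-1', section: 'A1', priceCents: 5_000, soldAt: '2024-04-01T19:00:00Z' },
  { id: 'sale-2', section: 'B2', priceCents: 7_500, soldAt: '2024-04-01T19:05:00Z' },
])
```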
Source data may also come from Postgres itself, which in these systems doubles as the target. We thus introduce the pièce de résistance of our integration testing capabilities: a PGLite-powered in-memory DB client. PGLite is an embeddable Postgres implementation. It's lightweight enough to instantiate and tear down quickly, which helps mitigate many of the historic pains of writing and maintaining integration tests. For one, we aren't reliant on spinning up a separate, heavier Postgres process. And because we can afford to reinstantiate new PGLite instances between test runs, we eliminate the potential for difficult-to-debug failures due to data pollution between tests. For our purposes, we've built a testing client on top of it that's compatible with Slonik, our Postgres query library of choice. Our test client exposes methods to execute queries. We use these to configure initial tables and data for use by systems under test. Our Dimmadome ETL depends on some metadata tables, so let's create our test client and get those configured.
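Here's a sketch of that setup; `PgliteTestClient` and the metadata schema are illustrative (we'll peek inside the client later):

```ts
import { PgliteTestClient } from './pglite-test-client' // our PGLite-backed test client

// A fresh, fully in-memory Postgres per test: fast to create, trivially disposable.
const db = await PgliteTestClient.create()

// Configure the metadata tables our Dimmadome ETL depends on.
await db.query(`
  CREATE TABLE dimmadome_sections (
    section  text    PRIMARY KEY,
    capacity integer NOT NULL
  )
`)
await db.query(`
  INSERT INTO dimmadome_sections (section, capacity)
  VALUES ('A1', 500), ('B2', 750)
`)
```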
Once test data sources have been set up, we can execute our target system. It should perform its magic and ultimately conclude with data written into the in-memory test client.
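In our example, that might look like this, where `runDimmadomeTicketSalesEtl` stands in for the pipeline's real entrypoint:

```ts
// Run the system under test with the in-memory provider and DB client injected.
const result = await runDimmadomeTicketSalesEtl({ dataProvider, db })

// The pipeline should report success before we inspect what it wrote.
expect(result.status).toBe('success')
```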
Now we can use the client to inspect the data written. How? Through good old SQL queries. Additionally, the client provides assertion methods that we use to inspect what queries were executed. This is useful for asserting things like "when we see data matching situation X, we should NOT run a query to do Y". We've already asserted that our ETL process returns a success response. Let's run some more assertions to verify it wrote the data we expect.
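Something along these lines; the summary table and the assertion helper names are illustrative:

```ts
// Inspect the "data out" with good old SQL against the same client.
const rows = await db.query(`
  SELECT section, tickets_sold, revenue_cents
  FROM dimmadome_ticket_sales_summary
  ORDER BY section
`)
expect(rows).toEqual([
  { section: 'A1', tickets_sold: 1, revenue_cents: 5_000 },
  { section: 'B2', tickets_sold: 1, revenue_cents: 7_500 },
])

// Query-log assertions: verify what was (and was not) executed.
db.expectQueryMatching(/insert into dimmadome_ticket_sales_summary/i)
db.expectNoQueryMatching(/delete from dimmadome_sections/i)
```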
That's it! We've successfully written an integration test confirming that a particular configuration of data into the system results in the data out that we expect.
Here's our final test in full.
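Reconstructed here as a single vitest file; module paths and names remain illustrative:

```ts
// dimmadome-etl.integration.test.ts
import { beforeEach, describe, expect, it } from 'vitest'
import { PgliteTestClient } from './pglite-test-client'
import { DimmadomeInMemoryDataProvider } from './dimmadome-data-provider'
import { runDimmadomeTicketSalesEtl } from './dimmadome-etl'

describe('Dimmadome ticket sales ETL', () => {
  let db: PgliteTestClient
  let dataProvider: DimmadomeInMemoryDataProvider

  beforeEach(async () => {
    // Fresh instances per test: no pollution between runs.
    db = await PgliteTestClient.create()
    dataProvider = new DimmadomeInMemoryDataProvider()

    await db.query(`
      CREATE TABLE dimmadome_sections (
        section  text    PRIMARY KEY,
        capacity integer NOT NULL
      )
    `)
    await db.query(`
      INSERT INTO dimmadome_sections (section, capacity)
      VALUES ('A1', 500), ('B2', 750)
    `)
  })

  it('summarizes ticket sales per section', async () => {
    // Data in.
    dataProvider.setTicketSales([
      { id: 'sale-1', section: 'A1', priceCents: 5_000, soldAt: '2024-04-01T19:00:00Z' },
      { id: 'sale-2', section: 'B2', priceCents: 7_500, soldAt: '2024-04-01T19:05:00Z' },
    ])

    // Run the system under test.
    const result = await runDimmadomeTicketSalesEtl({ dataProvider, db })
    expect(result.status).toBe('success')

    // Data out.
    const rows = await db.query(`
      SELECT section, tickets_sold, revenue_cents
      FROM dimmadome_ticket_sales_summary
      ORDER BY section
    `)
    expect(rows).toEqual([
      { section: 'A1', tickets_sold: 1, revenue_cents: 5_000 },
      { section: 'B2', tickets_sold: 1, revenue_cents: 7_500 },
    ])
    db.expectNoQueryMatching(/delete from dimmadome_sections/i)
  })
})
```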
To what extent
Now that we've walked through an example of integration testing in action, it's worth taking a moment to consider a delicate balancing act our procedure asks of test writers: to what extent should your in-memory providers emulate the behaviors of the real data APIs used in the production system under test? In other words, how complex should their implementations be?
Writing in-memory providers for use in mocking involves… writing code. And as is the case with any code, the more of it that exists, the greater the surface area for bugs and disaster. Ideally, the in-memory providers require little code, and what code is written is straightforward CS101 stuff. This was the case for mocking the DimmaAPI. But it may not always be. There are rougher paths one might be led down, asking you to mirror more complex data operations or logic. Consider some source data API that exposes a `findKMeansNearestNeighbors` endpoint. At face value, it seems to ask us to fully implement k-means clustering in our in-memory provider to match its behavior. If we do follow that path, then what are we testing? We'd like to say the system as it actually exists, but past a threshold of complexity, the mocking code itself becomes what's under test.
As the test writer, it is your responsibility to consider the complexity of your in-memory providers (of your mocking utilities) and to find alternatives, when possible, to keep it contained. With the `findKMeansNearestNeighbors` example, do we need to provide the real k nearest? For many tasks downstream of it, we probably care about what is done with the results pulled from the data source, not the nature of the results themselves. For example, if the data source declares records X, Y, and Z the three closest neighbors of record A, do the downstream processes treat them as such? Viewed through this lens, we arrive at a straightforward implementation of `findKMeansNearestNeighbors`: simply return any three arbitrary (but deterministic!) records, as sketched below. Fin.
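A sketch of that approach, with hypothetical types:

```ts
interface DataRecord {
  id: string
}

interface INeighborsProvider {
  findKMeansNearestNeighbors(recordId: string, k: number): Promise<DataRecord[]>
}

// In-memory stand-in: no clustering reimplemented. It returns k arbitrary but
// deterministic records so downstream behavior can be verified against them.
class InMemoryNeighborsProvider implements INeighborsProvider {
  constructor(private readonly records: DataRecord[]) {}

  async findKMeansNearestNeighbors(recordId: string, k: number): Promise<DataRecord[]> {
    return this.records
      .filter((record) => record.id !== recordId) // a record is not its own neighbor
      .sort((a, b) => a.id.localeCompare(b.id))   // stable, deterministic order
      .slice(0, k)
  }
}
```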
These considerations are more art than science. And they're heavily context-dependent. Ergo, the balancing act.
It's also worth considering why we implement these in-memory providers rather than using more traditional mocking libraries or frameworks. Our dominant motivation is to avoid complexity (another reason why the balancing act we discussed is so important!). In our experience, mocking frameworks carry footguns that can make test code hard to understand. We consider this a classic "with great power comes great responsibility" issue. Full-fledged mocking frameworks offer a plethora of powerful tools for testing. Spies! Mocks! Stubs! We've found, though, that our tests are simpler with directly written in-memory providers that provide exactly what you need, nothing more. As a bonus, they can be extended with custom assertion or verification utilities tailored to your use case, much like the assertion methods we demonstrated on the PGLite test client.
The client in detail
We'll now present some of the inner workings of the test client; namely, its constructor (which initializes the PGLite in-memory database), its internal data structures for tracking executed queries, and its query interface.
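A condensed sketch of the client follows. The real client speaks Slonik; this version takes plain SQL strings for brevity, and the helper names match the illustrative assertions used earlier:

```ts
// pglite-test-client.ts: a condensed sketch of our PGLite-backed test client.
import { PGlite } from '@electric-sql/pglite'

// Normalize whitespace so logically identical queries compare equal in the log.
const normalizeQuery = (sql: string): string => sql.replace(/\s+/g, ' ').trim()

export class PgliteTestClient {
  // Every executed query, normalized, in execution order.
  private readonly queryLog: string[] = []

  private constructor(private readonly pglite: PGlite) {}

  // Constructor-equivalent: initializes the PGLite in-memory database.
  static async create(): Promise<PgliteTestClient> {
    const pglite = new PGlite() // no dataDir given: fully in-memory Postgres
    await pglite.waitReady
    return new PgliteTestClient(pglite)
  }

  // Execution is delegated to PGLite; the wrapper's special sauce is
  // recording what was executed for later assertions.
  async query<T>(sql: string, params?: unknown[]): Promise<T[]> {
    this.queryLog.push(normalizeQuery(sql))
    const result = await this.pglite.query<T>(sql, params)
    return result.rows
  }

  // Assert that some executed query matches the given pattern.
  expectQueryMatching(pattern: RegExp): void {
    if (!this.queryLog.some((query) => pattern.test(query))) {
      throw new Error(`No executed query matched ${pattern}`)
    }
  }

  // Assert that no executed query matches the given pattern.
  expectNoQueryMatching(pattern: RegExp): void {
    const match = this.queryLog.find((query) => pattern.test(query))
    if (match !== undefined) {
      throw new Error(`Unexpected query matched ${pattern}: ${match}`)
    }
  }

  async close(): Promise<void> {
    await this.pglite.close()
  }
}
```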
All of the heavy lifting of executing queries is passed on to PGLite. The test client serves as a simple wrapper that adds in the special sauce of tracking what queries have been executed. This is what allows us to provide richer assertion functionality beyond just inspection of table content. For now, these queries are normalized (whitespace stripped) and stored as strings in the query log. We have plans to parse queries more richly into ASTs, which could be interrogated with more complex questions like "did an update statement execute against table X setting column A to value W?"
Conclusion
We've benefited tremendously from this method of integration testing, in no small part due to the low barrier to entry for writing and executing tests. There's no dependency on Postgres running in your local environment or in some container in our CI environment; it's all in memory. The tests are easy to write, cheap to run, and cheap to reset, which contrasts beautifully with the canonical worries surrounding integration testing we presented at the start of this article. With positive experiences under our belts, we at Numeric welcome the new era of in-memory, embedded Postgres.
---
[1]: We generally consider it nice to think of any data-manipulating code in this way, for the benefit of readability and, especially, testing. Separate the work of getting data from the work of acting on the data to promote testability. This also nudges code toward a pure-function style.