TLDR: yes, yes you can.

Scalability is the name of the game these days. If your website can’t cope with millions of users interacting with it at the same time, while using the new state-of-the-art microservices architectures and a Kubernetes cluster (is that still in or is it legacy now?) with resiliency that would rival that song you’ve had stuck in your head for the last 6 months… are you even trying?

But then again, why would you go through all this trouble to manage a bunch of small furry canines and their grooming appointments in your local shop?

That’s right, today I’m starting a Pomeranian Grooming Salon! Or PomSalon, since the former is a mouthful.

Dog grooming is (probably) a tight operation and we don’t want to clog it up with needless complexity. So the first thing I’m tossing in the trash is a full-blown database server. SQLite is getting quite popular in the Go ecosystem, mostly among people who focus on simplicity. Go deployments are truly easy, usually consisting of a single binary. Docker images are ridiculously small, some apps clocking in at less than 10 megabytes. Adding another process for managing a database, like Postgres (my favorite and usual choice), implies more and heavier moving parts to manage, unless you pay for a database in the cloud.

Are we losing functionality with this choice? Of course. SQLite does have tradeoffs. Horizontal scaling will be off limits out of the gate (for the most part), since SQLite only works decently with local storage. Some handy features present in more robust solutions won’t be around. Mostly, you should treat your database as a repository of information with little logic behind it, and keep your business logic and features in your own code, like a good boy/girl/other.

But saying that SQLite is having its renaissance in Go doesn’t mean that translates to other languages. Can we have peace of mind that our FastAPI backend will survive a horde of Poms trying to schedule their appointments at the same time after a mud party in the garden?

This is the first question we need answered if we want to bet our money on a FastAPI + SQLite stack. If we can’t handle a reasonable load, we might as well think about another solution: either switching to Go, or switching to Postgres and a more complicated deployment.

For now, let’s check if we can push all our Pom customers into the database, using the least effective way.

Our customers

This is the type we want to store:

type dog struct {
	ID          uuid.UUID
	FirstName   string
	LastName    string
	Age         int
	Sex         sexType
	Temperament temperamentType
}

I made a small dog generator and produced a CSV with 100 000 Poms. It’s a ridiculous number, I think. I randomly chose Go for this generator, but Python would have been just as adequate.

Then, I made a small script in Go and Python to create a database with an appropriate schema, set the PRAGMAs (SQLite database settings, more ahead) and then INSERT all the dogs, one by one (the worst way).

What’s a PRAGMA and what is WAL mode?

PRAGMAs are SQLite settings which configure how the database behaves, mostly through its driver. In the interest of better performance and features, I’m changing these “knobs”:

journal_mode=WAL, which sets the database to use a write-ahead log. This allows simultaneous reads while writing (writes are still made one at a time).

busy_timeout=5000, which makes writes wait up to 5 seconds before failing due to a database lock. Since we can have many connections trying to write at the same time, and only one can do it, the others will wait for their turn. Of course this means writes should be small, or else we’ll start running into trouble.

foreign_keys=1, which turns on foreign key checks. SQLite is old and favours compatibility, so foreign key constraints aren’t on by default as they weren’t always around.

There are many other PRAGMAs in SQLite, and if you’re going to use it, you should get to know them here. Blog posts are never a substitute for documentation!
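One detail worth knowing: journal_mode=WAL is persisted in the database file itself, while settings like busy_timeout and foreign_keys are per-connection, so they need to be set on every new connection. A quick sketch in Python to set the three PRAGMAs and read them back (WAL needs a file-backed database, so an in-memory one won’t do here):

```python
import os
import sqlite3
import tempfile

# A throwaway file-backed database, since WAL doesn't apply in-memory.
path = os.path.join(tempfile.mkdtemp(), "poms.db")
conn = sqlite3.connect(path)

# Setting journal_mode returns the mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("PRAGMA busy_timeout=5000")
conn.execute("PRAGMA foreign_keys=1")

# Reading a PRAGMA without a value returns its current setting.
timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
fks = conn.execute("PRAGMA foreign_keys").fetchone()[0]
```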

Libraries

Python has SQLite built in, so we’ll use that in the interest of keeping everything simple.

Go, on the other hand, uses a combination of the SQL module (built in) with drivers you download. There are two main SQLite drivers:

  • github.com/mattn/go-sqlite3
  • modernc.org/sqlite

Mattn’s driver binds to the official C library and modernc’s is a Go port. This means that the first driver requires C compilation using CGo and the latter doesn’t. Modernc’s implementation is still beta and we may suffer a performance penalty, but how much? I tried both and will post the results ahead.

Code disclaimer

I followed the Primeagen’s rule that personal projects always have garbage code.

In the Go code I have taken the liberty of duplicating types between the generator and the inserter and I’m using panics like there is no tomorrow. This is wrong. Don’t do this. If someone catches you doing this, it’s straight to jail.

In Python, the situation isn’t better. It’s script kiddie land here.

I believe it’s still the right approach to the subject, because in real life you might want to get a feel for this kind of information without committing yourself to good code, which takes thought and time. I’d never write something like this in production.

If you still want to check the garbage code, it’s here in this link.

Results

The numbers speak for themselves:

Go with C driver

100 000 inserts took 11 seconds, or 9091 inserts per second.

Go with Go port driver

100 000 inserts took 548 seconds, or 182 inserts per second.

Python

100 000 inserts took 234 seconds, or 427 inserts per second.

Discussion

Writes are the slowest operation, so no request will ever perform better than these raw insert numbers. They can only get worse once everything else is layered on top.

What can we assume about our expected load? Let’s imagine we have 10 groomers, each taking 2 hours with each Pom, ending up with 40 appointments per day, which already feels quite unrealistic for this small mom-and-pop Pom salon. A whole year’s worth of appointments would mean fewer than 14 600 inserted records, since we won’t be open every day.

Python’s 427 inserts per second is quite decent. We could schedule our whole year in about 30 seconds. We won’t have to but we could. Scheduling also won’t be the whole story, there will be reads and updates, but this kind of performance does bring a lot of peace of mind.

Go’s C driver is a complete beast. According to the first Google result, as of 2019 there are an estimated 1.5 million mini Pomeranian puppies in the world. That means it would take us less than 3 minutes (theoretically; SQLite’s performance should degrade with table size) to register all of them in our database.

Go’s Go driver surprised me, and not in a positive way. I’d argue the major selling point (being C-free) matters mostly for Windows builds, where the C compilation process is more painful. On Linux, CGo builds are quite trivial: builds are still fast and you don’t come into contact with the ugly C parts. Also, with the C library we can rely on a stable implementation, as the Go port is still beta. Will 182 inserts per second be enough? Very likely. And we could always start with this driver and move to the other one later down the line if we start having performance issues. Switching is a matter of installing the other module and changing the driver string and the database path (writing the PRAGMAs in the path string also uses a different format).
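To give an idea of what that switch looks like, here’s a rough sketch of the two connection strings, based on each driver’s documentation. Treat the exact parameter names as something to verify against the READMEs; the file name is made up:

```go
// mattn/go-sqlite3 registers the driver name "sqlite3" and takes
// PRAGMAs as individual underscore-prefixed DSN parameters.
db, err := sql.Open("sqlite3",
	"file:poms.db?_journal_mode=WAL&_busy_timeout=5000&_foreign_keys=1")

// modernc.org/sqlite registers the driver name "sqlite" and takes
// PRAGMAs as repeated _pragma=name(value) DSN parameters.
db, err = sql.Open("sqlite",
	"file:poms.db?_pragma=journal_mode(WAL)&_pragma=busy_timeout(5000)&_pragma=foreign_keys(1)")
```

The application code on top of database/sql stays the same; only the import, the driver name, and the DSN change.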

What’s next?

If I get around to it, the next iteration requires adding the network layer. But, for now, I can rest confident that SQLite can handle the Pom job.

What about backups?

Litestream can be our friend. If we can live with some danger of losing the most recent data, SQLite’s online backup features mean we can create a file backup and ship it somewhere else.
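In Python, the online backup API is right there in the standard library as Connection.backup() (since Python 3.7). A minimal sketch, with made-up paths:

```python
import sqlite3


def backup_database(src_path, dst_path):
    # SQLite's online backup API copies a live database safely,
    # even while other connections are writing to it.
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    # dst_path is now a consistent snapshot, ready to ship
    # to object storage, another machine, etc.
```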

What else should I check?

A more realistic scenario with concurrent reads and writes would be more interesting, but there is a tradeoff with the time it takes to assemble those tests. When starting a project, there is, and always will be, a degree of uncertainty you just have to live with.