
30K Messages Per Second Queue | Scaling Postgres 295


In this episode of Scaling Postgres, we discuss a 30K messages per second queue built on Postgres, a zero downtime logical replication upgrade, the release of the Supavisor 1.0 connection pooler, and PostGIS Day.


Transcript

Hey, do you have a queue system for your application? Do you use RabbitMQ? Or maybe Kafka? Well, would you consider using Postgres? In this episode, we discuss a high-performance Postgres queue. To start, I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Yeeting over 30k messages per second on Postgres with Tembo MQ". This is from tembo.io and they are talking about the performance of their queue system that runs on Postgres as an extension called PGMQ. They just released version one and these are some highlights. Number one, the API is stable now. All the PGMQ objects now live in a pgmq schema. They've relaxed some compilation requirements. They added an unlogged queue type to give you even more performance, but of course that is, as they say, at the expense of durability. And of course, updated docs. Now I don't know if the numbers they're reporting here are for logged or unlogged queues.

My assumption is that they were logged because they don't explicitly say they were unlogged. But they've got a number of metrics that they're testing and showing here. Now, this is a queue system that we've discussed in previous episodes of Scaling Postgres, so you can look at some of the previous episodes or blog posts on their website to learn more about it. But this is basically a performance perspective with the 1.0 release. In the first example that they're looking at here, they're using a batch of 100 items that need to be queued, and each message is super small, only 22 bytes. It's the equivalent of saying "username: elvis", for example. So really, really short. And in their queuing system, they could write to the queue at 150,000 messages per second and read from the queue at 30,000 messages per second. The instance size was 16 vCPUs and 30GB of memory. They did all of these reads and writes using 20 concurrent writers and 10 concurrent readers, or consumers, of the batches.
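As a rough sketch of what that kind of batch workload looks like with PGMQ (the queue name and payloads here are made up, and exact function signatures may differ slightly across versions):

```sql
-- Create a queue (all PGMQ objects live in the pgmq schema as of 1.0)
SELECT pgmq.create('bench_queue');

-- Writer: enqueue a batch of tiny JSON messages in one call
SELECT pgmq.send_batch(
    'bench_queue',
    ARRAY['{"username": "elvis"}', '{"username": "priscilla"}']::jsonb[]
);

-- Reader: pull up to 100 messages, hidden from other consumers
-- for 30 seconds (the visibility timeout)
SELECT * FROM pgmq.read('bench_queue', 30, 100);
```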

Now they have made different changes to help achieve this performance. They're vacuuming very aggressively. They set their shared buffers to 60% of system memory, which is kind of crazy considering the normal recommendation is 25%. So this was, I think, actually on their cloud platform offering. This is an open-source extension that we'll look at on GitHub in a second, but I think this tuning was for their optimized all-in-one solution; a sketch of that kind of tuning is below.
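For illustration only, tuning along those lines might look something like this (the specific values and the queue table name are hypothetical, not taken from their post):

```sql
-- Bump shared_buffers to roughly 60% of a 30GB instance
-- (requires a restart; the usual guidance is closer to 25%)
ALTER SYSTEM SET shared_buffers = '18GB';

-- Vacuum the queue table much more aggressively than the defaults
-- (PGMQ queue tables live in the pgmq schema; this name is made up)
ALTER TABLE pgmq.q_bench_queue SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 0
);
```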

The next thing they looked at is increasing the message size, because 22 bytes is hardly anything for a message in a queue. So they decided to test a 1,000-byte, or one kilobyte, message size, because they said that's generally what is recommended for Apache Kafka. With that, they were still able to sustain 30,000 messages per second on reads, while writes dropped from 150,000 down to 30,000 messages per second. A lot of that is because they also dropped the batch size from 100 down to 10 due to the larger message size, but they could still read 100 off the queue at that larger message size. The next thing they looked at is: okay, what if you don't want to do batching and you just want to send a single message? So some single event has happened, you need to throw it in the queue and do something with it. And that's what this looks like. So they have a batch size of 1, and it's still 1 kilobyte per message. They had 50 writers, or as they also call them, producers, who were able to write 10,000 messages per second into the queue. They had 100 consumers, or readers, reading one message at a time, and they got up to 7,600 messages per second off the queue. So still, I think that's pretty good.
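A minimal sketch of that single-message pattern, under the same assumptions as the batch example above (the message id in the delete call is illustrative):

```sql
-- Producer: enqueue one ~1KB JSON payload
SELECT pgmq.send('bench_queue', '{"event": "signup", "username": "elvis"}');

-- Consumer: read a single message with a 30-second visibility timeout,
-- then delete it once it has been processed
SELECT * FROM pgmq.read('bench_queue', 30, 1);
SELECT pgmq.delete('bench_queue', 1);
```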

The next thing they looked at was actually a smaller instance with just one vCPU and 4GB of memory, and they reduced the number of writers and readers and the batch size, but it was still doing thousands of messages per second. So again, I believe these numbers are from their Tembo Cloud product, where they basically wrap everything up for you, but to my knowledge it is available as an extension that you can install in your own Postgres installation. So here's the link to GitHub, and the license, as you can see, is basically the PostgreSQL license. So hopefully that should be fine for most people who are already using Postgres. They offer it as a Docker container, or you can build it from source. So if this is of interest to you, definitely check out this piece of content.

Next piece of content- "Zero downtime Postgres upgrades". This is from knock.app and they're describing their upgrade from 11.9 to 15.3 with zero downtime using logical replication. So this is a super long post that they provided a lot of detail in. Some of it's redundant, but it is really great content. And if you're thinking about doing a logical replication upgrade, I would definitely review this blog post. They have a lot of recommendations on how to handle that, but I'll hit up some key points here. So basically they used logical replication to do the upgrade. So they brought up an entirely new server. They're using Aurora in this case, but of course, you could use it with just straight Postgres. They then got the old database in sync with the new database using publishers and subscribers.

And then at a given point in time, they transitioned to using the new instance, and they had different techniques depending on the tables that they wanted to migrate. For their small tables, they used the following technique. They set up the publication, and they set up the subscription on the subscriber side. But what they did is add one table at a time to the publication to minimize the amount of writes and reads that were going on. So basically they are altering the publication that they established and adding the new table to it. Then on the new database, you need to alter the subscription to give it a refresh so it can see that table and start synchronizing it. So that's how they did all of their small tables; a sketch of those commands is below.
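Something along these lines, where the publication, subscription, and table names are purely illustrative:

```sql
-- On the old (source) database: add one small table at a time
ALTER PUBLICATION upgrade_pub ADD TABLE users;

-- On the new (destination) database: refresh the subscription so it
-- notices the newly added table and starts copying and syncing it
ALTER SUBSCRIPTION upgrade_sub REFRESH PUBLICATION;
```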

For the large tables that were primarily append-only (they were big, but only recent data was changing), this is the technique they used. They still used the same technique as for the small tables, except they did not copy the existing data over; they only synchronized the new data. They then took a snapshot of the old database, and the new data was replicated to the new database in the background by logical replication, even while data was still being inserted on the old database. So this definitely did require a little bit of juggling. But they also had cases where they had large tables with many updates over most of the rows, so they weren't able to use this technique. I think they just got through it by monitoring the replication to make sure that it didn't negatively impact the system. Now, they did make a note that if you were on 15 or greater trying to do a logical replication upgrade, you could use the new row filters in logical replication.
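As a hedged sketch of what those row filters can look like on Postgres 15 and above (the table, column, and boundary values are made up):

```sql
-- Each publication carries only a slice of the big table, so the slices
-- can be copied and replicated independently
CREATE PUBLICATION big_pub_part1 FOR TABLE events WHERE (id < 50000000);
CREATE PUBLICATION big_pub_part2 FOR TABLE events WHERE (id >= 50000000);
```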

I've actually done this for some of the logical replication upgrades that I've done where you have a multi-terabyte table. You can create multiple publishers and subscribers that each only bring a portion of that table over, using a WHERE clause to bring a quarter of it over, or half of it over, or whatever you choose. And there's another technique you can use, which wasn't mentioned in this blog post, to bring over really large tables on earlier versions, and that's setting up logical replication using a snapshot. We covered this in a previous episode of Scaling Postgres. So basically you take a snapshot as if you were doing a backup. You set up your logical replication using that snapshot, and then you take a backup using pg_dump, COPY, or some other technique.

Then you restore it on the subscriber database and enable replication at that point. I actually had to use that technique for a multi-terabyte database to get it replicated quickly. I basically ran ten parallel streams of a large multi-terabyte table to replicate it much faster than one publisher/subscriber stream could. So that's another technique you could use. They also talked about being able to monitor the replication status to make sure things are okay, how you can abort the replication of a particular table if you need to and restart it again by re-adding it, and then their process for actually doing the switchover in their application. So this was a great post with a lot of detail that I definitely suggest you check out if you're looking to do a logical replication upgrade.
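For reference, here is a very rough sketch of that snapshot-based setup. All of the names, the snapshot ID, and the connection string are illustrative, and the exact replication-protocol syntax varies by Postgres version, so treat this as an outline rather than an exact procedure:

```sql
-- 1. On the publisher, over a replication connection, create the logical
--    slot and export a snapshot (older versions use EXPORT_SNAPSHOT,
--    newer ones use the (SNAPSHOT 'export') option):
--      CREATE_REPLICATION_SLOT upgrade_slot LOGICAL pgoutput EXPORT_SNAPSHOT;
--
-- 2. Dump the big table(s) consistently with that exported snapshot,
--    optionally in several parallel streams, and restore on the subscriber:
--      pg_dump --snapshot='00000003-000001AB-1' --table=events sourcedb | psql targetdb
--
-- 3. On the subscriber, attach a subscription to the existing slot without
--    re-copying data; it then streams every change made since the snapshot:
CREATE SUBSCRIPTION upgrade_sub_big
    CONNECTION 'host=old-primary dbname=sourcedb'
    PUBLICATION upgrade_pub_big
    WITH (create_slot = false, slot_name = 'upgrade_slot', copy_data = false);
```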

Next piece of content- "Supavisor 1.0: a scalable connection pooler for Postgres". This is from supabase.com, and of course, they're talking about Supavisor version 1.0. The highlights for this version are supporting query load balancing, name prepared statement support, and query cancellation. So the first one talks about load balancing and basically read requests can be sent to any Postgres instance in the cluster. What they mean by cluster is probably the primary and however many replicas you have.

It says, quote, "It randomly distributes these read requests across the entire Postgres cluster", and it "...targets write operations to the primary automatically by probing the read replicas until it hits the primary", which is also something you can do with the libpq client. Now, looking at this, the concern that I have is that I know a lot of applications do a write and then immediately do a read for some things. So I'm always wondering, is that read going to time out? For that reason, I normally see applications dedicate certain reads to the replicas and certain reads to the primary. It looks like they're trying to address that question with read-after-writes here. And they say, quote, "...it's easy to guarantee read-after writes from the same client by wrapping the read and write in a transaction".

So it sounds like if you're going to be reading something right after you've written it, you have to rely on putting both in a transaction to make sure the data is there, as opposed to hitting a replica that hasn't caught up yet. But there's a future plan to support setting the server as primary to make sure those reads go to the primary. Then, of course, they have the named prepared statement support that we've seen for other connection poolers, so that's great. And query cancellation apparently wasn't working before, which I guess it is now. So if you're interested in trying Supavisor, definitely check out this blog post.
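To make that pattern concrete, here is a tiny hedged example (the table and columns are hypothetical; the point is simply that both statements travel in one transaction, which per their post gets routed to the primary):

```sql
BEGIN;
INSERT INTO orders (customer_id, total) VALUES (42, 99.50);
-- This read sees the row just inserted, because the whole transaction
-- is executed on the primary rather than a possibly-lagging replica
SELECT * FROM orders WHERE customer_id = 42;
COMMIT;
```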
Next piece of content- "PostGIS Day 2023 Summary". This is from crunchydata.com. This was hosted a couple of weeks ago and this lists all the different presentations and the things that were discussed at PostGIS Day that Crunchy Data put on. So if you're interested in that, definitely check out this blog post.

Next piece of content- "Updates on trademark actions against the PostgreSQL community". This is from postgresql.org and apparently, the organizations that were having the trademark dispute have, quote, "...reached an amicable resolution". So that's great. Fundaciòn PostgreSQL has surrendered all trademarks and entered into a trademark license agreement with the PostgreSQL Community Association.

Next piece of content- "A Sneak Peek Into the State of PostgreSQL2023". This is from timescale.com and they did their State of Postgres survey between August 1 and September 15 of this year, 2023. The survey is being prepared for release later, but this is an early preview and they addressed a few questions, such as where the primary geographical location of the survey respondents came from. They asked how long you've been using PostgreSQL. They asked what is your current profession or job status? So a lot of different types of engineers at the top here, followed by management, and then finally database administrators. They ask have you ever contributed to PostgreSQL?

They also asked you to rate your experience, and there was a question on AI in terms of your thoughts about it and why you would use PostgreSQL for it. The number one reason is not surprising: basically, keeping your vector data and your relational data in the same place is why people would use Postgres for AI work. And this is also interesting: the top extensions that people are using. PostGIS is number one; I would have thought it would be pg_stat_statements. TimescaleDB is here as well, I guess because it's predominantly Timescale's audience taking the survey. UUIDs being on the list is unsurprising. Pgcrypto, it seems more people than I thought are using pgcrypto; I haven't used it yet. Pg_trgm I expected to be a little bit higher, and then we have postgres_fdw, which makes sense. Then pg_repack, pg_cron, and Citus. So you can check out this blog post if you want to find out more about the preview.

Next piece of content, there was another episode of Postgres FM last week. This one was on "Constraints". They talked about the six different types of constraints in Postgres: check constraints, not-null constraints, unique constraints, primary key constraints, foreign key constraints, and lastly exclusion constraints. So if you want to learn all about those, you can definitely listen to the episode here or check out the YouTube video down here.
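As a quick, hedged illustration of all six constraint types in one place (the tables here are invented for the example; the exclusion constraint needs the btree_gist extension to mix equality and range operators):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE rooms (
    id int PRIMARY KEY                                    -- primary key constraint
);

CREATE TABLE bookings (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    room_id int  NOT NULL REFERENCES rooms (id),           -- not-null + foreign key constraints
    code    text UNIQUE,                                   -- unique constraint
    during  tstzrange NOT NULL,
    CHECK (NOT isempty(during)),                           -- check constraint
    EXCLUDE USING gist (room_id WITH =, during WITH &&)    -- exclusion constraint: no overlapping bookings per room
);
```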

Next piece of content- "Understand Table Statistics Using pg_stat_all_tables". This is from stormatics.tech and they show the layout of the schema for the pg_stat_all_tables, but then also some different questions you can answer with it. Such as how to identify tables with the highest frequency of sequential scans in the database. So maybe you want to add some indexes. How to identify unused or infrequently accessed tables in PostgreSQL, so maybe you don't need those anymore. Next is how to check the right activity of tables in Postgres and then how to determine the number of live and dead tuples in a table and check their vacuum status. And this is usually the number one reason why I'm using it. To check how often a table is being vacuumed. Is it enough? Is it perhaps too much so? I tend to like to look at it for this purpose, but check this out if you want to learn more.

The last piece of content is "Specialized CPU Instructions". This is from momjian.us. We previously discussed a Timescale blog post that talked about them using SIMD, or Single Instruction Multiple Data, CPU instructions to accelerate some parallel compression. Apparently, Postgres 16 uses SIMD intrinsic functions as well, on the x86_64 and ARM CPU platforms. The different operations include doing copies, setting to a constant, testing for equality, or testing for less than a value. It says most of these right now are used at the byte level, but some may also process four-byte integers, doing different finding and checking functions. So this is interesting; I hadn't heard about this before. If you want to learn more, you can check out this blog post.
