background

Storing Bits, Autovacuum, Assertions, Logical Recovery | Scaling Postgres 16

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we review articles covering efficiently storing bits, autovacuum, setting up assertions and logical replication recovery.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about storing Bits more auto vacuumassertions and logical recovery. I'm creston jamison. And this is scaling postgres episode 16.

[00:00:21] All right, the first article this week is using Bits Efficient Storage in Postgres. And thesubtitle is an example of how you can efficiently store millions of data points in postgres usingbinary. And this is from the Open Digirati blog.

[00:00:38] Now, the problem that they had is they wanted to track streaks of doing a particularactivity. In their case, it was daily reading of a bible. But they wanted to be able to track thesestreaks and they were debating the most efficient way to do it. So they tried one way and itwould give them a table that would grow by 4GB a year per million users. But they wanted to seeif they can do better.

[00:01:04] So what they decided to do is to store one bit a day, because what they could do is doa 32 bit integer for each month. So one bit per day per person they're doing per person per year.So they're using an array of twelve integers. So basically one user per row with this data typeenables you to store a whole year's worth of data. So basically allowed them to only grow by 78megabytes per year per million users.

[00:01:39] So this was a pretty innovative and interesting way to find a way to store dataefficiently when they plan to have quite a bit of data collected. So it's not too long a blog post,but definitely one I suggest checking out just to see the techniques that they used.

[00:01:58] The next post is auto. Vacuum slides from Pgcon 2018 Ottawa. This is from theDataegrid.com blog and in it they have their slides from the Talk. And again, if you're havingparticular issues with your database that you suspect is vacuum, the solution generally is tovacuum more. But I find it amusing. Here the slides, guys, what's wrong with the database? Andthe queries are canceling. Where's the standby? It seems vacuums is killing everything. So whatto do? Turn auto vacuum off, but then that causes tons of other problems. So definitely apresentation to go through that talks about how best to configure auto vacuum. It kind ofexplains the Y four, again in terms of NVCC. But the meat of the presentation is about aroundthe middle where it talks about proper tuning of auto vacuum. So again, another great post tocheck out if you're in the process of tuning your auto vacuum to be more efficient.

[00:03:01] The next post is actually a YouTube video. It's called RailsConf 2018 dropping into BTrees by David McDonald. This is on the Confreaks YouTube channel and in it he talks aboutbinary trees and what they are in terms of databases. Now, he's a developer by trade, so he'sdescribing what binary trees, b tree indexes are from a developer's perspective. And he talksabout some of the questions that he had and basically he did some research to determine whatthe correct answers are. So for example, when and how do I use an index? Why is an index onan integer faster than on a string? Why do inserts bring a performance penalty? And how canyou drop an index and not hurt the data? So a lot of this is described in this presentation from anarchitectural standpoint to be able to answer these questions. So if you're wanting to learn a bitmore about Btree indexes, definitely a presentation to check out.

[00:04:00] The next post is from a presentation. It's actually the slides and it's assertions andhow to use them. And this is from David Feder and this is the GitHub IO presentation link. Now,in terms of what assertions are, there are data constraints expressed in SQL which can spanmultiple tables in ways we haven't been able to do before. So there's constraints that you canplace in a single table for what data is allowed, but assertions operate across tables.

[00:04:33] Now, PostgreSQL does not have this feature yet. However, in this presentation hegoes over how you could potentially do it using a trigger with its own function for each table foreach assertion. So he goes into the code here. So if you're looking to potentially set up anassertion like this, you can use this technique today. However, he also mentioned in thispresentation that some assertions were started in 2013 in the Postgres dual code base. Butmore recently they've been doing some additional patches. And this one's from this April. So thistwelve or 13. This feature will get in there. So if you're potentially interested in what assertionsare and potentially how to set one up with the tools available today, be sure to check out thispresentation.

[00:05:30] The next post is Recovery use cases for Logical replication in PostgreSQL Ten.

[00:05:37] Now, this is a really long blog post and it goes into extreme technical detail. Basicallywhat it is talking about is cases where you have logical replication that has some failure point, itneeds to be restarted. How do you get it restarted again? Now, PostgreSQL Ten is great forgetting logical replication set up and there's minimal things you need to do. But what happens ifit fails? How do you assure the data gets back in sync? And whatnot or if you have a failover toa replica, how do you handle that? This particular individual, and this is from Avitotech has beenusing another logical replication solution. I believe it was Londist. But he was investigatingPostgreSQL Ten and he is documenting kind of what potential changes would be needed forthem to be able to rely on PostgreSQL Ten.

[00:06:34] And right here, as you can see, a very long blog post near the end here, he talksabout some of the things that kind of would be needed to do to be able to properly recover thestate of logical replication. I know the big one here that says blocker for production usage, andthis is something I myself have looked at in terms of logical replication, is that the logical slot onthe standby, in other words, logical slots are not replicated to standby. So you're going to have agap in the data after a replica is promoted. And there was some discussion around this particularissue, like there was a proposal for failure slots, but nothing's really become of that. But there'ssome people that are apparently interested in it. And this blog post mentions a number of bigcompanies in Russia using PostgreSQL, where having these type of recovery features for logicalreplication would be a huge benefit. So if you use logical replication and want to learn moreabout how recovery can potentially be done in some of the areas to watch out for, definitely ablog post to check out.

[00:07:43] The next blog post is actually a presentation. Slides will postgres. Live forever. So thiswas a quick and easy presentation, but it basically talks about how because Postgres is opensource, it's going to have a really long lifetime. The code will always be there, it can always berevisited and upgraded and improved based upon what the community chooses to do, whereasa for profit company, as soon as that product is no longer generating revenue, essentially it willbe abandoned, whereas the code is always available to be changed with an open sourceproject. So basically this presentation is saying that Postgres will have a very, very long life dueto its open source nature. So definitely a presentation to check out if you're interested in that.

[00:08:29] The next post is I or the tiger. I believe that's I of the tiger benchmarking cassandraversus timescale DB for time series data subtitle is how a five node timescale DB clusteroutperforms 30 Cassandra nodes with higher inserts up to 5800 times, faster queries at 10% ofthe cost, a more flexible data model, and of course, full SQL. So this is from the Timescale.comblog. Now, I have presented this before that timescale DB is actually an extension that you addto your PostgreSQL instance. So in this, they tested setting up a Cassandra, actually a five nodeCassandra ten node and a 30 node Cassandra DB to compare insert performance versus a fivenode timescale DB. So they go through over the methodology how they tested and essentiallythey saw improved performance with Timescale and up to the point where they tried to matchwhat timescale DB was doing compared to Cassandra, they went from five to ten to 30 nodesand still didn't achieve that performance. Now, of course, this is timescale DB that has done thisanalysis, so you have to take that into consideration. But definitely it appears like really largeperformance using timescale DB compared to Cassandra for a much lower cost because you'rehaving less nodes to manage and pay for. So if you're interested in that, definitely a blog post tocheck out.

[00:10:02] The next post is multidata center setups with PostgreSQL. And this is from theSeveral Nines blog. Now, this basically goes over when you want to have multiple data centersset --up potentially around the world. What's the best way to synchronize the data between themand allow access to the data? So it goes over when you can think about doing synchronousreplication, when you need to be doing asynchronous replication. It also discusses somedifferent tools that are available at different providers to be able to achieve this. So relativelyshort overview post about that. So if you're considering a multidata center setup, definitely a blogpost to check out.

[00:10:44] And the last post is altering default privileges for fun and profit. What not to ExpectWhen You're expecting. And this is from the Celia US blog, and in it he goes over Setting defaultprivileges. So basically you can use grant statements to grant access to schemas or tables forparticular users, but setting default privileges enables you to set for new objects created movingforward. It doesn't go back for existing objects you have to use grant, whereas if you want to dofuture objects, you use set default privileges for them. And he goes over how to use it andcompares it a little bit to coming from Oracle on what the differences are. So if you have tomanage complex privileges to your database, definitely a blog post to check out.

[00:11:33] That does it. For this episode of Scaling Postgres, you can get links to all the contentpresented in the show Notes. Be sure to head over to Scalingpostgres.com, where you can signup to receive weekly notifications of each episode, or you can sign up via YouTube or itunes.Thanks. --

episode_image