
Stat Tracking, Temporary Tables, pg_squeeze, pg_auto_failover | Scaling Postgres 136


In this episode of Scaling Postgres, we discuss stat tracking, temporary tables, a new utility called pg_squeeze and an update to pg_auto_failover.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about stat tracking, temporary tables, pg_squeeze and pg_auto_failover. I'm Kristen Jameson, and this is Scaling Postgres episode 136.

[00:00:22] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is tracking postgres stats. This is from pgcraftsman.io, and it's a blog post about a very simple SQL script that looks at the existing system tables within your database and takes snapshots of them. A lot of the stats tables that exist within postgres either hold point-in-time statistics, like pg_stat_activity, or cumulative statistics, like pg_stat_all_tables. He argues you basically need to create snapshots of these tables to glean useful information from them over time, and he has this pgcraftsman snapshots SQL script that takes snapshots of these different statistics and records them in the database itself. So maybe you want to schedule this with a cron job on a periodic basis and take a look at the data, or do some work or run some job and then take another snapshot to compare how your system was affected by the change. He gives an example of running the command once you have this SQL script: it creates a separate schema for itself, as well as a number of tables and indexes, and he shows an example of the tables that are present. Now, he does mention that some things, like memory and CPU, aren't currently captured because there's no way to query postgres to get them, but he's alluding to something coming in the future because he mentions a future post. I suspect he may be using an extension that actually contacts the underlying OS to pull things like CPU stats. But basically, taking a snapshot is rather simple: you just run a single command to take a snapshot from those existing tables. He also talks a little bit about what it takes to do a snapshot, and it basically depends on how many objects you have in the database, or, in the case of pg_stat_activity, how many connections you have. The post then goes through how to check out the snapshots as they exist in the separate schema, and how to run a report from one snapshot to another, which gives you stats for each table. So for example, for the pgbench_history table, you could see that over half a million rows were inserted by a workload that was run. So even though this looks like it was just introduced and it's relatively simple, it looks like something that could build up into something pretty interesting. It's basically a very bare-bones way to collect a time series of statistics from these tables. So if you're interested in that, definitely check out this blog post.

The next piece of content is PostgreSQL: Sophisticating temporary tables. This is from cybertec-postgresql.com, and it's all about temporary tables. These are tables that are created within a session, and once that session is complete or the connection is closed, the temporary table goes away. In addition, he mentions a few other configuration options you can use with temporary tables. The default is to preserve rows on a commit. However, you could also choose to delete rows, so you do a bunch of work and then, when you commit, it actually deletes the rows and empties the table. Or you could choose to drop the table entirely at commit. Now, if you're going to start using more of these temporary tables, one thing he says to be aware of is the amount of memory dedicated to temporary tables (the temp_buffers setting), which defaults to eight megabytes. You may want to increase that if you want more space for temporary tables to remain memory resident, so that's a configuration setting you may want to change. This is a very quick, simple post about temporary tables in postgres, so feel free to check it out.
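For reference, the ON COMMIT behaviors and the temp_buffers setting he describes look like this in SQL. This is my own sketch rather than code from the post, and the table names are illustrative:

    -- temp_buffers controls per-session memory for temporary tables (8MB by default);
    -- it has to be raised before the session first touches a temporary table.
    SET temp_buffers = '64MB';

    -- Default behavior: rows survive each commit for the life of the session.
    CREATE TEMPORARY TABLE scratch_default (id int) ON COMMIT PRESERVE ROWS;

    -- Rows are emptied at the end of every transaction.
    CREATE TEMPORARY TABLE scratch_delete (id int) ON COMMIT DELETE ROWS;

    -- The table itself is dropped at commit, so it only makes sense inside an
    -- explicit transaction block.
    BEGIN;
    CREATE TEMPORARY TABLE scratch_drop (id int) ON COMMIT DROP;
    -- ... use scratch_drop for intermediate work ...
    COMMIT;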

[00:04:04] The next piece of content is pg_squeeze: Optimizing PostgreSQL storage. So this is a new utility that enables you to shrink tables live while the database is working, move tables and indexes from one tablespace to another, index-organize or cluster a table, as well as change the on-disk fill factor. Now, it seems like the primary use case is to shrink tables in a live database, similar to what pg_repack does. What's interesting is that it actually uses logical replication to do the majority of its work, so it may require less locking than pg_repack. They also discuss table bloat in general: because of how postgres works, whenever you do an update a new row version is created and the old version eventually becomes dead. And even though vacuum cleans up those dead rows, it marks the space for reuse rather than returning it to the operating system, so the table file doesn't shrink. For example, if you have a table and you've inserted some data, it's at 69 megabytes. If you then update all those rows, it grows to 138 megabytes, and vacuuming the table won't reduce its size. However, if you use their utility, pg_squeeze, it's actually able to compress the table back down to the original 69 megabytes that they show here. They also talk about how you can set up scheduling with it, because they have a squeeze.tables table where you can insert a row to have tables squeezed on a periodic basis, with a schedule option that appears to work like cron. So it looks like a new utility that does something similar to what pg_repack does, but does it slightly differently. So if you're interested in that, maybe you want to check out this new utility from cybertec-postgresql.com.
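To make the bloat behavior concrete, an experiment along the lines of what they describe might look like the following. The table name and row count here are my own illustration, and the sizes are approximate:

    CREATE TABLE t_bloat (id int);
    INSERT INTO t_bloat SELECT generate_series(1, 2000000);
    SELECT pg_size_pretty(pg_relation_size('t_bloat'));   -- on the order of 69 MB

    UPDATE t_bloat SET id = id + 1;                        -- every row gets a new version
    VACUUM t_bloat;
    SELECT pg_size_pretty(pg_relation_size('t_bloat'));   -- roughly double; vacuum does not shrink the file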

[00:05:53] The next piece of content is what's new in pg_auto_failover 1.4 for Postgres high availability. This is from Citusdata.com, and pg_auto_failover is a very simple utility for doing a primary-to-secondary failover to maintain a high availability postgres solution. Basically it just requires a monitor that does health checks against a primary and a secondary instance, so it's a relatively simple setup, and it just takes these four commands to get a setup running. Now, what's new in 1.4 is the ability to have multiple secondaries or replicas, because of how it worked previously: if the primary went down, it would of course promote the standby or secondary node to become the primary. But what if your secondary node goes down? What happens then? Well, if you have synchronous commit set, the primary is going to stop accepting writes, because each commit needs to be acknowledged by the secondary. So what the utility did previously was turn off synchronous replication, so you could still write to the primary, but then essentially your data was unprotected. With support for multiple replicas, even if one secondary node goes down, you still have data redundancy across the primary and the remaining secondary, as long as you have two or more replica nodes available. This version supports that capability as well as Postgres 13. So if you're looking for a high availability solution that looks relatively simple to set up and that keeps adding new features, definitely check out this post from Citusdata.com.
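pg_auto_failover manages these settings for you, but the underlying Postgres mechanism being described, requiring acknowledgement from any one of several standbys so a single standby failure doesn't block commits, can be expressed in plain Postgres configuration like this. The node names are hypothetical standby application_name values, and this is not pg_auto_failover's own tooling:

    -- With two or more standbys, commits only need acknowledgement from ANY one
    -- of them, so losing a single standby no longer blocks writes on the primary.
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (node_b, node_c)';
    SELECT pg_reload_conf();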

[00:07:32] The next piece of content is multitenancy with Postgres schemas: key concepts explained. This is from blog.arkency.com, and it's about using schemas, which are essentially namespaces for tables. So if you're doing multitenancy where one customer account has a dedicated schema and another customer account has another dedicated schema, if you want to set it up this way, this blog post describes that. It shows how you can create additional schemas, how they namespace the objects within them, and how, when you query against them, you're dealing with separate objects under different namespaces. So they create a schema for a first tenant, and you can then look at the things table in that namespace; there also exists a default namespace, public, that all objects are created in unless you specify one. It then talks about being able to query a different schema by specifying the fully qualified name, or by defining and customizing a search_path so that it will only look in particular schemas for tables, and how you can do this with a SET command for sessions in postgres. Of course, the thing you need to keep in mind, which they cover here, is that if you're using something like PgBouncer set to transaction pooling, you're probably going to mix up tenants' data, because transaction pooling is not compatible with session-level settings, so that's definitely a big caveat. But this post goes over the things to consider if you're exploring schema-based separation for multitenancy.
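A minimal sketch of the schema-per-tenant idea, with schema and table names of my own choosing rather than the exact ones from the post:

    -- Each tenant gets its own schema acting as a namespace.
    CREATE SCHEMA tenant_one;
    CREATE TABLE tenant_one.things (id serial PRIMARY KEY, name text);

    -- Either fully qualify the table...
    SELECT * FROM tenant_one.things;

    -- ...or point the session's search_path at one tenant and use bare names.
    SET search_path TO tenant_one, public;
    SELECT * FROM things;

Since SET search_path is session-scoped, a transaction-pooling PgBouncer can hand the same server session to a different client, which is exactly how one tenant's queries could end up reading another tenant's schema.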

[00:09:05] The next piece of content is PostgreSQL monitoring for app developers: alerts and troubleshooting. This is from Crunchydata.com, and it follows up a series of posts they've been doing on monitoring, particularly with their Postgres Operator tool that works with Kubernetes clusters and pgMonitor for the monitoring, and it discusses using Alertmanager from Prometheus, I believe, to set up alerts so you get notified when something is outside the bounds of what's expected. It looks like they have some defaults set up for their particular solution, and the most important alerts they talk about setting are: "Is this thing on?", basically, is the postgres database up and available; second, "Am I out of space?", basically tracking disk usage to make sure you're not going to run out of space, either due to excessive logging or just database growth; next, "Why is this not clean?", basically tracking vacuum operations, in particular keeping an eye out for transaction ID wraparound to make sure you don't get close to that limit; next, "Falling behind the leader", which is basically monitoring replication lag; and finally, "Running out of numbers", which is basically sequence exhaustion, meaning either a sequence reaches its configured limit or the integer column the sequence feeds overflows its maximum value. So some key things to monitor are discussed in this post, and it's definitely a post to check out for monitoring these critical things, as well as the tools they have set up, if you're interested in those.
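These aren't the queries from the Crunchy tooling, but as a rough sketch, a few of those alerts can be checked ad hoc with standard catalog queries. The sequence name in the last query is hypothetical:

    -- How close each database is to transaction ID wraparound
    -- (autovacuum becomes aggressive as age approaches autovacuum_freeze_max_age,
    -- which defaults to 200 million).
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;

    -- Replication lag in bytes per standby, as seen from the primary.
    SELECT application_name,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication;

    -- Headroom left on a sequence feeding a plain integer column.
    SELECT last_value, 2147483647 - last_value AS values_remaining
    FROM things_id_seq;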

[00:10:43] The next piece of content is tuning your postgres database for high write loads. This is also from Crunchydata.com, and basically they only talk about one parameter change here, and that is max_wal_size. Now, they do discuss some others related to checkpoints, because what they've seen happening in log files sometimes is that checkpoints are occurring too frequently. That's because max_wal_size isn't large enough to avoid triggering early checkpoints. So basically the only thing you have to adjust here to get things back to normal is to increase max_wal_size. This is a very brief post, but of course you may want to consider more configuration changes for high write loads, such as adjusting your checkpoint_timeout and checkpoint_completion_target. There are a number of parameters you may want to change, but this particular post only covers max_wal_size, so you can check it out if you want to learn more.
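As a sketch of what that tuning looks like in practice, the values below are illustrative starting points of mine, not recommendations from the post:

    ALTER SYSTEM SET max_wal_size = '8GB';
    ALTER SYSTEM SET checkpoint_timeout = '15min';
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();

    -- With log_checkpoints = on, "checkpoints are occurring too frequently" hints
    -- in the log are the usual sign that max_wal_size is too small.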

[00:11:37] The next piece of content is Postgres 13 observability updates. This is from Dataegret.com, and they've updated their observability diagram with the changes in Postgres 13 regarding where you can get insight into performance. It points to all the different subsystems of postgres and the system views you can look at to get insight into them. So for example, you can look at pg_stat_wal_receiver or pg_stat_replication. It lists all these different system views where you can get insight into the performance of postgres, and they talk a little bit about the changes in 13. So if you're interested in an image like this, definitely check out this post.
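Just to show what two of the views called out there return (my own example, not from the post):

    SELECT * FROM pg_stat_replication;    -- replication as seen from the primary
    SELECT * FROM pg_stat_wal_receiver;   -- the standby's view of the WAL it is receiving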

[00:12:16] The next piece of content is talking Postgres on podcasts. This is from Cleverelephant.ca, and he references a couple of different podcasts he has appeared on, since he hasn't been attending conferences recently. So you can get more GIS-related content there.

[00:12:32] And the last piece of content is: the PostgreSQL person of the week is Greg Sabino Mullane. So if you're interested in learning more about Greg and his contributions to Postgres, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
