PGCon Online, Performance Tips, Tracking Counts, Essential Monitoring | Scaling Postgres 115

In this episode of Scaling Postgres, we discuss PGCon going online, application performance tips, ways to track counts and essential areas to monitor.

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about PGCon going online, performance tips, tracking counts, and essential monitoring. I'm Creston Jamison, and this is Scaling Postgres episode 115.

[00:00:17] All right, I hope you, your family, and coworkers continue to do well as we still deal with the situation that's with us. But our first piece of content is that Postgres 13 Beta 1 is released, and this is from PostgreSQL.org. This essentially goes over some of the release notes, talking about the new functionality that's being added, along with administration enhancements, security, and other highlights. So if you're interested in trying Postgres 13 before it's released, go ahead and give Beta 1 a try.

[00:00:53] The next piece of content is that PGCon, which is a PostgreSQL conference for users and developers, is actually going online this year. And from what I've seen, there is no fee. They say there will be no charge for this conference and everything is free, thanks to the amazing sponsors that have helped put this on. So if you have some free time for the talks that are going to be taking place online on the 27th and 28th of May, as well as the tutorial day on the 26th, definitely check out this link to participate in the conference.

[00:01:27] The next piece of content is Postgres tips for optimizing Django and Python performance, from my PyCon workshop, and this is on the Citusdata.com blog. So essentially these are performance tips. Now, this is specific to Django and Python, but really the guidance applies to any application framework when I took a look at it. They're talking about what to look for in your logs in a development environment, monitoring pg_stat_statements to be able to track what statements are doing, and how to avoid loops like N+1 queries. And again, this is specific to Django, but you're going to want to do this in other application frameworks. They also cover limiting the columns returned to avoid slow SQL queries, as well as pagination tips. In other words, don't just use OFFSET and LIMIT to do those types of queries, but specify the ordering using an ORDER BY; there's a small sketch of this below. So if you want insight into all of this, it was posted at this YouTube link that you can view right here to get these performance tips. So if you're interested in that, go ahead and check out this post and video.
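
As a quick illustration of that pagination tip, here is a minimal sketch. The items table and its columns are hypothetical, not from the talk, and the keyset variant at the end is a commonly recommended alternative added here for context rather than something quoted from the workshop:

```sql
-- Hypothetical "items" table; names are illustrative only.

-- LIMIT without ORDER BY returns rows in no guaranteed order,
-- so pages can overlap or skip rows:
SELECT id, title FROM items LIMIT 20;

-- Pairing LIMIT/OFFSET with ORDER BY makes pagination deterministic:
SELECT id, title FROM items ORDER BY id LIMIT 20 OFFSET 40;

-- For deep pages, keyset pagination (filtering past the last id seen,
-- here 60) avoids scanning and discarding all the skipped rows:
SELECT id, title FROM items WHERE id > 60 ORDER BY id LIMIT 20;
```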

[00:02:35] The next piece of content is how to count hits on a website in PostgreSQL, and this is from Cybertec-postgresql.com. Now, I thought this was talking about hits on a website, but actually if you look at the detail, he says: suppose you want to grant a user access to a certain piece of data only X times. How can one implement that safely? So I was a little confused by that given the title. It looks as if what they're trying to achieve is that someone wants to allow only a certain number of hits, and if a user exceeds it, it basically puts a limit on them, or at least they know when they go over the limit. So it seems to be kind of an unusual use case, but this post goes over a lot of different techniques to achieve it that I thought were interesting, and you should check it out. Now, I'm not going into detail, but I'm just going to highlight some of the things he goes through so you can get a sense of the different techniques. For example, in his implementation he's using a btree_gist index, because normally you can't combine B-tree index queries with a GiST query, but I believe this extension allows you to do it. He's using range types to set some of the limits on how many clicks per day are allowed per customer ID, so this customer limit table defines the limits. Now, he's also logging all of the web traffic, so every hit, regardless of being tracked against the number of clicks per day, is recorded as a log entry, and then he has a log count table. Basically this is a summarization of what's in the log table by customer ID and by day, and he has this counter limit, which is what is allowed for that customer; if somebody exceeds the limit, a separate real counter records that. And again, he has a unique constraint on customer ID and date, and he's using a fill factor here to not fully fill the table and allow updates to happen with better performance, because maybe they will be placed on the same page. Now, the meat of the action takes place in this function, where he's using a CTE to insert a row into his t_log table, so everything gets logged in that table, and then it tries to do an insert or an update, using ON CONFLICT to do the update into the t_log_count table. So again, a lot of different techniques are used in this use case, and I haven't really seen a parallel to something like this; a rough sketch of the pattern follows below. This blog post uses a lot of different techniques that you may want to check out and use for your own application.
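
To make those moving parts easier to picture, here is one plausible way they fit together, as a minimal sketch. The table, column, and value names are hypothetical rather than the author's exact schema, and the exclusion constraint is my reading of why btree_gist is needed alongside the range type:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Per-customer limits, with a date range bounding when each limit applies.
CREATE TABLE t_customer_limit (
    customer_id int       NOT NULL,
    valid       daterange NOT NULL,
    max_hits    int       NOT NULL,
    -- btree_gist lets the scalar customer_id share a GiST exclusion
    -- constraint with the range column, so limit periods can't overlap:
    EXCLUDE USING gist (customer_id WITH =, valid WITH &&)
);

-- Raw log of every hit, whether or not it counts against the limit.
CREATE TABLE t_log (
    customer_id int         NOT NULL,
    hit_time    timestamptz NOT NULL DEFAULT now()
);

-- Per-customer, per-day counts; the fillfactor leaves free space on each
-- page so counter updates can often land on the same page.
CREATE TABLE t_log_count (
    customer_id int  NOT NULL,
    day         date NOT NULL,
    counter     int  NOT NULL DEFAULT 1,
    UNIQUE (customer_id, day)
) WITH (fillfactor = 70);

-- Log the hit and bump the daily counter in a single statement,
-- using a data-modifying CTE plus ON CONFLICT for the upsert.
WITH raw AS (
    INSERT INTO t_log (customer_id) VALUES (42)
)
INSERT INTO t_log_count (customer_id, day)
VALUES (42, current_date)
ON CONFLICT (customer_id, day)
DO UPDATE SET counter = t_log_count.counter + 1;
```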

[00:05:11] The next post is SQL Window Functions Cheat Sheet, and this is from Learnsql.com, and they have this convenient two-page sheet. I know I don't use window functions regularly, and frequently I have to look things up when I want to use one, so having this cheat sheet available makes a lot of sense. In conjunction with the cheat sheet, there's actually a blog post that covers all the different items listed in it. So if you want a helpful reference to remember how to do certain window functions, maybe check out this post from Learnsql.com.
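
As a small taste of the kind of thing the cheat sheet covers, a window function computes a value over a set of rows related to the current one without collapsing them the way GROUP BY does. This example assumes a hypothetical employees table:

```sql
-- Hypothetical employees(name, department, salary) table.
SELECT name,
       department,
       salary,
       -- rank within each department by salary:
       rank()      OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
       -- department average, repeated on every row instead of collapsing them:
       avg(salary) OVER (PARTITION BY department)                      AS dept_avg
FROM employees;
```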

[00:05:45] The next post is Essential PostgreSQL Monitoring, part one. This is from PG IO. In this part one, they're going over essentially nine things that you should be monitoring. The first is the transaction ID range, or basically what's your max TXID. You don't want it to exceed 2 billion, of course, and they give you the code to check that. The second is monitoring the number of backends, and you can find that just by doing a count on the pg_stat_activity table.

[00:06:14] Tracking the number of inactive replication slots is next, because if you have an inactive replication slot, then the WAL it needs won't be removed, so it's very important to deal with these orphaned or inactive slots. And he shows you the query: just query pg_replication_slots where it's not active. Then there's checking on backends that are waiting on locks. So again, you can see whether things are locked up by checking the pg_stat_activity table and looking where the wait event equals lock. Number five is backends idling in transactions. You want to avoid time being spent idle in transaction, and you can check that just by looking at the pg_stat_activity table. Then there's checking the replication lag for active connections, so you can see how much of a lag you're running with when you have replicas; that's in the pg_stat_replication table, and you can use it to give you the write, flush, and replay lag. Next is tracking the replication lag for replication slots. And the last two are checking the WAL file count: first, the number of WAL files in the pg_wal directory itself, and then also in the destination archive directory, and they give you the commands to check that for the different versions. So this is a really great list of things to monitor that you can get access to by simply querying the Postgres system tables; a few of these checks are sketched below. If you're interested in learning more about this, check out this blog post.
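
Here are illustrative versions of a few of those checks, written against the standard system catalogs and statistics views; the exact queries in the post may differ slightly, and the lag columns and pg_ls_waldir() assume Postgres 10 or later:

```sql
-- Transaction ID age; keep this well below ~2 billion:
SELECT max(age(datfrozenxid)) FROM pg_database;

-- Number of backends:
SELECT count(*) FROM pg_stat_activity;

-- Inactive replication slots (these hold back WAL removal):
SELECT slot_name FROM pg_replication_slots WHERE NOT active;

-- Backends waiting on locks:
SELECT count(*) FROM pg_stat_activity WHERE wait_event_type = 'Lock';

-- Backends idling in a transaction:
SELECT count(*) FROM pg_stat_activity WHERE state = 'idle in transaction';

-- Replication lag for active connections:
SELECT client_addr, write_lag, flush_lag, replay_lag FROM pg_stat_replication;

-- WAL file count in the pg_wal directory:
SELECT count(*) FROM pg_ls_waldir();
```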

[00:07:37] The next piece of content is Missing Piece: Failover of the Logical Replication Slot, and this is from Percona.com. They're talking about a problem when you have a high-availability pair of Postgres instances: you have your primary and you have a physical hot standby, so you're doing streaming replication to the standby. Now, if you have logical replication enabled to another instance, the problem is what happens when a failover occurs, because there's no information about the logical replication slot that's on the primary, and if there's any delay at all with the replication, you could be missing some data once you activate the slot on the new primary. So there's no transfer of the state of the logical replication slot from the primary to what will be the failover target, and they're saying essentially this is a problem if you want to maintain data integrity. They talk about the different designs and discussions for solving this problem; unfortunately, it's been raised as early as 2016 in a blog post they mention here, and, quote, there's no generic way to solve this for everyone. So essentially the problem still exists, because it's been difficult to find a way to handle it to the satisfaction of everyone. There's been a lot of discussion, but nothing has happened as of yet, and they talk about some possibilities to address it. This blog post is important just to bring the issue to the forefront, particularly if you're using logical replication along with a high-availability Postgres pair. But even if you're not trying to operate in a high-availability mode, the fact that you have a physical replica means that if you're using logical replication, you're going to have to figure out the best way to deal with this. They talk a little bit about it, but it's definitely something to be aware of, and you should plan how you want to mitigate this potential issue.

[00:09:27] The next post is: Is it time to upgrade your PostgreSQL database? This is from Enterprisedb.com, and they go over why you should upgrade and the benefits of it. And I actually liked seeing some of the information that they presented here, particularly looking at this graph where they ran pgbench performance tests for different versions, from 9.5 up to the current 12.1, and you can see the greater transaction performance with an increasing number of clients across the different versions. Now, it's not only for performance reasons, but there are a number of other features, of course, that have been added that could greatly make your life easier if you upgrade to a more recent version. And then they also talk about the risks of not doing an upgrade, particularly if you're staying on an older version. Then they talk about some different ways that you can do the upgrade and which is best suited for what. Generally, I use pg_upgrade for upgrading databases to minimize downtime. But if you're thinking about whether or not you should upgrade, definitely check out this blog post.

[00:10:31] The next piece of content is Phony Table Columns in PostgreSQL, and this is from Highgo.ca. Now, I don't know if I'd call these phony table columns; they're basically talking about system columns in a table that are essentially used to manage the database, and they aren't visible. If you look here, they created a one-column table, and the field name they called field. Then they queried the pg_attribute table, and you can see the field: the attribute number is one and the type ID is an integer. But then you can look at all the different columns that Postgres adds to manage the data stored within each row, so you can see the table object ID, cmax, xmax, cmin, xmin, and ctid as well. They go through and discuss the purposes of each of these hidden, essentially system-managed, columns for each of the tables. Then, interestingly, they came up with some use cases where you could actually use these in queries. For example, they were using the ctid column to eliminate duplicate records; they have a scenario where you could use this system column to do that, along the lines of the sketch below. So if you're interested in learning more about these system-managed columns and how they could potentially be used, definitely check out this blog post.
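
As a rough sketch of that duplicate-elimination idea (not necessarily the exact query from the post), the following relies only on ctid equality; the t_demo table and field column are hypothetical, and which copy of each duplicate survives is arbitrary:

```sql
-- Delete duplicate rows, keeping one copy per value of "field".
-- ctid identifies a row's physical location, so it distinguishes
-- rows that are otherwise identical.
DELETE FROM t_demo
WHERE ctid IN (
    SELECT ctid
    FROM (SELECT ctid,
                 row_number() OVER (PARTITION BY field) AS rn
          FROM t_demo) AS numbered
    WHERE rn > 1   -- every row after the first in each group is a duplicate
);
```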

[00:11:55] The next piece of content is that the PostgreSQL Person of the Week is Dimitri Fontaine. So if you're wanting to learn more about his work and contributions to Postgres, definitely check out this blog post.

[00:12:06] And the last piece of content is how to use the logistic regression machine learning model with 2UDA, PostgreSQL and Orange, part five. So this is of course the fifth part about machine learning that has been presented on the 2ndQuadrant.com blog. So if you're interested in learning more about that, check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
