background

Postgres 12, Window Frames, Index Types, Synchronous Replication | Scaling Postgres 83

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss the release of Postgres 12, how to use window frames, the different index types, and when to use synchronous replication.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about postgres twelve window frames,index types in synchronous replication. I'm Kristen Jameson and this is Scaling Postgres,episode 83.

[00:00:16] You all right? I hope you're having a great week. Our first piece of content end isactually an announcement that PostgreSQL twelve release candidate one is released.

[00:00:32] And it looks like the release of PostgreSQL twelve is set to release the full release onOctober 3, 2019. So that would be this Thursday. So this week it looks like postgres twelve isbeing released.

[00:00:46] Related to this release, there's also a timely webinar that was done called Webinar.New features in postgres twelve follow up. This is from second quadrant postgresql.com andthey go through the different topics they show. Here an intro to PostgreSQL twelve. SQL JSON,improvements to partitioning, reindex concurrently, progress monitoring, generated columns,case sensitive correlations, and plans for postgres 13. So, definitely a webinar to watch. Ihaven't had an opportunity to watch it yet, but I will be registering to go ahead and look at thereplay, but to get ready for postgres twelve. Definitely a timely piece of content.

[00:01:30] The next post, also from second quadrant is PostgreSQL twelve. A few special caseperformance enhancements and these are very short and abbreviate. They're basically talkingabout a minimal decompression of Toast values. Toast is where values get stored when they'retoo large to be put in a single record. It's kind of a spillover area and it says it decompressesprior to twelve, decompresses the entire value. Whereas in twelve they say quote, we onlydecompress what we need, which can speed up accessing the Toasted value. So that could beof benefit for very large records that you may be storing. Another improvement is faster floatconversion to text. And then the third one is a parallel query with serializable. So it's nowpossible to do parallel queries with serializable isolation levels. So just a few additionalperformance enhancements done for PostgreSQL twelve.

[00:02:28] The next piece of content is advanced SQL window frames. So this is a very greatarticle. I definitely suggest you checking out. So he said there was a previous article on windowfunctions and this talks about window frames, so areas within a window. So here's a quickoverview of what window functions are basically allowing you to do things like rolling averagesand things of that nature. And he has these great graphs here that represent what he's talkingabout. So for example, he's looking at a windows by release year. So you can see this is aparticular window and it's going to average the values across that window to give you what theyear average is. Same thing for the next window, which is 2016, and the next window which is2017. So to expand upon his window example, he wanted to have a window frame example andhe wanted to use quote for each film, find an average rating of all strictly better films in itsrelease year. So you're basically doing a comparison among each row. So he gives againexample here of where you had this partitioner, this window and then you're looking at thewindow frames. He goes over the different syntax and then talks about the three differentmodes. So there's the rows mode that operates at the row level essentially and all the differentways that you can define that frame. He looks at groups mode and again how to define theframe start and the frame end and then following up with the range mode then he breaks thatinto some real world examples. Now I of course don't have time to cover all of this in depth butthis is a very great post and I definitely suggest you take the time to check it out if you want toget better with window functions and or window frames.

[00:04:22] The next post is what Django Khan has to do with Postgres and Crocodiles aninterview with Louise Granjon from Microsoft. This is from the Citusdata.com blog and actuallythis interview is mostly talking about a presentation that's going to be given called postgres indextypes and where to find them. So that's mostly what the content is about. But this starts off great.These first three things that she's going to be covering in the presentation talking about onewants the audience to understand that postgres indexes are useful for two reasons performanceand constraints. So you can use a unique constraint to ensure un --iqueness and also get you better performance when doing lookups. And then she talks aboutsecond thing is an overview of different options with indexing. So you could use partial indexes,unique indexes, multicolumn indexes as well as well as just standard indexes and then alsoknow when to use the different types. So there's the standard b tree but also gen that's generallyused for JSON, b or text gist and then brin indexes which again can give huge space savingsparticularly for data types that are more sequential in nature. Now going through this she talksabout being a Django developer and then also what advice she gives to fellow developers interms of learning postgres. So it gives a lot of valuable information for developers that are usingPostgreSQL. So if you're a developer I definitely suggest you check out this piece of content tosee how you could potentially improve working with PostgreSQL.

[00:06:01] The next post is synchronous Replication is a trap and this is from the Robert Haasblog@blogspot.com and what does he mean a trap? It means don't just rely on synchronousreplication to assure that your data is safe. And on two systems. You need to take into accountthe holistic system that you're developing to ensure that data doesn't get lost. And it uses anexample of where you have a user inputting data into a web application that then talks to adatabase system and that there's different failure points along that. And just implementing highavailability or implementing synchronous replication will automatically make sure everything'ssafe. And you have to do more and think more holistically about the system to actuallyaccomplish not losing data. And he just wants to make sure that people are using the featuresappropriately. So like, one thing he says here is I have few reservations about the use ofsynchronous replication for data consistency. Basically making the master remote apply for itssynchronous commit setting and then set the synchronous standby names to a value that willcause it to wait for all standbys to respond to every commit. So that assures that things arewritten to disk across all the synchronous Replicas at the same time, so that you truly shouldn'tlose any data. But there's of course downsides to that that he discusses here is that if one of theReplicas happens to go down now, your whole system is down because it cannot synchronouslyreplicate to that Replica. So you have to have a process in place to handle those particularconditions so that your processing can still continue. He also mentions that quote I also don'tthink it's a big problem to use synchronous replication to control replication. Lag basically have asmaller delay when Replicating to Replica by using synchronous replication. But the issue thathe mostly covers here, it says, quote, where I think a lot of people go wrong is when they thinkof about using synchronous replication for data durability. So basically a reliable system thatdoesn't lose data. And that's when he goes into the discussion about looking at it as a holisticsystem. And this is just one feature that may or may not help you accomplish that overall goal.But overall this is a great blog post, as is almost every blog post he does. So I definitely suggestchecking it out.

[00:08:34] The next post is how to run short Alter Table without long locking concurrent queries.So this is a Casey's recently seen where an application had to add a column to a table. Theapplication ran Alter Table add column without a default and everything stopped for manyminutes. So basically there was some long running query and then when this Alter table startedrunning, it had an Access exclusive lock. But what happens is there's a lock queue and thingsstarted backing up behind this lock waiting for it. So of course, what you need to do now in this isuse a timeout. Now he talks about using a statement timeout, but as mentioned in the commentsthat he agrees with, what would be more appropriate is the lock timeout settings. So generallywhen you're wanting to do these types of DDL changes, definitely use a lock timeout of somenumber of milliseconds or seconds and then if it does not complete, it just errors out and doesn'tdo the DDL statement. So definitely something to keep in mind when doing database changes toyour database.

[00:09:45] The next series of posts are all related to security. The first one is very brief, but ittalks about implementing transpa --rent data encryption in postgres. And this says that this is something that they've beendiscussing and now it looks like it's actually moving forward and they're going to beimplementing a transparent data encryption method where the first thing they're going toaccomplish is all cluster encryption. So that's what they're going to start with. And the hope is,quote, this can be completed in postgres 13. So we're already looking to version 13 and whatcan potentially be an end. And it looks like this is something that they're going to try for. And thisis from Momgm US blog.

[00:10:31] The next post related to security is using row level security to make large companiesmore secure. And this is from CyberTech postgresql.com. And they're talking about setting uptwo tables, one a company table and a manager table, and defining a relationship like this andthen going through and defining a policy based upon a query of these tables to determine whohas access to what data in the system. So they grant various roles and then show how it can beable to query different outputs. Now one thing they mention here is that relying on a query likethis, basically this needs to be run to check that policy every time. So you could get into aperformance issue. But it's just something to keep in mind if you're looking into an investigatingrow level security.

[00:11:21] The next post is new version of PostgreSQL anonymizer and more. And this is talkingabout the anonymizer. So it is a tool that they've developed. And this is from Tadim Net thatenables you to anonymize data. So if you have production data, it enables you to take that data,that database, scrub it and create an anonymized version of it. So blanking out names orreplacing with something else, or email addresses, contact information, things of that nature.And they're talking about with the GDPR rules in place, this could be important. And he'senvisioning it for use cases. Like if you have a standing system and you want to take productiondata and put it in there, you can do that if you anonymize it. So you're not worried aboutproduction data being on a separate system. So if you're interested in a tools like this, definitelya post to check out.

[00:12:21] And the last piece of content is there was PG. Comp Brazil in 2019 and they've justuploaded a playlist of it looks like 22 videos to the Info Cube Brazil YouTube channel. Now as faras I could tell, all of these presentations are in postgres, so I was unable to understand them.But I do know that we have an international audience. So if this is of interest to you, definitely thevideos to check out.

[00:12:51] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show notes be sure to head over to Scalingposgres.com, where you can signup to receive weekly notifications of each episode. Or you could subscribe via YouTube oritunes. Thanks. --

episode_image