background

Replication Conflicts, Postgres Governance, pg_crash, MobilityDB | Scaling Postgres 140

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss replication conflicts, how Postgres is governed, a pg_crash utility and tracking temporal spatial data with Mobility.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about replication conflicts, postgresgovernance, PG crash and mobility. DB I'm Creston Jamison and this is Scaling Postgresepisode 140.

[00:00:21] All right, I hope you, your friends, family and co workers continue to do well. Our firstpiece of content is that new versions of PostgreSQL have been released from 9.5.24 to 13.1.Now, it seems like the main reasons that these updates were published were due to threesecurity issues that were mentioned at the top of the newspaper here. But of course, it alsoincludes a number of bug fixes and improvements that are as well listed right here along with thelinks to the release notes. So it looks like due to the security issues you should probably upgradesooner as opposed to later.

[00:00:58] The next piece of content is dealing with streaming replication conflicts inPostgreSQL. This is from CyberTech Postgresql.com and they're talking about replicationconflicts. Some conflicts that can occur when you have replication set up is snapshot replicationconflicts. So example that they mentioned here is that you have a vacuum process on theprimary and it removes dead tuples and that action is replayed on the standby. And now anyqueries will have to be canceled. There's also lock replication conflicts. For example, the primarytakes an access exclusive that interferes with other queries that are ongoing on the standby.Therefore they have to be canceled for that to take place. Then they also mention buffer pinreplication conflicts. So those are the three most common that would cause query cancellationson a standby. But they also mention a few other rare ones such as deadlock replication conflicts,table space as well as database. Now they do mention there is a view on the standby where youcan look at these database conflicts and it's called PG Stat database conflicts that you can queryso you can determine what kind of database conflicts are happening for your standby. Now, interms of controlling these query cancellations, one parameter you can adjust is the max standbystreaming delay. This is how far the standby is allowed to be out of sync from the primary interms of the number of milliseconds. And they show an example that a query gets canceled andit gives some details that the user query might have needed to see row versions that must beremoved and that's example of a snapshot replication conflict. Now, this max standby streamingdelay has a default value of 30 seconds. So the standby has 30 seconds to apply those changesfrom the primary. And by adjusting this value you can determine whether you want the standbyto keep in close proximity to where the primary is or to have it be a little bit further out so it canhandle more longer queries. And they talk about some of these use cases such as highavailability. So that means you have a standby that is running very close to the primary and inthat case you want to keep the streaming delay low. Another scenario is you just want to offloadbig queries that maybe take a long time to run. Well then probably you want to have a highermax standby streaming delay for that Replica or that standby. Another scenario they mentionedis horizontally scaling. This is where you have one primary that does read write and you havemultiple standbys that you can do read only queries against. Now, unfortunately, they mentionedwith this particular settings there are really no good settings and if you're wanting to do this,perhaps you want to look in using synchronous replication with remote apply set on yourstandby. So that way you'll be confident as soon as data is inserted in the primary it is reflectedin all of the Replicas. Now, the downside of that is it's a performance set because it actuallywaits for everything to be in synchrony across multiple databases before it returns a validconfirmation to the client. So doing something like synchronous commits with a remote applydefinitely hits your performance but it definitely assures that the data is in multiple places at onetime. Now, they don't cover that too much in here, but that is an option. Now, they say one wayto avoid some of these replication conflicts is by disabling the hot standby. So that means youdon't have any queries hitting the database. Essentially. Another way is avoiding lock conflicts.So basically avoid some of these operations on the primary when you don't want any queries tobe canceled on th --e standby. They also mention for completeness is that Vacuum will also take an Accessexclusive lock to see if there's any pages that can be given back to the operating system. Andthat you may want to set Vacuum truncate off for tables because that avoids this truncation stepwhich is another Access exclusive lock that needs to be set, albeit temporarily. So that'ssomething you may want to consider doing. And some things you can do to avoid snapshotconflicts is actually turn on hot standby feedback. Although the disadvantage of course is thatthis can cause bloat on the primary because you're essentially having the standby communicateto the primary hey, don't get rid of this. Yep, because I'm still using it and I need it for a query. It'salso possible to increase the Vacuum deferred cleanup h but again, this could also lead to tableBloats as there's some disadvantages with setting these two and then they have a conclusionbasically is that some of the best ways to avoid these replication conflicts is to have dedicatedstandby servers. One for high availability purposes and one for offloading queries and orbackups. So if you're interested in learning more about some of these features and settings thatyou can make, go ahead and check out this post from CyberTech Postgresql.com.

[00:05:37] The next piece of content is is it time to modernize the processes, structure andgovernance of the PostgreSQL core team. This is from PostgreSQL Fund and in light of the newmembers that joined the core team and as a reflection of EDB acquiring second quadrant, thisarticle goes into more of a discussion about governance of PostgreSQL as a whole. Now, itmentioned some things that I found beneficial and part of the reasons that they're wanting topotentially examine this is looking on how postgres being governance from a company influenceperspective, from a diversity perspective, from a democracy perspective, and a transparencyperspective. So I definitely felt the article is a good read and I think examining how thecommunity is structured could always be beneficial and how can we make improvements. So Idefinitely encourage you to read this blog post and think about how maybe the PostgreSQLcommunity could be improved from a governance perspective.

[00:06:36] The next piece of content is PG Crash crashing PostgreSQL automatically. This isfrom CyberTech Postgresql.com and is talking about an extension that purposefully crashes thedatabase. So you load it into your shared preload libraries and you emit signals to thebackground worker to set a crash with a particular delay. Now of course they advocate this forlike QA environments, but it's basically a way to test out your, say, high availability infrastructureto make sure that you can handle downtime gracefully. So, definitely interesting tool I had notheard of. So if you want to learn more, definitely check out this blog post.

[00:07:14] The next piece of content is analyzing GPS trajectories at scale with Postgres,Mobilitydb and Citus. So this is from Citrusdata.com and they're talking about Mobilitydb, whichis a software tool that they say supports temporal and spatial temporal objects and works withthe Postgres database and its spatial extension Postgres. So it kind of works with Postgres andPostGIS and essentially what it does is it not only judges distances from one point to another,but also in a point in time making those calculations. Now, they made a point that you can just,with pure PostGIS, answer the same types of questions. So for example, a bus is traveling thisroute. On what point will it see, say, this billboard or this billboard and during what period of timeand for how long? But doing it in that it has a very complex query to get that answer. With theMobilitydb tool added to it, the queries become much easier to work with and you can getsomething like this so much simpler than the string of CTEs that were presented above. So thisarticle goes through the use cases and how they're using it to track the spatial temporal visibilityusing Mobilitydb. So if you're interested in that, definitely check out this blog post.

[00:08:37] The next piece of content is Postgres Constraints. This is from Mohit Care and this isa very short post, talking about the different constraints from a primary key constraint, foreignkey constraint, check constraints, unique constraints, not null constraints, and exclusionconstraints. And for each one he gives an --example with code about how it is and what it does. So if you're interested in that, definitelycheck out this blog post.

[00:09:01] The next piece of content is actually the Percona YouTube channel. Again, theycontinue to be posting videos to their YouTube channel with regard to Percona Live Online theyhave a number of PostgreSQL postgres you may want to check out such as Wall Commit andSynchronization optimization Opportunities, PG, Stat Monitor as well as others. So if you'reinterested in more video content you can check out this link.

[00:09:25] The next piece of content is another YouTube video called Transaction IsolationLevels with PostgreSQL as example. So it goes through the different transaction isolation levelsusing Postgres as an example. So this is a very short video at only eight minutes, so if you wantto learn more about that you can check out this video.

[00:09:42] The next piece of content is Postgres streaming replication on Windows. A quickguide. So most of the examples are done in some form of Linux, but this one talks about how todo it on Windows. So if you want to learn how to do that, you can check out this post fromCrunchydata.com.

[00:09:59] Next piece of content is a webinar from Secondquarter.com. It's called commit withoutfear. The beauty of camo. So Camo refers to commit at most once. So this is talking aboutPostgres synchronous commit, how it works and working with it and compares and contrasts itto second quadrant's BDR product or their bi directional replication, basically their multimasterproduct that implements this commit at most once. So if you're interested in learning more aboutthis product, you can definitely check out this webinar by clicking the link here.

[00:10:34] The next piece of content is the PostgreSQL person of the Week is Carl Mopple. So ifyou're interested in learning more about Carl and his contributions to Postgres, definitely checkout this blog post. And Last piece of content is a link to the PostgreSQL weekly news posted onthe Postgresql.org website. So in addition to listing some product news, this is definitely theplace to get all the different patches that have happened with Postgres over the past week. So ifyou're interested in that, definitely check out this blog post that does it. For this episode ofScaling Postgres, you can get links to all the content mentioned in the show Notes. Be sure tohead over to Scalingposgres.com where you can sign up to receive weekly notification of eachepisode. Or you could subscribe via YouTube itunes. Thanks. --

episode_image