
Replication Complications, One Word Performance, PgBouncer Users, Serializable Isolation | Scaling Postgres 119


In this episode of Scaling Postgres, we discuss replication complications, adding one word to boost performance, handling PgBouncer users and serializable isolation testing.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about replication complications, one-word performance, PgBouncer users and serializable isolation. I'm Creston Jamison, and this is Scaling Postgres episode 119.

[00:00:24] Alright, I hope you, your family and coworkers continue to do well in these times. Our first piece of content is "PG Phriday: 10 Things Postgres Could Improve - Part 2". This is from 2ndquadrant.com. In part one they talked about XID or transaction ID wraparound complications. In this one, they talk about replication complications. The first thing they cover is replication in general and how it's pretty much based on the WAL files, because that's what existed prior to, they say, version 9: basically log shipping. You have the write-ahead log and you could transfer those files to another server and replay them, so essentially that is a delayed form of replication. Streaming replication didn't arrive until version 9, and it's still all based upon the WAL files, the write-ahead log that Postgres generates. Now, in terms of the complications they mention, the first is synchronous replication. He says if you have a primary node, you create a table, then you stop the synchronous standby. You begin a transaction, insert a value into the table and then commit it. It should hang, because the synchronous standby is not available. But he says if you now cancel that transaction and do a select from the table, you will actually see the row there. So the transaction, as he says, is alive and well. Basically, your synchronous standby missed this commit, and the reason is that it's all based on the WAL files: the commit has to get into the WAL before Postgres knows it's there, but it can't go any further than that because the synchronous standby is not available. And he says it also doesn't do any quorum in relation to this, so that won't help either. By his assessment, the only safe way to use synchronous replication is to deploy at least two such replicas and then hope that only one goes down at a time. So there is a risk of losing some data here, of course, because if your synchronous standby goes down and then your primary goes down, well, then you've lost this data. Basically, as a best practice, he says the easiest way to address some of these deficiencies is to always maintain at least one additional active synchronous stream. So if you want to always have one, have a second as redundancy; if you want two, add a third as redundancy. Now, for this redundant connection he actually recommends using something like pg_receivewal, which just streams the WAL from the primary database, so you don't have to keep a copy of the database on that node. All it does is stream the WAL files so you have a full set of them, and you can run it in synchronous mode, so you don't need a full copy of the database. Now, 2ndQuadrant does produce the product Barman for backup and recovery scenarios, which handles this type of solution, but you can go ahead and set this up for yourself if you'd like. So definitely a complication to be aware of.

Now, further down they start going into logical replication. In the section "drinking from the fire hose" he's basically saying that when you're doing physical streaming replication, you're doing a binary replication of what's on the primary to a replica. But if you want to, as he says, take a sip instead, and only replicate tables and data changes from a logical perspective, you can do that using logical replication.
But again, when you're using logical replication, the problem you run into is that the LSN replay position is not kept up to date on all the replicas. So if you have replication slots set up on your primary database and you need to fail over, well, that slot information does not carry over to the replicas, and he says this has been the case since 9.4. So you basically have to recreate those slots and get your logical replication up and running again. He also mentions that you can't actually rewind logical replication, you can only move forward, which could also cause some issues. Then another complication he mentions is that the origin Postgres instance you're replicating from is actually used to decode the WAL information via the local system catalog. So in other words, it's decoding this into something like a table named foobar. One thing he says to do to try to avoid some of these problems is to keep a sufficiently large wal_keep_segments. Even though you're using replication slots, which are supposed to ensure you don't lose any WAL, it's important to still keep some segments around to be able to handle issues such as these. Now, ideally, to solve this he believes Postgres needs a mechanism for relaying replication slot information to standby nodes. Definitely true. And secondly, quote, "some way of adding extra WAL content such as object names that exist in a decoded form within the WAL file itself, or as a supplementary transaction-pinned decode mapping". So basically, not having to translate those names on the origin or publisher, but allowing the subscriber to do it from the WAL itself. Now, again, 2ndQuadrant has a product called pglogical 3, which is an extension, and he says it can be configured to regularly copy slot positions, but this is only available for commercial customers; the community version of Postgres does not do this. So this has been a review of things you can run into when you're using replication, both physical and logical, in PostgreSQL, and I definitely encourage you to check out this blog post.
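To make that first complication concrete, here is a rough sketch of the synchronous replication scenario he describes, assuming a primary whose only synchronous standby (listed in synchronous_standby_names) has been stopped:

```sql
-- On the primary, with its only synchronous standby shut down:
CREATE TABLE sync_test (id int);

BEGIN;
INSERT INTO sync_test VALUES (1);
COMMIT;
-- The COMMIT hangs waiting for the synchronous standby to acknowledge.
-- Cancelling it (Ctrl+C in psql) emits a warning along the lines of
-- "the transaction has already committed locally". And indeed:

SELECT * FROM sync_test;
-- The row is there: the commit is durable on the primary even though no
-- synchronous standby ever confirmed it.
```

For the redundant stream he suggests, pg_receivewal has a --synchronous flag, so it can act as a WAL-only synchronous receiver without maintaining a full copy of the database.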

[00:06:10] The next piece of content is "How one word in PostgreSQL unlocked a 9x performance improvement". This is from James Long at jlongster.com, and he's describing an application he developed that consumes messages for doing syncing. When one user tried to sync 169,000 messages in one day, it basically caused a huge number of issues, so he looked into optimizing it. All of these messages get inserted into a Postgres table that looks like the structure here, so relatively simple, and he wants to avoid duplicates, so his insert uses an ON CONFLICT DO NOTHING, and he was basically inserting one row at a time based upon how many messages there were. Now, the first thing he discovered is that you can do multi-row inserts, so you can send one INSERT statement that inserts multiple rows. This is definitely more efficient than doing one insert at a time, particularly if there's network latency, but even without that, it is much faster for inserting a large number of rows than going one row at a time. So that's the first optimization he did. But the concern was, because he was doing ON CONFLICT DO NOTHING, how would he know what was inserted and what was not? This is where the "one word" comes in: he uses RETURNING to return, essentially, his primary key, a timestamp. And what's great about this is that it only returns that timestamp on successful inserts, which is exactly what he wanted. With these two changes he basically got a ninefold performance improvement, which is pretty great. Now, he tested up to 40,000 messages in this post, but at 169,000 he was running into other areas of his application that prevented that from working successfully; I imagine at that point you can just break down the work. So if you want an interesting story of how he worked through the problem and found optimizations he could use for his application, definitely check out this blog post.
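To make the pattern concrete, here's a minimal sketch using a hypothetical messages table (the post's actual schema differs): a multi-row INSERT with ON CONFLICT DO NOTHING, where RETURNING reports only the rows that were actually inserted.

```sql
-- Hypothetical table standing in for the post's messages table.
CREATE TABLE IF NOT EXISTS messages (
    timestamp  text PRIMARY KEY,   -- the primary key that RETURNING hands back
    group_id   text NOT NULL,
    content    text NOT NULL
);

-- One statement inserting many rows; duplicates are silently skipped,
-- and RETURNING yields the primary keys of only the newly inserted rows.
INSERT INTO messages (timestamp, group_id, content)
VALUES
    ('2020-05-01T00:00:01Z-0001', 'grp-1', 'msg one'),
    ('2020-05-01T00:00:02Z-0002', 'grp-1', 'msg two'),
    ('2020-05-01T00:00:03Z-0003', 'grp-2', 'msg three')
ON CONFLICT (timestamp) DO NOTHING
RETURNING timestamp;
```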

[00:08:17] The next piece of content is "Understanding user management in PgBouncer". This is from 2ndquadrant.com, and they're talking about how you set up users in PgBouncer. The typical way is a user list file, where you list out each user and their password. Now, this not only authorizes users to connect to PgBouncer, that information is then also used when connecting to the destination PostgreSQL instance to grant access to it. Internally this is called the auth file, but it's typically named userlist.txt; you can rename it, he says. So this has been the most common way to set it up. However, you can also do a query: you can query a PostgreSQL instance to get that list of users. The default value of auth_query in PgBouncer gets the username and password from the pg_shadow catalog, and he describes how you could set this up to be more secure. He also mentions that you could use PAM to do it as well, or potentially LDAP. He says another scenario is that you can set up a forced user: in your actual connection to the PostgreSQL database, separate from the users connecting to PgBouncer, you can hard-set a username and password. Then he reviews some considerations about which to use. Generally the auth file is definitely the more traditional setup, but if you have a lot of users that could be a little onerous, and you may want to resort to the auth_query method, for example. So if you want to learn more about managing users in PgBouncer, check out this blog post.
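For reference, here's a rough sketch of what the auth_query route can look like; the schema and function names are illustrative, not taken from the post. PgBouncer's documented default auth_query reads pg_shadow directly, and a common refinement is to hide that behind a SECURITY DEFINER function:

```sql
-- Illustrative auth_query setup (names are assumptions, not from the post).
CREATE SCHEMA IF NOT EXISTS pgbouncer;

-- SECURITY DEFINER so PgBouncer's login role never reads pg_shadow itself.
CREATE OR REPLACE FUNCTION pgbouncer.get_auth(p_username text)
RETURNS TABLE (username text, password text)
LANGUAGE sql SECURITY DEFINER AS $$
    SELECT usename::text, passwd::text
    FROM pg_catalog.pg_shadow
    WHERE usename = p_username;
$$;

-- Then in pgbouncer.ini you would point auth_query at it, something like:
--   auth_user  = pgbouncer
--   auth_query = SELECT username, password FROM pgbouncer.get_auth($1)
```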

[00:09:55] The next post is "PostgreSQL 12.3", and this is from jepsen.io. They're doing an in-depth analysis of transaction isolation in PostgreSQL. It's quite a long post, but the summary is that when using serializable isolation under normal operation, transactions could occasionally exhibit G2-item, which is an anomaly involving a set of transactions which, roughly speaking, mutually fail to observe each other's writes. So basically there is a bug in PostgreSQL that they discovered with regard to serializable isolation, and they mention that a patch for the bug they found is scheduled for the minor release on August 13. So, good news. They also mention that repeatable read isolation is actually snapshot isolation, so it's a little different from some other databases' behavior, and they advocate updating the documentation to reflect this. So if you want to check out this in-depth analysis of PostgreSQL, definitely check out this blog post.
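To give a feel for what "mutually fail to observe each other's writes" means, here is a classic write-skew style interleaving of the sort serializable isolation is meant to reject; the schema is illustrative, and the two sessions are shown via comments:

```sql
-- Setup (illustrative): two accounts whose combined balance should stay >= 0.
CREATE TABLE accounts (id int PRIMARY KEY, balance int);
INSERT INTO accounts VALUES (1, 50), (2, 50);

-- Session A                               -- Session B (interleaved)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(balance) FROM accounts;         -- sees 100
                                           -- BEGIN ISOLATION LEVEL SERIALIZABLE;
                                           -- SELECT sum(balance) FROM accounts;  -- also sees 100
UPDATE accounts SET balance = balance - 80 WHERE id = 1;
                                           -- UPDATE accounts SET balance = balance - 80 WHERE id = 2;
COMMIT;
                                           -- COMMIT;
-- Under serializable isolation, one of the two commits should fail with a
-- serialization error, preventing this kind of anomaly.
```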

[00:11:05] The next piece of content is "Webinar: Understanding the PostgreSQL table page layout [Follow Up]", and this is a webinar that was put on by 2ndQuadrant. You can get access by clicking here to register for the webinar, and it basically goes into the internals of Postgres: how databases are laid out in the file system, how each page on the file system is laid out, how data gets inserted, updated or deleted, how it handles rows that are too long, and how the TOAST system works. So if you're looking to learn more about the internals and how Postgres lays out data within the file system, definitely check out this webinar.
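If you want to poke at the on-disk page layout yourself after watching, the pageinspect contrib extension is one way to do it. A minimal sketch, with a hypothetical table name:

```sql
-- pageinspect ships with PostgreSQL as a contrib extension.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Header of block 0 of a table (LSN, free space pointers, and so on).
SELECT * FROM page_header(get_raw_page('my_table', 0));

-- Line pointers and tuple headers stored in that page.
SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid
FROM heap_page_items(get_raw_page('my_table', 0));
```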

[00:11:50] The next piece of content is "SQL trickery: Hypothetical aggregates". Basically they're asking the question: if you had a particular value, where would it fall in a particular number series? Here they do a query that generates a series of numbers and splits it into two rows, and then ask: if you had a number, say 3.5, where would it rank in that order? So if you look at one series of numbers, a 3.5 would fall after the two, so it would be in the second position; in the other series of numbers it would fall after the three, so it would actually be in the third position. This seems to be a fairly unique use case, but it uses hypothetical-set aggregates to see where something would hypothetically fall in a range. So if you're interested in that, check out this blog post.
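As a minimal sketch of that kind of query, PostgreSQL's hypothetical-set aggregate rank(...) WITHIN GROUP can answer "where would 3.5 fall" directly; the data here is my own illustration rather than the post's exact query:

```sql
-- Split 1..10 into odd and even series and ask where 3.5 would rank in each.
SELECT val % 2 AS grp,
       array_agg(val ORDER BY val) AS series,
       rank(3.5) WITHIN GROUP (ORDER BY val) AS hypothetical_rank
FROM generate_series(1, 10) AS t(val)
GROUP BY val % 2
ORDER BY grp;
-- Evens (2,4,6,8,10): 3.5 would rank 2nd.  Odds (1,3,5,7,9): it would rank 3rd.
```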

[00:12:46] The next post, also from cybertec-postgresql.com, is "Wrapping DB2 with PostgreSQL". This is basically referring to the DB2 foreign data wrapper, and they show how to get it set up to be able to query and retrieve information from a DB2 database.
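The general foreign data wrapper workflow looks roughly like this; note that the extension name and the server, user mapping and table options below are my assumptions about the DB2 wrapper rather than details from the post, so check the wrapper's documentation for the real ones:

```sql
-- Assumed extension name; the exact OPTIONS keys are hypothetical.
CREATE EXTENSION db2_fdw;

CREATE SERVER db2_server
    FOREIGN DATA WRAPPER db2_fdw
    OPTIONS (dbserver 'SAMPLE');                       -- hypothetical option

CREATE USER MAPPING FOR CURRENT_USER
    SERVER db2_server
    OPTIONS (username 'db2inst1', password 'secret');  -- hypothetical options

CREATE FOREIGN TABLE db2_orders (
    order_id integer,
    amount   numeric
) SERVER db2_server
  OPTIONS (schema 'STORE', table 'ORDERS');            -- hypothetical options

-- Once defined, the DB2 table can be queried like any local table.
SELECT * FROM db2_orders LIMIT 10;
```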

[00:13:04] The next piece of content is "Oracle to PostgreSQL: basic architecture". So for those of you who are migrating from Oracle to PostgreSQL, they're talking about the different terminology used between Oracle and PostgreSQL. For each of the different components you would typically talk about, they show the Oracle term versus the PostgreSQL term; in terms of services and what tasks need to be done, they show the Oracle term and then the Postgres term; and finally, with regard to data and its different components, they show the two terms for each database system. So if you're going to be converting from Oracle to PostgreSQL, this is definitely a blog post to check out.

[00:13:45] The next piece of content is the release notes for Citus 9.3, the extension that scales out Postgres horizontally. So basically this is a new release of the Citus extension that allows you to scale out your PostgreSQL installations. It seems the biggest feature improvement in this version is full support for window functions. They also made some improvements to shard pruning and INSERT ... SELECT with sequences, and added some support for reference tables on the Citus coordinator, which I believe is the node that coordinates the interactions of the cluster. So if you want to learn more about the improvements in the new version of Citus, definitely check out this blog post.

And the last piece of content is that the PostgreSQL Person of the Week is Tomas Vondra. If you want to learn more about Tomas and his contributions and work in PostgreSQL, definitely check out this blog post.

[00:14:42] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
