
Postgres is Fine, Generated Columns, Postgres Inserts, Synchronous Replication | Scaling Postgres 122


In this episode of Scaling Postgres, we discuss determining if Postgres is fine, generated columns vs. triggers, insert features and synchronous replication usage.

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about Postgres is Fine, generated columns, Postgres inserts and synchronous replication. I'm Creston Jamison, and this is Scaling Postgres episode 122.

[00:00:20] Alright, I hope you, your family and coworkers continue to do well in these times. Our first piece of content is PG Friday: It's the end of the world as we know it and Postgres is fine. This is from 2ndQuadrant.com, and this was a presentation given at the Chicago Postgres Users Group that was converted into a blog post. And it's all about monitoring. He first covers why monitor, and of course this is good for determining why something broke or failed. However, it's also important to know what could potentially break in the future. He then goes on to talk about some different areas you would want to monitor, such as transactions, sessions, status of replication, disk space, et cetera. And he mentions a few key areas that you definitely want to be monitoring to make sure that your Postgres installation is running optimally. The first is transaction ID wraparound, to ensure that essentially that does not happen on your tables. And he gives you two different queries, one at the database level and one at the table level, to be able to see where the transaction ID age is at. The next is making sure you don't have any prepared transactions that have been orphaned. Now again, this feature is not used very much, but if it's being used at all, you want to make sure that none of these have been orphaned, because that could definitely cause a problem. Also monitor your number of idle transactions; again, you want to minimize these as much as possible. There's also the question of whether you have hot_standby_feedback on, because that could also cause some WAL to build up. Then he talks about checking replication lag, looking at slots, and also just checking pg_stat_replication in general, as well as reviewing your temporary storage space, checkpoint activity, active transactions and potentially your WAL throughput. So if you are looking for a good set of areas to monitor for your Postgres installation, definitely check out this blog post from 2ndQuadrant.com. A few of these checks can be approximated with the catalog queries sketched below.
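
As a quick illustration, here is a hedged sketch of standard catalog queries covering a few of the checks mentioned: wraparound age, orphaned prepared transactions, and idle-in-transaction sessions. These aren't necessarily the exact queries from the post, just the usual starting points.

```sql
-- Approximate transaction ID age per database; autovacuum should
-- keep this well below the ~2 billion wraparound limit.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Prepared transactions, oldest first; long-lived entries here
-- are likely orphaned and will hold back vacuum.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;

-- Sessions sitting idle inside a transaction.
SELECT pid, now() - xact_start AS xact_duration, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_duration DESC;
```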

[00:02:21] The next post is A Comparative Approach Between Generated Columns and Triggers. This is from Ongres.com, and they're comparing the new generated columns feature in Postgres 12 to using a trigger method. So a generated column enables you to run an immutable function against data in the table to compute a new column value, essentially. So in the table here that they've created, they have a radius column named r, and it calculates the area using a generated column, using a function to determine the area based upon the radius. Now, they check the performance of this. They show the system that they used, and here are the benchmarks that they looked at. So using generated columns was the fastest approach. A trigger written in C, as well as an immutable trigger in C, got very close to the performance of generated columns, but generated columns still came out on top. The slowest was the plpgsql trigger, followed by the immutable plpgsql trigger as the second slowest. So in general, if you have an immutable function, you would want to use generated columns. If you use a trigger and need better performance, writing it in C would give you better performance than, say, plpgsql. Now, this is for insert operations. The update operations weren't that different between each of these methods, so that's pretty interesting. A minimal example of the generated column approach appears below. So if you want to learn more about these generated columns versus trigger benchmarks, definitely check out this post from Ongres.com.
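
Here is a minimal sketch of the kind of table being benchmarked; the table and column names are illustrative rather than the post's exact schema.

```sql
-- The area column is computed from the radius by an immutable
-- expression; STORED is required (and is the only mode in PG 12).
CREATE TABLE circles (
    id   bigserial PRIMARY KEY,
    r    double precision NOT NULL,
    area double precision GENERATED ALWAYS AS (pi() * r * r) STORED
);

INSERT INTO circles (r) VALUES (1), (2.5);

-- area is filled in automatically and cannot be written directly.
SELECT r, area FROM circles;
```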

[00:03:52] The next post is a walk through PostgreSQL INSERT. This is from Crunchydata.com, and it's talking about INSERT statements. The first thing it covers is that if you need to insert multiple values, you definitely want to do this as part of one statement. So you can see here it has VALUES and then a comma between each row you want to insert. So if you have multiple values, this is a much faster way to do inserts. The next thing they cover is that if you happen to need data back after an insert, you can use the RETURNING clause. So for example, in this case you're returning the ID to know what the ID of the row you just inserted was. That could be beneficial if you then need to do work with that record. And the last thing they cover is the different ON CONFLICT clauses you could use. So if you have a conflict during the insert, you could do nothing, so just ignore that insert, such as when you're violating a primary key. Or you could do an update, essentially an upsert: it's going to try the insert, and if that fails, it's going to do the update. The sketch below pulls these three features together. So this is just a very quick post showing some different features of the INSERT statement in Postgres.
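
To make the three features concrete, here is a hedged sketch against a hypothetical users table (the table and column names are invented for illustration).

```sql
CREATE TABLE users (
    id    bigserial PRIMARY KEY,
    email text NOT NULL UNIQUE,
    name  text
);

-- One statement, several rows; RETURNING hands back the generated ids.
INSERT INTO users (email, name)
VALUES ('a@example.com', 'Ann'),
       ('b@example.com', 'Bob')
RETURNING id;

-- Silently skip rows that would violate the unique constraint...
INSERT INTO users (email, name)
VALUES ('a@example.com', 'Ann')
ON CONFLICT (email) DO NOTHING;

-- ...or turn the insert into an upsert.
INSERT INTO users (email, name)
VALUES ('a@example.com', 'Annie')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
```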

[00:04:59] The next piece of content is actually a YouTube video called When to Use Synchronous Replication in PostgreSQL, and this is from the EnterpriseDB YouTube channel. This is a follow-on from the blog post that was mentioned in the previous episode of Scaling Postgres, but I think this goes into a little bit more depth, with the different individuals in the video discussing when to use synchronous replication, and also considerations where there are times you may actually want one setting at the database level, but then, for a particular session or transaction, use a more aggressive value for the synchronous_commit setting, so that you can ensure that that particular transaction or set of data gets committed on all replicas with a high degree of confidence. The pattern looks something like the sketch below. So if you're investigating using synchronous replication, definitely check out this YouTube video for some additional insight.
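
As a rough sketch of the per-transaction override (the payments table is hypothetical, and the right synchronous_commit level depends on your standby configuration):

```sql
-- The cluster-wide default might be 'on' or even 'local'; a critical
-- transaction can demand more durability for itself only.
BEGIN;
SET LOCAL synchronous_commit = 'remote_apply';  -- wait until standbys have applied
INSERT INTO payments (amount) VALUES (100.00);
COMMIT;
```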

[00:05:49] The next piece of content is Features in PG 13: Deduplication in B-tree Indexes. This talks about the new feature that deduplicates more entries in the B-tree index in version 13 compared to 12. They talk a little bit about how it works: it doesn't deduplicate everything, but it significantly increases the amount of deduplication that is done. Now, in their example here, they inserted about a million rows, but then, to create the duplicates, they actually just updated those rows three times. So this isn't really duplicate data itself; they're creating multiple row versions that I'm assuming haven't been vacuumed. So it's interesting that they didn't actually create duplicate entries; that may have given a slightly different result. But looking at Postgres 12, you can see the index size is 86 megabytes, whereas in Postgres 13 beta 2 the index size is 62 megabytes. So that's a pretty good space savings. Now, they also did a query to see, hey, is it any faster? Looking at version 12, it returned a query from this table in 190 milliseconds, whereas in Postgres 13 it returned in 174 milliseconds. So again, a little bit faster in terms of performance. If you want to run a similar comparison yourself, something like the sketch below works. So if you want to learn more about the performance implications and space savings coming with the deduplication of B-tree indexes in Postgres 13, definitely check out this blog post from Highgo.ca.
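
A hedged sketch of that kind of comparison, using genuinely duplicated values rather than dead row versions (table and index names are illustrative, not the post's script); run it on 12 and on 13 and compare the reported sizes.

```sql
-- A million rows with only a thousand distinct values gives the
-- deduplication logic plenty to work with.
CREATE TABLE dedup_test AS
SELECT (i % 1000) AS val
FROM generate_series(1, 1000000) AS i;

CREATE INDEX dedup_test_val_idx ON dedup_test (val);

SELECT pg_size_pretty(pg_relation_size('dedup_test_val_idx'));
```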

[00:07:17] The next post is Random Strings and Integers That Actually Aren't. This is a blog post that covers how to create random-looking alphanumeric strings, things used for maybe coupon codes or items of that nature. And what it does is repeatedly convert an integer into this type of string, and it uses a Feistel network. Now, this is done using plpgsql functions. So this function takes an integer (in this case, imagine it's a serial data type, so you're getting essentially the integer primary key) and it converts that into a random-looking number first. So that's the first step. And you can actually use the same function to get the original number back. So once you have this random number, you can use the string generation algorithm, again in plpgsql, and it converts it into these alphanumeric strings. Now, as a final bonus, if you use Python, he gave a Python implementation as well. A sketch of the Feistel step appears below. So if you have a need to generate these types of random strings for coupon codes or other use cases, maybe you want to check out this blog post from Endpoint.com.
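
For flavor, here is the well-known pseudo_encrypt function from the PostgreSQL wiki, a small Feistel network of the same general shape; the post's own functions and round constants will differ.

```sql
CREATE FUNCTION pseudo_encrypt(value int) RETURNS int
IMMUTABLE STRICT LANGUAGE plpgsql AS $$
DECLARE
    l1 int; l2 int; r1 int; r2 int; i int := 0;
BEGIN
    -- Split the 32-bit input into two 16-bit halves.
    l1 := (value >> 16) & 65535;
    r1 := value & 65535;
    WHILE i < 3 LOOP
        -- One Feistel round: swap the halves and XOR the left half
        -- with a keyed mix of the right half.
        l2 := r1;
        r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
        l1 := l2;
        r1 := r2;
        i := i + 1;
    END LOOP;
    RETURN ((r1 << 16) + l1);
END;
$$;

-- With identical round keys, the function is its own inverse, so
-- sequential ids map to scattered integers and back again.
SELECT pseudo_encrypt(1) AS scrambled,
       pseudo_encrypt(pseudo_encrypt(1)) AS original;
```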

[00:08:30] The next post is Getting Started with PostgreSQL Operator 4.3 in OpenShift. So if you use Kubernetes and want to use the PostgreSQL Operator by Crunchy Data, check out this blog post on how to install it and use it with OpenShift. The next piece of content is A Simple Way to Trace a Bug. This is from Highgo.ca, and it's talking about a bug that the writer discovered in Postgres in the circle function. It talks about how he actually went through the source code using tracing to find where the bug is. So if you're interested in the technique he used to be able to discover where this bug was, definitely check out this blog post.

[00:09:13] The next piece of content is pg_timetable Startup Improvements. So these are some improvements that Cybertec-postgresql.com has implemented for their tool pg_timetable, which is a scheduling tool for PostgreSQL. So if you want to learn about the changes and improvements to it, definitely check out this blog post.

[00:09:33] And the last piece of content is: the PostgreSQL Person of the Week is Kohei Kaigai. Forgive me if that pronunciation is incorrect, but if you want to learn more about Kohei and his contributions to PostgreSQL, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
