
Partitioning, Logical Replication Upgrade, Columnar Compression, HAProxy Connections | Scaling Postgres 88


In this episode of Scaling Postgres, we discuss partitioning, logical replication upgrades, columnar compression and HAProxy connections.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about partitioning, logical replication upgrades, columnar compression and HAProxy connections. I'm Kristen Jameson, and this is Scaling Postgres episode 88.

[00:00:22] Alright, I hope you're having a great week. The first piece of content is "Webinar: PostgreSQL Partitioning - Follow Up", and this is from 2ndquadrant.com. This is a webinar about partitioning with PostgreSQL, particularly covering a number of the new features that came with version 12. Most of them, I would say, relate to performance, but some others add more foreign key capabilities and other things that enhance referential integrity. This was a great piece of content. Now, it's a little long: the webinar replay runs over an hour and a half, I believe, with some questions at the end, but it is filled with a lot of content. It explains what partitioning is, how you can set it up, things that work well, and some things that don't quite work yet. For example, REINDEX doesn't apply to partitioned tables; you can run it on individual partitions, but not on the partitioned table itself. Similarly, creating an index concurrently doesn't work on partitioned tables, so you would have to do that on each partition manually. This presentation just has a lot of great information, and I definitely encourage you to check it out. You can click the link here to view the complete recording; you do have to fill out a contact form, but then you're able to watch it immediately.
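To make that concrete, here is a minimal sketch (mine, not from the webinar) of declarative range partitioning in PostgreSQL 12, using a hypothetical measurements table, along with the two limitations mentioned above:

```sql
-- Hypothetical example: declarative range partitioning (PostgreSQL 12 syntax).
CREATE TABLE measurements (
    recorded_at timestamptz NOT NULL,
    device_id   int         NOT NULL,
    temperature numeric
) PARTITION BY RANGE (recorded_at);

CREATE TABLE measurements_2019_11 PARTITION OF measurements
    FOR VALUES FROM ('2019-11-01') TO ('2019-12-01');
CREATE TABLE measurements_2019_12 PARTITION OF measurements
    FOR VALUES FROM ('2019-12-01') TO ('2020-01-01');

-- Creating an index on the partitioned table cascades to each partition,
-- but it cannot be done CONCURRENTLY on the parent in PostgreSQL 12.
CREATE INDEX ON measurements (device_id, recorded_at);

-- REINDEX works per partition, not on the partitioned table itself (as of v12).
REINDEX TABLE measurements_2019_11;
```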

[00:01:47] The next post is "Upgrading Postgres major versions using logical replication", and this is from cybertec-postgresql.com. They had a previous post talking about the different ways to do version upgrades with PostgreSQL; this one uses logical replication. The first thing it discusses is the benefits: minimal downtime is required, because once you have the new system set up and replicating, you can choose to switch over at any point you want; it's flexible, so you can target just the data you want to replicate, since this is logical replication rather than binary, block-level replication; it's safe, in that you can always go back to the prior database if you need to; and it gives you a lot of opportunities to test and make sure that things are okay. The cons: there are quite a few steps to set it up, this way of upgrading is always per database as opposed to per cluster, it could take a long time for big databases that are highly active, and large objects need to be exported and imported manually. Then it goes through the process. Basically: turn on logical replication on the old primary, make sure all the tables have a primary key or, if not, set REPLICA IDENTITY, and it has a little query so you can identify any tables that don't meet that criterion. Then set up the new database and export the schema as well as the roles; one way to do that is pg_dumpall with globals only plus a pg_dump of the specific database with schema only. During this type of upgrade you ideally want to keep your schema in a static state. Then you create a publication on the old DB and a subscription on the target database, and they include a warning: just be aware that this creates a replication slot on the old DB, so if any connection breaks down and replication stops for any reason while the primary keeps running, WAL files can start to accumulate. Keep that in mind. Then check the replication process: make sure there are no errors, the replication slot is active, and all tables are actively replicating on the subscriber side; they mention that pg_subscription_rel.srsubstate should equal 'r' for all tables. Then go through your own verification process, and then it's switchover time. Now, in their particular case they shut down the old instance and start the old DB in read-only mode. I don't think that's necessarily a requirement, but it's definitely the safer way to make sure that you have all the data, because you could choose to start using the new database at whatever point you want. You'll have to work out that switchover process yourself: do you want to have downtime, and how can you make it as short as possible? Once you're done, you just drop the subscription on the new DB. So this blog post walks through the process of upgrading your database using logical replication; if you want to explore that possibility, it's definitely one to check out.
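As a rough sketch of those steps (the database, publication and connection names are hypothetical, and the no-primary-key query is my own approximation rather than the one from the post), the SQL side looks something like this:

```sql
-- On the old primary (source), after setting wal_level = logical:

-- Find ordinary tables without a primary key; these need REPLICA IDENTITY set.
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
      SELECT 1 FROM pg_constraint
      WHERE conrelid = c.oid AND contype = 'p');

-- Copy roles and schema to the new cluster first, e.g. with:
--   pg_dumpall --globals-only > roles.sql
--   pg_dump --schema-only mydb > schema.sql

-- On the old database: publish all tables.
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new (target) database: subscribe. Note this creates a replication
-- slot on the old DB, so WAL can accumulate there if replication stalls
-- while the primary keeps running.
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-primary dbname=mydb user=replicator'
    PUBLICATION upgrade_pub;

-- Verify on the subscriber that every table has finished the initial copy
-- and is streaming ('r' = ready).
SELECT srrelid::regclass, srsubstate FROM pg_subscription_rel;

-- At switchover time, once the application points at the new DB:
DROP SUBSCRIPTION upgrade_sub;
```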

[00:05:09] The next post is "Building columnar compression in a row-oriented database", and this is from the timescale.com blog. Basically, they've added native compression to TimescaleDB, and their technique has resulted in 90% to 96% compression, which is pretty dramatic. Now, Postgres is of course a row-oriented relational database system; it doesn't store data in a columnar fashion, so I was interested in what they did here. Even if you're not interested in TimescaleDB, I particularly liked looking at the technique they describe, so I'll scroll down to some of the more technical details. You typically have row-oriented data: a timestamp, a device, a status code and a temperature for each row. What they actually do is clump batches of rows together and store them in an array format, so the data is effectively stored by column: all the timestamps in a batch become one array, as do all the device IDs and status codes, and the position within each array identifies which original row a value belongs to. By doing it this way, they can apply type-specific compression to each column to get dramatically higher compression ratios, for example one method for timestamps, another for integers, another for floats, and simply reducing the number of rows also reduces the per-row storage overhead. They also use other techniques, such as storing min and max ranges, for example the min and max timestamp of each batch, so queries can skip whole batches quickly, and much of this data ends up TOASTed and compressed. They also show some performance benchmarks: compression did very well, and in a lot of cases queries even performed better than against the uncompressed data. I found this blog post really interesting and fascinating. We've covered other posts that talk about using ZFS or other compressed file systems to reduce the size of the database and, in some cases, boost performance, but this is another very interesting method, actually converting the data into a more column-based format. So if you're interested in learning more about that, definitely a blog post to check out.
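This is not TimescaleDB's actual internal implementation, just a plain-SQL sketch of the batching-into-arrays idea described above, with made-up table names:

```sql
-- Row-oriented source data.
CREATE TABLE readings (
    ts          timestamptz NOT NULL,
    device_id   int         NOT NULL,
    status_code int,
    temperature float8
);

-- A "columnar" archive: each row holds a batch of values per column,
-- plus the min/max timestamp so queries can skip whole batches.
-- Element i of each array corresponds to the same original row.
CREATE TABLE readings_compressed (
    ts_min      timestamptz,
    ts_max      timestamptz,
    ts          timestamptz[],
    device_id   int[],
    status_code int[],
    temperature float8[]
);

-- Fold a batch of rows into one archive row (a real implementation would
-- batch, e.g. 1000 rows at a time); the large arrays end up TOASTed and
-- compressed by Postgres.
INSERT INTO readings_compressed
SELECT min(ts), max(ts),
       array_agg(ts          ORDER BY ts),
       array_agg(device_id   ORDER BY ts),
       array_agg(status_code ORDER BY ts),
       array_agg(temperature ORDER BY ts)
FROM readings;
```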

[00:08:07] The next post is "PostgreSQL Application Connection Failover Using HAProxy with xinetd", and this is a follow-on from their post on simple application failover using libpq features. They're talking about connecting through HAProxy, which then routes you to the read-write primary or to read-only replicas, and you can use HAProxy's load balancing features. It's used in conjunction with xinetd, the extended Internet services daemon, which can listen for requests on custom ports and respond by executing custom logic. In this case they have a bash script that gets called to check the status of a database instance: it runs a query to see whether the currently running database is in recovery. If it's in recovery, the script returns a 206 status code, meaning it's a standby; a 200 means it's the primary; and any other result returns a 503 Service Unavailable. This check is what lets HAProxy route connections to the proper location, and they show how to configure HAProxy to use this xinetd service. So if you're interested in using HAProxy as a means to manage connections between a primary and its replicas, and potentially do failover, definitely a blog post to check out.
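The heart of that health check is simply a query for the server's recovery state; a minimal version (not the exact script from the post) looks like this, with the HTTP status mapping handled by the surrounding xinetd script:

```sql
-- Run by the health-check script (e.g. via psql -tAc) on each node.
-- pg_is_in_recovery() returns:
--   false -> the node is a read-write primary  -> script answers HTTP 200
--   true  -> the node is a read-only standby   -> script answers HTTP 206
-- If the query fails entirely, the script answers HTTP 503 (unavailable),
-- and HAProxy drops that node from the relevant backend.
SELECT pg_is_in_recovery();
```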

[00:09:36] The next post is "Virtual, computed columns in PostgreSQL 12". So again, this is talking about the new generated columns feature, and this is another developer's take on it. Here he's primarily interested in using it for text search through JSON. He takes a JSON blob he received, extracts, say, the title into a generated column, and then goes a bit further and converts it into a tsvector. He mentions some things to be aware of: you need to specify the text search configuration explicitly, like 'english', because the generation expression has to be immutable, and the default language setting can change, so a function that relies on it isn't immutable. Even a function like concat() doesn't work with GENERATED ALWAYS; you have to use the concatenation operator, two pipes (||), to join text together if, as in his case, you want to search on both the name and the title. So it's a relatively simple post about generated columns for a text search use case. Related to that, he has a second post called "Fine Tuning Full Text Search with PostgreSQL 12", which goes more in depth. He continues to use generated columns, but he gets deeper into actually doing searches: he has a scenario where three different individuals want to do different types of searches for a conference, and he talks about parsing out the data, using different dictionaries to get different search results, applying weights to keywords, and ranking. It's a very comprehensive post that covers a lot of detail if you want to learn more about full text search. What's also interesting is the discussion in the comments, where a few people debate the benefits of storing the data in generated columns versus doing it dynamically with expression indexes in PostgreSQL. So if you check out this post, be sure to check out the comment section as well. And I should mention, this is from Rob Conery's blog.
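As a sketch of that approach (the table and column names here are made up, not taken from his post), a generated tsvector column over a JSON payload looks roughly like this:

```sql
CREATE TABLE talks (
    id    bigserial PRIMARY KEY,
    body  jsonb NOT NULL,
    -- Extract a plain-text title from the JSON blob.
    title text GENERATED ALWAYS AS (body ->> 'title') STORED,
    -- The generation expression must be immutable, so the text search
    -- configuration ('english') is spelled out explicitly, and concat()
    -- (which is not immutable) can't be used; the || operator can.
    search tsvector GENERATED ALWAYS AS (
        to_tsvector('english',
            coalesce(body ->> 'name', '') || ' ' || coalesce(body ->> 'title', ''))
    ) STORED
);

CREATE INDEX talks_search_idx ON talks USING gin (search);

-- Query the generated column with a tsquery.
SELECT id, title
FROM talks
WHERE search @@ websearch_to_tsquery('english', 'postgres partitioning');
```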

[00:12:03] The next post is "Monitoring PostgreSQL Databases using PMM", and this is from percona.com. PMM is Percona Monitoring and Management, an open source tool set. It comes as a Docker image, and the post goes through the process of installing it; it looks like it uses Grafana and perhaps Prometheus to monitor your PostgreSQL instance. It walks through the whole installation and setup, including the different permissions required. So if you're looking for another monitoring tool for PostgreSQL, definitely another one to check out.

[00:12:41] The next post is "Benchmark Partition Table - 1", and this is from a Blogspot.com blog. Basically, with all the different partitioning features, they've actually been extending pgbench, and I believe this is for PostgreSQL version 13. So this is a patch that adds partitioning options to the pgbench benchmarking tool, letting you choose the number of partitions and the partitioning method, and the post runs some tests and looks at the results. If you're interested in using pgbench in the upcoming version 13, definitely a blog post to check out. The next post is "Postgres CASE Statement Basics by Example", and this is from the Pinopoly blog. It's a very simple post that covers the CASE statement, which is basically a conditional expression in PostgreSQL and the SQL standard in general; if you want a refresher on that, definitely a blog post to check out. And lastly, "How to Query with PostgreSQL Wildcards like a Pro". Again, this is another simple post covering the different ways you can query using wildcards: the percent sign matches any sequence of characters, and the underscore matches exactly one character. It also covers LIKE and ILIKE, as well as using the lower() function to search the database ignoring case. So if you're interested in that type of content, definitely a blog post to check out.
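For a quick refresher on those last two topics, here is a small sketch using hypothetical tables:

```sql
-- CASE is SQL's conditional expression.
SELECT name,
       CASE
           WHEN score >= 90 THEN 'excellent'
           WHEN score >= 60 THEN 'passing'
           ELSE 'failing'
       END AS grade
FROM exam_results;

-- In LIKE patterns, % matches any sequence of characters (including none)
-- and _ matches exactly one character. ILIKE is the case-insensitive form.
SELECT * FROM users WHERE name LIKE 'Jo%';         -- Jo, Joe, Joanna...
SELECT * FROM users WHERE name ILIKE 'j_n';        -- Jan, Jon, jen...
SELECT * FROM users WHERE lower(name) LIKE 'jo%';  -- case-folded comparison
```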

[00:14:11] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
