
Active-Active & Federated, Memory Speed, Table Rewrite Causes, PostgreSQL Shibboleth | Scaling Postgres 146


In this episode of Scaling Postgres, we discuss setting up active-active instances with federation, the impact of memory speed, the causes of table rewrites and the Postgres shibboleth.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about active-active and federated, memory speed, table rewrite causes and the PostgreSQL shibboleth. I'm Creston Jameson, and this is Scaling Postgres episode 146.

Alright, I hope you had a great holiday, and I hope you're having a great new year. And as always, I hope that your friends, co-workers and family continue to do well. Our first piece of content is "Active-active PostgreSQL federation on Kubernetes". This is from Crunchydata.com, and they're talking about a process of essentially creating a set of federated PostgreSQL instances where all of them are essentially primaries or masters. Now, for this post you can actually ignore the Kubernetes part, because the interesting part is how you maintain these active instances using a combination of logical replication and partitioning. So what they did is set up three different PostgreSQL clusters, or three different instances. Their example here uses Kubernetes, but you don't have to use it that way. Basically, you have three different instances; they called one Hippo East, one Hippo Central and one Hippo West. Now, I'm going to skip ahead to the partitioning scheme, because that helps you understand a little bit about what this post is talking about. So they created a hippos table, and they're using a UUID for every ID in it. They're doing this to avoid using sequences, because sequence state doesn't get carried over with logical replication, and if you're doing this type of federation it's easier to just use a UUID for your IDs. Then they set up a default partition, which they just called hippo default, and then Hippo East, Central and West partitions. So for each geographical server, essentially, they set up a separate partition, and each instance only inserts data into its own partition. So east will only insert into the east partition.
The central server will only insert data into the central table, and then the west will only insert into the west table. Now, they did some additional steps here, where they set up a way to identify each cluster: they actually insert the data into the default partition and then have a trigger that moves it into the proper partition based upon the node name. But if you have a way to insert the data directly into the right partition, like your application knows it's only inserting data into, say, Hippo East and always sets Hippo East, I don't think you need these additional steps they've laid out here. But basically, each instance inserts its data into its own partition. Then you set up the sharing, where you create a publication on each of the nodes. So, for example, you create a publication on Hippo East for the Hippo East partition, a publication on Hippo Central for the Hippo Central table, and then you have every node subscribe to every other node's publication. So Hippo Central subscribes to east and west, Hippo West subscribes to east and central, and of course Hippo East subscribes to central and west. Once you get this all working, all the data will be synchronized across all the different nodes, with each row placed into the proper partition on each node. He shows an example here where he queried east, and yet you can see all the data from the different nodes, because he's querying the parent table, and you can see that it is properly placed in all of the different partitions. And if you want to set up another node, you just set up another node: define a new node name for it, add a new partition for that node to the partitioning scheme that exists throughout all the nodes, set up a publication on the new node for that partition to be distributed, and then subscribe to it from the other nodes. And they describe how to do that.
Here they do discuss some other alternatives. For example, you could use a sequence instead of a UUID by defining that IDs will be generated, say, on one partition starting at one and incrementing by three, on the next starting at two and incrementing by three, and on the next starting at three and incrementing by three. But that's going to be a problem when trying to add new nodes; it becomes very complicated very quickly. The only downside they mention to the UUID is that you may run into a collision, so you're going to have to plan for that or figure out how to address it. But this was a very interesting post on how you could set up federated PostgreSQL with multiple master nodes. So if you're interested in that, definitely check out this post from Crunchydata.com.
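As a rough sketch (not taken from the post itself; the table, publication, subscription and connection names here are hypothetical, following the episode's "hippo" example), the scheme described above might look like this on the east node:

```sql
-- Partitioned table keyed by node name; UUIDs avoid sequences,
-- whose state is not carried over by logical replication.
CREATE TABLE hippos (
    id         uuid NOT NULL DEFAULT gen_random_uuid(),  -- built in since PG 13
    value      text,
    node_name  text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY LIST (node_name);

CREATE TABLE hippo_default PARTITION OF hippos DEFAULT;
CREATE TABLE hippo_east    PARTITION OF hippos FOR VALUES IN ('east');
CREATE TABLE hippo_central PARTITION OF hippos FOR VALUES IN ('central');
CREATE TABLE hippo_west    PARTITION OF hippos FOR VALUES IN ('west');

-- On the east node: publish only the partition this node writes to...
CREATE PUBLICATION pub_east FOR TABLE hippo_east;

-- ...and subscribe to the other two nodes' publications
-- (connection strings are placeholders).
CREATE SUBSCRIPTION sub_central
    CONNECTION 'host=hippo-central dbname=hippo'
    PUBLICATION pub_central;
CREATE SUBSCRIPTION sub_west
    CONNECTION 'host=hippo-west dbname=hippo'
    PUBLICATION pub_west;
```

The central and west nodes would mirror this, each publishing its own partition and subscribing to the other two, so that querying the parent hippos table on any node shows all rows.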

[00:04:46] The next piece of content is "PostgreSQL 13 benchmark: memory speed versus transactions per second". This is from Crunchydata.com, and they're examining increasing the memory speed of a particular instance from 2,133 MHz upward, basically asking the question: does memory speed matter in terms of transactional throughput? Looking through this scaling up with a different number of clients, you can see the more clients you have, the better it gets. But on average, from his analysis, it looks like about a 3% improvement from faster memory speeds. So not a huge improvement, but it can give you a little bit. If you're interested in this analysis, go ahead and check out this post.

[00:05:31] The next piece of content is "What SQL causes a table rewrite in PostgreSQL?". This is from thatguyfromdelhi.com, and he has presented this great table that shows you which operations require a table rewrite. Essentially, you're going to be doubling the size of the table while you execute these commands on it. For example, if you alter the table and set it to a new tablespace, it's going to rewrite that whole table, so you're going to use up twice the disk space of that particular table. Or if you alter the table to set it logged or unlogged, both of those operations require a full table rewrite. He's also listed this by version, from 9.5 to 13, and as you can see, the great improvement that was made in version 11 is that adding a column with a default no longer requires a table rewrite. That was a great enhancement. So as of right now: I mentioned setting the tablespace and setting logged or unlogged. Of course, when you CLUSTER, that's reordering the data, so that rewrites the whole table, as does truncating the table, although TRUNCATE really doesn't use up double the disk space; as he says, it basically just rewrites the relfilenode. He also lists VACUUM FULL and, of course, altering column types. So altering a column type, say going from an int to a bigint, does require a table rewrite, as does altering a column to type text. So this was a great table and blog post about SQL that can cause a table rewrite.
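To make the operations above concrete, here is a short sketch (the table, tablespace and index names are hypothetical, not from the post):

```sql
-- Each of these rewrites the whole table, so expect roughly double the
-- table's disk usage while the command runs:
ALTER TABLE example_table SET TABLESPACE fast_ssd;
ALTER TABLE example_table SET UNLOGGED;
ALTER TABLE example_table ALTER COLUMN id TYPE bigint;   -- e.g. int -> bigint
CLUSTER example_table USING example_table_pkey;          -- physically reorders rows
VACUUM FULL example_table;

-- Since version 11, this no longer rewrites the table; the default is
-- applied lazily from the catalog:
ALTER TABLE example_table ADD COLUMN status text DEFAULT 'new';
```

On large tables it's worth checking free disk space before running any of the rewriting commands.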

[00:07:06] The next piece of content is "The shibboleth of PostgreSQL". This is from Cybertec-postgresql.com, and he's talking about "shibboleth", which Wikipedia defines as any custom or tradition, usually a choice of phrasing or even a single word, that distinguishes one group of people from another. They're basically talking about the Postgres community, and really how to write and pronounce PostgreSQL. The accepted ways are to just say "Postgres", or to say "PostgreSQL" spelling out the Q-L. Ways you should not say it include "Postgre" or pronouncing the SQL part as "sequel". So if you're interested in learning more about that, you can check out this post.

[00:07:54] The next piece of content is a YouTube channel: PGDay San Francisco. For the last month or so they've been posting videos, and they've got five on the site now, with presumably more coming, because they seem to be posting about one every week. It has content such as "Explaining Explain", advanced data types, vacuum through pictures, et cetera. So if you're interested in this type of video content, you can check out this channel on YouTube.

[00:08:22] The next piece of content is "Checkpoints in PostgreSQL", and they're talking about what a checkpoint is. This is from Highgo.ca. Basically, when data is changed in Postgres, it's written to the WAL, and on a periodic basis a checkpoint is done that takes all the different changes that have happened in the WAL, or that still reside in memory, and flushes those changes to the actual database files. So it's making a consistent checkpoint where the data on disk essentially matches what's in memory. He says what a checkpoint does is: a WAL record of the checkpoint is recorded in the log, all dirty pages of data are written to disk, the checkpoint is written to the pg_control file, and then it synchronizes all the data files. He also lists what triggers a checkpoint: reaching max_wal_size, the limit on the maximum amount of WAL you want to maintain; hitting checkpoint_timeout, where basically a timer expires and it needs to do a checkpoint; when a backup is started via pg_basebackup or pg_start_backup is issued; a database server shutdown; or an actual CHECKPOINT command executed by an administrator. Then it goes over some of the ways checkpointing can be controlled: through defining max_wal_size and checkpoint_timeout, which were mentioned up here, as well as checkpoint_completion_target, which spreads out how long the checkpointing process takes, and checkpoint_warning, to let you know if checkpoints are happening, say, too frequently. So if you're interested in learning more about checkpoints, check out this post from Highgo.ca.
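The settings mentioned above can be adjusted like this; a minimal sketch, with illustrative values rather than recommendations (the defaults noted in comments are the PostgreSQL 13 defaults):

```sql
-- Checkpoint-related settings (require a config reload, not a restart):
ALTER SYSTEM SET max_wal_size = '2GB';                -- default 1GB
ALTER SYSTEM SET checkpoint_timeout = '15min';        -- default 5min
ALTER SYSTEM SET checkpoint_completion_target = 0.9;  -- default 0.5 in v13
ALTER SYSTEM SET checkpoint_warning = '30s';          -- default 30s
SELECT pg_reload_conf();

-- An administrator can also force a checkpoint immediately:
CHECKPOINT;
```

Raising checkpoint_completion_target spreads the dirty-page writes over more of the interval between checkpoints, smoothing out I/O spikes.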

[00:09:54] The next piece of content is "How to install and configure PostgreSQL (Debian/Ubuntu) for developer use", part one and part two. Part one covers general installation. Now, of course, you could just install the package from the distribution's package manager, but this post walks through actually using the PostgreSQL repository, which gives you access to all the different Postgres versions that exist, as well as discussing how to get it set up, how to connect to it using psql, and the different configuration you might want to do to be able to connect as a user. The second post covers a lot about configuring the system, such as altering log settings, defining a wal_level, enabling archive_mode as well as an archive_command, and looking at pg_stat_statements and auto_explain to understand what your queries are doing. Now again, this is for development servers, not necessarily production servers, but if you're interested in that, you can check out these posts from Depesz.com. Also from the same site, three posts were done on what's coming in version 14 of Postgres. The first is about pg_stat_statements, where you can now track the time at which all statistics were last reset. Previously, when you did a reset of pg_stat_statements, it wasn't recorded when that happened, but in 14 you can get that information from the pg_stat_statements_info view, where you can look at the stats_reset column to know when it was reset. So that's a great addition. The next post is support for multirange data types. In addition to supporting a range within a data type, you can now have multiple ranges, separated by commas and wrapped in curly braces. So that's an interesting implementation. Now, they did mention it doesn't work with indexes yet, but version 14 is still a ways off, so that may get implemented before it goes live. And the last post is "Allow subscripting of hstore values".
So in addition to using the arrow syntax (the hyphen and greater-than symbol, ->) to pull out a value, you can now use subscripting, where you use square brackets. Now, he said this doesn't currently work with indexing, but you can use it to pull out values. So if you're interested in learning more about that, you can check out this post.
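Sketches of the three Postgres 14 features discussed above (these assume a version 14 server; the hstore example also assumes the hstore extension is installed):

```sql
-- 1. When were pg_stat_statements statistics last reset?
SELECT stats_reset FROM pg_stat_statements_info;

-- 2. Multirange data types: several ranges in one value, in curly braces.
SELECT '{[1,3), [7,10)}'::int4multirange;

-- 3. hstore subscripting as an alternative to the -> operator
--    (requires: CREATE EXTENSION hstore;).
SELECT h['color']
FROM (SELECT 'color => blue'::hstore AS h) AS s;
```

The subscript form reads naturally alongside the array subscripting Postgres already supports, which is part of the motivation for the change.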

[00:12:12] The next piece of content is "Postgres, PL/Python and SciPy/NumPy for processing images". This post from Crunchydata.com is about a Django app that uses some of these Python tools, along with Postgres, to process images. So if you're interested in that, you can check out this post.

[00:12:31] The next set of blog posts are all from Devrim Gündüz's PostgreSQL blog, and they're talking about the RPM packages that are used on Red Hat, Fedora and CentOS. The first post is "What is new in PostgreSQL 13 RPMs?", and he covers all the different changes that have been made for the new packages. The second post covers LLVM issues with the PostgreSQL Yum repository on CentOS 8, and basically ways to address them. The third post is about the RPM repository for unsupported PostgreSQL releases and distros, and basically how to get access to those.

[00:13:08] And the fourth is installing PostGIS 3.1 and PostgreSQL 13 on CentOS 8. So if you're interested in any of these posts about running Postgres on CentOS, Red Hat Enterprise Linux or Fedora, definitely check them out.

[00:13:23] The next piece of content is "Loading data into PostGIS: an overview". This is from Crunchydata.com, and it covers numerous ways that you can import data, namely what they call vector data, into PostGIS and of course Postgres. It covers using command line tools such as shp2pgsql and ogr2ogr, as well as using raw SQL and some GUI options such as QGIS. So if you're interested in that, you can check out this post from Crunchydata.com.

[00:13:56] The next piece of content is query load balancing in Pgpool-II. This discusses how to do query load balancing in the load balancer Pgpool-II. So if you're interested in that, you can check out this post.

[00:14:10] And the last piece of content is about timeouts in Pgpool-II connections. This is from pgsqlpgpool.blogspot.com, and it describes how you can set up these timeouts.

[00:14:22] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
