
Community Acquisition, Space Saving Terabytes, WAL Archiving, Vacuum Analyze Tips | Scaling Postgres 134


In this episode of Scaling Postgres, we discuss a company acquisition within the Postgres community, how to save terabytes of space, setting up WAL archiving and vacuum/analyze tips.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about a community acquisition, space-saving terabytes, WAL archiving, and vacuum analyze tips. I'm Creston Jamison, and this is Scaling Postgres, episode 134.

[00:00:17] Alright, I hope you, your family, friends and coworkers continue to do well. Our first piece of content is EDB completes acquisition of 2ndQuadrant, becomes largest dedicated provider of PostgreSQL products and solutions worldwide. So this is an announcement from the EnterpriseDB.com website, or EDB, and as you can tell, they have acquired 2ndQuadrant, and this post goes over some of the details as to why they did it. They give you some highlights of the deal in terms of number of employees, customers, locations, things of that nature. There's also an announcement on their blog, how EDB became the leader in the Postgres market, that has some discussion about why they decided to do this now. So I find this a definitely noteworthy item, even though it's not specifically about how to scale.

[00:01:15] You know, I believe this has an impact, probably both positively and negatively, on the Postgres community. Having a larger organization advocating for Postgres is beneficial, but I can't help thinking that we lose a little bit of the community's diversity when something like this happens. Either way, feel free to check out these two posts.

[00:01:35] The next piece of content is PostgreSQL, a community project. This is from 2ndQuadrant.com, and he's talking about how PostgreSQL is truly a community open source project. He says most open source governance models have three different structures: one, it's run by an individual, or maybe two, but not that many; second, it's run by one company; or third, it's actually community led. So he talks about each of these models and how one can transfer into another and back. But it's very hard to actually transfer to community led, because it's hard to go from one company to suddenly having multiple companies and individuals supporting it, or, even if you have one or two individuals, to grow a community where you have multiple organizations as well as people assisting on the project. And he talks about how Postgres is definitely this community-led version and that it has a lot of benefits. He mentioned this because when Postgres 13 came out, he noticed a lot of people discussing it as being a successful community-led open source project. So definitely an interesting perspective, and kind of why I mentioned that one of the potential negatives of the acquisition of 2ndQuadrant by EDB is that maybe we lose some of that community diversity. But definitely some interesting news items happening this week.

[00:02:54] The next piece of content is PostgreSQL at scale: saving space basically for free. This is from Braintree Product Technology on Medium, and basically it's all about ordering columns to get space savings. Now, he says with this technique they've been able to save about 10% of their disk space, and they have over 100 terabytes of data, so essentially it looks like they've saved maybe about ten terabytes of disk space. And as he mentions, quote, this technique isn't revolutionary and it's been well documented by others such as 2ndQuadrant, EDB, GitLab, et cetera. A lot of it comes down to padding for types when alignment boundaries need to be preserved. They give a good example of this: if you're going to have a bigint, it's going to want an eight-byte alignment. So if you have an integer followed by a bigint, it only wants to start that bigint at the next eight-byte increment, and the space after the simple integer has to be padded, so it essentially becomes wasted space. Whereas if you order it with the bigint first followed by the integer, as you can tell, you've saved about a quarter of the space. So they're using these techniques, as well as a few other ones that they mention, such as handling text and binary types in a certain way, and handling defaults, not nulls and primary keys a little bit differently, where they put primary keys first, basically placing things that are likely to contain data earlier in the column order. So this is how they've been able to achieve these space savings. And they actually packaged it up into a Ruby gem, because they use Ruby, called pg_column_byte_packer. For those people who use Rails as their application framework, what it actually does is order the migrations that you run against the database, so that when you do a new one, it places columns in the proper order to try and conserve space. And it says they even went one step further and basically used pg_dump to rewrite the tables in their whole database to reorder the data. Now, normally I wouldn't anticipate wanting to do this for just a 10% savings, but when you're talking about hundreds of terabytes and you're going to save tens of terabytes doing it, then maybe it makes sense. But definitely an interesting way to save space in your database, so if you're interested in learning more, you can check out this post.

The next piece of content is PostgreSQL WAL archiving and point-in-time recovery. This is from Highgo.ca, and it's a post about exactly that: how the write-ahead log records all the database's activity to enable crash recovery and maintain durability and consistency, and how for backup purposes you're going to want to archive those WAL files, because they enable you to restore the database to a specific point in time. They talk about the different archiving options you can set to be able to do that. One is archive_mode, which you generally set on or off, though there's also an always setting; the archive_command, where you specify how to save these WAL files, essentially a command or a shell script you can put in there; and then how much WAL to save, whether just a minimal level, enough for a replica, or enough to be able to do logical replication with that WAL stream as well. And then they go through setting that up, as well as doing a backup and a point-in-time recovery. So if you want to learn more about how to do that, definitely check out this post.
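To illustrate the alignment padding discussed in the Braintree article above, here's a minimal sketch (the table and column names are made up, and the exact savings depend on your row layout):

```sql
-- Two tables with the same columns, ordered differently.
-- In "padded", each 4-byte integer is followed by 4 bytes of padding so the
-- next bigint can start on an 8-byte boundary; in "packed", the 8-byte
-- columns come first and no padding is needed.
CREATE TABLE padded (a int, b bigint, c int, d bigint);
CREATE TABLE packed (b bigint, d bigint, a int, c int);

INSERT INTO padded SELECT i, i, i, i FROM generate_series(1, 1000000) AS i;
INSERT INTO packed SELECT i, i, i, i FROM generate_series(1, 1000000) AS i;

-- Compare on-disk sizes; the packed ordering should come out smaller.
SELECT pg_size_pretty(pg_relation_size('padded')) AS padded_size,
       pg_size_pretty(pg_relation_size('packed')) AS packed_size;
```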
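And here's a minimal sketch of the archiving settings the Highgo post covers, expressed as ALTER SYSTEM commands (the archive destination path is just a placeholder):

```sql
-- wal_level: 'replica' is sufficient for archiving, PITR and physical replicas;
-- 'logical' additionally allows logical replication from the WAL stream.
ALTER SYSTEM SET wal_level = 'replica';

-- archive_mode: 'on' archives on a primary; 'always' also archives on standbys.
-- Changing archive_mode requires a server restart.
ALTER SYSTEM SET archive_mode = 'on';

-- archive_command: any shell command or script that copies the completed WAL
-- segment (%p = path to the file, %f = file name) somewhere safe.
ALTER SYSTEM SET archive_command = 'cp %p /mnt/wal_archive/%f';

-- After restarting, take a base backup that archived WAL can be replayed on top of:
--   pg_basebackup -D /backups/base -X stream
```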

[00:06:15] The next piece of content is PostgreSQL vacuum and analyze: best practice tips. This is from 2ndQuadrant.com. They're talking about vacuum and analyze: basically vacuuming up dead tuples after an update or delete, as well as keeping statistics up to date with analyze. The first tip is don't run manual vacuum or analyze without reason. Now, basically they mention here that it just wastes resources; they didn't give any reason other than resource usage as to why you wouldn't want to run them too frequently. The second is fine-tune the autovacuum threshold, so basically define how often you want autovacuum kicked off for a particular table. The default values are good for smaller tables, but once your tables start getting larger you are going to want to adjust these. Next, do the same thing for the autoanalyze threshold, to determine how often you're analyzing for statistics purposes. Then he mentions fine-tuning the autovacuum workers, so how many workers to have. And the thing to keep in mind with autovacuum workers is that they share a pool. By default there are three and you can increase it, but that's not going to actually make your autovacuuming faster, because there is a common limit set by the vacuum cost limit, or the autovacuum cost limit, and that limit is shared across all those workers. So just increasing the number of workers won't make things go faster; increasing the limit is the best thing you can do to actually make vacuums run faster, or more frequently, essentially. So if you're wanting to learn some more best practice tips on vacuum and analyze, definitely check out this post from 2ndQuadrant.com.
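As a rough sketch of the kind of tuning the post describes (the table name and the specific values here are purely illustrative, not recommendations from the article):

```sql
-- Per-table thresholds: trigger autovacuum/autoanalyze after a smaller fraction
-- of the table changes than the cluster-wide defaults (20% and 10%).
ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);

-- Cluster-wide knobs: more workers alone won't help, because the cost limit
-- is shared across the whole worker pool; raise the limit as well.
ALTER SYSTEM SET autovacuum_max_workers = 6;
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;  -- default -1 falls back to vacuum_cost_limit
```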

[00:07:51] The next piece of content is PostgreSQL FDW authentication changes in PostgreSQL 13. This is from Percona.com, and they're mentioning two changes that were made for the Postgres foreign data wrapper. The first is that a superuser can permit non-superusers to establish a passwordless connection through the foreign data wrapper; previously only superusers could do that, but now the superuser can specify that it works for non-superusers as well. And secondly, you can do authentication via an SSL certificate, and they even go through the process of how to generate a certificate and a key to be able to set this up for use in Postgres. So if you're interested in that, you can check out this post.
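Here's a hedged sketch of what those two postgres_fdw options can look like in SQL (the server name, host, user and file paths are all made up for illustration):

```sql
-- Foreign server pointing at the remote database, authenticating with a
-- client certificate and key instead of a password.
CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'appdb',
             sslcert '/etc/postgresql/client.crt',
             sslkey  '/etc/postgresql/client.key');

-- A superuser can now allow a non-superuser mapping to connect without a password.
CREATE USER MAPPING FOR app_user
    SERVER remote_pg
    OPTIONS (password_required 'false');
```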

[00:08:35] The next post, also from Percona.com, is PostgreSQL 13 new feature: dropdb --force. So normally you have to have no connections to a database if you're going to drop it. However, a force option has been added so that now, even if there are existing connections, it will go ahead and drop that database. Definitely a high risk thing to do, but this new feature is now available in Postgres 13.
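For reference, assuming Postgres 13, the two equivalent forms look something like this (the database name is just an example):

```sql
-- SQL form: terminate existing sessions and drop the database in one step.
DROP DATABASE appdb WITH (FORCE);

-- Shell form, via the dropdb client:
--   dropdb --force appdb
```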

[00:09:00] The next piece of content is a quick look at PostgreSQL 13 Release Candidate 1 query performance. So he ran a bunch of queries that he had, and he mentions his test setup here, looking for performance differences between 12.4 and Release Candidate 1 of Postgres 13. And basically there were positives and negatives; on balance, nothing was dramatically speedier in the queries he was testing, and essentially his conclusion was: in the end, some test items were a bit slower, others faster, and most importantly it seems like there are no grave problems. So that's basically some good news. This is from Cybertec-postgresql.com. Now, similarly, there's another post called PostgreSQL 13 upgrade and performance check on Ubuntu/Debian: 1.6GB/s random reads. This is from Crunchydata.com, and he did an analysis using a bunch of different queries that he has, run against 13.

[00:09:59] It looks like he did not use the release candidate but the actual release, and it looks like he checked both 13 and 12. He found one example where some things were a little bit slower, but for the most part, with all the queries he checked, it again seemed to come to the same conclusion: some were a little faster, some a little slower, but overall everything looked pretty good. So no great improvements according to either of these two posts, but basically there were little to no regressions. So, good news to know. And if you want to learn more about the details of these posts, definitely check them out.

[00:10:32] The next piece of content is PostGIS versus Geocoder in Rails. So they're talking about Ruby and Rails, which is a Ruby application framework. And there is a Ruby gem called Geocoder that helps you do things like geocode cities, states and countries into longitude and latitude, and then calculate distances to those. And they were comparing it to actually using PostGIS, because Geocoder does not use PostGIS to do its calculations. So they ran through the process of actually installing PostGIS into their environment, and they got the activerecord-postgis-adapter gem, which enables the Rails database layer to support PostGIS-specific data types, such as its point data types. So PostGIS tends to use points and things of that nature, whereas the Geocoder gem just uses floats for longitude and latitude. And then they talked about building some helper classes to be able to do some of these calculations and find nearby records. They compared both using Geocoder and PostGIS, and they got pretty much the same performance, although it seems like PostGIS will be more accurate because it actually takes into account the curvature of the earth, whereas Geocoder does not. But that should only matter if you're doing really long distance calculations, I would imagine; for shorter distances between one another, it won't matter as much. Then they looked at finding records within a bounding box with PostGIS and Geocoder and compared those. Then they looked at some specific things that actually require PostGIS: if you're looking for records inside of a specific polygon, as opposed to just a bounding box, that's not something the Geocoder gem does, and you have to reach for PostGIS to do that. They have an example of doing that type of query here, as well as finding related nearby records. So if you use Ruby on Rails and you're interested in learning more about actually using PostGIS as opposed to something like Geocoder, definitely check out this blog post.
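To give a feel for the kinds of queries that need PostGIS, here's a small sketch in plain SQL (the places table, its location geometry column stored with SRID 4326, and the coordinates are all invented for illustration):

```sql
-- Nearby records: everything within 5 km of a point, using geography so the
-- distance accounts for the curvature of the earth.
SELECT *
FROM places
WHERE ST_DWithin(location::geography,
                 ST_SetSRID(ST_MakePoint(-122.42, 37.77), 4326)::geography,
                 5000);

-- Records inside an arbitrary polygon, which a float-based gem can't express:
SELECT *
FROM places
WHERE ST_Contains(
    ST_GeomFromText('POLYGON((-122.5 37.7, -122.5 37.8, -122.3 37.8, -122.3 37.7, -122.5 37.7))', 4326),
    location);
```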

[00:12:38] The next piece of content is Pgpool-II 4.2 features. This is from Highgo.ca, and it's basically covering the list of features that are coming in the next major release of Pgpool-II, which is 4.2. They talk about things like the logging collector, support for log_disconnections, health check improvements, health check stats, different pool back-end stats, LDAP authentication support, the snapshot isolation mode, as well as others. So if you're interested in using Pgpool-II, definitely check out this post.

[00:13:11] The next post, also related to Pgpool-II, is how to configure SCRAM and MD5 authentication in Pgpool-II. Now, there was a previous post talking about authentication in general in Pgpool-II, whereas this one focuses on how to set up SCRAM and MD5 authentication with it. This is from B-peng.blogspot.com, so check out this post if you're interested in doing that. And the last piece of content is the PostgreSQL Person of the Week: it's Thomas Munro. So if you're interested in learning more about his contributions to Postgres, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
