background

Faster Partitions, Count Speed, Extensions, Education | Scaling Postgres 82

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss faster partition performance, count speed, extensions and Postgres education.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about faster partitions, count speed,extensions and education. I'm Kristen Jameson. And this is scaling postgres episode 82.

[00:00:20] All right, I hope everyone's having a great week. Our first piece of content is postgresSQL. Twelve partitioning is now faster. This is from the second quadrant.com blog. The firstthing they talk about is copy performance, where they say that bulk loading of data into apartition table using copy is now able to make use of bulk inserts. Previously, only one row wasinserted at a time. And they show a scale here going from two to 8192 partitions. And theylooked at version eleven and twelve and the number of seconds to do a copy from 100 millionrows. And as you can see, twelve was noticeably faster. It starts getting into diminishing returnsas you go higher and they're talking about the reason for the slowdown is due to how copy codemakes up to 1000 slots for each tuple per partition. So there may be some use cases where youmay not see some of this effect, but definitely much improvement with partitions in general, notjust with larger numbers of partitions. Next thing you talked about is insert performance and theysaid in version eleven every partition was locked when you were doing an insert, no matter if itreceived a new record or not. But with version twelve, quote we now lock a partition just beforethe first time it receives a row. So if they're just inserting one row, then only one partition islocked. And as you can see here, as the number of partitions goes from again two to over 8000,the transactions per second, it's pretty good, up to about 32 starts getting the difference. Andyou can see a huge difference that happens as you add more and more partitions. In versiontwelve, you can see there's no penalty as you're increasing the number of partitions that exist fora parent table. So that's a huge improvement. If you have a lot of partitions in your partitiontables, then they talk about a select performance. Now they mentioned here, of course, thatPostgreSQL eleven added a partition pruning and with version twelve they just made it a wholelot better. And as you can see what a whole lot better means. Again, the TPS, the transactionsper second is still pretty consistent, although you start getting a little bit of a drop down in the4000 to 8000 partitions level, but it knocks the socks off version eleven. So version twelve ismuch improved in this area. Then they go over some other partitioning performanceimprovements in terms of ordered partition scans, getting rid of single subpland append andmerge pin to nodes, and some various improvements to the runtime partition Pruning. And thereare some particular use cases that again, twelve is improved. But as you can see, as you getlarger partition sizes, there still can be some issues that of course they'll be working on. Buteven with this drop off, they said if they change the where clause slightly, swapping out thestable function, call for a constant, the planner is able to take care of Pruning, so there's stillsome use cases. Again, that again your mileage may vary, but definitely a lot of greatimprovements with partitioning. So it seems like Postgres just keeps getting better and betterwith regard to this.

[00:03:20] The next post is what's faster, count asterisks or Star or Count One. And this postbasically says there shouldn't be too much of a difference in it and you don't see much of adifference with MySQL Oracle, some other databases, but with PostgreSQL you actually see orhe saw a little bit of a difference and they know what is the difference. Count asterisks orAccount Star counts all Tuples in a group account by an expression counts all Tuples in a groupfor which the expression evaluates to something that is not null. So sometimes this comes inuseful when doing accounts of outer joins and also when you're doing things like countinggroups in a subset when using a case or if you're using Filter in PostgreSQL. Now. He also talksabout looking at various different database systems. And pretty much for MySQL Oracle SQLServer doesn't really matter. But it does matter for PostgreSQL. And that Count asterisks orCount Star was consistently faster by around 10% on 1 million rows. So more than he hadanticipated. And someone actually in the comments mentioned that they saw something similartoo. So it's definitely something to be aware of. B --ut he does note here quote do note that the benchmark only tried a very simple query. So asyou get more sophisticated in your counting with different types of queries, you may not see thisbehavior, or maybe when using having or order by, there's different things that can affect it, butit's something interesting to observe and remember and see if you see this kind of a difference inyour queries. And this is from a J Oq.org blog.

[00:04:57] Next post is PostgreSQL Extensions a Deeper Look. And this is from a presentationthat's been embedded here on the LinkedIn SlideShare called PostgreSQL Extensions a DeeperDive. Now, this is specifically about Amazon RDS, but I did still find the content relevant foranyone using PostgreSQL. But it talks about what are extensions, what are some of thecommon extensions that people use. Again. PG Stat Statements is a huge one. PostGIS fordoing spatial support, postgres foreign data wrapper, procedural language and JavaScript. Andthen UUID, he shows different commands to be able to look at the supported PostgreSQLextensions. Again, this is on RDS, how you would go about creating them, listing the used onesand how to upgrade or remove them. And then he also goes into different plugins and how todevelop extensions as well. So if you're interested in extensions and getting a little bit moreinformation with regard to them and development. Even though this is again about RDS becauseit's an AWS presentation, still some relevant content in this presentation so I suggest you tocheck it out. And this is at Jignesh Shah's Blog.

[00:06:09] The next post is PostgreSQL top Learning and Training Resources. So this isbasically a list of where you can get more information about learning PostgreSQL and of coursethe number one resource of course is the PostgreSQL Manual. So it's the online manual that'savailable@postgresql.org. Other thing mentioned is distribution lists. So email list or distributionlist, you can get on these to get communication and communicate with others in the community.

[00:06:36] It talks about partners and software tools, official PostgreSQL webinars, a list ofPostgreSQL events, all the different PostgreSQL events taking place, local and regional usergroups, the PostgreSQL Planet which is a list of different blog posts. So a bunch of differentresources to learn more about PostgreSQL. Continuing with the theme of PostgreSQLeducation, Postgres Open just posted 30 to 40 different presentations that occurred at thatconference. So if you're looking for more PostgreSQL education material, definitely check outthe Postgres Open YouTube channel to look for more content.

[00:07:16] Again, continuing with the theme of education, there is a website that has come upcalled the Postgrescheaat.com and it is a cheat sheet for accomplishing various differentfunctions in PostgreSQL. So for example, I'm on the tables, it tells you how to show a tablebackslash d the table name showing more details with it get the list of tables from the currentschema. You can look at users and it tells you how to set or change a user's password. Sothere's all sorts of just quick little cheat sheet of different commands you can use forPostgreSQL and this was built from the team at TimescaleDB so definitely an interestingresource to check out and keep on hand in case you need a way to look up something quickly.

[00:08:02] The next post. Another educational resource is postgres execution plans. Fieldglossary. Now this is defining each of the different things when you're looking at education plansand it is truly a glossary. It tells you a node type and then what it is tells you the different parentrelationships, what a filter is, parallel aware, relation, name, alias, estimate fields in terms of totalcost, startup cost, plan roles plan with actual value fields in terms of actual loop, actual totaltime. So this is a good resource in case you're trying to look, explain, analyze a query and youwant a little bit more information about what you're seeing in the data returned. It would be greatif this had some maybe small examples that could highlight it. But this was intended to be aglossary and that's exactly what it is. So if you're interested in this, definitely a blog post to checkout. This is from the Pgmuster blog on Medium next post. Again, related to education is The Artof PostgreSQL the Transcript part one. And this is a text version of the presentation that wasgiven by --Dimitri Fontaine at Postgres Open. Again, one of the YouTube videos that was uploaded in thePostgres Open channel. And this is a text version of his presentation. Now this presentationactually seems to cover more why PostgreSQL? In other words, why would you usePostgreSQL? And it's basically talking about its relational database roots and how PostgreSQLfits into that. So if you're interested in this content, definitely a blog post to check out.

[00:09:38] The next post is SQL and now GQL. So this is talking about making GQL an officialstandard alongside SQL and the subtitle says here a standard query language for propertygraphs. So this is a little bit outside of PostgreSQL's bailewick. But again, talking on the topic ofconversation, this is pretty interesting and of course it brings up questions in my mind. Are theredifferent types of support that will come into PostgreSQL with this potentially new definition of alanguage at GQL? So if you want to get more information about this graph query language andthis is a great post to check out, that is on Pulse on LinkedIn.com.

[00:10:23] The next post is Manage Linux Control groups in PostgreSQL with PGC groups. Sothis is in reference to a previous blog post on CyberTech Postgresql.com that they did talkingabout C groups. And they actually have a PostgreSQL plugin called PGC Groups for making themanagement of them easier. Now, the first thing to talk about is what is a PGC Groups goodfor? Basically the C groups kind of put a constraint around the processes that are running on asystem like a Linux system. So if you have one PostgreSQL cluster running on a system andyou want it to have full resources of that machine, you wouldn't want to use C Groups. But if youhad multiple clusters that were running and you wanted to restrict their access to certainresources on the machine, then maybe C Groups is something you would want to potentiallyuse. And PGC Groups, as they talk about here, is something that makes it easier. So they've setup a project page for it and they go through to the steps to how to set it up. And then once youset it up, you can, as they say here, support limits on memory consumption, CPU time, and I Obandwidth. And they say one particularly nice feature is this is the only way to limit the amount offile system cache that is used for PostgreSQL. So if you have a use case that would benefit fromconstraining the resources that PostgreSQL uses on the machine, definitely a blog post to checkout.

[00:11:49] The last post is PostgreSQL twelve beta four, up and running in less than six minutes.And this is from Luca Ferrari at Fluco 1978 GitHub IO. And he's using PGN, which is kind of likean environmental builder for postgres, so much like you hear of Python's Environment Builder orRuby's RBM for building a Ruby environment. This builds postgres environments, and you coulduse this utility to just build version twelve, beta four pretty quickly, set it to use, and then start upthe database. So if you have a need to run different versions, this is a quick and easydemonstration of how you could use PGM to accomplish that.

[00:12:32] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show Notes. Be sure to head over to Scalingposgres.com, where you can signup to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes.Thanks. --

episode_image