
ARM Tests, Tips & Tricks, Hierarchical Structures, Benchmarking Framework | Scaling Postgres 149


In this episode of Scaling Postgres, we discuss tests of Postgres on ARM processors, 2021 tips & tricks, working with hierarchical structures and creating a benchmarking framework.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about ARM tests, tips and tricks, hierarchical structures, and a benchmarking framework. I'm Creston Jameson, and this is Scaling Postgres, episode 149.

[00:00:18] Alright. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is PostgreSQL on ARM-based AWS EC2 Instances: Is It Any Good? This is from Percona.com, and they're talking about the new Graviton2-based AWS instances that were added in May 2020, which they compared against an x86 platform. Now, they call the x86 instance older, so maybe there'll be a newer one at some point, but they tested an older x86 instance (the m5d.8xlarge) against the new Graviton2, ARM-based m6gd.8xlarge. They both have the same number of virtual CPU cores and the same amount of memory; the storage is a little bit different. They used the local NVMe storage, not the EBS storage that Amazon offers. But of course, the difference you'll notice is that the ARM instance is cheaper, about 25% cheaper, they said. So if the performance is equivalent, you already get a win by being able to pay 25% less. They installed Ubuntu on these systems, did the Postgres configuration as defined in the post, and ran some pgbench tests. With a read-write workload, they found that the ARM processor gave a 19% performance gain. So even though it costs 25% less, it gave nearly 20% more performance. That's pretty good. And with read-only, it was a little better: a 30% performance gain in transactions per second for the ARM processor versus the x86 processor. So it looks like by switching to these, you could immediately get better performance for a lower price. Next, they tested sysbench TPC-C. Now, what they did mention is that while they didn't run into any issues with PostgreSQL on ARM, they did have issues with sysbench on it, so that's just something to keep in mind. They looked at an in-memory load with 16 threads, and you can see the ARM instance in blue here.

[00:02:28] It gives you about a 15% better performance. Now, this drop-off, they said, is due to the checkpointing and the full page writes that happen, and it just happened earlier here since the ARM system was processing more compared to the x86 platform. They did in-memory with 32 threads, and the actual difference was reduced. Now, remember, these instances have 32 virtual CPU cores. Then they went to 64 threads, and the difference between them got narrower and narrower as the threads increased. In some I/O tests, they also saw better performance from the x86 platform.

[00:03:05] So it's not always the case that ARM wins every time. But in a fair number of the tests that they've shown here, it gave what they're calling about a 15% to 20% performance gain for 25% less money. So definitely an interesting finding, and perhaps you'll want to check out using PostgreSQL on ARM for your workloads.

[00:03:25] The next piece of content is Tips and tricks to kick-start the Postgres year 2021. This is from Cybertec-postgresql.com, and the first tip they cover is partial indexes. This is something that is frequently on his tips and tricks list, he says, because it's so important for performance: if you only index certain parts of the data, that index will be smaller and much faster to look through. A place where I see partial indexes being important is if you have, say, a status column with only a few different statuses. If you only query for one or two of those statuses, then just create a partial index for each status you use frequently, and each index can be that much smaller because it only records those rows. The same goes for other low-cardinality columns: create partial indexes for the values you actually query in those fields. He uses the example of a column with a lot of nulls but only a few values, where a partial index would work well. The next tip he mentions is estimating the average row or record size in bytes, and he says you can do this pretty easily using the function pg_column_size. Now, this says column size, but you can actually apply it to all the columns of the table by referencing the entire row in a query, which gives you an average row size.
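As a rough sketch of both tips (the orders table, its columns, and the status value here are hypothetical stand-ins, not from the post):

    -- A partial index covering only the one frequently queried status
    -- value stays much smaller than an index over the whole table.
    CREATE INDEX orders_pending_idx ON orders (created_at)
        WHERE status = 'pending';

    -- Estimate the average row size in bytes by passing the whole row
    -- to pg_column_size via the table alias.
    SELECT avg(pg_column_size(o.*)) AS avg_row_bytes
    FROM orders o;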

[00:04:46] The next tip he mentions is avoiding risk when upgrading older Postgres instances to new major versions, and the advice is basically to use logical replication. So if there's some reason you can't have downtime, logical replication is a means to do that upgrade, ideally without any downtime at all. The next tip is detecting which index types are available for a given column type. He gives a query where you specify the column type, or data type, you're using, and it returns the available index types for it. And he mentions you can also add extensions such as btree_gin or btree_gist to get a few more options as well. So basically, be aware of the different index types that can potentially give you better performance for what you're trying to query. And the last tip is inspecting metadata at the speed of light with psql's \gdesc. This gives you a description of all the data types returned by a query when you run \gdesc after it. So if you're interested in any of these tips, go ahead and check out this post from Cybertec-postgresql.com.
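A minimal sketch of those last two tips follows; the catalog query is my own construction against pg_opclass and pg_am, not necessarily the exact query from the post:

    -- List the index access methods that have an operator class for a
    -- given column type (int4 here).
    SELECT DISTINCT am.amname
    FROM pg_opclass oc
    JOIN pg_am am ON am.oid = oc.opcmethod
    WHERE oc.opcintype = 'int4'::regtype;

    -- In psql, \gdesc shows a query's output column names and types
    -- without actually executing it.
    SELECT relname, reltuples FROM pg_class \gdesc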

[00:05:49] The next piece of content is Hierarchical structures in PostgreSQL. They're talking about a hierarchical structure where you have departments, some of which are located under others. So each has a name: the engineering department has no parent, whereas, say, the product department is part of engineering, and finance and human resources are part of administration. It's a hierarchical organization. They wanted to create a path that defines that hierarchy that you could just look through, and they built this using two different implementations. The first implementation used a materialized view with a recursive CTE. The materialized view is just there so it can be refreshed conveniently, but the recursive CTE is what walks through and builds that path structure we saw. It's a relatively simple query to do that, and the materialized view lets you refresh it on a frequent basis. They show an example of how you can query it, what the generated path looks like, and how you can pull out individual values as well as look for, say, where product sits in the hierarchy. The second implementation uses ltree, which we've discussed in a previous episode of Scaling Postgres. ltree is an extension you can add that actually builds this for you. So you can see that when they created this teams table, they used the ltree data type for the path column. Then they loaded the data, and it gives you the same type of path. It doesn't look identical, it's a little bit different, but the post shows how you can query this path to get the same kinds of answers. So if you're interested in learning how to store hierarchical data, then maybe you want to check out this blog post from Hoverbear.org.
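As a sketch of the recursive CTE approach (the table definition here is a hypothetical stand-in for the post's schema):

    -- Each department points at its parent; roots have a NULL parent.
    CREATE TABLE departments (
        id        int PRIMARY KEY,
        parent_id int REFERENCES departments (id),
        name      text NOT NULL
    );

    -- Walk from the roots down, accumulating a readable path per row.
    WITH RECURSIVE tree AS (
        SELECT id, name, name AS path
        FROM departments
        WHERE parent_id IS NULL
        UNION ALL
        SELECT d.id, d.name, tree.path || ' -> ' || d.name
        FROM departments d
        JOIN tree ON tree.id = d.parent_id
    )
    SELECT path FROM tree ORDER BY path;

Wrapping that final SELECT in CREATE MATERIALIZED VIEW ... AS is what gives you the convenient refresh behavior described above.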

[00:07:35] The next piece of content is a demo of another benchmarking framework. This is from markwkm.blogspot.com, and he's talking about developing a more convenient benchmarking framework that you can load your own queries into to run benchmarks against different data sets or different databases. He used a data warehousing example with dbgen and qgen from the TPC-H framework, and his user-defined benchmark is called "pg, not TPC-H."

[00:08:08] So basically, he's developing a benchmark that you could potentially put your own information into to test against databases. If you're interested in that, you can check out this blog post.

[00:08:21] The next piece of content is Waiting for PostgreSQL 14: Add pg_stat_database counters for sessions and session time. So basically, it's adding to the pg_stat_database view: session time, active time, idle-in-transaction time, the number of sessions, and the number of sessions abandoned, fatal, or killed. It gives you an overview of how much time different sessions spend in different activities, as well as where they potentially fall down, so you can gather statistics on that. This is from Depesz.com. Also from that same website: Are there limits to partition counts? Now, this is interesting. He actually created up to 70,000 partition tables, because he saw someone referencing that there's some limit on it, and he went all the way up to 70,000 without issue. However, he did notice an increase in the planning time required for queries run against these partitioned tables. So it does work, but you probably don't want to do that. Still, it's definitely an interesting post about what's possible, and he has a quote here: "I would probably discourage people from making 5000+ partitions on a single table, but PG can definitely handle it." So if you want to learn more about this, you can check out this blog post.
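For reference, the new counters land as ordinary columns on pg_stat_database, so on Postgres 14 or later you can read them with a plain query (a sketch assuming PG14+; the time columns are reported in milliseconds):

    -- Session-level counters added to pg_stat_database in Postgres 14.
    SELECT datname,
           session_time,              -- total time spent in sessions
           active_time,               -- time spent executing statements
           idle_in_transaction_time,  -- time spent idle in a transaction
           sessions,                  -- total number of sessions
           sessions_abandoned,        -- client disconnected unexpectedly
           sessions_fatal,            -- ended by a fatal error
           sessions_killed            -- ended by an operator
    FROM pg_stat_database;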

[00:09:41] The next piece of content is How To with Dave Page, episode three: install and run pgAdmin using the Python package. There's a convenient YouTube video that shows you exactly how to do that. This is from Enterprisedb.com.

[00:09:56] The next piece of content is Golden proportions in PostgreSQL. This is from Cybertec-postgresql.com, and they're talking about golden proportions, which are basically certain mathematical values with interesting properties. The particular value he mentions is 1.618, and he uses it in reference to a new office they were discussing building. The real interest of this post is all the different mathematical functions and conversions you can use within PostgreSQL, because the whole post walks through doing different calculations and computations using SQL. So if you want to learn more about the mathematical functions and computations that Postgres is capable of, maybe you want to check out this blog post.
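For instance, the golden ratio itself drops out of a single line of SQL (a trivial sketch, not code taken from the post):

    -- The golden ratio, (1 + sqrt(5)) / 2, computed directly in SQL.
    SELECT (1 + sqrt(5)) / 2 AS golden_ratio;  -- ~1.6180339887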

[00:10:41] The next piece of content is Production PostGIS vector tiles: caching. This is from Crunchydata.com. They're talking about a situation where you have, say, a public website that needs to display map tiles, and you're using pg_tileserv to dynamically pull that data to present to users of your application, which creates a huge load on pg_tileserv as well as on the database behind it. Now, one way you could handle it is to set up multiple databases in a cluster like this. But he advocates that probably the easiest thing to do is set up caching. He used Varnish in this case, but basically you cache the results of pg_tileserv to serve to your application. You could probably get by with a one-minute, five-minute, or even 30-minute cache of the data. You don't necessarily need it all to be real time, and this would still give near-real-time results from your application with this type of caching in place. So if you're interested in learning more about that, you can check out this blog post.

[00:11:47] And the last piece of content: the PostgreSQL Person of the Week is Marc Linster. If you're interested in learning more about Marc and his contributions to PostgreSQL, definitely check out this blog post.

[00:11:59] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
