background

Modeling, Query Performance, Statistics, pgmetrics | Scaling Postgres 3

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we review articles covering modeling, checking for long queries using pg_stat_statements, statistics in PosgreSQL 10 and pgmetrics.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about modeling, query performancestatistics, PG metrics, and high availability. I'm creston. Jameson. And this is scaling postgresepisode three.

[00:00:21] All right, we're going to start things off a little bit different. I'm going to prioritizecontent that is more applicable to developers first, and then follow up with the other articles thatwould be of general interest to someone looking to scale postgres. So the first article we have isdatabase modelization antipatterns. This is from the Tap Oueh.org blog.

[00:00:48] So in this article, he discusses three antipatterns that he finds when people aremodeling their application.

[00:01:00] The first is Entity attribute Values, and how you should pretty much not use thispattern. The second is having multiple values per column. So the example he uses here is youhave a Tags field that you contain in a text field with some sort of delimiter. Here's a semicolon,for example. And these are, for example, Tweets and how this is an anti pattern. And if you usenormalization, there's a better way to structure this data with a Tags table and a Tweets to Tagslinking table. And the third is looking at is actually UUIDs. So clearly he's not a fan of UUIDs. Imean, I generally tend to use sequences. However, UUIDs do become interesting when youneed something that's globally unique, like you're having data coming in from multiple, say,postgres databases, that you need to do something to ensure uniqueness. So why I chose thisarticle is that I think this is some education that would be of benefit to a lot of developersbecause really, it's the modeling that can really impact performance. And if you're looking toscale your database, it's important to get the modeling right as soon as you can.

[00:02:21] So definitely encourage you to check out this article. The next article is from theCyberTech blog and the title is Detecting Performance Problems Easily in PostgreSQL.

[00:02:33] So right from the get go, he says, improving PostgreSQL performance beyondparameter tuning. And that goes back to what I just mentioned. That a lot of the performancegains. If you're looking to scale your database, getting the modeling right, getting the indexingright, that can give you amazing performance gains, 100 fold, 1000 fold, potentially, compared totrying to tune the database better, there's a place for it. But generally, query optimization can getyou bigger wins. So in this article, he goes over how to look for slow query statements in termsof enabling PG Stat statements and gives you some queries you can look at to be able to pullout those queries that are slow in the database using some of these PG Stats statements.

[00:03:29] So, definitely an article to check out if you're unfamiliar with using PG Statstatements.

[00:03:36] The next article is from the Citize Data blog, and the title is the postgres Ten featureyou didn't know about. Create Statistics.

[00:03:45] So this goes over, of course, a new feature in Postgres Ten where historically theStatistics Planner only operated on single columns, and that's pretty much what it does bydefault. However, it has introduced two additional statistics where you can define relationshipsbetween columns. So it's not just looking at single columns, but you can define that there arerelationships that exist between these two columns to get better statistics. So it offers twodifferent ways to define these related statistics. The first way is to define that there aredependencies between two columns, and near the bottom of the article they mention some realworld implications of these types of dependencies. So for example, here's a quote havingcolumns for month, quarter and year because you want to show statistics grouped by all inreports, well, there's a relationship between month, quarter and year. Also relationships betweengeographic hierarchies, so a country estate city columns and filtering grouping by them.

[00:04:58] There are only certain cities in a state, there are only certain states in a country orcities in a country. So by creating these statistics, you give the optimizer more accuratedata could benefit from it, I definitely suggest checking out this blog post on it. The next articleis from the ops.com blog Announcing Pgmetrics so this appears to be a command line tool thatenables you to retrieve from the system tables of postgres all sorts of statistics information andexport it via JSON or just get human readable text. So as opposed to opening a databaseconnection and querying certain tables, they've designed this utility to be able to pull out thisinformation. And it covers replication status, replication slots, standby replication Status whileArchiving BG, Writer Backends, Vacuum Progress, Table Spaces, databases all the sort ofinformation to help you get a gauge on how your database is performing. So if you're looking forsome different ways to monitor your database, you might want to check out this project.

[00:06:34] So this is the Announcing blog post, but there's also Pgmetrics IO, which is a websitededicated to this project. The next article is from the several nines.com blog and the title is TopPG Clustering. Ha Solutions for Postgres so Ha means High Availability, and this just goes overa broad overview of all the High Availability options that are available in postgres. Some built in,some you need separate products for so standby databases clustered using DRBD, et cetera.So if you're considering a High Availability solution, this gives a broad overview of differentoptions that are available.

[00:07:21] The next article is from the second quadrant blog PostgreSQL meltdown Benchmarksso, with the recent vulnerabilities that have been discovered in CPU speculative execution,meltdown Inspector, there's been a lot of concerns about how will this impact performance ofservers? And they went in and did some performance testing with PostgreSQL, as well aslooking at it with different patches. So they have these graphs that define the differentperformance impacts for both online transaction processing loads and online analyticalprocessing loads. And it's not an insignificant impact in the conclusion here, but definitelysomething to keep in mind that this will have an impact on performance as you install thesepatches. Potentially, maybe they'll get better in the future.

[00:08:18] And the last article is actually a tutorial I developed@scalingposgres.com and it'sPostgreSQL Backup and Point in Time Recovery. So if you're at the getting started stages ofsetting up your backup and a point in time recovery plan, I encourage you to go ahead andcheck it out. That does it. For episode three of Scaling Postgres. You can get the links to all thecontent presented in the show. Notes be sure to head over to Scalingpostgres.com to receiveweekly notifications of these episodes. You can also subscribe via YouTube or itunes. Thanks. --

episode_image