background

High Volume Processing, pg_wal Issues, Restore with pg_receivewal | Scaling Postgres 9

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we review articles covering high volume processing, indexing, pg_wal issues, Postgres data types and restoring when using pg_receivewal.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we discuss high volume processing, indexing,PG wall issues and restoring with PG receive wall. I'm Kristen Jamison and this is ScalingPostgres, episode nine.

[00:00:21] Welcome to Scaling Postgres. An event of notes that happened this week is PostgresConf took place in Jersey City in the US.

[00:00:30] Now, as a part of that, our first content is related to that because it actually waspresented there. It's how to Index Your Database by Baron Schwartz. That occurred at PostgresConf. Now, some of the initial slides are things that were already pretty familiar with to me, butstarting on around slide 29, it started to get a really good refresher on six ways to optimize indexusage visa vis, performance related issues.

[00:01:02] So for example, he covers how not to defeat indexes in that you need to do a searchon a value in an index, not necessarily an expression, and about the left prefix rule, whereasmulticolumn indexes are sorted by column one and column two. And that queries can use theprefix of an index, but not necessarily the suffix.

[00:01:28] Related to that is that it's always best to do equalities first and then the ranges whenyou're using a multicolumn index. Another piece of advice is to exploit index only queries. Sobasically create covering indexes for important queries so that you can get index only queries.

[00:01:49] The next piece of advice is to exploit clustered indexes. However, this is notsupported in PostgreSQL, so this presentation is more generic. But the other pieces of advice upto this point have been beneficial.

[00:02:03] And again, five is consider column selectivity and order. So again, the leftmost prefixrule applies.

[00:02:12] And lastly, he mentions avoid over indexing in terms of it does increase your rightload. But the reality is the other side of it is don't fear indexes because you can probably put a lotmore indexes on a table than you think you can, particularly if you're using partial indexesstrategically, you're not covering the entire database with them. So, some pretty good pieces ofadvice. So definitely a piece of content to check out.

[00:02:43] The next article is PostgreSQL's developments for High Volumes Processing. So in itsince about PostgreSQL 9.5, he goes over all the different features that have been added up toand including what is projected to go into version eleven of postgres on features that help withhigh volume processing. And this is from the select all from Adrian blog. And in it he goes overfeatures that impact parallelism, how well it can execute queries in parallel, different accessmethods, methods of partitioning, internal improvements to the code to execute things fastermaintenance tasks, task areas, foreign data wrappers. So, a very comprehensive post abouteach of these features. So definitely a piece of content to review to ensure you're using all thefeatures that can help you scale your database.

[00:03:41] The next article is postgres as the substructure for IoT and the next wave ofcomputing. And this is from the Timescale.com blog. And again, Timescale is a extension forPostgreSQL that adds time series features to postgres. Now, this title is not very specific aboutwhat it covers, but why I wanted to take a look at this post and what I thought was interesting isit's reporting down here showing once you have large data sizes, millions of rows in a table, howinsert performance can start to degrade. And this is something that they experienced with theirparticular instance size they were using. Now, there's different ways to address this with basePostgreSQL in terms of you can start using partitions, larger instance sizes, there's some otherthings you can do. But because of this pain that they were experiencing, because they weredoing consulting in this area with regard to time series, they developed their solution where youcan get consistent performance gains up to looks like up to a billion rows here in the table. Andcomparing the performance drop off. So if you're using, or I should say collecting time seriestes and up to 14,000 times faster queries. I'm sure some of the partitioning can get you someof the way there, but I'm sure some of their enhancements have definitely increased that ormade it faster.

[00:05:40] The next article is actually a series of articles about PostgreSQL data types. So ifyou're a developer, this is something that you might find of interest. And I will include three linksto the three posts. One is on arrays, one is on range data types, and one is on networkaddresses, so IP addresses. The one I find of more interest is the arrays where he goes over aparticular use case where these could be advantageous and goes over loading data andshowing you how to set up the array data type.

[00:06:18] But he also goes over when you're needing to search in certain ways, applying a genindex to them to be able to search for certain values, as well as cases where it's beneficial touse a nest to UNNEST the array data type. So, definitely a blog post to check out and theseseries of blog posts are from the Tap Oueh.org blog. So the next series of content would be ofinterest to database administrators. The next article is PG wall is too big. What's going on? Sothis is a very interesting blog post in terms of where they went in and diagnosed a problem thatwas occurring and trying to find out the solution to it and giving you some general advice abouthow PostgreSQL works. So if you are doing management of a database, I definitely suggestchecking this out. Basically, he talks a little bit about how essentially the active log files thatPostgreSQL uses are stored in versions prior to Ten in PGx Log. And you never want to touchthose, you never want to delete them manually. Let Postgres manage those because theyactually mentioned in the post how some people thought they were just logs and they coulddelete them.

[00:07:35] And that may be part of the reason why in version ten they changed the name fromPGx Log to PG Wall. So it's clear that these are the right ahead logs. And basically it talks aboutdifferent ways to address if your disk space is filling on, how you can grab some free space, andhow you can diagnose different or resolve different issues that you encounter. And there's evenan interesting comment here about another issue that someone ran into. So definitely if you're aDBA, this is very good post to check out.

[00:08:10] The next article is a guide to PG Pool for PostgreSQL part one. And this is from theSeveral nines.com blog. And as the name implies, PG Pool is a connection pooler for postgres,but it also does a lot more. So for example, they list terminator replication load balancing forread, scalability, high Availability.

[00:08:32] So PG Pool is one of these utilities from my understanding. I've never set up PG Pool,but it's kind of has a lot of different features under it, whereas I've tended to use PG Bouncerbecause it's a focused connection pooler. But if you're interested in looking at PG Pool, theydiscuss how to set it up and how to get it working. So definitely a blog post to check out if you'reinvestigating using PG Pool.

[00:08:58] The next article is an expert's guide to Slony replication for PostgreSQL. So Slony isanother way to do replication. It's a third party replication system. It's not part of the PostgreSQLcommunity. I tend to use the built in the community's PostgreSQL replication, but this is anotheroption that has advantages in some use cases. So if you're interested in checking out, you canreview this blog post.

[00:09:28] The next article is PostgreSQL restore when using Pgraceive Wall. So I did a tutoriala week or two ago about setting up Pgraceive Wall and that basically streams wall files from say,your primary database so that you don't have to worry about archiving them on the master andfinding ways to get those files off the primary database. You just use a replication slot andstream them to another system to aggregate them there and then send them off to long termstorage, say in S Three or whatever you would like to do. But with using PG Receive Wall, youhave to take into account some special situations like it creates partial files. And how do youdeal with that when doing a restore, particularly when you want to get it up to a particular point inthis is something you're interested in, be sure to check it out.

[00:10:36] That does it. For this episode of Scaling Postgres, you can get the links to all thecontent presented in the show Notes. Be sure to head over to Scalingpostgres.com, where youcan sign up to receive weekly notifications of each episode, or you can subscribe via YouTube oritunes. Thanks. --

episode_image