Postgres 14 RC1, Little Things, Index Advisor, Million Inserts | Scaling Postgres 184

In this episode of Scaling Postgres, we discuss Postgres 14 RC1, little improvements in PG14, an index advisor and how to handle 1 million insert statements.

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PostgreSQL 14 RC 1 Released!". This is from postgresql.org. So this is the first release candidate of Postgres 14, the release prior to the finalized version that is expected on September 30, 2021. You can tell there are not a lot of changes since the third beta. However, one item of note here is that quote "The feature where autovacuum analyzes partition tables is reverted and will not be available in PostgreSQL 14". So I wasn't really familiar with this and looked into it a little bit. Basically, the analyze portion doesn't get run on partitioned tables by autovacuum, and apparently this patch was an attempt to start doing that, but unfortunately, it didn't make it in. So if you want to learn more about the release candidate, you can check out this content.
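As a quick illustration of that gap, the partitioned parent still needs a manual or scheduled ANALYZE in Postgres 14; the table name here is hypothetical:

```sql
-- Autovacuum will vacuum and analyze the individual partitions, but it does
-- not analyze the partitioned parent itself, so statistics for the parent
-- have to be gathered manually.
ANALYZE measurements;
```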

The next piece of content- "Postgres 14: It's The Little Things". This is from blog.crunchydata.com and he's talking about a few little features that he definitely will appreciate with Postgres 14. The first one is the JSON syntax where you can now use JSON subscripting. So much like you can with hashes in other languages, you can use the brackets with the quotes or double quotes to identify a piece of nested JSON that you want. I definitely appreciate this syntax better than the standard syntax up here. The next area he mentions is read-only roles. So it's quite easy to add someone to some predefined roles. The one he's talking about is pg_read_all_data, and then there's also the pg_database_owner. So those are great additions to have. He also mentions a few enhancements to psql, the command line client for Postgres. Then he also mentions the enhancement of query pipelining.

Then he also mentions the enhancement of query pipelining. This is a way to communicate with Postgres in an asynchronous fashion. Basically, you can send it a query and then send another query before you receive the results back. So you can queue up multiple queries to send to the database and it can execute them and send the results back asynchronously. They're calling this pipeline mode. You can click on the link and it talks more about pipeline mode in Postgres, but mostly it has to do with the client being able to handle it, specifically libpq, which is what a lot of adapters for application frameworks use. But of course, the application framework needs to support it as well. So if I wanted to use this in Ruby on Rails, I would be using the pg gem, and that would have to have support for this, and the Ruby on Rails framework would have to utilize it as well. But it definitely is an interesting performance improvement that I hope people are able to leverage.

Next piece of content- "A better way to index your Postgres database: pganalyze Index Advisor". This is from pganalyze.com and they're talking about a way to determine what indexes should be placed on a table. They were thinking about different ways of doing this. One way they considered is machine learning to do this. They actually tried doing this using GitHub's CoPilot, which is an AI-based helper for developers and they really weren't able to get anything effective out of it. So they asked 'How else can we do it?'. They thought about using actually the Postgres planner to do it and they had the question, what if we use the Postgres planner to tell us which index it would like to see based upon a given query? So basically ask the question, quote, "...'what's the perfect index for this query?'".

So basically you present a query to it and let Postgres say, well, this would be the ideal index for it, and then that would be the index you would probably want to create. Based on that logic, they created the pganalyze Index Advisor. This is a tool within the pganalyze product, but they also made a standalone version that's entirely free, and you can just click on this link to get to it. Basically, you paste your schema into the top field and your query into the bottom field, and it gives you index recommendations. Now, I tried this out and it does work for relatively simple indexes, and they say it only works for B-tree indexes at this point. But when I tried some operators, like greater-than and less-than comparisons on a date, I really didn't see it respond to those. So this is probably still early days, but it's an interesting concept. So you may want to check out this post and this new tool.
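To make the idea concrete, here is a hypothetical illustration of the kind of input and recommendation involved; this is not actual pganalyze output, just a sketch:

```sql
-- Schema you might paste into the top field (hypothetical).
CREATE TABLE orders (
    id          bigint PRIMARY KEY,
    customer_id bigint,
    status      text,
    created_at  timestamptz
);

-- Query you might paste into the bottom field.
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';

-- The kind of B-tree index a planner-driven advisor would aim to recommend.
CREATE INDEX ON orders (customer_id, status);
```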

The next piece of content- "When you have millions of insert statements in a file...". This is from dev.to in the Yugabyte section. He's talking about the situation where you have a file that just contains multiple insert statements in a row and how to run, say, a million of those and load up a table. Now, of course, the post mentions this is probably a very inefficient way to do it. A more efficient way to do it would be to use a COPY command and just have the raw data. The other scenario that might work well is to batch multiple rows per insert statement. So that way you get more data inserted per insert statement. But assuming that this is just how you have the data, how could you make this execute quickly? Well, the problem is, because of Postgres's autocommit, it actually does a commit after each statement is run. Running this actually took 21 minutes. However, when he did an explicit transaction, begin transaction, and then ran through that process, it ran in a little over a minute. So just avoiding all that transaction overhead drastically improved its performance. So that's just something to keep in mind. Then the latter part of the post does talk about YugabyteDB, but the Postgres-related portion is up near the top of the post. But if you want to learn more about that, you can check out this post.

The next piece of content- "Restarting a sequence: How hard could it be? (PostgreSQL and Oracle)". This is from fluca1978.github.io and he's talking about sequences in Postgres. So this is the way that you can get auto-incrementing IDs for your primary keys. Generally, it creates an integer, either an integer or a bigint, and then applies a sequence to it. It's the sequence that gives it what the next value should be. Now you can create a sequence by just doing CREATE SEQUENCE, giving it a name, and incrementing it by whatever value you want, typically one, and determining where you want to start it at and that gives you a sequence. Then he runs through and does a next value for this sequence that he created up to 100 times.

You can now see that if you look at the current value, it is at 100. Now, in terms of resetting it, you can do SELECT setval with the sequence and the number you want to set it to. However, I prefer the ALTER SEQUENCE command, just because everything else is ALTER TABLE, ALTER INDEX, et cetera. Give it a name and then you can say RESTART. By default, it will restart at the beginning, which would be one, which is how it was generally defined here with START WITH 1, or you can tell it to restart at a particular value by saying RESTART WITH a given number.
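Continuing the hypothetical sequence from above, either form of reset looks like this:

```sql
-- Reset using the setval() function.
SELECT setval('my_seq', 1);

-- Or with ALTER SEQUENCE.
ALTER SEQUENCE my_seq RESTART;           -- back to its START WITH value
ALTER SEQUENCE my_seq RESTART WITH 50;   -- or restart at a specific number
```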

So it's pretty simple. But he says the thing to keep in mind is that when you're checking the value, SELECT currval will just tell you what the current value is, not what the next one is going to be. For that, you have to do SELECT nextval and give it the sequence. Then he goes into comparisons with Oracle. What is interesting is that it wasn't until Oracle 18 that their ALTER SEQUENCE could actually restart a sequence. So I wasn't familiar with that. But if you want to learn more about sequences, you can check out this blog post.
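A tiny sketch of that distinction, again with the hypothetical sequence:

```sql
SELECT nextval('my_seq');  -- returns the next value and advances the sequence
SELECT currval('my_seq');  -- returns the value most recently handed out by
                           -- nextval() in this session, without advancing it
```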

Next piece of content- "BUILDING AN ORACLE TO POSTGRESQL MIGRATOR: LESSONS LEARNED". This is from cybertec-postgresql.com. They're talking about a tool that they developed, an Oracle to Postgres migrator. These are some of the best practices this tool has taken into account. The first one is that transaction length matters, particularly for Oracle when you're trying to transfer terabytes of data. Doing a change of data capture to do an initial copy of the data and then keep it up to date, and getting ready to do a switch over to using Postgres. With Oracle, you can sometimes get snapshots to old error messages or can't serialize access for this transaction. So there are some considerations you need to take into account on the Oracle side when you do this, and apparently, the tool does support that.

He then talks about how the Oracle catalog is very slow compared to Postgres, and they had to do some things like having the GUI avoid reading from the Oracle catalog, or even caching the data that's in the Oracle catalog, which seems a bit surprising to me. The next point is that it's easy to have invalid objects coming in, so you just need to be prepared to handle those when you're doing a migration, and also be aware of null bytes and broken table content that could possibly happen. Then the post follows up and talks about ways to efficiently load data into Postgres, because you're writing to the WAL as well as the data files themselves.

If you're doing the data transfer using a tool such as COPY, you would probably want to do a COPY FREEZE and also be aware of hint bits, to try to minimize the number of writes that have to happen to transfer data into Postgres; a sketch of that follows. Then they give some final advice about making sure you get your data types mapped correctly when transferring data. As they say, quote "A migration is a chance to do a cleanup", and generally you're only going to get that chance when you're migrating from one database to another. But an interesting blog post, and if you plan on moving from Oracle to Postgres, maybe you would like to check it out.
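As a rough sketch of that loading pattern (table and file path are hypothetical):

```sql
-- COPY ... FREEZE only works if the table was created or truncated in the
-- same transaction; rows are loaded already frozen, which avoids later
-- freezing work and some of the hint-bit writes.
BEGIN;
TRUNCATE target_table;
COPY target_table FROM '/tmp/orders.csv' WITH (FORMAT csv, FREEZE);
COMMIT;
```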

The next piece of content- "Pgpool-II Configuration Parameters- reserved_connections". This is from b-peng.blogspot.com and this talks about Pgpool II and reserved_connections, where you can set a setting that will actually send the message "Sorry, too many clients already" as opposed to blocking connections. So if you're interested in that, you can check out this blog post.

A similar post is "Pgpool-II Configuration Parameters - enable_shared_relcache". This is from b-peng.blogspot.com and it talks about the relation cache that exists in Pgpool II and how it works, so you can check that out as well.

The next piece of content: the PostgreSQL person of the week is Michael Goldberg. If you want to learn more about Michael and his contributions to Postgres, you can check out this blog post.

The last piece of content: we had another episode of The Rubber Duck Dev Show this past week, which was on "Background Job Processing". So if you're looking for two developers discussing how to handle background jobs, you can check out this episode. This coming week, we're going to be covering "Working With UUIDs".
