
UUID Use Cases, pg_settings, Automated Restore, Parallel Future | Scaling Postgres 102


In this episode of Scaling Postgres, we discuss the use cases for UUIDs, using pg_settings, setting up an automated restore and the future of parallelism.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about UUID use cases, pg_settings, automated restore, and the parallel future. I'm Creston Jameson and this is Scaling Postgres, episode 102.

[00:00:22] Alright, I hope you're having a great week. Our first piece of content is "Why I'm not a fan of the UUID data type", and this is from depesz.com, where he lays out the reasons he's not a big fan of the UUID data type. Now, this is a controversial topic, as you can tell by the number of comments on this post, but I still found it particularly interesting. The first problem, he says, is that UUIDs are completely opaque. You get these long sequences of letters and numbers, whereas something like an integer is much easier to reason about, to remember if you're trying to reference it somewhere, and it typically occurs in order. The second problem: you can't really sort them very well. Now, there are ways around this. He mentions version 1 UUIDs as a possibility, and in the comments there's a link to an article we covered in a previous episode of Scaling Postgres that enables you to have partially sequential UUIDs. The third problem: it's a pretty wide value, definitely larger than most integers you would be using. He then ran a small performance test comparing the speed of a process using integers versus UUIDs, and the UUIDs were more than twice as slow, although he does note that with a more complex table the difference would be smaller. A number of comments go back and forth on this issue, including discussion of that sequential UUID generator.

Now, I would say I have to agree with him, in the sense that I tend to use integers as primary keys and only reach for UUIDs in specific use cases. The first use case is a data set that spans multiple databases or servers; then I could see needing a UUID. For example, some shared-nothing database architectures use UUIDs because IDs are literally being generated on separate machines. Another use case is when a client is generating a particular ID, so the actual record gets created outside the database; having the client generate a UUID that you then store in the database makes sense. Similarly, for separate database systems, someone in the comments mentioned having both an on-premises solution and a hosted solution, and using UUIDs so that someone could easily move back and forth between the two.

Related to this is another article that isn't PostgreSQL-specific, but it covers natural versus surrogate primary keys in a distributed SQL database, from the Distributed SQL blog. This one is specifically about YugabyteDB, but the point they make is that you would want a UUID as a surrogate primary key when you essentially have multiple databases, which in this case, I believe, synchronize with each other. So in the context of the UUID discussion, I thought this article was relevant this week even though it isn't specifically about Postgres. And lastly, another article is "Generate a UUID as PostgreSQL default value". This is a relatively brief post that shows how to set up UUIDs as a default value in your tables, using either the pgcrypto extension or the uuid-ossp extension. The latter is the one I use, with uuid_generate_v4() as the default value when creating tables that need it. So if you're interested in this UUID content, definitely check out one of these three blog posts.
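As a minimal sketch of what that last post describes (the table and column names here are illustrative, not from the article):

    CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

    CREATE TABLE accounts (
        id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
        name text NOT NULL
    );

    -- Alternatively, pgcrypto provides gen_random_uuid():
    -- CREATE EXTENSION IF NOT EXISTS pgcrypto;
    -- ... id uuid PRIMARY KEY DEFAULT gen_random_uuid()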

[00:04:28] The next article is "Take advantage of pg_settings when dealing with your configuration", and this is from Luca Ferrari at fluca1978.github.io. He's talking about when someone is trying to diagnose a problem and asks "what's your configuration?", and you just grep, say, the postgresql.conf file to report it. The problem is that that's only the static, on-disk configuration. He lists all the downsides of this approach: it omits the postgresql.auto.conf file, it includes commented lines, and it doesn't reflect the running configuration. So how do you get the real configuration? He says you use the pg_settings system view. You can query it however you want, and it shows you each setting, what its current value is, and whether a change is pending a restart, meaning a restart is needed for it to take effect. So this post is just a quick reminder of the existence of pg_settings, which you can use in addition to SHOW commands to see what your current running PostgreSQL settings are.
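For example, a query along these lines shows the live values and flags anything awaiting a restart:

    -- Settings that differ from the built-in defaults,
    -- and whether a change is waiting on a restart
    SELECT name, setting, source, pending_restart
    FROM pg_settings
    WHERE source NOT IN ('default', 'override');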

[00:05:40] The next post is "Automated testing of PostgreSQL backups". Basically, this is doing automated restores, and the blog post runs through two different scenarios. Say you're using pg_basebackup: they go through the scripts needed to create a directory, unpack the backup into that directory, and start the database, although if you're starting it on an existing system you'll probably want to adjust the port and some additional settings. Then you run a query to test that the restore was successful and has the data you expected, such as a table count or the last updated_at, something of that nature to validate that it was a successful restore. And they have a full script here to do that. Then they have a second version using pg_dump. If you use pg_dump, it goes through the process of using either psql or pg_restore to reload the files, and the whole process to go through. They also note that you should watch out for triggers: some triggers, particularly ones that contact another service or another database system, you'll want to disable while the data is being loaded. So that's definitely something to keep in mind with pg_dump and pg_restore. And lastly, they cover point-in-time recovery testing and how you could set that up. So this is definitely a critical thing to do to verify that your backups are working and successful, and here's a great post that covers how you could do it. This is from the pgdash.io blog.
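The validation step might look something like this, a minimal sketch with a hypothetical table name, run against the freshly restored database:

    -- Hypothetical post-restore sanity checks
    SELECT count(*) FROM orders;         -- does the row count match expectations?
    SELECT max(updated_at) FROM orders;  -- is the most recent data present?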

[00:07:17] The next post is "Parallelism, what next?", and this is from amitkapila16.blogspot.com. In the initial set of paragraphs, he covers what's been developed in terms of parallelism in Postgres from 9.6 through 12. Then he talks about what is in progress for 13. The first item is parallel vacuum, being able to vacuum indexes in parallel, which could be a huge benefit if you have a lot of indexes on your table, because in my experience it's the indexes that tend to take the longest time during a vacuum.
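As a sketch of what that feature looks like, using the option syntax that ultimately shipped in Postgres 13 (table name illustrative):

    -- Vacuum a table, processing its indexes with up to 4 parallel workers
    VACUUM (PARALLEL 4, VERBOSE) my_table;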

[00:07:55] Second is improving EXPLAIN's handling of per-worker details, which can be very beneficial when trying to understand what's going on in explain plans. And then avoiding unnecessary shared memory writes in parallel hash join, which could be a good performance benefit in certain cases. In terms of what's being discussed for the future: the first is parallel grouping sets. The second is parallel COPY, being able to load data via COPY in parallel, which could be advantageous. Then there's a parallel file foreign data wrapper, probably similar to the work being done for COPY, to have it operate in parallel. And further out, they're talking about doing more parallel work with things like GIN and GiST indexes, because right now it's primarily done for B-tree indexes. And of course there are additional comments that you may want to review as well. So if you're interested in what's in store for parallelism in the future, definitely a blog post to check out.
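For context, the per-worker details are the lines EXPLAIN prints under each parallel plan node; a quick way to see them on any sufficiently large table (name illustrative) is:

    -- VERBOSE adds per-worker timing and row counts under parallel nodes
    EXPLAIN (ANALYZE, VERBOSE) SELECT count(*) FROM big_table;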

[00:08:54] The next post is "Shared Buffers: looking into the PostgreSQL I/O cache", and this is from cybertec-postgresql.com. It discusses how you can get insight into what the shared memory cache is storing. He creates a simple test database and uses the pg_buffercache extension to get that information, then shows how you can inspect per-database caching as well as inspect your current database. So if you have a desire to investigate what your cache is storing, definitely a blog post to check out (a sketch of that kind of query follows at the end of this segment).

The next post is "Which tables should be auto vacuumed or auto analyzed - UPDATE". This is an update to a previous post where he basically predicted which table is next scheduled to be vacuumed or analyzed. But that version didn't work per table; it only consulted the system-wide values set in, say, the postgresql.conf file. This one takes table-level settings into account as well, so if you've altered a particular table and adjusted its vacuum settings, the script now consults those too. So if you're interested in this tool, definitely check out this blog post from depesz.com.
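Returning to the shared buffers post, a representative pg_buffercache query in the same spirit as its examples might look like this:

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- Which relations in the current database occupy the most shared buffers?
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;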

[00:10:16] Also from depesz.com is "Waiting for PostgreSQL 13 - Add %x to default PROMPT1 and PROMPT2 in psql". What they're referring to is that the default psql prompt now indicates transaction state. For example, in the current version of Postgres, when you do a BEGIN, then a SELECT that errors, then another SELECT, there's no additional notification on the prompt; it's just the usual equals-pound (=#) prompt. In the new version, the prompt carries extra markers: an asterisk means you're inside a transaction, and an exclamation point means you're inside a transaction that has failed but hasn't been rolled back yet. So these are some additional prompt indicators as you're working with transactions that could be beneficial. If you're interested in learning more, go ahead and check out this post.
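A quick illustration of what those prompts look like in a psql session, with the behavior as it landed in Postgres 13:

    postgres=# BEGIN;
    postgres=*# SELECT 1/0;   -- the error puts the transaction in a failed state
    postgres=!# ROLLBACK;
    postgres=#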

[00:11:12] The next article is "Have an eye on locks of PostgreSQL", and this is from highgo.ca. It's a post that talks all about locks. The first thing they mention is that you can look at what locks exist using the pg_locks system view. Then they go into table-level locks and all the different lock modes that exist for them, then row-level locks and how they can be shared, covering transaction locks, page locks, and advisory locks. So if you're interested in learning more about locks in PostgreSQL, definitely a blog post to check out.
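As a minimal example of peeking at that view:

    -- What locks are currently held or awaited, and by which backends?
    SELECT locktype, relation::regclass, mode, granted, pid
    FROM pg_locks;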

[00:11:50] The last piece of content is actually a tool. Now, I don't normally mention tools, but I thought this was interesting. It's called pg_flame, and it creates flame graphs of your EXPLAIN ANALYZE output. It looks like it runs on a Mac, or you can use it with Docker as well. But basically you take your EXPLAIN ANALYZE output and it generates a flame graph for you. So if you want a more visual way of looking at your EXPLAIN ANALYZE output, definitely a tool to check out.
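The tool consumes JSON-format plans, so the input it expects is produced with something like this (the query itself is illustrative):

    EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM orders WHERE total > 100;

That output is then piped into pg_flame, which renders the HTML flame graph.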

[00:12:20] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you can subscribe via YouTube or iTunes. Thanks.
