
Foreign Key Indexes, Graph Queries, Linux Huge Pages, Text Column Size | Scaling Postgres 192


In this episode of Scaling Postgres, we discuss whether foreign keys should have indexes, how to run graph queries, how to configure Linux huge pages and what the benefits are, as well as how text column sizes compare.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Should I Create an Index on Foreign Keys in PostgreSQL?". This is from percona.com. The common advice, and I've given it myself, is that you generally want indexes on your foreign keys, since these are references to other tables that are frequently used when querying. However, those indexes won't always be used. They present an example here where you have a supplier table and a product table, and most of the time you're going to be searching for products and just wanting to know who the supplier is. For that type of query, you don't need an index on the supplier ID. You would only need it if you are looking up products from a given supplier. Now clearly that would happen, but probably not as frequently. So do you need an index on it? One of the arguments they make is that it may be more efficient just to do a sequential scan of the table.
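To make the scenario concrete, here's a minimal sketch of the kind of schema they describe (the table and column names here are my own, not taken from the post):

```sql
-- Hypothetical supplier/product schema for the scenario discussed above.
CREATE TABLE supplier (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE product (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    supplier_id bigint NOT NULL REFERENCES supplier (id),
    name        text NOT NULL
);

-- The common lookup joins on the supplier's primary key, so it never
-- touches an index on product.supplier_id:
SELECT p.name, s.name AS supplier_name
FROM product p
JOIN supplier s ON s.id = p.supplier_id
WHERE p.id = 42;

-- Only a query that filters products *by* supplier would benefit from one:
-- CREATE INDEX ON product (supplier_id);
SELECT p.name
FROM product p
WHERE p.supplier_id = 7;
```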

But of course, they also discuss the flip side, where having an index on a foreign key is pretty much essential. When you have a customer and you have orders, a lot of times you're just going to be looking up a customer's orders, and in that case the foreign key index is essential for those types of queries. So this post basically talks about how it's not always essential to have indexes on foreign keys. Although, I would still probably add them as a first pass, unless I'm sure it's a case like the one above where the index isn't going to be used much, because you can always add the index later. Even if you've added one, Postgres has statistics views that let you examine how frequently, or whether, that index is being used. So it's a good practice to use those views and remove any indexes that just aren't being used. But if you want to learn more, you can check out this blog post.
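The post doesn't show it, but a quick check for unused indexes using those statistics views could look something like this (just be careful not to drop indexes that back primary key or unique constraints):

```sql
-- Indexes that have never been scanned since statistics were last reset,
-- together with the space they occupy; candidates for review and removal.
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```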

The next piece of content- "Webinar: Graph Queries with PostgreSQL". This is on the EDB YouTube channel and this is a webinar about an hour in length. He explains how you do graph queries with PostgreSQL. Basically, graph queries are where you have separate nodes that are linked, or have relationships, to one another. An example I typically think of is a person who has relationships with other people, maybe friend relationships or family relationships; each of those is a link between two nodes. Or you could have a graph query that looks at different HTML pages and how they link to one another. The way you do these types of queries in PostgreSQL is with recursive queries, so you use the WITH RECURSIVE keyword to recursively search through those trees, essentially. This webinar goes through how you can write queries that run efficiently and use indexes, how to avoid circular references, which are possible in a graph, as well as how to handle multiple paths to the same node. So if you're interested in that you can check out this YouTube video.
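As a rough illustration of the WITH RECURSIVE approach (the table, columns, and starting node here are my own invention, not taken from the webinar):

```sql
-- Hypothetical friendship graph: which people can person 1 reach, and how?
CREATE TABLE friendship (
    person_id bigint NOT NULL,
    friend_id bigint NOT NULL,
    PRIMARY KEY (person_id, friend_id)
);

WITH RECURSIVE reachable AS (
    -- Start with the direct friends of person 1.
    SELECT f.friend_id,
           ARRAY[f.person_id, f.friend_id] AS path
    FROM friendship f
    WHERE f.person_id = 1
  UNION ALL
    -- Follow each friend's friends, carrying the path along.
    SELECT f.friend_id,
           r.path || f.friend_id
    FROM reachable r
    JOIN friendship f ON f.person_id = r.friend_id
    WHERE f.friend_id <> ALL (r.path)   -- skip nodes already on the path to avoid cycles
)
SELECT friend_id, path
FROM reachable;
```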

The next piece of content- "Improving PostgreSQL performance without making changes to PostgreSQL". This is also from enterprisedb.com and this post is basically all about Linux huge pages. Now, we've covered some other presentations about this, but as your database scales it becomes more and more important to look at and configure. It's basically an operating system-level configuration rather than a PostgreSQL configuration, although you do have to make an adjustment on the Postgres side as well. This post discusses why it matters as well as the magnitude of the changes. They say that when they were working with one client, after configuring Linux huge pages, the CPU usage of one database system went from 51% down to 15% with essentially the exact same load. So that's definitely significant.

They give you some queries here that you can use to check the size of the page table and see whether you could potentially benefit from this. So they discuss how to determine whether this makes sense for you. Then, once you're ready, make the change on a test system and test the results. You can definitely see some TPS differences based on how you configure Linux huge pages. I really like the advice at the bottom of this post. First, set up a test server, make sure neither transparent huge pages nor huge pages are enabled, and run a test load. Then repeat with 2MB huge pages and record the results, then repeat with 1GB huge pages and record those results as well. Determine what's best for you and then choose what you want to put into production. So this is a great post about Linux huge pages that I definitely encourage you to check out if you're running a larger database.
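On the PostgreSQL side, the relevant setting is huge_pages; a minimal sketch of checking and enabling it might look like this (the operating system still has to reserve huge pages separately, for example via vm.nr_hugepages, which is the OS-level piece the post focuses on):

```sql
-- Check the current setting; 'try' is the default, meaning Postgres uses
-- huge pages only if the OS has enough reserved, and silently falls back otherwise.
SHOW huge_pages;

-- Request huge pages explicitly so startup fails loudly if the OS reservation
-- is missing or too small; this requires a server restart to take effect.
ALTER SYSTEM SET huge_pages = 'on';
```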

The next piece of content- "Does varchar(n) use less disk space than varchar() or text?". This is from depesz.com. I personally have always heard that there's essentially no difference between these. But he actually set up tables and tested all sorts of different string lengths to examine whether there's a difference. He has the table right here, and basically, for anything greater than a string length of 100, the results are exactly identical across all the column types: a text column, a varchar, and a varchar with a specified number of characters. The only two cases that differed were at string lengths of ten and 100, and only for a varchar with a specific length, which he narrowed down to what he believes is the presence or absence of a TOAST table. So basically it pretty much looks to me like all of these are identical and it doesn't really matter which you choose in terms of disk space utilization, but if you want to learn more, you can check out this content.
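If you want to reproduce a simplified version of this test yourself, something along these lines would do it (my own table names and sizes, not the exact setup from the post):

```sql
-- Load identical data into text, varchar, and varchar(n) columns,
-- then compare the on-disk sizes of the three tables.
CREATE TABLE t_text    (v text);
CREATE TABLE t_varchar (v varchar);
CREATE TABLE t_limited (v varchar(500));

INSERT INTO t_text    SELECT repeat('x', 500) FROM generate_series(1, 100000);
INSERT INTO t_varchar SELECT repeat('x', 500) FROM generate_series(1, 100000);
INSERT INTO t_limited SELECT repeat('x', 500) FROM generate_series(1, 100000);

SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('t_text', 't_varchar', 't_limited');
```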

The next piece of content- "Using PostgreSQL Views in Django". This is from pganalyze.com and they're talking about using views, which are basically virtual tables defined by a query. You can create a view that pulls from multiple tables, and as they say here, it's a way to cut down on complexity. So as opposed to having to join the same tables over and over again and rewrite a long query, you can write that query once and save it as a view using the CREATE VIEW command. Then you can simply select from this view to get the information of interest. These are virtual tables, but you can also materialize them, which is what a materialized view is: you're actually creating a separate table and putting the data into it. A standard view stores no data and always reads directly from the underlying tables, but a materialized view actually creates that table.

Now the advantage of that is that you can put indexes on it to get potentially better performance, and they talk about how to create one. The other thing you need to keep in mind is that it's a static representation of a point in time, so if you want up-to-date data, you're going to have to refresh that materialized view. They show the command to do that, and ideally you're always going to want to refresh it concurrently. Then the latter part of the post shows you how to set up and use these views in Django because, in terms of Django, they're essentially just another model that you can define. I've personally done the same thing with Rails: I've set up views and created models that point to those views, and you can use them for reports, for example. But if you want to learn more about how to do that, you can definitely check out this blog post.
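As a small sketch of the commands involved (the view, table, and column names here are hypothetical, not from the post):

```sql
-- A reporting view that hides a repetitive join behind a single name.
CREATE VIEW customer_order_totals AS
SELECT c.id AS customer_id,
       c.name,
       count(o.id)        AS order_count,
       sum(o.total_cents) AS total_spent_cents
FROM customer c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

-- The same query materialized into a real table that can carry indexes.
CREATE MATERIALIZED VIEW customer_order_totals_mat AS
SELECT * FROM customer_order_totals;

-- A unique index is required for CONCURRENTLY and speeds up lookups too.
CREATE UNIQUE INDEX ON customer_order_totals_mat (customer_id);

-- Refresh without blocking readers; the data is a snapshot until you do this.
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_totals_mat;
```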

The next piece of content- "PG Phriday: Isolating Postgres with repmgr". This is from enterprisedb.com and they're talking about high availability with regard to repmgr, which is essentially a replication manager. Specifically, this talks a lot about consensus as well as fencing and how repmgr handles those cases. So if you're interested in that you can check out this blog post.

The next piece of content- "POSTGRESQL ON WSL2 FOR WINDOWS: INSTALL AND SETUP". This is from cybertec-postgresql.com. WSL2 stands for the Windows Subsystem for Linux. Basically, it's kind of like having virtual machine infrastructure within Windows, so if you set up WSL2, you can essentially install a Linux distribution within Windows, presumably with better performance than you would get from, say, a VM solution. This post talks about installing PostgreSQL in that type of environment. Now, they reference a Stack Overflow survey here showing that a lot of people use Windows for their development environment.

Frankly, a lot of what I see is Mac, so I'm a little surprised Mac is only 25%. I'm also a little surprised that Linux is as high as 25%, while the Windows Subsystem for Linux sits at about 3%. But there are some users out there. I know personally I've used Linux on the desktop for years. I used to predominantly use Windows, but about a decade ago, when I started Rails development, I pretty much switched over to Linux. But this post shows you how to get WSL2 installed on Windows, then how to get an Ubuntu distribution installed on it, and from that point how to install PostgreSQL and connect to it successfully. So if you're interested in that, you can check out this blog post.

The next piece of content- "Waiting for PostgreSQL 15 - Allow publishing the tables of schema". This is from depesz.com. Basically, they've made an enhancement to logical replication where you can now also define a schema that you want to publish. Before, you could publish individual tables or the whole database; now you can also do it at the level of a schema. So if you're interested in that feature, you can check out this blog post.
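For reference, the syntax as it ends up in PostgreSQL 15 looks roughly like this (the exact keywords shifted a bit during development, so check the release documentation; the publication, schema, and table names are hypothetical):

```sql
-- Publish every table in one schema; tables created in that schema later
-- are picked up automatically.
CREATE PUBLICATION sales_pub FOR TABLES IN SCHEMA sales;

-- Schemas and individual tables can also be combined in one publication.
CREATE PUBLICATION mixed_pub FOR TABLES IN SCHEMA sales, TABLE public.audit_log;
```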

The next piece of content- "PostgreSQL 15 will include some more regexp functions". This is from migops.com and they point out that Postgres 14 and prior already provide a fair amount of regular expression support through a number of functions. But this post discusses how Oracle provides even more. So this individual or organization put in a fair amount of work to add more regular expression functions to Postgres that should land in version 15. So if you're interested in learning more about that, you can check out this blog post.
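For a taste of what's coming, these are a few of the Oracle-style functions slated for PostgreSQL 15, shown in their simplest forms (exact behavior and optional arguments are as described in the eventual release documentation):

```sql
SELECT regexp_count('a1b2c3', '\d');            -- 3: number of matches
SELECT regexp_like('abc123', '^[a-z]+\d+$');    -- true: does the whole string match?
SELECT regexp_instr('abc123', '\d+');           -- 4: position of the first match
SELECT regexp_substr('abc123def', '\d+');       -- '123': the first match itself
```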

The next piece of content- "Easily Running Babelfish for PostgreSQL on Kubernetes". This is from stackgres.io and they're talking about Babelfish, which is the open-source project that allows PostgreSQL to act like a Microsoft SQL Server, essentially a wire-compatible version. Since the source code has recently been open-sourced, they've used their solution, their Stackgres operator for Kubernetes, I believe, to make it easy to run Babelfish. Now, they do provide a hosted solution, but it appears there is an open-source version as well that you could try out. So if you want to try Babelfish without having to compile Postgres from source code, maybe you would like to try out this alternative.

The next piece of content, the PostgreSQL person of the week is Louise Grandjonc. If you're interested in learning more about Louise and her contributions to Postgres, definitely check out this blog post.

The last piece of content, we did have another episode of The Rubber Duck Dev Show this past week. This one was on "What is DevOps?". So if you're interested in a long-form discussion about DevOps and what it is, you can feel free to check out our show.
