
Useful Features, Community Blogging, PG 15 Anticipation, Generic Plans | Scaling Postgres 233


In this episode of Scaling Postgres, we discuss little known useful features, a monthly community blogging initiative, PG 15 Anticipation, and getting generic plans.

Content Discussed

YouTube Video

Podcast Audio

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "6 Simple and Useful PostgreSQL Features that I wish I knew when I started". This is from it.badykov.com and he's running through a set of, well, a little bit more than six features that he likes using in Postgres. The first one is identity. Historically, when you wanted to create an auto-incrementing integer, you would use serial, which creates a sequence that auto-increments and uses it to populate, typically, a table's ID column, its primary key. But the SQL standard is actually to use identity, and you can create the column in a SQL-standardized way using "generated always as identity". It does the same thing in the background, creating a sequence, but it gives you an auto-incrementing integer. Now, this has some benefits compared to serial.
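To make the comparison concrete, here's a minimal sketch of the two approaches; the table and column names are just placeholders, not from the original post.

```sql
-- Old approach: serial creates and owns a sequence behind the scenes
CREATE TABLE items_old (
    id serial PRIMARY KEY,
    name text
);

-- SQL-standard approach (Postgres 10+): identity column
CREATE TABLE items (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text
);
```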

Number one, it is part of the SQL standard. Number two, with serial, you can actually insert values outside of what the sequence would choose. Whereas if you try to do that with an identity column, it actually generates an error, saying basically that you can't insert your own IDs because the column is always generated. However, he does show an insert that allows you to override what the system is doing and insert your own values. The other advantage he mentions is that it's easier to grant privileges for someone to insert into a table: when you're working with serial, you also have to grant usage on the sequence, whereas you don't have to with identity. The next one he covers is COALESCE and NULLIF. COALESCE is a function that basically returns the first value that is not null. So in the sequence of values here, it returns the first non-null value, which is one.
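Continuing with the hypothetical items table from above, this is roughly what that error and the override look like; the exact error text may differ slightly by version.

```sql
-- Inserting an explicit id into a GENERATED ALWAYS column fails
INSERT INTO items (id, name) VALUES (42, 'widget');
-- ERROR:  cannot insert a non-DEFAULT value into column "id"

-- The kind of override the post shows, if you really need your own value
INSERT INTO items (id, name) OVERRIDING SYSTEM VALUE VALUES (42, 'widget');
```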

That's what the COALESCE function does. NULLIF returns null if the two values you pass in are equal; otherwise, it returns the first value. So NULLIF of one and one returns null, and NULLIF of one and zero returns one. Then he says you can actually combine these to return the first value that is not zero by combining NULLIF and COALESCE, and he shows an example of doing that here. The next area he covers is grouping. GROUP BY normally lets you group your rows together as part of the output, but when you use grouping sets, it groups sets of values together. So for example, he's using grouping sets one and two, which refer to the columns up above, country and product name. So you can see when you do that, it groups the sets of values together. Now ROLLUP is different in that it actually gives you totals for the values you are grouping by. So you can see the totals for Argentina, Germany, and the USA, as well as the totals for everything.
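A minimal sketch of those two ideas, assuming a hypothetical sales(country, product_name, amount) table rather than the tables used in the original post:

```sql
-- NULLIF and COALESCE combined: first value that is neither NULL nor zero
SELECT COALESCE(NULLIF(0, 0), NULLIF(5, 0));   -- returns 5

-- GROUPING SETS 1 and 2 refer to the first two columns in the select list
SELECT country, product_name, sum(amount)
FROM sales
GROUP BY GROUPING SETS (1, 2);

-- ROLLUP adds per-country subtotals plus a grand total row
SELECT country, product_name, sum(amount)
FROM sales
GROUP BY ROLLUP (country, product_name);
```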

So it's basically something you don't have to calculate in the application; you can do a ROLLUP using SQL. The next one he covers is CUBE, which is basically like ROLLUP but gives you all the different permutations of ways to add up the numbers. Essentially, it includes the second group's totals in there as well. The next area he covers is common table expressions, or WITH clauses, essentially. Basically, this is a way to do subqueries but have them be much more readable than a subquery typically would be. Next, he covers domains, which are separate objects you create that represent a defined column type and can define nulls, defaults, and constraints.
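Here's a rough sketch of CUBE and a WITH clause, again against the same hypothetical sales table rather than the post's own examples:

```sql
-- CUBE: totals for every combination of the grouped columns
SELECT country, product_name, sum(amount)
FROM sales
GROUP BY CUBE (country, product_name);

-- A WITH clause (CTE) instead of a nested subquery
WITH country_totals AS (
    SELECT country, sum(amount) AS total
    FROM sales
    GROUP BY country
)
SELECT country, total
FROM country_totals
WHERE total > 1000;
```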

For example, the Postgres documentation uses postal codes as one. Maybe you want certain constraints around that to say which values are valid postal codes. Once that domain is created, you can then use it as a data type in any table within that particular database. He also mentions the USING keyword: as long as the join columns have identical names between two tables, you can do a join with USING instead of spelling out the ON condition. Those are the things he wishes he had known when he was getting started with Postgres. Feel free to check it out if you're interested as well.
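A small sketch of both ideas; the domain's check pattern and the orders/order_items tables are hypothetical stand-ins, not taken from the article.

```sql
-- A domain wrapping a base type with its own constraint
CREATE DOMAIN us_postal_code AS text
    CHECK (VALUE ~ '^\d{5}$' OR VALUE ~ '^\d{5}-\d{4}$');

CREATE TABLE addresses (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    postal_code us_postal_code NOT NULL
);

-- USING instead of ON when the join columns share the same name
SELECT *
FROM orders
JOIN order_items USING (order_id);
```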

Next piece of content- "PGSQL Phriday: A Monthly Blog Event for the PostgreSQL Community". This is from softwareandbooze.com and he's talking about starting a monthly event where the community writes blog posts on a particular topic. So basically a volunteer who runs a blog is chosen as host, and they "invite" someone to participate and create a blog post. Now he says invite, but I'm wondering if some of the blog owners are going to be writing the post themselves. Then they post it up, the community posts about that same topic, and everyone references the original host's post. So we'll have to see how this works, but he's already lined up a number of hosts to get started, and the first two proposed topics are "Why PostgreSQL?" and "Upgrade PostgreSQL". So I'll be keeping an eye on this and see what comes of it.

The next piece of content is actually a YouTube video and it's "PostgreSQL 15 Antici... pation". This is on the San Francisco Bay Area PostgreSQL Users Group channel, and it's something that Christophe Pettus presented. It's about 34 minutes in length, but it's a great overview of many of the different features coming in Postgres 15. And if you haven't been keeping up with it as we get closer to the release date, I highly encourage you to check out this particular presentation.

Next piece of content- "EXPLAIN from pg_stat_statements normalized queries: how to always get the generic plan in Postgres". This is from dev.to in the Yugabyte section, and he's talking about how, when you're working with pg_stat_statements, all of the queries are essentially normalized. The values that were passed in are converted into parameter placeholders, so you can't just take a query and do an EXPLAIN on it because it gives an error like "there is no parameter $1". Now, you can prepare the query and then see what data types it expects by querying the pg_prepared_statements system view. But if you try to EXPLAIN EXECUTE that statement, it won't necessarily give you a plan. The key is you actually need to have it generate a generic plan, and apparently you can run it multiple times to eventually get one to come out. But as of Postgres 12, there's a way to set plan_cache_mode to force a generic plan.
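A minimal sketch of that workflow, assuming a hypothetical orders table and prepared statement name; the general pattern follows what the post describes.

```sql
-- Recreate the normalized query as a prepared statement
PREPARE stmt (int) AS
SELECT * FROM orders WHERE customer_id = $1;

-- See the parameter types Postgres inferred for it
SELECT name, parameter_types FROM pg_prepared_statements;

-- Force the generic plan (Postgres 12+), then EXPLAIN with any placeholder value
SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE stmt(NULL);
```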

Now, there's a lot more detail in this particular post, and I encourage you to check out "5mins of Postgres E35: How to run EXPLAIN on normalized queries from pg_stat_statements using plan_cache_mode", where Lukas goes into much more detail about the previous post and adds some more information and insight into how this works.

Next piece of content- "WHAT IS AN INNER JOIN IN SQL? AND WHAT IS AN OUTER JOIN?". This is from cybertec-postgresql.com and they're covering different types of joins. They talk about explicit vs. implicit joins. With an implicit join, you're not specifying a JOIN keyword; you just put a comma between the tables, and the WHERE clause actually defines the join relationship. Whereas with explicit joins, you're literally saying JOIN ON the particular columns of interest. He says the plans for these are mostly the same, although he does mention that once you get up to more than eight or nine tables joined together, you start seeing some differences between them, but for the most part they're about the same. Then he covers outer joins: there are LEFT outer joins that show the values in A even though they don't exist in B, and RIGHT outer joins that show the values in B even though they don't exist in A.
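A quick sketch of those variants, using hypothetical tables a and b rather than the post's examples:

```sql
-- Implicit join: comma between the tables, relationship in WHERE
SELECT *
FROM a, b
WHERE a.id = b.a_id;

-- Explicit join: same result, relationship in the ON clause
SELECT *
FROM a
JOIN b ON a.id = b.a_id;

-- LEFT outer join: every row of a, even when there is no match in b
SELECT *
FROM a
LEFT JOIN b ON a.id = b.a_id;
```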

He talks about the full join, which shows all the values from both sides, matching up the rows that match. But then he mentions common mistakes and errors, such as putting what is essentially a WHERE-style filter condition inside the ON section of an outer join. I haven't seen someone do this, but apparently someone has, and it gives you very unexpected results. If you want a filter like that, you probably want it to be an inner join or to have the condition in the WHERE clause, because presumably what someone is looking for is just the filtered rows. Then he closes out the post by talking about semi-joins, which basically use WHERE an ID is IN (or NOT IN) the values returned by a subquery. So if you want to learn more, definitely check out this blog post.
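Here's a sketch of that pitfall and of a semi-join, again with the hypothetical a and b tables; the status column is just an illustrative filter, not from the original article.

```sql
-- Filter placed in ON: unmatched rows of a still come back with NULLs for b,
-- which is usually not what was intended
SELECT *
FROM a
LEFT JOIN b ON a.id = b.a_id AND b.status = 'active';

-- What is usually wanted: the filter in WHERE (this effectively becomes an inner join)
SELECT *
FROM a
LEFT JOIN b ON a.id = b.a_id
WHERE b.status = 'active';

-- Semi-join and anti-join with IN / NOT IN
SELECT * FROM a WHERE id IN (SELECT a_id FROM b);
SELECT * FROM a WHERE id NOT IN (SELECT a_id FROM b);
```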

The next piece of content- "PostgreSQL JSON versus JSONB". This is from proopensource.it and they're talking about the difference between the JSON data type and the JSONB data type. Basically, JSON is great if you want to retain the full format and spacing and include duplicate keys; everything stays identical and you just want to store that payload in Postgres. That's what JSON is for. JSONB is great when you're building your own JSON, you want it to be fast, and you want it to be indexable. Because JSONB is basically a binary format, it removes duplicate keys and, as I said, enables efficient indexing using GIN or GiST indexes. And this blog post covers a number of those issues as well as other areas, such as the different types of data that are allowed within JSON. So if you want to learn more about that, you can check out this blog post.
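A small sketch of the difference; the events table with a jsonb payload column is a hypothetical example, not from the post.

```sql
-- json keeps the text exactly as given, whitespace and duplicate keys included
SELECT '{"a": 1, "a": 2,   "b": 3}'::json;
-- {"a": 1, "a": 2,   "b": 3}

-- jsonb stores a parsed binary form: whitespace normalized, last duplicate key wins
SELECT '{"a": 1, "a": 2,   "b": 3}'::jsonb;
-- {"a": 2, "b": 3}

-- jsonb columns can be indexed efficiently with GIN
CREATE INDEX ON events USING gin (payload);
```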

The next piece of content- "Working With Snapshots in PostgreSQL". This is from percona.com and specifically they're talking about exporting transaction snapshots. This is a capability that was developed so that when you do a pg_dump to a directory format, you can actually specify the number of jobs that can work on it. So basically, multiple processes will export your data in parallel, and it does this by being able to share exported transaction snapshots between different processes. He shows an example of how this essentially works in Postgres, or at least a rough implementation of it. So if you want to learn more about that, you can check out this blog post.
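A minimal sketch of the underlying mechanism; the snapshot identifier shown is just an example value, and the pg_dump invocation is an illustrative assumption rather than the post's exact commands.

```sql
-- Session 1: open a transaction and export its snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns something like '00000004-0000002B-1'

-- Session 2: adopt that snapshot so both sessions see exactly the same data
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000004-0000002B-1';

-- Parallel directory-format pg_dump relies on the same mechanism; a snapshot
-- can also be passed in explicitly:
--   pg_dump -Fd -j 4 --snapshot='00000004-0000002B-1' -f dump_dir mydb
```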

Next piece of content- "Generate Unlimited Crypto Using Postgres". This is from crunchydata.com. The title is a bit of a misnomer. It's basically about the pgcrypto extension and all the different ways you can do encryption-related activities within Postgres: using it for symmetric encryption, using it for public key encryption where you have a secret key and a public key, working with OpenPGP, as well as generating and testing passwords and generating digests. Now, for a lot of this activity, I tend to use my application framework to do it. I usually don't like giving Postgres this responsibility or storing keys anywhere in the database. I usually have them in the application, and the data gets encrypted and then put in Postgres. But Postgres definitely has the capability of doing this with the pgcrypto extension. And if you want to learn how to do that, definitely check out this blog post.
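For reference, a minimal sketch of a few pgcrypto calls; the passphrases and data are placeholders, and this only touches the symmetric, password-hashing, and digest pieces, not the full OpenPGP workflow the post covers.

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Symmetric encryption and decryption with a passphrase
SELECT pgp_sym_decrypt(
           pgp_sym_encrypt('secret data', 'my-passphrase'),
           'my-passphrase');            -- returns 'secret data'

-- Password hashing with a salted bcrypt hash, then verification
WITH u AS (SELECT crypt('user-password', gen_salt('bf')) AS stored_hash)
SELECT crypt('user-password', stored_hash) = stored_hash AS password_ok FROM u;

-- A SHA-256 digest, hex encoded
SELECT encode(digest('some bytes', 'sha256'), 'hex');
```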

Next piece of content- "Distributed Postgres goes full open source with Citus: why, what & how". This is from citusdata.com. And this post goes into a little bit more depth about why they decided to go fully open source with their Citus extension. Primarily, it was because their enterprise licensing became less important once they introduced the hosted Citus service on Azure; basically, that is what funds the work now, as opposed to the enterprise licensing. So they released all the features as open source, which is great, and if you want to learn more details about that, I encourage you to check out this blog post.

Next piece of content- "Configuring and Managing VIP for Pgpool-II on AWS". This is from b-peng.blogspot.com. And by VIP, they mean virtual IP addresses. Essentially, the way you do this in AWS is by using Elastic IP addresses, so they talk about how you can run Pgpool-II on AWS using this capability.

The next episode of Postgres FM has been published and it's on "Query macro analysis intro". So basically this is looking at queries and their runtimes from roughly a 10,000-foot view, at the whole system level. Not an individual query, but looking at all the queries and how they're performing. So you can definitely listen to this episode or click here to watch it on YouTube.

Next piece of content. The PostgreSQL person of the week is Karen Jex. If you're interested in learning more about Karen and her contributions to Postgres, definitely check out this blog post.

The last piece of content, we did have another episode of The Rubber Duck Dev Show this past Wednesday evening. This was on "Engineering vs. Product Managers with Brittany Martin & Josh Gaastra" and how they can work together to achieve their organization's goals. So if you're interested in that type of long-form discussion, definitely welcome you to check out our show.
