
generate_series, PG11 Partitioning, GPU Queries | Scaling Postgres 25


In this episode of Scaling Postgres, we review articles covering graphing with generate_series, partitioning in Postgres 11 and GPUs for queries.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about graphing with generate_series, better partitioning, GPUs for queries, and touring PostgreSQL. I'm Creston Jameson, and this is Scaling Postgres episode 25.


[00:00:23] Well, we've got quite a bit of content this week, so we'll go ahead and jump right in. Our first post of the week is "Simple monthly reports in PostgreSQL using generate_series", and this is from the Rob Conery blog. In it he's talking about generating a graph: basically he wants to set up a monthly report that shows sales per day, but he wants to see a zero if there are no sales for a particular day. And starting off a month, he doesn't necessarily just want to see the first, he wants to see all 30 days, and if any are zero, show them as zero. Now, to do this he's going to use a function called generate_series to generate a series of dates. He first simply looks at generate_series, which generates essentially integers, but here he shows how it can also work with days and intervals. Basically, this is the query you would need if you wanted to cover an entire month: start at the first, stop at the end of the month, and do an interval by day. He then shows that by using the date_trunc function with 'month', you can get the first of the month, and you can get the end of the month by using that same function, adding an interval of a month and subtracting one day. So basically that gives you a row per day. He then uses this to create a function in PostgreSQL called dates_in_month, where you can pass in a date and it defaults to now(). Now, to use this query he basically sets up a view to get all of his sales information, because that's what the graph is going to represent. Then he joins from the dates_in_month function to this sales view, groups by the generated date, and that gives him his graph with every date represented.
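The pattern described above can be sketched roughly as follows (the table and column names here are illustrative, not necessarily the ones from the post):

```sql
-- Generate one row per day for the current month.
SELECT generate_series(
         date_trunc('month', now()),
         date_trunc('month', now()) + interval '1 month' - interval '1 day',
         interval '1 day') AS day;

-- Wrap it in a function, then LEFT JOIN the sales data so that
-- days with no sales still appear, with a zero total.
CREATE OR REPLACE FUNCTION dates_in_month(the_date timestamptz DEFAULT now())
RETURNS TABLE (day date) AS $$
  SELECT generate_series(
           date_trunc('month', the_date),
           date_trunc('month', the_date) + interval '1 month' - interval '1 day',
           interval '1 day')::date;
$$ LANGUAGE sql;

SELECT d.day, coalesce(sum(s.amount), 0) AS total
FROM dates_in_month() d
LEFT JOIN sales s ON s.created_at::date = d.day
GROUP BY d.day
ORDER BY d.day;
```

The LEFT JOIN plus coalesce() is what makes the zero-sales days show up as zero rather than disappearing from the result.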
So I really like this post from the perspective of the detail he went through to generate something you could use whenever you want graphs or tables that include every date of a time series you're retrieving from the database, even when some days have no data. The next post is "Talk slides: Partitioning Improvements in PostgreSQL 11". This is from the 2ndQuadrant.com blog, and it was actually from a presentation at PGConf Brazil 2018.

[00:02:53] Now, this is the link to the talk slides, so I'm actually going to look at this. It was by Álvaro Herrera. In this presentation, which is a PDF, he goes over basically a historical review of partitioning in PostgreSQL. He talks about the new features, and some of those include a default partition. So you can define a partition that's a default: if data needs to be inserted but there's no partition it would match, it can be placed in the default partition.

[00:03:28] Row migration on update: if you do an update that would actually move a piece of data from one partition to another, it can now handle that. In addition to list and range partitioning, it can now support hash partitioning, which is useful for equally populating the partitions you have set up. And lastly in this area, he mentions that INSERT ... ON CONFLICT DO UPDATE now works. This is something that did not work in PostgreSQL 10, and I really like this update because I have some use cases where I could really use it. In terms of better DDL in PostgreSQL 11, he talks about being able to create an index on the parent table that will then be created on the child tables, some support for unique and primary key constraints, some support for foreign key constraints, as well as some row-level triggers. And of course he talks about better performance as well, mostly with regard to partition pruning: targeting the appropriate partition when doing queries. So I know it can sometimes be hard to look at a presentation where all you have is the slides, but there's still some useful information here, and I suggest you check it out as we get closer to PostgreSQL 11 and find out all the different new features that are coming online.
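To make the new features concrete, here's a minimal sketch of the PostgreSQL 11 syntax for a few of them (table and column names are made up for illustration):

```sql
-- Hash partitioning, new in PostgreSQL 11: rows are spread evenly
-- across partitions by a hash of the key.
CREATE TABLE events (
    id         bigint,
    created_at timestamptz
) PARTITION BY HASH (id);

CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...repeat for remainders 1 through 3...

-- A default partition catches rows that match no other partition.
CREATE TABLE measurements (
    city_id int,
    logdate date
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2018 PARTITION OF measurements
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE measurements_default PARTITION OF measurements DEFAULT;

-- An index created on the parent now cascades to all partitions.
CREATE INDEX ON measurements (logdate);
```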

[00:04:55] The next post is "GPU accelerated SQL queries with PostgreSQL and PG-Strom in OpenShift 3.10", and this is from the OpenShift.com blog by Red Hat.

[00:05:10] Now, this was quite a title, but basically they're showing you how to use PostgreSQL along with PG-Strom in OpenShift. Why I picked this particular article is because I was interested in the GPU-accelerated SQL queries. GPUs are graphical processing units: they're designed for graphics, but they tend to have a lot of cores that are specially designed for mathematical operations, so there are certain queries that can benefit greatly from them. For example, they talk about queries doing averages, counts, and square roots, and how using a GPU in conjunction with the CPU can really accelerate them. They mention that a CPU is very flexible and can do all sorts of calculations, whereas GPUs are specialized for particular types of operations. But even with that, they show some significant performance improvements when using one GPU and one CPU. They're looking at natural joins here and the execution time, and as you can see, the GPU gets up to around 2 seconds but never goes much above that as the number of natural joins goes from one to nine, whereas with just one CPU the execution time continues up in a linear fashion. They say that at some point the GPU execution time will increase, but they never hit it in the tests they were doing. So, in a previous episode of Scaling Postgres we talked about the potential of using GPUs to process queries faster, and PG-Strom is an open source utility to do that. If this may be of interest to you, say you're doing a lot of data warehousing work, you might want to check out the PG-Strom project.
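For context, PG-Strom installs as a PostgreSQL extension loaded at server start. A minimal configuration fragment looks roughly like this (per the PG-Strom documentation; exact tuning parameters vary by version and GPU):

```
# postgresql.conf fragment -- load PG-Strom at server start
shared_preload_libraries = 'pg_strom'

# GPU execution can then be toggled per session:
#   SET pg_strom.enabled = on;
```

Once loaded, EXPLAIN shows GPU-executed plan nodes in place of the usual CPU ones for queries the extension can accelerate.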

[00:07:05] The next post is "Taking a tour of PostgreSQL with Jonathan Katz", and this is episode 42 of the Data Engineering podcast.

[00:07:14] Now, this is a podcast, it's about 56 minutes long, and it didn't really have any content directly related to scaling Postgres, but it was pretty interesting.

[00:07:26] It's a whirlwind tour of PostgreSQL: a little bit of its history, a lot of its capabilities, and it talks a little bit about the future of sharding, which I found particularly interesting. So again, not directly related to scaling PostgreSQL, but an interesting piece of content nonetheless if you want to check it out. The next post is "Fun with SQL: Common table expressions for more readable queries", and this is from the Citusdata.com blog. Basically this is part of their Fun with SQL series, some of which we've covered in the past, and this one covers common table expressions, or CTEs. How I like to think of CTEs is that they're basically a way to name a subquery. So you could run this as a subquery: select all from, and then just paste it right into that spot. It's essentially like a subquery, but it allows much more readable SQL, because you can compartmentalize what something is doing and treat it kind of like a function, which is how I like to think about it. They go through a couple of examples of using CTEs, even putting multiple CTEs together to create a more complex query. So this is a pretty short post, but if you want to look into using CTEs a bit more, definitely a blog post to check out.
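The "named subquery" idea looks like this in practice (a generic example, not taken from the Citus post; the sales schema is made up):

```sql
-- The CTE names what would otherwise be an anonymous inline subquery,
-- so the final SELECT reads almost like prose.
WITH monthly_sales AS (
    SELECT date_trunc('month', created_at) AS month,
           sum(amount)                     AS total
    FROM sales
    GROUP BY 1
)
SELECT month, total
FROM monthly_sales
WHERE total > 10000
ORDER BY month;
```

A second CTE could be chained after monthly_sales in the same WITH clause, referencing the first one, which is how more complex queries get composed.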

[00:08:51] The next post is "PgBouncer and auth pass-through". Now, I chose this post because I think whatever documentation or additional information can come out about PgBouncer is definitely beneficial, and also because it talks about a need where they wanted to just pass authentication through from PgBouncer to PostgreSQL. Typically there's a userlist file in PgBouncer where you define all your users and passwords, and a particular user didn't want that password file essentially just sitting on the PgBouncer instance, so they wanted to pass through the authentication. In the process he actually shows you how to get general authentication up and running with PgBouncer. Now, how he does the pass-through is by creating a function that queries the pg_shadow system catalog to accomplish it. So it's a pretty interesting post, and if you want to learn more about how PgBouncer works, I definitely suggest checking it out.
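The general shape of this pattern, per the PgBouncer documentation's auth_query feature, looks something like the following. The function name here is illustrative, not necessarily what the post used:

```sql
-- In the target database: a SECURITY DEFINER function, owned by a
-- privileged role, that returns one user's password hash from pg_shadow.
-- PgBouncer's auth_user calls it instead of reading userlist.txt.
CREATE FUNCTION pgbouncer_lookup(p_user text)
RETURNS TABLE (usename name, passwd text) AS $$
  SELECT usename, passwd FROM pg_shadow WHERE usename = p_user;
$$ LANGUAGE sql SECURITY DEFINER;

-- Corresponding pgbouncer.ini settings (shown as a comment):
--   auth_user  = pgbouncer
--   auth_query = SELECT usename, passwd FROM pgbouncer_lookup($1)
```

This way only the single auth_user's credentials live on the PgBouncer host; everyone else is verified against the database itself.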

[00:09:59] The next two posts are from thebuild.com. They're relatively short, but they have a lot of condensed, excellent information. The first one is "Does Anyone Really Know What Time It Is?", and it covers how to use the now() function, the statement_timestamp() function, and the clock_timestamp() function. I'll just quote this, since it's so short: now() is the time at the start of the transaction. So when you do BEGIN and then run a transaction, that's the time now() is going to give you, and, quote, "it never changes while the current transaction is open". So until COMMIT or ROLLBACK, it will always give the same time. statement_timestamp() is the time the current statement started running; it changes from statement to statement, but is constant within a statement. And lastly, clock_timestamp() changes each time it is called, regardless of context. Then he gives pointers for their use. Basically, for a predicate in a WHERE clause, use now() or statement_timestamp(), because these work properly with indexes since they are constant within the execution of a statement. If you need the time to update within a single transaction, use statement_timestamp(); otherwise use now(). And generally you should only use clock_timestamp(), which gives you the immediate time, inside of a programming language, so something procedural. So definitely a quick post that gives a lot of good information.
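The three behaviors are easy to see side by side in a single transaction:

```sql
-- now() is frozen at transaction start; statement_timestamp() advances
-- per statement; clock_timestamp() changes on every call.
BEGIN;
SELECT now(), statement_timestamp(), clock_timestamp();
SELECT pg_sleep(1);
SELECT now(),                  -- same value as in the first SELECT
       statement_timestamp(),  -- this statement's start time
       clock_timestamp();      -- current wall-clock time
COMMIT;
```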

[00:11:21] The next post, again from thebuild.com, is "Three Steps to pg_rewind Happiness".

[00:11:27] So first, have a WAL archive. Even though you don't officially need one when using pg_rewind, you're always going to want one so that the server you're rewinding has access to those archived logs if it needs them. Second, be sure you promote the source server before shutting down and doing things with the other server: you're always going to want to do a promote and wait until it's fully promoted. And that leads to step three, wait for the forced checkpoint to complete. So basically you want to make sure that when the secondary is promoted to being a primary, the forced checkpoint completes. Again, a very short post, but chock-full of useful information.
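The steps above boil down to a command sequence roughly like this (paths and connection details are illustrative; consult the pg_rewind documentation for your version):

```shell
# 1. Promote the source (the old standby) and wait for promotion to
#    finish -- promotion forces the checkpoint pg_rewind needs to see.
pg_ctl promote -D /var/lib/postgresql/standby -w

# 2. Stop the old primary cleanly, then rewind it against the promoted
#    server. With a WAL archive configured, the rewound server can pull
#    any WAL segments it is missing via restore_command.
pg_ctl stop -D /var/lib/postgresql/old_primary -m fast
pg_rewind --target-pgdata=/var/lib/postgresql/old_primary \
          --source-server='host=new-primary user=postgres dbname=postgres'
```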

[00:12:10] The last post is "Using Kubernetes to deploy PostgreSQL". This is from the Severalnines.com blog, and it goes over very simply how to set up PostgreSQL on Kubernetes. They reference a previous post of theirs about deploying PostgreSQL in a Docker container, and this is kind of related to it. What I found interesting is they have a section here on persistent storage, because in my mind that's the most important thing to get right if you're using Kubernetes, as well as how to set up the service and connect to PostgreSQL. So if you're interested in using PostgreSQL with Kubernetes, definitely a blog post to check out.
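The persistent-storage part generally comes down to claiming a volume and mounting it at the data directory. A minimal sketch (names, sizes, and image tag are illustrative, not from the Severalnines post):

```yaml
# A PersistentVolumeClaim so the data survives pod restarts, plus a
# single-replica Deployment of the stock postgres image mounting it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels: { app: postgres }
  template:
    metadata:
      labels: { app: postgres }
    spec:
      containers:
        - name: postgres
          image: postgres:11
          env:
            - name: POSTGRES_PASSWORD
              value: example   # use a Kubernetes Secret in practice
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-data
```

A Service in front of the Deployment then gives clients a stable address for connecting.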

[00:12:51] That does it for this episode of Scaling Postgres. You can get links to all the content presented in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
