background

Timeseries Queries, Arrays, Ranges, C Triggers | Scaling Postgres 77

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss timeseries queries, using arrays with queries, range use cases and developing triggers in C.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres we talk about time series queries arrays, rangesin C triggers. I'm Kristen Jameson. And this is scaling postgres episode 77 one.

[00:00:20] Alright, I hope you're having a great week. Our first piece of content is post trivial timeseries examples and this is from CyberTech Postgresql.com and they're talking about doing timeseries queries in PostgreSQL. Now the first thing they talk about is loading time series data theeasy way and they're just showing an example where you can actually use copy from a programand just pull data from a website and load it into PostgreSQL. And the first very simple examplethey mentioned here is Lag. So this is a window function. So basically you can pull what the lastvalue is for a given column and you're choosing the column that preceded it ordered by the year.So for example, 1966 looks at the previous column by year which is here and places it as thefunction specifies here. So it gives you the last year's set of data. And then with that you canactually use a function where you just take the primary column production minus this lag functionand it basically gives you the difference from the previous year to the current year and you couldmake a percentage if you want to, et cetera. So it's just basically a very simple way to deal withtime series data and you can even do computation such as correlations because there is a corefunction. So you can take two different values and see how they correlate. So how doesproduction correlate to consumption for example, and this is by country, so you can see how wellthey correlate production versus consumption. So a very brief post, but a couple of very simplewindow functions. What's interesting about this week is there's a lot of content for developers.So moving on to the next one is also by CyberTech hyphen postgresql.com is time seriesexclude ties, current row and group. So the first thing they did here is they prepared a set of datausing generate series and they have an ordinality column that's basically an incrementing integerand then a day column that increases by day and then defining by week in integer. Now the firstexample they looked at is let's look at a sliding windows which is moving over time series inSQL. Now they use an array aggregate function in order to show you what makes up thewindow. So you do a window functions over order by the day and the rows between onepreceding and the one following. So as a good example, looking at the second row here, it'sgoing to show the current row, the one preceding and the one following. So again, the arrayaggregate column just shows you where that window is and you can see as it moves to three, ithas three in the preceding and the following row, et cetera. And you can include more if youwant, but he just did one preceding, one following. This is not useful in and of itself. But now youdo calculations, you can do averages, sums mins maxes over that window series. The next areatalks about excluding the current row. So it does the same window range, but it excludes thecurrent row. So here on the second row, you could see it includes the previous and the next row,but not the current one, et cetera. And it continues on. And you'll notice at the beginning and theend, it only has one value. And similarly up here it only has two values because there is noprevious row to row one and there is no row after row 14. So the window will only have twovalues in it. And then there's excluding ties. It says here, quote, exclude ties, excludes any peersof the current row from the frame, but not the current row itself. The next area is going intoexcluding entire groups from a windowing function. So we'll see here, this is a window thatcovers essentially five rows, two preceding, two following and the current one, and excludeswhatever this week value is in the group. So of course, since there's only ones in the window forthe first two rows, nothing shows because that group is excluded. However, it does show thetwos because they start showing up once the window reaches the third column here. Soessentially this will be included. And then once you get to this row, these two twos will beexcluded, but these two twos will be included, but that will be excluded. And then when itswitches to a two, it will then start showing these two ones that are still in the window and thelast area they cover is distinct. Now, what they say here is th --at in PostgreSQL, the current version, there's no way to use distinct as part of a windowingfunction and that PostgreSQL will error out. And he shows the error here. Distinct is notimplemented for window functions. And he says what you have to do is filter the duplicates on ahigher level so you can use a subselect, enroll the array, remove the duplicates and assemblethe array again. So he gives an example of this implementation if you wanted to do somethinglike a window function using distinct. So these are a lot of great queries for getting a betterunderstanding of window functions for working with time series data. So if you do that, Idefinitely encourage you to check these two pieces of content out. The next piece of content ishow to get the first or last value in a group using group by in SQL. This is from Hackibinita.com.So again, this is another piece of content that I feel is a great example for developers. Now, theyhave a table here where they have data stored in essentially a JSON B field and it looks like atime log of when certain events happen, so certain amounts of money are deposited, withdrawn,and an available credit is set at a periodic time. Now, they had to do a calculation to get what thetotal balance was for the account, as well as what the last credit row was. Now, I find this layouta bit unorthodox, but this was the data that he had. And to get the balance, you just pull out fromthat JSON B field, the delta balances and do a sum and by account you can get what the currentbalance is. But the problem was finding this last credit entry. Now, he said in Oracle that you canuse a last function to be able to do it in the manner described here. But when he tried to do thisin PostgreSQL, he couldn't really do it with a group by. So you could remove the group by anddo it with a window function this way to get it, but it's still not exactly what you need. And hesays, quote PostgreSQL doesn't have the built in function to obtain the first or last value in agroup using group by. Now he did some examples using SQL, but then he came up on this, whathe's calling an array trick. So using this function he developed an array and found the max valueof the array and filtered it only where the type is credit set, and then grabbed the second elementof the array, which is the credit. And that got the answer that he was looking for. And it was avery efficient query plan. So this was a pretty interesting trick and perhaps you could use it insome of your SQL queries or similar to the previous post where they were manipulating arrays tobe able to do distinct counts. Perhaps you could use rays in different ways to do queries that youneed to do.

[00:07:48] The next post is range types in PostgreSQL and Gist indexes. It's fromAlibabloud.com and they're talking about someone was using MySQL to be able to search forwhether a particular value existed in a range and they decided, this individual decided to give it atry using PostgreSQL and using their different range types. So the post goes into describing thedifferent range types that can be set up. So here's all the different types of range types anddifferent functions that can be used against them. And it goes over a lot of different examples ofbuilding certain range types and eventually discusses using a gist index. And in this case he'susing a Btree gist to be able to get the query performance that he was looking for. And in theprocess he created an exclude constraint, inserted a series of test data, performed his query,and then looked at the execution plan, which looks like a pretty simple plan, so I assume he'shappy with the performance. It didn't give too much of a comment with reference to it or howperformance compared to the MySQL version. But if you have a use case that could use rangetypes, this would be a good post potentially to look at and review how you might be able to usethem in your implementation.

[00:09:05] The next post is a faster lightweight trigger function in C for PostgreSQL and this isfrom Procona.com and they had done some previous examples of using C for triggers to givemore performance. And this gives an example of doing a simple audit with a C trigger and seewhat kind of performance that could give. Because from what they state here, a lot of people usetriggers for auditing purposes like to insert into a table when other parts of a table have changed,or another example is just updating as they say, an --insert timestamp or an update timestamp using triggers. So they're wondering how fast theycould make it. So basically they developed a C function that is shown here.

[00:09:48] Now, one thing that they did mention is that they used at number to specify whichcolumn is going to be updated. So not the actual name of the column, but this was moreperformant. Like choosing a particular column of the table is more performant than using this SPIunderscore F number function to get the column Adventures bytes name. So that's one caveatthey mentioned here. And then once you have this C function developed, go ahead and make itinto an extension that you install in your database and then create the function that the triggerwill use using the C language. Create your trigger before insert or update for each row, executethe function you defined. And then for benchmarking purposes, they created a Plpg SQLfunction. So this is what you would normally do in this case. So they wanted to see what theperformance difference was and over multiple runs you could see that without the trigger. Here'sthe baseline adding the C trigger only resulted in just about a 12% performance hit, whereas thePlpg SQL trigger resulted in 80% performance hit. So as you're looking potentially scale yourdatabase, maybe using more C functions as extensions could help increase the performance ofyour database. Now again, the caveats mentioned here is they did hard code the attributenumber or essentially the column number.

[00:11:12] When they dropped down to using the actual name and used this SPIF numberfunction, the performance wasn't as good. It was about between the performance of these justso, just something to keep in mind. But if you're looking to eke out every bit of performance,perhaps this is a path you may want to explore.

[00:11:29] And the last post is waiting for PostGIS Three st tile envelope ZXY. This is fromCrunchydata.com and again with the upcoming postgres Three, they are mentioning all thedifferent features and this is yet another one. So if you are interested in PostGIS three, definitelya blog post to check out that does it. For this episode of Scaling Postgres, you can get links to allthe content mentioned in the show notes. Be sure to head over to Scalingposgres.com, whereyou can sign up to receive weekly notifications of each episode. Or you can subscribe byYouTube rightunes. Thanks. --

episode_image