
Pluggable Storage, Developer Gotchas, Cursors, PG12 | Scaling Postgres 60


In this episode of Scaling Postgres, we review articles covering pluggable storage, gotchas for developers, using cursors and new PG12 features.


Transcript



[00:00:00] In this episode of Scaling Postgres we talk about pluggable storage, developer gotchas, cursors and Postgres 12. I'm Creston Jamison and this is Scaling Postgres episode 60.

[00:00:19] Alright, I hope everyone's having a great week. Our first article is "PG Phriday: Around the World in Two Billion Transactions", and this is from the 2ndQuadrant blog. In here they're talking about the issues of transaction IDs, or XIDs, and the fact that they are limited to 2 billion transactions because a 32-bit counter is used for them. Now, this can be an issue because if you have a lot of activity, you can run out of them. They mention some high-profile incidents of this happening, and that as databases process more data, this becomes an ever bigger limitation. Now, I have covered some previous articles that talk about making it 64 bits to lengthen the runway a bit, but here they're reframing the problem: how to think about this and what the way forward is. A lot of people just say monitor it. But as more database use cases have higher and higher volume, the amount of time you have to deal with a potential problem gets smaller and smaller the higher the activity. So you definitely need to monitor your XIDs, but depending on your volume, you could run out of them pretty quickly. Now, one thing they mention here is a way out of this, because you could change it to 64 bits, but that would require a lot of work. And if you already have a large database, they frame the question here that it would mean a very time-consuming and antiquated dump-and-restore procedure, so you couldn't use something like pg_upgrade. And they say, quote, who wants to spend days or weeks transferring 50 terabytes worth of data to a new target that removes the XID issue? So even if we were able to make 64-bit XIDs, the problem is transitioning large databases to it. But interestingly, under the heading "a way out" he points to a post from Andres Freund, and I believe this is part of Postgres 12: Postgres now has pluggable table storage. Why this is beneficial is that once data storage becomes decoupled from the engine itself, a whole new world opens up. So instead of having to do a whole database transition upgrade at once, maybe you could create a new table storage engine with 64-bit IDs and transition to it that way. They also mention some of the back ends, like EnterpriseDB is currently working on zheap, which could perhaps use this pluggable storage. With that type of storage engine, some of the advantages are that maybe we wouldn't need to vacuum up rows anymore, and updates could be done in place without having to keep a copy of that row around. So basically this offers a potential way forward from the 32-bit limitation on XIDs. It's definitely a very interesting article and I encourage you to check it out.
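Monitoring XID consumption, which the article recommends as the minimum, is something you can do today with a plain SQL query. As a minimal sketch (the 2 billion figure is the roughly 2^31 wraparound ceiling discussed above):

```sql
-- Minimal sketch: how much of the ~2 billion XID headroom each database
-- has consumed; autovacuum's freezing normally keeps this well below 100%.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2147483647, 2) AS pct_of_wraparound
FROM pg_database
ORDER BY xid_age DESC;
```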

[00:03:23] The next post is actually a YouTube video, and it's "PostgreSQL gotchas for app developers" by Magnus Hagander, and this was on the FOSS North YouTube channel. This had some great advice for developers using PostgreSQL on gotchas to watch out for, and I'll list a few that occur in the video. He talks about the importance of connection pool size and perhaps basing your connections on the number of cores. In other words, you don't want 10,000 connections going to the database when you only have ten cores in your database system. What's important with regard to the security notices that have been released is that the user your application connects as should not be a superuser; you want it to be a generic user, or at least the owner of the schema that you're working with. You want to be sure to turn off trust in your pg_hba.conf file, since that bypasses the network security in place. He talks about different ways you could use JSON, as well as some best practices with regard to migrations. He also talks about cautions when using CTEs, or common table expressions, which is the WITH clause, because PostgreSQL tends to materialize those. However, in Postgres 12 I believe they don't do that by default, but you can choose to materialize them if you want (see the sketch below). And then he has a whole other section talking about ORMs and how best to work with them in your application with PostgreSQL. So if you're a developer, definitely a really great presentation to check out.
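As a hedged sketch of that CTE change (the table and column names here are made up, not from the video): in PostgreSQL 12 a simple WITH query is no longer an automatic optimization fence, and you can state your preference explicitly.

```sql
-- PostgreSQL 12: simple CTEs are inlined by default when it is safe to do so.
-- You can force the old behavior with MATERIALIZED, or insist on inlining
-- with NOT MATERIALIZED. The "orders" table is purely illustrative.
WITH recent_orders AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE created_at > now() - interval '1 day'
)
SELECT *
FROM recent_orders
WHERE customer_id = 42;
```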

[00:05:01] The next post is "DECLARE CURSOR in PostgreSQL or how to reduce memory consumption". This is from cybertec-postgresql.com, and this is a post about cursors. Now, I actually haven't used cursors that much in PostgreSQL; I did use them more when I was an Oracle DBA, but to my recollection I haven't used them in PostgreSQL. But PostgreSQL has cursors, and they're talking about specific use cases where they can be beneficial. In this scenario he put 10 million rows into a table, and if you just select everything from the table, it takes forever loading that data and pulling it from the database, because you're pulling everything at once. Whereas if you use a cursor, it allows you to fetch a certain number of rows each time. So with this first one he can immediately fetch one of the rows, and then when he fetches the next four rows, it gives him the next four rows from his cursor. So imagine a screen cursor: it saves the place of where you are, and you can get one record and then another four or ten or however many you need, but you have a main query and you move that cursor along within the data set you've pulled. It should return the data much faster because it's only pulling back the specific amounts of information that you need. And there are a number of different capabilities: you can move to the next, the prior, the first, the last, moving forward and backward. So there are all these commands to move the cursor throughout the data set that you're selecting. Now, one thing to keep in mind is that these generally have to be within a transaction. So normally when I had used cursors it was within a stored procedure, but it's just something to keep in mind when you're using cursors. However, they do have a section here on using cursors across transactions, where they say that to make sure the data can survive the transaction, PostgreSQL has to materialize the result. So when you do this select, it materializes the result and then you can use it for a period of time until you close out that cursor. So if you need to work on a particular data set and move forward and backward through retrieving data, perhaps cursors could help you get better performance for the activity you're doing. So this is definitely a blog post to check out.
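Here is a minimal sketch of that pattern; the table name is made up, and the SCROLL and WITH HOLD options are the pieces that enable the backward movement and the cross-transaction use mentioned above.

```sql
BEGIN;

-- Declare a scrollable cursor over a big result set; rows are fetched on
-- demand instead of the whole result being sent to the client at once.
-- "t_large" is an illustrative table name.
DECLARE big_cur SCROLL CURSOR FOR
    SELECT id, payload FROM t_large ORDER BY id;

FETCH 1 FROM big_cur;       -- get the first row
FETCH 4 FROM big_cur;       -- get the next four rows
FETCH PRIOR FROM big_cur;   -- step backward one row
FETCH LAST FROM big_cur;    -- jump to the last row

CLOSE big_cur;
COMMIT;

-- To keep a cursor usable after COMMIT, declare it WITH HOLD; PostgreSQL
-- then materializes the result set when the transaction commits:
-- DECLARE big_cur SCROLL CURSOR WITH HOLD FOR SELECT ...;
```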

[00:07:24] The next post is "How to update objects inside JSONB arrays with PostgreSQL", and this is from freeCodeCamp.org. Basically this is a post about working with JSONB in PostgreSQL, and he's assuming this scenario: you have a customers table and maybe you want to store contact information in a JSONB column, in a format where you have a phone type and a value. Although if you watched the gotchas for Postgres video, they would definitely endorse having a separate phone column and a separate email column as opposed to using JSONB to store them. However, you may have use cases where this is beneficial. And this post focuses on how you can update a single element of this JSONB without having to rewrite the entire contents. For doing that you can use the jsonb_set function that's part of PostgreSQL, and he goes over how to do it so that you can specifically look for and update one of those elements without having to rewrite the whole JSONB. So if you use JSONB in your database or applications with PostgreSQL, definitely a blog post to check out.
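As a rough sketch of what jsonb_set does (the table, column and array index here are illustrative, not the article's exact schema):

```sql
-- Update only the "value" key of the first element in a jsonb array of
-- contacts, leaving the rest of the document untouched.
UPDATE customers
SET contacts = jsonb_set(
        contacts,              -- existing jsonb column
        '{0,value}',           -- path: array index 0, key "value"
        '"555-0100"'::jsonb    -- replacement value (must itself be valid jsonb)
    )
WHERE id = 1;
```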

[00:08:33] The next series of posts are all about new features that are slated to be coming with PostgreSQL 12. They all start with "Waiting for PostgreSQL 12", and they are actually three posts in a row from depesz. The first one is "Report progress of CREATE INDEX operations". Now, for me this is huge.

[00:08:56] So finally we get insight into what a CREATE INDEX or CREATE INDEX CONCURRENTLY statement is doing, because currently you can't really track it; the only way that I've found to do it is maybe to monitor the size of the index as it's being built. But this commit for Postgres 12 actually creates a view called pg_stat_progress_create_index. So it lets you know at what stage things are, like you're scanning the table or you're loading the tuples into the tree. I like what he says here and I agree wholeheartedly: "This is amazing. Thanks to all involved." For me this was definitely the best thing to see, because it would be so welcome when you're building indexes on terabyte-plus databases. The next post is on generated columns. Now, I also found this very interesting. This is part of the SQL standard; I'm not sure which SQL standard it is, meaning what date it was put in, but it lets you establish a computation to apply for a stored column. So it does stored columns as opposed to virtual ones. For example, they have one int, a second int, and then a third int that is calculated for you. So you don't do anything, it's essentially read-only to you, but you say: whatever value is in the first column, add it to the second column. So when you insert into this table, again, you don't insert into the generated column, but you insert into the other two columns, like one and two, and it will automatically calculate the value to be three. And then you can even use update statements to set the second column to five, and it will calculate that sum again to be one plus five equals six. So it will always maintain whatever calculation you define. And again, this is called generated columns. Now, presumably in the future they're going to offer virtual calculations, so the value won't actually be stored, but in this first release of the feature they are doing stored columns. And they talk about how historically you had to do this through triggers, but this mechanism is much more efficient; he did some testing and it took twice as long to use a trigger for this type of functionality versus using generated columns. So if this feature is of interest to you, definitely something to check out once PostgreSQL 12 gets released.
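Here is a minimal sketch of both features, assuming PostgreSQL 12; the table and column names are made up for illustration.

```sql
-- From another session, watch a running CREATE INDEX [CONCURRENTLY]:
SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index;

-- Generated (stored) column: the server computes and maintains "total";
-- you never write to it directly.
CREATE TABLE measurements (
    a     int,
    b     int,
    total int GENERATED ALWAYS AS (a + b) STORED
);

INSERT INTO measurements (a, b) VALUES (1, 2);  -- total is stored as 3
UPDATE measurements SET b = 5;                  -- total is recomputed to 6
```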

[00:11:14] And the next in the series of posts is "Log all statements from a sample of transactions". So this relates to when you're looking in your PostgreSQL logs and maybe you have set log_min_duration_statement, say, to print out every slow query that takes longer than a second; that's very beneficial for finding slow queries. What this commit adds is log_transaction_sample_rate, and here it's set to 0.001, or 0.1%. What this does is take a 0.1% sample of transactions and log all of their statements. So basically it makes your log much smaller. Now, you don't catch everything, of course, because you're only taking a sample, but at least you get insight into some example queries running in your system. So of course the benefit of this is less logging; with some of the testing that he did in this post, it resulted in a dramatically smaller log. So if you have a lot of transactions going through your database and very large log files, this may be a feature you want to check out with PostgreSQL 12.
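A hedged sketch of turning that on, with an illustrative value; the setting can be changed via ALTER SYSTEM followed by a configuration reload.

```sql
-- Log every statement of roughly 0.1% of transactions (PostgreSQL 12).
ALTER SYSTEM SET log_transaction_sample_rate = 0.001;
SELECT pg_reload_conf();
```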

[00:12:21] The next post, again related to Postgres 12, is "New in pg12: Statistics on checksums errors", and this is from rjuju.github.io. Now, we talked in the previous episode of Scaling Postgres about data checksums, which were introduced in 9.3, but you basically have to enable them at cluster creation. With 12, they are adding a mechanism to enable this on existing clusters. But this post is talking about new counters that are available in the pg_stat_database view that make checksum errors easier to monitor. So it basically lets you track these checksum failures by checking pg_stat_database. It tracks validation errors both for backend activity as well as base backup activity, per database, and it shows the cumulative number of errors and the time of the last checksum failure. So again, this is a really great feature that's being added to PostgreSQL 12, and if you want to learn more, definitely check out the blog post.
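A minimal sketch of checking those new counters on PostgreSQL 12:

```sql
-- Cumulative checksum failures and when the last one happened, per database.
SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database;
```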

[00:13:25] The last piece of content is another YouTube video, and it's called "The Mother of all Query Languages: SQL in Modern Times". This is from Markus Winand. Now, this presentation is really interesting and kind of opened my mind up, with regard to SQL, to all the advanced modern features that are being added. It's a little bit on the cutting edge, because even a lot of the features that were already in the standard a few years ago haven't yet made it into all of the major database platforms. Now, PostgreSQL has a fair number of the features he mentions, but a few of the kind of mind-blowing ones aren't part of PostgreSQL yet. But if you want to get more insight into the power of SQL, what it can do, and what features are probably going to be eventually added to PostgreSQL as well as other relational databases, or actually he didn't like the term relational, SQL-based databases, it's definitely a presentation to check out.

[00:14:27] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
