
Using JSON, Procedures, Concurrency, Kubernetes | Scaling Postgres 21


In this episode of Scaling Postgres, we review articles covering how to use JSON & JSONB, procedures, dealing with concurrency issues, and Kubernetes.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about using JSON, procedures, concurrency issues and Kubernetes. I'm Creston Jameson and this is Scaling Postgres, episode 21.

[00:00:21] Alright, so we have a lot of content this week. I'm going to try to move through it maybe a little bit quicker than normal, but I'm definitely going to try to cover what I feel are the best pieces of content for this week. So, starting off we have Asynchronous notifications in Postgres. And this is from the citizen428 blog. And this post is basically talking about the NOTIFY and LISTEN capabilities of Postgres, basically a pub/sub: NOTIFY being a method of publishing some event and LISTEN a way to subscribe and listen for those events. And this is great because I believe it's done by a developer, and he covers how to set this up with a particular application, and it's presented in a very clear fashion. So, for example, he's focused on a table of orders that just has a few columns, and he wants to set up a notification when a new order comes in. So he creates a notify_event function, and the first thing the function does is check whether it's a delete: he wants to basically capture the old record that's deleted, otherwise get what the new record is. And from this he's going to build a JSON payload. So he's taking the record and using a row_to_json function to store it in data, so essentially using a JSON field; he's determining what the action is (is it an insert, update or delete) and then the table that's impacted. And then he essentially performs a pg_notify to an "events" channel, as it were, sending the payload that he built as text. Now on the orders table, he creates a trigger that calls this function: for each row, execute the notify_event procedure. And then he sets up a LISTEN on essentially the events channel and does an insert with some values, and you can see what the asynchronous notification payload is. You see the table he defined as orders, and the action defined as the insert he did.
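
As a rough illustration, a trigger of the kind described might look something like the sketch below. The notify_event name, the events channel and the orders table follow the post as summarized above; the exact payload construction in the original may differ.

```sql
-- Sketch of a trigger function that publishes row changes as JSON on an "events" channel.
CREATE OR REPLACE FUNCTION notify_event() RETURNS trigger AS $$
DECLARE
    rec RECORD;
    payload JSON;
BEGIN
    -- For deletes, the row of interest is the old one; otherwise use the new row.
    IF (TG_OP = 'DELETE') THEN
        rec := OLD;
    ELSE
        rec := NEW;
    END IF;

    -- Build a JSON payload with the table, the action and the row data.
    payload := json_build_object(
        'table', TG_TABLE_NAME,
        'action', TG_OP,
        'data', row_to_json(rec)
    );

    -- pg_notify sends the payload as text on the given channel.
    PERFORM pg_notify('events', payload::text);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Fire the function for every change to the orders table.
CREATE TRIGGER orders_notify_event
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE PROCEDURE notify_event();
```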

[00:02:28] And then all the data that was inserted is located in this JSON column. Now of course, it gets more interesting when you're listening from a separate process. And in this example, he's using Ruby with the Sequel gem and simply setting up a listener that loops, essentially waiting for new notifications on this LISTEN channel. And basically when an insert happens, you get a JSON notification being printed out; essentially the payload gets printed out for every insert done to that table, on this separate process that's running. And then to make it even more interesting, he actually adds a front end. And again, he's using Ruby, so he's chosen to create a simple Rack app with Faye. And essentially his configuration here enables him to print to the console when a particular event happens. Now if you're using this with your application, you can envision that when an event comes in you could trigger, say, a JavaScript event that appears in your application. So this wasn't too long a blog post, but he really goes into each of the steps that you would need to do and how simple it could be to set up this kind of pub/sub event system using Postgres's LISTEN and NOTIFY features.
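
Before wiring up an external listener, the same channel can be exercised straight from two psql sessions. This is just a sketch; the column names in the INSERT are placeholders rather than the post's actual schema.

```sql
-- Session 1: subscribe to the channel.
LISTEN events;

-- Session 2: cause a notification (placeholder columns, not the post's real schema).
INSERT INTO orders (customer_id, total) VALUES (1, 19.99);

-- Back in session 1, psql reports something along the lines of:
--   Asynchronous notification "events" with payload
--   '{"table" : "orders", "action" : "INSERT", "data" : {...}}'
--   received from server process.
```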

[00:03:41] The next post is Audit logging using JSONB and Postgres. And this is from the Dinesh Patel blog, pateldinish.com. And he talks about different ways of auditing PostgreSQL, different ways of doing DML auditing, so what inserts are happening, updates, deletes, etc. And he says one method is to use a trigger that then updates or inserts new data into a table. Another way is to actually turn on a high level of logging in PostgreSQL. But he actually comes up with an approach that is similar to the last post we looked at, using JSONB fields to store the content of the data that was changed. So first he creates just a simple users table tracking basic user information. He then creates a separate schema called audit and creates a table in that schema called users_audit that has a timestamp of basically when it happened, what operation it was, and, again similar to the previous post, the username that did it; and then he's actually recording, before the modification, what did the row look like, and after the modification, what did the row look like. He then creates a function, and if it's an insert he uses to_jsonb on the new row and inserts it in the after column. If it's an update, he again uses to_jsonb: the old row goes in the before column and the new row goes in the after column. And in the case of a delete, he just takes what the old row was and puts it in the before column, because there's no after with a delete. He then of course uses this function, for each row run this function, and creates a trigger for it. And then you can look at what this audit table looks like. He inserts some rows and then does a select from the users_audit table where, in the after column, the user ID is 101, and this is what it looks like: you get to see the before and after for an insert and what user performed that insert. And then he goes into creating GIN indexes to make access more efficient. But definitely an interesting take on how to do auditing using JSONB fields, and pretty simple to do if you're looking to do that (there's a sketch of this trigger setup just after this paragraph). The next post is also related to JSON and it's a video: an introduction to JSON data types in PostgreSQL. Now this is about a 40-minute presentation from one of the principal contributors of the JSON functionality in PostgreSQL, and I found the video pretty informative. However, it took a while to kind of get up to speed, but if you want some introductory information, I suggest watching the whole video. From about the zero to five minute mark, they're talking about JSON structure in general, JavaScript Object Notation. From about the five to ten minute mark, they talk about what features Postgres has added for JSON support over the past number of versions, over the last few years. From about the ten to fifteen minute mark, they talk about specific advice and suggested use cases for it. And then starting around the fifteen minute mark he talks about actual code, using the functions, and the types of things that you can do with it. So, definitely a video I suggest you check out if you want to learn more about the JSON capabilities in PostgreSQL. And again, I should say this is from the 2ndQuadrant.com blog.
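
Here is that promised sketch of the trigger-based JSONB audit approach. The column names and defaults are illustrative assumptions on my part; the post's actual table definition may differ.

```sql
-- Audit schema and table holding before/after images of each row change as JSONB.
CREATE SCHEMA IF NOT EXISTS audit;

CREATE TABLE audit.users_audit (
    changed_at  timestamptz NOT NULL DEFAULT now(),       -- when it happened
    operation   text        NOT NULL,                     -- INSERT / UPDATE / DELETE
    username    text        NOT NULL DEFAULT current_user,
    before      jsonb,                                    -- row image before the change
    after       jsonb                                     -- row image after the change
);

CREATE OR REPLACE FUNCTION audit.users_audit_trigger() RETURNS trigger AS $$
BEGIN
    IF (TG_OP = 'INSERT') THEN
        INSERT INTO audit.users_audit (operation, after)
        VALUES (TG_OP, to_jsonb(NEW));
    ELSIF (TG_OP = 'UPDATE') THEN
        INSERT INTO audit.users_audit (operation, before, after)
        VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    ELSIF (TG_OP = 'DELETE') THEN
        INSERT INTO audit.users_audit (operation, before)
        VALUES (TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_audit
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE PROCEDURE audit.users_audit_trigger();

-- A GIN index makes queries into the JSONB columns efficient, e.g. finding
-- every change where the row ended up with user_id 101:
CREATE INDEX ON audit.users_audit USING gin (after);
SELECT * FROM audit.users_audit WHERE after @> '{"user_id": 101}';
```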

[00:07:12] The next post is Using procedures for batch geocoding and other batch processing. So for a while PostgreSQL has had functions, and we've just seen examples in the content that I've presented where you can do a certain amount of work. Well, in version 11 of PostgreSQL, they're offering procedures. So what's the difference? Functions do all their work and commit, or if there's a failure, essentially everything rolls back. You can't do intermediate work, commit it, and then roll back to some other state if there's a problem. However, procedures support COMMIT and ROLLBACK commands within the procedure body. Now, I never thought of this before, but as this post points out and discusses, this is a great feature for doing batch processing. Because if you have a really long running task, the problem with a function is that it can easily fail and then you need to restart it from a certain point, or essentially it's all or none. Whereas with a procedure you can incrementally commit certain work that's been done and then come back to it if you need to, or re-execute it and correct issues. So, definitely an interesting blog post to check out if you do this type of batch processing or geocoding with your application. And this is from the Postgres OnLine Journal blog.
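
To make the difference concrete, here is a minimal sketch of a PostgreSQL 11 procedure that commits in batches. The table and column names are made up for illustration and are not from the post.

```sql
-- Sketch: process a work queue in chunks, committing after each chunk so a
-- later failure doesn't undo everything already done (needs PostgreSQL 11+).
CREATE OR REPLACE PROCEDURE process_in_batches()
LANGUAGE plpgsql AS $$
DECLARE
    rows_done integer;
BEGIN
    LOOP
        -- Handle one batch of unprocessed rows (illustrative table/columns).
        UPDATE work_queue
        SET processed = true
        WHERE id IN (
            SELECT id FROM work_queue
            WHERE NOT processed
            ORDER BY id
            LIMIT 1000
        );
        GET DIAGNOSTICS rows_done = ROW_COUNT;
        EXIT WHEN rows_done = 0;

        -- Unlike a function, a procedure may commit between batches.
        COMMIT;
    END LOOP;
END;
$$;

-- Procedures are invoked with CALL rather than SELECT:
CALL process_in_batches();
```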

[00:08:34] The next post is A beginner's guide to PostgreSQL's UPDATE and autovacuum. And this is from the cybertec-postgresql.com blog. And this kind of goes back to how Postgres handles updates and how, in their experience dealing with clients, some applications are not taking into account basically how PostgreSQL handles updates: an update actually creates a new row and retains the old row, so that other connections can still see that information until everything has been committed. And as a consequence, if you have a lot of updates going on, you can really increase the number of rows that are in your table. That is, until vacuum comes around to clean up the rows. But even vacuum can have issues if you have a very high update rate, particularly with long running transactions. So for example, they're saying you have the first connection here: you start a transaction, you do a delete, it's running, doing things; a second connection comes in, and it can still see the rows, even what's been deleted here. And if you run a vacuum at this stage, it doesn't remove anything, because nothing's been committed. So essentially it has to wait until that transaction is committed before it can actually vacuum and potentially make space available. So they have a quote here, basically a rule, that says long transactions can delay cleanup and cause table bloat. Basically, your table keeps expanding in size, whereas the data size is actually relatively small. And plus you need to take into account that just running a vacuum or an autovacuum doesn't necessarily shrink the size of the table; it makes space available, but it doesn't literally try to compact the table. You actually need to do a VACUUM FULL, which locks all access to the table, or there are a couple of different tools you can use to compact it while access is still happening. Lastly, it also goes into the dangers of when you have a particular row receiving massive numbers of updates per second; that can really hammer the system and prevent vacuum from doing its job unless you really highly tune it. And in some cases it's a lot easier to, as they say here, quote, avoid UPDATE altogether and instead change your data structure so that you're predominantly doing inserts. So if you're doing predominantly inserts, you don't have to worry as much about concurrency, and you don't have to worry as much about vacuum keeping up. And if you take this technique, then typically you would want to partition that data by some sort of date. Then you can simply drop the old tables when you need to, as opposed to storing all of this data and doing frequent updates. So if you're unfamiliar with how Postgres's UPDATE, VACUUM and autovacuum work together, in terms of planning how you want your application to function, this is definitely a blog post to check out.
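
As a rough sketch of that last suggestion, an insert-mostly table partitioned by date (PostgreSQL 10+ declarative partitioning) could look like this; the table and columns are illustrative, not from the article.

```sql
-- Append-only events table, partitioned by month on a timestamp column.
CREATE TABLE events (
    id          bigserial,
    payload     jsonb,
    created_at  timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2018_07 PARTITION OF events
    FOR VALUES FROM ('2018-07-01') TO ('2018-08-01');
CREATE TABLE events_2018_08 PARTITION OF events
    FOR VALUES FROM ('2018-08-01') TO ('2018-09-01');

-- New state is always appended rather than updated in place...
INSERT INTO events (payload) VALUES ('{"status": "shipped"}');

-- ...and old data is retired by dropping whole partitions, which avoids the
-- dead-row and vacuum overhead of deleting millions of individual rows.
DROP TABLE events_2018_07;
```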

[00:11:27] Now, related to that, the next blog post is Modeling for concurrency, because it goes hand in hand with what we were discussing previously. This is from the tapoueh.org blog and it follows his PostgreSQL Concurrency, Isolation and Locking series, where he's talking about how best to model for concurrency. And basically he reaches the same conclusion the previous post had: as opposed to updating particular counters (in his scenario he's using tweets as part of his series, and you want to track how many retweets or favorites are happening), to avoid concurrency issues you actually want to use inserts as opposed to updates. So he proposes a structure here for converting from updates to inserts, giving a more practical example of how to do the switch that was discussed in the previous post: basically focus on using inserts as opposed to updates, because that definitely helps with concurrency, and also with table bloat and autovacuum issues. Now, related to this, if you're doing a lot of inserts, you're going to be creating a lot of data, and when you go to query those many, many rows, it can potentially take a while to retrieve results. So in the next post he talks a little bit about that. He's talking about computing and caching. And again, this is from the tapoueh.org blog. And with regard to caching, he's talking about some approaches you can use. Views he talks about as a way to efficiently query exactly what you want; however, a view always goes to the raw tables, so not a lot of caching is involved there. However, he does talk about materialized views. This is where you actually create a separate, essentially summary table from your view, and you can refresh it on a periodic basis. And this is what he means by caching. So you could refresh a materialized view and essentially have cached data to make queries of, say, tweet counts more performant. So these are two blog posts talking about concurrency that are related to the previous post and can give you some ideas on how to better model your data.
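
To tie the two posts together, here is a minimal sketch of the insert-based model plus a materialized view acting as the cache; the table, columns and view names are my own illustrations, not the article's.

```sql
-- Every retweet or favorite is a new row, so there is no contended UPDATE on a counter.
CREATE TABLE tweet_activity (
    tweet_id  bigint      NOT NULL,
    action    text        NOT NULL,      -- e.g. 'retweet' or 'favorite'
    happened  timestamptz NOT NULL DEFAULT now()
);

INSERT INTO tweet_activity (tweet_id, action) VALUES (42, 'retweet');

-- The expensive aggregation is cached in a materialized view...
CREATE MATERIALIZED VIEW tweet_counts AS
SELECT tweet_id,
       count(*) FILTER (WHERE action = 'retweet')  AS retweets,
       count(*) FILTER (WHERE action = 'favorite') AS favorites
FROM tweet_activity
GROUP BY tweet_id;

-- ...and refreshed on whatever schedule matches the staleness you can tolerate.
REFRESH MATERIALIZED VIEW tweet_counts;
```
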
The next post is actually a two-parter. The first one is Should I run Postgres on Kubernetes? part one, and the second is Should I run Postgres on Kubernetes? part two. So the first one covers why you would potentially want to do it. And just to run over the summary, it says basically if you're already using Kubernetes for your application, then it can make sense to do this for Postgres as well, because it helps you adopt a more DevOps workflow where each development team owns the database. And perhaps these databases are relatively small; he references zero brain cell admin for small databases. In other words, if you have pretty small databases for an application, like he says, the vast majority of Heroku's hosted databases are less than 1GB, well then you want to make it as easy as possible to manage these databases, and perhaps Kubernetes is the way to do that. You just need to deal with the state issue, because you need to maintain state within the Kubernetes instance or instances for your database. And he says one thing about running Postgres on Kubernetes is that you could potentially have an easier path to high availability, because it has a number of built-in features to handle coordinating multiple PostgreSQL instances. Now in the second post he talks about some of the downsides. Number one, if you're not a Kubernetes shop, you probably wouldn't want to do it, because you have a whole other level of infrastructure to learn.

[00:15:13] The second is that everything is alpha. So Kubernetes is still moving fast and there's potentially a risk of things not working. So if you're looking for a stable environment, it's probably not the best thing to move to Kubernetes. The other thing he mentions is performance. So even though Kubernetes is supposed to be fairly performant, if you're looking to squeeze the most performance out of your database, there's still an overhead to running it there. And then of course, he talks about large databases: running them in a Kubernetes environment makes them a little bit more difficult to work with, and you may not want to choose that path. So these are two interesting blog posts to check out related to using Postgres with Kubernetes. I definitely suggest checking them out if you're considering doing it.

[00:15:59] The next post is Gracefully scaling to 10,000, or 10K, PostgreSQL connections for $35 a month, part three. Now, in previous episodes of Scaling Postgres, I talked about part one and part two, and it's basically talking about using PgBouncer to scale your connections for Postgres. Now in this third part, he addresses how to basically do it using Kubernetes and Helm. So basically not running PostgreSQL on Kubernetes, but PgBouncer. I mention this just to close out this post series; it talks about potentially setting up this additional technology, Helm, along with Kubernetes to be able to run multiple PgBouncers in front of your PostgreSQL instance. So again, if you are interested in Kubernetes as it relates to PostgreSQL and PgBouncer, definitely a blog post to check out. And this is from the FutureTech Industries blog on Medium.

[00:16:56] The last three posts are all from the Severalnines.com blog. The first post is Custom trigger based upgrades for PostgreSQL. And I actually found the intro part of this the most interesting. It says: first rule, you do not upgrade PostgreSQL with trigger based replication. Okay? Second rule, you do not upgrade PostgreSQL with trigger based replication. Third rule, if you upgrade PostgreSQL with trigger based replication, prepare to suffer, and prepare well.

[00:17:34] And then the other great part here, he says, is that there must be a very serious reason to not use pg_upgrade for upgrading PostgreSQL. And that's basically what I tend to use.

[00:17:47] If you can't afford more than a few seconds of downtime, then use pglogical. And I have presented in previous episodes of Scaling Postgres how I believe 2ndQuadrant has used pglogical to do a version upgrade of their own system. But then, if you can't use pglogical, they say to use the next option they list, and if you can't use that one, the one after, and so on. So it's definitely good advice to take. But then at the end here, they talk about how you could, if you really had to, use trigger based upgrades for PostgreSQL, and that is basically what this post covers. So, definitely an interesting post to check out, and good advice to take.

[00:18:24] The next post from their blog is PostgreSQL Log Analysis with pgBadger. So this is a way of analyzing your logs. The first thing they talk about is how to set up PostgreSQL to essentially log the most information, like turning on all of the logging (there's an example of those kinds of settings after this paragraph), and then how to use pgBadger to basically track the status of your PostgreSQL instance. So if you don't already have a solution for that, this is definitely a blog post to check out. And the last post from their blog is Integrating Tools to Manage PostgreSQL in Production. So again, related to their pgBadger post, they talk about different tools you can use for management and deployment, how to do backups and the different backup scenarios that are available, and they specifically go into repmgr to talk about some of its capabilities. So definitely another blog post to check out.
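
For reference, the kind of verbose logging pgBadger consumes can be switched on with settings along these lines. This is a hedged example; the exact values the post recommends may differ.

```sql
-- Turn up logging so pgBadger has detailed activity to analyze (requires a
-- superuser; the settings take effect after a configuration reload).
ALTER SYSTEM SET log_min_duration_statement = 0;   -- log every statement with its duration
ALTER SYSTEM SET log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h ';
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_temp_files = 0;               -- log every temporary file created
ALTER SYSTEM SET log_autovacuum_min_duration = 0;  -- log all autovacuum activity
SELECT pg_reload_conf();
```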

[00:19:18] That does it for this episode of Scaling Postgres. You can get links to all the content presented in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
