
Full Text Search, JSONB_AGG, Upgrades, CIS | Scaling Postgres 24


In this episode of Scaling Postgres, we review articles covering full text search, aggregating JSON with jsonb_agg, upgrades and CIS benchmarks.


Transcript



[00:00:00] In this episode of Scaling Postgres we talk about full text search, JSONB aggregates, upgrades and CIS benchmarks. I'm Creston Jameson, and this is Scaling Postgres, episode 24.

[00:00:21] I hope everyone is doing well this week. Our first article is setting up a fast comprehensive search routine with PostgreSQL. This is from the Rob Conery blog, and in it he wants to replicate some of the search features that you can find in applications such as Shopify, where you do a comprehensive search essentially across multiple areas of the application, or multiple tables as it were. Now, this blog post walks through things gradually, so he starts with a regex query and doing a union for the different tables in order to retrieve the results. And for each of the steps he does an EXPLAIN ANALYZE in order to see how that query would be executed and how fast it would run. Now, very quickly he turns to a materialized view. So a view, of course, is a virtualized table, and materializing it essentially makes it like a real table that you can refresh on a periodic basis. And after setting up the materialized view, he then introduces the full text searching with the inclusion of a GIN index. Now, generally when I've used full text search in applications I've just used the GIN index; I haven't used a materialized view in a large application, and I'm not necessarily sold on the fact that the materialized view could be refreshed fast enough when you have a lot of data. So I don't know if, for a large application, this implementation would be the best choice, but it is an interesting use case because you can just refresh the materialized view when you want the search results to become visible to a user. So it was an interesting use case, but again, for a large application, I'm not sure how well this particular implementation would work. And then he shows using the full text features to do the particular query and what execution time he got, again for a small data set, but relatively low, under a millisecond. Now, he does mention here in the last paragraph that, to be honest, it's not really made for small, loose searches like this, meaning the full text search, where he's just searching for a name across a couple of different tables in certain fields. And he continues that it's really easy to generate a false positive, and full text indexing really shines over things like blog posts, comment searches and so on. So I thought this was an interesting blog post for looking at how materialized views could be used with full text indexes, and maybe there are some use cases in your application that they could be used for.
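To give a rough idea of the kind of setup he describes, here is a minimal sketch of a materialized view combined with a GIN-indexed tsvector document. The table and column names here are hypothetical, not taken from the post:

    -- Hypothetical materialized view that unions searchable text from two tables
    -- into a single tsvector document per row.
    CREATE MATERIALIZED VIEW search_index AS
      SELECT c.id, 'customer' AS source,
             to_tsvector('english', coalesce(c.name, '') || ' ' || coalesce(c.email, '')) AS document
        FROM customers c
      UNION ALL
      SELECT o.id, 'order' AS source,
             to_tsvector('english', coalesce(o.description, '')) AS document
        FROM orders o;

    -- GIN index so full text queries against the document column stay fast.
    CREATE INDEX search_index_document_idx ON search_index USING GIN (document);

    -- Refresh periodically so newly added rows become searchable.
    REFRESH MATERIALIZED VIEW search_index;

    -- Run a full text search across everything at once.
    SELECT id, source
      FROM search_index
     WHERE document @@ plainto_tsquery('english', 'some search terms');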

[00:02:48] The next post is SQL tip: JSONB_AGG in PostgreSQL for simple one to many joins. And this is from the Geek Uelmi blog on Medium, my apologies for the pronunciation of that, but this is a super short post, really focused on the jsonb_agg aggregate function in PostgreSQL. So essentially he had a use case where he wanted to create an API endpoint that displays items in a list like this. Historically, he would do a simple join, so join from to-do lists to to-do items, and it would generate a table like this. But then in his application code, he would have to group it however it needed to be in order to present this output as part of the API. However, using the jsonb_agg function, he's able to rewrite the query like this: he takes all of the to-do items, casts them to JSONB and then aggregates them. In addition, he removes the to-do list ID from the output of the to-do items table. So he gets just the data he wants, the ID and the name field, grouping by the to-do list's ID. And now, as you can see here, it's going to output the to-do list ID, the name, as well as all the items in essentially a JSON array. And he says, I quote, "I then return that to my API client. No other changes needed." Now, we've mentioned previously how some people have used PostgreSQL to directly return JSON to their APIs as opposed to having to do more on the application side to prepare it. So this was a very quick, focused post on how to do that.
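As a sketch of the kind of query being described, assuming hypothetical todo_lists and todo_items tables linked by a todo_list_id foreign key:

    -- Aggregate each list's items into a JSONB array, stripping the foreign key
    -- from the per-item objects so the API receives only the fields it needs.
    SELECT l.id,
           l.name,
           jsonb_agg(to_jsonb(i) - 'todo_list_id') AS items
      FROM todo_lists l
      JOIN todo_items i ON i.todo_list_id = l.id
     GROUP BY l.id, l.name;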

[00:04:34] The next piece of content is a meetup presentation: PostgreSQL, how PostgreSQL's SQL dialect stays ahead. Now, this is a YouTube video from the Liferay Agora channel. However, this is the presentation that was mentioned in the previous episode of Scaling Postgres that was about 96, 97 slides long, and here is the YouTube video for it, which is over an hour. Now, the audio is not great, but if you found that presentation's content interesting, again, from the previous episode of Scaling Postgres, here's a YouTube video for you to check out and get the audio along with that presentation.

[00:05:13] The next post is Upgrading PostgreSQL on AWS RDS with minimal or zero downtime. And this is from the Preply engineering blog on Medium. Now, essentially this post covers Bucardo for asynchronous multi-master replication. So essentially they had two instances on AWS RDS, a master and a slave, and they wanted to upgrade them from 9.4 to 9.5, but they wanted to do it with no downtime. So this blog post goes into the different options that they looked at, and those options included BDR, PGCluster, Rubyrep and Bucardo. And in the end, due to the different constraints, they chose Bucardo, and I talked about what Bucardo is: essentially it uses triggers to replicate inserts, updates and deletes between instances. And he goes into a lot of detail on how they were setting this up on their staging environment, again with AWS RDS instances, and how they got it working on staging and test, and they say, "We tested this multi-master replication on test servers and staging and it works really good." But they had problems in production. So they were having some issues with the replication not being able to keep up with their insert, update and delete operations, and they were potentially getting some primary key collisions, and unfortunately they weren't able to do the upgrade with no downtime. So he then says for production they went ahead and did the RDS upgrade with downtime. Now, that may seem like a little bit of a disappointment, but their environment is presumably different than yours, and if you wanted to pursue this course, perhaps you could get around some of the issues that they were encountering when they attempted this with their production system. And it's also interesting to note that it really only required, it looks to be, about seven minutes to do the upgrade with the downtime. So a good blog post that goes over how they chose what solution they were going to use and how they used it, and then ultimately the issues they ran into that prevented them from using it. But still a good overview of using these tools in their environment.

[00:07:29] The next post is Using the CIS PostgreSQL Benchmark to Enhance Your Security. This is essentially from Crunchy Data, on their Crunchy Enterprise PostgreSQL blog.

[00:07:43] They worked with the Center for Internet Security to develop a benchmark, and these are essentially standards that you should follow to help secure your PostgreSQL instances. So if you're particularly focused on security, you might want to check out these benchmarks that they've put together, and they have a link here. You do have to submit your email and contact information when retrieving it, but again, a great recommendation to check out this guide.

[00:08:13] The next post is Designing the most performant row level security schema in Postgres. And this is from the Caleb Brewer blog on Medium. Now, in this post he discusses potentially using your database as an access layer. So for example, he's saying if you need to do a select against the database for a particular item for a user, typically you would do something like this: you would select some items, maybe ordered by some sort of created-at date, where you are the owner of that item, using the owner ID. Whereas with this implementation, using row level security for the user that's connecting, could you potentially use this type of query instead, essentially letting the access controls and the row level security only show the objects belonging to that particular owner? Now, for my use cases, as an application developer, I don't necessarily see the solution he's describing as something I would use at this time. He did go over and record the different performance issues he was seeing for different implementations, so it is kind of a work-in-progress post. But if you are using row level security or considering it, this is definitely a post to check out to kind of understand where the state of things is with the version of PostgreSQL that we're at.
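For reference, here is a minimal sketch of what a row level security policy can look like, assuming a hypothetical items table with owner_id and created_at columns and an application that sets a custom session variable per connection. This is just one common pattern, not necessarily the approach the post itself takes:

    -- Turn on row level security for the table.
    ALTER TABLE items ENABLE ROW LEVEL SECURITY;

    -- Only expose rows whose owner_id matches the current session's user id.
    -- The application would run something like: SET app.current_user_id = '42';
    CREATE POLICY items_owner_policy ON items
      USING (owner_id = current_setting('app.current_user_id')::int);

    -- With the policy in place, the explicit WHERE owner_id = ... filter can be dropped:
    SELECT * FROM items ORDER BY created_at DESC;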

[00:09:34] The next post is actually an announcement for a new webinar, An Introduction to Performance Monitoring for PostgreSQL. And this is on the Severalnines.com blog. So I believe this is a webinar that they are hosting, and since this show is Scaling Postgres, something performance monitoring related is definitely a piece of content to check out. And this is scheduled for August 21. So if you are interested in learning more about performance monitoring, definitely a webinar to go sign up for.

[00:10:06] The next post is also from the Severalnines.com blog, and it is PostgreSQL Triggers and Stored Function Basics. So basically it talks about what triggers are, what stored functions are, what the use cases for them are, and what are potentially some disadvantages of them. So of course they go through how you declare or set up a trigger and a function, and then they talk about different use cases. So this is a pretty long blog post, and some of the use cases they cover are data validation; audit logging, which we've seen before in previous episodes of Scaling Postgres, where blog posts have described using triggers to do audit logging; and using them for derived values. So again, that last one is a use case with regard to full text searching capabilities, where maybe you store derived information for searching purposes. So again, a very comprehensive post. So if you are interested, or you believe triggers or functions could assist you in your application, definitely a blog post to check out.
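As an illustration of the audit logging use case they mention, here is a minimal trigger sketch on a hypothetical accounts table (the names are made up for illustration, not taken from the post):

    -- Audit table holding before/after images of changed rows.
    CREATE TABLE accounts_audit (
        account_id integer,
        changed_at timestamptz NOT NULL DEFAULT now(),
        old_row    jsonb,
        new_row    jsonb
    );

    -- Trigger function that records every UPDATE.
    CREATE FUNCTION log_account_change() RETURNS trigger AS $$
    BEGIN
        INSERT INTO accounts_audit (account_id, old_row, new_row)
        VALUES (OLD.id, to_jsonb(OLD), to_jsonb(NEW));
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER accounts_audit_trigger
        AFTER UPDATE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE log_account_change();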

[00:11:09] The last post is Understanding and Reading the PostgreSQL System Catalog. And again, this is from the Severalnines.com blog. And this goes over basically the PostgreSQL system catalog, and it discusses essentially all the different system views that are available. So they talk about pg_database and what the different fields there are and what they mean, and also pg_stat_database. They talk about pg_stat_bgwriter for checkpoints and the background writer, and pg_stat_activity, which is a very large one for understanding what queries are currently running in your system. They also look at pg_locks to understand what locks are currently being held, pg_stat_user_tables to get information about each of your different tables, and pg_stat_user_indexes to understand all the indexes in your system. So they go over some of the main system views that you would typically query to understand what's going on with your database. So definitely a blog post to check out.
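If you want to poke at a few of these views yourself, queries along these lines work on a stock PostgreSQL install:

    -- Currently running queries and how long they have been running.
    SELECT pid, state, now() - query_start AS runtime, query
      FROM pg_stat_activity
     WHERE state <> 'idle';

    -- Per-table scan and tuple statistics.
    SELECT relname, seq_scan, idx_scan, n_live_tup, n_dead_tup
      FROM pg_stat_user_tables
     ORDER BY seq_scan DESC;

    -- Index usage, handy for spotting indexes that are never used.
    SELECT relname, indexrelname, idx_scan
      FROM pg_stat_user_indexes
     ORDER BY idx_scan;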

[00:12:07] That does it for this episode of Scaling Postgres. You can get links to all the content presented in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or iTunes. Thanks.
