
Full Text Search, Query Optimization, Exception Blocks, Procedural Language | Scaling Postgres 100


In this episode of Scaling Postgres, we discuss full text search, a process for query optimization, caution with exception blocks and adding a procedural language.


Transcript



[00:00:00] In this episode of Scaling Postgres we talk about full text search, query optimization, exception blocks and procedural languages. I'm Creston Jameson, and this is Scaling Postgres episode 100.

[00:00:22] I hope you're having a great week. Well, welcome to episode 100. So after approximately two years of doing this, we are at the 100th episode. I didn't prepare anything special for it because unfortunately I was quite busy this week, but I'm definitely here with the content that I was able to find. Interestingly, the YouTube channel is very close to 2,000 subscribers, so that's another milestone it looks like we will be hitting soon. Our first piece of content is a YouTube video, "The State of Full Text Search in PostgreSQL 12" by Jimmy Angelacos, and this is actually on the Peter Cooper YouTube channel. It does exactly what it says: it explains different ways of searching text, especially the full text search capabilities of Postgres, how you can use them and the different features that are available. You may also find the slides beneficial; I will provide those as a separate link. The presentation covers operators, functions, dictionaries, examples, indexing and which types of indexes to use. Interestingly, he saw something like a 4,000-fold improvement after adding a GIN index in one of his examples. He also talks about searching non-natural text, collations and other text types, as well as maintenance, since vacuum becomes very important, particularly with certain index types. Now, one thing I liked about the presentation, with the advent of Postgres 12, is using a generated column: you build a tsvector of the data you want to search on as a generated column so it's automatically maintained, and then you can just index that field for full text search purposes. If you're interested in full text search, I highly encourage you to check out this presentation because it has a lot of great information.
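As a rough sketch of that generated-column approach (the table and column names here are hypothetical, and it assumes Postgres 12 or later for generated columns):

```sql
-- Hypothetical table with an automatically maintained search column
CREATE TABLE articles (
    id     bigserial PRIMARY KEY,
    title  text,
    body   text,
    -- Postgres keeps this tsvector in sync with title and body for us
    search tsvector GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
    ) STORED
);

-- A GIN index makes full text search over the column fast
CREATE INDEX articles_search_idx ON articles USING GIN (search);

-- Query against the indexed column
SELECT id, title
FROM articles
WHERE search @@ to_tsquery('english', 'postgres & scaling');
```

Because the column is STORED and generated, there is no trigger to write or maintain, which is the convenience he highlights in the talk.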

[00:02:22] The next post is "Introduction to Profiling and Optimizing SQL Queries for Software Engineers."

[00:02:29] And this is from the Scope blog on Medium. Now, this is not meant to be Python- or Postgres-specific, but those are the examples he's using; he also speaks more generally about how these techniques can be used, and it's mostly from a developer's perspective. So you want to speed up a slow query, how would you do that? The first thing he discusses is how to find them. You can look at the database's slow query log, that's one method. You could also use pg_stat_statements in Postgres to look for some of the different queries there. He talks about other places you can look, such as certain middleware or application logs, and he mentions some Python- or Django-specific tools you can use, or even application performance management platforms, to track and identify slow queries. So once you find a slow query, how do you profile it? He talks about EXPLAIN and EXPLAIN ANALYZE and the difference between them: basically, EXPLAIN ANALYZE actually runs the query. He has a pretty comprehensive example query and shows what its EXPLAIN ANALYZE output looks like. Now, someone could be quite intimidated by that output, but this query is pretty intimidating in and of itself. He shows a visual representation and talks through how to read the EXPLAIN ANALYZE output to determine what's going on and which component could be slow, and he uses the website explain.depesz.com to get a graphical representation of the EXPLAIN plans. Then he discusses a particular setting to adjust and an index to add that would improve performance, and even a way to test the change inside a transaction to see what the differences are. So this is more on the basic level, depending on your level of database knowledge, but definitely good information for developers who want to learn how to optimize slow queries in their database.
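A minimal illustration of the workflow he describes (the table and index names here are made up for the example):

```sql
-- EXPLAIN only shows the planner's estimated plan; the query is not run
EXPLAIN
SELECT * FROM orders WHERE customer_id = 42;

-- EXPLAIN ANALYZE actually executes the query and reports
-- real timings and row counts alongside the estimates
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- To try a fix safely, wrap it in a transaction and roll it back
BEGIN;
CREATE INDEX orders_customer_idx ON orders (customer_id);
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
ROLLBACK;  -- the index is discarded, nothing permanent changed
```

The transaction trick at the end is the same test-then-compare approach mentioned in the post: you can see the new plan without committing the index.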

[00:04:37] The next post is "The strange case of the EXCEPTION block," and this is from pgdba.org. He's talking about Postgres functions and exception blocks, where you begin some code, define which exception will trigger the handler, and then execute some other code instead when that code errors out. He had a very unusual thing happen where XIDs were being incremented. He replicated it by setting up a unique constraint and then saying, when you hit it, basically do nothing. But what he noticed, while running a loop of I believe 1,000 iterations and watching the frozen XID age, is that it actually increments the XIDs even though absolutely nothing has happened and no data has been added; it's basically just burning XIDs. And he says, quote, "whether the exception is handled or not, the DML consumes an XID every time it's executed." He also notes that this kind of behavior, which he's struggling to find documentation for, is what caused the one and only emergency shutdown to prevent XID wraparound he has had to deal with in his career. So this is definitely something to be aware of with exception blocks within functions in PostgreSQL, because what seems to be happening is that subtransactions are essentially being started by these exception blocks when that code is executed. This is examined further in a second post discussing PL/pgSQL exceptions and XIDs from fluca1978.github.io. He mentions, quote, "I think PL/pgSQL is using subtransactions or savepoints to handle exceptions." So whenever those exception blocks are entered, they're going to use a transaction ID to handle it, or not.
He actually does a different implementation where he sets up a function, produces output and tracks what the XID does under the different behaviors using txid_current_if_assigned and txid_current. His assessment is that exceptions are quite clearly implemented in PL/pgSQL, and possibly in other languages, by means of subtransactions. So if you're using Postgres functions with exception blocks, this is just something to be aware of, because if you have a function running very fast and erroring out, you could be burning through the XIDs for that particular table.
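A minimal sketch of the behavior being described (this is not the authors' exact code; the table and function names are made up):

```sql
-- Table with a unique constraint we will deliberately violate
CREATE TABLE demo (id int UNIQUE);
INSERT INTO demo VALUES (1);

CREATE OR REPLACE FUNCTION burn_xids(n int) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    FOR i IN 1 .. n LOOP
        BEGIN
            -- Always violates the unique constraint
            INSERT INTO demo VALUES (1);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- swallow the error; a subtransaction was still used
        END;
    END LOOP;
END;
$$;

-- Watch the XID age climb even though no row is ever added
SELECT age(relfrozenxid) FROM pg_class WHERE relname = 'demo';
SELECT burn_xids(1000);
SELECT age(relfrozenxid) FROM pg_class WHERE relname = 'demo';
```

Each iteration enters the exception block's implicit subtransaction, which is where the XID consumption comes from even though every insert fails.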

[00:07:25] The next post is "Creating a PostgreSQL procedural language, Part 1: Setup." So this is the setup for adding a new procedural language to use with Postgres. By default, as part of the standard distribution, there's PL/pgSQL, PL/Tcl, PL/Perl and PL/Python. Well, they actually wanted to add a PL/Julia programming language, so the first step is to create an extension for it. They create a control file that defines the basic properties of the extension, a SQL file that creates the extension's objects, a C file for the extension itself, and a Makefile to build the extension. He has the exact code used for setting up PL/Julia as a new procedural language: the SQL for the functions, the C code, as well as the Makefile. So these are basically the bones for getting this set up, and we'll have to consult part two to see how it moves forward. And this is from the 2ndQuadrant.com blog. The next post is "An Overview of Job Scheduling Tools for PostgreSQL." Now, last week I believe we talked about pg_timetable as one means of doing scheduling, and here they discuss three others: just using a general crontab in Linux, which is what I tend to do; the agent called pgAgent, which I believe requires pgAdmin, so that's something to take into account; and lastly the extension pg_cron, though this looks like it can only operate on things within the database itself (stored procedures, SQL statements and PostgreSQL commands), so I don't believe it can run jobs outside of Postgres. But these are three options for managing jobs for your PostgreSQL installation. And this is from the Severalnines.com blog.
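For a sense of what the pg_cron option looks like, here is a sketch assuming the extension is installed and listed in shared_preload_libraries (the nightly job is just an example):

```sql
-- pg_cron must be preloaded via shared_preload_libraries before this works
CREATE EXTENSION pg_cron;

-- Run VACUUM ANALYZE every night at 3:00 using standard cron syntax;
-- cron.schedule returns the new job's id
SELECT cron.schedule('0 3 * * *', 'VACUUM ANALYZE');

-- Inspect and remove scheduled jobs
SELECT * FROM cron.job;
SELECT cron.unschedule(1);  -- assumes the job id returned above was 1
```

Note that, as mentioned, everything pg_cron runs is a database command; shell scripts and external jobs still need crontab or an external agent.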

[00:09:28] The next post is "Can PgBouncer session survive everything we throw at it?" This is the fourth in a series of blog posts from EnterpriseDB.com covering PgBouncer and its connection pooling capabilities, predominantly using the EDB Failover Manager, which is kind of like a virtual IP manager with some other features. It goes through in depth how to set this up with the Failover Manager for PgBouncer, doing different failure tests to see what survives and what doesn't, and at the bottom they have general conclusions about what is possible. So if you want to learn more about handling failover with PgBouncer, it's definitely a blog post to check out.

[00:10:17] The next post is "How to Migrate from Trigger-Based Partitioning to Native in PostgreSQL." So this is going from trigger-based partitioning to the declarative partitioning that was introduced in Postgres 10 and improved in 11 and then, of course, 12. Now, it looks like a lot of these instructions are with regard to pg_partman, so they aren't only for plain trigger-based setups; much of it assumes you've been using pg_partman and are moving off it. The basic process is creating new parent tables, detaching the child partitions from the current parent, attaching them to the new declarative partitioning scheme and then doing some renames. They go through the whole process in this pretty long blog post, so if you are interested in doing that, it's definitely a blog post to check out from Crunchydata.com.
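In rough outline, that detach/attach/rename dance could look like the following sketch (the table names and date range are purely illustrative, not the post's actual code):

```sql
-- New declaratively partitioned parent, copying the old table's shape
CREATE TABLE measurements_new
    (LIKE measurements INCLUDING DEFAULTS INCLUDING CONSTRAINTS)
    PARTITION BY RANGE (created_at);

-- Detach one child from the old inheritance-based (trigger) parent...
ALTER TABLE measurements_2019_12 NO INHERIT measurements;

-- ...and attach it to the new parent for its range of data
ALTER TABLE measurements_new
    ATTACH PARTITION measurements_2019_12
    FOR VALUES FROM ('2019-12-01') TO ('2020-01-01');

-- Finally, swap the names so applications see the new parent
ALTER TABLE measurements RENAME TO measurements_old;
ALTER TABLE measurements_new RENAME TO measurements;
```

The real migration has more steps around locking, constraints and pg_partman's own metadata, which is why the post is worth reading in full.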

[00:11:13] Also from Crunchydata.com is the next post, "Guard against transaction loss with PostgreSQL synchronous replication." Now, this does discuss synchronous replication, so there is some knowledge to be gained about what it is and how it works. Generally, when you set up replication, by default it is asynchronous, meaning that transactions are committed and saved on the primary database and then sent off or streamed to the replicas. When you set up synchronous replication, the write doesn't get fully acknowledged to the client until it's written to two systems, or it could be more. They have a great graph that displays this: the client sends some data to the primary, but the primary, before acknowledging, sends the data to the synchronous replica; the synchronous replica acknowledges to the primary, and only then does the primary acknowledge success to the client. So the client does not get an acknowledgment until the data is written to both places, and the replicas always stay in sync. Now, there's a performance hit for doing this, of course. And with stock synchronous replication in PostgreSQL, if your replica goes down, essentially no writes happen on the primary because it can't complete a synchronous write, so that's something to be aware of. Now, what this blog post actually discusses is their PostgreSQL Operator 4.2, so the whole post is written from that perspective. That's something to keep in mind if you want to look at this: it's not about setting up synchronous replication in general, it is doing it through the Postgres Operator, which sets up Postgres in Kubernetes, so it doesn't really tell you how it's set up normally. You can see they're using the Postgres Operator commands, pgo, et cetera, to set things up.
So if you want to learn a little bit more about synchronous replication, but especially if you're wanting to use their PostgreSQL Operator for Kubernetes, then it's definitely the blog post to check out.
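Outside of Kubernetes, plain synchronous replication is controlled by a couple of settings on the primary. A sketch with illustrative values (the standby names correspond to whatever application_name each replica connects with):

```
# postgresql.conf on the primary (illustrative values)
synchronous_commit = on

# Wait for at least one of these named standbys to confirm each commit;
# with "FIRST 1", either replica1 or replica2 acknowledging is enough
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'
```

Listing more standbys than you require acknowledgment from is one common way to soften the "replica down means no writes" problem mentioned above.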

[00:13:21] The next piece of content is a YouTube video, "Oracle to Postgres Schema Migration Hustle." This is from the EnterpriseDB YouTube channel, and they're covering considerations for migrating from Oracle to Postgres with regard to migrating schemas and all the different data and objects that need to be moved over. So if you have a need for that, it's definitely a webinar you'll probably want to check out.

[00:13:48] The next post is "How to Use the KNN Machine Learning Model with 2UDA: PostgreSQL and Orange, Part 1." Well, this post is about machine learning capabilities with something I've never heard of before, but if you're interested in machine learning using PostgreSQL and Orange, this is a blog post to check out from 2ndQuadrant. The next post is "How to run a clustering algorithm within PostgreSQL," and this clustering algorithm is, as they say here, the well-known k-means. So if you have a need to use that in PostgreSQL, you can check out this blog post.

[00:14:26] And the final blog post covers the new leader_pid column in pg_stat_activity, which it says tracks the PID of the group leader used with parallel queries. So if you have an interest in that, it's definitely a blog post to check out.
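A quick illustration of how you might use the column, assuming you're on a Postgres version that includes leader_pid:

```sql
-- Show parallel workers grouped under their leader process;
-- leader_pid is NULL for backends not participating in a parallel query
SELECT pid, leader_pid, backend_type, state, query
FROM pg_stat_activity
WHERE leader_pid IS NOT NULL
ORDER BY leader_pid, pid;
```

This makes it easy to see which worker processes belong to which parallel query, something that previously required guesswork from the query text.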

[00:14:48] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
