
Max Connections, SQL Tricks, Fast Text Search, Security & Compliance | Scaling Postgres 110


In this episode of Scaling Postgres, we discuss tuning max connections, different SQL tricks, setting up fast text search and handling security & compliance.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about max connections, SQL tricks, fast text search, and security and compliance. I'm Creston Jameson, and this is Scaling Postgres episode 110.

[00:00:22] All right, I hope you, your coworkers and family continue to do well, and I hope you have a great week. This week, our first piece of content is "Tuning max_connections in PostgreSQL", and this is from cybertec-postgresql.com. They're talking about how best to tune max_connections, which is basically the maximum number of connections you can use to connect to PostgreSQL. They mention the default setting is 100, but three of these are reserved for superuser connections, so essentially you only have 97. But they caution against setting max_connections too high. Now, the first thing they talk about is why you would set it too high. They say, first, because it requires a restart to change it, you probably want to be on the safe side and keep it higher. Or you may be getting the error message "remaining connection slots are reserved for non-replication superuser connections", so you just raise the limit. Or the application developers convince the DBA that they need more database connections for better performance. Then they go into what the problems actually are with setting it too high. The first one is overloading the database: the more active connections you have, the more they will be actively using the resources of that server. Now, the thing that I've run into is people setting 100, 200, 400 connections. But the thing you have to keep in mind is that whether you have an eight core, a 16 core, or a 32 core server, each of those connections is a process. And if those processes are very actively doing work, you can't have the number of connections too far removed from the number of cores. Now, there is wait time associated with network activity or disk access or things of that nature, where the cores can be doing something else, so they can handle more connections than just one per core. But you have to keep in mind that all of those connections are going to be using the resources of that database server, so you want to be cautious and not overload it, or overload the memory. And with regard to memory, they talk about insufficient resources for each database connection: each connection uses your allocation of work_mem, or more than that work_mem, depending upon the statements that need to be run. So each connection consumes memory, and you want to make sure that you don't run out of memory by setting it too high. Now, they give a formula you could use to derive how many max connections you should have, but mostly I've had to determine it empirically: test out the server, test out different connection counts, and see what the workload needs. But this is definitely one way to look at it and examine it. And then of course, they say that when you want to start having really high connection counts, you're probably going to want to investigate a connection pooler. A lot of application frameworks have built-in poolers, but there are also advantages to having a separate connection pooler. The one that I use the most is PgBouncer, but there's also Pgpool that you can use. And they talk about how you can have many clients connected through PgBouncer utilizing a smaller number of actual database connections, and you typically achieve this using transaction pooling. So if you want to learn more about max_connections and setting it, here's a blog post to check out.
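To make that concrete, here's a minimal sketch of inspecting and adjusting the setting; the value shown is purely illustrative, not a recommendation, and remember that changing max_connections requires a server restart:

```sql
-- Check the current limit (default is 100, minus the superuser-reserved slots).
SHOW max_connections;
SHOW superuser_reserved_connections;

-- Raise it cautiously; illustrative value only. This is written to
-- postgresql.auto.conf and only takes effect after a restart.
ALTER SYSTEM SET max_connections = 200;

-- See how many connections are actually in use right now, and how many are active.
SELECT count(*) AS total,
       count(*) FILTER (WHERE state = 'active') AS active
FROM pg_stat_activity;
```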

[00:03:36] The next piece of content is "6 SQL tricks every data scientist should know", and this is from towardsdatascience.com. They're talking about some SQL tricks you can use if you are a data scientist, but this is equally applicable if you use SQL in your daily work. The first trick they talk about is COALESCE, to basically recode null values into some other reference value. For example, here they coalesce a null value into something called "Missing", so when they query it, it says "Missing". Now, I should say this is for Oracle, so it's not PostgreSQL specific, but I thought a lot of the tips they give would still be beneficial. Next, they talk about how you would do a rolling total and a cumulative frequency, and they show it using window functions; a number of these tricks use window functions. As part of the subquery they built, they're able to calculate the cumulative frequency, partitioning by the join ID they set up. The next trick is finding the record with extreme values without self-joining, and again they're using a window function to achieve that. Next is a conditional WHERE clause, using a CASE statement and a formula to determine what needs to be selected. Fifth is looking at LAG and LEAD to work with consecutive rows; here's an example where they use the previous number as a reference to indicate differences. And the sixth has to do with Python and R, so if you use those, it is potentially relevant. So if you're wanting to learn some more tricks and tips for working with SQL, particularly window functions, definitely check out this blog post.
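To make a couple of those tricks concrete in PostgreSQL terms, here's a minimal sketch against a hypothetical sales table; the table and column names are assumptions for illustration, not taken from the article:

```sql
-- 1. Recode NULL regions as 'Missing' with COALESCE.
SELECT COALESCE(region, 'Missing') AS region, amount
FROM sales;

-- 2. Rolling total per region using a window function.
SELECT region, sold_on, amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY sold_on) AS running_total
FROM sales;

-- 3. Compare each row to the previous one with LAG.
SELECT sold_on, amount,
       amount - LAG(amount) OVER (ORDER BY sold_on) AS change_from_previous
FROM sales;
```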

[00:05:25] The next piece of content is "SQL Optimizations in PostgreSQL: IN vs EXISTS vs ANY/ALL vs JOIN". They're looking at different ways to write particular queries. Here they start with some inclusion queries, and no matter which way you write it, whether it's using IN, ANY, EXISTS, or an inner join to get the answer they're looking for, the PostgreSQL planner is intelligent enough to produce the exact same execution plan. So no matter which way you write it, when you do an EXPLAIN, it gives you the same result. That is good news: it is able to find the most efficient path to give the answer you need. But then they looked at some exclusion queries, using NOT IN, NOT ALL, NOT EXISTS, or a LEFT JOIN where the joined data IS NULL. Here they got a couple of different variations from the planner, and based on what they were interested in, it looks like NOT EXISTS, or the LEFT JOIN where the data IS NULL, tends to give close to the best performance. Now, one variation actually did report something faster, but they said once they increased the number of records, then NOT EXISTS and the LEFT JOIN won. The other thing of note was down in the summary, where they gave some pretty good advice for developers writing good SQL statements. They say to first make a list of the tables from which the data should be retrieved, then think about how to join those tables, and then think about how to have the minimum number of records participating in the join condition. So always think about the minimum number of records you need to pull; that will generally lead to the most efficient query. And then of course, quote, "never assume that the query is performing well with a small amount of data in the table". So it's always advantageous to work on a copy of production, or production itself, in order to gauge what is a good query, because the statistics are different and different decisions will be made by the query planner based upon those statistics. You need to have something as close to production as possible. So this is another good post about SQL to check out.
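As a minimal sketch of the exclusion patterns being compared, with table and column names assumed for illustration:

```sql
-- Orders that have no matching customer, written three ways.

-- NOT IN (beware: behaves surprisingly if customer_id can be NULL)
SELECT o.* FROM orders o
WHERE o.customer_id NOT IN (SELECT c.id FROM customers c);

-- NOT EXISTS
SELECT o.* FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id);

-- LEFT JOIN ... IS NULL
SELECT o.* FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.id IS NULL;
```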

[00:07:34] The next post is "Full Text Search in Milliseconds with Rails and PostgreSQL". Now, this is talking about the Ruby on Rails framework, but most of the post covers PostgreSQL and full text search. They give some comparisons of using LIKE and ILIKE versus the trigram extension for similarity searches, and then full text search, and they show how you can use tsvectors and to_tsquery in order to do a full text search, even including rankings. Once they've covered that, they then use full text search within the Rails application framework. Here they're using a library that makes it easier to work with; it's a Ruby gem called pg_search. They show how to add it to your application, how you can do a basic search and what query it actually runs to return the results, and then how you can configure it to define which columns you're searching against, which dictionary you're using, and even apply weighting. Now, in terms of optimizing it, and they're using Postgres 12 to do this, they're using generated columns. They show a database migration where they add a stored generated column that sets weights: converting the title to a tsvector with the English dictionary and weighting it as A, and the description weighted as B. So they're doing as much work as possible within the generated column without having to resort to triggers. Then of course they applied a GIN index on it and adjusted the application configuration to look at that particular column. After doing that optimization, a query that took 130 milliseconds now takes seven milliseconds. I'd be interested to know what difference the generated column made versus the GIN index, but it's still quite a significant speed boost. So if you're interested in learning more about full text search, it's definitely a blog post to check out. The next piece of content is actually a YouTube video: "Webinar: Security and Compliance with PostgreSQL", by Boriss Mejías, I believe. My apologies if I'm pronouncing that incorrectly. This is from the 2ndQuadrant PostgreSQL YouTube channel, and the webinar is about an hour in length. It covers all sorts of different aspects of setting up the security of your PostgreSQL installation as well as compliance, namely PCI DSS compliance. So if you have an interest in that topic, definitely check out this webinar on YouTube.
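As a rough sketch of that generated-column approach, assuming Postgres 12 or later and hypothetical table and column names rather than the post's exact migration:

```sql
-- Weighted tsvector maintained automatically by a stored generated column:
-- title weighted A, description weighted B.
ALTER TABLE articles
  ADD COLUMN searchable tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B')
  ) STORED;

-- GIN index so full text queries don't have to scan the whole table.
CREATE INDEX articles_searchable_idx ON articles USING gin (searchable);

-- Example search with ranking.
SELECT title, ts_rank(searchable, query) AS rank
FROM articles, to_tsquery('english', 'postgres & search') AS query
WHERE searchable @@ query
ORDER BY rank DESC;
```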

[00:10:16] The next piece of content is "Waiting for PostgreSQL 13 – Add logical replication support to replicate into partitioned tables", and this is from depesz.com. They're talking about how you can actually do logical replication into a partitioned table, which is particularly interesting. This is another feature that's been added to the capabilities of logical replication, so definitely an interesting new addition.
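As a minimal sketch of what that enables, with connection details and names purely hypothetical; the point is that on PostgreSQL 13 the subscriber-side target can itself be a partitioned table:

```sql
-- On the publisher: publish a table as usual.
CREATE PUBLICATION events_pub FOR TABLE events;

-- On the subscriber (PostgreSQL 13+): "events" can be a partitioned table,
-- and incoming rows are routed to the appropriate partition.
CREATE SUBSCRIPTION events_sub
  CONNECTION 'host=publisher dbname=app user=replicator'
  PUBLICATION events_pub;
```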

[00:10:45] The next piece of content is "The Origin in PostgreSQL logical decoding". This is talking about logical decoding, basically the decoding of the WAL files by some other application: it reads the WAL files and decodes logically what changes have been made, for, say, building a change data capture solution. They're talking about the origin, which defines where the data came from, so from what origin it was logically replicated, and they go through a bit of the internals describing how it works. So if you're interested in that, check out this blog post from highgo.ca.

[00:11:22] The next post, also from highgo.ca, is "Replicate multiple PostgreSQL servers to a single MongoDB server using logical decoding output plugin". So that's definitely a specific use case, but if you're interested in that, again dealing with logical decoding, check out this blog post.

[00:11:43] The next post is "Intersecting GPS tracks to identify infected individuals". Now, this is with regard to the sickness that is happening, and using GPS tracking they're showing a solution to essentially do contact tracing, it looks like: setting up data structures within PostgreSQL, setting up sample tracks via QGIS, segmenting the sample tracks to retrieve individual track points, and then intersecting them with an infected individual's track to retrieve possible contacts. So if you're interested in this type of content, definitely check out this post from cybertec-postgresql.com.
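As a very rough sketch of the kind of proximity query involved, using PostGIS with hypothetical table and column names (not taken from the post):

```sql
-- Find people whose track points came within ~10 meters of an infected
-- person's track points. Assumes geom is a geometry in a lon/lat CRS,
-- cast to geography so the distance is in meters.
SELECT DISTINCT t.person_id
FROM track_points t
JOIN track_points i
  ON i.person_id = 42                         -- hypothetical infected individual
 AND t.person_id <> i.person_id
 AND ST_DWithin(t.geom::geography, i.geom::geography, 10);
```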

[00:12:22] The next post is the PostgreSQL Person of the Week is Ilaria Batistan. Please forgiveme for that pronunciation, but if you're wanting to learn more about Ilaria and her work inPostgreSQL and Contributions, definitely check out this blog post.

[00:12:39] The last two pieces of content are from depesz.com, and they cover things we've mentioned in previous episodes of Scaling Postgres: features coming in version 13. The first post is "Allow autovacuum to log WAL usage statistics", which is definitely advantageous. The second is "Add the option to report WAL usage in EXPLAIN and auto_explain". So if you're interested in those topics, definitely check out those blog posts.

[00:13:09] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
