background

Chaos Order, Pub Sub, Petabyte Scale, PgBouncer Connections | Scaling Postgres 90

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss chaos order, pub sub, petabyte scale analytics and PgBouncer connections.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres we talk about Chaos Order, pub sub, petabytescale and PG bouncer connections. I'm creston, Jameson. And this is scaling postgres episode90.

[00:00:20] Alright, I hope you're having a great week. Our first piece of content is thatPostgreSQL versions have been released. So twelve point 111.610.9 and some of the nineseries have all been released and it's predominantly bug fixes and improvements so not reallyany security issues. And also they did mention in this coming February PostgreSQL version 9.4will be End of Life. When you get an opportunity, go ahead and upgrade your version. The nextpost is chaos. Order randomizing queries to uncover order dependency. And this is fromSimplethread.com and basically it's talking about the concept that unless you specify an order,there's no guarantees as to what order in which you're going to get returned data from an SQLquery. So for example, this first query we are asking for one user from the users table, there's noguarantee that you're going to get ID One or ID five. You could get ID three, you have no idea.Now in a development environment maybe you're going to get it to see some consistency withregard to this, but there's no guarantee what you're going to be returning. Therefore you shouldalways use an order by. For example if you have an incremental ID such as this, maybe youwant to order by the ID or if there's a date and timestamp, maybe you want to order by the dateor timestamp that's available. But this blog post goes into a way to actually insert randomnessinto your tables to be able to sleuth out potential errors that could be caused by not using orderby for all your queries. So this imagine a very large code base and you're wanting to say is itresilient to these changes in order by potentially. So he has this PostgreSQL function thatactually adds a random column and then you can cluster this table by this random column tochange the actual order of the table because clustering actually changes the physical order ofthe rows on disk. So if you order by this random value you get to see when you do select allfrom users that the ID is now essentially randomized. Now if you do new inserts and things ofthat nature you're going to have to reorder it again. But still this is a tool to test and make surethat your applications and really all your queries are using order buys when necessary to ensurethe proper orders being returned. So if you're interested in learning more about that, definitely ablog post to check out. The next post is system design hack. Postgres is a great Pub sub andjob server. So this is from Layerci.com and it's talking about how they've chosen to usePostgreSQL as their Pub sub server and even potentially a job server. And it says you have a lotof options. So they mentioned Kafka roberta MQ. Redis Pub sub Cloud Solutions But you canuse Pub sub with Postgres. And how they've specifically used it is that they have CI workflows,continuous integration workflows they run through and they have a certain status. So an APIcreates a CI run that has a status of new, and then a worker grabs that new piece of work andchanges the status to init and then follows up with the status of running. So they show herewhere they have created their own type with specified states as an enum and all the differentstatuses that it can be. New, initializing, initialized running, success error. And then they have aCI jobs table. And on an API call, a new row is inserted into this table. And then a worker claimsa job by doing an update, setting the status to initialized. And they're choosing the IDs and usingfor update skip locked to do it. So they're skipping those rows that are already locked. And thenthey have a function that actually publishes to the CI Job status channel once the status hasbeen inserted or updated as a part of the trigger on that table. And then they show the clientcode that actually consumes or monitors this channel that then picks up the job to do the work.And they have a graph here that says an API calls doesn't insert into Postgres that fires thetrigger, that notifies the Postgres channel that they've created. And then the innet worker isnotified by that channel that it's listening on and it claims the row with an update, setting thestatus to initializing. And then work continues from there. And then they go into some of theother benefits. So if you're interested in using --Postgres as a Pub sub platform, definitely a blog post to check out.

[00:05:07] The next post is architecting Petabyte scale analytics by scaling out postgres onAzure with the Citus extension. This is from the techcommunity Microsoft.com. Now there seemsto be more so of a marketing piece describing the Cytus extension, but this kind of goes into thestory of the Windows Update platform where they release all of their Windows patches for theiroperating systems and the analytics that they use to track it. And this was before Cytus becamea part of Microsoft, and it describes how they were looking for a solution for their analytics andhow they identified the Cytus extension and PostgreSQL as a potential solution for the analyticalproblems that they were dealing with. So it definitely talks a lot about the business case and kindof how they chose this solution. And of course they eventually purchased them. So clearly it wasdefinitely working out. But it goes into some of the architecture that they use for doing theanalytics for the Windows updates along with the Citus cluster. And then they're also talkingabout building it on Microsoft Azure since it's now supported on azure now. So definitely aninteresting piece of content to check out and read over.

[00:06:21] The next post is actually a YouTube video and it's called Discover What's New inPostgres and Advanced Server Twelve. And this is from the Enterprise DB YouTube channel.And this is a webinar that discusses some of the new features in Postgres Twelve.Predominantly these eight listed here partitioning improvements, b tree improvements,multicolumn most common value statistics inline many CTE queries, prepared plan control just intime, compilation checksum control and re index concurrently. And I believe most of thesefavorite features are from Bruce momgm. So if you want to find out more and get a discussionabout it in a video format, definitely a presentation you can check out.

[00:07:06] The next post is PG Bouncer connection Pooling what to do when persistentconnectivity is lost. Now. This is from Enterprise DB. It's the second post in the PG Bouncerseries where they did an install of it. So this is saying what happens when you do a query againthrough PG Bouncer and you bring down the connection for a minute. And basically whathappens is it hangs. But eventually once the network interface is back up, it continues and doesthe query. But what happens if you actually restart the database, not just bring down thenetwork? Well then you're going to get a fatal error that mentions the server connection crashed,server closed the connection unexpectedly. Now they say one way to get around this is you canactually change the pool mode of PG Bouncer to transactional. So it only presents this messagein the cases where you're using Session Pooling. Now they do say that they have a server resetquery that you can adjust if you're wanting to use Session Pooling. But most use cases I've seenof PG Bouncer are that people want to use Transaction pooling, but with a Transaction poolingenabled you'll see that you won't get that error message anymore. So we did a query, did arestart of the server and then the next query there is no error message.

[00:08:23] The next piece of constant is also related to PG Bouncer and it's webinar use SSLwith PostgreSQL and PG Bouncer follow up. And this is a webinar given by secondquadron.comand it is pretty comprehensive webinar about an hour in duration that talks about using SSL withPostgreSQL. About the first 20 minutes are general SSL teachings explaining the purpose ofSSL and certificates and keys and things of that nature. And it's around the 20 mark that theystart getting into some of the PostgreSQL configurations. And then around about the 45 minutesmark they talk about PG Bouncer specifically and configuring that to use with SSL as well. So ifyou're interested in that type of content, definitely a webinar to check out and you can justregister for using the link right here.

[00:09:14] The next post is waiting for PostgreSQL 13. So they're already on working on 13 andit's introduced the force option for the Drop database command. So before, if you try to drop adatabase and you have a connection active, it won't let you. Personally, I kind of like that as aprotection. But now they offer the option to use force to do it. So you can do a Drop databasedatabase name with force and it will go ahead and dro --p the connections and then drop the database. So another feature coming with PostgreSQLversion 13 and another feature is waiting for PostgreSQL 13 allows sampling of statementsdepending on duration. So this enables two new configuration options log min duration sampleand log statement sample rate. So the setting logman duration statement has a higher prioritywhen compared to the log min duration sample. And whether that allows you to do as itmentions here is that you can use these new settings to, for example, log, as he says here, log10% of queries over 100 milliseconds and then all queries above 500 milliseconds. So it givesyou more granularity with regard to how you want to sample and track slow queries. So definitelya great feature addition for minimizing log files. And both of these two posts waiting forPostgreSQL 13 are from Dep.

[00:10:42] The next post is OpenStreetMap to PostGIS the basics. This is from CybertechnPostgresql.com and this blog post describes how to import OpenStreetMap data into PostGIS forthe purposes of visualization and analytics. So if you're interested in doing that, definitely a blogpost to check out. And lastly, st underscore subdivide all the things this is from the Paul Ramsayblog at Cleverelephant CA and he's talking about the scale of a geospatial data and how trackingall the points of Canada is represented with over a 68,000 point multi polygon. And in terms ofanalysis, subdividing that using the St subdivide function enables speeding things up. So ifyou're interested in doing that, definitely a blog post to check out.

[00:11:38] That does it. For this episode of Scale Postgres, you can get links to all the contentmentioned in the show Notes. Be sure to head over to Scalingposgres.com where you can signup to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes.Thanks. --

episode_image