Data Loading Speed, View Dependencies, Users & Roles, H/A Clusters | Scaling Postgres 81

In this episode of Scaling Postgres, we discuss data loading speeds, view dependencies, users & roles and high availability clusters.

Transcript



[00:00:00] In this episode of Scaling Postgres we talk about data loading speed, view dependencies, users and roles, and high availability clusters. I'm Kristen Jameson, and this is Scaling Postgres episode 81.

[00:00:22] Alright, I hope you're having a great week. The first article is about data loading performance of Postgres and TimescaleDB, and this is from the blog at ceoho.net called Stuff About Things. This is basically a performance investigation comparing insert performance between Postgres and TimescaleDB, and he mentions that Simon Riggs at 2ndQuadrant prompted him to look at the performance of loading a lot of data into Postgres versus TimescaleDB. And I believe it's in reference to this article on the TimescaleDB website, where they see degraded insert performance over time: as you insert millions of rows into the database, approaching essentially a billion rows, they show TimescaleDB does not degrade significantly, whereas PostgreSQL does. So this blog post actually investigates that independently, I guess you might say, to see if this can be reproduced. It looks like he used two different instance types, a high-memory one (an r5) and a high-CPU one, compared version 11 of PostgreSQL and the 12 development version, as well as TimescaleDB 1.2.2 and 1.3, and looked at the performance of loading 4 billion rows into both standard and partitioned tables. What he actually got on the high-CPU instance was 320,000 rows per second for Postgres and 225,000 rows per second for TimescaleDB, and this is an ingest rate of about 100 gigabytes per hour, he indicates here. So he didn't really see the drop-off indicated in the TimescaleDB article. Now, that may have been a different version; maybe they were using version 10 of PostgreSQL. I don't quite remember, and I don't see where they say what version it is. But he goes through the test scenario, then shows how he did the configuration, and then shows the performance here.
So you can see TimescaleDB 1.2.2 comes in around an average of 183,000 rows per second, whereas the different PostgreSQL variations, whether it's a standard table or one partitioned monthly or weekly, on both the 12 development version and version 11, are all north of 200,000, reaching as high as 268,000 rows per second. And then looking at the high-CPU instance, it looks like he just used version 11 of PostgreSQL: standard tables got up to 325,000 inserts per second, whereas the partitioned tables were a bit lower, and TimescaleDB was around 220,000. So I thought it was very interesting that he did not see the same thing. Now again, the tests are different, the methodology may be different, and of course there's more than just ingest rate to consider if you're working with time-series data. But if you're evaluating different database systems and wondering whether you want to use something like TimescaleDB, maybe it makes sense to go ahead and do your own benchmarks with PostgreSQL versus the other solutions you're evaluating, to see whether that gives you better performance and what the difference is. So definitely an interesting blog post to check out.
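If you want to run a rough ingest benchmark of your own, a minimal sketch looks like the following. The table and column names here are hypothetical, not taken from the article; timing the statement (for example with psql's \timing) gives you a ballpark rows-per-second figure on your own hardware.

```sql
-- Hypothetical time-series table for a quick insert benchmark.
CREATE TABLE readings (
    time      timestamptz      NOT NULL,
    device_id integer          NOT NULL,
    value     double precision
);

-- Generate and insert one million synthetic rows in a single statement.
-- Dividing 1,000,000 by the elapsed time gives a rough rows/second rate;
-- bulk methods like this (or COPY) are far faster than row-by-row INSERTs.
INSERT INTO readings
SELECT now() - (i || ' seconds')::interval,  -- spread timestamps out
       i % 1000,                             -- 1,000 fake devices
       random()
FROM generate_series(1, 1000000) AS i;
```

For loading from files, COPY (or psql's client-side \copy) is the usual bulk path and is what serious load tests tend to exercise.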

[00:03:38] The next post is Tracking View Dependencies in PostgreSQL. So this is talking about when you have a table you've created and you create a view on it. If you try dropping the table, it's going to give you a dependency error, because it cannot drop the table while other objects, namely the view, depend on it. And you can't alter the table and drop columns from it either, because you can't drop a column that the view depends on. So in the scenario they mention here, where you want to go from an integer to a bigint, you actually have to remove the view, make the change, and then add the view back. And they mention some very good best practices for views. They say views are good for two things. One, they allow you to keep a recurring SQL query or expression in one place for easy reuse. And two, they can be used as an interface to abstract from the actual table definition, so that you can reorganize the tables without having to modify the interface. And they describe two patterns they see people sometimes use that tend to cause problems: nesting views within views within views, which makes it really hard to understand what's going on, or denormalizing the whole database into one big view and using that for all the queries, where sometimes you get things that just don't work as you expect when you use different WHERE conditions. So basically, keep it simple and don't nest your views in general. Then they go into how views are stored in PostgreSQL. A view is not stored like a function, they mention, where the text is stored; it actually stores a query parse tree. So it does some work ahead of time at the CREATE VIEW stage, for example resolving the search path at the point the view is created, and the objects referred to are tracked by their object ID.
So you can rename objects or columns used in a view definition, as long as you don't actually change the data type, try to drop them, or things of that nature. And they talk about where the dependencies are stored, and then they give you example queries for how you can actually look for dependencies on a view. So if you need to make a change to a table, these queries will help you identify which views depend on it, so that you can adjust those in order to make the changes you need. So if you want more information about views, their dependencies, and how they work, this is definitely a great blog post to check out.
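As a sketch of the integer-to-bigint scenario described above, plus a simplified catalog query in the spirit of the ones in the article (table and view names here are made up for illustration):

```sql
-- A hypothetical table with a dependent view.
CREATE TABLE accounts (id integer PRIMARY KEY, balance numeric);
CREATE VIEW positive_accounts AS
    SELECT id, balance FROM accounts WHERE balance > 0;

-- This fails with a dependency error, because the view uses the column:
-- ALTER TABLE accounts ALTER COLUMN id TYPE bigint;

-- The workaround: drop the view, alter the column, recreate the view.
DROP VIEW positive_accounts;
ALTER TABLE accounts ALTER COLUMN id TYPE bigint;
CREATE VIEW positive_accounts AS
    SELECT id, balance FROM accounts WHERE balance > 0;

-- Finding views that depend on a table via the system catalogs:
-- view dependencies are recorded through the view's rewrite rule.
SELECT DISTINCT v.relname AS view_name
FROM pg_depend  d
JOIN pg_rewrite r ON r.oid = d.objid
JOIN pg_class   v ON v.oid = r.ev_class
WHERE d.refobjid = 'accounts'::regclass
  AND v.relkind  = 'v';
```

The article's queries are more thorough (they handle column-level dependencies, among other things); this is just the basic shape.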

[00:06:07] The next post is Managing PostgreSQL Users and Roles, and this is from the AWS amazon.com site. Now, they have information related to running PostgreSQL on AWS, but as a general overview of roles and users, and of using groups (which are essentially just roles), it has a pretty easy-to-understand methodology that I kind of liked. So first they clarify the relationship between users, groups, and roles, and that all of them are essentially roles. Generally, roles that are users can log in. So, for example, creating a user with its password is the equivalent of creating a role WITH LOGIN, so it can log in; it's a role that can log in. That's the only difference between those two statements, and that's essentially what a user is. Now, in terms of their recommendations, they also talk about the public schema and the public role, and things you would probably want to revoke, like revoking CREATE on the public schema from PUBLIC, as well as revoking ALL on the database from PUBLIC. That's a certain recommendation they have. And then you go into granting permissions from that point. They mention that when you're creating a role for accessing objects, you need to grant privileges on the database, then the schema, and then the individual objects. So in this scenario, when you want a read-only role, first you create your role called readonly. Then you grant CONNECT on the database to this readonly role. So you grant at the database level, then you grant USAGE on the schema, whatever the schema name is, to the readonly role. And at that point you can grant SELECT on the individual tables to the readonly role. Now, if you don't want to do individual tables, they say you can do ALL TABLES. And then, if you want to make sure this also happens for future tables created in the schema, you need to alter the default privileges in the schema for this role.
And then they go over the same process for creating a read-write role that can create objects if you want, as well as INSERT, UPDATE, DELETE, and SELECT on the tables. And once you've done that, for creating users you simply create the user and then grant it a particular role, or revoke a role from it. So I kind of like the methodology that's laid out here, and if you're wanting to update how you're using users and roles, it's definitely a blog post to check out.
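The read-only role walkthrough above can be sketched like this. The database, schema, and user names are placeholders, not from the AWS post:

```sql
-- Tighten the defaults on the public schema and the database itself.
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON DATABASE mydb FROM PUBLIC;

-- Grant privileges top-down: database, then schema, then objects.
CREATE ROLE readonly;
GRANT CONNECT ON DATABASE mydb TO readonly;
GRANT USAGE ON SCHEMA myschema TO readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA myschema TO readonly;

-- Cover tables created in this schema in the future as well.
ALTER DEFAULT PRIVILEGES IN SCHEMA myschema
    GRANT SELECT ON TABLES TO readonly;

-- A "user" is just a role with LOGIN; grant it the group role.
CREATE USER alice WITH PASSWORD 'change-me';
GRANT readonly TO alice;
-- And to take it away later: REVOKE readonly FROM alice;
```

Note that ALTER DEFAULT PRIVILEGES, as written here, only applies to objects later created by the role that ran it; the AWS post discusses this nuance.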

[00:08:33] The next post is How We Built a Scalable On-Premises PostgreSQL Cluster. This is from Medium, on the Enigma blog. They're talking about how they needed to create a PostgreSQL cluster at, I believe, a customer's on-premises site. So they wanted to be able to build it up and have it be highly available. From the application, traffic goes through HAProxy, and they wanted to have a leader or primary database with two replicas, with the whole thing highly available. So they're using Patroni and Consul as the means of leader election. And they go through all the different components and how they set them up. Now, they don't give every single command for how they set it up, but they discuss how they went through and built this for their use case. So if you're interested in developing a high availability solution, maybe this blog post is of interest. I'll also mention that they note at the bottom that going from a single PostgreSQL server to a cluster is not very straightforward, and that's definitely true. They've chosen particular tools, but one option that may be a bit easier was covered in a previous episode: there's an extension called pg_auto_failover, and it seems like a pretty simple solution. Now, it's maybe not as robust as the previous solution, but it does give you a relatively simple way to get some HA capabilities. So again, if you're considering that, maybe look back at that post, which was done in May.

[00:10:09] The next post is SQL Concepts from A to Z. So this was a pretty interesting blog post that basically goes from A to Z, choosing a particular topic for each letter. This isn't PostgreSQL specific; it's about the SQL language, covering a particular concept for each letter of the alphabet. So if you want to check out a pretty interesting blog post, and kind of make sure you know what each of these areas is, A to Z, it's definitely one to check out.

[00:10:40] The next post is Postgres Optimization Tips, Part One: Explain and Analyze. Now, this is what I would call a beginner-level blog post. So if you already use EXPLAIN and ANALYZE, I don't think you'll get too much out of it. But if you're still learning query optimization, it's a good example. They talk through a query that they're using; in this example they're using Ruby, it looks like, with ActiveRecord, so Ruby on Rails, and how they needed to optimize a query. They used EXPLAIN ANALYZE to see what the query was doing, and found it was spending a great deal of time doing a sequential scan. By looking at the query and rewriting it, not necessarily adding a new index, they found a way to get the same information they wanted, and it improved the execution time dramatically. So if you want a little bit more on how you could potentially use EXPLAIN and ANALYZE, it's definitely a post to check out.

The next post is How to Deploy and Manage PostgreSQL on OpenShift Using the Robin Operator. Now, this is outside my area of expertise, because I've never used OpenShift, but if you're looking to deploy on OpenShift, it goes through the whole process to do it. So if this is of interest to you, it's definitely a piece of content to check out.
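The EXPLAIN ANALYZE workflow mentioned a moment ago can be sketched as follows; the table, column, and index names are hypothetical, not from the post being discussed:

```sql
-- EXPLAIN ANALYZE actually executes the query and reports the real plan,
-- row counts, and timings, which is how you spot an expensive
-- sequential scan.
EXPLAIN ANALYZE
SELECT count(*) FROM orders WHERE customer_id = 42;

-- Without a suitable index, the plan will typically include a line like:
--   Seq Scan on orders  (cost=...)  (actual time=... rows=... loops=1)
--     Filter: (customer_id = 42)
-- After adding an index, or rewriting the query, the plan may switch to
-- something like:
--   Index Only Scan using orders_customer_id_idx on orders ...
```

One caution worth knowing: because EXPLAIN ANALYZE runs the statement for real, wrap data-modifying queries in a transaction you roll back.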

[00:11:59] And the last post is Waiting for PostGIS 3: ST_Transform and PROJ 6. Again, PostGIS is not an area of expertise of mine, but PostGIS 3 is coming, and Crunchy Data here continues to produce blog posts about it. So if this feature is of interest to you, it's definitely one to check out.

[00:12:21] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
