background

Account Configuration, Pricing, System Identifier, HyperLogLog | Scaling Postgres 96

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss user account configuration, pricing, the system identifier and the HyperLogLog extension.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about account configuration, pricingsystem identifier and hyperlog log. I'm Kristen Jameson, and this is Scaling. Postgres episode96 one.

[00:00:21] All right, I hope everyone's having a great week. Our first piece of content is my go topostgres configuration for Web services and this is from the tightlycoupled IO blog. And eventhough he's talking about postgres configuration for web services, basically my interpretation ofthis is Web application because he talks about some Ruby, some Python. So basically webapplications, what's the best way to configure your database for that? Now specifically he mostlyfocuses on account based information and some configuration settings related to accounts. AndI would say this piece of content is a must read if you're a developer. I really liked a lot of theway he laid out things and the way that he's chosen to set up his database. First, he runsthrough a few assumptions. The first is that you have super user access to the database,although he said he did do an update for services like RDS and you're developing with a webservice team. Schema changes are done as part of an admin process. So you're only going tobe creating objects, views, tables, sequences, indexes through an admin process and forexample, using Rails migrations or Python migrations or this Sqlize migration. Next assumptionis that you just have one application postgres database and one schema that happens to benamed after the database. But I imagine you could change that if you want to. So basically howhe has set it up is that you have one user that owns all the database objects. So the databaseschema, tables, index is all owned by one user that he happened to call owner. But of courseyou can make that name whatever you would like. And any of these migrations that happen thatchange objects in the database are run by this owner. Next you have a role that is for read writeusers. So they can do select, update, insert, delete, but they can't create objects. Next is youhave read only users so they can just do selects, but they can't do anything that the read writeusers or the owner can do. And essentially what your application connects to the database withis an app user that essentially belongs to the read write role. So again can't create objects. Soyou need to run your application migrations, if any exist, using the owner user. Now, he doesmake some configuration settings at the database level as well as at the user level. Definingstatement timeouts to ensure statements don't run too long a lock, timeout to avoid lock queuesthat something is not locked for too long. Idle in transaction session timeout to make sure youdon't have long running transactions that are idle in transaction. Defining connection limits. Soyou're not using all of your connections for your app user, but you have room to log in as owneror a super user to do certain activities and then defining the search path because he is defininga separate schema as opposed to using a public schemas. Now, he lists a fair amount ofupsides that you're welcome to read through and add a few downsides but I would agree withhim that the upsides outweigh the downsides. And then he goes through the whole process ofhow he runs it, the exact commands to set up the database as well as the users and all thedifferent permissions and he even includes a gist of it here as well. And then at the tail end hehas some considerations about how you do certain things like you need to make certain schemachanges. What are some things you can do to handle local development? If you want to listusers, you want to add new users using this type of scheme. You need to remove users, updatetheir settings or change other settings. So overall this is a really great blog post and if you're adeveloper, I highly suggest you take a read through and see if you want to make changes ormodify how you tend to set up or configure your database when working with web applications.

[00:04:19] The next post is RDS pricing has more than doubled. This is from Rick Branson onMedium and he noticed this very interesting disparity between the EC two cost. This is hourly ondemand costs and RDS. So their hosted database service hourly on demand costs. And thenthese instant types list which generation like the first generation, the price premium for themmanaging the database was 33%, the second generation it's at 35%, 3r --d generation at 39%. But then on fourth generation it jumps up to 75%. And then at the M fivethis is their most recent generation of general use case instances it's 78%. In addition, I checkedpostgres and specifically the postgres RDS. It's even more of a premium, getting closer to about85% for these M five S. Furthermore, looking at the R five S, which are the memory optimized,which you would think people would want a lot of memory for their postgres instance, and ifyou're using R five S, the price premium is closer to almost 100% for postgres. So this issomething I didn't really realize that prices had. Essentially this price premium had gotten largerand larger as they've gone with new instance generations. Hence the title. Why? RDS pricinghas more than doubled. So it's definitely something to keep in mind about. Do you want to relyupon a cloud service provider to manage your database? And clearly if have a large enoughbudget you can do that and not worry about it. But if cost is a concern, managing your own,particularly as these instance sizes grow larger and larger as you scale your application needs,you may want to consider running your own as opposed to relying on a vendor such as AmazonWeb Services. So short but definitely interesting blog post is something I really wasn't aware ofbecause I actually use just standard EC two instances and manage my own databases. I'vethought about using RDS for smaller postgres instances that I use but I really haven't taken thejump yet. But seeing these price differentials, I think I may just keep managing them myself,even for small little used databases.

[00:06:46] The next post is support for PostgreSQL's System Identifier in Barman. So this is fromsecond quadrant, Postgresql.com, and they are talking about barman 20, which is a backup andrestore manager for Postgres. And they're talking about the system identifier, which is a uniqueidentifier to each postgres instance. And they've added support for it to make sure that whenyou're doing backups or restores that the wall files you have match the actual database files soyou don't get a mismatch between them. And they say how you can find out where the SystemIdentifier is. You can use the application PG Control data to get it and it's part of what's returnedthe database System Identifier. Or you can actually do a psql query, querying the PG Controlsystem to get the System Identifier and you can even get it through the replication streamingreplication protocol in looking for Identify System and it will return the System ID. And the rest ofthe post just talks about how Barman has implemented this to again give you those types ofprotections to make sure you're not trying to restore wall files from another instance with anotherassociated backup. So this was an interesting identifier I was not aware of, so be sure to checkout this blog post if you want to learn more about it.

[00:08:08] The next post is postgres hyperlog log extension. And this is from Getwisdom IO andthey're talking about hyperlog log, which is an extension from Citus that actually enables you todo very fast, distinct counts at an approximation. So it's not 100% accurate. And even some ofthe things they mention here quote the catch is you get about a 1.5% accuracy configurable, ofcourse, by taking up more space, so it seems very inaccurate. But again, the more space youdedicate to the statistical data structure should be able to give you more accuracy. But itbasically lets you do very fast distinct counts over large amounts of data and they give anexample of how you can set this up and actually use the Hyperlog log extension as well as somegotchas to be aware of. So if you have a use case dealing with a lot of data needing to getdistinct counts, maybe check out the Hyperlog Log extension and this blog post. The next posthow to Create, Test and debug an extension written in C for PostgreSQL. So this was aninteresting post that basically describes how you can create your own C extension. So it goesthrough over creating the extension actually testing the extension and creating a test script thatwill test it and then doing a debug using the Eclipse IDE. So if you're interested in getting startedwith writing your own extensions for PostgreSQL, definitely a blog post to check out. And this isfrom the Haiko CA website.

[00:09:43] Now a second post from them is how to build and debug PostgreSQL Twelve usinglatest Eclipse IDE --in Ubuntu 18 Four. So if you actually wanted to essentially build from source, this is anotherblog post to check out and do debugging using the Eclipse IDE.

[00:10:00] The next post is PG. Qualstats Two global Indexadvisor. And this is from our jujuGitHub IO. And basically he runs Power, which helps with postgres performance optimizationand monitoring. And there is an index advisor that's a part of that. And it looks like he's rollingout this feature into something he's calling PG Qual Stats Two that basically does a prediction ofwhat indexes are needed for particular queries that are hitting the database. So he basically haspulled this out of the Power product. So if you're interested in using that, he said it's not releasedyet, but basically in a testing phase. So if you're looking for a tool that will enable you to look atdifferent queries and statistics to give suggestions as to what indexes you may need, definitely ablog post and a tool to check out.

[00:10:54] The next post is finally fixed. Time calculations for parallel plans in Explain dep.

[00:11:02] So this is actually a graphical means of presenting an Explain plan. Now, I haven'treally used it, but if you want a graphical representation to use it, and apparently there was atime calculation for parallel plans that had an issue that has been resolved, so feel free to checkthis out if you want to get a more graphical representation of your Explain plans.

[00:11:24] The last post is psql editor fighting with Sublime text under windows. So this isbasically when you're using psql and wanting to define editors, there's a variable you can set,psql Editor and he's talking about using it under Windows and the best way to set it to be able touse, say, Sublime Text. So if you're interested in that, definitely a blog post to check out.

[00:11:47] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show Notes. Be sure to head over to Scalingpostgres.com where you can signup to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes.Thanks like that's. --

episode_image