
Non-Relational Data, Contributors, Security Features, High Availability | Scaling Postgres 98

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss storing non-relational data, recognizing contributors, Postgres security features and implementing high availability.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about non-relational data, contributors, security features and high availability. I'm Kristen Jameson, and this is Scaling Postgres episode 98.

[00:00:21] All right, I hope you're having a great week. So we are on episode 98, approaching episode 100. So if you have any ideas of what you think I should do for the 100th episode, please go ahead and leave a comment in the box below. Or if you receive this via email, go ahead and hit reply on that email if you're part of our mailing list, and just let me know what you think I should do for the hundredth episode.

[00:00:46] But to get started for this week, our first piece of content is "Eight Ways RDBMS Postgres Handles Non-Relational Data". And this is from the EnterpriseDB.com blog. Now, this is actually a brief synopsis of a presentation, and there's actually a link to the presentation slides, and I actually like those a little bit better.

[00:01:10] These are on momjian.us, his site, so you can see in more detail kind of what was covered. And this presentation just talks about how Postgres stores non-relational data. So something apart from your typical string of text or integer or boolean, things of that nature. And he covers these areas here: arrays, range types, geometry, XML, JSON, JSONB, row types (which are pretty much custom types, from my interpretation) and character strings.

So arrays: basically you can store arrays, so more than one value within a column or within a field, which kind of goes against some of the normal forms of a relational database. But Postgres allows you to do this, and he goes through different ways you can work with that array in terms of accessing particular elements of the array, doing unrolling, creating them, aggregating them, things of that nature.

The second one is talking about range types, so you have something with a start and an end, and you can have non-bounded ranges as well. He tells you how you can query it to say, does this value exist in a particular range? And he also mentions that you would want to use a GiST index when using a range type. That's the most efficient index to use. And he goes over exclusion constraints and how you could see if something is going to be double booked. For example, you can add an exclusion constraint to be able to handle something like that.

The third area, he talks about geometry, and how if you're using PostGIS you can store geometric shapes within Postgres and how best to index them, again using the GiST index for that purpose. Then he talks about XML and the XPath query, on how you can work with XML that you're storing in the database, and conversion of different values from XML. Then he goes into JSON, and this is a JSON data type that's different from JSONB. It basically stores the whole JSON structure as it's represented and doesn't do a lot of changes to it.
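The double-booking example mentioned above can be sketched with a range type and an exclusion constraint. This is a minimal illustration, not taken from the presentation itself; the table and column names are invented, and it needs a running PostgreSQL server with the btree_gist extension:

```sql
-- btree_gist lets the exclusion constraint compare the plain integer
-- column (room_id) alongside the range column in one GiST index.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE room_booking (
    room_id int     NOT NULL,
    during  tsrange NOT NULL,
    -- Reject any new row whose room matches an existing row (=)
    -- and whose time range overlaps it (&&), i.e. a double booking.
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

INSERT INTO room_booking VALUES (1, '[2020-01-06 10:00, 2020-01-06 11:00)');

-- This overlapping insert for the same room would raise an
-- exclusion-constraint violation:
-- INSERT INTO room_booking VALUES (1, '[2020-01-06 10:30, 2020-01-06 11:30)');
```

The half-open range notation `[start, end)` means back-to-back bookings that share a boundary do not count as overlapping.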
And he mentions that there's over 100 different JSON functions you can use to work with JSON. And he goes over different ways of loading it, presenting it, accessing it, concatenating it, things of that nature, as well as calculations.

Then he goes into JSONB. Think of it as binary JSON. So it supports indexing of all keys and values for efficient retrieval of information. It's stored in a compressed format, but it doesn't preserve the original JSON: it doesn't preserve the key order, doesn't preserve whitespace, and retains only the last duplicate key. So this is something definitely to keep in mind depending upon how you're going to be using the JSON. And he goes over and shows the different examples here. And of course, when you're working with JSONB, generally you would want to use a GIN index for search purposes.

Then he talks about row types, and essentially these are like creating your own custom types. So he created a custom type, driver's license, that is actually composed of three different subtypes, as it were: a character, an integer and a date. And then it goes over character strings and how to work with character strings, how to query them, how to handle case situations, how best to do indexes, and then following up with a talk on full text search, using tsqueries and tsvectors in order to handle full text search capabilities. So if you're interested in this content, definitely it's a great presentation to check out.

The next post is "Contributors Team Redux". And this is from Robert Haas at rhaas.blogspot.com. And this is a follow-on to a previous blog post that we mentioned on Scaling Postgres. But this is talking about the difficulty of how to address contributors to Postgres, and not even just Postgres, but also libraries or tools associated with Postgres.
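The JSONB behavior described above can be sketched like this. The table and document contents are invented for illustration, and it assumes a running PostgreSQL server:

```sql
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- A GIN index over the whole document indexes all keys and values.
CREATE INDEX events_payload_gin ON events USING gin (payload);

INSERT INTO events (payload)
VALUES ('{"type": "signup", "plan": "pro"}');

-- A containment (@>) query that can use the GIN index:
SELECT id FROM events WHERE payload @> '{"type": "signup"}';

-- JSONB does not preserve key order, whitespace or duplicate keys;
-- only the last duplicate key survives:
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- result: {"a": 3, "b": 1}
```

If you need the document stored byte-for-byte as it arrived, the plain `json` type is the one that preserves it.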
And then, how about their contribution? Meaning differentiating between a contributor and a major contributor, and what happens if they're a major contributor but they're not contributing that much today. And just all of the difficulty of giving, essentially, credit to people who are really helping make Postgres the way it is today. And he is also asking for feedback on how to make this better. So this is basically just an essay on the struggles of trying to recognize people appropriately and how best to handle it. So go ahead and give this a read, and if you have any thoughts you can contribute a comment or reach out to him if you so choose. But definitely, since Postgres is a community-built and supported database system, it's a blog post to check out.

The next post is actually a series of three posts, parts one, two and three, called "Understanding Security Features in PostgreSQL". And this is from the Highgo.ca blog, and he's basically covering different security areas of Postgres, mostly with regard to network security and some of the roles. So, for example, part one here covers listen_addresses, meaning what addresses Postgres listens on; host-based access, in terms of how you're granting people the ability to actually connect over the network to the database system, or even locally connect to the database system; how you can integrate it with an LDAP server or PAM; and then going into roles, users and privileges and how to set those up to grant access to the system.

Now the second part, which is this blog post here, part two, is all about essentially SSL or TLS and how public key cryptography works. And this doesn't have a lot directly relevant to Postgres. It's all about setting up certificates and public key encryption.
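The roles, users and privileges setup from part one of that series can be sketched as follows. The role, database and table names are hypothetical, and the pg_hba.conf line is only an illustrative example of host-based access:

```sql
-- A login role with no special attributes beyond the ability to connect:
CREATE ROLE app_reader LOGIN PASSWORD 'change-me';

-- Grant only what's needed: connect, use the schema, read one table.
GRANT CONNECT ON DATABASE mydb TO app_reader;
GRANT USAGE ON SCHEMA public TO app_reader;
GRANT SELECT ON public.orders TO app_reader;

-- Host-based access lives in pg_hba.conf rather than SQL; a rule
-- restricting this role to one subnet might look like:
--   host  mydb  app_reader  10.0.0.0/24  scram-sha-256
```

The general principle the series emphasizes is least privilege: start from no access and grant narrowly, rather than granting broadly and revoking later.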
But this is a good thing to read before reviewing part three, which is definitely more PostgreSQL focused, because it's talking about how you can prepare your TLS certificates, setting up transport layer security on the server and on the client, and giving examples of how you can connect. And it even talks a little bit about transparent data encryption. This is where data is encrypted at rest, and essentially Postgres can't handle that yet. There is a plan to add some features in version 13 and additional features in version 14 to have it ready by that time. But he talks about what it is and what it does. So if you're interested in learning more about the security aspects of Postgres, or there are particular areas you're wanting to increase your knowledge on, it's definitely a set of three posts you may want to check out.

The next post is "How to Automate PostgreSQL 12 Replication and Failover with repmgr - Part 1". So repmgr is a replication manager that was set up by 2ndQuadrant. And this blog post goes into a lot of detail about how to set up a high availability system using it, specifically with Postgres 12. And they're talking about setting up three different nodes in what looks like three different AWS availability zones. So one will be a primary, and then two standbys, or two replicas, that will follow the primary. And then there will be an election process that repmgr uses to determine if one of the standbys or replicas should take over the job of the primary. So it goes into a lot of depth on how to set this up and how to configure all the different features to get started. So if you're interested in high availability and want to potentially use repmgr, definitely check out this blog post.

A related blog post, but this one relying on Kubernetes and Crunchy Data's PostgreSQL Operator, is another way of achieving the same means. So this blog post is "Deploy High Availability PostgreSQL Clusters on Kubernetes by Example". So they're using again the same concept of three different nodes. There's one primary and two replicas, but it's being done on Kubernetes, and they have a load balancer that kind of handles these systems, and the operator coordinates setting this up so that you have an election to make sure there's only one primary and the other replicas follow it. And they're using the distributed consensus store of Kubernetes to be able to handle this. So again, it's another way of getting high availability. So if you wanted to look at it from a Kubernetes angle, definitely check out this blog post from Crunchydata.com.

The next blog post is "jsonb_set_lax". This is from 2ndQuadrant.com, and they're talking about the jsonb_set function. And they say they've had some complaints that if the value argument of the function is null, the result will be null. Now, the reason this is happening is because this function is declared as strict, which basically means it will return null if any input is null. But some people wanted slightly different behavior, so they introduced this function, jsonb_set_lax, which is scheduled for version 13 of Postgres. With it, you can add an additional parameter that allows you to either use the default, which is to use a JSON null, or to raise an exception, return the target, or delete the key. So if you use this function and want this additional functionality in version 13, definitely keep an eye out for it.

[00:10:41] The next blog post, also covering something coming in version 13 of Postgres, is "Waiting for PostgreSQL 13 - Allow VACUUM command to process indexes in parallel". This is from depesz.com, and basically, for the index vacuuming portion, you can assign multiple workers to vacuum indexes in parallel. And you can either set the number explicitly or rely on the max parallel maintenance workers setting to determine how many workers will work on it. Now, he has built out an example here of a test table that he developed, along with seven indexes; he disabled autovacuum, did some deletes of the data, and then went ahead and ran vacuum and did timing tests. So when he was doing no parallel vacuum, it ran in about 25 minutes. With two workers it ran in eleven minutes, so more than twice as fast. And with more workers it ran a little bit slower, in about twelve minutes. So not quite sure why it was a little bit longer there, and he doesn't know why. I mean, this is something in development, so it's essentially at a beta stage, so maybe it will be improved. But there was definitely a big win going from zero parallel workers to two parallel workers. So definitely a feature to look out for in the upcoming Postgres 13.
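The parallel vacuum option being described can be sketched like this. The table name is hypothetical, and it assumes a PostgreSQL 13 or later server:

```sql
-- Explicitly request up to 2 parallel workers for the index phase:
VACUUM (PARALLEL 2, VERBOSE) big_table;

-- Or let Postgres choose a degree on its own, bounded by this setting
-- (and by the number of indexes eligible for parallel processing):
SET max_parallel_maintenance_workers = 4;
VACUUM big_table;

-- PARALLEL 0 disables parallel index vacuuming entirely:
VACUUM (PARALLEL 0) big_table;
```

Only the index-vacuuming phase is parallelized, which is why a table with several indexes, like the seven-index test above, is where the speedup shows.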

[00:12:07] And the last post is "pgwatch2 v1.7.0 released". So this is a means of doing monitoring of your PostgreSQL installation. So if you use this or want to consider using it, they've recently upgraded the version, adding log parsing, a recommendations engine, real time stats tracking, and some new dashboards. So if you're interested in learning more about it, definitely check out this blog post and their tool. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or iTunes. Thanks.

