
Date Time Bins, Search Every Field, JSON_TABLE, Picking Primary | Scaling Postgres 210


In this episode of Scaling Postgres, we discuss how to bin date times, search every field of a table, use JSON_TABLE and methods for a client to pick the primary database.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "POSTGRESQL: MAPPING TIMESTAMPS (DATE_BIN)". This is from cybertec-postgresql.com, and they're talking about the function date_bin. Basically, it lets you place a given timestamp into a particular bin. So this is more sophisticated than the date_trunc function, where you're just truncating the value; you can actually put timestamps into specific bins. And so the first parameter is the bin size.

So it can be something that date_trunc can't handle, like 30 minutes, 15 minutes, 5 minutes, whatever binning you want. Then you specify in the second parameter the value that you want to bin. The third is the starting point, or origin. So if you choose an even time here, it'll bin every 30 minutes starting at that exact time. But in the other example here, if you're binning every 20 minutes, you can have an entirely different start time. So if the value is 14:49, it'll go into the 14:31 bin, or if it's 14:54, it'll go into the 14:51 bin, based on the bin size as well as where the bins start. So this is an interesting function. If you want to learn more, you can definitely check out this blog post.
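To make the binning concrete, here is a minimal sketch of date_bin; the dates are illustrative, but the times mirror the 20-minute example above, where bins fall at :11, :31, and :51 past the hour:

```sql
-- 30-minute bins anchored at midnight: 14:49 falls into the 14:30 bin
SELECT date_bin(INTERVAL '30 minutes',
                TIMESTAMP '2022-04-10 14:49:00',
                TIMESTAMP '2022-04-10 00:00:00');
-- 2022-04-10 14:30:00

-- 20-minute bins with an offset origin of 14:11: 14:49 falls into the 14:31 bin
SELECT date_bin(INTERVAL '20 minutes',
                TIMESTAMP '2022-04-10 14:49:00',
                TIMESTAMP '2022-04-10 14:11:00');
-- 2022-04-10 14:31:00
```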

The next piece of content- "DBA tips 'n tricks- search in every field in a table". This is from depesz.com and he's talking about how you can quite easily search an individual column for a particular value. So that's easy to do. But what if you had a lot of columns and you wanted to search for the same value in every field? Well, there's actually a row data type. In his example, he's selecting from pg_user: you could "select pg_user from pg_user" and it gives you one column being returned with all the fields in it for each row. Now, to make it easier, he aliased it to U. So he's selecting U from pg_user U and he's just limiting it to return one record. But once you have this, you can then cast it to a text value and do that same search. And it's going to look through the text of that row to see if it can find, for example, 22 somewhere in there. Now, of course, the thing to keep in mind is that this is not going to use an index, but it could be a helpful way to look through any column of a table. If you want to learn more, definitely check out this blog post.
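The technique sketched above boils down to casting the whole row to text and searching that; something like the following, using the same pg_user view he demonstrates with:

```sql
-- Cast each row's composite value to text and search it for a value.
-- Note: this scans every row and cannot use an index.
SELECT u
FROM pg_user u
WHERE u::text LIKE '%22%';
```

The same pattern works against any table, since every table automatically has a matching row (composite) type.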

The next post, also from depesz.com, is "Waiting for PostgreSQL 15- JSON_TABLE". This is a new patch that was added as part of the SQL/JSON improvements that are coming, and it basically lets you convert JSON into a relational table that PostgreSQL can then query, with different ways that you can render out the table. You can handle nested JSON as well. So a pretty interesting function, and if you want to learn more about it, definitely check out this blog post.
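As a rough sketch of what this looks like, assuming the standard SQL/JSON syntax the patch implements (column names and data here are made up for illustration):

```sql
-- Turn a JSON array of objects into rows with typed columns
SELECT jt.*
FROM JSON_TABLE(
  '[{"name": "alice", "age": 30},
    {"name": "bob",   "age": 25}]'::jsonb,
  '$[*]'
  COLUMNS (
    name TEXT PATH '$.name',
    age  INT  PATH '$.age'
  )
) AS jt;
```

Each element matched by the path expression becomes one row, with the COLUMNS clause mapping JSON fields onto ordinary typed columns.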

The next piece of content is actually not a new post, but I've recently done some work where I was questioning how I was picking the primary database in a primary replica setup or a primary standby setup. For a long time, I was using things like virtual IPs or Elastic IPs, or even using DNS and switching the DNS names. But that gets into an issue with TTL times. So basically, it was the infrastructure that was determining which Postgres database was the primary one. Now, a lot of this is just due to history, because before version 10, there was no capability in Postgres for the client to determine which one was the primary. So these were the solutions that were used. But as of version 10, you could designate a client to talk to more than one database at a time, and it would pick the connection based upon whether it wanted a read/write database or any type of database.

I've never really used it for determining which database was the primary one, or how the client actually determined which one to go to. I've always used a single host in the client configuration, but I recently moved to using multiple hosts in the client configuration and it made things a lot easier. So basically, in the client connection string, you can include more than one host, and then with the option target_session_attrs, you can determine whether it's going to connect to any host or a read/write host. Up through version 13, those were the only two choices. Version 14 added four additional options: read-only, primary, standby, and prefer-standby. So you can use this capability as of version 10, but the enhancements in 14 essentially make it even better. So on a recent project, I actually moved to using the client to determine whether the server is primary or not, instead of having the network infrastructure abstract that away.

So basically, the client is aware of every single host running Postgres, and I'm using a private DNS to tell it what hosts exist and the port numbers. But then it's the client configuration in target_session_attrs and the client capabilities that determine which one is the primary. So if something fails over, the client knows it needs to look for the new primary. I found this really easy and smooth to set up, so I wanted to mention this as another option you should potentially consider in terms of allowing your client to pick who the primary is as opposed to explicitly telling it. Now, all of these different options exist in the Postgres 14 client, so you don't even need to use a Postgres 14 server, although the 14 server does give some added benefit in the form of slightly reduced traffic when assessing whether a particular server is a primary or not. But as early as version 10, you could connect to any database or a read/write database, so that may be sufficient for your use case.
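The setup described above comes down to a multi-host libpq connection string; the host names here are hypothetical, and libpq tries each host in order until it finds one matching target_session_attrs:

```
# Keyword/value form: always connect to whichever host is the primary
host=db1.internal,db2.internal,db3.internal port=5432,5432,5432 dbname=app user=app target_session_attrs=read-write

# Equivalent URI form
postgresql://app@db1.internal:5432,db2.internal:5432,db3.internal:5432/app?target_session_attrs=read-write
```

On a 14 client you could use target_session_attrs=primary instead of read-write, or standby / prefer-standby to route read-only traffic at the replicas.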

There were two posts that discussed this in detail. One back in March of 2021 talking about this new feature by cybertec-postgresql.com, and then another post by postgresql.fastware.com on May 27, 2021 that talked about these features. Now, again, this makes failover much easier because the client can just follow appropriately. The only thing to keep in mind is that you do need to restart a session, because that's when it determines what it's connecting to. So for example, if you have a client that's set to only connect to standbys and you promote one of those standbys while some clients are talking to it, they'll still be connected to that server, which is now the primary, until their connections are reestablished. But there are a lot of possibilities in using this, and I wanted to bring it up to indicate that this could be a viable solution for clients to determine who the primary is as opposed to explicitly telling it via the network infrastructure. But if you want to learn more, definitely check out these pieces of content.

The next piece of content- "5mins of Postgres E12: The basics of Tuning VACUUM and autovacuum". This is from pganalyze.com, and this essentially goes over the post that was discussed last week on Scaling Postgres and was posted to the Postgres EDB blog talking about vacuum and autovacuum. And this was a great post that covered all of that configuration. So if you want another viewpoint on that content, definitely check out this post.

The next piece of content- "On Shapefiles in PostGIS". This is from endpointdev.com and they're talking about shape files and using them within PostGIS and the process you have to go through to get that set up. So if you're interested in that, you can check out this blog post.

The next piece of content- "How to enable SSL for Postgres connections". This is from dev.to. The post describes exactly how to do that, from setting up certificates (you may need to set up your own private CA and certificates), through the commands you can use and how you configure PostgreSQL, to the changes you need to make to the pg_hba.conf file, as well as updating PgBouncer so it can also speak securely. So check out this blog post if you want to learn more.
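As a rough illustration of the pg_hba.conf side of this, a hostssl rule is what forces clients on a given network to use TLS; the address range and auth method here are just examples:

```
# pg_hba.conf: require TLS for all remote connections (illustrative)
# TYPE     DATABASE  USER  ADDRESS       METHOD
hostssl    all       all   0.0.0.0/0     scram-sha-256
```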

Next piece of content- "DLE 3.1: pgBackRest, timezones for CLI, DLE community". This is from postgres.ai and they're talking about a new release of their open-source software, the Database Lab Engine, which basically does thin cloning of databases for testing or development purposes. So if you're interested in that, you can definitely check out this new release.

The next piece of content, the PostgreSQL person of the week is Taras Kloba. If you're interested in Taras and his contributions to Postgres, definitely check out this blog post.

The last piece of content. We did have another episode of The Rubber Duck Dev Show this past Wednesday. This one was on "Working With Outside Teams". So a more project management-type episode. So if you're interested in that type of content, definitely check out our show.
