
Weekly News, Full-Text Search Performance, pg_cron, Bulk Data Loading | Scaling Postgres 139


In this episode of Scaling Postgres, we discuss Postgres weekly news, full-text search performance, enhancements to pg_cron and the best way to bulk load data.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about weekly news, full-text search performance, pg_cron, and bulk data loading. I'm Creston Jameson, and this is Scaling Postgres episode 139.


[00:00:21] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "New PostgreSQL core team members". This is the notification that's been placed up at postgresql.org: Andres Freund and Jonathan Katz have joined the core team. Presumably this is to balance out the number of members that EDB, or EnterpriseDB, had on the core team; they added two new members so that no more than 50% of the membership comes from EDB. So congratulations to Andres and Jonathan.

[00:00:55] The next piece of content, also from postgresql.org, is that they have started a PostgreSQL Weekly News. It looks like some general news, a few bits of product news, and then a lot of patch information about all the different changes that have happened to Postgres. So if you want a lot of detail on what's going on with the Postgres code itself, this could be a good resource for keeping track of it. It says weekly news, so presumably this will be coming out every week.

[00:01:25] The next piece of content is "Full-text search since PostgreSQL 8.3". This is from 2ndquadrant.com, and it's a follow-up to two previous posts on performance since Postgres 8.3 that looked at OLTP workloads and analytical or BI queries; this one focuses on full-text search. Now, he talks a little bit about the history of it and the fact that there isn't really a great benchmark for it, but he had some search queries against posts to the PostgreSQL website, or searches against the PostgreSQL website, so he thought that would make a real-world benchmark. What he tested was GIN indexes and GiST indexes. GIN indexes are generally accepted to be faster for searches, whereas GiST indexes are lossy and require a recheck during searches, so they're a little bit slower. He mentions the hardware used and then goes into the different reports for each version of Postgres. For the data loading, there isn't an appreciable difference between them; early on the duration was a little longer, but nothing appreciable. The index sizes with regard to GIN became more favorable starting around 9.4, the queries got significantly faster around 9.4 as well, and they've generally been trending up slightly over the different versions. That wasn't the case with GiST indexes: the size stayed pretty consistent, but performance has slowly fallen since about 8.3, and that's pretty universal across all the different performance tests he did. So this was an interesting post of benchmarking data you may want to check out, but the overall conclusion is that, in terms of performance, you should generally reach for GIN when doing full-text search (a minimal index sketch appears after this segment).

The next piece of content is "Evolving pg_cron together: Postgres 13, audit log, background workers, and job names". This is from citusdata.com. In 2016 they released an extension called pg_cron, which is a way for you to set up scheduled tasks within the database: you can call this procedure, vacuum this table, or alter this extension based on a crontab-like entry of when these activities should occur. Now, this was an open source project, of course, and they've actually been working with others: Citus Data is a part of Microsoft, and they were working with Amazon's RDS team to add some additions. Generally, I just use cron at the OS level because I don't have a hosted solution such as RDS or Azure Database for PostgreSQL. However, if you are using one of those, then having an extension such as pg_cron is advantageous. So they've made a number of additions to make this easier. First, it gives you an audit log of the jobs that have run, stored in the database tables themselves, so you can see what jobs have run, their level of success, and the current status. They've also added an option to use background workers so you don't have to configure connection information; that way you don't have to make changes to your pg_hba.conf file to use this extension. They also added the ability to give names to particular jobs, as well as Postgres 13 support. So it's a very interesting product for running scheduled database jobs when your database is hosted at a cloud provider such as Microsoft or Amazon. If you're interested in this extension, definitely check out this blog post.
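To make the pg_cron workflow concrete, here is a minimal sketch; the job name, schedule, and table are hypothetical, but cron.schedule, the cron.job table, and the cron.job_run_details audit log are the extension features the post describes.

    -- Load the extension (pg_cron must be in shared_preload_libraries).
    CREATE EXTENSION pg_cron;

    -- Hypothetical job: vacuum a table every Saturday at 3:30am, using
    -- the new named-job form.
    SELECT cron.schedule('weekly-vacuum', '30 3 * * 6', 'VACUUM my_table');

    -- Inspect the scheduled jobs and the new audit log of past runs.
    SELECT * FROM cron.job;
    SELECT * FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;

And going back to the full-text search benchmarks, the GIN-versus-GiST comparison comes down to which access method backs the tsvector lookup. A minimal sketch, assuming a hypothetical posts table:

    -- GIN index over a tsvector expression; GIN is generally the faster choice.
    CREATE INDEX posts_fts_idx ON posts
        USING gin (to_tsvector('english', body));

    -- The query must use the same expression for the index to be considered.
    SELECT id, title
    FROM posts
    WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & performance');

    -- The GiST variant differs only in the access method, but it is lossy
    -- and must recheck candidate rows:
    -- CREATE INDEX posts_fts_gist ON posts
    --     USING gist (to_tsvector('english', body));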

[00:04:59] The next piece of content is "Webinar: Best Practices for Bulk Data Loading in PostgreSQL - Follow Up". This is from 2ndquadrant.com, and it's a webinar they put on; you can click here to gain access to it. It talks about the fastest way to bulk load data into Postgres. Basically: make sure you don't have triggers, foreign keys, or indexes on the table, and add them after you've loaded the data; and COPY is a much better way to load data than INSERT statements, because the general rule is that a single-row INSERT per commit is one of the slowest approaches, inserting multiple rows per commit is faster, and COPY, of course, is the fastest. There are also possibilities for using parallelism. They even talked about a technique where you use an unlogged table to load the data, do an ANALYZE on it, and then do an INSERT ... SELECT to place it in its final destination. So if you're interested in speeding up your bulk data loads, definitely check out this webinar.
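Here is a minimal sketch of that staging technique, assuming a hypothetical target table named items and a CSV file on the database server; COPY, unlogged tables, and INSERT ... SELECT are all standard Postgres.

    -- Stage into an unlogged table: the bulk write skips WAL, which is much
    -- faster (the staging data would be lost on a crash, which is fine here).
    CREATE UNLOGGED TABLE items_stage (LIKE items INCLUDING DEFAULTS);

    -- COPY is far faster than row-by-row INSERT statements.
    COPY items_stage FROM '/tmp/items.csv' WITH (FORMAT csv, HEADER true);

    -- Give the planner fresh statistics before the final move.
    ANALYZE items_stage;

    -- Move the rows to their final destination, then clean up. Add indexes
    -- and foreign keys to the target table only after the load completes.
    INSERT INTO items SELECT * FROM items_stage;
    DROP TABLE items_stage;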

[00:06:02] The next piece of content is "Having a second look at Postgres v13 improvements". This is from cybertec-postgresql.com, so it's further Postgres 13 improvements. Now, we've covered some of these before, but they cover generating a backup manifest file for base backups, and then the pg_verifybackup tool to verify that those files exist and that you have the WAL files to do the restore. They cover allowing pg_rewind to use the target cluster's restore_command to retrieve needed WAL, so you can point at a WAL destination and it can find any WAL it needs for the rewind process, and an option for pg_rewind to configure standbys, which is basically being able to write the recovery configuration or create the signal files necessary to do a restore. There's allowing reindexdb to operate in parallel, so parallel reindexing, and using the directory of the pg_upgrade program as the default new bindir setting when running pg_upgrade; there are some advantages to this, as they say, for small web console emulators, since it makes your pg_upgrade command a bit shorter. There's allowing dropdb to disconnect sessions using the target database so the drop can succeed; this is a command to be aware of, since it lets you drop a database that still has active connections. There are some changes to pgbench to make it easier to work with, and then max_slot_wal_keep_size, so that if a replica falls too far behind, it can basically be dropped out of being a replica instead of retaining WAL without bound. They also covered improved retrieval of the leading bytes of TOASTed values for better performance, incremental sorting, which we've discussed previously in terms of speeding up multicolumn sorts, and control over how much memory is used by logical decoding before it spills to disk. And lastly, they mentioned adding the backend type to the CSV log output; that's something to be aware of, as they mention the tool pgBadger was caught off guard by it. So if you're interested in reviewing these changes, check out this blog post.
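Two of those version 13 items are easy to show directly; the database name and the 10GB cap below are hypothetical values, while DROP DATABASE ... WITH (FORCE) and max_slot_wal_keep_size are the features described above.

    -- Postgres 13: drop a database even if sessions are still connected.
    DROP DATABASE mydb WITH (FORCE);

    -- Postgres 13: cap how much WAL a replication slot may retain, so a
    -- replica that falls too far behind is invalidated instead of filling
    -- the primary's disk (the default of -1 means unlimited retention).
    ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
    SELECT pg_reload_conf();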

[00:08:06] The next piece of content is actually a YouTube channel, the Percona YouTube channel. They've held Percona Live Online, and a few of the presentations that have been posted relate to PostgreSQL: one is "A look at the Elephant's trunk - PostgreSQL 13", and another covers the democratization of databases. So there are a few postings, and there may be some more Postgres-related ones in the following days.

[00:08:32] The next piece of content is "Provisioning a PostgreSQL cluster with Terraform and Ansible". This talks about using Terraform and Ansible to set up Postgres infrastructure, and it has links to the saved deployment scripts for Terraform and the playbooks for Ansible. So if you're interested in using these tools, you can check out this post. Related to it is a YouTube video that discusses the same thing, "Automating a PostgreSQL High Availability Architecture with Ansible". Both of these are from edb.com, and they talk about using Terraform and Ansible to deploy a high-availability PostgreSQL cluster. Now, they said that when they surveyed which DevOps tools their clients were using, over 50% of the responses were Terraform and Ansible, so that's probably why they're using these tools. If you're interested in doing this, definitely check out these two pieces of content.

[00:09:28] The next piece of content is "Random numbers". This is from 2ndquadrant.com. They're talking about a way to generate random data to populate a database for testing purposes, say, billions of rows, and they want it to be reproducible: random, but reproducible by using some sort of seed. So he's using a pseudo-random number generator to do that, and he tested a number of different generators and showed some of their performance. So if you're interested in doing the same thing, you may want to check out this post from 2ndquadrant.com.
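A minimal sketch of the seeded approach, with a hypothetical table; note that setseed only affects random() for the current session, so the seed and the generating query must run on the same connection.

    -- Seed the pseudo-random number generator so the run is reproducible.
    SELECT setseed(0.42);

    -- Generate deterministic "random" test data.
    CREATE TABLE test_data AS
    SELECT i AS id,
           random() AS score,
           md5(random()::text) AS label
    FROM generate_series(1, 1000000) AS i;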

[00:10:03] The next piece of content is "Election Night Prediction Modeling using PL/R in Postgres". So there was a tweet that talked about election night forecasting using R, and this blog post from crunchydata.com took that R code and put it into Postgres to do the analysis directly, because PL/R enables you to run R in Postgres as part of a function. So they set that up here, and there's a lot of code with it, but it helps you do this election night prediction. If you're interested in that, check out this post.
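For a sense of what PL/R looks like in general, here is a minimal sketch, not the post's actual model; the function name and arguments are hypothetical, and it assumes the plr extension and R itself are installed on the server.

    CREATE EXTENSION plr;

    -- The body between the dollar quotes is R; named arguments are
    -- available inside the R code as variables of the same name.
    CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, sd float8)
    RETURNS float8[] AS
    $$
      rnorm(n, mean, sd)
    $$ LANGUAGE plr;

    -- Five draws from a standard normal, returned as a Postgres array.
    SELECT r_norm(5, 0.0, 1.0);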

[00:10:38] The next piece of content is "Monitoring PostgreSQL Cluster via Pgpool-II with Prometheus". They're talking about a Pgpool-II exporter that sends monitoring data to Prometheus so you can analyze it alongside your Postgres instance. So if you're interested in setting this up for Pgpool-II, check out this blog post from b-peng.blogspot.com.

[00:11:00] The next piece of content is "How to Analyze a PostgreSQL Crash Dump File". They show how you can use GDB to debug a crash dump file from Postgres. If you're interested in that, definitely check out this blog post from highgo.ca.

[00:11:16] The next piece of content is "Find Your Local SRID in PostGIS". This is from Rustproof Labs. So if you're interested in doing these types of geographic analyses, definitely check out this blog post.
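As a rough sketch of what an SRID lookup involves (the search term and SRID below are illustrative examples, not taken from the post): PostGIS ships a spatial_ref_sys catalog you can query, and ST_Transform reprojects geometries into whatever SRID you pick.

    -- Search PostGIS's catalog of coordinate systems for a local projection.
    SELECT srid, auth_name, auth_srid
    FROM spatial_ref_sys
    WHERE srtext ILIKE '%Colorado Central%';

    -- Reproject a geometry column into the chosen SRID (26954 here is
    -- just an illustrative value).
    SELECT ST_Transform(geom, 26954) AS local_geom
    FROM my_table
    LIMIT 1;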

[00:11:29] The next piece of content is "Announcing Swarm64 DA 5.0". This is an extension for Postgres, and they've actually made some major changes recently: for data warehousing and analytics workloads, they're changing to a compressed columnstore index to give you high performance on analytical queries. It also includes Postgres 12 support. So if you're interested in that, you may want to check out this post from swarm64.com.

[00:12:01] And the last piece of content is that the PostgreSQL Person of the Week is Elein Mustain. So if you're interested in learning more about Elein and her contributions to PostgreSQL, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
