background

Serializable, JSON & JSONB, Fast Data Loading, PG11 Features | Scaling Postgres 11

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we review articles covering serializable, JSON & JSONB data types, fast data loading with Ruby and Postgres 11 new features.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we discuss serializable JSON and JSON B, fastdata loading and new features. I'm creston. Jameson. And this is scaling postgres episodeeleven.

[00:00:21] All right, our first article is serializable in postgres resql eleven and beyond. So thiswas a really great post and it talks about serializable in terms of postgres eleven, but it also talksabout a number of other issues that can really expand your knowledge, particularly if you're adeveloper. So they're talking about basically some new features that are planned to go in inversion eleven, where they're doing predicate lock support for hash indexes, gen indexes, andgist indexes. Basically, when you're using serializable transaction isolation, it'll be much moreefficient. But a lot of this post goes into transaction isolation in general and includes some reallygreat links, including a research paper that talks about the importance of writing your codeessentially in a concurrent fashion to avoid attacks. But it goes in depth talking about theimplementation of Serializable, how it interacts with indexes, and even talks about the future interms of trying to enable this type of feature with regard to parallelism what's available in termsof replication, and even talking about skip locked. So I would say this post definitely gave me themost educational material for this week. So definitely one I suggest checking out. And I shouldmention, this is from the Writensku Blogsbot.com blog post.

[00:01:57] The next article is PostgreSQL data types JSON, and this is from the Tapoeh.orgblog. And this continues along with his series on PostgreSQL data types. And he's talking aboutessentially JSON and JSON B and goes into depth about how the JSON data type is basicallyjust text. So it stores the data presentation exactly as it is sent to PostgreSQL, includingwhitespace, annotation, et cetera. Whereas JSON B is an advanced binary storage format, sohe goes into use cases for each one. Basically, you would probably always want to use JSON Bbecause you can make the access to data very fast using gen indexes.

[00:02:48] And the only real case for using the JSON and not JSON B is if you're wanting topreserve the entire payload of some API you're consuming. But generally for storing information,the typical best practice is to use JSON B, so definitely a blog post to check out if you want tolearn more about that.

[00:03:09] The next post is an in depth video tutorial I did on the Scalingposgres.com blog andit's fast PostgreSQL data loading using Ruby. So I am a Ruby developer and I had a need todevelop a couple of terabytes of test data and I went over a Ruby script I developed to load thisfake test data using a library called a Gem, using the faker gem, and I went over different typesof implementations.

[00:03:39] So doing single row inserts, doing multi row inserts per insert statement and looked atthe performance and seeing how I went about maximizing how fast I could load data intoPostgreSQL. And then I also compare it to generating just a CSV file using Ruby and then usingthe PostgreSQL Copy command to import it, because generally copy is the fastest way to importdata into PostgreSQL. But it's an interesting comparison looking at the different timings that I doin this tutorial. So I welcome you to check it out.

[00:04:17] The next article is actually a YouTube video and it was performed at Linux FestNorthwest 2018 and it's called Ten and 1010 New Things in Postgres Ten. And this was done byBasil Bork, I believe, and he basically goes over his Ten favorite features for PostgreSQL. He isa developer, so it is from a developer's perspective. So basically his first one is replacement forSerial. So basically Postgres Ten developed the Identity feature to be compliant with the SQLstandard. So it's a different way of essentially generating auto incrementing primary keys foryour table. The next topic was traceable commit. Basically, if you request a commit but thedatabase server crashes or has some issue, did it truly commit? It's basically you can track onwhat the status of that commit you sent was. He talks about parallel queries, he talks aboutcross column statistics where you have relationships between data and you can define statisticsfor those. So for example, there are only certain cities in a given state or only certain zip codesin a city state, so you can define those relationships for statis --tical purposes. To get the best query plan for queries, he talked about the ability to do newcollations in terms of sorting and he also covered very briefly, logical replication, simplifyingtimestamps and some other features. So if you're a developer, it's definitely an interestingYouTube video that you can view.

[00:06:00] The next piece of content is actually a YouTube channel that I'm linking to, and this isthe Green Plum database YouTube channel. So if you're unfamiliar with Green Plum, I havementioned it on the show before. Basically it's a massively parallel database that's based onPostgreSQL and they recently had a conference, or were a part of a conference, and in the lastweek they've posted a number of presentations talking about Green Plum. And some of thetopics are what it means to be a massively parallel PostgreSQL. DBA green plumb in action onAWS Azure and Google Cloud. Greenplum contain coordinating many PostgreSQL instances onKubernetes. So basically Greenplum allows you to use multiple PostgreSQL and query them inparallel. So as you're looking to scale your database, if you believe you need to start going intoSharding or running your database on multiple PostgreSQL servers, definitely somepresentations to check out.

[00:07:09] The next blog post is near zero downtime. Automated upgrades of PostgreSQLclusters in the cloud. And this is part two, and this is from the second quadrant PostgreSQLblog. Now I mentioned the part one in a previous episode of Scaling Postgres and this goes intomore depth. Now they're talking about a tool that they have called PGL Upgrade which stands Ibelieve stands for postgres PG Logical which is their logical replication. This is different fromversion ten's logical replication because this is a tool that they had developed PG logical prior toit being implemented in version ten as logical replication. So this works in versions less than tenas well. And in part one of this article they discussed how they used logical replication to do anupgrade from a version nine PostgreSQL instance to a version ten.

[00:08:11] Now they don't include the tool but they're basically in this post describing thestructure of it and how they're using ansible to do it in an automated fashion. So they includeexamples of what the inventory file would look like, the configuration file and basically the stepsthat you would need to do this. And this is a great image I'm showing right now where it goesthrough the exact sequence of steps.

[00:08:36] So first you set up a new primary database, then you set up standbys that arereplicating to it. Then you set up a publisher on the old primary database, set up a subscriber onthe new primary database. This technique also uses Pgbouncer as a connection pooler. So thenext step is to wait for replication lag to be below a certain level, pause the connections,reconfigure to point to the new primary, wait for full replication to catch up and then resume theconnections.

[00:09:11] At that point you are now operating on the new primary database. At that point youcan then remove the publisher from the old primary, stop that instance and then stop all the oldstandbys and then set up any new standbys that you would need. So once again, they're notproviding the code for the tool but they're giving the structure so that you can replicate it. Or I'msure you could contact them if you want the implementation done on your behalf, but definitely avery interesting blog post to check out if this is something you're considering doing in the future.

[00:09:47] The next post is with regard to or the next two posts are regard to features coming inPostgreSQL eleven. So the first is post is Partition Elimination in PostgreSQL eleven. And againthis is from the Second Quadrant blog and they talk about what they hope to get into versioneleven is Faster partition Pruning and partition Pruning at execution time. So it's basically whenyou're using postgres ten s declarative partitioning Prune down to the exact table that you needto access the data that's being queried or inserted or updated into. And they talk abouthistorically you could only do it at planner time, determining what is the best or what partitionsyou need to access. However, this can cause problems with prepared statements, but with thisparticular patch they are able to improve the performance. So they have a benchmark and somegraphs here that show the dramatic improvement from --in the green here, going from this type of performance with non prepared statements to aprepared statements this type of performance in version eleven whereas version ten there wasessentially no improvement. So definitely there are some use cases where this could be verybeneficial. So if you're interested, I encourage you to check out this post.

[00:11:10] The next post is Tech preview PostgreSQL eleven. Create Procedure so for years inPostgreSQL you've been able to create functions and triggers, but we've never had the ability tocreate a procedure whereby you can, within that procedure, use, commit and rollback functions.So that's what this feature enables for PostgreSQL eleven. So this is very short post. If you'vebeen wanting to do this with your functions or have this type of functionality, it looks like it'sslated to get into postgres eleven.

[00:11:49] That does it. For this episode of Scaling Postgres, you can get links to all the contentpresented in the show notes. Be sure to head over to Scalingposgres.com where you canreceive weekly notifications of each episode or you could subscribe via YouTube or itunes.Thanks. --

episode_image