background

PgMiner Botnet, Collation Index Corruption, postgresql.conf, Custom Data Types | Scaling Postgres 145

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss the PGMiner botnet attack, how collation changes can cause index corruption, managing your postgresql.conf and implementing custom data types.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about a PG minor botnet attack,correlation index, corruption, postgresql.com files and custom data types. My name is KristenJameson, and this is Scaling postgres episode 145.

[00:00:24] Our. All right, I hope you, your friends, family and coworkers continue to do well. Justto let you know, there will be no scaling Postgres episode next week due to the holidays, but wewill get started again in the new year. Our first piece of content is PgMiner botnet Attacks weeklysecured PostgreSQL databases. This is from ZD net and it's talking about a botnetnet attackbeing set up to attack postgres servers. And it says it only impacts postgres servers running onLinux thus far. And it says it follows a typical pattern in terms of searching out a particularnetwork range and looking at the default port. So of course, number one, you don't want to haveyour PostgreSQL database exposed on the Internet. Number two, you don't really want to usethe default port, you should choose a separate port for it and then it tries to use a brute forceattack to find out what the postgres password is. So a third thing is don't allow logins viapostgres from an IP address. Set up a separate account to do that purpose and just leave thepostgres for say, local only connections. And then it uses copy from program to try and escalateits privileges. So something definitely to be aware of. And be sure to secure your postgresinstance as I had mentioned here, so you'll be protected. And given that it's the holiday season,there might be more attacks going on because probably people are paying less attention tosecurity during this time period. So definitely check out this topic or others about PG minor tolearn more about it.

[00:01:58] The next piece of content is don't let correlation versions corrupt your PostgreSQLindexes. This is from Citusdata.com and they're talking about an issue where you havecorrelation set up for postgres. And usually the default correlation is related to the Libsy librarythat you have that defines how text is sorted within the database. And they give an example ofusing different collations here where they have the same words rs, banana, cat. When youcollate them using in English, I guess, a New Zealand dialect, it sorts it essentially ABC. But AAactually has a different meaning in a Danish dictionary that means this character here. And whenyou do the order by Correlating, by Danish or DadK, it actually sorts the A word at the bottom.So correlations determine how data is sorted and this is particularly important for indexes. Andthey show an example here where you can list your databases and see what your correlationsare for them. And that the default correlations typically set up by NTDB. Now under this headingof this article where they say, so what's the catch? It says from the Unicode Technical Standard,number ten over time correlation order will vary. There may be fixes needed as more informationbecomes available about languages. There may be new government or industry standards forthe language that requires changes. So you can think of this, it's like time zones. Time zonesconstantly change based upon what's stated. Well different orders based upon culture andstandards apparently change too. And they say and finally, new characters added to the Unicodestandard will interleave with the previously defined ones. This means that correlations must becarefully versioned. Now, how this impacts Postgres is with their paragraph here and it says, Iquote the most obvious problem this causes for databases like PostgreSQL is that Btree indexescan become corrupted if they are ordered by a collation that changes. Typical symptomsreported in the wild include orders that cannot be seen by one query, but can be seen by anotherquery that is using a different plan. Or sometimes a unique index fails to prevent a duplicatevalue from being added to a table. Now I've seen exactly these problems happen after aPostgres upgrade. Now it actually may have been one of the Lib C versions that changedbecause the upgrade not only did it upgrade the database, but it was placed on a newer versionof Ubuntu and perhaps the Libc versions changed and had this impact. Now, the resolution to fixthese issues are just to do a re index. Once you do the re index, it'll resolve these problems. Andthe second thing to keep in mind, this only applies to text indexes. So if you ha --ve an index on integers or date time fields that's not going to be impacted by collations, it'sonly your text fields that will be impacted. And that's exactly what happened in the instance that Imentioned. Now, in terms of this happening, I've been mentioning Libc because this is anoperating system provided facility for collations. There's also ICU which is the internationalcomponents for Unicode. So this is you can think of a cross operating system way to definecollations. So you could choose to use this and hopefully avoid some of these problems. Butyou're still going to have to have versioning in effect if you're going to go to the next release ofICU. But it does have a number of advantages that they list out here. But most people stick withthe default correlations that are typically reliant on the Libc library of the operating system thatPostgres runs on. Now they do say that Postgres releases ten to 13. If you're using these ICUcorrelations, there's a way to capture what they are and present a warning if things changed. Butnot really for the Lib C, which is the predominant correlation being used by Postgres databases.But what they've worked on here for release in version PostgreSQL 14 is a way to essentiallyrecord what correlations have been used for building these indexes. And then if the correlationchanges for say, the operating system it can report back and say, hey, you probably need to reindex these text indexes and even specify which ones. So this shows how it handles differentversioning based upon the operating system in use. Now the warning he says, will looksomething similar to this. You'll get warning the index depends on a correlation which is defaultwas version 34 but the current version is 36. So this gives you a warning that you've changedcorrelation versions and it gives the detail the index may be corrupted due to changes in sortorder and the hint is reindex to avoid the risk of corruption. So basically when you change theseversions, it looks like you do need to do a best practice of reindexing your text indexes. So this isa patch that should make its way into version PostgreSQL 14. Now, they also mentioned thereare some other instances where text is in use that could be impacted or you could see some ofthis corruption. One, he says if range partitions are using text, if a collation changes, you mightget things going into different partitions. This is particularly scary for me, so you might not wantto use something like a range partition using a text field for that reason. The other one is thatcheck constraints that perform string comparisons whereas once they may pass, now they mayfail. So that's another thing to keep in mind. Now they don't have anything to address either ofthese two issues, they're just making you aware and then they go over and discuss some otherthings about indexes. But this was a great post explaining kind of the issue and the hope for atleast a warning to be produced in the log if this has happened for your particular database. So Ihighly encourage you to check out this post from Citusdata.com.

[00:07:47] The next piece of content is repository, tuning guide and API for your postgresql.com.So this is referring to a site called PostgreSQL NF or PostgreSQL. Comf. Now of course, this isthe main configuration file for postgres and up to this point this was just a source ofdocumentation that highlights specific areas of the postgresql.com file and give yourecommendations and ranges for each parameter that you can change. And they list some of thedetails with regard to it here. But what they've added today and they're saying in beta, is a wayfor you to manage your postgresql.com files via a service. So it provides a repository to storeyour postgresql.com files along with a tuning guide and an API for working with it. So this is aninteresting tool to help with management of postgres. I myself use a configuration managementtool called Ansible, so that's what I'm going to be sticking with. But if you don't have a similartool, you may want to investigate this post and see what they're working on. So if you'reinterested you can check out this post from Angres.com.

[00:08:58] The next piece of content is creating custom postgres data types in Django. This isfrom Pgeanalyze.com and whereas they are talking about Django, which is a web framework forPython, the first part of the post is just addressing custom data types. The first one they t --alk about is Custom domains. So essentially this is taking an existing data type and thenapplying a check constraint to it so that you can easily use it in your table creation. So forexample, they created a domain here called String No Spaces. So essentially it's a custom typestring no Spaces. It's created as a VAR car, it's not null with a check constraint that no spacesshould be included. Now, with all of that built into it, you can then specify when you create atable, I want a column called Username and the type is String No Spaces. And then they showyou when you try to insert a string with spaces in it, it'll give you a constraint violation error. Now,what they also said is that you can build upon it, so you can essentially nest domains withindomains. So here he created another domain called Email with Check and it uses as its base thestring with no Spaces domain. So essentially you're going to get a not null string with no Spaces.And then he adds a check constraint of it must include an ampersand. Then he gives anexample of the different errors you can get. If you try to insert one with a space. You'll get acheck constraint violation for this check. And then if you enter one without an at sign, you'll getan Email with check. The next type of custom data types they cover is composite types. This isjoining one or more types together to represent essentially one type. And they use the exampleof RGB values. So these can be represented as three integers, but you can create a separatetype and they've called it RGB color value. Here that is a composite of three integers and theyshow example for its use case and then how you can even pull out an individual value fromwithin that composite type. So these types make working with your database easier and thenthey cover how to use these if you're using the Django web framework. So the rest of the postcovers how to use these types within Django. So if you're interested in that, you can check outthis post from Pganalyze.com.

[00:11:09] The next piece of content is free. 1GB postgres Database on AWS Cloud Shell nowthis is quite interesting because Cloud Shell is just a web interface to give you a shell prompt tobe able to interact with AWS services.

[00:11:25] But apparently it is its own environment and it has about 1GB of storage. So this postcovers how you could actually install a postgres database within this environment. Now, it has alot of interesting issues. Like number one, the instance is not permanently running only whenyou enter the cloud shell from your web browser. You can't connect at it from the outside andthere's no replicas, no backups. But I guess I'd call it an interesting hack of running Postgres onthis minimal environment. So if you're interested in that, you could check out this post fromOngrids.com.

[00:11:59] The next piece of content is PostGIS raster and Crunchybridge. So this is fromCrunchydata.com, and they're talking about using the Raster data type in PostGIS. And basicallyit records Raster type data, which are images, elevation, grids, model outputs, and more. Forworking with geographical information systems. They're talking about how typically whenworking with this type of data, you either store it in the database or you store it outside of thedatabase. So have a reference to some data somewhere, like usually the file system, so thatPostGIS can access and work with that data. Now, I said initially in DB storage was used, but asinformation continues to grow and grow and grow, it's moved more towards storing the filesoutside of the DB. But what's interesting here is that this Raster tool has a way to access tiledGeoTIFFs of data. So you can actually store tiled images within an object storage system like SThree or Google CS or Azure Storage, and then pull individual tiles for analysis. So you don'thave to store them all in the database, you're just storing a reference to it in the database. Andthey give you a demo of how you can do this using Crunchy Bridge. However, you could do thesame technique using any version of PostgreSQL with Postgres. So if you're interested inchecking it out, you can check out this post from Crunchydata.com.

[00:13:24] And the last piece of content is PostgreSQL exclusive cron jobs using PG timetablescheduler. This is from CyberTech postgresql.com. So if you're interested in using the schedulercalled PG Timetable, you can check out this post that explains some new features --that have been added to it.

[00:13:43] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show Notes. Be sure to head over to Scalingpostgres.com, where you can signup to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes.Thanks. --

episode_image