Security Concerns, Fast Counting, Indexing, Trends | Scaling Postgres 58

In this episode of Scaling Postgres, we review articles covering security concerns, fast counting tables, indexing basics and PostgreSQL trends.

Transcript



[00:00:00] In this episode of Scaling Postgres we talk about security concerns, fast counting, indexing and trends. I'm Kristen Jameson and this is Scaling Postgres. Episode 58.

[00:00:20] Alright, I hope everyone's having a great week. Our first piece of content is a bit of news from postgresql.org, and it says "CVE-2019-9193: Not a Security Vulnerability". Apparently there has been widespread mention of a security vulnerability in PostgreSQL, and it was registered under this CVE number. However, the PostgreSQL team believes that this is not a security vulnerability; it's just how it was designed. Specifically, it's talking about the COPY command, which allows you to import and export data from a PostgreSQL table, for example to a file, but also to a program. It allows you to run a program and pipe the output from that into, say, a PostgreSQL table. Apparently this feature was thought to be a security vulnerability, but you can only use it if you have superuser privileges or you have the pg_execute_server_program role, because it allows you to run programs on the system on which PostgreSQL is running, and specifically just with the rights that the postgres user has, assuming that PostgreSQL is installed under the postgres user. So he says, quote, by design there exists no security boundary between a database superuser and the operating system user the server runs under. And that's part of the reason why they don't run the PostgreSQL server, or set it up to run, as the root user. They also state, quote, we encourage all users of PostgreSQL to follow the best practice of never granting superuser access to remote or otherwise untrusted users. So of course you should keep this very much under lock and key because of the power it has. Now, there are a few more posts that went into this in more depth. The next one is "When a vulnerability is not a vulnerability", and this is from Magnus Hagander's blog. Basically he points out that the PostgreSQL documentation essentially documents how this is handled. So it's how the system was designed, and not a vulnerability.
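To make the feature being discussed concrete, here is a small sketch of the COPY ... PROGRAM variant (the table and file names are illustrative; this only runs for superusers or members of pg_execute_server_program):

```sql
-- Export a table by piping it through an external program (here: gzip):
COPY big_table TO PROGRAM 'gzip > /tmp/big_table.csv.gz' WITH (FORMAT csv);

-- Import by running a program and piping its output into the table:
COPY big_table FROM PROGRAM 'zcat /tmp/big_table.csv.gz' WITH (FORMAT csv);
```

The program runs as the operating system user that owns the PostgreSQL server process (usually postgres), which is exactly why this is by-design behavior rather than a privilege escalation.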
And I like what he says here: while this is not a vulnerability in PostgreSQL, it is certainly a vulnerability present in a nontrivial number of PostgreSQL deployments. So basically, avoid granting superuser permissions to a user, and quote, a common setup is to only allow the postgres operating system user itself to act as a superuser, which is what I tend to do. I also like what he says here: it seems PostgreSQL needs better documentation about this for new users, possibly on the CREATE ROLE page, which does not explicitly mention this; but if superuser is granted to a user, that remains the equivalent of granting the OS user's permissions.

[00:03:07] So basically, if you give someone superuser permissions, they have some permissions over the postgres operating system user to do some things. So I agree it would make sense to update the documentation to make people aware of that. And of course his recommendation is, if you have any application that uses a superuser (so for example, you have a web application that talks to a database and you're connecting as the superuser), you should pretty much stop that right away and just grant the permissions necessary for that application user to do what it needs to do. Maybe it could just be the owner of the database, for example. And he also mentions using pg_hba.conf to ensure that no superuser can log into the system remotely. Now, it's also mentioned that the origin of this appears to be a blog post on Trustwave, so I will link that in the show notes so you can take a look at it as a reference. Another post related to security is "Postgres and superuser access", because this again mentions the same CVE. They pretty much cover the same issues and particularly highlight: if you're handing out superuser access to your database, or connecting to your application with a superuser role, consider changing that immediately. So it's definitely best practice not to be using superuser privileges to carry out standard application operations.
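As a sketch of that recommendation, this is roughly what a least-privilege application role might look like (all names and the password are hypothetical placeholders):

```sql
-- A login role with only the privileges the application needs,
-- instead of connecting as a superuser:
CREATE ROLE app_user LOGIN PASSWORD 'changeme';

GRANT CONNECT ON DATABASE app_db TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
```

Combine this with a pg_hba.conf that rejects remote connections for superuser roles, so even a leaked superuser password can't be used from outside the host.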

[00:04:29] The next post, again probably prompted by this CVE announcement, is "Secure PostgreSQL - a reminder on various attack surfaces", and this is from cybertec-postgresql.com, and they talk about different attack points. The first one they mention, given what's in the news this week, is unnecessary PostgreSQL superuser access, and they flag this as the number one point of danger and emphasize, as mentioned, that superusers can execute random commands on the OS level under the PostgreSQL process owner's privileges. So this is usually the postgres user of that installation. They go through a scenario where you can potentially SSH into a machine. Another area mentioned is archive_command abuse, because this is typically an operating system command that gets run every time you switch a WAL segment; typically you copy those out to another location. But here, if you set the archive_command to remove the postgres installation, then reload your configuration and do a log switch, you've essentially removed your whole postgres installation. The next area they mention is the file foreign data wrapper and arbitrary program reads, so in this case being able to potentially run a program that again removes the postgres installation, and then untrusted PL languages that can try to do the same thing. In terms of mitigation, the recommendations are the same: applications should be designed so that superuser access is not needed. The next attack area mentioned is brute-force password guessing, and the mitigations for this are: one, use long random passwords. Two, install and configure the relatively unknown auth_delay contrib extension, which will halt the communication for a moment when a login attempt fails, making it slower to guess passwords. And third, actively monitor the server logs; a full-power brute force generates many gigabytes of logs per hour with FAIL entries, which should be picked up by a good monitoring setup.

The next area is man-in-the-middle attacks, basically someone getting between you and your database server when you're trying to connect. It says at a minimum force SSL connections; preferentially use SCRAM-SHA-256, which is a newer authentication mechanism, as opposed to MD5; and also use certificates, with the verify-ca and verify-full SSL modes when connecting. The next attack area he mentions is loose backups or logs. So basically, where are they being stored and being sent? Make sure they're kept separate, that only DBAs have access to them, and use things like at-rest encryption. The next area mentioned is disk space attacks, so someone trying to, say, fill up the disk. Some of the mitigation strategies are: don't use the public schema for important applications, create application-specific schemas, and you could also set the temp_file_limit parameter to a reasonably low value. And the last area mentioned is distributed denial of service attacks for publicly exposed DBs. Now, I'm not sure why you would want to publicly expose your database, and of course mitigation number one is to try to avoid direct public access to your database, but failing that, look into anti-DDoS services. So, a pretty good write-up of different attack areas and mitigation strategies to avoid them.
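A couple of the mitigations above can be sketched in configuration terms (the 5GB value is purely illustrative; pick something sensible for your workload):

```sql
-- temp_file_limit caps how much temp file space one session can use,
-- which blunts disk-filling attacks:
ALTER SYSTEM SET temp_file_limit = '5GB';
SELECT pg_reload_conf();

-- auth_delay is a contrib module loaded via postgresql.conf
-- (it needs a server restart), e.g.:
--   shared_preload_libraries = 'auth_delay'
--   auth_delay.milliseconds = 500
```

The auth_delay pause applies after every failed authentication attempt, which makes brute-force password guessing dramatically slower without affecting legitimate logins.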

[00:08:03] The next post, again security related, is "Enhancing your PostgreSQL 10 security with the CIS benchmark". This is from crunchydata.com, and they have developed a Center for Internet Security PostgreSQL benchmark that you can run against your installation to see how well it meets this security benchmark. It was developed by testing PostgreSQL 10 running on CentOS 7, and it has recommendations with regard to installation and patches, directory and file permissions, logging, monitoring and auditing, user access and authorization, connection and login PostgreSQL settings, replication, and special configuration considerations. So if you want to benchmark your PostgreSQL installation with regard to security, definitely a blog post to check out.

[00:08:53] The next blog post is actually not security related, and it's "count(*) made fast". This is from cybertec-postgresql.com, and they're talking about why count is so slow. Some other database systems keep a count of the total rows in a table, and when you count a whole table in PostgreSQL it's relatively slow. The primary reason is MVCC: it has to go in and check every row to determine which rows are visible in order to give you an accurate count of what's in the table. The first thing he asks is whether count(*) itself is the problem, since it looks like you're returning all columns. But no, it's really just counting rows, and there's nothing to be gained by avoiding the asterisk. Would an index-only scan help? It could, potentially, if vacuum has been run sufficiently and the visibility map is updated. The other thought is using an aggregate table. So basically you keep an active count of the rows you're interested in in a separate table. He has a scenario where he starts a transaction, creates a table called mytable_count with just a single column, creates a function that will serve as a trigger (if something's inserted, the value gets incremented; if something's deleted, it gets decremented; and in other cases, for example a truncate, it gets reset to zero), and then creates the trigger: after insert or delete, run this procedure. He then gets an initial count for the table; the only disadvantage is that it will lock the table while it's setting this up. He says this provides us with a really fast alternative to count, but at the price of slowing down all data modifications on the table, since now, essentially, the second table has to be updated every time an insert or delete happens. And he notes that even though this counter table might receive a lot of updates, there's no danger of table bloat, because these will be HOT updates, or heap-only tuple updates.

The next thought is: do you really need an accurate count? Another way you can count is to just look in the pg_class table at reltuples, which is roughly what the planner uses as an estimate of the total rows in the table. So if you don't need something perfectly accurate, this may be a reasonable way to get an approximate count. And then he has an interesting approach where you basically use an EXPLAIN plan to pull out an estimate of the rows, which I'm not sure I would use because of the last statement here: do not use this function to process untrusted SQL statements, since it is by nature vulnerable to SQL injection. There are also a number of comments discussing different approaches. One interesting one, I thought, is that if you don't have a lot of deletes going on and only inserts, you could potentially use the sequence of the primary key. The sequence could give you a rough count of how many rows are in the table. One disadvantage, even if you don't have a lot of deletes going on, is that failed inserts will still increment the sequence without actually inserting the row, so you'll be overcounting. But that's another interesting alternative. So if you're looking for a way to get counts of an entire table, these may be some techniques you want to look into.
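The trigger-maintained counter described above can be sketched roughly like this (table and trigger names are illustrative, not the article's exact code):

```sql
-- A single-row table holding the current count:
CREATE TABLE mytable_count (n bigint NOT NULL);

CREATE FUNCTION mytable_count_trig() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
   IF TG_OP = 'INSERT' THEN
      UPDATE mytable_count SET n = n + 1;
   ELSIF TG_OP = 'DELETE' THEN
      UPDATE mytable_count SET n = n - 1;
   ELSE  -- TRUNCATE
      UPDATE mytable_count SET n = 0;
   END IF;
   RETURN NULL;  -- return value of AFTER triggers is ignored
END;
$$;

CREATE TRIGGER mytable_count_mod AFTER INSERT OR DELETE ON mytable
   FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

-- TRUNCATE triggers must be statement-level:
CREATE TRIGGER mytable_count_trunc AFTER TRUNCATE ON mytable
   FOR EACH STATEMENT EXECUTE PROCEDURE mytable_count_trig();

-- Seed the counter once (this takes a lock while counting):
INSERT INTO mytable_count SELECT count(*) FROM mytable;

-- And the cheap approximate alternative via the planner statistics:
SELECT reltuples::bigint FROM pg_class WHERE relname = 'mytable';
```

After setup, `SELECT n FROM mytable_count` is effectively instant, at the cost of an extra update on every insert or delete.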

[00:12:17] The next post is "Postgres indexes for absolute beginners". This is from the pgMustard blog, and it's a basic index post for Postgres. They say step one: understand what you want to achieve, because indexes are not free. There is a cost to keeping them up to date, and maybe sequential scans aren't so bad; if you're going to be looking at a lot of a table and pulling back data from it, then maybe a sequential scan is what you want. An index scan is advantageous when you're only looking for a very small portion of the data in that table. They talk a little bit about the different index types, the primary one of course being the B-tree index, and talk about how to create them, even covering multicolumn indexes. So if you want a basic introduction to indexes, definitely a blog post to check out.
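For reference, index creation is a one-liner; the table and column names here are made up for illustration:

```sql
-- Single-column B-tree index, the default index type:
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Multicolumn index: helps queries filtering on customer_id alone,
-- or on customer_id and created_at together (leftmost-prefix rule):
CREATE INDEX idx_orders_cust_date ON orders (customer_id, created_at);
```

As the post stresses, each index adds write overhead, so it's worth confirming with EXPLAIN that a query actually uses an index before keeping it.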

[00:13:09] The next post is "2019 PostgreSQL Trends Report: private vs public cloud, migrations, database combinations and top reasons used". So basically it looks like ScaleGrid went to the PostgreSQL conference in March and surveyed PostgreSQL users. These are all PostgreSQL users; they make some reference to other databases, but this is a PostgreSQL crowd that was asked. They ask basically how many people are using a public cloud versus a private cloud or a hybrid one, and I believe private basically means on premises.

[00:13:45] And what are you using for different cloud providers? Predominantly AWS, it looks like, although I wonder, because I don't think I saw this in here, how many are running their own Postgres versus using something like RDS, a hosted version. There's also the state of using, migrating to, or exploring PostgreSQL, and the most popular database combinations with PostgreSQL. So it's basically a survey of PostgreSQL users about how they're using it in its current state. If you're interested in that type of information, this is definitely a blog post to check out.

[00:14:21] The next post is "PostgreSQL's exciting features you should know", and this is from hackernoon.com. The first one they mention is inheritance, and looking at it here, really what he's boiling it down to is basically partitioning using inheritance. He says the advantages are performance, because when you run a query you're only going to touch that one partition table, and the index sizes will be smaller, so there's less to scan through.

[00:14:51] And in terms of maintenance, if you have a table that's not being used, you can easily VACUUM FULL it, do a REINDEX, or even a CLUSTER on the data. He talks a little bit about data types, covering JSON and how to work with JSON, although interestingly, he doesn't mention JSONB at all. And really, if you're going to be using and manipulating JSON, that's the data type you want to use.
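The inheritance-based partitioning being described looks roughly like this (table names and the date range are illustrative):

```sql
-- Parent table with no data of its own:
CREATE TABLE measurements (ts timestamptz NOT NULL, value numeric);

-- Child "partition" carrying a CHECK constraint on its range:
CREATE TABLE measurements_2019_04 (
   CHECK (ts >= '2019-04-01' AND ts < '2019-05-01')
) INHERITS (measurements);

CREATE INDEX ON measurements_2019_04 (ts);

-- With constraint_exclusion enabled (the default is 'partition'),
-- a query with a matching ts predicate only scans the relevant child:
SELECT * FROM measurements WHERE ts >= '2019-04-10' AND ts < '2019-04-11';
```

Note that PostgreSQL 10 and later also offer declarative partitioning (PARTITION BY), which is generally preferred over the inheritance approach for new designs.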

[00:15:16] The pure JSON data type is mostly beneficial for just taking a full snapshot of a payload and storing it for reference. But if you want to insert data, update data, or retrieve specific parts of the data, then JSONB is generally the better data type to use. He goes into the array data type and then hstore. The interesting thing about hstore is that with JSONB available, I haven't really used hstore anymore. Then of course he talks about data integrity, which is a hallmark of relational databases. He goes over custom types and functions, error handling, combining queries, and window functions, as well as check constraints and sharding, although he says sharding but this actually looks more like partitioning. So if you want a review of some of the features in PostgreSQL, here's a blog post to check out.
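To show why JSONB is usually the better choice for manipulation, here is a small sketch (table and field names are made up):

```sql
CREATE TABLE events (id bigserial PRIMARY KEY, payload jsonb);

INSERT INTO events (payload) VALUES ('{"type": "click", "page": "/home"}');

-- Retrieve a single field as text:
SELECT payload->>'type' FROM events;

-- Update one key without rebuilding the whole document:
UPDATE events SET payload = jsonb_set(payload, '{page}', '"/about"');

-- GIN index enables fast containment queries:
CREATE INDEX ON events USING gin (payload);
SELECT * FROM events WHERE payload @> '{"type": "click"}';
```

Plain JSON stores the exact text of the payload, but supports neither GIN containment indexing nor jsonb_set, which is why JSONB wins for anything beyond archival storage.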

[00:16:13] The next post is "I am a developer... and you can too!", and this is from the 2ndQuadrant.com blog. Basically this is an introduction on how to get started with contributing to PostgreSQL, not necessarily by writing code but by reviewing patches. It goes through the process of how you would go about doing that and contributing to PostgreSQL. So if you're interested in doing that, definitely a blog post to check out.

[00:16:40] Now, related to that, the next piece of content is "Writing PostgreSQL Extensions is Fun - C Language", and this is from percona.com. So maybe if you don't want to contribute to PostgreSQL right away, you could try writing an extension first. They go over the process of what you would need to do to start putting together a PostgreSQL extension using C. So if that's interesting to you, definitely a blog post to check out.

[00:17:10] The next post is "Postgres-XL and global MVCC". This is talking about the Postgres-XL project, which is a fork of Postgres, and how one of the things it does is handle global transaction management. What that means is that it ensures that if you have database nodes that are separated geographically, like a multi-master setup, changes are seen by both instances at the same time, using a GTM, or Global Transaction Manager. So if you have a need for distributed databases or multi-master and you haven't heard of or explored Postgres-XL, this could be a blog post to check out, as well as the Postgres-XL site.

[00:17:57] The last post is "Replication between PostgreSQL versions using logical replication". This is from percona.com, and it is a post about logical replication and how to set up your publisher and subscriber to get started using logical replication. Even if you're using a version that doesn't have it, versions prior to 10, it covers how you can use pglogical to set up the same process. Now, it doesn't go, from what I can tell, into the process of actually doing an upgrade, but it at least gets logical replication initially set up. So definitely a blog post to check out if you're interested in setting this up. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to our site, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
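The basic publisher/subscriber setup described (built into PostgreSQL 10 and later) looks roughly like this; host, database, and role names are placeholders, and the publisher needs wal_level = logical:

```sql
-- On the publisher (source server):
CREATE PUBLICATION my_pub FOR ALL TABLES;

-- On the subscriber (target server), which must already have
-- matching table definitions:
CREATE SUBSCRIPTION my_sub
   CONNECTION 'host=source-host dbname=app user=repl_user'
   PUBLICATION my_pub;
```

Because logical replication streams row changes rather than physical blocks, the subscriber can run a newer major version than the publisher, which is what makes this useful for upgrades.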
