PostgreSQL Replication Failback with pg_rewind
Subscribe to Get New Episodes Weekly
Learn how to reuse an old primary database after a promotion or failover event using pg_rewind. This works even if the old primary has had writes.
This tutorial assumes you have already setup streaming replication between a primary and a replica PostgreSQL server. If you need to set that up, use our Posgres replication tutorial. Also, make sure you have replication slots setup as well. You can set that up using our replication slot tutorial.
# PostgreSQL 10 installed on Ubuntu # Primary DB cluster called "main" on port 5432 # Replica DB cluster called "replica" on port 5433 # Test database exists with a posts table and two posts # Make sure postgresql.conf has wal_log_hints = on # review data sudo su - postgres psql test -c "select * from posts;" -p 5433 # promote replica to primary and verify sudo pg_ctlcluster 10 replica promote tail -n 100 /var/log/postgresql/postgresql-10-replica.log # write some data to the old primary psql test -c "insert into posts (id, title, content, type) values (102, 'Intro to SQL Where Clause', 'Easy as pie!', 'SQL');" -p 5432
Bring up main cluster as a replica
# stop main cluster sudo systemctl stop postgresql@10-main # NOTE!!! Creating recovery.conf and a replication slot should be done AFTER # running pg_rewind because the entire directory gets synchronized with the # new primary. However, sometimes you may not know you need to use pg_rewind. # So, we wanted to show you some of the errors and issues you could encounter # if you go ahead and create recovery.conf and a replication slot. # configure recovery.conf on main cluster nano /var/lib/postgresql/10/main/recovery.conf restore_command = 'cp /var/lib/postgresql/pg_log_archive/main/%f %p' recovery_target_timeline = 'latest' standby_mode = 'on' primary_conninfo = 'user=rep_user passfile=''/var/lib/postgresql/.pgpass'' host=''/var/run/postgresql'' port=5433 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any' archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/pg_log_archive/main %r' primary_slot_name = 'main' # create a slot for the main cluster psql -c "select * from pg_create_physical_replication_slot('main');" -p 5433 # start main cluster and review logs to see error sudo systemctl start postgresql@10-main tail -n 100 /var/log/postgresql/postgresql-10-main.log # Use pg_rewind to discard that transaction sudo systemctl stop postgresql@10-main /usr/lib/postgresql/10/bin/pg_rewind \ --target-pgdata /var/lib/postgresql/10/main \ --source-server="port=5433 user=postgres" \ --progress # Since pg_rewind synchronized the directory with the new primary, # you need to setup your recovery.conf again as shown above. nano /var/lib/postgresql/10/main/recovery.conf # start main cluster and verify in sync sudo systemctl start postgresql@10-main tail -n 100 /var/log/postgresql/postgresql-10-main.log psql test -c "select * from posts;" -p 5432
Failback to main cluster as primary
# stop replica cluster sudo systemctl stop postgresql@10-replica # promote main to primary and verify sudo pg_ctlcluster 10 main promote tail -n 100 /var/log/postgresql/postgresql-10-main.log psql test -c "insert into posts (id, title, content, type) values (103, 'Intro to SQL Order Clause', 'What comes first?', 'SQL');" -p 5432 psql test -c "select * from posts;" -p 5432
Setup replica cluster as a replica again
# setup recovery.conf on replica mv /var/lib/postgresql/10/replica/recovery.done /var/lib/postgresql/10/replica/recovery.conf # start replica cluster and verify in sync sudo systemctl start postgresql@10-replica tail -n 100 /var/log/postgresql/postgresql-10-replica.log # Due to pg_rewind synchronizing directories, make sure your new primary # has the correct replication slots configured. psql -c "select * from pg_replication_slots;" -p 5432 psql -c "select * from pg_drop_replication_slot('main');" -p 5432 psql -c "select * from pg_create_physical_replication_slot('replica');" -p 5432 tail -n 100 /var/log/postgresql/postgresql-10-replica.log psql test -c "select * from posts;" -p 5433 # review and drop slots on replica psql -c "select * from pg_replication_slots;" -p 5433 psql -c "select * from pg_drop_replication_slot('main');" -p 5433