Setting up and switching a PostgreSQL active/standby environment on Linux

Pankaj kushwaha
6 min read · Jun 23, 2020


1. Basic concepts

Database hot standby: the creation, maintenance, and monitoring of one or more standby databases for the primary database. The standbys are always running and are kept synchronized with the primary, so that when the primary fails, a hot standby can be activated at any time to protect the data from failures, disasters, errors, and crashes.

Streaming replication: a method of data replication between servers provided by PostgreSQL. The backup server connects to the primary server, and the primary streams WAL records to the backup server as they are generated, without waiting for a WAL file to be filled.

2. Experimental environment

Two servers.
Operating system: Linux CentOS 6.6
Software: PostgreSQL 10.0

The IP address of the main server (Source DB) is 10.40.239.103, and the IP address of the backup server (Target DB) is 10.40.239.105.

3. Experimental process
3.1 Experimental preparation
Both virtual machines have PostgreSQL 10.0 installed, and their installation directory is: /u01/app/postgresql10/

Add the following lines in /etc/profile:

export PATH=/u01/app/postgresql10/bin:$PATH

export LD_LIBRARY_PATH=/u01/app/postgresql10/lib:$LD_LIBRARY_PATH

export PGDATA=/u01/app/postgresql10/data
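
To make these variables take effect in the current shell (rather than only at the next login), reload the profile:

source /etc/profile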

3.2 Build the PostgreSQL main and standby environment
3.2.1 Operations on the master node

1. Make sure the service is started. Switch to the postgres user:

su postgres

Then start the service:

pg_ctl start -D $PGDATA

2. Create a user for streaming replication. Connect to the database:

psql -h 127.0.0.1 -p 5432 -U postgres
Enter the console and execute the following statement to create a user:
create user repuser with login replication password '123456';
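
As a quick optional check, you can confirm from the same console that the role exists and has the replication privilege:

select rolname, rolreplication from pg_roles where rolname = 'repuser';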

3. Modify the pg_hba.conf file and add the following content to allow the replication user and the superuser on both machines to log in:

host replication repuser 10.40.239.103/32 md5

host replication repuser 10.40.239.105/32 md5

host all postgres 10.40.239.103/32 trust

host all postgres 10.40.239.105/32 trust
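
Changes to pg_hba.conf take effect after a configuration reload, which does not require a restart:

pg_ctl reload -D $PGDATA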

4. Set these parameters in postgresql.conf on the master node:

max_wal_senders = 10
wal_level = replica
wal_log_hints = on
wal_keep_segments = 10
log_connections = on
wal_receiver_status_interval = 5s
hot_standby_feedback = on

The meaning of these parameters is as follows:

max_wal_senders sets the maximum number of concurrent connections from standby servers or streaming base-backup clients;

wal_level indicates the log level. For streaming replication, its value should be set to replica;

wal_log_hints = on means that, during the first modification of a page after a checkpoint, the PostgreSQL server writes the entire content of the disk page to WAL, even for non-critical modifications of so-called hint bits (this is also a prerequisite for the pg_rewind step used in Section 3.4);

wal_keep_segments specifies the minimum number of past WAL segment files to keep in the pg_wal directory (pg_xlog in versions before PostgreSQL 10) so that a standby can still fetch them for streaming replication;

log_connections indicates whether client connections to the server are recorded in the server log;

wal_receiver_status_interval specifies the minimum interval at which the WAL receiver process on the standby sends replication progress information to the primary or upstream standby;

hot_standby_feedback specifies whether a hot standby sends feedback to the primary or upstream standby about queries currently executing on the standby. Here it is set to on.

For details, refer to the official PostgreSQL documentation.

5. Restart the master node:

pg_ctl restart -D $PGDATA
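
After the restart you can confirm, for example, that the new settings took effect:

psql -h 127.0.0.1 -p 5432 -U postgres -c "show wal_level;" -c "show max_wal_senders;"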

6. After restarting, create replication slots for the primary and backup servers (optional, but they are used in the rest of this tutorial):

select * from pg_create_physical_replication_slot('postgresql_node101');
select * from pg_create_physical_replication_slot('postgresql_node102');

The role of the replication slot is:

1. In streaming replication, when a standby node is disconnected, the feedback data it provides through hot_standby_feedback is lost. When the standby reconnects, its queries may conflict with cleanup records sent by the primary in the meantime. A replication slot keeps recording the standby's xmin (the oldest transaction ID whose data must be retained in the database) even while the standby is disconnected, which prevents such cleanup conflicts.

2. When a standby node is disconnected, the primary also loses track of which WAL files the standby still needs. Without a replication slot, the primary may have discarded the required WAL files by the time the standby reconnects, forcing a complete rebuild of the standby. A replication slot ensures that the node retains all WAL files required by its downstream nodes.
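
You can list the slots just created and see whether a standby is currently attached to each one:

select slot_name, slot_type, active from pg_replication_slots;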

3.2.2 Operations on the standby node

1. Make sure the service is stopped. Switch to the postgres user:

su postgres

Then stop the service:

pg_ctl stop -D $PGDATA

2. First delete the files in the data directory $PGDATA on the standby node:

cd $PGDATA

rm -rf *

Then execute:

pg_basebackup -Xs -d "hostaddr=10.40.239.103 port=5432 user=repuser password=123456" -D $PGDATA -v -Fp

Here, -Xs means the WAL copy method is stream, which does not copy WAL files archived before the backup started; -d is followed by a connection string, where "hostaddr=10.40.239.103" is the IP address of the main server, "port=5432" the database port, "user=repuser" the streaming replication user, and "password=123456" its password; "-D $PGDATA" writes the backup into the local $PGDATA directory; "-v" prints verbose output; and -Fp outputs the result as plain files.

3. After the base backup is completed, modify the postgresql.conf file of the standby node and set:

hot_standby = on

4. Copy recovery.conf.sample in /u01/app/postgresql10/share/ to $PGDATA and rename it to recovery.conf:

cp /u01/app/postgresql10/share/recovery.conf.sample $PGDATA/recovery.conf

And set the following parameters:

recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=10.40.239.103 port=5432 user=repuser password=123456'
primary_slot_name = 'postgresql_node102'
trigger_file = 'tgfile'

The meaning of these parameters is as follows:

recovery_target_timeline indicates which point on the database timeline to recover to; latest means following the newest timeline.

standby_mode indicates whether to start the PostgreSQL server as a backup server. Here it is set to on, which is the backup mode.

primary_conninfo specifies the connection string used by the backup server to connect to the primary server, where "host=10.40.239.103" indicates that the IP address of the primary server is 10.40.239.103, "port=5432" indicates that the database port is 5432, "user=repuser" indicates that the user used for streaming replication is repuser, and "password=123456" indicates that the password is 123456.

primary_slot_name specifies an existing replication slot to be used when connecting to the primary through streaming replication, to control resource removal on the upstream node. Here we specify postgresql_node102, created in Section 3.2.1. If no replication slot was created on the master server, do not configure this parameter.

trigger_file specifies a trigger file; when this file appears, recovery on the backup machine ends and it becomes the primary.
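
Since the server's working directory is the data directory, the relative path above resolves to a file inside $PGDATA; a later manual promotion could therefore also be triggered with:

touch $PGDATA/tgfile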

5. Start the standby node service:

pg_ctl start -D $PGDATA
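
Once the standby is running, a quick way to confirm that streaming replication is active is to check on the master node:

psql -h 127.0.0.1 -p 5432 -U postgres -c "select client_addr, state, sync_state from pg_stat_replication;"

A row in the streaming state shows that the standby is connected and receiving WAL.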

3.3 Verifying the active/standby environment

1. Create a table on the master node and insert data:

postgres=# create table student (id int, name text);

CREATE TABLE

postgres=# insert into student (id, name) values (1,'tom');

INSERT 0 1

2. Check on the standby node:

postgres=# select * from student;

 id | name
----+------
  1 | tom
(1 row)

The data of the master node has been synchronized to the standby machine.

At the same time, writing data on the standby node will fail:

postgres=# insert into student (id, name) values (2,'raj');

ERROR: cannot execute INSERT in a read-only transaction
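
You can also ask a node directly which role it is playing; on a standby, pg_is_in_recovery() returns t:

postgres=# select pg_is_in_recovery();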

3.4 Switching between active and standby environments

1. Activate the standby node to make it the new primary node:

pg_ctl promote -D $PGDATA
The results are:
waiting for server to promote.... done

server promoted

2. Check the status of the new master node:

pg_controldata | grep cluster

Database cluster state: in production
Insert a piece of data:
postgres=# insert into student (id, name) values (2,'amy');

INSERT 0 1
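
The same check as in Section 3.3 now shows that the node has left recovery (pg_is_in_recovery() returns f on a primary):

postgres=# select pg_is_in_recovery();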

3. Stop the old main node:

pg_ctl stop -m fast -D $PGDATA
The result is:

waiting for server to shut down.... done

server stopped

4. Run pg_rewind on the stopped old main node to resynchronize its data directory with the new primary:

pg_rewind --target-pgdata $PGDATA --source-server='host=10.40.239.105 port=5432 user=postgres dbname=postgres' -P
The results are as follows:
connected to server

servers diverged at WAL location 0/2B000230 on timeline 4

rewinding from last common checkpoint at 0/2A000098 on timeline 4

reading source file list

reading target file list

reading WAL in target

need to copy 57 MB (total source directory size is 143 MB)

58749/58749 kB (100%) copied

creating backup label and updating control file

syncing target data directory

Done!

This indicates that the old main node's data directory was successfully rewound to match the new master node.

5. Reconfigure the recovery.conf of the old main node:

recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'hostaddr=10.40.239.105 port=5432 user=repuser password=123456'
primary_slot_name = 'postgresql_node101'

6. Restart the node by executing the following command on the old main node:

pg_ctl start -D $PGDATA

7. Verify on the old main node:

postgres=# insert into student (id, name) values (3,'pankajconnect');

ERROR: cannot execute INSERT in a read-only transaction

It has now become the new standby node.
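
As a final check on the new master node (10.40.239.105), the old main node should now appear as a connected standby:

postgres=# select client_addr, state from pg_stat_replication;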

In this way, we have completed a switchover of the main and standby nodes of PostgreSQL on Linux.

Thanks for reading this.
