Postgresql- Bidirectional replication (BDR)
BDR, Bidirectional replication (BDR) is definitely a rising star in the world of PostgreSQL.
PostgreSQL’s bidirectional replication (Postgres-BDR or BDR) is PostgreSQL’s first open-source multi-master replication system to achieve full production status. BDR was developed by 2ndQuadrant, specially designed for distribution in different geographical clusters, using efficient asynchronous logical replication, and supporting any node with more than 2 to 48 nodes in the distributed database.
This post will be about the following topics:
1. Understand BDR replication concepts
2. Install BDR
3. Set up a simple cluster
4. Modify the cluster and failover
5. Understand the performance of BDR
Before digging into all the technical details, it is very important to understand the basic technology of BDR.
Understand the concept of BDR replication
In the past, before 9.0 was introduced, people had to use Slony to copy data. The core problem with solutions like Slony is that an updated trigger is needed, which actually writes the data twice. Trigger-based solutions are difficult to manage, cannot handle DDLs, and are generally a bit tricky to operate.
BDR has been created to end trigger-based solutions and turn PostgreSQL into a stronger, more scalable, and simpler management solution. Trigger-based replication is indeed an obsolete thing and should not be seen in modern infrastructure. You can bet on BDR-it is a long-term, safe solution.
Understand final consistency
In the previous part of the book, CAP understanding was discussed. This is an important part. When a new database technology is evaluated, it should always be kept in mind. For BDR, this system is ultimately consistent. What does it mean? Wikipedia (http://en.wikipedia.org/wiki/Eventual_consistency) provides the following definitions:
The final consistency is a consistency model used by distributed computing and is used to informally guarantee high availability. If there is no update for the given data, all access to the given data will eventually return the last updated value.
This definition is actually so good and simple that it makes sense to put it here. The idea of eventual consistency is that the data on all nodes will not be the same immediately, but after a while, it will actually be the same (if nothing happens). Final consistency also means that by default, data will be copied asynchronously, so not all nodes will see the same data at all times. You must expect to see slightly different data, depending on the host you are connected to.
[BDR also supports synchronous replication. However, it is not as secure as the classic two-phase commit transaction. ]
Handling conflicts
Considering the consistency model, an important topic arises: what if there is a conflict? In general, all systems that use eventual consistency need some kind of conflict resolution. This is used for BDR.
The beauty of BDR is that conflict management is very flexible. By default, BDR provides a clear conflict management algorithm, which defines that the last update always wins.
However, more than that, in BDR, it is also possible to write your own conflict resolution algorithm. A server-side stored procedure can be used to define what must be done if a conflict occurs. This mechanism provides users with maximum flexibility and helps users achieve more complex goals.
Another advantage of BDR is that conflict modifications can be recorded in a table. In other words, if a conflict occurs, it is still possible to reverse the ongoing engineering in the system. The data will not be silently forgotten but will be saved for future investigations.
When it comes to conflicts, the use of BDR must be kept in mind; BDR is designed as a (geographically) distributed database system that allows you to process large amounts of data. From a consistency point of view, one thing must be kept in mind: how likely is it that a person in New York and a person in Rome change the same row at the same time on their nodes? If this is a conflict in the situation in the table, BDR is really not a suitable solution. However, if conflicts hardly occur (this is the case for 99% of applications), BDR is really the option to choose. Remember, as long as the same row is changed by many people at the same time, or if a primary key is violated, a conflict will occur. If two people change two lines that are completely independent of each other, no conflict will occur. Therefore, for some occasions, the final consistency is a reasonable choice.
Distribution sequence
Sequence is a potential source of conflict. Imagine hundreds of users adding data to the same table at the same time. Any auto-increment column will immediately become the real source of the conflict, because instances tend to repeatedly assign the same or similar numbers.
Therefore, BDR provides a distributed sequence. Each node is not assigned a value but a series of values. These values can then be used until the next series of values is assigned. The availability of distributed sequences greatly reduces the number of potential conflicts and helps your system run more smoothly than other methods.
Handling DDLs
The data structure is by no means fixed. Once within a period of time, structural changes will occur. Maybe a table is to be added with data, or a column must be deleted, and so on.
BDR can handle these operations well. Most DDLs are just copied, because they are just sent to all nodes and executed. However, there are also prohibited orders. Here are two more prominent ones:
ALTER TABLE … ALTER COLUMN … USING();
ALTER TABLE … ADD COLUMN … DEFAULT;
Keep in mind that conflicts are likely to be concurrent, so without these commands is not too much of a problem. In most cases, these restrictions are not important. However, they must be kept in mind.
In many cases, there is a workaround for those commands, for example, setting explicit values, and other methods.
BDR usage scenarios
For BDR, there are both good and bad use scenarios. Of course, this rule applies to every software. However, database systems are a bit different, and it is necessary to think carefully before making a decision.
Good BDR usage scenario
In general, if a particular data set is modified on only one node, BDR works best. This greatly reduces the possibility of conflict and helps you enjoy a smooth process.
What does it actually mean to modify on one node? We assume three locations: Vienna, Berlin, and London. For Germans working in Berlin, working in Vienna is more likely to modify Austrian data. German operators usually modify German customer data instead of Austrian or British data.
Austrian operators are very likely to change Austrian data. Every operator should see all company data in every country. However, it is more likely that the data is changed wherever it is created. From a business point of view, this is basically impossible, and two people change the same data in different countries at the same time-impossible in case of conflict.
In addition, if the workload is mainly composed of write operations, not UPDATE or DELETE operations. Plug-in is unlikely to cause conflict, so it is a good workload for BDR.
Bad BDR usage scenarios
However, there are generally workloads that are not beneficial to BDR. If consistency is your primary goal and the key to your application, then using BDR is certainly not the right choice. Due to the asynchronous nature of the product, conflicts and consistency can cancel each other out.
Another bad scenario for BDR is that if all nodes need to see the exact same data at the same time, BDR is very difficult.
BDR can effectively write data. However, it cannot extend the write operation indefinitely, because at this point, all write operations still end on each server. Keep in mind that the write operation can only be extended by actually splitting the data. Therefore, if you are looking for a solution that can better extend the write operation, PL/Proxy may be a better choice.
Logic decoding trick
The main concept behind BDR is logical decoding. As already mentioned in this book, logical decoding has been invented to analyze the transaction log stream and convert the binary log to a more readable format, such as SQL. Compared to binary streams, the advantage of logical streams is that replication can occur more easily at version boundaries.
Another important advantage is that there is no need to synchronize physical XLOG positions. As shown in the previous chapters of this book, XLOG addresses are very important and cannot be changed to make things work. Therefore, XLOG-based replication is always a single master and multiple slave replication. There is simply no way to unify two binary XLOG streams into one changing stream. Logic decoding solves that problem in an elegant way because it leaves the whole XLOG synchronization problem. By copying the real physical changes in SQL format, a lot of flexibility is gained and new operations for future improvements are provided.
The entire XLOG decoding is basically done behind the scenes; the end-user will not notice it.
Install BDR
Installing BDR is easy. The software is available as a source package, and it can be deployed directly using binary packages. Of course, it is also possible to install from source code. However, as more and more changes are transferred to the PostgreSQL kernel, this process may change. Therefore, I decided to skip the source code installation.
Install binary packages
In the previous chapter, you have learned how to install BDR on a Linux system using pre-compiled binary packages. The display installation work selected is if the Linux distribution being performed is CentOS 7 (to find information about other packages, please check http://bdr-project.org/docs/stable/installation.html )
The installation process itself is simple, first install the repo:
yum install http://packages.2ndquadrant.com/postgresql-bdr94-2ndquadrant/ yum-repo-rpms/postgresql-bdr94–2ndquadrant-redhat-1.0–2.noarch.rpm
In the next steps, BDR can be deployed:
yum install postgresql-bdr94-bdr
Once DBR is installed on all nodes, the system is ready to run.
[Remember, BDR is still in a fairly early state of development, so the subsequent process may change over time. ]
Set up a simple cluster
Once the installation is complete, it is time to start and actually set up a simple cluster. In this scenario, a cluster of three nodes will be created.
Please note that in order to make it easier for beginners to use, all data nodes will be installed on the same physical server.
Arrange storage
The first thing the administrator has to do is to create some space for PostgreSQL. In this simple example, only three directories are created:
[root@pankajconnect ~]# mkdir /data
[root@pankajconnect ~]# mkdir /data/node1 /data/node2 /data/node3
Make sure these directories belong to postgres (this is usually a good idea if you run PostgreSQL with postgres user):
[root@pankajconnect ~]# cd /data/
[root@pankajconnect data]# chown postgres.postgres node*
Once these directories are created, you are ready for everything you need for a successful setup:
[root@pankajconnect data]# ls -l
total 0
drwxr-xr-x 2 postgres postgres 6 Apr 2020:51 node1
drwxr-xr-x 2 postgres postgres 6 Apr 2020:51 node2
drwxr-xr-x 2 postgres postgres 6 Apr 2020:51 node3
Create a database instance
After creating some space for our experiment, it makes sense to cross-check our system path. Make sure the correct PostgreSQL version is in your path. Some users reported problems with the PostgreSQL version provided by some operating systems in that path during the installation process due to an accident. Therefore, it only makes sense to check the path and set accordingly, if necessary:
export PATH=/usr/pgsql-9.4/bin:$PATH
Then, you can create three database instances. The initdb command can be used in any common situation:
[postgres@pankajconnect ~]$ initdb -D /data/node1/ -A trust
[postgres@pankajconnect ~]$ initdb -D /data/node2/ -A trust
[postgres@pankajconnect ~]$ initdb -D /data/node3/ -A trust
To make the installation process easier, trust will be used as a verification method. Of course, user authentication is possible, but it is not the core of the theme of this chapter, so it is best to simplify this part as much as possible.
Now that you have created three database instances, you can adjust postgresql.conf. The following parameters are required:
shared_preload_libraries =’bdr’
wal_level =’logical’
track_commit_timestamp = on
max_connections = 100
max_wal_senders = 10
max_replication_slots = 10
max_worker_processes = 10
The first thing to do is to load the BDR module into PostgreSQL. It contains an important infrastructure for replication. Next, logical decoding must be enabled. It will be the backbone of the entire infrastructure.
For BDR to work, track_commit_timestamp must be turned on. In standard PostgreSQL 9.4, this setting does not exist. It will most likely appear in future PostgreSQL versions along with BDR. Knowing the submitted timestamp is essential to BDR’s internal conflict resolution algorithm (the final victory).
Then, max_wal_senders must be set together with replication slots. Streaming replication also requires these settings and should not be a big surprise.
Finally, there is max_worker_processes. As long as PostgreSQL is started, BDR initiates some client worker processes in the background. These work processes are based on standard background work APIs and are required to handle data transfer during the copy process. It is essential to ensure that there are enough processes available.
Finally, there are some conflict-related settings that can be used:
# Handling conflicts
#bdr.default_apply_delay=2000 # milliseconds
#bdr.log_conflicts_to_table=on
Now that postgresql.conf is configured, it is time to focus on pg_hba.conf.
In the simplest case, a simple copy rule must be created:
local replication postgres trust
host replication postgres 127.0.0.1/32 trust
host replication postgres ::1/128 trust
Please note that in a real and efficient setup, a wise manager will configure a special copy user and set a password, or use other authentication methods. For simplicity, this process has been ruled out here.
Make the database work just like ordinary PostgreSQL:
pg_ctl -D /data/node1/ start
pg_ctl -D /data/node2/ start
pg_ctl -D /data/node3/ start
For our test, we need a database and an instance:
createdb test -p 5432
createdb test -p 5433
createdb test -p 5434
Load the module and start the cluster
So far, so good! To ensure that BDR can do its job, it must be loaded into the database. Two extensions are required, namely btree_gist and bdr:
[postgres@pankajconnect node1]$ psql test -p 5432
test=# CREATE EXTENSION btree_gist;
CREATE EXTENSION
test=# CREATE EXTENSION bdr;
CREATE EXTENSION
These extensions must be loaded into the three databases created previously. Simply loading them into a component is not enough. It is essential to load them into all databases.
Finally, our database nodes must all be added to a BDR group. So far, there are only three independent database instances, which include the thank you module. In the next step, these nodes will be connected to each other.
The first thing to do is to create a BDR group:
test=# SELECT bdr.bdr_group_create(local_node_name :=’node1',node_external_dsn :=’port=5432 dbname=test’
);
bdr_group_create
— — — — — — — — —
(1 row)
Basically, two parameters are required: the local name and a database connection from the remote host to the node. The simplest way to define local_node_name is to give the node a simple name.
To check whether the node is ready for BDR, call the following function. If the answer is the same as the following, it means that there is no problem with the configuration:
test=# SELECT bdr.bdr_node_join_wait_for_ready();
bdr_node_join_wait_for_ready
— — — — — — — — — — — — — — —
(1 row)
Now it’s time to add other nodes to the replication system:
test=# SELECT bdr.bdr_group_join(
local_node_name :=’node2',
node_external_dsn :=’port=5433 dbname=test’,
join_using_dsn :=’port=5432 dbname=test’
);
bdr_group_join
— — — — — — — —
(1 row)
Again, a NULL value is a good sign. First, the second node was added to the BDR. Then, the third node can join one:
test=# SELECT bdr.bdr_group_join(
local_node_name :=’node3',
node_external_dsn :=’port=5434 dbname=test’,
join_using_dsn :=’port=5432 dbname=test’
);
bdr_group_join
— — — — — — — —
(1 row)
Once all nodes have been added, the administrator can check whether all nodes are ready:
[postgres@pankajconnect node2]$ psql test -p 5433
test=# SELECT bdr.bdr_node_join_wait_for_ready();
bdr_node_join_wait_for_ready
— — — — — — — — — — — — — — —
(1 row)
[postgres@pankajconnect node2]$ psql test -p 5434
test# SELECT bdr.bdr_node_join_wait_for_ready();
bdr_node_join_wait_for_ready
— — — — — — — — — — — — — — —
(1 row)
If both queries return NULL, it means that the system is running well.
Check your settings
After this simple process, BDR is up and running. In order to check whether all the work is as expected, it makes sense to check the relevant replication process:
[postgres@pankajconnect ~]$ ps ax | grep bdr
31296? Ss 0:00 postgres: bgworker: bdr supervisor
31396? Ss 0:00 postgres: bgworker: bdr db: test
31533? Ss 0:00 postgres: bgworker: bdr supervisor
31545? Ss 0:00 postgres: bgworker: bdr supervisor
31553? Ss 0:00 postgres: bgworker: bdr db: test
31593? Ss 0:00 postgres: bgworker: bdr db: test
31610? Ss 0:00 postgres: bgworker: bdr (6136360420896274864,1,16385,)->bdr (6136360353631754624,1,
…
31616? Ss 0:00 postgres: bgworker: bdr (6136360353631754624,1,16385,)->bdr (6136360420896274864,1,
As you can see, each instance will have at least three BDR processes. If these processes are present, this is usually a good sign, and replication should work as expected.
A simple test can reveal whether the system is working:
test=# CREATE TABLE t_test (id int, t timestamp DEFAULT now() );
CREATE TABLE
After the table is created, the structure should look like this:
test=# \d t_test
Table “public.t_test”
Column | Type | Modifiers
— — — — + — — — — — — — — — — — — — — -+ — — — — — — — —
id | integer |
t | timestamp without time zone | default now()
Triggers:
truncate_trigger AFTER TRUNCATE ON t_test FOR EACH STATEMENT EXECUTE PROCEDURE bdr.queue_truncate()
This table looks as expected. There is only one exception: a TRUNCATE trigger is automatically created. Remember, replication slots can stream INSERT, UPDATE, and DELETE statements. DDLs and TRUNCATE are now row-level information, so these statements are no longer in flow. Triggers are needed to capture TRUNCATE and copy it into plain text. Do not try to change or delete the trigger.
To test replication, a simple INSERT statement can work:
test=# INSERT INTO t_test VALUES (1);
INSERT 0 1
test=# TABLE t_test;
id | t
— — + — — — — — — — — — — — — — —
1 | 2015–04–11 08:48:46.637675
(1 row)
In this example, this value has been added to the instance listening for 5432. A quick check shows that the data has been well copied to the instances listening to 5433 and 5434:
[postgres@pankajconnect ~]$ psql test -p 5434
test=# TABLE t_test;
id | t
— — + — — — — — — — — — — — — — —
1 | 2015–04–11 08:48:46.637675
(1 row)
Handling conflicts
As mentioned earlier in this chapter, when working with BDR, conflict is an important thing. Remember, BDR is designed as a distributed system, so it makes sense to use it when conflicts are unlikely. However, it is important to understand what happened during the conflict.
To show what happened, here is a simple table:
test=# CREATE TABLE t_counter (id int PRIMARY KEY);
CREATE TABLE
Then, add a line:
test=# INSERT INTO t_counter VALUES (1);
INSERT 0 1
To run the test, a simple SQL query is necessary. In this example, 10,000 UPDATE statements are used:
[postgres@pankajconnect ~]$ head -n 3 /tmp/script.sql
UPDATE t_counter SET id = id + 1;
UPDATE t_counter SET id = id + 1;
UPDATE t_counter SET id = id + 1;
Now this script is executed three times, once on a node:
[postgres@pankajconnect ~]$ cat run.sh
#!/bin/sh
psql test -p 5432 </tmp/script.sql> /dev/null &
psql test -p 5433 </tmp/script.sql> /dev/null &
psql test -p 5434 </tmp/script.sql> /dev/null &
As the same line hits again and again, the number of conflicts is expected to soar.
[Please note that this is not the original purpose of the BDR. It is just a demonstration to show what happened in the event of a conflict. ]
Once these three scripts are completed, you can check what happened in the conflict:
test=# \x
Expanded display is on.
test=# TABLE bdr.bdr_conflict_history LIMIT 1;
-[ RECORD 1 ] — — — — — — + — — — — — — — — — — — — — — —
conflict_id | 1
local_node_sysid | 6136360318181427544
local_conflict_xid | 0
local_conflict_lsn | 0/19AAE00
local_conflict_time | 2015–04–11 09:01:23.367467+02
object_schema | public
object_name | t_counter
remote_node_sysid | 6136360353631754624
remote_txid | 1974
remote_commit_time | 2015–04–11 09:01:21.364068+02
remote_commit_lsn | 0/1986900
conflict_type | update_delete
conflict_resolution | skip_change
local_tuple |
remote_tuple | {“id”:2}
local_tuple_xmin |
local_tuple_origin_sysid |
error_message |
error_sqlstate |
error_querystring |
error_cursorpos |
error_detail |
error_hint |
error_context |
error_columnname |
error_typename |
error_constraintname |
error_filename |
error_lineno |
error_funcname |
BDR provides a simple and very easy way to read a table containing all conflicting rows. Display of LSN, transaction ID and more conflict related information at the top. In this example, BDR has made a skip_change solution. Remember that every changed line will hit the same line, because we are asynchronous multi-master. This is very annoying to BDR. In this example
The UPDATE statement is indeed skipped; it is very important to understand this. BDR can skip changes in conflicts or concurrent events in your cluster.
Comprehension set
So far, the entire cluster has been used. Everyone can copy data to others. In many cases, this is not required. BDR is more flexible in this regard.
One-way replication
BDR is not only capable of bidirectional replication, but also bidirectional replication. In some cases, this is very convenient. Consider a system that only provides read services. A simple one-way slave may be all you need.
BDR provides a simple function to register a node as a one-way slave:
bdr.bdr_subscribe(local_node_name,
subscribe_to_dsn,
node_local_dsn,
apply_delay integer DEFAULT NULL,
replication_sets text[] DEFAULT ARRAY[‘default’],
synchronize bdr_sync_type DEFAULT’full’)
Of course, it is also possible to delete a node from one-way replication:
bdr.bdr_unsubscribe(local_node_name)
The installation process is very simple and fits the basic design principles of BDR.
Processing data sheets
The beauty of BDR is that there is no need to copy the entire instance to a cluster. Replication can be very fine-grained, and administrators can decide what data is copied to where. Two functions can be used for table replication collections:
bdr.table_set_replication_sets(p_relation regclass, p_sets text[])
This sets up a replicated collection of tables. The initial allocation will be overwritten.
If you want to know which replication set a table belongs to, you can call the following function:
bdr.table_get_replication_sets(relation regclass) text[]
We will see more functions with the development of BDR in some replication areas. It will allow you to flexibly schedule data as needed.
Control replication
For maintenance reasons, it may be necessary to maintain and resume replication over and over again. Just consider a major software update. It may do something annoying to your data structure. You absolutely don’t want something that is wrong to be copied to your system. Therefore, it can conveniently stop copying and restart it once it proves normal.
Two functions can be used for this job:
SELECT bdr.bdr_apply_pause()
To restart again, you can use the following function:
SELECT bdr.bdr_apply_resume()
Connections to remote nodes (or nodes) are maintained, but data cannot be read from them. The effect of suspending the request is not permanent, so if PostgreSQL is restarted or the postmaster resumes after a background failure, replay will resume. Terminating the personal background using pg_terminate_backend will not cause replay to resume, or will reinstall the postmaster without requiring a full restart. There is no option to pause a replay from the only peer node
Conclusion:
BDR is a rising star in the replication world of PostgreSQL. Currently, it is still under development, and we can expect more in the near future (perhaps while holding this book in your hands).
BDR is an asynchronous multi-master and allows people to run geographically distributed databases.
It is important to remember that BDR is particularly useful when the replication conflict rate is very low.
Thanks for reading this post, if you like please follow me up.