PostgreSQL is one of the most advanced open source databases in the world, with many great features. One of them
is Streaming Replication (Physical Replication), which was introduced in PostgreSQL 9.0. It is based on XLOG
records which get transferred to the destination server and applied there. However, it is cluster based: we
cannot replicate a single database or a single object (selective replication). Over the years, we have been
dependent on external tools like Slony, Bucardo, and BDR for selective or partial replication, as there was no such feature
at the core level until PostgreSQL 9.6. PostgreSQL 10, however, introduced a feature called Logical Replication,
through which we can perform database-level or object-level replication.
Logical Replication replicates changes to objects based on their replication identity, which is usually a primary key.
It differs from physical replication, in which replication is block based and byte-by-byte. Logical
Replication does not need an exact binary copy on the destination server, and the destination server remains
writable, unlike with Physical Replication. This feature originates from the pglogical module.
Logical Replication implements a publish-and-subscribe model (Publication & Subscription). Below is a high-level
architectural diagram of how it works.
Basic Logical Replication Architecture
A Publication can be defined on the master server, and the node on which it is defined is referred to as the "publisher".
A publication is a set of changes from a single table or a group of tables. It is defined at the database level, and each publication
exists in one database. Multiple tables can be added to a single publication, and a table can be in multiple
publications. You must add objects explicitly to a publication unless you choose the "ALL TABLES" option,
which requires superuser privileges.
You can limit which operations on objects (INSERT, UPDATE, and DELETE) are replicated. By default, all operation
types are replicated. To replicate UPDATE and DELETE operations, the object you add to a publication must have a
replication identity configured. The replication identity can be a
primary key or a unique index. If the table has neither a primary key nor a unique index, its replica
identity can be set to "full", in which case all columns act as the key (the entire row becomes the key).
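As a minimal sketch (the table name demo_nokey is a made-up example), a table without a suitable key can be configured like this:

```sql
-- Hypothetical table without a primary key or unique index
CREATE TABLE demo_nokey (id int, payload text);

-- Use the whole row as the replication identity so that
-- UPDATE and DELETE operations on it can be replicated
ALTER TABLE demo_nokey REPLICA IDENTITY FULL;
```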
You can create a publication using CREATE PUBLICATION. Some practical commands are covered in the "How
to achieve it" section.
A Subscription can be defined on the destination server, and the node on which it is defined is referred to as the
"subscriber". The connection to the source database is defined in the subscription. The subscriber node works like
any other standalone Postgres database, and it can itself act as a publisher for further subscriptions.
The subscription is added using CREATE SUBSCRIPTION and can be stopped/resumed at any time using the
ALTER SUBSCRIPTION command and removed using DROP SUBSCRIPTION.
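A minimal sketch of that lifecycle (the subscription name, publication name, and connection string are made-up examples):

```sql
-- Create a subscription to a publication named mypub on the source server
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=source_host port=5432 dbname=sourcedb'
    PUBLICATION mypub;

-- Stop and resume replication at any time
ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub ENABLE;

-- Remove the subscription
DROP SUBSCRIPTION mysub;
```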
Once a subscription is created, Logical replication copies a snapshot of the data on the publisher database. Once that
is done, it waits for delta changes and sends them to the subscription node as soon as they occur.
However, how are the changes collected? Who sends them to the target? And who applies them at the target?
Logical replication is based on the same architecture as physical replication: it is implemented by "walsender"
and "apply" processes. Since it is based on WAL decoding, who starts the decoding? The walsender process is
responsible for starting logical decoding of the WAL, and it loads the standard logical decoding plugin (pgoutput). The
plugin transforms the changes read from the WAL into the logical replication protocol and filters the data according to
the publication specification. The data is then continuously transferred using the streaming replication protocol to
the apply worker, which maps the data to local tables and applies the individual changes as they are received, in
the correct transactional order.
PostgreSQL logs all these steps in its log files during setup. We can see the messages in the "How to achieve it" section later in
the post.
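Once replication is running, the walsender and apply processes can be observed from the statistics views (a sketch; both views are standard in PostgreSQL 10):

```sql
-- On the publisher: one walsender row per active replication connection
SELECT pid, application_name, state
FROM pg_stat_replication;

-- On the subscriber: status of the apply worker(s)
SELECT subname, received_lsn, latest_end_lsn
FROM pg_stat_subscription;
```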
Logical Replication is useful in scenarios such as:
If you want to consolidate multiple databases into a single database for analytical purposes.
If you need to replicate data between different major versions of PostgreSQL.
If you want to send incremental changes in a single database or a subset of a database to other databases.
If you need to give access to replicated data to different groups of users.
If you need to share a subset of the database between multiple databases.
Logical Replication has some limitations, which the community is continuously working to overcome:
Tables must have the same fully qualified name between publication and subscription.
Tables must have a primary key or unique key.
Mutual (bi-directional) replication is not supported.
It does not replicate schema/DDL.
It does not replicate sequences.
It does not replicate TRUNCATE.
It does not replicate Large Objects.
Subscription tables can have more columns or a different column order, but the column names and types must
match between Publication and Subscription.
Adding all tables (FOR ALL TABLES) requires superuser privileges.
You cannot stream to the same host (the subscription will get locked).
1. Initialize two different instances for publication and subscription and start.
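A minimal sketch of step 1 (the data directory paths and log file names are made-up examples; ports 5555 and 5556 match the ones used later in the post):

```shell
# Initialize two independent data directories
./initdb -D /tmp/publication_db
./initdb -D /tmp/subscription_db

# Start each instance on its own port
./pg_ctl -D /tmp/publication_db -o "-p 5555" -l /tmp/pub.log start
./pg_ctl -D /tmp/subscription_db -o "-p 5556" -l /tmp/sub.log start
```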
2. Parameters to be changed before you start the instances (for both publication and subscription instances).
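The essential parameter is wal_level; a sketch of the postgresql.conf settings (the values other than wal_level are reasonable defaults assumed here, not taken from the post):

```
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10
max_worker_processes = 10             # apply workers run under this limit on the subscriber
max_logical_replication_workers = 4   # subscriber side
```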
3. Change pg_hba.conf file to allow replication. Note that these values are dependent on your environment,
however, this is just a basic example (for both publication and subscription instances).
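A sketch of a pg_hba.conf entry allowing local replication connections (note that logical replication connects to the actual database, so the entry matches the database name rather than the "replication" keyword; the address and trust method are a local test example and should be restricted in production):

```
# TYPE  DATABASE    USER    ADDRESS         METHOD
host    all         all     127.0.0.1/32    trust
```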
4. Create a couple of test tables to replicate and insert some data on Publication instance.
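A sketch of step 4, using the database, table, and data-generation patterns that appear later in the post (the createdb call and the column definitions are assumptions):

```shell
# Create the source database on the Publication instance
./createdb -p 5555 source_rep

# Create two test tables with primary keys and seed them with 100 rows each
./psql -p 5555 -d source_rep -c "create table test_rep(id int primary key, name varchar(10))"
./psql -p 5555 -d source_rep -c "create table test_rep_other(id int primary key, name varchar(10))"
./psql -p 5555 -d source_rep -c "insert into test_rep values(generate_series(1,100), 'data'||generate_series(1,100))"
./psql -p 5555 -d source_rep -c "insert into test_rep_other values(generate_series(1,100), 'data'||generate_series(1,100))"
```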
5. Create structure of the tables on Subscription instance as Logical Replication does not replicate the
structure.
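A sketch of step 5 on the Subscription instance, mirroring the DDL assumed in step 4 (the createdb call and column definitions are assumptions):

```shell
# Create the target database on the Subscription instance
./createdb -p 5556 target_rep

# Recreate the table structure only; Logical Replication will fill in the data
./psql -p 5556 -d target_rep -c "create table test_rep(id int primary key, name varchar(10))"
./psql -p 5556 -d target_rep -c "create table test_rep_other(id int primary key, name varchar(10))"
```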
7. Create a subscription on the Subscription instance (port 5556) to the publication created in step 6.
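The command for step 6 (creating the publication) is not shown above; a sketch of steps 6 and 7 together (the publication and subscription names are made-up):

```shell
# Step 6: on the Publication instance (port 5555)
./psql -p 5555 -d source_rep -c "create publication mypub for table test_rep, test_rep_other"

# Step 7: on the Subscription instance (port 5556)
./psql -p 5556 -d target_rep -c "create subscription mysub connection 'host=localhost port=5555 dbname=source_rep' publication mypub"
```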
From the log:
As you can see in the NOTICE message, a replication slot was created, which ensures that WAL cleanup
is not done until the initial snapshot and delta changes have been transferred to the target database. Then the
WAL sender started decoding the changes, and the logical replication apply worker started, as both pub and sub are
running. Then the table sync starts.
8. Insert some more data on the Publication instance and verify that the changes are replicated to the Subscription instance.

C1MQV0FZDTY3:bin bajishaik$ psql -d postgres -p 5555 -d source_rep -c "insert into test_rep values(generate_series(101,200), 'data'||generate_series(101,200))"
INSERT 0 100
Time: 3.869 ms
C1MQV0FZDTY3:bin bajishaik$ psql -d postgres -p 5555 -d source_rep -c "insert into test_rep_other values(generate_series(101,200), 'data'||generate_series(101,200))"
INSERT 0 100
Time: 3.211 ms
C1MQV0FZDTY3:bin bajishaik$ psql -d postgres -p 5556 -d target_rep -c "select count(1) from test_rep"
 count
-------
   200
(1 row)

Time: 1.742 ms
C1MQV0FZDTY3:bin bajishaik$ psql -d postgres -p 5556 -d target_rep -c "select count(1) from test_rep_other"
 count
-------
   200
(1 row)

Time: 1.480 ms
C1MQV0FZDTY3:bin bajishaik$