Logical standby nodes v5
PGD allows you to create a logical standby node, also known as an offload node, a read-only node, a receive-only node, or a logical read replica. A master node can have zero, one, or more logical standby nodes.
Note
Logical standby nodes can be used in environments where network traffic between data centers is a concern. Otherwise, having more data nodes per location is always preferred.
Logical standby nodes are nodes that are held in a state of continual recovery, constantly updating until they're required. This behavior is similar to how Postgres physical standbys operate, but logical replication is used for better performance. bdr.join_node_group
has the pause_in_standby
option to make the node stay in a halfway-joined state as a logical standby node.
Logical standby nodes receive changes but don't send changes made locally
to other nodes.
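For example, a new node can be registered and joined as a logical standby roughly as follows. This is a minimal sketch: the connection strings, node name, and group name are placeholders, and the exact parameters accepted by bdr.create_node and bdr.join_node_group can vary between PGD versions, so check the function reference for your release.
-- On the new node: register it with its own connection string
-- (placeholder DSN and node name).
SELECT bdr.create_node(
    node_name := 'standby1',
    local_dsn := 'host=standby1 port=5432 dbname=appdb'
);

-- Join the group through one source node, pausing in the
-- halfway-joined state so the node remains a logical standby.
SELECT bdr.join_node_group(
    join_target_dsn := 'host=source1 port=5432 dbname=appdb',
    node_group_name := 'mygroup',
    pause_in_standby := true
);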
Later, if you want, use
bdr.promote_node
to convert the logical standby into a full, normal send/receive node.
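Promotion is a single function call run on the standby itself. A minimal sketch, assuming bdr.promote_node accepts a wait_for_completion parameter:
-- Run on the logical standby to complete its join and turn it
-- into a full send/receive node.
SELECT bdr.promote_node(wait_for_completion := true);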
A logical standby is sent data by one source node, defined by the DSN in
bdr.join_node_group
.
Changes from all other nodes are received from this one source node, minimizing
bandwidth between multiple sites.
For high availability, if the source node dies, one logical standby can be promoted to a full node and replace the source in a failover operation similar to single-master operation. If there are multiple logical standby nodes, the other nodes can't follow the new master, so the effectiveness of this technique is limited to one logical standby.
If a new standby is created from an existing PGD node, the replication slots needed for its operation aren't synced to the new standby until at least 16 MB of LSN has elapsed since the group slot was last advanced. In extreme cases, this might require a full 16 MB before slots are synced or created on the streaming replica. If a failover or switchover occurs during this interval, the streaming standby can't be promoted to replace its PGD node because the group slot and other dependent slots don't exist yet.
The slot sync-up process on the standby solves this by invoking a function on the upstream. This function moves the group slot forward in the entire EDB Postgres Distributed cluster by performing WAL switches and requesting all PGD peer nodes to replay their progress updates. This causes the group slot to move ahead in a short time span, which reduces the time the standby needs for the initial slot sync-up and allows faster failover to it, if required.
On PostgreSQL, it's important to ensure that slot sync-up completes on the standby before promoting it. You can run the following query on the standby in the target database to monitor and confirm that the slots have synced up with the upstream. The promotion can go ahead when this query returns true.
SELECT true FROM pg_catalog.pg_replication_slots WHERE slot_type = 'logical' AND confirmed_flush_lsn IS NOT NULL;
You can also nudge the slot sync-up process in the entire PGD cluster by manually performing WAL switches and by requesting all PGD peer nodes to replay their progress updates. This activity causes the group slot to move ahead in a short time and also hastens the slot sync-up activity on the standby. You can run the following queries on any PGD peer node in the target database for this:
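The queries below are a sketch of what this looks like, assuming the standard pg_catalog.pg_switch_wal() function together with PGD's bdr.run_on_all_nodes and bdr.request_replay_progress_update functions; confirm the exact function names in the function reference for your PGD version.
-- Force a WAL switch on every node in the PGD cluster.
SELECT bdr.run_on_all_nodes('SELECT pg_catalog.pg_switch_wal()');

-- Ask all PGD peer nodes to replay their progress updates,
-- which advances the group slot.
SELECT bdr.run_on_all_nodes('SELECT bdr.request_replay_progress_update()');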