Manage Replication and Failover on a PostgreSQL Cluster Using repmgr
Author: Jeff Novotny
Other contributors: Adam Overa
Although PostgreSQL is very powerful and reliable, it does not include built-in high availability (HA) capabilities. Resiliency and reliability can be configured using a replication manager, such as repmgr. This guide explains how to install and configure PostgreSQL and repmgr. It also describes the steps required to convert a two-node network into a functional high availability cluster (HA cluster).
PostgreSQL and High Availability
High availability functionality is an important component for any production database. If an important database server crashes or becomes inaccessible due to network issues, it can adversely affect both customers and employees. It can also degrade other resources, such as corporate or e-commerce websites.
A high availability architecture addresses these concerns. In an HA network, multiple nodes maintain the same database schema and data. One node is designated the master or primary node. It processes all changes to the database and transmits a record of all transactions to the standby nodes. These nodes are also known as secondary nodes or replicas.
In most architectures, both the primary and standby nodes can respond to read requests. This capability assists with load balancing and increases the overall capacity of the system. However, if the primary node fails, one of the standby nodes is elevated to become the new primary. This ensures uninterrupted access to the data.
There are a few important terms to be aware of before configuring an HA cluster:
- Data Replication: Replication is the process of generating additional copies of the original database data. It logs all data and schema updates and transmits them to designated standby nodes. Replication can be synchronous or asynchronous and transmits updates using either a file or streaming-based format.
- High Availability Cluster (HA Cluster): An HA Cluster is a collection of nodes, each hosting a copy of the same data and schemas. The different database instances are kept in sync by the replication algorithm. The HA Cluster is presented as a single database and external database users do not have any visibility into its composition.
- Primary Node: This is the master node within a PostgreSQL HA cluster. It receives all database changes and updates, so it always has the most current view of the data. It replicates these transactions to the other nodes in the HA cluster. Primary nodes also handle read requests, but these are typically distributed between the different nodes for load balancing purposes. In PostgreSQL, the primary node is registered based on the configuration files. However, after a failover event, a new primary can be chosen through a primary election.
- Standby Node: Also known as a secondary node or a replica, the standby receives database updates from the primary node. During regular operation, these nodes receive and process replication updates from the primary, but only respond to user read requests. Each HA cluster can contain additional standby nodes for added redundancy and load balancing.
- Failover: If the primary node fails, a failover occurs. One of the standby nodes is promoted to become the new primary node. It then handles all subsequent database updates. Administrators can initiate a manual failover for database maintenance purposes. This scheduled activity is sometimes known as a manual switchover.
- Streaming Replication: A streaming-based replication algorithm immediately transmits each update to the replicas, resulting in more timely updates on the replicas. It does not have to wait for a list of entries to build up before transmitting the updates. Streaming can be either synchronous or asynchronous.
- Synchronous/Asynchronous Replication: In synchronous replication, the primary node waits for confirmation from at least one standby before confirming the transaction. For asynchronous replication, the primary node does not wait for a response after transmitting an update. Synchronous mode guarantees the database is consistent across the HA cluster and eliminates potential data loss during a switchover. However, it introduces latency and can reduce throughput. Asynchronous mode is the default PostgreSQL and repmgr replication method.
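To illustrate the synchronous mode described above, synchronous replication could be enabled in `postgresql.conf` with settings like the following. This is a sketch only: the standby name `pg2` is an assumption, and the configuration in this guide leaves replication in the default asynchronous mode.

```
# Hypothetical postgresql.conf fragment (not used later in this guide).
# Wait for at least one of the named standbys to confirm each commit:
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (pg2)'
```

With these settings, a commit on the primary does not return until the listed standby acknowledges it, trading some write latency for zero data loss on failover.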
All high availability solutions must be able to perform the following tasks:
- Designate one of the nodes as the primary node.
- Direct all write operations to the primary node.
- Replicate all changes on the primary to the standby nodes.
- Monitor the primary node and detect any failures.
- If the primary fails, choose one of the replicas as the new primary. This can be done either through deterministic means, including priority settings, or through an election process.
Some high availability solutions distribute read requests to all active nodes. In some cases, replication managers are able to repair failed nodes and reintegrate them into the HA cluster. However, repmgr is not able to do this.
Some database systems refer to the primary and all standby nodes as a cluster. However, in PostgreSQL a “cluster” refers to either a collection of databases sharing the same data storage area or a table index reorganization operation. In PostgreSQL a redundant computing cluster is a “high availability (HA) cluster”.
For further information about high availability solutions for PostgreSQL and a comparison of the different replication managers, see Akamai’s PostgreSQL high availability guide.
What Is repmgr?
Repmgr is an open source suite for managing PostgreSQL HA clusters. It closely integrates with PostgreSQL to configure a primary node, clone replica nodes, monitor the HA cluster, and perform a failover. Repmgr supports a single read-write primary server and one or more read-only standby nodes, also known as replicas. This guide focuses on repmgr because it is an efficient and robust solution with long-standing popularity in the database management area.
Repmgr is installed as an apt package. It includes a command-line utility for configuring, administering, and monitoring the HA cluster nodes. Configuration is added to the PostgreSQL and repmgr configuration files.
Each node in the HA cluster must be registered as either a primary or standby node. The primary is registered first and used as a template to clone the standby nodes. Repmgr creates its own tables within the PostgreSQL database to store the information about the nodes and replication process. It requires SSH connections between the nodes to complete these tasks.
The other repmgr component is the repmgrd daemon. It actively monitors all nodes, detects failures, and helps coordinate a switchover to a new primary. If the primary fails, repmgr attempts to reconnect to it. If this does not succeed, it performs a failover and promotes one of the standby servers. It fences off the failed primary in case it unexpectedly comes online again. This helps avoid a contentious split-brain scenario, where multiple nodes believe they are the primary. Additionally, the daemon transmits relevant notifications and alerts.
Repmgr also supports manual transitions to one of the standby nodes. It can incorporate an independent witness server to assist with primary elections and manage split network scenarios. Repmgr permits a cascading configuration, allowing one or more replicas to receive updates from an upstream replica. A notable feature is the dry run option. It allows users to preview the results of certain commands.
A drawback of repmgr is that it cannot recover resources or restore the state of a cluster to its original condition. Restarting a failed node usually requires manual intervention. It is also unable to detect many common misconfigurations and might think a misconfigured node is a viable standby.
Before You Begin
If you have not already done so, create a Linode account and Compute Instance. See our Getting Started with the Akamai cloud computing platform and Creating a Compute Instance guides.
Follow our Setting Up and Securing a Compute Instance guide to update your system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access.
This guide requires at least two compute instances. The examples here only require Shared CPU instances with 4GB of RAM; to accommodate larger data sets, use High Memory instances. One system must be designated as the primary node and the other as a standby or backup node. Additional standby systems can be added depending upon business requirements. All servers within the same HA cluster must use the same release of the same Linux distribution. The steps that follow are geared towards Ubuntu 22.04 LTS users, but are generally applicable for earlier releases and other Linux distributions.
This guide is written for a non-root user. Commands that require elevated privileges are prefixed with sudo. If you are not familiar with the sudo command, see the Users and Groups guide.

An Overview of the PostgreSQL and repmgr HA Solution
This guide covers a full manual installation of PostgreSQL and repmgr. It also includes instructions on how to register the nodes and perform a switchover.
For simplicity, this guide uses a two-node HA cluster. For a setup with multiple standby servers, repeat the standby configuration on any additional standby nodes. Update the standby IP address with the local IP address as required.
Throughout this guide, execute some commands as the postgres user and others with a user account having sudo privileges. However, the postgres account does not have sudo rights by default. There are two approaches to dealing with this situation:
- Grant the postgres account sudo access. Run the command adduser postgres sudo as the root user. However, this extends additional powers to the postgres user. This might not be desirable, especially from a security standpoint.
- The preferred approach is to restrict the postgres account to PostgreSQL and repmgr configuration. Run systemctl and apt commands from a user account with sudo privileges. This can involve switching back and forth in one terminal, but it is usually easier to use two terminals. This guide indicates when to use the postgres account or the user account with sudo privileges.
The complete list of steps required to provision PostgreSQL and repmgr follows this sequence:
- Install PostgreSQL on both nodes.
- Access and secure PostgreSQL on both nodes.
- Install repmgr on both nodes.
- Enable SSH connectivity between the nodes.
- Create a repmgr user on the primary node.
- Configure the database replication settings in the PostgreSQL configuration file on the primary.
- Configure the PostgreSQL authentication settings on the primary.
- Configure the repmgr HA cluster settings on both nodes.
- Register the primary server.
- Clone and register the standby server.
- Activate the repmgrd daemon.
After the configuration process, it is important to verify the cluster status. Add some data to the primary database instance and ensure it is replicated to the standby. It is also possible to test a failover process, but the failed primary must be recovered manually afterwards.
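For example, a quick replication check might create a small table on the primary and confirm it appears on the standby. This is a sketch; the table name and sample values below are placeholders:

```
-- On the primary, as the postgres user in psql:
CREATE TABLE replication_test (id int PRIMARY KEY, note text);
INSERT INTO replication_test VALUES (1, 'hello from the primary');

-- On the standby, the same SELECT should return the inserted row
-- once streaming replication has caught up:
SELECT * FROM replication_test;

-- Writes attempted on the standby should fail, because hot standby
-- nodes are read-only.
```

If the row appears on the standby within a few seconds, streaming replication is working end to end.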
How to Install PostgreSQL and repmgr
How to Install PostgreSQL
PostgreSQL can be installed using a variety of methods, but the easiest approach is to use apt to install the PostgreSQL package. For more information on installing and configuring PostgreSQL, including instructions on using the database, see the How to Install PostgreSQL guide. The PostgreSQL downloads page has information on other install options, including how to build PostgreSQL from source code.
To install PostgreSQL, execute the instructions in this section on both the primary and standby nodes.
- Ensure the server is up to date. Use the apt update command to install any updates. Reboot the server if necessary.

  All Nodes as sudo User:

      sudo apt update -y && sudo apt upgrade -y

- Add the PostgreSQL repository to the list of packages:

  All Nodes as sudo User:

      sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

- Import the signing key for the repository:

  All Nodes as sudo User:

      wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

- Update the list of apt packages:

  All Nodes as sudo User:

      sudo apt update

- Install the latest release of PostgreSQL:

  All Nodes as sudo User:

      sudo apt -y install postgresql

  Note: To install a specific release of PostgreSQL, add a dash and the release number to the package name. For example, to install PostgreSQL release 11, use the postgresql-11 package.

- Optional: To install additional extensions that are not yet part of the official PostgreSQL package, run the following command:

  All Nodes as sudo User:

      sudo apt -y install postgresql-contrib

- PostgreSQL begins running immediately after installation. Verify the status of PostgreSQL is active using the systemctl status command:

  All Nodes as sudo User:

      systemctl status postgresql

      ● postgresql.service - PostgreSQL RDBMS
           Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
           Active: active (exited) since Mon 2023-11-20 10:38:10 EST; 1h 29min ago
         Main PID: 3516 (code=exited, status=0/SUCCESS)
              CPU: 1ms

  Note: The active (exited) status does not necessarily indicate a problem. The postgresql service is an umbrella service for several subprocesses. If active is displayed somewhere in the status, everything is running normally. To see the status of all subprocesses, use the command sudo systemctl status 'postgresql*'.

  Press the Q key to exit the systemctl status output and return to the terminal prompt.

- To automatically launch PostgreSQL at system boot time, enable the service:

  All Nodes as sudo User:

      sudo systemctl enable postgresql
How to Access and Secure the PostgreSQL Instance
PostgreSQL creates a default postgres user account at installation time. This account does not have Linux or database passwords. However, the account has full administration rights for PostgreSQL, so it is vitally important to secure it. This section explains how to access PostgreSQL and enhance security.
- Change the password for the postgres Linux account:

  All Nodes as sudo User:

      sudo passwd postgres

  Choose a unique strong password and store it in a secure location.

      passwd: password updated successfully

- Switch to the postgres user to access PostgreSQL:

  All Nodes as sudo User:

      su - postgres

- To ensure the database is installed correctly and confirm the release number, run the following command:

  All Nodes as postgres User:

      psql -c "SELECT version();"

      PostgreSQL 16.1 (Ubuntu 16.1-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

  Press the Q key to close the output and return to the terminal prompt.

- Add a password for the postgres database account using the following command. Replace EXAMPLE_PASSWORD with a more secure password.

  All Nodes as postgres User:

      psql -c "ALTER USER postgres WITH PASSWORD 'EXAMPLE_PASSWORD'"

      ALTER ROLE

  Note: This password is only necessary when logging in remotely or over a network. The postgres user can always log in locally without a password. This grants administrative access for maintenance tasks and cron jobs.

- Log in to PostgreSQL to confirm the database is accessible. This command uses the postgres database, which was created at installation time:

  All Nodes as postgres User:

      psql postgres

  PostgreSQL displays some information about the database along with the postgres=# prompt:

      psql (16.1 (Ubuntu 16.1-1.pgdg22.04+1))
      Type "help" for help.

      postgres=#

- Type quit or exit to exit the PostgreSQL shell and return to the Linux terminal prompt.

- Type exit again to log out as the postgres user and return to your Linux user account with sudo access.

- Optional: By default, local users can log into PostgreSQL without a password using peer authentication. For multi-user environments, this can create a security risk. To enforce password authentication for local users (other than the postgres account), edit the pg_hba.conf file. Change the METHOD attribute for local accounts from peer to md5. See the How to Install PostgreSQL guide for detailed instructions.
How to Install repmgr
The repmgr software package must be added to all nodes. To install repmgr, follow these steps.
- Install the EDB repository, including the repmgr package:

  All Nodes as sudo User:

      curl https://dl.enterprisedb.com/default/release/get/deb | sudo bash

  You may have to enter your password for the curl command to complete.

- Install the same repmgr release as the PostgreSQL release being used. For example, if PostgreSQL release 16 is installed, use apt to install postgresql-16-repmgr:

  All Nodes as sudo User:

      sudo apt-get install postgresql-16-repmgr
How to Configure a PostgreSQL HA Cluster
PostgreSQL and repmgr rely on a series of configuration files to set up and manage redundancy. Repmgr uses SSH to transfer files, so passwordless SSH must be configured in both directions. To configure the HA cluster, follow these steps.
How to Configure SSH
The primary must be able to connect to all standby nodes through SSH to properly clone the replication settings. To complete this configuration, create a key on the primary and share it with all standbys. Then repeat these steps in the other direction: generate a key on the standby and copy it to the primary. To configure SSH, follow these steps.
- On the primary node, switch to the postgres user:

  Primary Node as sudo User:

      su - postgres

- Generate an SSH key:

  Primary Node as postgres User:

      ssh-keygen -t rsa -b 4096

  When prompted, enter the file to store the key in. To store the key in the default location ~/.ssh/id_rsa, hit ENTER. For passwordless SSH, do not provide a password. The SSH utility saves the key to the selected location. Take note of the directory and filename for the key.

  Note: Passwordless SSH is less secure, but using a password can cause problems with automated repmgr processes.

      Generating public/private rsa key pair.
      Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
      Enter passphrase (empty for no passphrase):
      ...
      Your identification has been saved in ~/.ssh/id_rsa
      Your public key has been saved in ~/.ssh/id_rsa.pub

- Copy the key to each standby node in the HA cluster. In the following example, replace STANDBY_IP with the IP address of the standby node:

  Primary Node as postgres User:

      ssh-copy-id postgres@STANDBY_IP

- SSH can now be used to access a standby node without a password from the primary. The following command, when run from the postgres account on the primary, places the user in the postgres user directory on the standby:

  Primary Node as postgres User:

      ssh STANDBY_IP

  When done, type exit to log out of the standby server.

- Repeat these steps to create a key on each standby node and share it with the primary:

  Standby Node as sudo User:

      su - postgres

  Standby Node as postgres User:

      ssh-keygen -t rsa -b 4096
      ssh-copy-id postgres@PRIMARY_IP
How to Create the repmgr User
To allow repmgr to manage PostgreSQL data replication, create a repmgr user on the primary server. Then create a new database for the repmgr data. The commands in this section must only be executed on the primary server. Do not create any database entries on the standby because this interferes with replication.
- While logged in as the postgres account, create the repmgr user:

  Primary Node as postgres User:

      createuser -s repmgr

- Now create the repmgr database, with the repmgr user as the owner:

  Primary Node as postgres User:

      createdb repmgr -O repmgr
How to Configure the PostgreSQL Replication Settings
To configure the replication settings in postgresql.conf, follow these steps. This file must only be changed on the primary node. Repmgr copies it to the standby nodes in a later configuration stage.
- Edit the /etc/postgresql/16/main/postgresql.conf file as the postgres user:

  Primary Node as postgres User:

      nano /etc/postgresql/16/main/postgresql.conf

- Change the settings in the following file sample to adjust the replication settings. In some cases, these lines only have to be uncommented. These lines are found in non-contiguous locations in the file. Use the text editor search utility (CTRL+W in nano) to find them.

  File: /etc/postgresql/16/main/postgresql.conf

      shared_preload_libraries = 'repmgr'
      wal_level = replica
      archive_mode = on
      archive_command = '/bin/true'
      max_wal_senders = 10
      max_replication_slots = 10
      hot_standby = on
      listen_addresses = '*'

  Note: It is easiest to set listen_addresses to *. However, in some networks this might pose additional security concerns. For extra security, set this value to a comma-separated list consisting of localhost and the IP addresses of all nodes in the HA cluster.

- When done, press CTRL+X, followed by Y, then Enter to save the file and exit nano.
How to Configure the PostgreSQL Client Authentication Settings
Users must also add client authentication capabilities to the pg_hba.conf file. This file tells PostgreSQL what type of connections to trust and how to authenticate them. Add entries to trust repmgr connections from both the primary and standby servers. Edit this file on the primary server only.
- Open the /etc/postgresql/16/main/pg_hba.conf file for editing as the postgres user:

  Primary Node as postgres User:

      nano /etc/postgresql/16/main/pg_hba.conf

- Edit the configuration settings for the replication database as follows. Replace PRIMARY_IP and STANDBY_IP with the actual IP addresses of the primary and the standby. If there are two or more standby nodes, add a similar line for each standby.

  File: /etc/postgresql/16/main/pg_hba.conf

      local   replication     repmgr                          trust
      host    replication     repmgr      127.0.0.1/32        trust
      host    replication     repmgr      PRIMARY_IP/32       trust
      host    replication     repmgr      STANDBY_IP/32       trust

  Note: If all nodes are on the same subnet, the separate node entries can be replaced with one entry for the entire subnet. Use the address format NETWORK/SUBNET, for example, 192.168.1.0/24. This configuration trusts the entire subnet, so ensure it is not shared with other organizations.

- Add new entries for the repmgr database, following the same format:

  File: /etc/postgresql/16/main/pg_hba.conf

      local   repmgr          repmgr                          trust
      host    repmgr          repmgr      127.0.0.1/32        trust
      host    repmgr          repmgr      PRIMARY_IP/32       trust
      host    repmgr          repmgr      STANDBY_IP/32       trust

- The entire database file should resemble this example. Ensure the section for all databases is left unchanged.

  File: /etc/postgresql/16/main/pg_hba.conf

      # "local" is for Unix domain socket connections only
      local   all             all                             peer
      # IPv4 local connections:
      host    all             all         127.0.0.1/32        scram-sha-256
      # IPv6 local connections:
      host    all             all         ::1/128             scram-sha-256
      # Allow replication connections from localhost, by a user with the
      # replication privilege.
      local   replication     repmgr                          trust
      host    replication     repmgr      127.0.0.1/32        trust
      host    replication     repmgr      PRIMARY_IP/32       trust
      host    replication     repmgr      STANDBY_IP/32       trust

      local   repmgr          repmgr                          trust
      host    repmgr          repmgr      127.0.0.1/32        trust
      host    repmgr          repmgr      PRIMARY_IP/32       trust
      host    repmgr          repmgr      STANDBY_IP/32       trust

- When done, press CTRL+X, followed by Y, then Enter to save the file and exit nano.

- Restart the PostgreSQL process on the primary using an account with sudo privileges:

  Primary Node as sudo User:

      sudo systemctl restart postgresql

- Ensure there are no error messages and PostgreSQL is still active:

  Primary Node as sudo User:

      systemctl status postgresql

  Press the Q key to exit the systemctl status output.

- Ensure the primary database is accessible from the standby nodes. Run the following command on one of the standby nodes as the postgres user to generate a psql connection to the repmgr database on the primary. Substitute the actual IP address of the primary for PRIMARY_IP in the command below:

  Standby Node as postgres User:

      psql 'host=PRIMARY_IP user=repmgr dbname=repmgr connect_timeout=2'

  The PostgreSQL prompt should appear, indicating the repmgr database context:

      psql (16.1 (Ubuntu 16.1-1.pgdg22.04+1))
      SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
      Type "help" for help.

      repmgr=#

- When done, type quit or exit to leave the PostgreSQL prompt, then type exit again to return to the terminal shell as your standard Linux user with sudo privileges.
How to Configure the repmgr Cluster Definition
Additional repmgr-specific configuration must be added to both nodes. The /etc/repmgr.conf file must contain the following information:
- node_id: A unique numeric identifier for the node within the HA cluster.
- node_name: A unique string-based identifier for the node.
- conninfo: Connection parameters for accessing the node. Other nodes must be able to connect using this string.
- data_directory: The storage directory for repmgr data.
- failover: This must be set to automatic to enable an automatic switchover to a standby when the primary fails.
- promote_command: A command to execute when promoting the node.
- follow_command: A command to execute when following a new primary node. This directive is optional for two-node networks.
For a complete list of all repmgr.conf settings, see the repmgr.conf documentation on GitHub.
Follow the steps below to configure the repmgr settings.
- Create an /etc/repmgr.conf file on the primary node from your regular user account with sudo privileges:

  Primary Node as sudo User:

      sudo nano /etc/repmgr.conf

- Add the following lines to the file, but replace PRIMARY_IP with the IP address of the primary node:

  File: /etc/repmgr.conf

      node_id=1
      node_name=pg1
      conninfo='host=PRIMARY_IP user=repmgr dbname=repmgr connect_timeout=2'
      data_directory='/var/lib/postgresql/16/data'
      failover=automatic
      promote_command='repmgr -f /etc/repmgr.conf standby promote --log-to-file'
      follow_command='repmgr -f /etc/repmgr.conf standby follow --log-to-file'
      log_file='/var/log/postgresql/repmgr.log'

- When done, press CTRL+X, followed by Y, then Enter to save the file and exit nano.

- Create an /etc/repmgr.conf file on each standby node:

  Standby Node as sudo User:

      sudo nano /etc/repmgr.conf

- The file should closely resemble the same file on the primary, with a few changes. node_id and node_name must be unique to each node. In conninfo, the host variable indicates the local IP address. Replace STANDBY_IP with the IP address of the standby node under configuration.

  File: /etc/repmgr.conf

      node_id=2
      node_name=pg2
      conninfo='host=STANDBY_IP user=repmgr dbname=repmgr connect_timeout=2'
      data_directory='/var/lib/postgresql/16/data'
      failover=automatic
      promote_command='repmgr -f /etc/repmgr.conf standby promote --log-to-file'
      follow_command='repmgr -f /etc/repmgr.conf standby follow --log-to-file'
      log_file='/var/log/postgresql/repmgr.log'

- When done, press CTRL+X, followed by Y, then Enter to save the file and exit nano.
How to Initialize and Run a PostgreSQL HA Cluster
To initialize and run the HA cluster, register the primary, clone the standby servers, then register the standbys.

Note: Repmgr uses the pg_basebackup utility to clone the server. This is the best option for most users. Users wanting additional control over the installation can configure and use Barman. However, this procedure is much more complicated. See the repmgr Barman documentation for more information.

To start running the HA cluster, execute the following commands as the postgres user on the primary server only.
- If necessary, log in as the postgres user:

  Primary Node as sudo User:

      su - postgres

- Register the primary server using repmgr. Specify the repmgr configuration file using the -f option.

  Primary Node as postgres User:

      repmgr -f /etc/repmgr.conf primary register

  The repmgr utility confirms the primary is registered:

      INFO: connecting to primary database...
      NOTICE: attempting to install extension "repmgr"
      NOTICE: "repmgr" extension successfully installed
      NOTICE: primary node record (ID: 1) registered

- Confirm the primary is running using the cluster show command.

  Primary Node as postgres User:

      repmgr -f /etc/repmgr.conf cluster show

  A node with an ID of 1 has the role of primary and a status of running.

      ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
      ----+------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------
       1 | pg1  | primary | * running |          | default  | 100      | 1        | host=192.0.0.81 user=repmgr dbname=repmgr connect_timeout=2

- Move to the first standby node and switch to the postgres user if necessary:

  Standby Node as sudo User:

      su - postgres

- Create a clone from the primary, and include the following options:

  - The host option -h must specify the IP address of the primary node. Replace PRIMARY_IP with the actual IP address.
  - Set the user option -U to repmgr.
  - Set the database option -d to repmgr.
  - The file option -f indicates the location of the repmgr configuration file.
  - Add the --copy-external-config-files flag to copy the PostgreSQL configuration files.

  Users should ideally test this command first using the --dry-run option. This option indicates if there are any errors and provides a preview of what the clone command intends to do.

  Standby Node as postgres User:

      repmgr -h PRIMARY_IP -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files --dry-run

  The repmgr output confirms whether or not the standby can attach to the primary node. Search for the messages NOTICE: standby will attach to upstream node and INFO: all prerequisites for "standby clone" are met. It is safe to ignore the warning about data checksums because they are not used in this configuration.

      NOTICE: standby will attach to upstream node 1
      ...
      INFO: all prerequisites for "standby clone" are met

- If the dry run is successful, run the command again without the --dry-run option:

  Standby Node as postgres User:

      repmgr -h PRIMARY_IP -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files

      NOTICE: standby clone (using pg_basebackup) complete
      NOTICE: you can now start your PostgreSQL server

  Repeat the steps above to clone any other standby nodes.
- On the standby node, make a further adjustment to postgresql.conf:

  Standby Node as postgres User:

      nano /etc/postgresql/16/main/postgresql.conf

- Change data_directory to the same directory as data_directory in the repmgr.conf file, for example, /var/lib/postgresql/16/data. The other file changes were copied to the standby during the clone operation.

  File: /etc/postgresql/16/main/postgresql.conf

      data_directory = '/var/lib/postgresql/16/data'

- When done, press CTRL+X, followed by Y, then Enter to save the file and exit nano. Type exit to log out of the postgres user and return to the terminal shell as your standard Linux user with sudo privileges.

- Restart PostgreSQL on all standby nodes:

  Standby Node as sudo User:

      sudo systemctl restart postgresql

- Verify that PostgreSQL has a status of active:

  Standby Node as sudo User:

      sudo systemctl status postgresql

  Press Q to close the output and return to the terminal shell.
- After PostgreSQL is restarted on the standby, it contacts the primary for the database contents. To confirm replication is active, access the PostgreSQL shell on the primary node as the postgres user:

  Primary Node as postgres User:

      psql

- Run the following command to list the standby nodes that have contacted the primary:

  Primary Node as postgres User in PostgreSQL Shell:

      SELECT * FROM pg_stat_replication;

  Each active standby node should have its own entry. Scan for the following details in the output:

  - application_name should contain the node_name of the standby.
  - client_addr should indicate the IP address of the standby node.
  - state should be streaming.
  - sync_state is async.

      pid   | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start                 | backend_xmin | state     | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
      -------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
      41968 | 16388    | repmgr  | pg2              | 192.0.0.82  |                 | 36820       | 2023-10-16 21:32:21.955465+00 |              | streaming | 0/30002D8 | 0/30002D8 | 0/30002D8 | 0/30002D8  |           |           |            | 0             | async      | 2023-10-16 21:33:21.994021+00

  Although updates are being sent to the standby, it is still not registered with repmgr. Until it is registered, it is not able to take over from the primary.
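Because the full pg_stat_replication output is very wide, a narrower query can make the health check easier to read. This is a sketch; the lag columns are intervals and may be NULL on an idle cluster with no recent writes:

```
SELECT application_name, client_addr, state, sync_state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
```

A healthy standby shows state "streaming" and small or NULL lag values; persistently growing lag suggests the standby is falling behind.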
Note: If replication is not working, access the `psql` shell on the standby and run the command `SELECT * FROM pg_stat_wal_receiver;`. This command is less intuitive, but the `status` column should be `streaming` and `sender_host` should be the IP address of the primary. It also displays the timestamp of the most recent update.

When done, type `quit` or `exit` to leave the `psql` shell and return to the Linux terminal prompt as the `postgres` user. Return to the standby server and log in as the `postgres` account if not already logged in:

Standby Node as sudo User:

```
su - postgres
```

Register the node as a standby with the `repmgr standby register` command, specifying the location of the repmgr configuration file using the `-f` option:

Standby Node as postgres User:

```
repmgr -f /etc/repmgr.conf standby register
```

```
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered
```

Repeat this operation on each standby node.
Run the `cluster show` command on the standby to confirm it is registered:

Standby Node as postgres User:

```
repmgr -f /etc/repmgr.conf cluster show
```

```
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | pg1  | primary | * running |          | default  | 100      | 1        | host=192.0.0.81 user=repmgr dbname=repmgr connect_timeout=2
 2  | pg2  | standby |   running | pg1      | default  | 100      | 1        | host=192.0.0.82 user=repmgr dbname=repmgr connect_timeout=2
```

Register all nodes, including the primary and all standby nodes, with `repmgrd`. The daemon monitors the node and quickly responds to any failures.

All Nodes as postgres User:

```
repmgrd -f /etc/repmgr.conf -d
```

The primary should display messages similar to the following:

```
[2023-10-17 17:06:23] [NOTICE] starting monitoring of node "pg1" (ID: 1)
[2023-10-17 17:06:23] [NOTICE] monitoring cluster primary "pg1" (ID: 1)
[2023-10-17 17:06:23] [INFO] child node "pg2" (ID: 2) is attached
```

The standby should display a slightly different set of messages:

```
[2023-10-17 17:08:04] [NOTICE] starting monitoring of node "pg2" (ID: 2)
[2023-10-17 17:08:04] [INFO] monitoring connection to upstream node "pg1" (ID: 1)
```
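Because `repmgrd` was started with `-d` (daemonize), a misconfiguration can cause it to exit without any message in the terminal. A quick sanity check is to look for it in the process table. The helper below is a minimal sketch: it reads a list of process names from standard input (on a real node, pipe in `ps -eo comm=`); the sample listing shown is only an illustration.

```shell
# Report whether repmgrd appears in a list of process names read from stdin.
# On a live node:  ps -eo comm= | repmgrd_running
repmgrd_running() {
  if grep -qx 'repmgrd'; then
    echo "repmgrd is running"
  else
    echo "repmgrd is NOT running"
  fi
}

# Illustrative sample process listing standing in for live ps output:
printf 'systemd\npostgres\nrepmgrd\n' | repmgrd_running
```

If the daemon is not running, restart it with the same `repmgrd -f /etc/repmgr.conf -d` command and review the repmgr log file for the reason it exited.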
Testing a Failover Event
To verify the solution is working, first confirm the data is being replicated. If the output of the debug commands in the previous section is correct, replication is most likely working. Then shut down PostgreSQL on the active primary node. After a timeout, one of the standby nodes should take over as the new primary, and users can then write data to it.
To confirm the HA cluster is working, follow these steps.
On the primary node, access the `postgres` database as the `postgres` database user:

Primary Node as postgres User:

```
psql postgres
```

Create a new `customers` table inside the `postgres` database:

Primary Node as postgres User in PostgreSQL Shell:

```sql
CREATE TABLE customers (customer_id int, first_name varchar(80), last_name varchar(80));
```

Note: For an explanation of the most common `psql` commands, see the How to Install and Use PostgreSQL guide.

Ensure the table is successfully created:

Primary Node as postgres User in PostgreSQL Shell:

```
\dt
```

The new table appears in the output:

```
         List of relations
 Schema |   Name    | Type  |  Owner
--------+-----------+-------+----------
 public | customers | table | postgres
(1 row)
```

When done, type `quit` or `exit` to leave the `psql` shell and return to the Linux terminal prompt as the `postgres` user. Type `exit` to log out of the `postgres` user and return to the terminal shell as your standard Linux user with `sudo` privileges. Access PostgreSQL on the standby node:

Standby Node as postgres User:

```
psql postgres
```

List the tables inside the `postgres` database:

Standby Node as postgres User in PostgreSQL Shell:

```
\dt
```

The same table appears in the output, confirming the update on the primary was replicated to this node:

```
         List of relations
 Schema |   Name    | Type  |  Owner
--------+-----------+-------+----------
 public | customers | table | postgres
```

When done, type `quit` or `exit` to leave the `psql` shell and return to the Linux terminal prompt as the `postgres` user. To initiate a switchover, stop the `postgresql` process on the primary node. Run this command using an account with `sudo` privileges:

Primary Node as sudo User:

```
sudo systemctl stop postgresql
```

The standby node makes a few attempts to reconnect to the primary. After these attempts fail, it takes over as the new primary node. A similar list of trace messages should appear in the console of the standby node.

Note: If there is more than one standby node, the nodes hold an election to determine the new primary.

```
[2023-10-17 17:13:10] [WARNING] unable to ping "host=192.0.0.81 user=repmgr dbname=repmgr connect_timeout=2"
[2023-10-17 17:13:10] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-10-17 17:13:10] [WARNING] unable to connect to upstream node "pg1" (ID: 1)
[2023-10-17 17:13:10] [INFO] checking state of node "pg1" (ID: 1), 1 of 6 attempts
...
[2023-10-17 17:14:00] [WARNING] unable to reconnect to node "pg1" (ID: 1) after 6 attempts
[2023-10-17 17:14:00] [INFO] 0 active sibling nodes registered
[2023-10-17 17:14:00] [INFO] 2 total nodes registered
[2023-10-17 17:14:00] [INFO] primary node "pg1" (ID: 1) and this node have the same location ("default")
[2023-10-17 17:14:00] [INFO] no other sibling nodes - we win by default
[2023-10-17 17:14:00] [NOTICE] this node is the only available candidate and will now promote itself
...
[2023-10-17 17:14:00] [NOTICE] promoting standby to primary
[2023-10-17 17:14:00] [DETAIL] promoting server "pg2" (ID: 2) using pg_promote()
[2023-10-17 17:14:00] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2023-10-17 17:14:01] [NOTICE] STANDBY PROMOTE successful
[2023-10-17 17:14:01] [DETAIL] server "pg2" (ID: 2) was successfully promoted to primary
...
[2023-10-17 17:14:01] [INFO] original connection is still available
[2023-10-17 17:14:01] [INFO] 0 followers to notify
[2023-10-17 17:14:01] [INFO] switching to primary monitoring mode
[2023-10-17 17:14:01] [NOTICE] monitoring cluster primary "pg2" (ID: 2)
```

Use the `cluster show` command to confirm the previous standby `pg2` is now the primary. The old primary now has a status of `failed`.

Standby Node as postgres User:

```
repmgr -f /etc/repmgr.conf cluster show
```

The updated cluster status appears in the output:

```
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | pg1  | primary | - failed  | ?        | default  | 100      |          | host=192.0.0.81 user=repmgr dbname=repmgr connect_timeout=2
 2  | pg2  | primary | * running |          | default  | 100      | 2        | host=192.0.0.82 user=repmgr dbname=repmgr connect_timeout=2
```

Ensure the new primary allows write operations. Create a new table in the `postgres` database to verify this:

Standby Node as postgres User:

```
psql
```

Standby Node as postgres User in PostgreSQL Shell:

```sql
CREATE TABLE customers2 (customer_id int, first_name varchar(80), last_name varchar(80));
```

```
\dt
```

The new `customers2` table is listed alongside the previous table:

```
          List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | customers  | table | postgres
 public | customers2 | table | postgres
(2 rows)
```
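After a failover, the old primary remains in the `failed` state until it is manually rejoined to the cluster or unregistered, so it is useful to watch `cluster show` output for failed entries. The sketch below scans that output and prints the name of any failed node. It is only an illustration: the captured sample row (node `pg1`) stands in for piping in live output from `repmgr -f /etc/repmgr.conf cluster show`.

```shell
# Print the name of any node whose Status column shows "failed".
# On a live node:  repmgr -f /etc/repmgr.conf cluster show | flag_failed
flag_failed() {
  # Columns are separated by "|"; field 2 is the node name.
  awk -F'|' '/failed/ { gsub(/ /, "", $2); printf "node %s has failed and needs attention\n", $2 }'
}

# Captured sample row standing in for live cluster show output:
printf ' 1 | pg1 | primary | - failed | ? | default | 100 | | host=192.0.0.81 ...\n' | flag_failed
```

A check like this can run from cron and page an operator, since a failed former primary must be cleaned up (for example, rejoined as a standby) before the cluster regains full redundancy.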
Conclusion
To add high availability capabilities to PostgreSQL, use the repmgr replication manager. In this architecture, the primary node replicates all changes to one or more standby nodes. The primary handles all write operations, while read operations are shared among all nodes in the HA cluster. To implement this solution, install PostgreSQL and repmgr, then edit several configuration files. Register the primary node, then clone and register the standby nodes. For more information, see the repmgr documentation.
More Information
You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.