Set Up Galera Cluster for MySQL/MariaDB on Ubuntu/Debian and CentOS/RHEL

Introduction
Galera Cluster offers a robust solution for building high-availability MySQL/MariaDB database clusters without compromising data integrity. By replicating data across nodes in real time, Galera ensures continuous operation even when individual nodes fail. This makes it a critical component for applications demanding high uptime and data consistency.
In this comprehensive tutorial, we will guide you through the process of configuring a two-node Galera cluster on both Ubuntu/Debian and CentOS/RHEL Linux distributions. We will cover all necessary steps, from installation to testing and monitoring, enabling you to build a resilient database infrastructure. Let’s explore how to set up a Galera Cluster.
Prerequisites
Before embarking on the installation and configuration process, please ensure that you meet the following prerequisites:
- Two or more servers (virtual or physical) running either Ubuntu/Debian or CentOS/RHEL.
- Root access or sudo privileges on all servers.
- Basic familiarity with Linux command-line operations.
- Network connectivity between all servers in the cluster.
We will be executing most commands as the root user or with sudo privileges.
We begin by installing the Galera Cluster software and other utilities needed for proper functioning of the cluster.
On Ubuntu/Debian
Update apt repository indexes and install the required packages:
$ apt update
$ apt install galera-4 mariadb-server socat python3-mysql.connector
This installs:
- galera-4: The Galera Cluster software.
- mariadb-server: The MariaDB server package (or MySQL if you prefer).
- socat: A utility for establishing bidirectional data transfer between two endpoints, used for SST (State Snapshot Transfer).
- python3-mysql.connector: A Python connector for MySQL.
On CentOS/RHEL
Enable the EPEL repository, which provides the Galera packages:
$ yum install epel-release
Now install Galera and dependencies:
$ yum install galera mariadb-server rsync socat mysql-connector-python
This installs:
- galera: The Galera Cluster software.
- mariadb-server: The MariaDB server package (or MySQL if you prefer).
- rsync: A fast, versatile, remote (and local) file-copying tool, used for SST.
- socat: A utility for establishing bidirectional data transfer between two endpoints, used for SST (State Snapshot Transfer).
- mysql-connector-python: A Python connector for MySQL.
Galera Cluster is now ready to be configured on both nodes.
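Before moving on, it helps to confirm where the Galera provider library was installed, since its path is referenced in the configuration below and varies by distribution. The paths shown are common package defaults; if neither exists on your system, locate the library with dpkg -L galera-4 or rpm -ql galera:
$ ls /usr/lib/galera-4/libgalera_smm.so      # typical location on Ubuntu/Debian
$ ls /usr/lib64/galera-4/libgalera_smm.so    # typical location on CentOS/RHEL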
Configuring MySQL for Galera
For MySQL/MariaDB to use Galera, we need to configure a few options in the server configuration file.
Open the config file (on CentOS/RHEL this is /etc/my.cnf; on Ubuntu/Debian use /etc/mysql/my.cnf or a new file under /etc/mysql/mariadb.conf.d/):
$ nano /etc/my.cnf
Add the following under [mysqld]:
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Galera settings
wsrep_on=ON
# Provider path differs by distro: /usr/lib/galera-4/libgalera_smm.so on Ubuntu/Debian,
# /usr/lib64/galera-4/libgalera_smm.so on CentOS/RHEL
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_name='my_galera_cluster'
# List every cluster node (hostnames also work if they resolve)
wsrep_cluster_address="gcomm://192.168.1.101,192.168.1.102"
# On node 1 only:
wsrep_node_address='192.168.1.101'
wsrep_node_name='node1'
# On node 2 only:
wsrep_node_address='192.168.1.102'
wsrep_node_name='node2'
wsrep_sst_method=rsync
The wsrep_cluster_address setting contains the list of cluster node addresses. wsrep_node_address and wsrep_node_name must be unique on every server.
Save and close the file after making changes.
Do this on both servers, replacing the IPs and names to match your environment.
This configures MySQL to use the Galera plugin for replication.
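Galera also needs several ports open between the nodes: 3306 (MySQL client traffic), 4567 (Galera replication, TCP and UDP), 4568 (IST), and 4444 (SST). If a firewall is active, open them on both nodes; for example, assuming ufw on Ubuntu/Debian and firewalld on CentOS/RHEL:
# Ubuntu/Debian (ufw)
$ ufw allow 3306/tcp
$ ufw allow 4567/tcp
$ ufw allow 4567/udp
$ ufw allow 4568/tcp
$ ufw allow 4444/tcp
# CentOS/RHEL (firewalld)
$ firewall-cmd --permanent --add-port=3306/tcp --add-port=4567/tcp --add-port=4567/udp --add-port=4568/tcp --add-port=4444/tcp
$ firewall-cmd --reload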
Starting the Galera Cluster
With configuration in place, we are ready to bootstrap the cluster.
Bootstrap the cluster on the first node (node1). A normal service start would try to join an existing cluster, so use MariaDB's bootstrap helper, which starts the server with the --wsrep-new-cluster option:
$ galera_new_cluster
This initializes a new Galera cluster with node1 as its first member.
Check MySQL status and wsrep variables:
$ mysql -u root -e "SHOW STATUS LIKE '%wsrep%';"
Example output:
+------------------------------+-------------------------------------------------+
| Variable_name | Value |
+------------------------------+-------------------------------------------------+
| wsrep_local_state_uuid | af2a75b4-9e1c-11ed-9838-be4b133a6b15 |
| wsrep_protocol_version | 7 |
| wsrep_last_committed | 0 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 1 |
| wsrep_received_bytes | 119 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_cached_downto | 18446744073709551615 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
+------------------------------+-------------------------------------------------+
This confirms Galera is operational (output truncated for brevity). At this point wsrep_cluster_size reports 1, since node1 is the only member.
Now start MySQL on the second node to join it to the cluster:
$ systemctl start mysql
# or
$ systemctl start mariadb
Verify it joined successfully:
$ mysql -u root -e "SHOW STATUS LIKE '%wsrep%';"
We should now see the cluster size as 2:
| wsrep_cluster_size | 2 |
Additionally, run mysql -e "SHOW STATUS LIKE '%wsrep%';" on node1 again; the cluster-wide status variables should now match on both nodes.
Our two-node Galera cluster is ready!
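To have both nodes rejoin the cluster automatically after a reboot, enable the database service on each of them (the systemd unit is typically named mariadb, or mysql on some installations):
$ systemctl enable mariadb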
Testing Cluster Operation
Let’s test that replication between the two nodes works as expected.
On node1, create a test database and insert some data:
mysql> CREATE DATABASE cluster_test;
mysql> USE cluster_test;
mysql> CREATE TABLE test (id INT, message VARCHAR(20));
mysql> INSERT INTO test VALUES (1, 'Hello Galera');
mysql> SELECT * FROM test;
+------+----------------+
| id | message |
+------+----------------+
| 1 | Hello Galera |
+------+----------------+
Now check the same table contents from node2:
mysql> USE cluster_test;
mysql> SELECT * FROM test;
+------+----------------+
| id | message |
+------+----------------+
| 1 | Hello Galera |
+------+----------------+
The row replicated from node1 to node2 as expected.
Let’s also test shutdown recovery. Stop MySQL on node1:
$ systemctl stop mysql
# or
$ systemctl stop mariadb
On node2, connect to MySQL and verify queries still work:
mysql> USE cluster_test;
mysql> SELECT * FROM test;
+------+----------------+
| id | message |
+------+----------------+
| 1 | Hello Galera |
+------+----------------+
Node2 remains operational despite node1 being down, since all data is replicated to every node.
Bring node1 back up:
$ systemctl start mysql
# or
$ systemctl start mariadb
It will resync with node2 automatically (via IST or SST, depending on how far behind it is). Run SHOW STATUS LIKE '%wsrep%'; on both nodes to confirm the values match.
This demonstrates the high availability provided by Galera!
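One caveat: the automatic rejoin above works because one node stayed up the whole time. If every node is stopped (a full cluster shutdown), the cluster must be bootstrapped again from the node that was shut down last. A sketch of that recovery, assuming the default data directory /var/lib/mysql:
# On each node, check which one is safe to bootstrap from
$ cat /var/lib/mysql/grastate.dat
# the node whose file shows "safe_to_bootstrap: 1" is the one to bootstrap
# On that node only:
$ galera_new_cluster
# Then start the service normally on the other node(s):
$ systemctl start mariadb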
Cluster Monitoring & Management
Now that we have a working Galera cluster, let’s look at some tips for monitoring and managing it.
Checking Cluster Status
The garbd daemon (Galera Arbitrator) can also connect to the cluster as a stateless member and report high-level cluster membership information:
$ garbd -a gcomm://192.168.1.101,192.168.1.102 -g my_galera_cluster
Galera cluster Node 1/2 info:
evs::protover => 7
evs::uuid => af2a75b4-9e1c-11ed-9838-be4b133a6b15
evs::status => Primary
evs::state => Synced
Galera cluster Node 2/2 info:
evs::protover => 7
evs::uuid => af2a75b4-9e1c-11ed-9838-be4b133a6b15
evs::status => Primary
evs::state => Synced
This shows both nodes are in Synced state and part of the same cluster.
Monitoring Node Status
Use mysqladmin to check the Galera status variables on each node:
$ mysqladmin -uroot -p -h192.168.1.101 extended-status | grep wsrep
| wsrep_cluster_conf_id | 25 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | af2a75b4-9e1c-11ed-9838-be4b133a6b15 |
| wsrep_cluster_status | Primary |
$ mysqladmin -uroot -p -h192.168.1.102 extended-status | grep wsrep
| wsrep_cluster_conf_id | 25 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | af2a75b4-9e1c-11ed-9838-be4b133a6b15 |
| wsrep_cluster_status | Primary |
Values like cluster size, UUID, and status should match on all nodes.
Checking SST Transfer Status
When a new node joins, State Snapshot Transfer (SST) is used to sync data to it.
Monitor the state of the joining node with:
$ mysql -e "SHOW STATUS LIKE 'wsrep_local_state_uuid'"
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| wsrep_local_state_uuid | af2a75b4-9e1c-11ed-9838-be4b133a6b15 |
+----------------------------------+--------------------------------------+
$ mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
+-----------------------------------+---------+
| Variable_name | Value |
+-----------------------------------+---------+
| wsrep_local_state_comment | Synced |
+-----------------------------------+---------+
While SST is in progress, wsrep_local_state_comment will show a joining state (for example, Joining: receiving State Transfer) instead of Synced.
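As a side note, the rsync SST method blocks the donor node for the duration of the transfer. For larger datasets a non-blocking method such as mariabackup is often preferred; a minimal sketch of that configuration (the sst_user name and password are placeholders, and the mariadb-backup package must be installed on every node):
# In the [mysqld] section on every node:
wsrep_sst_method=mariabackup
wsrep_sst_auth=sst_user:sst_password
The sst_user account must also be created on the cluster with the backup-related privileges (RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT on recent MariaDB versions); check the documentation for your exact version.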
Checking Recovery Status
When a node rejoins after a disconnect, its status can be checked with:
mysql> SHOW STATUS WHERE `variable_name` LIKE 'wsrep_%';
Look for wsrep_local_state_comment values such as Joining, Joined, or Donor/Desynced while the node catches up. In this way, the various stages of cluster sync and recovery can be tracked.
Monitoring Cluster Size
To check the number of nodes in the cluster:
mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
This should match the expected number of nodes.
We can also use the clustercheck script, where available, as a quick node health check (it is commonly used as an HAProxy health probe):
$ clustercheck
Cluster is CORRECT (2 nodes)
$
This will warn if nodes are missing or extra.
Checking Node Consistency
The cluster status should be verified to ensure all nodes contain the same data.
Comparing the wsrep_local_state_uuid variable between nodes indicates consistency:
$ mysql -e "SHOW STATUS LIKE 'wsrep_local_state_uuid'G" -h192.168.1.101
*************************** 1. row ***************************
Variable_name: wsrep_local_state_uuid
Value: af2a75b4-9e1c-11ed-9838-be4b133a6b15
$ mysql -e "SHOW STATUS LIKE 'wsrep_local_state_uuid'G" -h192.168.1.102
*************************** 1. row ***************************
Variable_name: wsrep_local_state_uuid
Value: af2a75b4-9e1c-11ed-9838-be4b133a6b15
If the UUID matches on all nodes, they share the same replicated cluster state.
Checking Connection Status
Use socat to check TCP connectivity between nodes:
$ socat - TCP:192.168.1.101:4567
>
$ socat - TCP:192.168.1.102:4567
>
This confirms TCP port 4567 is open between nodes for Galera Cluster replication traffic.
We can also use MySQL to check connection status:
mysql> SHOW STATUS WHERE `variable_name` LIKE '%connection%';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Slave_connections           | 0     |
| Max_used_connections        | 2     |
| Aborted_connects            | 0     |
+-----------------------------+-------+
Monitor open connection count to detect issues.
This provides insight into overall cluster health and connectivity.
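For ongoing monitoring, these checks can be wrapped in a small script run from cron or a monitoring agent. Below is a minimal sketch that assumes it runs locally on a node with socket access as root; adjust credentials and the expected cluster size for your environment:
#!/bin/bash
# Minimal Galera health check: the node must be Synced, in a Primary
# component, and the cluster must have the expected number of members.
EXPECTED_SIZE=2
state=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | awk '{print $2}')
status=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_cluster_status'" | awk '{print $2}')
size=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | awk '{print $2}')
if [ "$state" != "Synced" ] || [ "$status" != "Primary" ] || [ "$size" != "$EXPECTED_SIZE" ]; then
    echo "WARNING: state=$state status=$status size=$size"
    exit 1
fi
echo "OK: state=$state status=$status size=$size"
This can be hooked into tools like Nagios or a systemd timer with little modification.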
Tracking Node History
The cluster's history can be useful when troubleshooting or analyzing events. The database error log (for example /var/log/mysql/error.log, or journalctl -u mariadb) records every view change, state transfer, and recovery. In addition, comparing wsrep_local_state_uuid with wsrep_cluster_state_uuid shows whether a node still belongs to the cluster state it originally joined:
mysql> SHOW STATUS WHERE `variable_name` IN ('wsrep_cluster_state_uuid', 'wsrep_local_state_uuid');
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_state_uuid | af2a75b4-9e1c-11ed-9838-be4b133a6b15 |
| wsrep_local_state_uuid   | af2a75b4-9e1c-11ed-9838-be4b133a6b15 |
+--------------------------+--------------------------------------+
If these two UUIDs diverge on a node, it has fallen out of sync with the rest of the cluster.
Similarly, the number of cluster membership (configuration) changes is tracked by wsrep_cluster_conf_id, which increments on every view change:
mysql> SHOW STATUS LIKE 'wsrep_cluster_conf_id';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| wsrep_cluster_conf_id | 25    |
+-----------------------+-------+
This provides insight into cluster activity over time.
Using these status variables and commands, the Galera cluster can be monitored for proper operation. Issues like node disconnects, replication lag, or consistency loss can be rapidly detected and debugged. Setting up a Galera Cluster is not difficult once you understand these basics.
Configuring Galera Arbitrator (optional)
For a two-node cluster, we should set up a Galera Arbitrator to avoid split-brain scenarios. The arbitrator (garbd) is a lightweight process that joins the cluster as a voting member without storing any data, giving the cluster a quorum to determine which side should continue operating in case of a network split.
On a third server, install just the galera-4 (Ubuntu/Debian) or galera (CentOS/RHEL) package, which provides the garbd daemon; a full database server is not required.
Start the arbitrator, pointing it at the existing cluster:
$ garbd --address="gcomm://192.168.1.101,192.168.1.102" --group=my_galera_cluster --log=/var/log/garbd.log --daemon
Check the log (here /var/log/garbd.log) to ensure it connected to the cluster successfully.
The arbitrator will now participate in quorum calculations: if the network between node1 and node2 is partitioned, the side that can still reach the arbitrator retains quorum and keeps serving traffic, preventing split-brain and the data divergence it would cause.
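To keep the arbitrator running across reboots, the Galera packages usually ship a garb system service whose settings live in /etc/default/garb (Ubuntu/Debian) or /etc/sysconfig/garb (CentOS/RHEL). A sketch of that file, assuming those package defaults:
# /etc/default/garb or /etc/sysconfig/garb
GALERA_NODES="192.168.1.101:4567 192.168.1.102:4567"
GALERA_GROUP="my_galera_cluster"
LOG_FILE="/var/log/garbd.log"
Then enable the service (typically named garb) with systemctl enable --now garb instead of launching garbd by hand.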
Conclusion
In this detailed guide, we covered the steps to install, configure, and monitor a two-node Galera cluster on Ubuntu/Debian and CentOS/RHEL, with practical examples along the way. Setting up a Galera Cluster goes smoothly if you follow these steps.
Key takeaways include:
- Installing the necessary Galera, MySQL/MariaDB, and utility packages on your chosen Linux distribution.
- Configuring MySQL/MariaDB to utilize the Galera plugin for synchronous replication.
- Bootstrapping the cluster by starting MySQL/MariaDB on the first node.
- Adding subsequent nodes to the cluster.
- Testing the cluster’s replication capabilities and failover behavior.
- Monitoring the cluster’s health and status using command-line tools.
- (Optional) Configuring a Galera Arbitrator for split-brain protection.
A two-node Galera cluster serves many applications well, reducing downtime and providing redundancy. Additional nodes can be introduced seamlessly later if needed.
Using this tutorial as a guide, you can now deploy highly available MySQL clusters with Galera on Ubuntu/Debian and CentOS/RHEL.
Alternative Solutions for High Availability MySQL/MariaDB
While Galera Cluster offers a robust solution for achieving high availability, it’s not the only option available. Here are two alternative approaches:
1. MySQL Replication with Orchestrator
MySQL’s built-in replication capabilities, combined with a tool like Orchestrator, can provide a high-availability setup. Orchestrator automates the failover process and ensures that a new primary node is promoted quickly in case of a failure.
Explanation:
- Asynchronous Replication: MySQL replication is typically asynchronous, meaning that changes are not immediately propagated to the secondary nodes. This can lead to data loss in the event of a primary node failure, although semi-synchronous replication can minimize this risk.
- Orchestrator: Orchestrator monitors the health of the MySQL servers and automatically performs failover operations. It detects failures, promotes a new primary node, and reconfigures the replication topology.
- Advantages: Simpler to set up than Galera Cluster, utilizes standard MySQL features, and provides automatic failover.
- Disadvantages: Potential for data loss due to asynchronous replication, requires careful configuration and monitoring of Orchestrator.
Example Configuration (Conceptual):
- Configure MySQL Replication: Set up a primary-secondary replication topology between the MySQL servers. Enable binary logging on the primary server.
- Install and Configure Orchestrator: Download and install Orchestrator on a separate server. Configure it to monitor the MySQL servers and define failover policies.
- Testing: Simulate a primary server failure and verify that Orchestrator correctly promotes a secondary server to become the new primary.
New Code Example (Orchestrator Configuration Snippet – Example):
This is a conceptual example and depends on the Orchestrator’s specific configuration format.
{
  "MySQLHosts": [
    {"Host": "192.168.1.101", "Port": 3306, "Role": "primary"},
    {"Host": "192.168.1.102", "Port": 3306, "Role": "secondary"}
  ],
  "FailureDetectionPeriodSeconds": 5,
  "AutoFailover": true,
  "PromotionRules": {
    "PreferSameDatacenter": true
  }
}
This example shows configuring Orchestrator to monitor two MySQL hosts and automatically failover to the secondary node if the primary fails.
2. Cloud-Based Managed Database Services
Cloud providers like AWS (RDS, Aurora), Google Cloud (Cloud SQL), and Azure (Azure Database for MySQL) offer managed database services with built-in high availability features.
Explanation:
- Managed Services: These services handle the complexities of database administration, including backups, patching, and failover.
- Automatic Failover: Cloud providers typically offer automatic failover to a standby instance in case of a primary instance failure.
- Advantages: Reduced operational overhead, simplified management, and built-in high availability.
- Disadvantages: Vendor lock-in, potential cost considerations, and limited control over the underlying infrastructure.
Example Configuration (Conceptual):
- Choose a Cloud Provider: Select a cloud provider that offers a managed MySQL or MariaDB service.
- Create a Database Instance: Create a new database instance and enable high availability features (e.g., multi-AZ deployment in AWS RDS).
- Configure Connection: Configure your application to connect to the database instance using the provided connection string.
- Testing: Simulate a primary instance failure (e.g., by rebooting the instance) and verify that the application automatically connects to the standby instance.
New Code Example (AWS CLI Command – Example):
aws rds create-db-instance \
    --db-instance-identifier my-ha-db \
    --db-instance-class db.m5.large \
    --engine mysql \
    --engine-version 8.0 \
    --master-username myuser \
    --master-user-password mypassword \
    --allocated-storage 20 \
    --multi-az
This command creates a new MySQL database instance in AWS RDS with the --multi-az option, enabling high availability through automatic failover to a standby instance in a different Availability Zone.
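To exercise the failover path of a Multi-AZ deployment, a reboot with forced failover can be triggered from the CLI (using the instance identifier created above):
aws rds reboot-db-instance \
    --db-instance-identifier my-ha-db \
    --force-failover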
These alternative solutions offer different trade-offs in terms of complexity, control, and cost. The best choice depends on the specific requirements and constraints of your application and infrastructure. By understanding these alternatives, you can make an informed decision about the most suitable high availability solution for your MySQL/MariaDB databases.