MongoDB Primary Switchover While Production is Running – Activating DR to be Primary

Introduction

In this blog post, we share our method for performing a primary switchover of a MongoDB sharded cluster from a production site to a disaster recovery (DR) site. You can use this method to promote the DR site while the production site is still running and accessible, typically to test whether DR works as a backup site should.

Architecture Diagram

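This setup uses the following topology, as reflected in the example output later in this post. The production site (DC1) hosts three config server nodes (DC1-conf1 to DC1-conf3) and three shard server nodes (DC1-db1 to DC1-db3). The DR site (DC2) hosts two config server nodes (DC2-conf4 and DC2-conf5) and two shard server nodes (DC2-db4 and DC2-db5).

All hosts run Ubuntu 22.04 LTS with MongoDB 5.0.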

Switchover/Failover Method

To perform a controlled activation of the MongoDB nodes on the DR site (a switchover, useful for hardware maintenance, rolling reboots, physical migration, and so on) while enough nodes stay online to keep a majority, follow this procedure:

  1. Choose one config server node on the DR site, for example DC2-conf4.
  2. Choose one shard server node on the DR site, for example DC2-db4.
  3. Increase the priority of the chosen config server, DC2-conf4, to a value higher than that of every other member (the default priority is 1). This triggers an election (typically within about 10 seconds) that promotes DC2-conf4 to PRIMARY.
  4. Increase the priority of the chosen shard server, DC2-db4, in the same way. This triggers an election (typically within about 10 seconds) that promotes DC2-db4 to PRIMARY.

At this point, it is safe to bring down or perform maintenance on any node on the PROD site. Attention: bring down only one node at a time to maintain quorum.
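The quorum warning above comes down to simple arithmetic. The following is a minimal sketch, assuming every member of each replica set is a voting member (five per replica set in this topology); the function names are hypothetical helpers, not part of MongoDB.

```javascript
// Number of votes needed to elect (or keep) a PRIMARY: a strict majority
// of the voting members in the replica set.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be offline at the same time
// while the remaining members still form a majority.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

// Assumption: 5 voting members per replica set (3 on PROD, 2 on DR).
//   majorityNeeded(5)  -> 3
//   faultTolerance(5)  -> 2
```

Under that assumption, the two DR members alone are below the majority of three, so at least one PROD member must stay online for the DR PRIMARY to keep its role. Taking PROD nodes down one at a time keeps a comfortable margin.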

Increasing the priority of the config server on the DR site

For step #3, log in to one of the config server nodes on the DR site (DC2-conf4 in this example) and open the MongoDB shell, authenticating as the root user:

mongo -u root -p yourpassword --authenticationDatabase admin

Once inside the MongoDB shell, run the following:

cfg = rs.conf()
cfg.members[3].priority = 5
rs.reconfig(cfg)

Here, the integer 3 is the position of the node in the members array (counting from 0), not its _id value. In this example the two happen to match: DC2-conf4 sits at index 3 and has _id 3, and DC2-conf5 sits at index 4. To confirm the position of a node, inspect the members array returned by the first command, cfg = rs.conf().
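Because the array index and the _id can drift apart (for example after members have been removed and re-added), looking the member up by host name is less error-prone than hard-coding the index. A minimal sketch, where setPriorityByHost is a hypothetical helper and the host string is taken from the example output in this post:

```javascript
// Set the priority of the member whose `host` field matches `host` in a
// replica set configuration document (the object returned by rs.conf()).
// Returns the array index that was changed; throws if no member matches.
function setPriorityByHost(cfg, host, priority) {
  const idx = cfg.members.findIndex(function (m) {
    return m.host === host;
  });
  if (idx === -1) {
    throw new Error("no replica set member with host " + host);
  }
  cfg.members[idx].priority = priority;
  return idx;
}

// Usage inside the MongoDB shell:
//   cfg = rs.conf()
//   setPriorityByHost(cfg, "DC2-conf4:27019", 5)
//   rs.reconfig(cfg)
```

The helper only edits the in-memory configuration object; nothing changes on the replica set until rs.reconfig(cfg) is run.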

Make sure you get the following output, which indicates that the new configuration was accepted:

{
"ok" : 1
...
}

Within about 10 seconds, an election promotes DC2-conf4 (on the DR site) to PRIMARY. You can verify this with the following command:

printjson(rs.status().members.map( function(m) { return {'name':m.name, '_id':m._id, 'health':m.health, 'stateStr':m.stateStr} }))

Example output (pay attention to the DC2-conf4 entry):

config-replset:PRIMARY> printjson(rs.status().members.map(function(m) { return {'name':m.name, '_id':m._id, 'health':m.health, 'stateStr':m.stateStr} }))
[
        {
                "name" : "DC1-conf1:27019",
                "_id" : 0,
                "health" : 1,
                "stateStr" : "SECONDARY"
        },
        {
                "name" : "DC1-conf2:27019",
                "_id" : 1,
                "health" : 1,
                "stateStr" : "SECONDARY"
        },
        {
                "name" : "DC1-conf3:27019",
                "_id" : 2,
                "health" : 1,
                "stateStr" : "SECONDARY"
        },
        {
                "name" : "DC2-conf4:27019",
                "_id" : 3,
                "health" : 1,
                "stateStr" : "PRIMARY"
        },
        {
                "name" : "DC2-conf5:27019",
                "_id" : 4,
                "health" : 1,
                "stateStr" : "SECONDARY"
        }
]
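Instead of reading the list by eye, the same check can be scripted. The sketch below is a hypothetical helper that works on the document returned by rs.status(), relying only on the name and stateStr fields shown in the output above:

```javascript
// Return the name of the current PRIMARY from an rs.status() document,
// or null if the set reports no PRIMARY (or, unexpectedly, more than one).
function currentPrimary(status) {
  const primaries = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  });
  return primaries.length === 1 ? primaries[0].name : null;
}

// Usage inside the MongoDB shell:
//   currentPrimary(rs.status())
// With the example output above, this returns "DC2-conf4:27019".
```

A null result right after the reconfiguration usually just means the election has not finished yet, so it is worth running the check again after a few seconds.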

The config server PRIMARY is now on the DR site. Proceed to the next step, promoting one of the shard servers, as described below.

Increasing the priority of the shard server on the DR site

For step #4, log in to one of the shard server nodes on the DR site (DC2-db4 in this example) and open the MongoDB shell, authenticating as the root user:

mongo -u root -p yourpassword --authenticationDatabase admin

Once inside the MongoDB shell, run the following:

cfg = rs.conf()
cfg.members[3].priority = 5
rs.reconfig(cfg)

Here, the integer 3 is the position of the node in the members array (counting from 0), not its _id value. In this example the two happen to match: DC2-db4 sits at index 3 and has _id 3, and DC2-db5 sits at index 4. To confirm the position of a node, inspect the members array returned by the first command, cfg = rs.conf().

Make sure you get the following output, which indicates that the new configuration was accepted:

{
"ok" : 1
...
}

Within about 10 seconds, an election promotes DC2-db4 (on the DR site) to PRIMARY. You can verify this with the following command:

printjson(rs.status().members.map( function(m) { return {'name':m.name, '_id':m._id, 'health':m.health, 'stateStr':m.stateStr} }))

Example output (pay attention to the DC2-db4 entry):

replset:PRIMARY> printjson(rs.status().members.map(function(m) { return {'name':m.name, '_id':m._id, 'health':m.health, 'stateStr':m.stateStr} }))
[
        {
                "name" : "DC1-db1:27018",
                "_id" : 0,
                "health" : 1,
                "stateStr" : "SECONDARY"
        },
        {
                "name" : "DC1-db2:27018",
                "_id" : 1,
                "health" : 1,
                "stateStr" : "SECONDARY"
        },
        {
                "name" : "DC1-db3:27018",
                "_id" : 2,
                "health" : 1,
                "stateStr" : "SECONDARY"
        },
        {
                "name" : "DC2-db4:27018",
                "_id" : 3,
                "health" : 1,
                "stateStr" : "PRIMARY"
        },
        {
                "name" : "DC2-db5:27018",
                "_id" : 4,
                "health" : 1,
                "stateStr" : "SECONDARY"
        }
]

The DR activation is now complete. You may now activate the applications on the DR site, where the PRIMARY nodes are running, or perform maintenance on the production site while DR is active.
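Before taking a production node down, it may help to confirm that every member of the replica set is healthy, so that removing one node does not put the majority at risk. A minimal sketch, where membersNotReady is a hypothetical helper that reads only the name, health and stateStr fields of rs.status():

```javascript
// Return the names of members that are unreachable (health is not 1) or
// that are in a state other than PRIMARY or SECONDARY.
function membersNotReady(status) {
  return status.members
    .filter(function (m) {
      const stateOk = m.stateStr === "PRIMARY" || m.stateStr === "SECONDARY";
      return m.health !== 1 || !stateOk;
    })
    .map(function (m) {
      return m.name;
    });
}

// Usage inside the MongoDB shell, on the config replica set and on the shard:
//   membersNotReady(rs.status())
// An empty array means all members are up and are either PRIMARY or SECONDARY.
```

If the array is not empty, bring the listed members back to a healthy state first and only then proceed with maintenance, one node at a time.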

 

 
