MongoDB Sharding Setup on Ubuntu 14.04

The Issue:

Some time back, a client’s application that used MongoDB for data storage ran into performance problems because of the high volume of data it handled. With that much data, heavy I/O drives CPU utilisation up, which in turn slows the app down. The obvious fix in such a situation is to scale the database, but the client had already tried vertical scaling and the issue persisted. That is when we stepped in to scale the database horizontally, and I had to set up MongoDB sharding to resolve it.

In this article, I am going to explain how I went about setting up MongoDB sharding on an Ubuntu machine.

OK, first, what is sharding?

In general, a shard is a part of a whole. In our context, it is a partition of a database; a horizontal one, to be specific, since this article is about MongoDB sharding (that is, creating horizontal partitions) on an Ubuntu 14.04 machine.

 

Sharding, in simple terms, is the process of distributing data across different servers. With sharding, we can add more servers to support data growth and the demands of read and write operations.
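As a rough illustration of the idea only, the shell sketch below assigns each key to one of N servers by hashing it. The `shard_for` function and the use of `cksum` are assumptions made for this example; MongoDB itself partitions data by shard key ranges or by its own hashed index, not like this.

```shell
#!/usr/bin/env bash
# Toy illustration of hash-based partitioning; NOT how MongoDB routes data internally.
# shard_for KEY SHARD_COUNT -> prints a shard index in the range 0..SHARD_COUNT-1
shard_for() {
  local key="$1" count="$2" sum
  # cksum prints "<crc> <byte count>"; keep only the CRC
  sum=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo $(( sum % count ))
}

# The same key always maps to the same shard, so a reader knows where to look.
for key in user:1 user:2 user:3 user:4; do
  echo "$key -> shard $(shard_for "$key" 3)"
done
```

Adding a server changes the shard count and therefore the mapping, which is one reason a real system such as MongoDB keeps explicit metadata about where each chunk of data lives.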

Next, the basic definition of MongoDB

MongoDB is an open-source, high-performance NoSQL database that stores data as JSON-like documents made up of field-value pairs. It provides performance, high availability and scalability, and the scalability comes from sharding.

And then, MongoDB Sharding

Sharding has three different components, and each component performs a different function. The following diagram shows those three components and their connectivity:

 

Config Servers – Config servers store the cluster’s metadata, which maps the cluster’s data set to the shards. A production sharded cluster set up this way must have exactly three config servers. The query routers use this metadata to direct each operation to the right shards.

 

Shards (Replica set) – Shards are responsible for storing the actual data. Each shard is deployed as a replica set, which contains one primary member and one or more secondary members. As a result, shards provide high availability through replication. Only the primary accepts write operations; the secondaries replicate its data and can serve read operations. Furthermore, when the primary goes down, one of the secondaries is elected as the new primary.

 

Query Routers (mongos) – Applications and direct mongo operations go through the query routers. For each request, a query router communicates with the shards and then returns the results to the client. A sharded cluster may also have multiple query routers to split the request load.

 

Setup Prerequisites

We require the following servers for the MongoDB sharding setup:
Query server – Server A
Config server & Shards / Replica Set – Server B
Config server & Shards / Replica Set – Server C
Config server & Shards / Replica Set – Server D

 

STEP 1 – Install MongoDB

Before starting the MongoDB sharding setup, make sure that all the servers run the same version of MongoDB. To install MongoDB 3.2 on all four servers, use the commands below.

 

# First of all, import the public key used to sign the MongoDB 3.2 packages:


$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
# Then, Create a list file for MongoDB:

$ echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release -sc)"/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
# Update the package index:

$ sudo apt-get update
# Then Install MongoDB

$ sudo apt-get install -y mongodb-org
# Finally, Freeze MongoDB version:

$ echo "mongodb-org hold" | sudo dpkg --set-selections
$ echo "mongodb-org-server hold" | sudo dpkg --set-selections
$ echo "mongodb-org-shell hold" | sudo dpkg --set-selections
$ echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
$ echo "mongodb-org-tools hold" | sudo dpkg --set-selections
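To confirm that all four servers ended up on the same version, one option is to compare the output of `mongod --version` on each of them. The helper below is a minimal sketch: the name `mongod_version` is made up for this example, and it assumes the version line has the form `db version v3.2.x`.

```shell
#!/usr/bin/env bash
# mongod_version: read `mongod --version` output on stdin and print only the version number.
# Assumes a line of the form "db version v3.2.x".
mongod_version() {
  sed -n 's/^db version v\([0-9.]*\).*/\1/p' | head -n 1
}

# Usage on each server (A, B, C and D); all four should print the same value:
#   mongod --version | mongod_version
```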

STEP 2 – MongoDB Sharding Config Server Setup

In order to set up the config servers, run the following steps on Servers B, C and D.

 

Step 2.1: First, Create config server metadata storage directory


$ mkdir -p /var/mongodb/mongo-metadata

Step 2.2: Then, Run mongo sharding config server


$ mongod --configsvr --dbpath /var/mongodb/mongo-metadata --port 27019
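The same settings can also be kept in a YAML configuration file instead of command-line flags. The sketch below writes such a file; the function name `write_configsvr_conf` and the file path in the usage lines are hypothetical, while the option names are the standard mongod configuration file options.

```shell
#!/usr/bin/env bash
# write_configsvr_conf FILE: write a YAML config equivalent to
#   mongod --configsvr --dbpath /var/mongodb/mongo-metadata --port 27019
write_configsvr_conf() {
  cat > "$1" <<'EOF'
sharding:
  clusterRole: configsvr
storage:
  dbPath: /var/mongodb/mongo-metadata
net:
  port: 27019
EOF
}

# Usage (the file path is only an example):
#   write_configsvr_conf /tmp/mongod-configsvr.conf
#   mongod --config /tmp/mongod-configsvr.conf
```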

STEP 3 – Mongo Shards Replica set setup

In order to set up the shards, run the following steps on Servers B, C and D.

 

Step 3.1: First, create Shards storage directory


$ mkdir -p /var/mongodb/store/rs1
$ mkdir -p /var/mongodb/store/rs2
$ mkdir -p /var/mongodb/store/rs3

Step 3.2: Then, run the MongoDB shard replica set members


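# All three members use the same replica set name so that they can be joined in Step 3.3,
# and each one listens on its own port.
$ mongod --shardsvr --replSet rs1 --dbpath /var/mongodb/store/rs1 --port 27000
$ mongod --shardsvr --replSet rs1 --dbpath /var/mongodb/store/rs2 --port 27001
$ mongod --shardsvr --replSet rs1 --dbpath /var/mongodb/store/rs3 --port 27002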
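Each of these mongod commands runs in the foreground and occupies a terminal. If you would rather run them in the background, mongod accepts `--fork` together with `--logpath`. The sketch below only prints the commands so they can be reviewed first. It assumes ports 27000 to 27002 (matching the members added in Step 3.3) and a single set name, rs1, so that the three members can join one replica set; the helper name `shard_member_cmd` and the log file locations are assumptions of this example.

```shell
#!/usr/bin/env bash
# shard_member_cmd N: print the command that starts replica set member N (1, 2 or 3)
# in the background. The log file location is an assumption for this sketch.
shard_member_cmd() {
  local n="$1"
  local port=$(( 27000 + n - 1 ))
  echo "mongod --shardsvr --replSet rs1 --dbpath /var/mongodb/store/rs$n --port $port --fork --logpath /var/mongodb/store/rs$n.log"
}

# Print all three commands for review; pipe the output to `sh` to actually run them.
for n in 1 2 3; do
  shard_member_cmd "$n"
done
```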

Step 3.3: Then, open the mongo console, initiate the replica set and add the other members


$ mongo --port 27000
mongo> rs.initiate()
mongo> rs.add("shard.host.com:27001")
mongo> rs.add("shard.host.com:27002")

STEP 4 – MongoDB Sharding Query Routers setup

To set up the query routers, run the following steps on Server A.

 

Before continuing with this step, check that all three config servers are running and listening for connections. Then stop the default MongoDB service on Server A so that the query router can use port 27017.
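One possible way to do that check from Server A is a plain TCP connection test against port 27019 on each config server. This is a minimal sketch that relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command; the function names are made up, and the host names in the usage line are the placeholder names used for the config servers in this article.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT opens within 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_config_servers HOST...: report whether each config server accepts connections on 27019.
check_config_servers() {
  local host
  for host in "$@"; do
    if port_open "$host" 27019; then
      echo "$host:27019 is listening"
    else
      echo "$host:27019 is NOT reachable"
    fi
  done
}

# Usage:
#   check_config_servers config0.host.com config1.host.com config2.host.com
```

A successful connection only shows that something is listening on the port; it does not prove that the process is a healthy config server.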

 

In order to stop the MongoDB service (the mongodb-org packages install it under the name mongod):


$ sudo service mongod stop

In order to start the query router service:
# You can start multiple query routers using this same command


$ mongos --configdb config0.host.com:27019,config1.host.com:27019,config2.host.com:27019
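If the list of config servers lives in a script or a variable, the comma-separated `--configdb` value can be assembled rather than typed by hand. The helper name `build_configdb` is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# build_configdb PORT HOST...: join the config server hosts into the comma-separated
# "host:port,host:port,..." string passed to `mongos --configdb`.
build_configdb() {
  local port="$1" out="" host
  shift
  for host in "$@"; do
    # Prefix a comma only when something has already been appended
    out="${out:+$out,}$host:$port"
  done
  echo "$out"
}

# Usage:
#   mongos --configdb "$(build_configdb 27019 config0.host.com config1.host.com config2.host.com)"
```

Every query router should be started with the same list of config servers.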

STEP 5 – Add our shard servers to the Query Router

Now that we have our config servers, query routers and replica set servers configured, we can add the shard servers to our query routers.

 

In order to add the shard servers to the mongo cluster, run the steps below on Server A.

 

Step 5.1: Open mongo console


$ mongo --host query0.example.com --port 27017
mongo> sh.addShard( "rep_set_name/rep_set_member:27017" )

Step 5.2: In order to find the replica set name and member name to use in Step 5.1, go to the shard server


$ mongo --port 27000
mongo> rs.status()
Now, note the name of the primary member in the output.
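Instead of reading through the full `rs.status()` output, the set name and the current primary can also be printed directly in the form that `sh.addShard()` takes. This is a sketch under the assumption that a replica set member is listening locally on the given port; the function name `shard_string` is made up, while `db.isMaster()` and its `setName` and `primary` fields are standard.

```shell
#!/usr/bin/env bash
# shard_string PORT: ask the local replica set member on PORT for its set name and
# current primary, and print them as "setName/host:port" for use with sh.addShard().
shard_string() {
  mongo --port "$1" --quiet --eval 'var m = db.isMaster(); print(m.setName + "/" + m.primary);'
}

# Usage on a shard server:
#   shard_string 27000
```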

 

STEP 6 – Enable Sharding for a Database and Collection

Open mongo console


$ mongo --host query0.example.com --port 27017

Stop the sharding balancer
Before enabling sharding we have to stop the balancer service from mongo console.


mongo> sh.stopBalancer()

Enable sharding for the database


mongo> use current_DB
mongo> sh.enableSharding("current_DB")

Enable sharding for the collections


mongo> use current_DB
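mongo> sh.shardCollection( "current_DB.test_collection", { "_id" : "hashed" } )
Note that the key document names the shard key field with a value of 1 (ranged sharding) or "hashed" (hashed sharding); "_id" with "hashed" is used here only as an example. The shard key that you choose for the collection has a direct impact on performance.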
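Note that none of the steps above start the balancer again after `sh.stopBalancer()`. Assuming you want MongoDB to distribute chunks across the shards automatically once sharding is enabled, a sketch like the following could re-enable it through the query router and print the cluster status. The function name `resume_balancer` is made up for this example; `sh.startBalancer()`, `sh.getBalancerState()` and `sh.status()` are standard mongo shell helpers.

```shell
#!/usr/bin/env bash
# resume_balancer HOST PORT: re-enable the balancer through a query router,
# report its state, and print the sharding status of the cluster.
resume_balancer() {
  mongo --host "$1" --port "$2" --quiet --eval 'sh.startBalancer(); print("balancer enabled: " + sh.getBalancerState()); sh.status();'
}

# Usage from Server A:
#   resume_balancer query0.example.com 27017
```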

 

Conclusion

By the end of this blog, you should be able to implement your own MongoDB sharding configuration on an Ubuntu machine.

 

Sharding definitely helped boost the application’s performance and the high availability of the database. Not only that, it also allows the client to scale their database horizontally by adding more shards as the volume of data increases.

 

Follow Agira Tech Blog for more interesting blogs and queries.