Which Is Best? Aggregation or Map-Reduce | MongoDB

 

These days, websites have polished structures and seamless designs, so your database obviously needs to be able to handle large volumes of data.

 

It should also follow standard protocols to meet the challenges of handling that data.

 

On the other hand, databases offer different functionality depending on the size of the data sets.

 

So how can we find the appropriate method for handling data sets based on their size?

 

And what does MongoDB have to offer here?

 

Hopefully, this article will help you analyze the challenges of handling large and small datasets in MongoDB.

 

 

MongoDB provides two ways of performing aggregation:

 

  1. Aggregation pipeline
  2. Map-reduce function

 

What Is the Aggregation Pipeline, and How Does It Work?

 

The aggregation pipeline is a framework for transforming large numbers of documents into aggregated results using a multi-stage pipeline.

 

An aggregation operation computes results from a group of values.

 

It lets us run a variety of operations to return specific results from grouped data.

 

This framework provides an alternative to the other aggregation approach, map-reduce.

 

It is often treated as the preferable method because it avoids the complexity of map-reduce.

 

 

The aggregate() Method:

 

The aggregation pipeline is exposed as the aggregate() method in MongoDB.

 

The basic syntax of the aggregate() method is as follows:

 

db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

 

Let us walk through a sample of our data to see how these operations and functions work.


 

That way you can examine it more closely.

 

We will take the following documents:

{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b25"), "HospitalAlias" : "cb", "DateOfService" : "1/28/2017", "DoctorAlias" : "aful", "Charge" : "a", "Level" : 2 }

{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b26"), "HospitalAlias" : "cb", "DateOfService" : "1/29/2017", "DoctorAlias" : "jliu", "Charge" : "f", "Level" : 1 }

{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b27"), "HospitalAlias" : "cb", "DateOfService" : "1/30/2017", "DoctorAlias" : "jliu", "Charge" : "f", "Level" : 1 }

 

 

With the above collection, if you want to count how many people were admitted to the hospital on each date of service, you can use the following aggregate() method:

 

db.hospital_rounds.aggregate([{ $group: { _id: "$DateOfService", y: { $sum: 1 } } }])

 

Run against the full collection, of which the documents above are only a sample, the query returns results such as the following:

 

{ "_id" : "3/8/2017", "y" : 3 }

{ "_id" : "3/7/2017", "y" : 4 }

{ "_id" : "10/6/2016", "y" : 83 }

{ "_id" : "10/7/2016", "y" : 93 }

{ "_id" : "10/2/2016", "y" : 69 }

{ "_id" : "10/4/2016", "y" : 88 }

 

You might have noticed how we aggregated specific results from the data above.
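
To see what the $group stage with $sum: 1 is doing, here is a minimal plain-JavaScript sketch that counts the three sample documents by DateOfService. It runs without a database. The countBy helper and the trimmed-down rounds array are our own illustration and are not part of MongoDB.

```javascript
// Sample documents from the article, reduced to the fields that matter here.
const rounds = [
  { DoctorAlias: "aful", DateOfService: "1/28/2017" },
  { DoctorAlias: "jliu", DateOfService: "1/29/2017" },
  { DoctorAlias: "jliu", DateOfService: "1/30/2017" },
];

// Hypothetical helper: mimics { $group: { _id: "$<field>", y: { $sum: 1 } } }.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    // Add 1 to the running total for this document's key.
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  // Shape each entry like an aggregation result document.
  return Array.from(counts, ([_id, y]) => ({ _id, y }));
}

const byDate = countBy(rounds, "DateOfService");
console.log(byDate);
```

Each of the three sample documents has a different date, so every group ends up with a count of 1. Grouping the same array by "DoctorAlias" instead would put two documents in the "jliu" group.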

 

 

The following image gives a clearer idea:

[Image: aggregation example]

 

 

 

Similarly, there are other operators for performing various operations.

 

We have listed a few major operators below.

 

Pipeline operators:

 

  1. $match - Filter documents
  2. $project - Reshape documents
  3. $group - Summarize documents
  4. $unwind - Expand documents
  5. $sort - Order documents
  6. $limit / $skip - Paginate documents
  7. $redact - Restrict documents
  8. $geoNear - Proximity sort documents
  9. $let / $map - Bind variables to sub-expressions (expression operators used inside stages)
  10. $out - Send results to a collection
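
Stages are chained by listing them in order inside the array passed to aggregate(). The sketch below combines $match, $group and $sort on the sample collection. The hospital_rounds collection name comes from the earlier query, while the admissions field name is our own choice for illustration.

```javascript
// A three-stage pipeline: filter, then summarize, then order.
const pipeline = [
  { $match: { HospitalAlias: "cb" } },                             // keep one hospital
  { $group: { _id: "$DateOfService", admissions: { $sum: 1 } } }, // count per date
  { $sort: { admissions: -1 } },                                   // busiest dates first
];

// `db` is only defined inside the mongo shell; elsewhere this just builds the pipeline.
if (typeof db !== "undefined") {
  db.hospital_rounds.aggregate(pipeline).forEach(printjson);
}
```

Placing $match first means the later stages only process the documents that pass the filter.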

 

What Is Map Reduce?

 

The map-reduce function is used to condense large datasets into a handful of aggregated results.

 

The mapReduce() command is used to execute this function.

 

Custom JavaScript functions add flexibility to this approach.

 

With this method, a JavaScript function can evaluate and modify the final results to perform additional calculations.

 

The Map-Reduce Method:

 

Syntax:

db.COLLECTION_NAME.mapReduce(map, reduce, { query: {}, out: { inline: 1 } })

Consider the following data for the map-reduce example:

 

{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b25"), "HospitalAlias" : "cb", "DateOfService" : "1/28/2017", "DoctorAlias" : "aful", "Charge" : "a", "Level" : 2 }

{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b26"), "HospitalAlias" : "cb", "DateOfService" : "1/28/2017", "DoctorAlias" : "jliu", "Charge" : "f", "Level" : 1 }

{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b27"), "HospitalAlias" : "cb", "DateOfService" : "1/30/2017", "DoctorAlias" : "jliu", "Charge" : "f", "Level" : 1 }

 

Applying the same concept here, you can fetch the number of people who visited the hospital on each date by executing the following query.

 

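var o = {};

// Emit one count per document, keyed by date of service (field names are case-sensitive).
o.map = function () { emit(this.DateOfService, 1); };

// Add up the counts emitted for each date.
o.reduce = function (k, vals) { return Array.sum(vals); };

o.query = { HospitalAlias: "cb" };

// The shell helper takes map, reduce and an options object; "out" is required.
db.COLLECTION_NAME.mapReduce(o.map, o.reduce, { query: o.query, out: { inline: 1 } })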

 

This query returns one value per date.

 

Output:

{ "_id" : "1/28/2017", "value" : 2 }

{ "_id" : "1/30/2017", "value" : 1 }
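
The flow behind that output (filter with the query, emit a key and value per document, then reduce the values collected for each key) can be traced without a database. The sketch below is a plain-JavaScript stand-in, not MongoDB's implementation: simulateMapReduce is a hypothetical helper, emit is passed in as an argument rather than being a global as it is in the shell, and Array.sum is replaced by a standard reduce because it is a shell-only extension.

```javascript
// The three sample documents, reduced to the fields used here.
const visits = [
  { HospitalAlias: "cb", DateOfService: "1/28/2017", DoctorAlias: "aful" },
  { HospitalAlias: "cb", DateOfService: "1/28/2017", DoctorAlias: "jliu" },
  { HospitalAlias: "cb", DateOfService: "1/30/2017", DoctorAlias: "jliu" },
];

// Hypothetical stand-in for mapReduce(): filter, map (collect emits by key), reduce.
function simulateMapReduce(docs, map, reduce, query) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs
    // Keep only documents whose fields equal every field in the query.
    .filter((doc) => Object.keys(query).every((k) => doc[k] === query[k]))
    // Call map with the document bound to `this`, as the shell does.
    .forEach((doc) => map.call(doc, emit));
  // Reduce each key's collected values to a single result document.
  return Array.from(groups, ([key, vals]) => ({ _id: key, value: reduce(key, vals) }));
}

const visitCounts = simulateMapReduce(
  visits,
  function (emit) { emit(this.DateOfService, 1); },
  (key, vals) => vals.reduce((sum, v) => sum + v, 0),
  { HospitalAlias: "cb" }
);
console.log(visitCounts);
```

Two of the sample documents share the date 1/28/2017, so that key collects two emitted values before the reduce step adds them up.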

 

You can also take a look at the sample below:

[Image: map-reduce example]

 

Limitations

 

In both cases, results are subject to MongoDB's 16 MB document size limit.

 

Obviously, we run into trouble when that limit is exceeded.

 

It is always good practice to set old data aside.

 

In that case, once you reach the limit, you can write the data out to a separate collection with the $out stage.

 

This frees up space for your new data. Once the whole process is over, you can always merge the datasets later.

 

There is one more option to handle this: the allowDiskUse option of aggregate() lets pipeline stages write temporary data to disk when they would otherwise exceed their memory limit.
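
Both options can be combined in a single call. The sketch below ends the pipeline with $out, which must be the last stage, and passes allowDiskUse as an option to aggregate(). The target collection name rounds_by_date and the admissions field are assumptions made for this example.

```javascript
// Summarize by date and write the result to a separate collection.
// "rounds_by_date" is a hypothetical target collection name.
const archivePipeline = [
  { $group: { _id: "$DateOfService", admissions: { $sum: 1 } } },
  { $out: "rounds_by_date" }, // $out has to be the final stage
];

// allowDiskUse lets stages spill to temporary files instead of failing on the memory limit.
const aggregateOptions = { allowDiskUse: true };

// `db` is only defined inside the mongo shell; elsewhere this just builds the arguments.
if (typeof db !== "undefined") {
  db.hospital_rounds.aggregate(archivePipeline, aggregateOptions);
}
```

Note that $out replaces the contents of the target collection if it already exists, so pick a name that is not in use for other data.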

 

 

Conclusion

 

Very complex queries are difficult to handle in the aggregation framework. It can help, but it is not advisable for complex queries.

 

Map-reduce takes considerable time even on a small dataset, and a large dataset takes roughly the same amount of time to process.

 

So it is better to handle large datasets with the map-reduce function.

 

Given its considerable flexibility, the map-reduce function can handle large datasets faster.

 

Conversely, you can go with the aggregation pipeline for handling small datasets.

 

It is best to choose a method based on the size of your datasets.