I have a t2.medium instance on AWS provisioned with 3 disks that run at different speeds for storing and processing the data. The disk where the data is stored has 1200 IOPS and a capacity of 200 GB.
On the other hand, I have the following query to count some of the data held inside an array in the collection's documents. This collection could eventually hold many millions of documents, although it currently only has around 3 million.
db.suf.aggregate([
    // keep only the documents for this health entity
    {$match: {healthEntityCode: 'ESS076'}},
    // produce one output document per element of the activities array
    {$unwind: "$activities"},
    // count the unwound documents
    {$group: {_id: null, count: {$sum: 1}}}
])
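For reference, a sketch of two things I have considered but not benchmarked: an index so the $match stage does not have to scan the whole collection, and a variant of the count that sums the array sizes without unwinding. The index and the $ifNull guard (for documents where activities might be missing) are assumptions on my part:

// Assumed index to support the $match stage (it may already exist)
db.suf.createIndex({healthEntityCode: 1})

// Same count without $unwind: sum the size of each activities array
db.suf.aggregate([
    {$match: {healthEntityCode: 'ESS076'}},
    {$group: {_id: null, count: {$sum: {$size: {$ifNull: ["$activities", []]}}}}}
])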
The first stage ($match) selects around 500,000 documents. The second stage ($unwind) expands that number significantly, up to 5 times the number of documents. I have not been able to measure how long the whole operation takes, because my patience runs out after about 3 hours.
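To see where the time goes without waiting for the full run, this is roughly how I would inspect the pipeline with explain (using the executionStats verbosity):

// Show the plan and per-stage execution statistics for the pipeline
db.suf.explain("executionStats").aggregate([
    {$match: {healthEntityCode: 'ESS076'}},
    {$unwind: "$activities"},
    {$group: {_id: null, count: {$sum: 1}}}
])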
My question is: is this the best strategy for this kind of count? I am seriously thinking of setting up a sharded cluster and scaling the project horizontally. Would the process improve if I did that?
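If I did go the sharding route, my understanding is it would look roughly like the sketch below, run against a mongos. The database name mydb and the shard key are assumptions; I have not verified that healthEntityCode distributes the data evenly, which is why it is combined with _id here:

// Database name "mydb" is hypothetical
sh.enableSharding("mydb")
// A non-empty collection needs an index that starts with the shard key
db.suf.createIndex({healthEntityCode: 1, _id: 1})
// Compound shard key: healthEntityCode alone has low cardinality
sh.shardCollection("mydb.suf", {healthEntityCode: 1, _id: 1})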