MongoDB very expensive counting query


I have a t2.medium server on AWS provisioned with three disks that run at different speeds to store and process the data. The disk where the data is stored is provisioned with 1200 IOPS and has a capacity of 200 GB.

On the other hand, I have the following query to count items stored in an array inside the collection's documents. This collection could eventually hold many millions of documents, most of which are not loaded yet; at the moment it only has around 3 million.

db.suf.aggregate([
   {$match: {healthEntityCode: 'ESS076'}},
   {$unwind: "$activities"},
   {$group: {_id: null, count: {$sum: 1}}}
])

The first stage ($match) selects around 500,000 documents. The second stage ($unwind) multiplies the number of records significantly, producing up to five times as many as the matched documents. I have not been able to measure how long the whole operation takes, because my patience runs out after about three hours.
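To see where the time actually goes, the pipeline can be run with the explain option instead of executing it (a sketch; the exact output shape depends on the MongoDB version):

db.suf.aggregate(
   [
      {$match: {healthEntityCode: 'ESS076'}},
      {$unwind: "$activities"},
      {$group: {_id: null, count: {$sum: 1}}}
   ],
   {explain: true}  // returns the query plan instead of running the pipeline
)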

My question is: is this the best strategy for this kind of count? I am seriously considering setting up a sharded cluster and scaling the project horizontally. Would that improve the process?

asked by Christian Lopez on 25.08.2016 at 23:47

1 answer


In principle, if you have an index on healthEntityCode, the query should perform well. Scaling horizontally will only help with the first stage: the data is brought back together for the $unwind, and from that point on it is processed entirely on the mongos that is serving your request. So if the bottleneck is anywhere other than the initial $match, sharding will not gain you much. Check the index first.
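As a minimal sketch of both points (assuming the index does not exist yet; the second pipeline is an alternative that avoids $unwind entirely by summing each matched document's array size, with $ifNull guarding documents where activities is missing):

// Make sure the $match stage can use an index
db.suf.createIndex({healthEntityCode: 1})

// Alternative count that skips $unwind: sum the size of each activities array
db.suf.aggregate([
   {$match: {healthEntityCode: 'ESS076'}},
   {$group: {_id: null, count: {$sum: {$size: {$ifNull: ["$activities", []]}}}}}
])

Avoiding the $unwind keeps the intermediate result at one record per matched document instead of one record per array element, which for your numbers means roughly 500,000 records instead of 2.5 million.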

answered on 06.01.2017 at 01:46