#100DaysOfMERN - Day 49

Subscribe to my newsletter and never miss my upcoming articles

✏ MongoDB Aggregation: $group stage

With $group, you can completely restructure the collection, depending on different criteria of your choice. This stage makes the most sense when using it together with accumulators (see below), but I'll start with a few simple examples.

The syntax for the most basic group aggregation is:

<Model>.aggregate([
    { $group: { _id: <expression> } }
])

Here, the_id field is mandatory. It does not refer to the ObjectId that every record has by default. $group produces a new set of documents, and the id field depends on the expression (= field reference) that you grouped the collection by.

Let's look at the collection again:

aggregation-example-collection.png

Grouping by Title

To get a list of all recipe titles, group by the title field. A field reference is indicated by wrapping the field name in quotes and a $:

Recipe.aggregate([
    { $group: { _id: '$title' } }
])

The output for this is a collection of documents that have only one property _id with the recipe title:

[
  { _id: 'Pasta' },
  { _id: 'Soup' },
  { _id: 'Dessert' },
  { _id: 'Salad' }
]

This is essentially the same as using .find({}) together with .select() to exclude the ingredients list and other fields. The difference is that $group returns only distinct documents, see the next example:

Grouping by Category

I have four recipes in three categories, grouping them by category:

Recipe.aggregate([
    { $group: { _id: '$info.category' } }
])

The output is a list of all different categories:

[
  { _id: 'lunch' },
  { _id: 'side' },
  { _id: 'dessert' }
]

Grouping by Info

The info field is a subdocument with two properties, the category and another object with preparation and cooking time. Grouping by '$info' results in four documents with only the info subdocument:

Recipe.aggregate([
    { $group: { _id: '$info' } }
])

// result
[
  { _id: { category: 'side', time: [Object] } },
  { _id: { category: 'dessert', time: [Object] } },
  { _id: { category: 'lunch', time: [Object] } },
  { _id: { category: 'lunch', time: [Object] } }
]

Mongo does a deep comparison of objects (and arrays). If I had two or more recipes in the same category and with the same values for info.time.preparation and info.time.cooking, they would not be considered distinct.

Grouping by Ingredients

Likewise, grouping by ingredients results in four documents with only their ingredients arrays (I'll talk about arrays in more detail when describing the $unwind stage). Grouping by '$ingredients.name' would return a list of ingredients for each document:

[
  { _id: [ 'tomatos', 'salt', 'pepper' ] },
  { _id: [ 'spaghetti', 'salt', 'pesto' ] },
  { _id: [ 'tomatos', 'oil', 'pepper' ] },
  { _id: [ 'tomatos' ] }
]

This list alone isn't very helpful though, since the title of the recipe is missing in the output.


✏ Adding more fields to the result collection

Instead of only one <expression>, pass an object with all fields that should be included. You can rename the keys of the resulting document:

Recipe.aggregate([
    { $group: { _id: {
        recipeName: '$title',
        ingredientsList: '$ingredients.name' }
    } }
])


// result
[
  { _id: { recipeName: 'Soup', ingredientsList: [Array] } },
  { _id: { recipeName: 'Dessert', ingredientsList: [Array] } },
  { _id: { recipeName: 'Salad', ingredientsList: [Array] } },
  { _id: { recipeName: 'Pasta', ingredientsList: [Array] } }
]

Adding more fields generally increases the number of result documents. If I had a collection with recipes in 5 categories and another field for "warm/cold", then grouping by the latter will result in 2 documents. Assuming that all categories have both warm and cold dishes, grouping by both "warm/cold" and category will result in 10 documents, because there's 10 distinct combinations.

But again, a result that only lists those combinations makes no sense in this case. I already know them, because I'm the one who decided which categories the recipes in my collection can have. It would only make sense if you have a database that's edited by many different users who are free to choose and name a category themselves.

The real power of $group lies in adding accumulators. Those allow for example to find out how many recipes I have in each category.


✏ Accumulators

The most common accumulators in the $group stage are $sum, $min, $max or $avg. The syntax is:

<Model>.aggregate([
    { $group: {
         _id: <expression>,
        <field>: { <accumulator>: <expression> }
        }
    }
])

To find out how many recipes I have in each category, I'd use the $sum operator with an expression (value) of 1:

Recipe.aggregate([
    { $group: {
        _id: '$info.category',
        total: { $sum: 1 }
        }
    }
])

// result
[
  { _id: 'lunch', total: 2 },
  { _id: 'dessert', total: 1 },
  { _id: 'side', total: 1 }
]

Here, $sum is used as a counter (indicated by the value of 1).

Pitfalls

Note that it makes a big difference how you group and count. In the first example below, I'm counting documents in each category (same as above), the second example gets a list of distinct categories and pointlessly returns the information that each distinct document is present with a total count of 1:

// counting documents in each category:
{ $group: { _id: '$info.category', total: { $sum: 1 } } }

// result
[
  { _id: 'lunch', total: 2 },
  { _id: 'dessert', total: 1 },
  { _id: 'side', total: 1 }
]

// pointless:
{ $group: { _id: { category: '$info.category', total: { $sum: 1 } } } }

// result
[
  { _id: 'lunch', total: 1 },
  { _id: 'dessert', total: 1 },
  { _id: 'side', total: 1 }
]

Accumulators with a field reference

Instead of using an accumulator like $sum with a value of 1, it can also be used with a field reference, to calculate the total preparation time per category. It's a slightly nonsensical query though, so I'll take $avg instead, to get the average preparation time per category:

Recipe.aggregate([
    { $group: {
        _id: '$info.category',
        averagePrepTime: { $avg: '$info.time.preparation' }
        }
    }
])

// result
[
  { _id: 'lunch', averagePrepTime: 1.5 },
  { _id: 'dessert', averagePrepTime: 0 },
  { _id: 'side', averagePrepTime: 10 }
]

✏ Grouping by null

Grouping by null allows to count/accumulate values for the whole collection. It returns one document (because there's only one distinct null value). If I take the above example and group by null instead of $info.category:

Recipe.aggregate([
    { $group: {
        _id: null,
        averagePrepTime: { $avg: '$info.time.preparation' }
        }
    }
])

// result
[ { _id: null, averagePrepTime: 3.25 } ]

With such a small collection of 4 recipes, it's difficult to use accumulators in any meaningful way, so I'll add some examples with a more realistic collection, with combinations of $match and $group:


✏ User collection: $match and $group

Imagine you have a collection of users, following this Model:

{
  email: String,
  name: String,
  age: Number,
  location: {
    address: String,
    country: String
  }
  isOnline: Boolean
}

Grouping by '$age' would return a list of all distinct age values in the collection. Adding accumulators, you can extract information about each group.

To find out how many are online, grouped by age, you'd first use $match to find all documents where isOnline is true, then use $group with a $sum accumulator. The result is an (unsorted) list of objects with age/online properties:

User.aggregate([
    { $match: { isOnline: true } },
    { $group: { _id: '$age', online: { $sum: 1 } } }
])

// result
[
  { _id: 18, online: 143 },
  { _id: 45, online: 98 },
  { _id: 26, online: 122 },
  ...
]

One last example: To find out the average age of users, grouped by country:

User.aggregate([
    { $group: {
        _id: '$location.country', 
        averageAge: { $avg: '$age' }
        }
    }
])

// result
[
  { _id: 'USA', averageAge: 25.34 },
  { _id: 'India', averageAge: 22.78 },
  { _id: 'Brazil', averageAge: 23.62 },
  ...
]

That's it for the $group stage. Next post: $project stage.


✏ Resources

I highly recommend this YouTube playlist:

MongoDB Aggregation Framework


✏ Recap

This post covered:

  • MongoDB aggregation: $group stage

✏ Thanks for reading!

I do my best to thoroughly research the things I learn, but if you find any errors or have additions, please leave a comment below, or @ me on Twitter. If you liked this post, I invite you to subsribe to my newsletter. Until next time 👋


✏ Previous Posts

You can find an overview of all previous posts with tags and tag search here:

#100DaysOfMERN - The App

No Comments Yet