#100DaysOfMERN - Day 47

Subscribe to my newsletter and never miss my upcoming articles

✏ Advanced MongoDB: Aggregation

Dealing with databases is a very common task for full stack developers. Depending on what you're building, it's likely that at some point, your application will have to interact with a database - to handle user logins, product catalogues, or deliver streams of information/media.

Handling data, especially large amounts, is a huge academic field, and not much of it is immediately relevant for web developers. However, the ability to efficiently organise and query your data is a valuable skill. If you have a large products database and a frontend to search and filter the data, you'd ideally fetch only those pieces that are actually required, instead of loading the whole catalogue and extract the relevant parts on the server, or worse, on the client.

Can I just use find()?

You'll often get away with Mongo's various .find() methods, but as soon as you deal with larger documents and nested subdocuments, .find() isn't the best choice, because it always returns the whole document (or many of them). You can exclude certain fields, but if you only need one nested value from a large subdocument, you'd still have to return the whole field.

Your server would have to loop over all results, to reorganise/extract what you're looking for, which can be very memory-expensive, depending on the size of the collection. But with just a few methods and more elaborate queries, you can avoid this altogether, and let Mongo do it for you. After all, that's what it's optimised for.


✏ The Aggregation Pipeline

The aggregation pipeline is a process with (often) multiple stages, where you can filter, group and transform a collection of documents until you get the result you want. That can be just a value, or a single document, or a completely new collection.

Mongo has a vast number of possible stage operators, most of which probably only being useful for google data scientist wizards trying to figure out whether you're looking for pictures of cute animals, BMX bikes or for tech blogs when you search "mongoose".

For most practical use cases of database queries in web development, you'll only need very few of these operators. Those are already powerful enough for very advanced filtering and restructuring of documents, and compared to .find(), they're also much faster.

Another advantage is that by refining the query, most of the work is happening in Mongo's server memory, not yours.


✏ Aggregation Stages

The collection goes through the stages in a certain order, each one modifying the data in a certain way. They're independent of each other, the data flows in, gets transformed, and flows out into the next stage.

mongo-aggregation.png

That's why the order is important. Even if the result would be the same if you swapped two stages, the order can have big impact on performance. If you have a collection of users, and want an alphabetically sorted list of users from a certain country, it's probably obvious that you'd filter before you sort. Aggregation is like chaining JavaScript array methods.

Mongo or Mongoose?

I'll use Mongoose to build up a pipeline, but you can just as well use Mongo, it doesn't matter because Mongoose uses Mongo's aggregation framework under the hood. The only difference is that Mongo doesn't return an array, but a cursor, which can be converted to an array with toArray().

The aggregate method takes an array as argument, defining the order of the different stages:

Mongo:

db.collection(<name>).aggregate([  ]).toArray()

Mongoose:

<Model>.aggregate([  ]);

✏ Aggregation Syntax

Each stage is represented by an object.

  • key: name/method of the stage
  • value: a string, a number, but most often one or more field/expression pairs in curly braces
<Model>.aggregate([
    { <stage1> : { <field> : <expression> } },
    { <stage2> : { <field> : <expression> } },
    ...
    { <stageN> : { <field> : <expression> } },
])

Stages and Stage Operators

Each stage operator transforms the collection in a specific way, quite comparable to Array methods like .filter, .map, .reduce or .slice. Some might sound familiar, because they have an equivalent wrapper function, like .sort or .limit:

aggregation-operators.png

Expressions and Expression Operators

Depending on the stage, expressions can be

  • a reference to a field name
  • a comparison { $gt: 20 } (usually in $project stage)
  • an accumulator { $sum: 1 } (usually in $group stage)
  • a value 1 or -1 ($sort stage)

(Without examples, this won't make much sense right now, I just wanted to list this all at the beginning for later reference)


✏ Example Collection

I've built myself a recipe database, where I can add, edit and delete recipes from a frontend interface. Each recipe has a title, array of ingredients, instructions, and an info field for category and time for preparation and cooking. Each ingredient has a name, amount and unit.

The schemas look like this:

const IngredientSchema = new Schema(
    {
        name: String,
        amount: Number,
        unit: String
    }
);

const RecipeSchema = new Schema(
    {
        title: String,
        ingredients: [ IngredientSchema ],
        instructions: String,
        info: {
            category: String
            time: {
                preparation: Number,
                cooking: Number,
            }
        }
    }
);

After I had entered a number of recipes into my database, I wanted to add another page to the frontend, to display a list of all ingredients from all recipes. I had formerly done that by keeping a second collection with only the ingredient names, which seemed redundant.

I wanted to learn how to extract the list of all ingredients from the recipe collection with only one query to Mongo, that's why I started looking into aggregation in the first place, and I've since realised what a powerful tool it is.

Here's a really contrived collection that I'll use for all examples below:

aggregation-example-collection.png


✏ The $match operator

$match is typically the first stage in the pipeline. I works like .find(), or like a WHERE clause in SQL. Oftentimes you don't want to extract information from the whole collection, for example if you only want customer's data from a specific country, or only recipes with category "lunch".

.find({ condition }) is an aggregation pipeline with only one $match stage

The simplest example for a $match stage would be this, where you don't filter at all, but get the whole collection:

Recipe.aggregate([
        { $match: { } }
    ]);

// the same as:
Recipe.find({ });

Whatever you can pass to .find({}) can also be passed to $match.


This was a lot of theory, but tomorrow I'll continue with a long list of examples for all kinds of queries using $match.


✏ Resources

Aggregation in MongoDB

An Introduction to Mongoose Aggregate


✏ Recap

This post covered:

  • the basic principles of the mongoDB aggregation framework
  • aggregation pipeline and stages
  • stage operators and expression operators
  • the $match operator

✏ Thanks for reading!

I do my best to thoroughly research the things I learn, but if you find any errors or have additions, please leave a comment below, or @ me on Twitter. If you liked this post, I invite you to subsribe to my newsletter. Until next time 👋


✏ Previous Posts

You can find an overview of all previous posts with tags and tag search here:

#100DaysOfMERN - The App

Comments (1)

Pratik Zinjurde's photo

nice ! keep going