MongoDB Aggregation Pipeline: $match, $group, $lookup, $facet

MongoDB Aggregation Pipeline

MongoDB aggregation is the part of the database that reshapes documents as they move through a pipeline. Each stage receives the output of the previous stage, so the order of stages changes both the result and the amount of work MongoDB has to do.

You reach for aggregation when a simple find() filter is not enough: totals, grouped counts, leaderboard-style summaries, array flattening, joins, faceted dashboards, and derived fields that should be calculated inside the database.

The lesson is easiest to understand when the same pipeline is traced from raw event documents to a final report document, because that exposes both the data flow and the stage-by-stage side effects.

What the Pipeline Does

A pipeline is a sequence of transformation stages. MongoDB starts with input documents from a collection, feeds them into the first stage, and passes the resulting documents into the next stage. Nothing is rewritten in place unless you explicitly send the result to another collection.

That flow matters because every stage has a different job. $match filters, $group collapses multiple documents into summaries, $project changes the shape, $sort orders the stream, $lookup pulls matching rows from another collection, and $facet splits the stream into several reports at once.

The output of one stage becomes the input of the next stage.
$group changes document cardinality by merging many documents into fewer summary documents.
$project is about shape, not filtering logic.
$lookup adds related data from another collection without leaving the pipeline.

Stage Order Changes the Result

The same stages can produce different answers depending on their order. A $match stage placed before $group reduces the number of documents entering the grouping step. The same filter placed after $group only filters the grouped summaries, which is a completely different job.

MongoDB can also exploit indexes better when a selective $match appears early in the pipeline. That makes stage order part of correctness and performance, not just style.

Filter early when you can discard irrelevant documents before heavier stages.
Group only after you have the rows you actually want to summarize.
Place reshaping stages where the final document shape becomes clear.
Use $sort after grouping when you want to rank summaries, not raw rows.

Common Stages and Their Shape Changes

$match keeps the same document shape but removes non-matching documents. $group collapses rows into one row per grouping key. $project can rename fields, hide fields, or compute new values. $unwind turns an array into one document per array element. $lookup adds a new array of matching documents from another collection.

$facet is the stage that makes a single pipeline branch into multiple sub-pipelines. That is useful when a dashboard needs counts by status, price bands, and recent items from the same dataset without running three separate queries.

Use $group when you need totals, averages, minimums, maximums, or counts by category.
Use $unwind when array contents must be treated like individual rows.
Use $lookup when the output needs related records from another collection.
Use $facet when one input stream must feed several reports at once.

How to Read a Pipeline Like a Debugger

When a pipeline is wrong, trace the shape after every stage. Ask three questions: how many documents remain, which fields still exist, and which values are newly computed. That is usually enough to find the stage that changed the output in an unexpected way.

If the pipeline is slow, inspect the first stages. A selective $match placed too late or an unnecessary $lookup before a filter often creates a much larger working set than the final report needs.

Count the number of documents after each stage.
Check whether the stage removes fields or only adds new ones.
Watch for array explosion after $unwind.
Look for a $lookup that runs before a narrowing $match.

Orders by shipping state

db.orders.aggregate([\n  { $match: { status: { $in: ["paid", "packed", "shipped"] } } },\n  { $group: {\n      _id: "$shipping.state",\n      orders: { $sum: 1 },\n      revenue: { $sum: "$grandTotal" }\n  }},\n  { $project: {\n      _id: 0,\n      state: "$_id",\n      orders: 1,\n      revenue: { $round: ["$revenue", 2] }\n  }},\n  { $sort: { revenue: -1 } }\n])

Flatten line items and attach catalog data

db.invoices.aggregate([\n  { $match: { invoiceDate: { $gte: ISODate("2026-01-01") } } },\n  { $unwind: "$lineItems" },\n  { $lookup: {\n      from: "catalog",\n      localField: "lineItems.sku",\n      foreignField: "sku",\n      as: "product"\n  }},\n  { $unwind: "$product" },\n  { $project: {\n      invoiceNo: 1,\n      sku: "$lineItems.sku",\n      qty: "$lineItems.qty",\n      unitPrice: "$lineItems.unitPrice",\n      productName: "$product.name"\n  } }\n])

Dashboard metrics with $facet

db.sessions.aggregate([\n  { $match: { app: "merchant-portal", createdAt: { $gte: ISODate("2026-05-01") } } },\n  { $facet: {\n      byRole: [\n        { $group: { _id: "$user.role", count: { $sum: 1 } } }\n      ],\n      byCountry: [\n        { $group: { _id: "$user.country", count: { $sum: 1 } } },\n        { $sort: { count: -1 } },\n        { $limit: 5 }\n      ],\n      errorRate: [\n        { $group: { _id: null, errors: { $sum: { $cond: ["$hadError", 1, 0] } }, total: { $sum: 1 } } },\n        { $project: { _id: 0, errorRate: { $divide: ["$errors", "$total"] } } }\n      ]\n  } }\n])

Key Takeaways

Place $match early when the pipeline can discard documents up front.
Check how $group changes document count and field shape.
Use $lookup only when the final document really needs related data.
Inspect array fields before and after $unwind.
Use $facet for parallel reports from the same filtered input.

Common Mistakes to Avoid

WRONG Grouping the entire collection before filtering the rows that matter.

RIGHT Put a selective $match before $group so fewer documents reach the expensive stage.

The earlier filter often changes both speed and correctness.

WRONG Using $lookup and then forgetting that the joined result is an array.

RIGHT Flatten the joined array with $unwind when the relationship is one-to-one or one-to-few.

The joined shape is part of the pipeline design, not an accident.

WRONG Moving fields with $project before you still need them later in the pipeline.

RIGHT Delay field removal until after the stages that consume those fields.

A projection that is too early can erase data the next stage depends on.

Practice Tasks

Write a pipeline that counts paid orders by region and sorts the result by revenue.
Add a $lookup that attaches user profile data to the same report, then flatten the joined array.
Take a dashboard-style query and split it into three $facet branches for counts, top values, and ratios.

Frequently Asked Questions

What is the MongoDB aggregation pipeline?

The aggregation pipeline is a framework for data transformation. Documents pass through a sequence of stages: $match (filter), $group (aggregate), $project (reshape), $sort, $lookup (join), $unwind (flatten arrays), $limit, $skip.

What is the difference between $match and find()?

find() is for simple queries returning documents as-is. $match inside an aggregation pipeline filters documents as part of a multi-stage transformation. $match can use indexes when it is the first stage.

How do I count documents by a field value?

<code>db.orders.aggregate([{ $group: { _id: "$status", count: { $sum: 1 } } }, { $sort: { count: -1 } }])</code> - groups by status field and counts documents in each group.

What is $facet in MongoDB aggregation?

$facet runs multiple aggregation pipelines in parallel on the same input documents and returns results in a single document. Useful for building faceted search (e.g., count by category AND count by price range simultaneously).

Previous Next

MongoDB Aggregation Pipeline: $match, $group, $lookup, $facet