MongoDB aggregation is the part of the database that reshapes documents as they move through a pipeline. Each stage receives the output of the previous stage, so the order of stages changes both the result and the amount of work MongoDB has to do.
You reach for aggregation when a simple find() filter is not enough: totals, grouped counts, leaderboard-style summaries, array flattening, joins, faceted dashboards, and derived fields that should be calculated inside the database.
The lesson is easiest to understand when the same pipeline is traced from raw event documents to a final report document, because that exposes both the data flow and the stage-by-stage side effects.
A pipeline is a sequence of transformation stages. MongoDB starts with input documents from a collection, feeds them into the first stage, and passes the resulting documents into the next stage. Nothing is rewritten in place unless you explicitly send the result to another collection.
That flow matters because every stage has a different job. $match filters, $group collapses multiple documents into summaries, $project changes the shape, $sort orders the stream, $lookup pulls matching rows from another collection, and $facet splits the stream into several reports at once.
The same stages can produce different answers depending on their order. A $match stage placed before $group reduces the number of documents entering the grouping step. The same filter placed after $group only filters the grouped summaries, which is a completely different job.
MongoDB can also exploit indexes better when a selective $match appears early in the pipeline. That makes stage order part of correctness and performance, not just style.
$match keeps the same document shape but removes non-matching documents. $group collapses rows into one row per grouping key. $project can rename fields, hide fields, or compute new values. $unwind turns an array into one document per array element. $lookup adds a new array of matching documents from another collection.
$facet is the stage that makes a single pipeline branch into multiple sub-pipelines. That is useful when a dashboard needs counts by status, price bands, and recent items from the same dataset without running three separate queries.
When a pipeline is wrong, trace the shape after every stage. Ask three questions: how many documents remain, which fields still exist, and which values are newly computed. That is usually enough to find the stage that changed the output in an unexpected way.
If the pipeline is slow, inspect the first stages. A selective $match placed too late or an unnecessary $lookup before a filter often creates a much larger working set than the final report needs.
db.orders.aggregate([\n { $match: { status: { $in: ["paid", "packed", "shipped"] } } },\n { $group: {\n _id: "$shipping.state",\n orders: { $sum: 1 },\n revenue: { $sum: "$grandTotal" }\n }},\n { $project: {\n _id: 0,\n state: "$_id",\n orders: 1,\n revenue: { $round: ["$revenue", 2] }\n }},\n { $sort: { revenue: -1 } }\n])
db.invoices.aggregate([\n { $match: { invoiceDate: { $gte: ISODate("2026-01-01") } } },\n { $unwind: "$lineItems" },\n { $lookup: {\n from: "catalog",\n localField: "lineItems.sku",\n foreignField: "sku",\n as: "product"\n }},\n { $unwind: "$product" },\n { $project: {\n invoiceNo: 1,\n sku: "$lineItems.sku",\n qty: "$lineItems.qty",\n unitPrice: "$lineItems.unitPrice",\n productName: "$product.name"\n } }\n])
db.sessions.aggregate([\n { $match: { app: "merchant-portal", createdAt: { $gte: ISODate("2026-05-01") } } },\n { $facet: {\n byRole: [\n { $group: { _id: "$user.role", count: { $sum: 1 } } }\n ],\n byCountry: [\n { $group: { _id: "$user.country", count: { $sum: 1 } } },\n { $sort: { count: -1 } },\n { $limit: 5 }\n ],\n errorRate: [\n { $group: { _id: null, errors: { $sum: { $cond: ["$hadError", 1, 0] } }, total: { $sum: 1 } } },\n { $project: { _id: 0, errorRate: { $divide: ["$errors", "$total"] } } }\n ]\n } }\n])
Grouping the entire collection before filtering the rows that matter.
Put a selective $match before $group so fewer documents reach the expensive stage.
Using $lookup and then forgetting that the joined result is an array.
Flatten the joined array with $unwind when the relationship is one-to-one or one-to-few.
Moving fields with $project before you still need them later in the pipeline.
Delay field removal until after the stages that consume those fields.
The aggregation pipeline is a framework for data transformation. Documents pass through a sequence of stages: $match (filter), $group (aggregate), $project (reshape), $sort, $lookup (join), $unwind (flatten arrays), $limit, $skip.
find() is for simple queries returning documents as-is. $match inside an aggregation pipeline filters documents as part of a multi-stage transformation. $match can use indexes when it is the first stage.
<code>db.orders.aggregate([{ $group: { _id: "$status", count: { $sum: 1 } } }, { $sort: { count: -1 } }])</code> - groups by status field and counts documents in each group.
$facet runs multiple aggregation pipelines in parallel on the same input documents and returns results in a single document. Useful for building faceted search (e.g., count by category AND count by price range simultaneously).
Explore 500+ free tutorials across 20+ languages and frameworks.