What You'll Learn
How to process, transform, and analyze data in MongoDB using the Aggregation Pipeline framework, including $match, $group, $project, and $lookup (NoSQL Joins).
What is the Aggregation Pipeline?
While db.collection.find() is great for retrieving documents, it can't compute totals, group data, or join collections. The Aggregation Pipeline solves this. It passes documents through a multi-stage pipeline, where each stage transforms the data and passes it to the next.
10,000 Orders -> Filter: status = "completed" -> Sum by customer_id -> Top 5 highest spenders
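The four-step flow above could be written roughly as follows. This is a minimal sketch, assuming an orders collection whose documents carry status, customer_id, and amount fields (the names used in the examples in this section). The typeof db guard is only there so the snippet also loads outside mongosh.

```javascript
// Sketch of the flow: filter, sum per customer, rank, keep five.
// Assumed fields: status, customer_id, amount.
const topSpendersPipeline = [
  { $match: { status: "completed" } },
  { $group: { _id: "$customer_id", totalSpent: { $sum: "$amount" } } },
  { $sort: { totalSpent: -1 } },
  { $limit: 5 }
];

// In mongosh, `db` is the current database; elsewhere this part is skipped.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(topSpendersPipeline).toArray());
}
```

Each array element is one stage, and the output of a stage is the input of the next, so reordering the array changes the result.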
Core Pipeline Stages
$match
Filters documents. Place $match as early as possible, ideally as the first stage: there it can use indexes, and it reduces the amount of data the later stages have to process.
{ $match: { status: "completed", amount: { $gte: 100 } } }
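To check whether a leading $match is actually served by an index, one option is to create a suitable index and inspect the query plan. The following is a sketch, assuming the orders collection used elsewhere in this section; the compound index on status and amount is an illustrative choice, not a recommendation for every workload.

```javascript
const matchFirstPipeline = [
  { $match: { status: "completed", amount: { $gte: 100 } } },
  { $group: { _id: "$customer_id", totalSpent: { $sum: "$amount" } } }
];

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  // Assumed index: equality field first, range field second.
  db.orders.createIndex({ status: 1, amount: 1 });

  // An IXSCAN stage in the winning plan suggests the $match used the index;
  // a COLLSCAN means every document in the collection was read.
  printjson(db.orders.explain("executionStats").aggregate(matchFirstPipeline));
}
```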
$group
Groups documents by a specified identifier and applies accumulator expressions (such as $sum, $avg, and $max) to each group.
{
  $group: {
    _id: "$customer_id",               // Group by this field
    totalSpent: { $sum: "$amount" },   // Calculate total
    averageOrder: { $avg: "$amount" }, // Calculate average
    orderCount: { $sum: 1 }            // Count documents
  }
}
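To make the accumulators concrete, here is a plain-JavaScript mirror of what the stage above computes. It is only an illustration of the semantics, not how MongoDB implements $group; the sample orders are invented, and every amount is assumed to be numeric.

```javascript
// Hypothetical sample data standing in for an orders collection.
const sampleOrders = [
  { customer_id: "c1", amount: 100 },
  { customer_id: "c1", amount: 50 },
  { customer_id: "c2", amount: 30 }
];

// Mirrors the $group stage: one bucket per distinct customer_id.
function groupByCustomer(orders) {
  const buckets = new Map();
  for (const order of orders) {
    const bucket = buckets.get(order.customer_id) ??
      { _id: order.customer_id, totalSpent: 0, orderCount: 0 };
    bucket.totalSpent += order.amount; // { $sum: "$amount" }
    bucket.orderCount += 1;            // { $sum: 1 }
    buckets.set(order.customer_id, bucket);
  }
  // { $avg: "$amount" } is the total divided by the number of orders.
  return [...buckets.values()].map((b) => ({
    ...b,
    averageOrder: b.totalSpent / b.orderCount
  }));
}

const grouped = groupByCustomer(sampleOrders);
console.log(grouped);
// c1: totalSpent 150, orderCount 2, averageOrder 75
// c2: totalSpent 30,  orderCount 1, averageOrder 30
```

Note that MongoDB does not guarantee the order of the groups a $group stage emits; add a $sort afterwards if the order matters.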
$project
Reshapes each document. You can include or exclude fields, or add new computed fields.
{
  $project: {
    _id: 0, // Hide the _id field
    fullName: { $concat: ["$firstName", " ", "$lastName"] },
    discountApplied: { $gt: ["$discount", 0] } // Returns boolean
  }
}
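One thing to watch with the stage above: $concat returns null if any operand is null or refers to a missing field, so a document without lastName would end up with fullName: null. A possible guard, shown here as a sketch, wraps each field in $ifNull. The users collection name is a placeholder, since the text does not say which collection holds these fields.

```javascript
const safeNameProjection = {
  $project: {
    _id: 0,
    fullName: {
      $concat: [
        { $ifNull: ["$firstName", ""] }, // fall back to "" instead of null
        " ",
        { $ifNull: ["$lastName", ""] }
      ]
    },
    discountApplied: { $gt: ["$discount", 0] }
  }
};

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  // `users` is a hypothetical collection name.
  printjson(db.users.aggregate([safeNameProjection]).toArray());
}
```

With this fallback, a missing name part leaves a stray space in fullName rather than turning the whole value into null; whether that is preferable depends on how the field is used.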
$lookup (NoSQL Joins)
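Performs a left outer join with another collection in the same database: every input document is kept, and the matching documents from the other collection are attached to it as an array (empty if nothing matches). It is MongoDB's closest equivalent to a relational join.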
{
  $lookup: {
    from: "customers",         // Target collection to join
    localField: "customer_id", // Field from the current (orders) collection
    foreignField: "_id",       // Field from the target (customers) collection
    as: "customer_details"     // Array field where results will be placed
  }
}
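The "left outer" behaviour is easiest to see on a tiny data set. The following plain-JavaScript mirror illustrates the semantics of the stage above; it is not MongoDB's implementation, and the sample documents are invented.

```javascript
// Hypothetical sample data.
const ordersSample = [
  { _id: 1, customer_id: "c1", amount: 100 },
  { _id: 2, customer_id: "c9", amount: 40 } // no such customer
];
const customersSample = [{ _id: "c1", name: "Ada" }];

// Mirrors $lookup with localField: "customer_id", foreignField: "_id".
function lookupCustomers(orders, customers) {
  return orders.map((order) => ({
    ...order,
    // Always an array: all matching customers, or [] when none match.
    customer_details: customers.filter((c) => c._id === order.customer_id)
  }));
}

const joined = lookupCustomers(ordersSample, customersSample);
console.log(JSON.stringify(joined, null, 2));
```

Order 2 survives the join with customer_details: [], which is what distinguishes a left outer join from an inner join.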
A Complete Real-World Example
Let's find the top 3 spending customers from New York, counting only their delivered electronics orders, complete with their name and email.
db.orders.aggregate([
  // 1. Filter orders
  {
    $match: {
      category: "electronics",
      status: "delivered"
    }
  },
  // 2. Group by customer to calculate total spend
  {
    $group: {
      _id: "$customer_id",
      totalSpent: { $sum: "$amount" }
    }
  },
  // 3. Join with customers collection to get details
  {
    $lookup: {
      from: "customers",
      localField: "_id",
      foreignField: "_id",
      as: "customerData"
    }
  },
  // 4. Unwind the array created by $lookup into an object
  {
    $unwind: "$customerData"
  },
  // 5. Filter for New York customers only
  {
    $match: {
      "customerData.state": "NY"
    }
  },
  // 6. Sort by highest total spent
  {
    $sort: { totalSpent: -1 }
  },
  // 7. Limit to Top 3
  {
    $limit: 3
  },
  // 8. Clean up the final output shape
  {
    $project: {
      _id: 0,
      name: "$customerData.name",
      email: "$customerData.email",
      totalSpent: 1
    }
  }
]);
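To see what each of the eight stages hands to the next, the same steps can be traced in plain JavaScript over a handful of documents. This is only an illustration of the data flow, not a substitute for the pipeline; all orders, customers, and email addresses below are invented.

```javascript
// Hypothetical sample data.
const demoOrders = [
  { customer_id: "c1", category: "electronics", status: "delivered", amount: 500 },
  { customer_id: "c1", category: "electronics", status: "delivered", amount: 200 },
  { customer_id: "c2", category: "electronics", status: "delivered", amount: 900 },
  { customer_id: "c3", category: "electronics", status: "pending", amount: 1000 },
  { customer_id: "c4", category: "books", status: "delivered", amount: 50 },
  { customer_id: "c5", category: "electronics", status: "delivered", amount: 300 }
];
const demoCustomers = [
  { _id: "c1", name: "Ada", email: "ada@example.com", state: "NY" },
  { _id: "c2", name: "Ben", email: "ben@example.com", state: "CA" },
  { _id: "c3", name: "Cy", email: "cy@example.com", state: "NY" },
  { _id: "c5", name: "Dee", email: "dee@example.com", state: "NY" }
];

// 1. $match: delivered electronics orders only.
const delivered = demoOrders.filter(
  (o) => o.category === "electronics" && o.status === "delivered"
);

// 2. $group: total per customer_id.
const totals = new Map();
for (const o of delivered) {
  totals.set(o.customer_id, (totals.get(o.customer_id) ?? 0) + o.amount);
}

// 3 + 4. $lookup then $unwind: one row per matching customer;
// a group with no matching customer produces no row at all.
const joinedRows = [...totals].flatMap(([id, totalSpent]) =>
  demoCustomers
    .filter((c) => c._id === id)
    .map((customerData) => ({ _id: id, totalSpent, customerData }))
);

// 5, 6, 7. $match on state, $sort descending, $limit 3.
const topThree = joinedRows
  .filter((r) => r.customerData.state === "NY")
  .sort((a, b) => b.totalSpent - a.totalSpent)
  .slice(0, 3);

// 8. $project: final output shape.
const report = topThree.map((r) => ({
  totalSpent: r.totalSpent,
  name: r.customerData.name,
  email: r.customerData.email
}));
console.log(report);
// Ada (700) first, then Dee (300); Ben is dropped as a CA customer.
```

Only two customers remain because the pending order, the books order, and the California customer are all filtered out before the limit applies.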
Performance Optimization Tip
If you find yourself relying heavily on $lookup for your core application queries, you may be treating MongoDB like a relational database. Document databases generally favor denormalizing data that is read together: instead of joining collections on every query, store a copy of the customer fields you need directly inside the order document. The trade-off is that those copies have to be updated whenever the customer record changes.
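As a sketch of what that could look like, suppose each order embeds a snapshot of its customer under a customer field. The document shape and the customer field name are assumptions made for illustration. The top-3 report from the complete example then needs no $lookup or $unwind, and the New York filter moves into the first $match.

```javascript
// Hypothetical denormalized order: a snapshot of the customer is embedded.
const denormalizedOrder = {
  category: "electronics",
  status: "delivered",
  amount: 500,
  customer: { _id: "c1", name: "Ada", email: "ada@example.com", state: "NY" }
};

// Same report as the complete example, without a join.
const noLookupPipeline = [
  {
    $match: {
      category: "electronics",
      status: "delivered",
      "customer.state": "NY"
    }
  },
  {
    $group: {
      _id: "$customer._id",
      name: { $first: "$customer.name" },   // take the embedded copy
      email: { $first: "$customer.email" },
      totalSpent: { $sum: "$amount" }
    }
  },
  { $sort: { totalSpent: -1 } },
  { $limit: 3 },
  { $project: { _id: 0, name: 1, email: 1, totalSpent: 1 } }
];

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(noLookupPipeline).toArray());
}
```

Because each order carries its own copy, two orders from the same customer can disagree after an email change; $first simply takes whichever copy the group sees first.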