Mastering the MongoDB Aggregation Pipeline

Sarah Jenkins
Database Administrator
Oct 20, 2025

What You'll Learn

How to process, transform, and analyze data in MongoDB using the Aggregation Pipeline framework, including $match, $group, $project, and $lookup (NoSQL Joins).

What is the Aggregation Pipeline?

While db.collection.find() is great for retrieving documents, it can't compute totals, group data, or join collections. The Aggregation Pipeline solves this. It passes documents through a multi-stage pipeline, where each stage transforms the data and passes it to the next.

Collection (10,000 orders) → $match (filter: status = "completed") → $group (sum by customer_id) → $sort (top 5 highest spenders)
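
The flow above can be sketched as a pipeline definition. This is a minimal sketch, assuming an orders collection whose documents carry status, customer_id, and amount fields (names borrowed from the later examples in this article). The variable name topSpendersPipeline is only illustrative, and the "top 5" cut is written as an explicit $limit stage after $sort.

```javascript
// Each array element is one stage; documents flow through them in order.
const topSpendersPipeline = [
  { $match: { status: "completed" } },                                  // keep completed orders only
  { $group: { _id: "$customer_id", totalSpent: { $sum: "$amount" } } }, // one document per customer
  { $sort: { totalSpent: -1 } },                                        // highest spenders first
  { $limit: 5 }                                                         // keep the top 5
];

// mongosh provides a global `db`; the guard lets this file load in plain Node too.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(topSpendersPipeline).toArray());
}
```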

Core Pipeline Stages

$match

Filters documents. Place $match as early in the pipeline as possible, ideally as the first stage: a leading $match can use indexes, and filtering early reduces the amount of data every later stage has to process.

{ $match: { status: "completed", amount: { $gte: 100 } } }
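
One way to check whether a leading $match is served by an index is to create a matching index and inspect the query plan. The sketch below assumes an orders collection and a compound index on status and amount; both are illustrative choices, not something this article prescribes. The passesMatch helper is a hypothetical plain-JavaScript equivalent of the filter for numeric amounts, included only to make the semantics concrete.

```javascript
const matchStage = { $match: { status: "completed", amount: { $gte: 100 } } };

// Plain-JS equivalent of the filter above (assumes `amount` is a number).
const passesMatch = (order) => order.status === "completed" && order.amount >= 100;

// Only runs inside mongosh, where a global `db` exists.
if (typeof db !== "undefined") {
  // Assumed index: equality field first, range field second.
  db.orders.createIndex({ status: 1, amount: 1 });
  // Look for an IXSCAN stage in the output to confirm the index is used.
  printjson(db.orders.explain("executionStats").aggregate([matchStage]));
}
```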

$group

Groups documents by the expression given in _id and applies accumulator expressions (such as $sum, $avg, and $max) to each group. The output contains one document per distinct _id value.

{
  $group: {
    _id: "$customer_id",               // Group by this field
    totalSpent: { $sum: "$amount" },   // Calculate total
    averageOrder: { $avg: "$amount" }, // Calculate average
    orderCount: { $sum: 1 }            // Count documents
  }
}
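
To make the accumulators concrete, here is a rough plain-JavaScript equivalent of the stage above, run against a few made-up documents. The sample data and the groupByCustomer function are hypothetical; in MongoDB the server does this work, and the order of the resulting groups is not guaranteed unless a $sort follows.

```javascript
// Hypothetical sample documents, as they might look after an earlier $match.
const sampleOrders = [
  { customer_id: "c1", amount: 120 },
  { customer_id: "c2", amount: 300 },
  { customer_id: "c1", amount: 80 }
];

// Plain-JS sketch of what the $group stage computes.
function groupByCustomer(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc.customer_id) ?? { _id: doc.customer_id, totalSpent: 0, orderCount: 0 };
    g.totalSpent += doc.amount; // $sum: "$amount"
    g.orderCount += 1;          // $sum: 1
    groups.set(doc.customer_id, g);
  }
  // $avg: "$amount" is the total divided by the number of orders in the group
  return [...groups.values()].map((g) => ({ ...g, averageOrder: g.totalSpent / g.orderCount }));
}

const grouped = groupByCustomer(sampleOrders);
// c1 -> { totalSpent: 200, orderCount: 2, averageOrder: 100 }
// c2 -> { totalSpent: 300, orderCount: 1, averageOrder: 300 }
```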

$project

Reshapes the document. You can include/exclude fields, or create brand new computed fields.

{
  $project: {
    _id: 0,                            // Hide the _id field
    fullName: { $concat: ["$firstName", " ", "$lastName"] },
    discountApplied: { $gt: ["$discount", 0] } // Returns boolean
  }
}
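
As a rough illustration of what this stage returns, the sketch below mirrors it in plain JavaScript for a single document. The projectDoc function and the sample input are hypothetical, and the sketch assumes firstName and lastName are strings and discount is a number when present. One detail worth knowing: $concat returns null if any operand is null or refers to a missing field.

```javascript
// Plain-JS sketch of the $project stage above, applied to one document.
function projectDoc(doc) {
  const parts = [doc.firstName, " ", doc.lastName];
  return {
    // $concat yields null when any operand is null or missing
    fullName: parts.some((p) => p == null) ? null : parts.join(""),
    // a missing or null discount is never greater than 0
    discountApplied: doc.discount != null && doc.discount > 0
  };
}

const projected = projectDoc({ _id: 7, firstName: "Ada", lastName: "Lovelace", discount: 5 });
// projected is { fullName: "Ada Lovelace", discountApplied: true }; _id is excluded
```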

$lookup (NoSQL Joins)

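Performs a left outer join to another collection in the same database. Every input document is kept, and the matching documents from the target collection are placed in an array field; if nothing matches, that array is empty. This bridges the gap between MongoDB and relational databases!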

{
  $lookup: {
    from: "customers",                 // Target collection to join
    localField: "customer_id",         // Field from the current (orders) collection
    foreignField: "_id",               // Field from the target (customers) collection
    as: "customer_details"             // Array field where results will be placed
  }
}
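
The left outer join behavior can be sketched in plain JavaScript. Everything here (the sample documents and the lookupCustomer function) is hypothetical and only mirrors what the stage above would produce for equality matches on customer_id and _id.

```javascript
const lookupOrders = [
  { _id: 1, customer_id: "c1", amount: 120 },
  { _id: 2, customer_id: "c9", amount: 40 } // no matching customer exists
];
const lookupCustomers = [{ _id: "c1", name: "Ada" }];

// Plain-JS sketch of the $lookup stage: every order is kept,
// and matches (possibly none) land in the `customer_details` array.
function lookupCustomer(orderDocs, customerDocs) {
  return orderDocs.map((order) => ({
    ...order,
    customer_details: customerDocs.filter((c) => c._id === order.customer_id)
  }));
}

const joined = lookupCustomer(lookupOrders, lookupCustomers);
// joined[0].customer_details has one element; joined[1].customer_details is []
```

Note that a following $unwind on customer_details would drop the second order, because its array is empty. Passing preserveNullAndEmptyArrays: true to $unwind keeps such documents.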

A Complete Real-World Example

Let's find the top 3 spending customers from New York who bought electronics, counting only delivered orders, complete with their name and email.

JavaScript (MongoDB Shell)
db.orders.aggregate([
  // 1. Filter orders
  { 
    $match: { 
      category: "electronics", 
      status: "delivered" 
    } 
  },
  
  // 2. Group by customer to calculate total spend
  { 
    $group: { 
      _id: "$customer_id", 
      totalSpent: { $sum: "$amount" } 
    } 
  },
  
  // 3. Join with customers collection to get details
  { 
    $lookup: {
      from: "customers",
      localField: "_id",
      foreignField: "_id",
      as: "customerData"
    } 
  },
  
  // 4. Unwind the array created by $lookup into an object
  { 
    $unwind: "$customerData" 
  },
  
  // 5. Filter for New York customers only
  { 
    $match: { 
      "customerData.state": "NY" 
    } 
  },
  
  // 6. Sort by highest total spent
  { 
    $sort: { totalSpent: -1 } 
  },
  
  // 7. Limit to Top 3
  { 
    $limit: 3 
  },
  
  // 8. Clean up the final output shape
  { 
    $project: {
      _id: 0,
      name: "$customerData.name",
      email: "$customerData.email",
      totalSpent: 1
    } 
  }
]);
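
To trace what each stage contributes, the pipeline above can be mimicked in plain JavaScript on a handful of made-up documents. The data and variable names below are hypothetical; this is a sketch for reasoning about the stages, not a substitute for running the aggregation on the server.

```javascript
const exOrders = [
  { customer_id: "c1", category: "electronics", status: "delivered", amount: 500 },
  { customer_id: "c1", category: "electronics", status: "delivered", amount: 250 },
  { customer_id: "c2", category: "electronics", status: "delivered", amount: 900 },
  { customer_id: "c3", category: "electronics", status: "pending",   amount: 999 },
  { customer_id: "c4", category: "books",       status: "delivered", amount: 60 }
];
const exCustomers = [
  { _id: "c1", name: "Ada",   email: "ada@example.com",   state: "NY" },
  { _id: "c2", name: "Grace", email: "grace@example.com", state: "CA" },
  { _id: "c3", name: "Linus", email: "linus@example.com", state: "NY" }
];

// Stages 1-2: $match, then $group with a running total per customer.
const totals = new Map();
for (const o of exOrders) {
  if (o.category !== "electronics" || o.status !== "delivered") continue;
  totals.set(o.customer_id, (totals.get(o.customer_id) ?? 0) + o.amount);
}

const topSpenders = [...totals]
  // Stages 3-4: $lookup + $unwind (groups without a matching customer are dropped)
  .flatMap(([id, totalSpent]) =>
    exCustomers.filter((c) => c._id === id).map((customerData) => ({ totalSpent, customerData })))
  // Stage 5: keep New York customers
  .filter((d) => d.customerData.state === "NY")
  // Stages 6-7: $sort descending, then $limit 3
  .sort((a, b) => b.totalSpent - a.totalSpent)
  .slice(0, 3)
  // Stage 8: $project the final shape
  .map((d) => ({ totalSpent: d.totalSpent, name: d.customerData.name, email: d.customerData.email }));
// topSpenders is [{ totalSpent: 750, name: "Ada", email: "ada@example.com" }]
```

Grace has the highest total here but is filtered out at stage 5, and Linus never reaches the join because his only order is not delivered.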

Performance Optimization Tip

If your core application queries rely heavily on $lookup, you may be modeling your data as if MongoDB were a relational SQL database. For read-heavy paths, consider denormalizing: instead of joining collections on every query, store a copy of the customer fields you need directly inside each order document! The trade-off is that the copied fields must be kept in sync when the customer record changes.
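
As a sketch of what that could look like, suppose each order embeds a small customer subdocument. The document shape, the field name customer, and the variable names below are assumptions made for illustration. With that shape, the earlier top-spenders query needs no $lookup or $unwind.

```javascript
// Hypothetical denormalized order: the needed customer fields are copied in.
const denormalizedOrder = {
  _id: 1001,
  category: "electronics",
  status: "delivered",
  amount: 500,
  customer: { _id: "c1", name: "Ada", email: "ada@example.com", state: "NY" }
};

// Same question as the complete example, without a join.
const noLookupPipeline = [
  { $match: { category: "electronics", status: "delivered", "customer.state": "NY" } },
  {
    $group: {
      _id: "$customer._id",
      name: { $first: "$customer.name" },   // take the embedded copy from the group's first order
      email: { $first: "$customer.email" },
      totalSpent: { $sum: "$amount" }
    }
  },
  { $sort: { totalSpent: -1 } },
  { $limit: 3 },
  { $project: { _id: 0, name: 1, email: 1, totalSpent: 1 } }
];

// Only runs inside mongosh, where a global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(noLookupPipeline).toArray());
}
```

Because the state filter now sits in the first $match, it is applied before any grouping and can be covered by an index on the orders collection itself.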
