The aggregation framework processes your data through a pipeline of stages that transform it into the format you need. It provides a set of stages that you can chain together, with each stage taking the documents produced by the previous stage as its input.
Here are some of the common aggregation stages:
- $match: Filters documents based on specified criteria.
- $project: Includes, excludes, or computes fields in the output documents.
- $group: Groups documents by a specified key and calculates aggregate values for each group.
- $sort: Sorts documents by one or more fields.
- $limit: Limits the number of documents passed to the next stage.
- $skip: Skips a specified number of documents.
- $unwind: Deconstructs an array field, outputting one document for each element in the array.
- $lookup: Performs a left outer join with another collection, matching on specified fields.
- $redact: Restricts the content of documents based on information stored in the documents themselves.
- $bucket: Groups documents into buckets defined by boundaries on a specified expression.
- $sample: Randomly selects a specified number of documents.
- $geoNear: Returns documents ordered by distance from a specified point.
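To make the chaining concrete, here is a minimal sketch of a pipeline that combines four of the stages listed above to fetch one page of results. The `status` and `created_at` fields, the value `"shipped"`, and the variable names are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical pipeline: the field names `status` and `created_at` are
// assumptions made for this sketch, not part of any real schema.
const pageSize = 10;
const pageNumber = 2; // 1-based page index

const shippedOrdersPage = [
  // Keep only the documents that match the filter. Filtering early means
  // later stages have fewer documents to process.
  { $match: { status: "shipped" } },
  // Newest orders first (-1 means descending).
  { $sort: { created_at: -1 } },
  // Skip the documents that belong to earlier pages.
  { $skip: (pageNumber - 1) * pageSize },
  // Pass on at most one page of documents.
  { $limit: pageSize },
];

// In mongosh, the array would be passed to aggregate():
// db.orders.aggregate(shippedOrdersPage)
```

The order of the stages matters: swapping `$skip` and `$limit` here would cap the input at 10 documents and then skip all of them.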
Example: Calculating Average Order Value
Scenario: Let's assume we have a collection named orders with documents representing individual orders. Each document has fields like order_id, customer_id, product_name, and price. We want to calculate the average order value for each customer.
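db.orders.aggregate([
  {
    // Stage 1: produce one document per customer.
    $group: {
      _id: "$customer_id",
      total_spent: { $sum: "$price" },
      // The $count accumulator requires MongoDB 5.0 or later;
      // { $sum: 1 } is the equivalent on older versions.
      total_orders: { $count: {} }
    }
  },
  {
    // Stage 2: _id is kept by default; the two totals are dropped.
    $project: {
      average_order_value: { $divide: ["$total_spent", "$total_orders"] }
    }
  }
])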
Explanation:
- $group:
  - Groups the documents by customer_id.
  - Calculates the total spent for each customer using $sum.
  - Counts the total number of orders for each customer using $count.
- $project:
  - Calculates the average order value by dividing total_spent by total_orders.
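The two stages can be mirrored in plain JavaScript to show what each one contributes. This is only an in-memory sketch of the logic, not how MongoDB executes the pipeline, and the sample documents are invented for illustration.

```javascript
// Hypothetical sample documents; the values are invented for this sketch.
const orders = [
  { order_id: 1, customer_id: "c1", product_name: "pen", price: 10 },
  { order_id: 2, customer_id: "c1", product_name: "book", price: 30 },
  { order_id: 3, customer_id: "c2", product_name: "lamp", price: 50 },
];

// Step 1 (mirrors $group): accumulate total_spent and total_orders
// for each distinct customer_id.
const groups = new Map();
for (const { customer_id, price } of orders) {
  const group =
    groups.get(customer_id) ??
    { _id: customer_id, total_spent: 0, total_orders: 0 };
  group.total_spent += price;
  group.total_orders += 1;
  groups.set(customer_id, group);
}

// Step 2 (mirrors $project): keep _id and derive average_order_value,
// dropping the intermediate totals.
const results = [...groups.values()].map(
  ({ _id, total_spent, total_orders }) => ({
    _id,
    average_order_value: total_spent / total_orders,
  })
);

// c1 averages 20 (40 / 2); c2 averages 50 (50 / 1).
console.log(results);
```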
Result: The aggregation pipeline returns one document per customer. Each document contains _id, which holds the customer_id value used for grouping, and the calculated average_order_value.
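As an alternative sketch, the `$avg` accumulator can compute the mean inside `$group`, which makes the separate `$project` stage unnecessary. The variable name below is hypothetical, and this is one possible shortening rather than the only way to write the pipeline.

```javascript
// Alternative sketch: let $group compute the mean directly with $avg.
const averageOrderValuePipeline = [
  {
    $group: {
      _id: "$customer_id",
      average_order_value: { $avg: "$price" },
    },
  },
];

// In mongosh, the array would be passed to aggregate():
// db.orders.aggregate(averageOrderValuePipeline)
```

One difference to keep in mind: `$avg` ignores documents whose price is missing or non-numeric, whereas dividing a sum by a document count still counts them.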