```python
from collections import defaultdict


def group_by(rows, key):
    # defaultdict(list) creates the empty list for a key on first access.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)  # hand back a plain dict


users = [
    {"name": "Ada", "team": "core"},
    {"name": "Linus", "team": "kernel"},
    {"name": "Grace", "team": "core"},
]

for team, members in group_by(users, "team").items():
    names = ", ".join(m["name"] for m in members)
    print(f"{team}: {names}")
```

Output:

```
core: Ada, Grace
kernel: Linus
```
How it works
`defaultdict(list)` creates an empty list automatically the first time a key is seen, so you can append without an `if key in groups` guard on every row. Wrapping the result in `dict(...)` at the end hands back an ordinary dictionary, which prints cleanly and serialises to JSON without surprises.
This is the standard-library answer to what pandas `groupby` does in one line. For a few thousand rows it's fast, dependency-free, and easy to read. Each group's list is built by appending, so the rows inside a group stay in the order they appeared, and because regular dicts preserve insertion order (guaranteed since Python 3.7), the groups themselves come out in first-seen order, which is often what you want for display.
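To make the points above concrete, here is a minimal sketch that puts the explicit-guard version next to the `defaultdict` version and checks the ordering. The function names `group_with_guard` and `group_with_defaultdict` are illustrative, not part of any library.

```python
from collections import defaultdict


def group_with_guard(rows, key):
    # Plain-dict version: the membership check is the guard defaultdict removes.
    groups = {}
    for row in rows:
        if row[key] not in groups:
            groups[row[key]] = []
        groups[row[key]].append(row)
    return groups


def group_with_defaultdict(rows, key):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


sample = [
    {"name": "Ada", "team": "core"},
    {"name": "Linus", "team": "kernel"},
    {"name": "Grace", "team": "core"},
]

guarded = group_with_guard(sample, "team")
automatic = group_with_defaultdict(sample, "team")

print(guarded == automatic)                    # True
print(list(automatic))                         # ['core', 'kernel']
print([m["name"] for m in automatic["core"]])  # ['Ada', 'Grace']
```

Both versions produce the same dictionary; the `defaultdict` one simply has less bookkeeping in the loop.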
The pattern generalises well. Swap `row[key]` for any expression: `row["date"][:7]` to bucket by month (assuming ISO `YYYY-MM-DD` date strings), `len(row["name"])` to bucket by length, or a tuple `(row["team"], row["role"])` to group by two fields at once. The shape of the code stays identical; only the key changes.
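One way to express that generalisation is to pass a callable instead of a field name. The sketch below assumes a hypothetical helper `group_by_fn` and a made-up `joined` field holding ISO date strings; neither appears in the main snippet.

```python
from collections import defaultdict


def group_by_fn(rows, key_fn):
    # key_fn is any callable that maps a row to a hashable key.
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return dict(groups)


# Illustrative data: the "joined" dates are invented for this example.
people = [
    {"name": "Ada", "team": "core", "joined": "2024-01-15"},
    {"name": "Linus", "team": "kernel", "joined": "2024-02-03"},
    {"name": "Grace", "team": "core", "joined": "2024-01-28"},
]

by_month = group_by_fn(people, lambda row: row["joined"][:7])
by_length = group_by_fn(people, lambda row: len(row["name"]))

print({month: [p["name"] for p in rows] for month, rows in by_month.items()})
# {'2024-01': ['Ada', 'Grace'], '2024-02': ['Linus']}
print({n: [p["name"] for p in rows] for n, rows in by_length.items()})
# {3: ['Ada'], 5: ['Linus', 'Grace']}
```

Taking a callable keeps the loop untouched while letting the caller decide what counts as "the same group".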
Reach for pandas only when you also need real aggregation, joins, or columnar math across the groups. Pulling in a heavyweight dependency just to bucket a list is overkill, and the defaultdict version is easier for the next reader to follow.
Variations
Group and aggregate in one pass
Often you don't want the rows themselves, just a count or a sum per group. Accumulate directly instead of collecting lists.
```python
from collections import defaultdict


def count_by(rows, key):
    counts = defaultdict(int)  # missing keys start at 0
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)


# Reuses the users list from the main snippet.
print(count_by(users, "team"))  # {'core': 2, 'kernel': 1}
```

itertools.groupby (only on sorted input)
The standard library's `itertools.groupby` groups *consecutive* equal keys, so the data must be sorted by that key first. Forget the sort and the same key splits into multiple groups.
```python
from itertools import groupby
from operator import itemgetter

# Sort by the grouping key first so equal keys sit next to each other.
rows = sorted(users, key=itemgetter("team"))
for team, members in groupby(rows, key=itemgetter("team")):
    print(team, [m["name"] for m in members])
```

Bucketing orders by status
A common reporting task: take a flat list of orders from a query and split them by status so each bucket can be rendered or totalled separately.
```python
orders = [
    {"id": 1, "status": "paid", "total": 40},
    {"id": 2, "status": "pending", "total": 15},
    {"id": 3, "status": "paid", "total": 90},
]

# Uses group_by from the main snippet.
by_status = group_by(orders, "status")
paid_total = sum(o["total"] for o in by_status["paid"])
print(paid_total)  # 130
```

Common mistakes & good to know
- Don't confuse this with `itertools.groupby`, which only groups consecutive equal keys: it requires sorting by the key first or it splits the same key apart.
- `row[key]` raises `KeyError` if a record is missing the field. Use `row.get(key)` when some rows may not have it, and decide what the `None` bucket means.
- The grouping key must be hashable. Strings and numbers are fine; a list or dict as the key raises `TypeError`.
- If you only need counts or sums, don't collect full lists and re-iterate. Accumulate in one pass as in the aggregation variant.
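For the last point, summing works the same way as counting. This is a minimal sketch; `sum_by` and its `field` parameter are illustrative names, and the sample data mirrors the orders example.

```python
from collections import defaultdict


def sum_by(rows, key, field):
    totals = defaultdict(int)  # missing keys start at 0
    for row in rows:
        totals[row[key]] += row[field]
    return dict(totals)


sample_orders = [
    {"id": 1, "status": "paid", "total": 40},
    {"id": 2, "status": "pending", "total": 15},
    {"id": 3, "status": "paid", "total": 90},
]

print(sum_by(sample_orders, "status", "total"))  # {'paid': 130, 'pending': 15}
```

The rows are visited once and no intermediate lists are kept, which matters more as the input grows.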
Frequently asked questions
Should I use pandas for this?
Only if you also need aggregation, joins, or columnar operations across groups. For plain bucketing, defaultdict is faster to write, has no dependency, and is easier to read.
Why does itertools.groupby give me duplicate groups?
Because it only groups consecutive equal keys. Sort the input by the same key first, or use the defaultdict approach, which doesn't care about order.
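A small sketch of the difference, using sample data in the same shape as the main snippet, where the two `core` rows are not adjacent:

```python
from itertools import groupby
from operator import itemgetter

sample = [
    {"name": "Ada", "team": "core"},
    {"name": "Linus", "team": "kernel"},
    {"name": "Grace", "team": "core"},
]
team_of = itemgetter("team")

# Without sorting, "core" appears twice because its rows are not consecutive.
unsorted_keys = [team for team, _ in groupby(sample, key=team_of)]

# Sorting by the same key first puts equal keys next to each other.
sorted_keys = [team for team, _ in groupby(sorted(sample, key=team_of), key=team_of)]

print(unsorted_keys)  # ['core', 'kernel', 'core']
print(sorted_keys)    # ['core', 'kernel']
```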
How do I handle rows missing the key?
Use `row.get(key)` instead of `row[key]` to avoid `KeyError`, and decide whether missing values go into a `None` bucket or get skipped.
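Both choices can be sketched in one function. `group_by_optional` and its `skip_missing` flag are hypothetical names used only for illustration.

```python
from collections import defaultdict


def group_by_optional(rows, key, skip_missing=False):
    groups = defaultdict(list)
    for row in rows:
        value = row.get(key)  # None when the field is absent (or explicitly None)
        if value is None and skip_missing:
            continue
        groups[value].append(row)
    return dict(groups)


sample = [
    {"name": "Ada", "team": "core"},
    {"name": "Alan"},  # no "team" field
    {"name": "Grace", "team": "core"},
]

kept = group_by_optional(sample, "team")
skipped = group_by_optional(sample, "team", skip_missing=True)

print(list(kept))     # ['core', None]
print(list(skipped))  # ['core']
```

Note that this sketch cannot tell a missing field from one explicitly set to `None`; if that distinction matters, check `key in row` instead.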
Can I group by more than one field?
Yes: use a tuple as the key, e.g. `(row['team'], row['role'])`. Tuples are hashable as long as their elements are, so they work as dict keys directly.
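A short sketch of a two-field grouping; the `role` values and the extra row are invented for the example.

```python
from collections import defaultdict

# Illustrative data: the "role" field does not appear in the main snippet.
staff = [
    {"name": "Ada", "team": "core", "role": "dev"},
    {"name": "Linus", "team": "kernel", "role": "dev"},
    {"name": "Grace", "team": "core", "role": "dev"},
    {"name": "Margaret", "team": "core", "role": "lead"},
]

groups = defaultdict(list)
for row in staff:
    # The (team, role) tuple is the composite key.
    groups[(row["team"], row["role"])].append(row["name"])
by_team_role = dict(groups)

print(by_team_role)
# {('core', 'dev'): ['Ada', 'Grace'], ('kernel', 'dev'): ['Linus'], ('core', 'lead'): ['Margaret']}
```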