Group a list of dictionaries by key in Python

Turn a flat list of records into buckets keyed by one of their fields, using only the standard library.

Python
from collections import defaultdict


def group_by(rows, key):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


users = [
    {"name": "Ada", "team": "core"},
    {"name": "Linus", "team": "kernel"},
    {"name": "Grace", "team": "core"},
]

for team, members in group_by(users, "team").items():
    names = ", ".join(m["name"] for m in members)
    print(f"{team}: {names}")
Output
core: Ada, Grace
kernel: Linus

How it works

`defaultdict(list)` creates an empty list automatically the first time a key is seen, so you can append without an `if key in groups` guard on every row. Wrapping the result in `dict(...)` at the end hands back an ordinary dictionary, which prints cleanly, serialises to JSON without surprises, and raises `KeyError` for a missing group instead of silently creating an empty one.
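
For comparison, here is a minimal sketch of the same grouping written with a plain dict and the explicit guard that `defaultdict` makes unnecessary. The name `group_by_plain` is hypothetical and only used for illustration.

```python
def group_by_plain(rows, key):
    """Same result as group_by, but with the membership check spelled out."""
    groups = {}
    for row in rows:
        value = row[key]
        # This is the guard that defaultdict(list) performs for you.
        if value not in groups:
            groups[value] = []
        groups[value].append(row)
    return groups


users = [
    {"name": "Ada", "team": "core"},
    {"name": "Linus", "team": "kernel"},
    {"name": "Grace", "team": "core"},
]

plain_result = group_by_plain(users, "team")
print(list(plain_result))  # ['core', 'kernel']
```

Both versions produce the same dictionary; the `defaultdict` one simply has less to read inside the loop.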

This is the standard-library answer to what pandas `groupby` does in one line. For a few thousand rows it's fast, dependency-free, and easy to read. Rows inside each group stay in the order they appeared because each one is appended to a list, and because regular dicts preserve insertion order in Python 3.7 and later, the groups themselves come out in the order their keys were first seen, which is often what you want for display.

The pattern generalises well. Replace `row[key]` with any expression: `row["date"][:7]` to bucket by month (assuming ISO `YYYY-MM-DD` date strings), `len(row["name"])` to bucket by length, or a tuple `(row["team"], row["role"])` to group by two fields at once. The shape of the code stays identical; only the key changes.
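
One way to make the key swappable is to pass a function instead of a field name. The sketch below assumes a hypothetical helper called `group_by_fn` and made-up `events` data with ISO-formatted date strings; neither comes from the original snippet.

```python
from collections import defaultdict


def group_by_fn(rows, key_fn):
    """Group rows by whatever key_fn(row) returns (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return dict(groups)


# Illustrative data, assumed to use YYYY-MM-DD date strings.
events = [
    {"name": "deploy", "date": "2024-01-15"},
    {"name": "rollback", "date": "2024-01-20"},
    {"name": "release", "date": "2024-02-03"},
]

# Bucket by month: the first seven characters of the date string.
by_month = group_by_fn(events, lambda row: row["date"][:7])
# Bucket by the length of the name.
by_length = group_by_fn(events, lambda row: len(row["name"]))

print(sorted(by_month))   # ['2024-01', '2024-02']
print(sorted(by_length))  # [6, 7, 8]
```

The loop body is unchanged from the field-name version; only the expression that computes the bucket differs.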

Reach for pandas only when you also need real aggregation, joins, or columnar math across the groups. Pulling in a heavyweight dependency just to bucket a list is overkill, and the defaultdict version is easier for the next reader to follow.

Variations

Group and aggregate in one pass

Often you don't want the rows themselves, just a count or a sum per group. Accumulate directly instead of collecting lists.

Python
from collections import defaultdict

def count_by(rows, key):
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)

print(count_by(users, "team"))   # {'core': 2, 'kernel': 1}
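
Summing works the same way as counting. The following is a sketch under the assumption that the field being summed is numeric and present on every row; `sum_by` is a hypothetical name, and the sample rows have the same shape as the orders example in this snippet.

```python
from collections import defaultdict


def sum_by(rows, key, field):
    """Total a numeric field per group in a single pass (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[field]
    return dict(totals)


orders = [
    {"id": 1, "status": "paid", "total": 40},
    {"id": 2, "status": "pending", "total": 15},
    {"id": 3, "status": "paid", "total": 90},
]

totals_by_status = sum_by(orders, "status", "total")
print(totals_by_status)   # {'paid': 130, 'pending': 15}
```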

itertools.groupby (only on sorted input)

The standard library's groupby groups *consecutive* equal keys, so it requires the data to be sorted by that key first. Forget the sort and the same key splits into multiple groups.

Python
from itertools import groupby
from operator import itemgetter

rows = sorted(users, key=itemgetter("team"))
for team, members in groupby(rows, key=itemgetter("team")):
    print(team, [m["name"] for m in members])
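
To see the failure mode described above, the sketch below runs `groupby` on the same `users` list twice: once as-is and once sorted. It is only a demonstration of the consecutive-keys behaviour, not a recommended way to group.

```python
from itertools import groupby
from operator import itemgetter

users = [
    {"name": "Ada", "team": "core"},
    {"name": "Linus", "team": "kernel"},
    {"name": "Grace", "team": "core"},
]

# Without sorting: "core" appears twice because its rows are not adjacent.
unsorted_groups = [
    (team, [m["name"] for m in members])
    for team, members in groupby(users, key=itemgetter("team"))
]
print(unsorted_groups)
# [('core', ['Ada']), ('kernel', ['Linus']), ('core', ['Grace'])]

# With sorting: each key forms exactly one group.
sorted_groups = [
    (team, [m["name"] for m in members])
    for team, members in groupby(
        sorted(users, key=itemgetter("team")), key=itemgetter("team")
    )
]
print(sorted_groups)
# [('core', ['Ada', 'Grace']), ('kernel', ['Linus'])]
```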

Bucketing orders by status

A common reporting task: take a flat list of orders from a query and split them by status so each bucket can be rendered or totalled separately.

Python
orders = [
    {"id": 1, "status": "paid", "total": 40},
    {"id": 2, "status": "pending", "total": 15},
    {"id": 3, "status": "paid", "total": 90},
]

by_status = group_by(orders, "status")
paid_total = sum(o["total"] for o in by_status["paid"])
print(paid_total)   # 130

Common mistakes & good to know

  • Don't confuse this with itertools.groupby, which only groups consecutive equal keys — it requires sorting by the key first or it splits the same key apart.
  • row[key] raises KeyError if a record is missing the field. Use row.get(key) when some rows may not have it, and decide what the None bucket means.
  • The grouping key must be hashable. Strings and numbers are fine; a list or dict as the key raises TypeError.
  • If you only need counts or sums, don't collect full lists and re-iterate — accumulate in one pass as in the aggregation variant.
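
The hashability point can be illustrated with a short sketch. The `posts` data and its `tags` field are made up for this example; converting the list to a tuple is one possible workaround, assuming the order of the items matters for grouping.

```python
from collections import defaultdict

# Illustrative rows whose "tags" field is a list, which is not hashable.
posts = [
    {"title": "Intro", "tags": ["python", "basics"]},
    {"title": "Dicts", "tags": ["python", "basics"]},
    {"title": "Async", "tags": ["python", "advanced"]},
]

# Using the list directly as a dict key fails.
raised_type_error = False
try:
    defaultdict(list)[posts[0]["tags"]].append(posts[0])
except TypeError:
    raised_type_error = True

# Converting the list to a tuple gives a hashable key.
by_tags = defaultdict(list)
for post in posts:
    by_tags[tuple(post["tags"])].append(post["title"])
by_tags = dict(by_tags)

print(raised_type_error)                # True
print(by_tags[("python", "basics")])    # ['Intro', 'Dicts']
```

If the order of the items should not matter, a `frozenset` could be used in place of the tuple.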

Frequently asked questions

Should I use pandas for this?

Only if you also need aggregation, joins, or columnar operations across groups. For plain bucketing, defaultdict is faster to write, has no dependency, and is easier to read.

Why does itertools.groupby give me duplicate groups?

Because it only groups consecutive equal keys. Sort the input by the same key first, or use the defaultdict approach, which doesn't care about order.

How do I handle rows missing the key?

Use row.get(key) instead of row[key] to avoid KeyError, and decide whether missing values go into a None bucket or get skipped.
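
A minimal sketch of both choices follows. The helper name `group_by_optional` and its `skip_missing` flag are hypothetical, and the sample rows are made up; note that a row whose field is present but set to `None` is treated the same as a row without the field.

```python
from collections import defaultdict


def group_by_optional(rows, key, skip_missing=False):
    """Group rows by key, tolerating rows that lack it (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(key)  # None when the field is absent
        if value is None and skip_missing:
            continue  # drop rows without a usable key
        groups[value].append(row)
    return dict(groups)


people = [
    {"name": "Ada", "team": "core"},
    {"name": "Linus"},  # no "team" field
    {"name": "Grace", "team": "core"},
]

with_none_bucket = group_by_optional(people, "team")
without_missing = group_by_optional(people, "team", skip_missing=True)

print(list(with_none_bucket))  # ['core', None]
print(list(without_missing))   # ['core']
```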

Can I group by more than one field?

Yes — use a tuple as the key, e.g. (row['team'], row['role']). Tuples are hashable, so they work as dict keys directly.
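
As a sketch, assuming each row carries both a `team` and a `role` field (the `staff` data below is made up for illustration):

```python
from collections import defaultdict

staff = [
    {"name": "Ada", "team": "core", "role": "lead"},
    {"name": "Grace", "team": "core", "role": "dev"},
    {"name": "Linus", "team": "kernel", "role": "lead"},
    {"name": "Alan", "team": "core", "role": "dev"},
]

by_team_role = defaultdict(list)
for row in staff:
    # The tuple (team, role) is the composite grouping key.
    by_team_role[(row["team"], row["role"])].append(row["name"])
by_team_role = dict(by_team_role)

print(by_team_role[("core", "dev")])   # ['Grace', 'Alan']
print(len(by_team_role))               # 3
```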
