Chapter 11: Comprehensions and Generators¶
1. Overview¶
Python gives you concise, readable syntax for building collections and processing sequences without writing explicit loops every time. These tools — comprehensions and generators — are among the most distinctively Pythonic features of the language.
Comprehensions let you build lists, sets, and dicts in a single expression. Generators let you produce values one at a time, on demand, without building the entire sequence in memory first.
Used well, these features make code shorter and often faster. Used carelessly, they produce unreadable one-liners. This chapter shows you both how to use them and when to reach for a plain loop instead.
2. What You Will Learn¶
- List comprehensions: syntax, filtering, and nesting
- Set comprehensions
- Dictionary comprehensions
- Generator expressions: lazy evaluation and memory efficiency
- Generator functions with yield
- The difference between a generator and a list
- yield from for delegating to sub-generators
- When to use comprehensions vs. loops vs. generators
- Common built-in functions that work with iterables: any(), all(), sum(), min(), max(), enumerate(), zip()
3. Core Concepts¶
3.1 List Comprehensions¶
A list comprehension builds a new list by applying an expression to each item in an iterable.
# Basic form
[expression for item in iterable]
# With a filter
[expression for item in iterable if condition]
Compare the loop version with the comprehension version:
# Loop
squares = []
for n in range(1, 6):
    squares.append(n ** 2)
# Comprehension — same result, one line
squares = [n ** 2 for n in range(1, 6)]
print(squares) # [1, 4, 9, 16, 25]
Filtering with if¶
Add an if clause to include only items that meet a condition.
# Only even numbers
evens = [n for n in range(1, 11) if n % 2 == 0]
print(evens) # [2, 4, 6, 8, 10]
# Words longer than 4 characters
words = ["apple", "fig", "banana", "kiwi", "cherry"]
long_words = [w for w in words if len(w) > 4]
print(long_words) # ['apple', 'banana', 'cherry']
Transforming items¶
The expression can be any valid Python expression.
names = ["alice", "bob", "charlie"]
upper = [name.upper() for name in names]
print(upper) # ['ALICE', 'BOB', 'CHARLIE']
# Extract a field from a list of dicts
students = [
    {"name": "Alice", "grade": 88},
    {"name": "Bob", "grade": 92},
]
grades = [s["grade"] for s in students]
print(grades) # [88, 92]
Nested comprehensions¶
You can nest for clauses to iterate over multiple iterables.
# All pairs (x, y) where x != y
pairs = [(x, y) for x in range(1, 4) for y in range(1, 4) if x != y]
print(pairs)
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
This is equivalent to nested loops:
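A minimal sketch of those nested loops, reusing the same ranges and the same condition as the comprehension above:

```python
# Loop form of the pairs comprehension: outer loop over x, inner loop over y
pairs = []
for x in range(1, 4):
    for y in range(1, 4):
        if x != y:
            pairs.append((x, y))

print(pairs)
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```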
Flattening a matrix is a common use case:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [value for row in matrix for value in row]
print(flat) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
Conditional expression in the output¶
You can use a ternary expression in the output part (not the filter part) to transform items differently based on a condition.
# Replace negatives with 0
numbers = [4, -1, 7, -3, 2, -5]
clamped = [n if n >= 0 else 0 for n in numbers]
print(clamped) # [4, 0, 7, 0, 2, 0]
The structure is:
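As a sketch, the whole conditional expression sits before the for clause. The placeholder names in the first comment (value_if_true, value_if_false, condition) are illustrative only, and the labels list is an assumed example rather than part of the original chapter.

```python
# General shape (placeholders, not runnable as written):
#   [value_if_true if condition else value_if_false for item in iterable]

numbers = [4, -1, 7, -3, 2, -5]

# Every item produces an output; the condition only chooses which one.
labels = ["non-negative" if n >= 0 else "negative" for n in numbers]
print(labels)
# ['non-negative', 'negative', 'non-negative', 'negative', 'non-negative', 'negative']

# Unlike a filter, this form never drops items.
print(len(labels) == len(numbers))  # True
```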
3.2 Set Comprehensions¶
A set comprehension builds a set using the same syntax as a list comprehension, but with curly braces.
# Unique lengths of words
words = ["apple", "fig", "banana", "kiwi", "cherry", "plum"]
lengths = {len(w) for w in words}
print(lengths) # {3, 4, 5, 6} — order not guaranteed
Duplicates are automatically removed, just like a regular set.
numbers = [1, 2, 2, 3, 3, 3, 4]
unique_squares = {n ** 2 for n in numbers}
print(unique_squares) # {1, 4, 9, 16}
3.3 Dictionary Comprehensions¶
A dict comprehension builds a dictionary from an iterable.
# Map each word to its length
words = ["apple", "fig", "banana"]
lengths = {word: len(word) for word in words}
print(lengths) # {'apple': 5, 'fig': 3, 'banana': 6}
# Squares dict
squares = {n: n ** 2 for n in range(1, 6)}
print(squares) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Inverting a dictionary¶
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted) # {1: 'a', 2: 'b', 3: 'c'}
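One caveat is worth a short sketch: inversion only round-trips cleanly when the values are unique. In the assumed example below, two keys share a value, so the later key silently wins. The grouped variable is a hypothetical alternative that keeps every key.

```python
# Two keys share the value 1, so they compete for the same key after inversion
original = {"a": 1, "b": 2, "c": 1}

inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'c', 2: 'b'}

# If every key must survive, collect them into lists instead
grouped = {}
for k, v in original.items():
    grouped.setdefault(v, []).append(k)
print(grouped)  # {1: ['a', 'c'], 2: ['b']}
```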
Filtering a dictionary¶
scores = {"Alice": 88, "Bob": 55, "Charlie": 92, "Diana": 47}
passing = {name: score for name, score in scores.items() if score >= 60}
print(passing) # {'Alice': 88, 'Charlie': 92}
3.4 Generator Expressions¶
A generator expression looks like a list comprehension but uses parentheses instead of square brackets. It does not build a list in memory — it produces values one at a time, on demand.
# List comprehension: builds the whole list immediately
squares_list = [n ** 2 for n in range(1_000_000)] # list object alone is ~8 MB on 64-bit CPython, not counting the ints it holds
# Generator expression: produces values lazily
squares_gen = (n ** 2 for n in range(1_000_000)) # a few hundred bytes at most, regardless of the range size
You iterate over a generator the same way you iterate over a list.
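A small sketch of that, using an assumed five-item generator so the output stays short:

```python
squares_gen = (n ** 2 for n in range(5))

# A for loop pulls one value at a time from the generator
for square in squares_gen:
    print(square, end=" ")
# 0 1 4 9 16
```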
But a generator can only be iterated once. After it is exhausted, it produces no more values.
gen = (n ** 2 for n in range(3))
print(list(gen)) # [0, 1, 4]
print(list(gen)) # [] — already exhausted
Passing a generator to a function¶
When a generator expression is the only argument to a function, you can omit the extra parentheses.
total = sum(n ** 2 for n in range(1, 6))
print(total) # 55
largest = max(len(w) for w in ["apple", "fig", "banana"])
print(largest) # 6
any_negative = any(n < 0 for n in [1, 2, -3, 4])
print(any_negative) # True
This is more memory-efficient than building a list first.
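To make the difference concrete, here is a rough sketch using sys.getsizeof from the standard library. It measures only the container object itself (not the integers a list refers to), and exact byte counts vary by Python version and platform, so the example compares sizes rather than quoting them. The names as_list and as_gen are illustrative.

```python
import sys

# Same computation, two container choices
as_list = [n ** 2 for n in range(100_000)]
as_gen = (n ** 2 for n in range(100_000))

list_bytes = sys.getsizeof(as_list)  # size of the list object, excluding the ints inside
gen_bytes = sys.getsizeof(as_gen)    # small generator object, independent of the range size

print(gen_bytes < list_bytes)       # True
print(sum(as_list) == sum(as_gen))  # True: both produce the same values
```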
3.5 Generator Functions¶
A generator function uses yield instead of return. Each time the
caller asks for the next value, the function resumes from where it left off.
def count_up(start: int, stop: int):
    """Yield integers from start to stop (inclusive)."""
    current = start
    while current <= stop:
        yield current
        current += 1

for n in count_up(1, 5):
    print(n, end=" ")
# 1 2 3 4 5
The function body does not run when you call count_up(1, 5). It runs
incrementally each time the caller requests the next value.
How yield works¶
- The caller calls the generator function. Python returns a generator object without running any code.
- The caller calls next() on the generator (or iterates with for).
- The function runs until it hits yield, returns that value, and pauses.
- The next call to next() resumes from the line after yield.
- When the function returns (or falls off the end), StopIteration is raised.
def simple():
    print("before first yield")
    yield 1
    print("before second yield")
    yield 2
    print("after last yield")

gen = simple()
print(next(gen)) # before first yield → 1
print(next(gen)) # before second yield → 2
# next(gen) # after last yield → StopIteration
Infinite generators¶
Generators can produce an infinite sequence because they never build the whole thing in memory.
def integers_from(n: int):
    """Yield integers starting from n, forever."""
    while True:
        yield n
        n += 1

gen = integers_from(1)
for _ in range(5):
    print(next(gen), end=" ")
# 1 2 3 4 5
Generating Fibonacci numbers¶
def fibonacci():
    """Yield Fibonacci numbers indefinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib = fibonacci()
first_ten = [next(fib) for _ in range(10)]
print(first_ten) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
3.6 yield from¶
yield from delegates to another iterable or generator, yielding each of its
values in turn.
def chain(*iterables):
    """Yield all items from each iterable in sequence."""
    for iterable in iterables:
        yield from iterable

for item in chain([1, 2], [3, 4], [5]):
    print(item, end=" ")
# 1 2 3 4 5
Without yield from, you would need an inner loop:
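A sketch of that longer form follows. The name chain_with_loop is hypothetical, chosen only to keep it distinct from the chain function above.

```python
def chain_with_loop(*iterables):
    """Same behavior as chain, written with an explicit inner loop."""
    for iterable in iterables:
        for item in iterable:
            yield item

print(list(chain_with_loop([1, 2], [3, 4], [5])))
# [1, 2, 3, 4, 5]
```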
yield from is also useful for recursive generators:
def flatten(nested):
    """Recursively yield all non-list items from a nested structure."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], [5, 6]])))
# [1, 2, 3, 4, 5, 6]
3.7 Generators vs. Lists¶
| | List | Generator |
|---|---|---|
| Memory | Stores all values | Stores only current state |
| Speed to create | Computes every value up front | Immediate (no work done yet) |
| Iteration | Multiple times | Once only |
| Indexing | Yes (items[2]) | No |
| len() | Yes | No |
| Best for | Small data, multiple passes | Large/infinite data, single pass |
Use a list when:
- You need to iterate over the data more than once.
- You need random access by index.
- You need len().
- The data is small enough to fit comfortably in memory.
Use a generator when:
- The data is large or infinite.
- You only need one pass.
- You want to pipeline transformations without intermediate lists.
3.8 Useful Built-in Functions for Iterables¶
These functions work with any iterable — lists, generators, sets, dicts, etc.
any() and all()¶
numbers = [2, 4, 6, 7, 8]
print(any(n % 2 != 0 for n in numbers)) # True — at least one odd
print(all(n > 0 for n in numbers)) # True — all positive
print(all(n % 2 == 0 for n in numbers)) # False — not all even
any() short-circuits on the first True. all() short-circuits on the
first False. This makes them efficient with generators.
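The short-circuiting can be observed directly. In this sketch, tracked is a hypothetical helper generator that records each value it hands out, so the consumed list shows how far any() and all() actually read.

```python
consumed = []

def tracked(numbers):
    """Yield each number, recording it in `consumed` as it is handed out."""
    for n in numbers:
        consumed.append(n)
        yield n

# any() stops at the first odd number (7) and never asks for 8 or 10
found_odd = any(n % 2 != 0 for n in tracked([2, 4, 7, 8, 10]))
print(found_odd)  # True
print(consumed)   # [2, 4, 7]

# all() stops at the first value that fails the test (again 7)
consumed.clear()
all_even = all(n % 2 == 0 for n in tracked([2, 4, 7, 8, 10]))
print(all_even)   # False
print(consumed)   # [2, 4, 7]
```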
sum(), min(), max()¶
data = [3, 1, 4, 1, 5, 9, 2, 6]
print(sum(data)) # 31
print(min(data)) # 1
print(max(data)) # 9
# With a key function
words = ["banana", "apple", "fig"]
print(min(words, key=len)) # fig
print(max(words, key=len)) # banana
enumerate() and zip()¶
These were covered in Chapter 8 but are worth repeating here because they return iterators (not lists) and work naturally with comprehensions.
names = ["Alice", "Bob", "Charlie"]
scores = [88, 92, 75]
# Enumerate
indexed = [(i, name) for i, name in enumerate(names, start=1)]
print(indexed) # [(1, 'Alice'), (2, 'Bob'), (3, 'Charlie')]
# Zip
paired = {name: score for name, score in zip(names, scores)}
print(paired) # {'Alice': 88, 'Bob': 92, 'Charlie': 75}
4. Practical Examples¶
4.1 Cleaning a Dataset¶
raw_data = [" Alice ", "BOB", "", " charlie", None, "Diana "]
cleaned = [
    name.strip().title()
    for name in raw_data
    if name and name.strip()
]
print(cleaned) # ['Alice', 'Bob', 'Charlie', 'Diana']
4.2 Building a Lookup Table¶
# Map each character to its ASCII code
ascii_table = {char: ord(char) for char in "abcdefghijklmnopqrstuvwxyz"}
print(ascii_table["a"]) # 97
print(ascii_table["z"]) # 122
4.3 Transposing a Matrix¶
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
for row in transposed:
    print(row)
# [1, 4, 7]
# [2, 5, 8]
# [3, 6, 9]
Using zip(*matrix) is more idiomatic:
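A minimal sketch of the zip-based version. Note that zip yields tuples, so each column is converted with list() here to match the list-of-lists result above.

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# zip(*matrix) groups the i-th items of every row together
transposed = [list(column) for column in zip(*matrix)]
print(transposed)
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```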
4.4 Lazy File Processing¶
Reading a large file line by line with a generator avoids loading the whole file into memory.
def read_non_empty_lines(path: str):
    """Yield non-empty, stripped lines from a file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped:
                yield stripped

# Usage:
# for line in read_non_empty_lines("data.txt"):
#     process(line)
4.5 Generating a Multiplication Table¶
size = 5
table = {(i, j): i * j for i in range(1, size + 1) for j in range(1, size + 1)}
# Print it
for i in range(1, size + 1):
    row = [f"{table[(i, j)]:3}" for j in range(1, size + 1)]
    print(" ".join(row))
#   1   2   3   4   5
#   2   4   6   8  10
#   3   6   9  12  15
#   4   8  12  16  20
#   5  10  15  20  25
4.6 Infinite Counter with itertools.islice¶
from itertools import islice
def naturals():
    """Yield 1, 2, 3, ... forever."""
    n = 1
    while True:
        yield n
        n += 1

# Take the first 10 natural numbers
first_ten = list(islice(naturals(), 10))
print(first_ten) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Sum of first 100 squares
total = sum(n ** 2 for n in islice(naturals(), 100))
print(total) # 338350
4.7 Pipeline of Generators¶
Generators can be chained into a pipeline where each stage processes values from the previous one, all lazily.
def read_numbers(data: list[str]):
    """Yield each string from the input."""
    yield from data

def parse_ints(strings):
    """Yield integers, skipping non-numeric strings."""
    for s in strings:
        if s.strip().lstrip("-").isdigit():
            yield int(s.strip())

def only_positive(numbers):
    """Yield only positive numbers."""
    for n in numbers:
        if n > 0:
            yield n

raw = ["3", " -1 ", "hello", "7", "0", " 4 ", "bad", "-2"]
pipeline = only_positive(parse_ints(read_numbers(raw)))
print(list(pipeline)) # [3, 7, 4]
Each stage is a generator — no intermediate lists are created.
4.8 Checking Conditions Across a Collection¶
scores = [88, 92, 75, 60, 45, 98]
# Did anyone fail?
print(any(s < 60 for s in scores)) # True
# Did everyone pass?
print(all(s >= 60 for s in scores)) # False
# How many passed?
passed = sum(1 for s in scores if s >= 60)
print(f"{passed}/{len(scores)} passed") # 5/6 passed
# Average of passing scores
passing_scores = [s for s in scores if s >= 60]
avg = sum(passing_scores) / len(passing_scores)
print(f"Average passing score: {avg:.1f}") # Average passing score: 82.6
4.9 Unique Words in Order¶
def unique_words(text: str) -> list[str]:
    """Return unique words in the order they first appear."""
    seen: set[str] = set()
    return [
        word
        for word in text.lower().split()
        if word not in seen and not seen.add(word)
    ]

text = "the cat sat on the mat the cat"
print(unique_words(text)) # ['the', 'cat', 'sat', 'on', 'mat']
Note: seen.add(word) returns None (falsy), so not seen.add(word) is
always True — it is used purely for its side effect of adding to the set.
This is a clever trick, but a plain loop is clearer for most readers:
def unique_words(text: str) -> list[str]:
    seen: set[str] = set()
    result = []
    for word in text.lower().split():
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result
4.10 Batching an Iterable¶
def batched(iterable, size: int):
    """Yield successive chunks of `size` items from iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

data = list(range(1, 12))
for chunk in batched(data, 3):
    print(chunk)
# [1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
# [10, 11]
Python 3.12 added itertools.batched() for this exact purpose. Note that it yields tuples rather than lists.
5. Common Mistakes¶
5.1 Writing Unreadable Comprehensions¶
Comprehensions should be easy to read at a glance. If you need to squint, use a loop.
# Hard to read — too much going on
result = [f"{k}={v}" for d in data for k, v in d.items() if v is not None and k != "id"]
# Clearer as a loop
result = []
for d in data:
    for k, v in d.items():
        if v is not None and k != "id":
            result.append(f"{k}={v}")
A good rule of thumb: if a comprehension does not fit comfortably on two lines, write a loop.
5.2 Exhausting a Generator Accidentally¶
A generator can only be iterated once. Iterating it a second time yields nothing.
gen = (n ** 2 for n in range(5))
print(list(gen)) # [0, 1, 4, 9, 16]
print(list(gen)) # [] — already exhausted
# Fix: recreate the generator, or use a list if you need multiple passes
squares = [n ** 2 for n in range(5)]
print(squares) # [0, 1, 4, 9, 16]
print(squares) # [0, 1, 4, 9, 16]
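If the data is too large for a list, the other fix is to recreate the generator each time. One way to sketch this is a small factory function; squares_up_to is a hypothetical name used only for illustration.

```python
def squares_up_to(limit):
    """Return a fresh generator of squares on every call."""
    return (n ** 2 for n in range(limit))

# Each call builds a new generator, so neither pass finds it exhausted
print(list(squares_up_to(5)))  # [0, 1, 4, 9, 16]
print(list(squares_up_to(5)))  # [0, 1, 4, 9, 16]
```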
5.3 Using a List Comprehension When a Generator Would Do¶
If you only need to iterate once and pass the result to a function like
sum(), any(), or max(), a generator expression is more memory-efficient.
# Builds a full list in memory, then sums it
total = sum([n ** 2 for n in range(1_000_000)])
# Sums lazily — no list created
total = sum(n ** 2 for n in range(1_000_000))
5.4 Confusing the Filter Position¶
The if clause in a comprehension filters which items are included. It is
not the same as a ternary expression in the output.
numbers = [1, 2, 3, 4, 5, 6]
# Filter: only include even numbers
evens = [n for n in numbers if n % 2 == 0]
print(evens) # [2, 4, 6]
# Transform: replace odd numbers with 0
zeroed = [n if n % 2 == 0 else 0 for n in numbers]
print(zeroed) # [0, 2, 0, 4, 0, 6]
The filter if goes at the end. The ternary if/else goes in the expression
at the front.
5.5 Forgetting That yield Makes a Function a Generator¶
Once a function contains yield, calling it returns a generator object — it
does not run the body immediately.
def gen():
    print("start")
    yield 1
    print("end")

g = gen() # nothing printed yet
print(next(g)) # start → 1
print(next(g)) # end → StopIteration
If you expect the function to run immediately, use return instead of
yield.
5.6 Nested Comprehension Loop Order¶
The loop order in a nested comprehension matches the order you would write
nested for loops — outer loop first, inner loop second.
# Outer: rows, Inner: columns
flat = [matrix[r][c] for r in range(3) for c in range(3)]
# Equivalent loops:
flat = []
for r in range(3): # outer
    for c in range(3): # inner
        flat.append(matrix[r][c])
A common mistake is reversing the order and getting unexpected results.
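A short sketch of what that mistake can look like when flattening a matrix. The variable names are illustrative, and the example assumes no variable called row exists in the surrounding scope.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Correct: the outer loop (row) comes first, so row exists when the inner loop runs
flat = [value for row in matrix for value in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Reversed: the first for clause needs `row` before any loop has defined it
raised = False
try:
    wrong = [value for value in row for row in matrix]
except NameError:
    raised = True
print(raised)  # True
```

If a variable named row does already exist (for example, left over from an earlier for loop), the reversed version runs without an error and quietly produces the wrong values, which is harder to spot.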
6. Practice Tasks¶
- Write a list comprehension that produces all multiples of 3 between 1 and 50 (inclusive).
- Write a dict comprehension that maps each word in a list to True if it starts with a vowel, False otherwise.
- Write a generator function evens_up_to(n) that yields all even numbers from 0 to n (inclusive).
- Write a generator function running_total(numbers) that yields the cumulative sum after each item. For example, [1, 2, 3, 4] yields 1, 3, 6, 10.
- Use any() and all() with generator expressions to check whether a list of strings contains any empty string, and whether all strings are lowercase.
- Write a set comprehension that produces the set of all first letters (lowercase) from a list of words.
- Write a function take(n, iterable) that returns the first n items from any iterable as a list, without consuming the rest.
- Rewrite the following loop as a list comprehension:
result = []
for sentence in paragraphs:
    for word in sentence.split():
        if len(word) > 5:
            result.append(word.lower())
7. Key Takeaways¶
- List comprehensions build lists concisely: [expr for x in iterable if cond]. They are clearer than loops for simple transformations and filters.
- Set comprehensions {expr for x in iterable} build sets; duplicates are removed automatically.
- Dict comprehensions {k: v for x in iterable} build dictionaries from any iterable.
- Generator expressions (expr for x in iterable) are lazy: they produce values on demand and use almost no memory.
- Generator functions use yield to pause and resume execution. They are ideal for large or infinite sequences.
- A generator can only be iterated once. Use a list if you need multiple passes.
- yield from delegates to another iterable, simplifying recursive and chained generators.
- any(), all(), sum(), min(), and max() all accept generator expressions directly, with no intermediate list needed.
- Keep comprehensions readable. If the logic is complex, a plain loop is clearer.
- Prefer generator expressions over list comprehensions when you only need one pass and are passing the result to a consuming function.
What's Next¶
Ready to continue? Head to the next chapter: Errors, Exceptions, and Debugging.