Chapter 18: Testing and Code Quality¶
1. Overview¶
Writing code that works once is not enough. You need to know it still works after you change something, after a teammate edits it, or after you come back to it six months later. Tests give you that confidence.
This chapter introduces automated testing with pytest — the standard testing tool in the Python ecosystem — and covers the habits that make code easier to test and maintain. You will also get a brief look at two tools, ruff and mypy, that catch problems before your tests even run.
2. What You Will Learn¶
- Why testing matters and what makes code easy to test
- How to write simple, testable functions
- Installing and running pytest
- Writing test files and test functions
- Using
assertstatements to check results - Running tests from the command line
- Using fixtures for test setup and teardown
- Organizing tests in a
tests/directory - Code quality habits: small functions, meaningful names, no magic numbers
- What ruff does and how to run it
- What mypy does and why type checking helps
3. Core Concepts¶
3.1 Why Testing Matters¶
Every program has bugs. The question is whether you find them before your users do. Automated tests let you:
- Catch regressions — changes that break something that used to work
- Document expected behavior in a concrete, runnable form
- Refactor with confidence, knowing the tests will catch mistakes
- Collaborate more safely, because tests verify that new code does not break existing behavior
A test is just a function that calls your code and checks the result. If the check fails, the test fails and you know something is wrong.
3.2 Writing Testable Code¶
Not all code is equally easy to test. The easiest code to test is a pure function: a function that takes inputs, returns an output, and has no side effects.
A side effect is anything a function does beyond returning a value: writing to a file, printing to the screen, modifying a global variable, making a network request. Side effects are not bad — programs need them — but they make testing harder because you have to set up and clean up the external state.
Hard to test — side effects mixed in:
def process_order(order_id: int) -> None:
# reads from a database, writes to a file, sends an email
order = db.get_order(order_id)
total = order["qty"] * order["price"]
with open("receipts.txt", "a") as f:
f.write(f"Order {order_id}: ${total:.2f}\n")
send_email(order["email"], total)
Easy to test — pure calculation separated out:
Now calculate_total can be tested with a simple call and an assert. The
parts that touch the database, filesystem, and email can be tested separately
or mocked.
Rules of thumb for testable code:
- Keep functions small and focused on one thing
- Separate calculation from I/O
- Avoid global state
- Accept inputs as parameters rather than reading them from the environment
3.3 Installing pytest¶
pytest is not part of the standard library, but it is the de facto standard for Python testing. Install it into your virtual environment:
Verify the installation:
If you are using a pyproject.toml, you can add pytest as a development
dependency:
Then install with:
3.4 Writing Your First Test¶
pytest discovers tests automatically. You just need to follow two naming rules:
- Test files must be named
test_*.pyor*_test.py - Test functions must start with
test_
Here is a simple module and its test file.
math_utils.py — the code being tested:
def add(a: float, b: float) -> float:
return a + b
def is_even(n: int) -> bool:
return n % 2 == 0
def clamp(value: float, low: float, high: float) -> float:
"""Return value clamped to the range [low, high]."""
return max(low, min(value, high))
test_math_utils.py — the tests:
from math_utils import add, clamp, is_even
def test_add_positive_numbers():
assert add(2, 3) == 5
def test_add_negative_numbers():
assert add(-1, -4) == -5
def test_add_floats():
assert add(0.1, 0.2) == pytest.approx(0.3)
def test_is_even_with_even_number():
assert is_even(4) is True
def test_is_even_with_odd_number():
assert is_even(7) is False
def test_is_even_with_zero():
assert is_even(0) is True
def test_clamp_within_range():
assert clamp(5.0, 0.0, 10.0) == 5.0
def test_clamp_below_low():
assert clamp(-3.0, 0.0, 10.0) == 0.0
def test_clamp_above_high():
assert clamp(15.0, 0.0, 10.0) == 10.0
Notice that test_add_floats uses pytest.approx. Floating-point arithmetic
is not exact, so 0.1 + 0.2 is not exactly 0.3. pytest.approx checks
that two values are close enough (within a small tolerance).
3.5 The assert Statement¶
assert is Python's built-in way to check a condition. If the condition is
False, it raises AssertionError.
assert 1 + 1 == 2 # passes silently
assert 1 + 1 == 3 # raises AssertionError
assert "hello".upper() == "HELLO"
assert len([1, 2, 3]) == 3
pytest intercepts AssertionError and produces a detailed failure message
showing the actual and expected values. You do not need a special assertion
library — plain assert is all you need.
You can add an optional message after a comma:
Common assertion patterns:
# Equality
assert result == expected
# Inequality
assert result != unexpected
# Membership
assert "error" in log_output
# Type check
assert isinstance(result, list)
# Boolean
assert is_valid(data)
assert not is_empty(collection)
# Approximate equality for floats
import pytest
assert result == pytest.approx(3.14, rel=1e-3)
# Exception is raised
import pytest
with pytest.raises(ValueError):
parse_date("not-a-date")
3.6 Running Tests¶
Run all tests in the current directory and subdirectories:
Run with verbose output (shows each test name and pass/fail):
Run a specific test file:
Run a specific test function:
Run tests whose names match a keyword:
Stop after the first failure:
Reading pytest output:
A passing run looks like this:
========================= test session starts ==========================
collected 9 items
tests/test_math_utils.py ......... [100%]
========================== 9 passed in 0.12s ===========================
Each . is a passing test. An F means a failure. An E means an error
(an exception was raised outside of a pytest.raises block).
A failing run shows the assertion that failed and the actual values:
FAILED tests/test_math_utils.py::test_add_positive_numbers
AssertionError: assert 4 == 5
+ where 4 = add(2, 2)
3.7 Fixtures: Setup and Teardown¶
A fixture is a function that prepares something your tests need — a
temporary file, a sample dataset, a configured object — and optionally cleans
it up afterward. Fixtures are decorated with @pytest.fixture.
import pytest
@pytest.fixture
def sample_scores() -> list[int]:
return [72, 85, 90, 88, 76, 92, 85, 68, 95, 80]
def test_average(sample_scores):
total = sum(sample_scores)
avg = total / len(sample_scores)
assert avg == pytest.approx(83.1)
def test_max_score(sample_scores):
assert max(sample_scores) == 95
def test_min_score(sample_scores):
assert min(sample_scores) == 68
pytest sees that test_average has a parameter named sample_scores, finds
the fixture with that name, calls it, and passes the result in. Each test gets
a fresh call to the fixture.
Fixtures with cleanup using yield¶
Use yield instead of return when you need to clean up after the test.
Everything before yield is setup; everything after is teardown.
import pytest
from pathlib import Path
@pytest.fixture
def temp_file(tmp_path: Path):
"""Create a temporary file with some content."""
path = tmp_path / "data.txt"
path.write_text("line1\nline2\nline3\n", encoding="utf-8")
yield path
# teardown: pytest's tmp_path fixture handles cleanup automatically,
# but you could delete files or close connections here
def test_line_count(temp_file: Path):
lines = temp_file.read_text(encoding="utf-8").splitlines()
assert len(lines) == 3
def test_first_line(temp_file: Path):
lines = temp_file.read_text(encoding="utf-8").splitlines()
assert lines[0] == "line1"
tmp_path is a built-in pytest fixture that provides a temporary directory
unique to each test. You do not need to define it yourself.
Other useful built-in fixtures¶
| Fixture | What it provides |
|---|---|
tmp_path |
A Path to a temporary directory, cleaned up after the test |
capsys |
Capture stdout and stderr output |
monkeypatch |
Temporarily replace attributes, environment variables, or functions |
caplog |
Capture log records |
3.8 Organizing Tests¶
For small scripts, a single test_*.py file next to your code is fine. For
larger projects, put all tests in a tests/ directory at the project root.
my_project/
my_module.py
utils.py
tests/
__init__.py ← optional, but helps with imports
test_my_module.py
test_utils.py
conftest.py ← shared fixtures live here
conftest.py is a special file that pytest loads automatically. Put
fixtures that are shared across multiple test files in conftest.py — you do
not need to import them; pytest finds them by name.
# tests/conftest.py
import pytest
@pytest.fixture
def admin_user() -> dict:
return {"name": "Admin", "role": "admin", "active": True}
@pytest.fixture
def regular_user() -> dict:
return {"name": "Alice", "role": "user", "active": True}
Any test file in the tests/ directory can use admin_user or regular_user
as a parameter without importing anything.
3.9 Code Quality Habits¶
Tests catch bugs at runtime. Good code habits prevent bugs from being written in the first place and make code easier to read, test, and maintain.
Keep functions small¶
A function that does one thing is easier to name, easier to test, and easier to understand. If a function is getting long, look for a natural split.
# Hard to test — does too many things at once
def process(filepath: str) -> None:
with open(filepath) as f:
lines = f.readlines()
cleaned = [line.strip().lower() for line in lines if line.strip()]
counts = {}
for word in " ".join(cleaned).split():
counts[word] = counts.get(word, 0) + 1
top = sorted(counts.items(), key=lambda x: x[1], reverse=True)[:10]
for word, n in top:
print(f"{word}: {n}")
# Easier to test — each step is a separate function
def read_lines(filepath: str) -> list[str]:
with open(filepath, encoding="utf-8") as f:
return f.readlines()
def clean_lines(lines: list[str]) -> list[str]:
return [line.strip().lower() for line in lines if line.strip()]
def count_words(lines: list[str]) -> dict[str, int]:
counts: dict[str, int] = {}
for word in " ".join(lines).split():
counts[word] = counts.get(word, 0) + 1
return counts
def top_n(counts: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
return sorted(counts.items(), key=lambda x: x[1], reverse=True)[:n]
Now each function can be tested independently with simple inputs.
Use meaningful names¶
Names should say what something is or does. Avoid single-letter variables outside of short loops or mathematical formulas.
# Unclear
def calc(x, y, z):
return x * y * (1 - z)
# Clear
def calculate_discounted_price(
quantity: int, unit_price: float, discount_rate: float
) -> float:
return quantity * unit_price * (1 - discount_rate)
Avoid magic numbers¶
A magic number is a numeric literal with no explanation. Replace them with named constants.
# Magic numbers — what do 0.15 and 100 mean?
def apply_tax(amount: float) -> float:
return amount * 1.15 if amount > 100 else amount
# Named constants — intent is clear
TAX_RATE = 0.15
TAX_THRESHOLD = 100.0
def apply_tax(amount: float) -> float:
if amount > TAX_THRESHOLD:
return amount * (1 + TAX_RATE)
return amount
Named constants also make tests easier to write, because you can reference the constant rather than repeating the literal.
3.10 ruff — Fast Linting¶
A linter reads your code without running it and reports style problems,
potential bugs, and violations of best practices. ruff is a fast Python
linter written in Rust. It replaces several older tools (flake8, isort,
pyupgrade) with a single command.
Install it:
Run it on your project:
Fix auto-fixable issues:
Example output when there are issues:
math_utils.py:3:1: F401 `os` imported but unused
math_utils.py:12:5: E711 comparison to `None` (use `is` or `is not`)
Found 2 errors.
ruff checks things like:
- Unused imports
- Undefined names
- Comparison style (
== Nonevsis None) - Unreachable code
- Import ordering
You do not need to memorize the rules. Run ruff check . and fix what it
reports. Over time the habits become automatic.
3.11 mypy — Static Type Checking¶
mypy reads your type hints and checks that you are using values consistently. It catches type errors before you run the code.
Install it:
Run it:
Example — a function with a type error:
def greet(name: str) -> str:
return "Hello, " + name
result: int = greet("Alice") # type error: str assigned to int variable
print(result.upper())
mypy output:
mypy does not run your code. It only reads it. This means it can catch mistakes in code paths that your tests might not exercise.
You do not need to add type hints everywhere to benefit from mypy. Start by annotating function signatures and let mypy tell you where the types do not add up.
4. Practical Examples¶
4.1 Testing a String Utility Module¶
string_utils.py:
def slugify(text: str) -> str:
"""Convert a string to a URL-friendly slug.
Example: "Hello World!" -> "hello-world"
"""
import re
text = text.lower().strip()
text = re.sub(r"[^\w\s-]", "", text)
text = re.sub(r"[\s_]+", "-", text)
text = re.sub(r"-+", "-", text)
return text.strip("-")
def truncate(text: str, max_length: int, suffix: str = "...") -> str:
"""Truncate text to max_length characters, appending suffix if cut."""
if len(text) <= max_length:
return text
return text[: max_length - len(suffix)] + suffix
def count_vowels(text: str) -> int:
"""Return the number of vowels in text (case-insensitive)."""
return sum(1 for ch in text.lower() if ch in "aeiou")
tests/test_string_utils.py:
import pytest
from string_utils import count_vowels, slugify, truncate
class TestSlugify:
def test_basic(self):
assert slugify("Hello World") == "hello-world"
def test_special_characters(self):
assert slugify("Hello, World!") == "hello-world"
def test_multiple_spaces(self):
assert slugify("too many spaces") == "too-many-spaces"
def test_already_lowercase(self):
assert slugify("already-fine") == "already-fine"
def test_empty_string(self):
assert slugify("") == ""
class TestTruncate:
def test_short_string_unchanged(self):
assert truncate("hello", 10) == "hello"
def test_exact_length_unchanged(self):
assert truncate("hello", 5) == "hello"
def test_long_string_truncated(self):
result = truncate("hello world", 8)
assert result == "hello..."
assert len(result) == 8
def test_custom_suffix(self):
result = truncate("hello world", 7, suffix="…")
assert result.endswith("…")
def test_empty_string(self):
assert truncate("", 5) == ""
class TestCountVowels:
def test_basic(self):
assert count_vowels("hello") == 2
def test_case_insensitive(self):
assert count_vowels("HELLO") == 2
def test_no_vowels(self):
assert count_vowels("rhythm") == 0
def test_all_vowels(self):
assert count_vowels("aeiou") == 5
def test_empty_string(self):
assert count_vowels("") == 0
Grouping related tests into classes keeps the file organized. The class name describes the function being tested; each method tests one specific behavior.
4.2 Testing Exception Behavior¶
Use pytest.raises as a context manager to assert that a function raises a
specific exception.
validators.py:
def parse_age(value: str) -> int:
"""Parse a string as a non-negative integer age."""
try:
age = int(value)
except ValueError:
raise ValueError(f"Age must be a number, got: {value!r}")
if age < 0:
raise ValueError(f"Age cannot be negative, got: {age}")
if age > 150:
raise ValueError(f"Age {age} is unrealistically large")
return age
tests/test_validators.py:
import pytest
from validators import parse_age
def test_valid_age():
assert parse_age("25") == 25
def test_zero_age():
assert parse_age("0") == 0
def test_non_numeric_raises():
with pytest.raises(ValueError, match="must be a number"):
parse_age("abc")
def test_negative_age_raises():
with pytest.raises(ValueError, match="cannot be negative"):
parse_age("-5")
def test_unrealistic_age_raises():
with pytest.raises(ValueError, match="unrealistically large"):
parse_age("200")
The match argument to pytest.raises checks that the exception message
contains the given substring (as a regex pattern). This makes the test more
specific — it verifies not just that an exception was raised, but that it was
the right one.
4.3 Using Fixtures for Shared Setup¶
inventory.py:
class Inventory:
def __init__(self) -> None:
self._items: dict[str, int] = {}
def add(self, item: str, qty: int) -> None:
if qty <= 0:
raise ValueError("Quantity must be positive")
self._items[item] = self._items.get(item, 0) + qty
def remove(self, item: str, qty: int) -> None:
if item not in self._items:
raise KeyError(f"Item not found: {item}")
if self._items[item] < qty:
raise ValueError("Not enough stock")
self._items[item] -= qty
if self._items[item] == 0:
del self._items[item]
def stock(self, item: str) -> int:
return self._items.get(item, 0)
def total_items(self) -> int:
return sum(self._items.values())
tests/test_inventory.py:
import pytest
from inventory import Inventory
@pytest.fixture
def stocked_inventory() -> Inventory:
inv = Inventory()
inv.add("apple", 10)
inv.add("banana", 5)
inv.add("cherry", 20)
return inv
def test_initial_stock(stocked_inventory: Inventory):
assert stocked_inventory.stock("apple") == 10
assert stocked_inventory.stock("banana") == 5
def test_total_items(stocked_inventory: Inventory):
assert stocked_inventory.total_items() == 35
def test_remove_reduces_stock(stocked_inventory: Inventory):
stocked_inventory.remove("apple", 3)
assert stocked_inventory.stock("apple") == 7
def test_remove_all_deletes_item(stocked_inventory: Inventory):
stocked_inventory.remove("banana", 5)
assert stocked_inventory.stock("banana") == 0
def test_remove_nonexistent_raises(stocked_inventory: Inventory):
with pytest.raises(KeyError):
stocked_inventory.remove("mango", 1)
def test_remove_too_many_raises(stocked_inventory: Inventory):
with pytest.raises(ValueError, match="Not enough stock"):
stocked_inventory.remove("cherry", 100)
def test_add_invalid_qty_raises():
inv = Inventory()
with pytest.raises(ValueError, match="must be positive"):
inv.add("apple", 0)
Each test gets a fresh stocked_inventory because the fixture is called
once per test. Tests do not share state, so they cannot interfere with each
other.
4.4 Testing File I/O with tmp_path¶
tmp_path is a built-in pytest fixture that provides a temporary directory
unique to each test. Use it whenever your code reads or writes files.
# file_tools.py
from pathlib import Path
def count_lines(path: Path) -> int:
"""Return the number of non-empty lines in a file."""
return sum(1 for line in path.read_text(encoding="utf-8").splitlines() if line.strip())
# tests/test_file_tools.py
from pathlib import Path
from file_tools import count_lines
def test_count_lines(tmp_path: Path):
p = tmp_path / "sample.txt"
p.write_text("line one\nline two\n\nline four\n", encoding="utf-8")
assert count_lines(p) == 3 # blank line not counted
def test_count_lines_empty_file(tmp_path: Path):
p = tmp_path / "empty.txt"
p.write_text("", encoding="utf-8")
assert count_lines(p) == 0
Each test gets its own temporary directory, so tests do not interfere with each other's files. pytest cleans up the directory after the test run.
5. Common Mistakes¶
5.1 Testing Implementation Instead of Behavior¶
Tests should check what a function does, not how it does it. If you test internal details, the tests break every time you refactor, even when the behavior is unchanged.
# Wrong — tests internal variable name
def test_word_count_internal():
wc = WordCounter("hello world hello")
assert wc._word_dict == {"hello": 2, "world": 1} # fragile
# Correct — tests observable behavior
def test_word_count_behavior():
wc = WordCounter("hello world hello")
assert wc.count("hello") == 2
assert wc.count("world") == 1
assert wc.count("missing") == 0
5.2 Writing Tests That Always Pass¶
A test that never fails is not a test — it is noise. This often happens when the assertion is too weak.
# Wrong — always passes, proves nothing
def test_process():
result = process_data([1, 2, 3])
assert result is not None
# Correct — checks the actual result
def test_process():
result = process_data([1, 2, 3])
assert result == [2, 4, 6]
5.3 One Giant Test Function¶
Putting many unrelated checks in one test makes it hard to tell what failed and why. Write one test per behavior.
# Hard to diagnose — which assertion failed?
def test_everything():
assert add(1, 2) == 3
assert add(-1, 1) == 0
assert is_even(4) is True
assert is_even(3) is False
assert clamp(5, 0, 10) == 5
assert clamp(-1, 0, 10) == 0
# Better — each test has a clear name and single purpose
def test_add_positive():
assert add(1, 2) == 3
def test_add_to_zero():
assert add(-1, 1) == 0
def test_is_even_true():
assert is_even(4) is True
5.4 Not Testing Edge Cases¶
Most bugs live at the edges: empty inputs, zero, negative numbers, the maximum value, a single-element list. Always ask "what happens at the boundary?"
def test_average_normal():
assert average([10, 20, 30]) == 20.0
# Edge cases worth testing:
def test_average_single_element():
assert average([42]) == 42.0
def test_average_empty_raises():
with pytest.raises(ValueError):
average([])
def test_average_negative_numbers():
assert average([-10, -20, -30]) == -20.0
5.5 Forgetting to Import pytest for pytest.approx and pytest.raises¶
pytest.approx and pytest.raises require import pytest. Forgetting this
gives a NameError that looks confusing.
# Wrong — NameError: name 'pytest' is not defined
def test_area():
assert circle_area(1.0) == pytest.approx(3.14159, rel=1e-4)
# Correct
import pytest
def test_area():
assert circle_area(1.0) == pytest.approx(3.14159, rel=1e-4)
5.6 Naming Test Files or Functions Without the test_ Prefix¶
pytest silently skips any file or function that does not follow the naming convention. This is a common source of confusion when tests appear to pass but were never actually run.
# These are NOT discovered by pytest:
check_math.py
tests/math_checks.py
def verify_add(): ...
def should_add(): ...
# These ARE discovered:
test_math.py
tests/test_math.py
def test_add(): ...
If you run pytest -v and see 0 items collected, check your file and
function names.
6. Practice Tasks¶
-
Write a function
celsius_to_fahrenheit(c: float) -> floatthat converts Celsius to Fahrenheit using the formulaF = C * 9/5 + 32. Then write at least five tests covering normal values, zero, negative temperatures, and the boiling point (100°C = 212°F). -
Write a function
is_palindrome(text: str) -> boolthat returnsTrueif the string reads the same forwards and backwards, ignoring case and non-alphanumeric characters. Write tests for: a plain palindrome, a palindrome with punctuation ("A man, a plan, a canal: Panama"), a non-palindrome, an empty string, and a single character. -
Write a function
find_duplicates(items: list) -> listthat returns a sorted list of items that appear more than once. Write tests for: a list with duplicates, a list with no duplicates, an empty list, and a list where all items are the same. -
Create a
tests/conftest.pywith a fixture calledword_listthat returns["apple", "banana", "cherry", "apple", "date", "banana", "apple"]. Write a test file that uses this fixture to test amost_common(words, n)function that returns thenmost frequent words. -
Write a function
safe_divide(a: float, b: float) -> floatthat raisesZeroDivisionErrorwith the message"Cannot divide by zero"whenbis zero. Write tests that verify: normal division, division by a negative number, and that the correct exception and message are raised forb = 0. -
Write a function
read_csv_column(path: Path, column: int) -> list[str]that reads a CSV-like text file (comma-separated, one row per line) and returns all values from the given column index. Usetmp_pathin your tests to create temporary CSV files with known content. -
Install
ruffin your virtual environment and runruff check .on a Python file you have written. Fix any issues it reports. Then run it again to confirm the issues are resolved. -
Add type hints to the functions you wrote in tasks 1–3. Run
mypyon those files and fix any type errors it reports.
7. Key Takeaways¶
- Tests are functions that call your code and use
assertto check the result. If the assertion fails, the test fails. - Pure functions — no side effects, same output for the same input — are the easiest to test. Separate calculation from I/O to make both parts testable.
- pytest discovers tests automatically: files named
test_*.pyand functions namedtest_*. Runpytest -vto see each test by name. - Use
pytest.approxfor floating-point comparisons. Usepytest.raisesto assert that a function raises a specific exception. - Fixtures (
@pytest.fixture) set up shared test data and clean up afterward. Put shared fixtures intests/conftest.pyso all test files can use them. tmp_pathis a built-in fixture that gives each test its own temporary directory — use it whenever your code reads or writes files.- Keep functions small, use meaningful names, and replace magic numbers with named constants. These habits make code easier to read and easier to test.
ruff check .catches style issues and common bugs without running your code. Run it regularly and fix what it reports.mypychecks that your type hints are consistent. It catches type errors before runtime, especially in code paths that tests might not cover.
Further Reading¶
What's Next¶
Ready to continue? Head to the next chapter: Type Hints.
See also: - Exercise - Solution - Cheatsheet