Note: this is a Python rewrite of “Parse, Don’t Validate AKA Some C Safety Tips”.
“Parse, Don’t Validate” - the Python version
If you’ve read the original post on “Parse, Don’t Validate” or the C version, you know the focus is on conceptual correctness. Here, I’ll show how this technique applies to Python—a language sometimes criticized for being “unsafe” due to its dynamic nature, but one that actually possesses powerful tools to enforce structural integrity.
In this blog post you will see how to stop writing “Stringly Typed” code and reduce exploitable errors in Python.
“Parse, Don’t Validate” - the TLDR version
The basic idea is this:
- Data Comes Into Your System.
- Your System Processes It.
Your first instinct, when your system receives an email address as input, is to call validate_email(untrusted_input) and then pass the raw string deeper into the system for use.
The problem is that other code deep within the system will also do some sort of validation on the string it just received. Every function down there still has to worry about whether the string is valid.
I’ll bet good money that the processing functions will attempt to validate their input again. Because they’re logically far away from the boundary, they’ll either do it a different way or fail to do it altogether.
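To make that drift concrete, here is a small sketch. Both function names and both rules are invented for illustration; they stand in for a boundary check and a later re-check written by someone else.

```python
import re


def boundary_check(email: str) -> bool:
    # Hypothetical boundary validator: exactly one "@" with text on both sides.
    return re.fullmatch(r"[^@\s]+@[^@\s]+", email) is not None


def deep_check(email: str) -> bool:
    # Hypothetical re-validation added later, far from the boundary.
    return "@" in email


candidate = "alice@@example.com"
print(boundary_check(candidate))  # False
print(deep_check(candidate))      # True
```

The two checks disagree about the same string, and nothing in the `str` type records which one, if any, was applied.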
So instead of this:
# The "Validate" Anti-Pattern
def process_data(untrusted_input: str):
    if not validate_email(untrusted_input):
        raise ValueError("Invalid email")
    # Rest of system uses `untrusted_input` (which is still just a str)
    save_to_db(untrusted_input)
Rather do this instead:
# The "Parse" Pattern
def process_data(untrusted_input: str):
    # This might raise an exception, or return None
    email_obj = Email.parse(untrusted_input)
    # Rest of system uses `email_obj`
    # The system knows this is valid by definition of it existing.
    save_to_db(email_obj)
This removes any opportunity for errors to creep in within the rest of the system. If you hold an instance of Email, it is valid. Period.
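The comment in the snippet above mentions two failure styles: raising, or returning None. As a rough sketch of the None-returning style (the `try_parse` and `handle` names and the `"@"` rule are my assumptions, not part of any library), the caller has to deal with the failure case before it can touch the value:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Email:
    value: str

    @classmethod
    def try_parse(cls, untrusted: str) -> Optional["Email"]:
        # Hypothetical None-returning parser; the "@" check is a placeholder rule.
        if "@" not in untrusted:
            return None
        return cls(untrusted)


def handle(untrusted_input: str) -> str:
    email = Email.try_parse(untrusted_input)
    if email is None:
        # A type checker insists on this branch before `email.value` is used.
        return "rejected"
    return f"accepted {email.value}"


print(handle("alice@example.com"))  # accepted alice@example.com
print(handle("not an email"))       # rejected
```

Either style works; what matters is that failure is handled once, at the boundary, and only an `Email` travels further in.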
Some conventions for Safety in Python
What does this have to do with Python strings? Good question.
Python is strongly typed (you can’t add "1" + 1), but its types are checked dynamically, at runtime. However, with Type Hints and static analysis tools like Mypy, Pyright, Pyrefly or Ty, we can enforce safety boundaries before the code runs, much as a compiler would.
The problem in many Python codebases is “Primitive Obsession”—passing str everywhere. A str doesn’t tell you if it’s an email, a name, or a SQL query.
If you have a function store_user(email: str, name: str), neither a runtime check nor a type checker will save you if you accidentally swap the arguments when calling it. Both are strings. Both are valid. Your database is now corrupt.
But, you have options—even in Python—by creating specific types (Value Objects).
You parse the input into the correct type once, and then functions which accept that type will produce a static analysis error (red squigglies in your IDE) if you mix things up.
When you create the correct types for data entering the system, you can then do this:
# Python code (inside a boundary handler)
try:
    email = Email.parse(untrusted_input)
except ValueError:
    # Handle error at the boundary
    return error_response()
# From here on, 'email' is guaranteed to be valid.
In addition to the safety from using custom types, there’s a bigger architectural benefit:
You remove generic str values from your business logic. You force the only occurrences of raw str values to be at the boundary of your system (API endpoints, CLI args), where all input is untrusted anyway!
When your internal functions never accept str parameters, your risk of logic errors drops significantly. By leveraging Type Hints, you can ensure that the system won’t pass static analysis checks even if some heretic decides they want to pass a raw str to a function expecting an Email.
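One caveat, sketched below: the interpreter itself does not check annotations, so this protection only exists if the type checker actually runs (in your editor or CI). The `send_welcome` function is a made-up internal function used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    value: str


def send_welcome(email: Email) -> str:
    # Hypothetical internal function that only accepts the domain type.
    return f"Welcome, {email.value}"


# A type checker rejects the call below. The interpreter does not look at
# annotations, so the mistake only surfaces when `.value` is accessed.
try:
    send_welcome("alice@example.com")  # type: ignore[arg-type]
except AttributeError as exc:
    print(f"runtime failure: {exc}")
```

Without static analysis the mistake still fails, but late and with a less helpful error, which is the reason to make the checker a required step rather than an optional one.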
An Actual Example
Only the functions on the boundary of the system should parse input. Everything else should accept only type-checkable parameters.
Here is a runnable example. We will use Python’s dataclasses to create distinct types.
First, the “Types” definition (the safe internal world):
# domain_types.py
# (Not named types.py: that would clash with the standard library's `types` module.)
from dataclasses import dataclass
from typing import Optional

# We use distinct classes. Even though they both wrap a string,
# they are completely different types to the Type Checker.
@dataclass(frozen=True)
class Email:
    value: str

    @classmethod
    def parse(cls, untrusted: Optional[str]) -> "Email":
        if not untrusted or "@" not in untrusted:
            raise ValueError("Invalid Email Format")
        # In a real app, do regex or DNS checks here
        return cls(value=untrusted)

@dataclass(frozen=True)
class Name:
    value: str

    @classmethod
    def parse(cls, untrusted: Optional[str]) -> "Name":
        # `not untrusted` covers both None and the empty string
        if not untrusted:
            raise ValueError("Name cannot be empty")
        return cls(value=untrusted)
And, of course, the business logic and the caller:
# main.py
from domain_types import Email, Name
# ---------------------------------------------------------
# The Internal System (Safe Zone)
# ---------------------------------------------------------
def store_record_old(email: str, name: str) -> None:
    """The old way: vulnerable to argument swapping."""
    print(f"Saving {name} with email {email}")

def store_record_new(email: Email, name: Name) -> None:
    """The new way: types ensure correctness."""
    print(f"Saving {name.value} with email {email.value}")

# ---------------------------------------------------------
# The Boundary (Danger Zone)
# ---------------------------------------------------------
def rx_untrusted_input(untrusted_name: str, untrusted_email: str) -> bool:
    try:
        # 1. PARSE (don't validate)
        # If these lines succeed, we have guaranteed valid objects.
        email = Email.parse(untrusted_email)
        name = Name.parse(untrusted_name)
    except ValueError as e:
        print(f"Input rejected: {e}")
        return False
    # 2. PROCESS
    # WHOOPS - we accidentally specified the parameters in the wrong order!
    # Python runtime allows this because both are strings.
    # Mypy cannot catch this mistake.
    store_record_old(untrusted_name, untrusted_email)
    # Same mistake with custom types:
    # Mypy/Pyright will flag this line as an ERROR!
    # 'Argument 1 to "store_record_new" has incompatible type "Name"; expected "Email"'
    store_record_new(name, email)
    return True
As long as the code is type-checked, non-boundary code in your system can no longer accidentally use an Email value in place of a Name value without your tools screaming at you.
Let me count the ways…
This is a practical way of hardening your system: Parse, Don’t Validate.
In the C version of this post, we discussed destructors and double-frees. In Python, garbage collection frees memory for us, so those concerns mostly disappear. The concern that takes the place of memory safety is Mutability Safety.
Notice in the example above I used @dataclass(frozen=True). This makes the object immutable (read-only) after creation.
Why? Because if an Email object cannot be modified, you can pass it to fifty different functions deep in your system, and you are guaranteed that function #49 didn’t accidentally chop off the domain name.
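A minimal sketch of what that looks like at runtime, using a cut-down `Email` with no validation so the snippet stands alone:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Email:
    value: str


email = Email("alice@example.com")

try:
    # Assigning to a field of a frozen dataclass raises FrozenInstanceError.
    email.value = "alice"  # type: ignore[misc]
except FrozenInstanceError:
    print("Email is read-only")

print(email.value)  # alice@example.com
```

Type checkers also flag the assignment statically, so the mistake is usually caught before this exception is ever raised.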
With Parse, Don’t Validate plus Immutability, and a type checker running over the code, accidentally swapped parameters are caught before the code runs, and “valid” objects cannot become “invalid” through side effects elsewhere in your code.
Summary: Why Parse, Don’t Validate?
By applying Parse, Don’t Validate in Python, you gain three benefits:
Contextual Clarity
- Raw str types are ambiguous. Email and Name are self-documenting.
- You don’t need to read the function body to know what format the data is in; the type signature tells you.
Reduced Attack Surface
- Untrusted input is immediately transformed into safe, structured data.
- Functions deep in the system never deal with unvalidated input. If you have an Email object, you know it contains an @ symbol. You don’t need to check again.
Tooling-Enforced Safety
- Accidentally swapping parameters (e.g., passing name instead of email) becomes a static analysis error caught by your editor/CI pipeline before you even run the code.
- It eliminates “Primitive Obsession” bugs.
By leveraging Python’s Type Hints and Data Classes, we eliminate entire classes of bugs while making the code more robust and maintainable. Instead of checking values for correctness repeatedly, we parse them once and let the Type Checker enforce the rest.
TL;DR for Experienced Developers
The Problem: “Shotgun Parsing.”
Validating input (e.g., is_valid_email(s)) returns a boolean but leaves the data as a raw str. This forces you to either re-validate deep in the call stack or blind-trust that the caller did their job. It leads to Primitive Obsession, where func(name: str, email: str) is vulnerable to silent argument swapping.
The Solution: Type-Driven Design (Parse, Don’t Validate).
Move validation to the system boundary and transform untrusted input into Value Objects immediately. If the object exists, it is valid by definition.
The Implementation:
1. Define Domain Types: Use frozen=True dataclasses or NewType instead of primitives (str, int).
2. Parse at Boundary: In your API handlers or CLI entry points, convert str -> DomainType. Fail fast if invalid.
3. Type Hint Internals: Internal business logic should never accept str; it should only accept DomainType.
4. Static Analysis: Use Mypy/Pyright/Ty/Pyrefly. They will flag accidental argument swapping or usage of raw strings where domain objects are expected.
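Step 1 mentions NewType as an alternative to dataclasses. Here is a rough sketch of that variant; the type names, the parser functions and the validation rules are all placeholders I made up for illustration.

```python
from typing import NewType

# Hypothetical domain types. At runtime both are plain `str`;
# only the type checker treats them as distinct.
EmailAddress = NewType("EmailAddress", str)
DisplayName = NewType("DisplayName", str)


def parse_email(untrusted: str) -> EmailAddress:
    if "@" not in untrusted:  # placeholder rule
        raise ValueError("Invalid Email Format")
    return EmailAddress(untrusted)


def parse_name(untrusted: str) -> DisplayName:
    if not untrusted:
        raise ValueError("Name cannot be empty")
    return DisplayName(untrusted)


def store_record(email: EmailAddress, name: DisplayName) -> str:
    return f"Saving {name} with email {email}"


email = parse_email("alice@example.com")
name = parse_name("Alice")
print(store_record(email, name))  # Saving Alice with email alice@example.com
# store_record(name, email) would be flagged by a type checker.
```

The trade-off: NewType adds no wrapper object and no runtime cost, but it also cannot attach validation to construction, so any code can still call `EmailAddress("garbage")`. A frozen dataclass gives you a place to put the check.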
Code Diff:
# ❌ BAD: Stringly Typed & Distributed Validation
def process(email: str):
    # Optimistic: hopes the caller validated it first.
    # Pessimistic: validates it again (DRY violation/perf hit).
    if "@" not in email:
        raise ValueError(...)
    save(email)

# ✅ GOOD: Type Safety & Parse Once
from dataclasses import dataclass

@dataclass(frozen=True)
class Email:
    value: str

    def __post_init__(self):
        # Runs on every construction, so an invalid Email cannot exist.
        if "@" not in self.value:
            raise ValueError(...)

def process(email: Email):
    # Guaranteed valid. Impossible to pass a raw string here
    # if using Static Analysis (Mypy).
    save(email)
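The diff puts the check in `__post_init__` rather than in a `parse` classmethod like the longer example earlier in this post. A small sketch of why that matters (the error message text is my own placeholder): with `__post_init__`, even a direct constructor call cannot produce an invalid object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    value: str

    def __post_init__(self) -> None:
        # Placeholder rule, same as in the diff above.
        if "@" not in self.value:
            raise ValueError(f"invalid email: {self.value!r}")


try:
    Email("not-an-email")
except ValueError as exc:
    print(exc)  # invalid email: 'not-an-email'

print(Email("alice@example.com").value)  # alice@example.com
```

With a classmethod-only design, `Email(value="garbage")` would still construct, so "if the object exists, it is valid" holds only by convention there.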
Page last modified: 2026-01-26 09:31:10