Toulouse, 1 June 2022
psycopg is a PostgreSQL database adapter for the Python programming language. As such, one of its main goals is to automatically adapt PostgreSQL and Python types to each other when issuing queries and retrieving results. While this conversion works fine for built-in types, psycopg also exposes a native way to handle domain-specific data types: row factories.
These domain models are typically defined by application developers in the form of data classes in Python, mapped to one or more relations (usually tables) in PostgreSQL. This is often where ORMs come into play by providing an abstract layer to write queries from model classes and map results to model instances. In psycopg 3, row factories provide a simple, yet powerful, way to handle domain-specific models without any extra dependency. In addition, we’ll see how this lightweight feature can be leveraged to write safer applications, through static typing in particular.
What’s a row factory?
Let’s assume we are working with weather data, collecting daily reports for cities into a weather table:
CREATE TABLE weather (
    city    varchar(80),
    temp_lo int,
    temp_hi int,
    prcp    real,
    date    date
);
When querying this table with psycopg, we get rows as tuple values, each column having the correct Python type matching the PostgreSQL one:
>>> with psycopg.connect() as conn:
...     print(conn.execute("SELECT * FROM weather").fetchone())
('San Francisco', 46, 50, 0.25, datetime.date(1994, 11, 27))
The simplest way to get something else is to use built-in row factories, such as dict_row, which produces a dictionary mapping result set column names to values:
>>> with psycopg.connect() as conn, conn.cursor(row_factory=psycopg.rows.dict_row) as cur:
...     print(cur.execute("SELECT * FROM weather").fetchone())
{'city': 'San Francisco', 'temp_lo': 46, 'temp_hi': 50, 'prcp': 0.25, 'date': datetime.date(1994, 11, 27)}
The row factory is passed upon cursor initialization, and the row obtained from the fetchone() call is indeed a dictionary.
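Other built-in factories are available in the psycopg.rows module. For instance, namedtuple_row produces named tuples whose fields follow the result set column names; here is a quick sketch (the exact repr of the returned named tuple may differ slightly):

>>> from psycopg.rows import namedtuple_row
>>> with psycopg.connect() as conn, conn.cursor(row_factory=namedtuple_row) as cur:
...     print(cur.execute("SELECT * FROM weather").fetchone())
Row(city='San Francisco', temp_lo=46, temp_hi=50, prcp=0.25, date=datetime.date(1994, 11, 27))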
Now more interestingly, let’s assume we have the following domain-specific model in our Python application:
from dataclasses import dataclass
from datetime import date


@dataclass
class Weather:
    city: str
    temperature: tuple[int, int]
    precipitation: float
    date: date
The goal would be to get instances of this Weather class when executing queries such as SELECT * FROM weather. That’s where custom row factories come into play.
A row factory is typically a function that processes raw Python data (adapted by psycopg from SQL values) into some final type, here an instance of the Weather class above. We can write such a function as:
def weather_from_row(city, temp_lo, temp_hi, prcp, date):
    return Weather(
        city=city, temperature=(temp_lo, temp_hi), precipitation=prcp, date=date
    )
However, this is not enough, because psycopg lacks the information about which columns of the SQL result set to use with this function. That’s why a row factory additionally makes use of the cursor being used for the current query. Accordingly, this function must be wrapped into a factory as:
def weather_row_factory(cursor):
    # Extract result set column names.
    columns = [column.name for column in cursor.description]

    def make_row(values):
        # Map column names to values.
        row = dict(zip(columns, values))
        return weather_from_row(**row)

    return make_row
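Assuming the weather table above, this factory can already be passed to a cursor; a quick usage sketch (output consistent with the example shown later):

>>> with psycopg.connect() as conn:
...     with conn.cursor(row_factory=weather_row_factory) as cur:
...         print(cur.execute("SELECT * FROM weather").fetchone())
Weather(city='San Francisco', temperature=(46, 50), precipitation=0.25, date=datetime.date(1994, 11, 27))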
Next, we’d arguably get a better design if these functions were class methods of our Weather class above; let’s do it:
@dataclass
class Weather:
    city: str
    ...

    @classmethod
    def from_row(cls, *, city, temp_lo, temp_hi, prcp, date):
        return cls(
            city=city, temperature=(temp_lo, temp_hi), precipitation=prcp, date=date
        )

    @classmethod
    def row_factory(cls, cursor):
        columns = [column.name for column in cursor.description]

        def make_row(values):
            row = dict(zip(columns, values))
            return cls.from_row(**row)

        return make_row
And now that all is in place, let’s see how it works:
>>> with psycopg.connect() as conn:
...     with conn.cursor(row_factory=Weather.row_factory) as cur:
...         cur.execute("SELECT * FROM weather")
...         row = cur.fetchone()
>>> print(row)
Weather(city='San Francisco', temperature=(46, 50), precipitation=0.25, date=datetime.date(1994, 11, 27))
As you can see, the row variable above is a Weather instance, so we can manipulate it as such, e.g. by accessing attributes:
>>> import statistics
>>> statistics.mean(row.temperature)
48
How is this safer?
In the context of databases, or I/O in general, the code working at the I/O boundary, where data is transformed from one type system to another, is typically quite “unsafe” and requires special care, such as validation when decoding or encoding.
By using row factories when issuing queries from psycopg, as shown in the previous section, we perform the transformation from “raw” Python types to domain-specific models early, thus reducing the time spent at this I/O boundary. Typically, one would take great care to write correct SQL queries and their corresponding row factories, test them carefully, and the rest of the application code would work nicely without being concerned about data coming from the database. This is typically where an ORM also helps.
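For example, a minimal sketch of such care could be a sanity check at the I/O boundary; the rule enforced below (the low temperature should not exceed the high one) is an illustrative assumption, not part of the original model:

@dataclass
class Weather:
    city: str
    ...

    @classmethod
    def from_row(cls, *, city, temp_lo, temp_hi, prcp, date):
        # Illustrative validation when decoding database rows
        # (assumed rule: temp_lo should not exceed temp_hi).
        if temp_lo > temp_hi:
            raise ValueError(f"inconsistent temperatures for {city}: {temp_lo} > {temp_hi}")
        return cls(
            city=city, temperature=(temp_lo, temp_hi), precipitation=prcp, date=date
        )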
Another great benefit comes when combining row factories with Python static typing. Static typing, with type checkers such as mypy, provides additional safety guarantees without impacting runtime performance. However, leveraging this at the I/O boundary might be harder because this typically involves “weak” typing (i.e. str or dict types, as opposed to, e.g., Weather, considered a “strong” type). In this context, the row factory feature of psycopg helps a lot because it brings strong typing into the I/O boundary. As such, one would typically use model types with strong validation capabilities (for instance, libraries such as pydantic in the Python ecosystem).
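For illustration only (this is not part of the original example), such a validating model could be sketched with pydantic as follows; the WeatherReport name and the accompanying factory are assumptions made for the sketch:

from datetime import date

from pydantic import BaseModel


class WeatherReport(BaseModel):
    city: str
    temperature: tuple[int, int]
    precipitation: float
    date: date


def weather_report_factory(cursor):
    columns = [column.name for column in cursor.description]

    def make_row(values):
        row = dict(zip(columns, values))
        # pydantic validates (and coerces) the raw values here.
        return WeatherReport(
            city=row["city"],
            temperature=(row["temp_lo"], row["temp_hi"]),
            precipitation=row["prcp"],
            date=row["date"],
        )

    return make_row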
So let’s get back to our previous example to add type annotations:
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Sequence

from psycopg import Cursor


@dataclass
class Weather:
    city: str
    ...

    @classmethod
    def from_row(cls, *, city: str, temp_lo: int, temp_hi: int, prcp: float, date: date) -> Weather:
        return cls(
            city=city, temperature=(temp_lo, temp_hi), precipitation=prcp, date=date
        )

    @classmethod
    def row_factory(cls, cursor: Cursor[Any]) -> Callable[[Sequence[Any]], Weather]:
        columns = [column.name for column in cursor.description]

        def make_row(values: Sequence[Any]) -> Weather:
            row = dict(zip(columns, values))
            return cls.from_row(**row)

        return make_row
With these annotations, the following example type-checks fine:
def get_weather_reports(conn: Connection[Any]) -> list[Weather]:
    with conn.cursor(row_factory=Weather.row_factory) as cur:
        cur.execute("SELECT * FROM weather")
        return cur.fetchall()
By passing a typed row factory to the cursor being used to fetch queries, the value obtained from this cursor’s fetchall() call is inferred to have the corresponding row type. Namely, here, the cursor has a row factory returning rows of type Weather, so fetchall() returns a list[Weather] value, which matches what get_weather_reports() declares (and would be verified by a type checker such as mypy).
This is safer because data comes from the database in the form of strong types (e.g. the Weather type), not a tuple (or dict) of anonymous values (int, date, etc.).
This is also safer because, as we pass those strongly typed values around, we avoid typical runtime errors (such as IndexError for tuple or KeyError for dict) that a type checker cannot verify; on the other hand, a typical AttributeError occurring with strong types can be checked.
For instance, the following IndexError is impossible to catch until runtime:
>>> with conn.cursor() as cur:
...     rows = cur.execute("SELECT * FROM weather").fetchall()  # type is list[tuple[Any, ...]]
>>> rows[0][5]
Traceback (most recent call last):
  ...
IndexError: tuple index out of range
whereas the following AttributeError gets caught by a type checker (and can thus be avoided at runtime):
>>> with conn.cursor(row_factory=Weather.row_factory) as cur:
...     rows = cur.execute("SELECT * FROM weather").fetchall()  # type is list[Weather]
>>> rows[0].prcp
Traceback (most recent call last):
  ...
AttributeError: 'Weather' object has no attribute 'prcp'
Finally, we get safer by using row factories, a built-in feature of psycopg, and we also gain expressiveness by avoiding clumsy dict (or similar) handling when querying the database.
How does this work, under the hood?
Now, the last part of the previous section, demonstrating that types defined in the row factory propagate through Cursor methods, may seem a little magical. Let’s take back the last example and add a few reveal_type() calls (with results shown as # note: comments):
def get_weather_reports(conn: Connection[Any]) -> list[Weather]:
    with conn.cursor(row_factory=Weather.row_factory) as cur:
        reveal_type(cur)
        # note: Revealed type is "psycopg.cursor.Cursor[Weather]"
        cur.execute("SELECT * FROM weather")
        rset = cur.fetchall()
        reveal_type(rset)
        # note: Revealed type is "builtins.list[Weather]"
        return rset
We can see that the cur: Cursor[Weather] value is parametrized on the “row” type Weather, as declared in Weather.row_factory(cursor: Cursor[Any]) -> Callable[[Sequence[Any]], Weather] (the important bit is the Weather type declared as the return value of the callable returned by the row factory). Likewise, rset: list[Weather] is indeed inferred as a list of Weather objects.
To understand how it works, let’s first look at the definition of Connection in psycopg (simplified):
Row = TypeVar("Row", covariant=True)
RowFactory = Callable[[Sequence[Any]], Row]  # simplified


class Connection:
    @overload
    def cursor(self) -> Cursor[Tuple[Any, ...]]:
        ...

    @overload
    def cursor(self, *, row_factory: RowFactory[Row]) -> Cursor[Row]:
        ...

    def cursor(self, *, row_factory: Optional[RowFactory[Any]] = None) -> Cursor[Any]:
        # implementation here
Method Connection.cursor() is defined as an overload depending on the value of the row_factory parameter and returning a Cursor object with a specific type variable Row. In other words, that Row type variable is bound from the row factory parameter to the returned Cursor value. The same Row type variable is then used to define a generic Cursor class, which in turn allows fetch*() methods to return Row values (still simplified):
class Cursor(Generic[Row]):
    def fetchone(self) -> Optional[Row]:
        ...

    def fetchall(self) -> List[Row]:
        ...
So if we go back to our last example:
>>> cur = conn.cursor(row_factory=Weather.row_factory)
>>> cur.execute("SELECT * FROM weather")
>>> rows = cur.fetchall()
- cur is a Cursor[Weather] because Weather.row_factory has type RowFactory[Weather], and,
- rows is a list of Weather because cur is a Cursor[Weather].
That’s all for this overview of row factories in psycopg. Next, check out the psycopg documentation, especially the page about static typing (which explains how Cursor is generic on Row), the main row factories page, as well as the rows API module, which provides nice row factory generator helpers.
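For instance, the kwargs_row helper from psycopg.rows builds a row factory that calls the given function with column names as keyword arguments; as a closing sketch, it could replace our hand-written Weather.row_factory along these lines:

>>> from psycopg.rows import kwargs_row
>>> with psycopg.connect() as conn:
...     with conn.cursor(row_factory=kwargs_row(Weather.from_row)) as cur:
...         print(cur.execute("SELECT * FROM weather").fetchone())
Weather(city='San Francisco', temperature=(46, 50), precipitation=0.25, date=datetime.date(1994, 11, 27))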