Full post has the queries and two challenges to try yourself.
jamalhansen.com/blog/window-functions-th...
Posts by Jamal Hansen
Python can do this kind of thing with pandas groupby() and cumsum(). But you have to pull all the data into Python first. Window functions do the work in the database, where the data already lives. That matters at scale.
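Combining CTEs with window functions is where things get useful. Number each customer's orders with ROW_NUMBER() partitioned by customer_id, wrap that in a CTE (you can't filter on a window function in the same query's WHERE), then filter WHERE rn <= 3. Now you have the top 3 orders per customer.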
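A minimal sketch of the top-3 pattern, run through Python's built-in sqlite3 module (assumes SQLite 3.25 or newer, which added window functions). The orders table, its columns, and the data are made up, and "top" is assumed to mean largest amount. The CTE is there because WHERE is evaluated before window functions, so rn can't be filtered in the same SELECT that computes it.

```python
import sqlite3

# In-memory database with a hypothetical orders table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50), (2, 1, 20), (3, 1, 90), (4, 1, 10),
     (5, 2, 30), (6, 2, 70)],
)

# Number each customer's orders from largest to smallest inside the CTE,
# then keep only the first three numbers per customer in the outer query.
top3 = conn.execute("""
    WITH ranked AS (
        SELECT id, customer_id, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY amount DESC
               ) AS rn
        FROM orders
    )
    SELECT customer_id, id, amount
    FROM ranked
    WHERE rn <= 3
    ORDER BY customer_id, rn
""").fetchall()

for row in top3:
    print(row)
```

Customer 1's smallest order (id 4) is the only row dropped. Swapping ROW_NUMBER() for RANK() would let tied orders share a spot, so a customer could get more than three rows back.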
LAG() and LEAD() let you look at the previous or next row. Want to compare this order to the last one? LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date). Done.
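A small sketch of LAG() against a made-up orders table in SQLite via Python's sqlite3 (window functions need SQLite 3.25+). The first order for each customer has no previous row, so LAG() returns NULL (None in Python), and the difference is NULL too.

```python
import sqlite3

# Hypothetical orders table; dates stored as ISO strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-05", 40), (1, "2024-02-10", 55), (1, "2024-03-01", 30),
     (2, "2024-01-20", 100)],
)

# prev_amount: the same customer's previous order amount (NULL for the first one).
# change: this order minus the previous one (NULL when there is no previous order).
rows = conn.execute("""
    SELECT customer_id, order_date, amount,
           LAG(amount) OVER (
               PARTITION BY customer_id ORDER BY order_date
           ) AS prev_amount,
           amount - LAG(amount) OVER (
               PARTITION BY customer_id ORDER BY order_date
           ) AS change
    FROM orders
    ORDER BY customer_id, order_date
""").fetchall()

for row in rows:
    print(row)
```

LEAD() takes the same OVER clause and looks one row ahead instead of one row back.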
RANK() vs DENSE_RANK(). If two rows tie for 2nd, RANK() goes 1,2,2,4. DENSE_RANK() goes 1,2,2,3. Small difference, big impact depending on what you're building. Know which one you need.
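A quick way to see the tie behavior, assuming a made-up scores table in SQLite via Python's sqlite3 (3.25+). Two players tie for second place.

```python
import sqlite3

# Hypothetical scores: b and c tie at 90 points.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 80)],
)

# RANK() skips a number after a tie; DENSE_RANK() does not.
rows = conn.execute("""
    SELECT player, points,
           RANK()       OVER (ORDER BY points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
""").fetchall()

for row in rows:
    print(row)
```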
SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) gives you a running total per customer, row by row. No subqueries. No self-joins. One clean column added to your result.
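A sketch of the running total in SQLite via Python's sqlite3 (3.25+ for window functions). The table and data are hypothetical.

```python
import sqlite3

# Hypothetical orders for two customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-05", 40), (1, "2024-02-10", 55), (1, "2024-03-01", 30),
     (2, "2024-01-20", 100), (2, "2024-02-02", 25)],
)

# The running total restarts for each customer and accumulates in date order.
rows = conn.execute("""
    SELECT customer_id, order_date, amount,
           SUM(amount) OVER (
               PARTITION BY customer_id ORDER BY order_date
           ) AS running_total
    FROM orders
    ORDER BY customer_id, order_date
""").fetchall()

for row in rows:
    print(row)
```

One caveat worth checking in your own database: with ORDER BY and no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so two orders on the same date get the same running total. Adding ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW inside OVER() makes it strictly row by row.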
The syntax looks intimidating at first. It's not. OVER() is the keyword. PARTITION BY is like GROUP BY but keeps all your rows. ORDER BY controls the order within each partition. That's it.
GROUP BY collapses your data down to one row per group. But what if you want every order row AND the running total? Every row AND its rank? GROUP BY can't do that. Window functions can.
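A side-by-side sketch in SQLite via Python's sqlite3 (3.25+), using a made-up orders table. GROUP BY returns one row per customer, while SUM() OVER (PARTITION BY ...) keeps every order row and attaches the customer's total to each one.

```python
import sqlite3

# Hypothetical orders: three for customer 1, two for customer 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 40), (1, 55), (1, 30), (2, 100), (2, 20)],
)

# GROUP BY: one row per customer; the individual orders are gone.
grouped = conn.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

# Window function: every order row survives, with its customer's total alongside.
windowed = conn.execute("""
    SELECT customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id) AS customer_total
    FROM orders
    ORDER BY customer_id, amount
""").fetchall()

print(len(grouped), len(windowed))  # 2 5
```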
Excellent point! Of course, this compounds if you switch to a LEFT JOIN to find the rows but don't realize you have a WHERE condition on the table you left-joined :)
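A minimal sketch of the trap this reply describes, using hypothetical customers and orders tables in SQLite via Python's sqlite3. Filtering the left-joined table in WHERE throws away the NULL-extended rows, so the LEFT JOIN quietly behaves like an INNER JOIN. Moving the condition into ON is one common fix, shown here as an illustration rather than as the thread's own recommendation.

```python
import sqlite3

# Hypothetical data: Cy has no orders at all, Ben only has a pending one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 'shipped'), (2, 'pending');
""")

# Condition in WHERE: Cy's row has o.status = NULL, so NULL = 'shipped' is NULL
# and the row is dropped; Ben's 'pending' row is dropped too.
where_filter = conn.execute("""
    SELECT c.name, o.status
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'shipped'
    ORDER BY c.id
""").fetchall()

# Condition in ON: it only decides which orders match; every customer is kept.
on_filter = conn.execute("""
    SELECT c.name, o.status
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'shipped'
    ORDER BY c.id
""").fetchall()

print(where_filter)
print(on_filter)
```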
Full post: three-valued logic, the NOT IN trap, COALESCE, NULLIF, and three practice challenges. Part of the SQL for Python developers series.
https://jamalhansen.com/blog/null-the-value-that-isnt
COUNT(*) and COUNT(column) are not the same thing.
COUNT(*) counts every row. COUNT(middle_name) skips NULLs. I don't pass a column to COUNT unless I specifically want to exclude NULLs. Worth knowing before your numbers stop adding up.
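A quick check of the difference, assuming a made-up people table in SQLite via Python's sqlite3. Two of the four rows have no middle name.

```python
import sqlite3

# Hypothetical people; None is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, middle_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", "Marie"), ("Ben", None), ("Cara", "Lee"), ("Dev", None)],
)

# COUNT(*) counts rows; COUNT(middle_name) counts only non-NULL middle names.
all_rows, with_middle = conn.execute(
    "SELECT COUNT(*), COUNT(middle_name) FROM people"
).fetchone()

print(all_rows, with_middle)  # 4 2
```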
NULLIF is COALESCE's quieter sibling.
NULLIF(count, 0) returns NULL when count is zero. Combine it with COALESCE and you get division-by-zero protection in one line:
COALESCE(total / NULLIF(count, 0), 0)
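A sketch of the one-liner against a hypothetical stats table in SQLite via Python's sqlite3. One thing to flag: SQLite itself returns NULL for division by zero rather than raising an error (PostgreSQL raises one), so in SQLite NULLIF mainly makes the intent explicit and the query portable. total is stored as REAL here because dividing two integers does integer division in both SQLite and PostgreSQL.

```python
import sqlite3

# Hypothetical stats rows; label 'b' has a zero count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (label TEXT, total REAL, count INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [("a", 10.0, 4), ("b", 7.0, 0)],
)

# NULLIF turns a zero count into NULL, the division then yields NULL,
# and COALESCE replaces that NULL with 0.
rows = conn.execute("""
    SELECT label, COALESCE(total / NULLIF(count, 0), 0) AS avg_value
    FROM stats
    ORDER BY label
""").fetchall()

print(rows)
```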
COALESCE is your first line of defense.
COALESCE(nickname, name, 'Unknown') returns the first non-NULL value left to right. Think of it as a chain of fallbacks. Python's "value if value is not None else default", but for multiple options.
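A sketch of the fallback chain in SQLite via Python's sqlite3, with a made-up users table, plus a hypothetical Python helper (coalesce is not a built-in) that mirrors the same logic for a single row.

```python
import sqlite3

# Hypothetical users: one with a nickname, one with only a name, one with neither.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, nickname TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Jo", "Joanna"), (2, None, "Sam"), (3, None, None)],
)

# COALESCE picks the first non-NULL argument, left to right.
labels = [row[0] for row in conn.execute(
    "SELECT COALESCE(nickname, name, 'Unknown') FROM users ORDER BY id"
)]

def coalesce(*values):
    """Hypothetical helper: return the first value that is not None, else None."""
    return next((v for v in values if v is not None), None)

print(labels)
print(coalesce(None, "Sam", "Unknown"))
```

The helper checks is not None on purpose: Python's nickname or name or 'Unknown' would also skip empty strings, while COALESCE keeps '' because it isn't NULL.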
The fix is two lines.
Add WHERE headquarters_city IS NOT NULL to your subquery. Or use NOT EXISTS instead, which handles NULLs cleanly by design. I now reach for NOT EXISTS by default.
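A sketch of both fixes, with hypothetical customers and companies tables in SQLite via Python's sqlite3. One company has a NULL headquarters_city, which is exactly what would otherwise empty out a plain NOT IN.

```python
import sqlite3

# Hypothetical data: Globex has no known headquarters city.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, city TEXT);
    CREATE TABLE companies (name TEXT, headquarters_city TEXT);
    INSERT INTO customers VALUES ('Ana', 'Portland'), ('Ben', 'Boise'), ('Cy', 'Reno');
    INSERT INTO companies VALUES ('Acme', 'Portland'), ('Globex', NULL);
""")

# Fix 1: strip NULLs out of the subquery before NOT IN sees them.
fix_filter = conn.execute("""
    SELECT name FROM customers
    WHERE city NOT IN (
        SELECT headquarters_city FROM companies
        WHERE headquarters_city IS NOT NULL
    )
    ORDER BY name
""").fetchall()

# Fix 2: NOT EXISTS only asks whether a matching row exists;
# a NULL headquarters_city never equals anything, so it never counts as a match.
fix_exists = conn.execute("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (
        SELECT 1 FROM companies co
        WHERE co.headquarters_city = c.city
    )
    ORDER BY c.name
""").fetchall()

print(fix_filter)
print(fix_exists)
```

The two fixes aren't identical if the outer city column itself can be NULL: NOT IN drops that customer, while NOT EXISTS keeps it, since no company row matches.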
The NOT IN trap will ruin your day.
If your subquery returns even one NULL, NOT IN returns zero rows. Because city NOT IN ('Portland', NULL) secretly expands to city != 'Portland' AND city != NULL. That second comparison is NULL, so the whole condition can never be TRUE. The whole thing collapses.
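A minimal reproduction of the trap in SQLite via Python's sqlite3, with a made-up customers table. A literal NULL in the list stands in for a subquery that returns one; the last query evaluates the expansion by hand for a single city.

```python
import sqlite3

# Hypothetical customers in three cities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, city TEXT);
    INSERT INTO customers VALUES ('Ana', 'Portland'), ('Ben', 'Boise'), ('Cy', 'Reno');
""")

# Without NULL in the list: behaves as expected.
no_null = conn.execute(
    "SELECT name FROM customers WHERE city NOT IN ('Portland') ORDER BY name"
).fetchall()

# With one NULL in the list: no row can evaluate to TRUE, so nothing comes back.
with_null = conn.execute(
    "SELECT name FROM customers WHERE city NOT IN ('Portland', NULL) ORDER BY name"
).fetchall()

# The expansion for 'Boise': TRUE AND NULL is NULL (None in Python), not TRUE.
expanded = conn.execute(
    "SELECT 'Boise' != 'Portland' AND 'Boise' != NULL"
).fetchone()[0]

print(no_null, with_null, expanded)
```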
Three-valued logic is real and it matters.
FALSE AND NULL = FALSE (the NULL can't save it)
TRUE OR NULL = TRUE (same reason)
TRUE AND NULL = NULL (now it matters)
WHERE only keeps rows where the result is TRUE. NULL rows get dropped.
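The truth table above can be checked directly in SQLite via Python's sqlite3. SQLite has no separate boolean type, so TRUE comes back as 1, FALSE as 0, and unknown as NULL (None in Python); other databases display these differently. The small table t is made up to show WHERE dropping a NULL result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 0 is FALSE and 1 is TRUE in SQLite.
false_and_null, true_or_null, true_and_null = conn.execute(
    "SELECT (0 AND NULL), (1 OR NULL), (1 AND NULL)"
).fetchone()
print(false_and_null, true_or_null, true_and_null)  # 0 1 None

# WHERE keeps only rows whose condition is TRUE: NULL > 0 is NULL, so that row goes.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])
kept = conn.execute("SELECT x FROM t WHERE x > 0").fetchall()
print(kept)
```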
The mental model that fixes everything: NULL means "I don't know."
You can't compare unknowns. You can only ask IS NULL or IS NOT NULL. The equals operator will not work. Full stop.
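A small sketch of the difference, with a hypothetical people table in SQLite via Python's sqlite3.

```python
import sqlite3

# Hypothetical rows: Ben has no phone number.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", "555-0100"), ("Ben", None)],
)

# phone = NULL is NULL for every row, so it matches nothing.
equals_null = conn.execute("SELECT name FROM people WHERE phone = NULL").fetchall()

# IS NULL / IS NOT NULL are the operators that actually test for missing values.
is_null = conn.execute("SELECT name FROM people WHERE phone IS NULL").fetchall()
is_not_null = conn.execute("SELECT name FROM people WHERE phone IS NOT NULL").fetchall()

print(equals_null, is_null, is_not_null)
```

The same trap applies to query parameters: binding None to WHERE phone = ? sends NULL, so it matches nothing. Use IS NULL explicitly when the value can be missing.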
NULL = NULL is not TRUE in SQL. It's NULL.
So WHERE status != 'active' silently drops NULL rows too. Not because they match 'active'. Because SQL can't know if they don't.
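A sketch of the status example, with a hypothetical accounts table in SQLite via Python's sqlite3. The second query keeps the NULL row by asking about it explicitly.

```python
import sqlite3

# Hypothetical accounts: one has no status at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, "active"), (2, "closed"), (3, None)],
)

# NULL = NULL is NULL (None in Python), not TRUE (1).
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Drops row 3: NULL != 'active' is NULL, and WHERE only keeps TRUE.
naive = conn.execute(
    "SELECT id FROM accounts WHERE status != 'active' ORDER BY id"
).fetchall()

# Keeps row 3 by testing for NULL explicitly.
explicit = conn.execute(
    "SELECT id FROM accounts WHERE status != 'active' OR status IS NULL ORDER BY id"
).fetchall()

print(null_eq_null, naive, explicit)
```

PostgreSQL also offers status IS DISTINCT FROM 'active', which treats NULL as a value that can be compared.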