Posts by Jamal Hansen

Full post has the queries and two challenges to try yourself.

jamalhansen.com/blog/window-functions-th...

3 days ago 0 0 0 0

Python can do this with pandas groupby and cumsum, but you have to pull all the data out of the database first. Window functions do it in the database, where the data already lives. That matters at scale.

3 days ago 0 0 1 0

Combining CTEs with window functions is where things get useful. Rank orders with ROW_NUMBER(), wrap it in a CTE, then WHERE rn <= 3. You just got the top 3 orders per customer.
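
A minimal sketch of that pattern, run through Python's built-in sqlite3 module (assumes SQLite 3.25 or newer for window functions). The orders table, its rows, and the choice to rank by amount descending are assumptions made up for illustration; the post does not say what the ranking orders by.

```python
import sqlite3

# Hypothetical orders table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 101, 10), (1, 102, 40), (1, 103, 30), (1, 104, 20), (2, 201, 5), (2, 202, 7)],
)

top3 = conn.execute(
    """
    WITH ranked AS (
        SELECT customer_id, order_id, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY amount DESC  -- assumption: rank by largest amount
               ) AS rn
        FROM orders
    )
    SELECT customer_id, order_id, amount
    FROM ranked
    WHERE rn <= 3          -- the window result can be filtered once it lives in a CTE
    ORDER BY customer_id, rn
    """
).fetchall()
print(top3)
```

The CTE is needed because a window function cannot be referenced in the WHERE clause of the same SELECT that computes it.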

3 days ago 0 0 1 0

LAG() and LEAD() let you look at the previous or next row. Want to compare this order to the last one? LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date). Done.
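
Here is a small sketch of that LAG() call using Python's sqlite3 module (assumes SQLite 3.25+). The table and rows are hypothetical. The first order for each customer has no previous row, so LAG() returns NULL there.

```python
import sqlite3

# Hypothetical orders table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10), (1, "2024-01-05", 20), (1, "2024-01-09", 15), (2, "2024-01-02", 5)],
)

with_previous = conn.execute(
    """
    SELECT customer_id, order_date, amount,
           LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount
    FROM orders
    ORDER BY customer_id, order_date
    """
).fetchall()
for row in with_previous:
    print(row)
```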

3 days ago 0 0 1 0

RANK() vs DENSE_RANK(). If two rows tie for 2nd, RANK() goes 1,2,2,4. DENSE_RANK() goes 1,2,2,3. Small difference, big impact depending on what you're building. Know which one you need.
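
To see the gap for yourself, here is a quick sketch with Python's sqlite3 module (assumes SQLite 3.25+). The scores table and names are made up; two rows tie for 2nd.

```python
import sqlite3

# Hypothetical scores table with a tie for second place.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ana", 90), ("Ben", 80), ("Cy", 80), ("Dee", 70)],
)

ranks = conn.execute(
    """
    SELECT name,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()
print(ranks)
# → [('Ana', 1, 1), ('Ben', 2, 2), ('Cy', 2, 2), ('Dee', 4, 3)]
```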

3 days ago 0 0 1 0

SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) gives you a running total per customer, row by row. No subqueries. No self-joins. One clean column added to your result.
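
A runnable sketch of that running total, using Python's sqlite3 module (assumes SQLite 3.25+). The orders table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical orders table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10), (1, "2024-01-05", 20), (1, "2024-01-09", 15), (2, "2024-01-02", 5)],
)

running = conn.execute(
    """
    SELECT customer_id, order_date, amount,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
    FROM orders
    ORDER BY customer_id, order_date
    """
).fetchall()
for row in running:
    print(row)
```

Customer 1 accumulates 10, 30, 45, and the total restarts at 5 for customer 2. One thing to watch: with the default window frame, rows that share the same order_date within a customer get the same running total.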

3 days ago 0 0 1 0

The syntax looks intimidating at first. It's not. OVER is the keyword that turns a function into a window function. PARTITION BY is like GROUP BY but keeps all your rows. ORDER BY controls the order within each partition. That's it.

3 days ago 0 0 1 0

GROUP BY collapses your data down to one row per group. But what if you want every order row AND the running total? Every row AND its rank? GROUP BY can't do that. Window functions can.
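
A side-by-side sketch of the difference, using Python's sqlite3 module (assumes SQLite 3.25+). The orders table and its three rows are hypothetical.

```python
import sqlite3

# Hypothetical orders table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10), (1, "2024-01-05", 20), (2, "2024-01-02", 5)],
)

# GROUP BY: three order rows collapse to one row per customer.
grouped = conn.execute(
    "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# Window function: all three order rows survive, each carrying its customer's total.
windowed = conn.execute(
    """
    SELECT customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id) AS customer_total
    FROM orders
    ORDER BY customer_id, order_date
    """
).fetchall()

print(grouped)   # → [(1, 30), (2, 5)]
print(windowed)  # → [(1, 10, 30), (1, 20, 30), (2, 5, 5)]
```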

3 days ago 0 0 1 0

Excellent point! Of course this compounds if you switch to a LEFT JOIN to find the rows, but don't realize you have a WHERE condition on the table you tried to left join :)

1 week ago 0 0 0 0

Full post: three-valued logic, the NOT IN trap, COALESCE, NULLIF, and three practice challenges. Part of the SQL for Python developers series.

https://jamalhansen.com/blog/null-the-value-that-isnt

1 week ago 1 0 0 0

COUNT(*) and COUNT(column) are not the same thing.

COUNT(*) counts every row. COUNT(middle_name) skips NULLs. I don't pass a column to COUNT unless I specifically want to exclude NULLs. Worth knowing before your numbers stop adding up.
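
A quick check with Python's sqlite3 module. The people table and its rows are made up for illustration; one of the three rows has a NULL middle_name.

```python
import sqlite3

# Hypothetical people table; Python's None is stored as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, middle_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", "Ann"), ("Ben", None), ("Cy", "Lee")],
)

all_rows, with_middle = conn.execute(
    "SELECT COUNT(*), COUNT(middle_name) FROM people"
).fetchone()
print(all_rows, with_middle)  # → 3 2
```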

1 week ago 0 0 1 0

NULLIF is COALESCE's quieter sibling.

NULLIF(count, 0) returns NULL when count is zero. Combine it with COALESCE and you get division-by-zero protection in one line:

COALESCE(total / NULLIF(count, 0), 0)
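
Here is that expression in a small sketch using Python's sqlite3 module. The stats table is hypothetical, and the count column is renamed item_count here (an assumption, just to keep it visually distinct from the COUNT function). Note that SQLite itself returns NULL for division by zero instead of raising an error, so the NULLIF guard matters most on databases that do raise, such as PostgreSQL; the COALESCE fallback to 0 behaves the same either way.

```python
import sqlite3

# Hypothetical stats table; item_count stands in for the post's "count" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (label TEXT, total INTEGER, item_count INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [("a", 10, 2), ("b", 7, 0)],
)

averages = conn.execute(
    """
    SELECT label, COALESCE(total / NULLIF(item_count, 0), 0) AS avg_per_item
    FROM stats
    ORDER BY label
    """
).fetchall()
print(averages)  # → [('a', 5), ('b', 0)]
```

Both columns are integers here, so the division is integer division. Cast one side to a real number if you need fractional results.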

1 week ago 0 0 1 0

COALESCE is your first line of defense.

COALESCE(nickname, name, 'Unknown') returns the first non-NULL value left to right. Think of it as a chain of fallbacks. Python's "value if value is not None else default", but for multiple options.
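
A short sketch of the fallback chain, using Python's sqlite3 module. The people table and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical people table; None becomes SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, nickname TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Jay", "James"), (2, None, "Maria"), (3, None, None)],
)

display_names = [
    row[0]
    for row in conn.execute(
        "SELECT COALESCE(nickname, name, 'Unknown') FROM people ORDER BY id"
    )
]
print(display_names)  # → ['Jay', 'Maria', 'Unknown']
```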

1 week ago 0 0 1 0

The fix is two lines.

Add WHERE headquarters_city IS NOT NULL to your subquery. Or use NOT EXISTS instead, which handles this cleanly by design. I reach for NOT EXISTS now by default.
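
Both fixes, sketched with Python's sqlite3 module. The customers and companies tables, their columns other than headquarters_city, and the rows are assumptions made up for illustration. The companies table contains one NULL headquarters_city.

```python
import sqlite3

# Hypothetical tables; one company has an unknown (NULL) headquarters_city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.execute("CREATE TABLE companies (headquarters_city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("Ana", "Portland"), ("Ben", "Austin")])
conn.executemany("INSERT INTO companies VALUES (?)", [("Portland",), (None,)])

# Fix 1: filter NULLs out of the subquery before NOT IN sees them.
fixed_not_in = conn.execute(
    """
    SELECT name FROM customers
    WHERE city NOT IN (
        SELECT headquarters_city FROM companies
        WHERE headquarters_city IS NOT NULL
    )
    """
).fetchall()

# Fix 2: NOT EXISTS only asks whether a matching row exists, so NULLs never match.
not_exists = conn.execute(
    """
    SELECT name FROM customers c
    WHERE NOT EXISTS (
        SELECT 1 FROM companies co
        WHERE co.headquarters_city = c.city
    )
    """
).fetchall()

print(fixed_not_in, not_exists)  # → [('Ben',)] [('Ben',)]
```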

1 week ago 0 0 1 0

The NOT IN trap will ruin your day.

If your subquery returns even one NULL, NOT IN returns zero rows. Because city NOT IN ('Portland', NULL) expands to city != 'Portland' AND city != NULL. That second comparison is always NULL, so no row can ever come out TRUE. The whole thing collapses.
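
You can reproduce the trap in a few lines with Python's sqlite3 module. The customers and companies tables and their rows are hypothetical. Ben lives in Austin, which is not a headquarters city, yet he still disappears.

```python
import sqlite3

# Hypothetical tables; one company has an unknown (NULL) headquarters_city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.execute("CREATE TABLE companies (headquarters_city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("Ana", "Portland"), ("Ben", "Austin")])
conn.executemany("INSERT INTO companies VALUES (?)", [("Portland",), (None,)])

trapped = conn.execute(
    """
    SELECT name FROM customers
    WHERE city NOT IN (SELECT headquarters_city FROM companies)
    """
).fetchall()
print(trapped)  # → []
```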

1 week ago 0 0 1 0

Three-valued logic is real and it matters.

FALSE AND NULL = FALSE (the NULL can't save it)
TRUE OR NULL = TRUE (same reason)
TRUE AND NULL = NULL (now it matters)

WHERE only keeps rows where the result is TRUE. NULL rows get dropped.
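
A tiny sketch that evaluates those three expressions with Python's sqlite3 module (assumes SQLite 3.23+ for the TRUE and FALSE keywords). SQLite reports booleans as 1 and 0, and Python shows NULL as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The three expressions from the post, evaluated directly.
false_and_null, true_or_null, true_and_null = conn.execute(
    "SELECT FALSE AND NULL, TRUE OR NULL, TRUE AND NULL"
).fetchone()
print(false_and_null, true_or_null, true_and_null)  # → 0 1 None

# A WHERE clause that evaluates to NULL keeps nothing.
kept = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1 AS x) WHERE TRUE AND NULL"
).fetchone()[0]
print(kept)  # → 0
```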

1 week ago 0 0 1 0

The mental model that fixes everything: NULL means "I don't know."

You can't compare unknowns. You can only ask IS NULL or IS NOT NULL. The equals operator will not work. Full stop.

1 week ago 1 0 2 0

NULL = NULL is not TRUE in SQL. It's NULL.

So WHERE status != 'active' silently drops NULL rows too. Not because they match 'active'. Because SQL can't know if they don't.
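
A small sketch of both effects with Python's sqlite3 module. The users table and its rows are made up; one user has a NULL status.

```python
import sqlite3

# Hypothetical users table; user "c" has an unknown (NULL) status.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("a", "active"), ("b", "inactive"), ("c", None)],
)

# NULL = NULL evaluates to NULL, which Python shows as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The NULL-status row is silently dropped.
not_active = conn.execute(
    "SELECT name FROM users WHERE status != 'active' ORDER BY name"
).fetchall()

# Asking for NULLs explicitly brings it back.
not_active_or_unknown = conn.execute(
    "SELECT name FROM users WHERE status != 'active' OR status IS NULL ORDER BY name"
).fetchall()

print(null_equals_null)        # → None
print(not_active)              # → [('b',)]
print(not_active_or_unknown)   # → [('b',), ('c',)]
```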

1 week ago 1 0 1 0