Tuesday's Data Debug lightning talks are up on YouTube!
Context management for AI agents (Claire Gouze), AI in daily data science workflows (Kasia Rachuta) & building self-improving AI skills (me).
Playlist: www.youtube.com/playlist?lis...
Data Debug SF's March Meetup is tomorrow! All 3 lightning talks are about AI this month:
context engineering for analytics agents
integrating AI into daily data science work
building & evaluating AI skills (I'm giving this one)
join us here: luma.com/lo8ogbub
Tomorrow 9am PT: CL is joining Bauplan live to show how AI can safely modify data pipelines without wrecking production.
Branch-level isolation + Recce's review agent catching issues before merge.
Free & online. luma.com/mm3gsalo?tk=...
Every bad join Claude writes becomes a rule in the skills file. Every ignored existing model becomes a convention. The skills get better every time.
The next run will be tighter because of everything I caught on this one.
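One way that loop can be mechanized (a minimal sketch; the file name `learned_skills.md` and the rule format are my own invention, not Claude Code's actual skills format):

```python
from pathlib import Path

SKILLS_FILE = Path("learned_skills.md")  # hypothetical skills file

def add_rule(rule: str) -> bool:
    """Append a learned rule unless it's already recorded."""
    existing = SKILLS_FILE.read_text() if SKILLS_FILE.exists() else ""
    if rule in existing:
        return False  # already captured on a previous run
    with SKILLS_FILE.open("a") as f:
        f.write(f"- {rule}\n")
    return True

# every bad join becomes a rule; every ignored model becomes a convention
add_rule("Use LEFT JOIN when the right side may be missing rows; never drop rows silently.")
add_rule("Check for an existing staging model before creating a new one.")
```

The dedup check is what makes the file converge instead of bloat: a rule only gets written once, no matter how many runs trip over the same mistake.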
I let Claude Code build dbt models from raw production data. It made certain tables incremental without being told. Smart inference from the data pattern.
It also silently dropped rows on edge cases via inner joins. The decisions that matter are still yours.
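The failure mode in miniature (a sketch with sqlite3 and toy tables, not the actual models):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, org_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, NULL);  -- edge case
    CREATE TABLE orgs (org_id INTEGER, name TEXT);
    INSERT INTO orgs VALUES (10, 'acme'), (20, 'globex');
""")

# INNER JOIN silently drops order 3 -- the edge case just vanishes
inner = conn.execute(
    "SELECT COUNT(*) FROM orders JOIN orgs USING (org_id)"
).fetchone()[0]

# LEFT JOIN keeps all orders, so the gap stays visible and reviewable
left = conn.execute(
    "SELECT COUNT(*) FROM orders LEFT JOIN orgs USING (org_id)"
).fetchone()[0]

print(inner, left)  # 2 3: one row gone, no warning anywhere
```

Both queries are valid SQL and both pass tests on clean data; only the row counts reveal that the inner join made a data quality decision on its own.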
Happy hour tonight, Nashville tomorrow. CL is speaking at DataTune on Saturday. He did 288 benchmark trials on building data agents that don't break your pipeline. I'll be at the Recce booth. Come find us!
"We are in the age of unstructured data and people are not using it enough."
Not everything you need is in Snowflake. Bryan built a dataset spanning PDFs, logs, & structured data to prove it.
From our Data Renegades lightning round with Bryan Bischof: youtu.be/-s6Xh5sCdLs
I've never thought about architecture or the why behind things as upfront as I have when trying to get an LLM to do the work I want it to do.
Please do! And let me know what you think! It's incredible to hear how others use Recce.
Also, anything you've learned from the MCP configs & context set-up? I've been trying different context-management techniques so I don't have to "redo" anything between sessions.
Any skills or lessons learned from fixing the infra that you took forward? I've been trying to code up every bump I hit so Claude doesn't repeat it next time.
New Data Renegades is up. CL & I talked with Wes McKinney about building pandas, radical accountability for software, & why data infrastructure might be the last AI-resistant frontier.
His book changed my career. This one was personal. Wherever you get your podcasts.
Wes McKinney told CL & me he's not sure anyone will read technical books in a year or two. This from the person whose book changed my career.
New Data Renegades tomorrow. This one covers a lot of ground.
New blog post. How I went from four separate Claude chats with manually pasted prompts to persistent skills that improve every session. Covers the podcast workflow, the dbt side, & why business context matters more than conventions.
doriwilson.com/blog/your-ai...
Listen to the full Data Renegades episode with Bryan Bischof wherever you get your favorite podcasts.
Bryan's worst production bug: too many backpacks.
Stitch Fix recommender gave someone three backpacks. System built to prevent duplicates. They lived in a weird part of latent space, close to everything.
Same bug bit him twice.
I spent more time building skills & MCP configs for Claude Code than watching it generate dbt models. The setup is the work, not the prompt.
AI-assisted analytics engineering is an infrastructure problem.
Wrote about it on the Recce blog. blog.reccehq.com/i-let-claude...
Claude Code filtered out rows with missing org_ids instead of flagging a potential production bug.
An AI made a data quality decision that should have been a human decision. And it didn't flag it. It just handled it.
Read about it here: blog.reccehq.com/i-let-claude...
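The human-in-the-loop version of that decision can be cheap to enforce (a sketch; the function name and threshold are mine, not from the post):

```python
def check_org_ids(rows, max_null_share=0.0):
    """Flag missing org_ids instead of silently filtering them.

    Raises when the share of NULL org_ids exceeds the threshold,
    so a human decides whether it's an edge case or a production bug.
    """
    nulls = sum(1 for r in rows if r.get("org_id") is None)
    share = nulls / len(rows) if rows else 0.0
    if share > max_null_share:
        raise ValueError(
            f"{nulls}/{len(rows)} rows missing org_id "
            f"({share:.1%}) -- review before filtering"
        )
    return rows

rows = [{"org_id": 10}, {"org_id": None}, {"org_id": 20}]
try:
    check_org_ids(rows)
except ValueError as e:
    print(e)  # surfaces the decision instead of handling it quietly
```

The point isn't the guard itself, it's where the default lands: fail loudly and make filtering an explicit choice, rather than letting "handle the edge case" be something the agent does on the way past.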
"Saying you are wrong is not curious. Saying why are your priors different than what the data is showing is curious."
Bryan on how to work with people who resist uncomfortable data.
Listen to more on Data Renegades with Bryan Bischof.
youtu.be/-s6Xh5sCdLs
"What was GTM engineer before Clay decided to make a name for it? Well, that was a data engineer."
Same work. New labels.
From our Data Renegades chat with Bryan Bischof.
Listen to the full episode here: youtu.be/-s6Xh5sCdLs
"BigQuery UI feels like someone designed it to punish me. Snowflake was like, that's cute. Hold our database query."
Anyone who's used these UIs felt this in their soul.
More on Data Renegades with Bryan Bischof.
youtu.be/-s6Xh5sCdLs
Next Data Debug SF is Tues 3/24, some speaker slots are still open! DM me if you're interested in speaking. Otherwise you can RSVP here: luma.com/lo8ogbub
Enterprise AI POCs failed their year-end reviews. The fix everyone landed on: context. A context graph is a knowledge graph subset optimized for AI. The hard part: AI-generated content becomes the new context. The loop closes & now you're dealing with drift. Watch: youtu.be/cgPw4SSl4Ew
Duck Lake stores lakehouse metadata in a relational database instead of scattered metadata files. That's the whole design. Building a native implementation against the spec meant reading DuckDB's source code because the docs & the code didn't agree. Watch here: youtu.be/VtvjyMKYPEA
Good one at Data Debug SF this week. Three talks: DuckLake without DuckDB, the builder stack for open source tooling, & what a context graph actually is. Same through-line from 3 directions: building on moving targets. Summary 🧵
Ask the sycophantic data scientist ready for a promotion how growth looks. They'll tell you it's great.
Ask an AI agent the same question. It'll tell you what you want to hear.
Trustworthy answers are a different problem.
From our Data Renegades chat with Bryan Bischof.
"What's the biggest predictive feature for coffee recommendations?"
"Your favorite salad dressing."
Bryan built Blue Bottle's coffee recommender. The coffee team's domain knowledge made the model work, not his ML intuition.
Listen to the full ep:
Happy Valentine's Day, data folks. This week I ran the Data Valentine Challenge: 5 companies, 5 data problems. Three things kept surfacing: breaks happen in the handoffs between tools, "just in case" infrastructure is the enemy, & AI works when constraints are tight.
blog.reccehq.com/data-valenti...
last day of data valentines tomorrow: bauplan's "Let AI Build Your Pipelines Without Breaking Your Heart (or Production)"
register for tomorrow here: riverside.com/webinar/regi...
they deleted everything without a downstream dependency & regenerated dbt docs. lineage went from chaos to clean. "the best code is the code you don't write. or in this case, the code you delete."
full replay here: youtu.be/2snf_AY94-A
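Finding those deletion candidates can be sketched from dbt's manifest (`target/manifest.json` maps each node to its direct children in `child_map`; the project data here is a toy stand-in, and leaves are candidates for review, not automatic deletion, since final marts are also leaves):

```python
# shape of dbt's target/manifest.json child_map: node -> direct children
manifest = {
    "child_map": {
        "model.proj.stg_orders": ["model.proj.orders"],
        "model.proj.orders": [],            # final mart: leaf, but keep
        "model.proj.old_cross_join": [],    # nothing depends on this
    }
}

def leaf_models(manifest):
    """Models with no downstream dependencies -- review candidates."""
    return sorted(
        node for node, children in manifest["child_map"].items()
        if node.startswith("model.") and not children
    )

print(leaf_models(manifest))
```

Filtering the leaf list against exposures or query logs is what separates "dead code" from "the mart everyone actually queries."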
data valentines day 4: database tycoon did a dbt makeover show. Chloe pulled up the lineage & immediately: "she's trying her best, but this is giving overwhelmed & overworked." cross joins that nothing uses, orphan dimensions, 7 models off one source with 3 leading nowhere