Fun conversation with @jkxosound.com about where I think things are going in the AI agents space, Strands Agents, and rats 🐀
Posts by John Kutay
youtu.be/mDFuNhehIOc?...
@clare.dev is one of the leaders making AI engineering simple and scalable at AWS. Had a great time chatting with her as we discussed Strands Agents and patterns like “Retrieval as a Tool”.
Really appreciate the depth at which the Hex team broke down their Text-to-SQL implementation. Everyone's trying to teach LLMs SQL like it's a training problem but it's really a graph traversal problem.
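One way to read "graph traversal" here, as a rough sketch and not Hex's actual implementation: treat tables as nodes and foreign keys as edges, then search for the chain of joins that connects the tables a question mentions. The schema, table names, and `join_path` helper below are all hypothetical.

```python
from collections import deque

# Hypothetical schema: each edge is a foreign-key relationship between tables.
FOREIGN_KEYS = {
    "orders": ["customers", "order_items"],
    "customers": ["orders", "regions"],
    "order_items": ["orders", "products"],
    "products": ["order_items"],
    "regions": ["customers"],
}


def join_path(start, goal, graph=FOREIGN_KEYS):
    """Breadth-first search for the shortest chain of joinable tables."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no join path connects the two tables


print(join_path("regions", "products"))
# ['regions', 'customers', 'orders', 'order_items', 'products']
```

The point of the sketch is that the join path comes from walking the schema, so the model only has to name the tables it needs instead of memorizing how they connect.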
In my post about #DuckDB I digress into the role of the database buffer cache to discuss how we segregate transactional workloads from analytical ones. DuckDB turned out to be a natural, lightweight approach to offloading analytical queries, ensuring our application met its performance requirements.
Instead of materialized views, we built in-process DuckDB caching in the control plane of Striim Developer — improving query performance 5–10x with zero added infra.
PostgreSQL for OLTP, DuckDB for Operational OLAP. But I won't call it HTAP 🤐
medium.com/striim/beyon...
@marcbrooker.bsky.social breaks down how they've architected a fully ACID-compliant database service that combines simple, serverless management with high availability and massive scale on AWS Aurora DSQL.
youtube.com/shorts/dScUi...
Thanks Marc! Super fun to learn how you combined the best parts of PostgreSQL and your own distributed processing engine.
This was actually my longest podcast ever at over 70 minutes. Not sure I could have made it any shorter because nerding out on databases with Andy Pavlo was too fun.
Was super fun chatting with @andypavlo.bsky.social to kick off the new season of What's New in Data. We dive into vector databases, text-to-SQL, trends in data infrastructure, and Andy's awesome (and open) database course.
youtube.com/shorts/tjLmx...
A side effect of LLMs: I'm taking on way more than I ever have in my life. I don't know if this is more productive or diluting myself. tbd!
Just found out one of the internal b2b CRUD app vendors is more like CRD because it doesn't support updating submissions. AI gonna cook that sector so hard.
and that’s why I’m working on a Saturday morning 🫠
Your adversaries are taking (not my) Presidents Day off. Time to ship. 🚀
I’ll never forget where I was the day I learned oats could be milked.
Them: Wait so you're saying I don't need to deploy Kafka?
Me: No
Them: Kinesis?
Me: No
Them: Zookeeper? YARN?
Me: No
Them: Will you write every record to disk and replicate it?
Me: No
Unfortunately the bar of complexity for streaming has been set so high. I'm calling it Streamholm Syndrome.
I'm not sure whether to be more amazed at the hate for Fivetran's price increase or the fact that Reddit doesn't know Striim exists and is proposing batch solutions to this person's obvious streaming CDC use case.
www.reddit.com/r/dataengine...
They really gave the smell of rain an epic name: petrichor. They really did that.
If you don't have that type of scale, but simply want a reliable, real-time streaming service, you can use Striim Developer for free 🤘
signup-developer.striim.com
A single Striim cluster (multi-node for scalability and fault tolerance) can handle 35k very wide, very active databases that produce millions of DML per hour and dozens of DDL per day. The 'intelligence' layer of Striim was able to apply rule-based logic on how to handle complex DDL.
I will die on this hill but MySQL's 'Alter Table Add Column AFTER' DDL is pointless. It doesn't change the layout on disk. If you care about the order of the columns, that's purely a read-side construct and you should address it in your query, not your DDL!
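A minimal sketch of the "address it in your query" half of that argument. SQLite stands in for MySQL purely so the example runs anywhere (SQLite has no AFTER clause at all), and the `users` table and its columns are hypothetical.

```python
import sqlite3

# SQLite is a stand-in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# The new column is appended to the end of the table definition.
conn.execute("ALTER TABLE users ADD COLUMN name TEXT")
conn.execute(
    "INSERT INTO users (id, email, name) VALUES (1, 'a@example.com', 'Ada')"
)

# Presentation order is chosen in the SELECT list, not in the DDL.
cur = conn.execute("SELECT id, name, email FROM users")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)  # ['id', 'name', 'email']
print(rows)     # [(1, 'Ada', 'a@example.com')]
```

Naming columns explicitly in the query also keeps readers insulated from later schema changes, which `SELECT *` does not.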
We’ve shifted embedding generation and transformers left into the streaming layer to support near real-time RAG. Take a read if you want to hear the optimizations we made for change data capture and incremental embedding generation.
www.striim.com/blog/real-ti...
Remind me: 1,000 days.
Driving from SF to LA talking to ChatGPT about Kafka. I think that’s how schizo starts.
Nice screenshot!
I always love seeing clovers in the wild, knowing every transaction is streamed to Snowflake in real time with Striim’s streaming CDC service. We do a lot of work to ensure transactions are bound reliably and replicated with no duplicates while maintaining low latency.