I just realized: if Airflow and other orchestration engines emit OpenLineage data to S3 Files and enable Claude to search them, you've got a data catalog.
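A minimal sketch of that idea, assuming the lineage events have already been read from storage as JSON lines. It relies only on the `job`, `inputs`, and `outputs` fields of an OpenLineage run event; the helper names `build_catalog` and `search` are hypothetical, and the S3 read and the Claude integration are left out.

```python
import json
from collections import defaultdict


def build_catalog(event_lines):
    """Index OpenLineage run events (one JSON document per line) by dataset."""
    catalog = defaultdict(lambda: {"produced_by": set(), "consumed_by": set()})
    for line in event_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        job = event.get("job", {})
        job_id = f'{job.get("namespace")}/{job.get("name")}'
        # Datasets a job reads are recorded under "inputs".
        for ds in event.get("inputs", []):
            catalog[f'{ds["namespace"]}/{ds["name"]}']["consumed_by"].add(job_id)
        # Datasets a job writes are recorded under "outputs".
        for ds in event.get("outputs", []):
            catalog[f'{ds["namespace"]}/{ds["name"]}']["produced_by"].add(job_id)
    return catalog


def search(catalog, term):
    """Return dataset names containing the term, case-insensitively."""
    return sorted(name for name in catalog if term.lower() in name.lower())
```

Calling `build_catalog` on the lines of each event file and then `search(catalog, "orders")` gives the kind of lookup a catalog provides: which datasets exist, and which jobs produce and consume them.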
Posts by Ananth Packkildurai
So, are we debating the semantic layer again?
docs.getdbt.com/blog...
What do you call a semantic layer from your perspective?
Whether you like or dislike Apache Kafka, its KIPs are among the best learning materials for distributed systems. KIP-848 is an excellent read.
cwiki.apache.org/con...
Most data platform failures don’t start with bad infra. They start at the team boundary. My new post argues platforms scale through operating interfaces: contracts, ownership, communication, and adoption design, not tooling alone.
More ETL pipelines will run next year than ever before. And ETL is still dead. Not dead, like nobody uses it. Dead like landlines — they work, but nobody builds their strategy around one.
The data engineer job title is due for an update.
Not because AI is replacing the role, but because AI is finally revealing what the role was always actually about.
Moving data was never the point. Meaning is.
Read more:
By the end of 2026, we will be talking about the "AI Fan Effect" [en.wikipedia.org/wik...] and the invention of a new field: Psychology for AI. I feel this may be the future of software engineering.
As we move from dashboards to autonomous agents, something breaks.
Systems of record capture what happened, not why.
Why data platforms need Truth Registries + Context Graphs for the agentic era 👇
www.dataengineeringw...
#DataEngineering #AgenticAI #Graphs #LLMs
Data Engineering Weekly's 254th edition is out. Context Graph is the new talk of the town!!
The companies that build the most boring data stack often win the market!!!
Prove me wrong.
Data Contract: There was no shortage of activity around the topic. Definitions were proposed and refined. Conceptual boundaries were drawn and redrawn.
I penned a reflection on Data Contracts here:
www.dataengineeringweekly.com/p/data-contr...
How to build a scalable shopping agent?
Here's a wild thought:
What if—and hear me out—we let humans click that Buy Now button? Just throwing ideas out there.
This week, it is mostly about Multi-Agent Architecture. Do you think the data infrastructure is ready for a multi-agent architecture? Where is the gap?
Is a semantic spec good enough to run an enterprise system? I listed the challenges of adopting the Iceberg REST Catalog.
Continuing Data Engineering Weekly's yearly tradition, we published the 2025 Year in Review. What do you think is the most notable trend of 2025?
www.dataengineeringw...
Look at the tech stack IBM now controls:
🐧 Compute: Red Hat (Linux/OpenShift)
☁️ IaC: HashiCorp (Terraform)
💰 FinOps: Kubecost
🌊 Streaming: Confluent (Kafka)
🧠 Vector/AI: DataStax (Cassandra)
⚡ Query Engine: Ahana (Presto)
🔄 Ingest: StreamSets
LinkedIn moves FishDB to Rust, DoorDash builds AI swarms, and Dropbox masters context engineering. 🤯 Data Engineering Weekly #247 is packed with system design deep dives from the best engineering teams.
If the Data Catalog is the answer for AI, the question was wrong.
We stopped asking if data was useful because storage got cheap. Now, "Dark Data" is actively poisoning your AI context windows with hallucination vectors.
Read about the Data Sustainability index
Open-source companies built their success on open-source platforms and benefited from community contributions and adoption, but now must abandon open-source principles to survive commercially.
🚀 The 244th edition of Data Engineering Weekly dives into:
AI agents as execution engines, LLM inference economics, databases for AI, personalization, and product evidence.
Read more 👉 www.dataengineeringw...
#DataEngineering #AI #LLMs
Cricket has been India’s greatest force in overcoming centuries of colonial suppression. Today’s Women’s World Cup win echoes the spirit of 1983 — a triumph that will inspire generations to come. 🇮🇳🏆
This is the most personal essay that I have written in Data Engineering Weekly. I shared a few key moments in my life and how fortunate I was to meet mentors along my professional journey, which shaped my career.
🚀 Data Vault vs. Dimensional Modeling vs. Medallion Architecture — When viewed through a modern enterprise data lens, these techniques interlock.
I break down how in Part 2 of my “Revisiting the Medallion Architecture” series.
Fivetran and dbt form a strong foundation for modern data infrastructure, known for bringing simplicity to complex engineering workflows. That said, calling it “open” data infrastructure feels like a stretch.
Should we update the definition of an "Analytics Engineer"?
As a data engineer, you can't treat zero-party (consent) and third-party (inferred) data the same way. This distinction is critical for building systems that are scalable, private, and trustworthy.
Here’s my guide:
Could be. Composable CDP has not gained significant market share, as identity resolution is a key component that is often proprietary.