Posts by Data Code 101

Apache Iceberg v3 just entered Public Preview on Databricks, marking a new era for the open lakehouse and open table formats.
Until now, many incremental and semi-structured workloads depended on fragile workarounds in data lakes. Iceberg v3 removes those hacks with a design built for modern workloads.
Iceberg v3 brings advances for incremental data processing: better support for updates, deletes, and CDC via features like deletion vectors and row-level lineage.
This means more efficient ingestion, faster commits, and more scalable table operations in the open lakehouse.
The real game changer: one copy of data, multiple formats/engines. Iceberg v3 brings Iceberg, Delta, and Parquet closer together, cutting lock-in and avoiding data rewrites in interoperable pipelines.
With the VARIANT type, logs, API events, and streams become first-class citizens: you query ever-changing data without schema migrations, but still get near-columnar performance.
#dataengineering #databricks #iceberg
www.databricks.com/blog/next-er...
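The schema-on-read idea behind VARIANT — querying heterogeneous events by path without ever migrating a schema — can be sketched in plain Python. This is a toy analogy, not the Databricks implementation; the event fields and the `get_path` helper are invented for illustration:

```python
import json

# Heterogeneous events, as they might arrive from logs or an API stream.
# No two records need to share a schema.
raw_events = [
    '{"user": "a1", "action": "click", "meta": {"button": "buy"}}',
    '{"user": "b2", "action": "scroll", "depth": 0.8}',
    '{"user": "a1", "action": "click"}',
]

def get_path(doc, path, default=None):
    """Walk a dotted path into a parsed JSON document, VARIANT-style."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return default
        doc = doc[key]
    return doc

events = [json.loads(e) for e in raw_events]
clicks = [e for e in events if get_path(e, "action") == "click"]
buttons = [get_path(e, "meta.button") for e in clicks]
print(len(clicks), buttons)  # two clicks; one carries meta.button, one does not
```

The point of the real VARIANT type is that the engine shreds these paths into a columnar layout under the hood, which is where the near-columnar performance comes from.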
Iceberg & SlateDB

Open table formats (Iceberg, Hudi, and Delta) define the rules for organizing a set of files in Parquet or other formats as efficient analytical tables. SlateDB is an embedded key-value store built on an object store.
The post covers Iceberg and SlateDB, describing how each uses the object store to track table state for basic operations.
These systems have a lot in common: both treat the object store as the primary, or even the only, state.
#dataengineering #database
datapapers.substack.com/p/exploring-...
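Both designs rest on the same primitive: table state lives in immutable manifest objects, and a commit means writing a new manifest and swapping a single pointer. A minimal sketch of that idea — an in-memory dict stands in for S3/GCS, and the file layout is invented, not Iceberg's or SlateDB's actual format:

```python
import json

object_store = {}  # path -> contents; stand-in for S3/GCS
CURRENT_PTR = "table/metadata/current"

def commit(added_files):
    """Write a new immutable manifest, then swap the current pointer."""
    prev_path = object_store.get(CURRENT_PTR)
    prev = json.loads(object_store[prev_path]) if prev_path else {"version": 0, "files": []}
    manifest = {
        "version": prev["version"] + 1,
        "files": prev["files"] + added_files,
    }
    path = f"table/metadata/v{manifest['version']}.json"
    object_store[path] = json.dumps(manifest)
    object_store[CURRENT_PTR] = path  # the only mutable object in the store
    return manifest["version"]

commit(["data/part-000.parquet"])
commit(["data/part-001.parquet"])
current = json.loads(object_store[object_store[CURRENT_PTR]])
print(current)  # version 2, both data files listed
```

Because every manifest is immutable, old versions stay readable (time travel), and the only operation that needs atomicity is the pointer swap — which is exactly what both systems lean on a catalog or conditional write for.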
Stop slow ingestion and high costs. Learn advanced patterns for high-throughput data ingestion using Spark, Delta Lake, and Zero-Trust security. #dataengineering
Data Engineers don’t just move data; they engineer the trustworthy, intelligent foundations that power the AI revolution.
#DataEngineering #AIEngineering
pinei.github.io/Data/Fundame...
Agentic Design Patterns

A senior Google engineer dropped a 421-page doc called Agentic Design Patterns.
Every chapter is code-backed and covers the frontier of AI systems:
→ Prompt chaining, routing, memory
→ MCP & multi-agent coordination
→ Guardrails, reasoning, planning
#Agentic #AI
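Prompt chaining, the first pattern on the list, needs no framework to demonstrate: each step's output becomes the next step's input. The `llm` function below is a stub standing in for a real model call, and the three-step decomposition is an invented example:

```python
def llm(prompt: str) -> str:
    """Stub model call; a real system would hit an LLM API here."""
    return f"[answer to: {prompt[:40]}]"

def chain(question: str) -> str:
    # Step 1: extract what the question is really asking.
    intent = llm(f"Rephrase as a precise task: {question}")
    # Step 2: draft an answer to the clarified task.
    draft = llm(f"Complete this task: {intent}")
    # Step 3: self-review the draft before returning it.
    return llm(f"Review and tighten this answer: {draft}")

result = chain("Why is my Spark job slow?")
print(result)
```

Routing and multi-agent coordination are elaborations of the same shape: instead of a fixed sequence, an earlier model call decides which prompt (or which agent) runs next.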
Databricks has introduced Genie Code, an AI agent set to fundamentally change how data teams work.
Instead of merely assisting developers in writing code, the agent is said to independently take on complex tasks: building data pipelines, troubleshooting production systems, creating dashboards, and maintaining ongoing systems.
According to Ali Ghodsi, co-founder and CEO of Databricks, Genie Code points the way toward "agent-based data work."
#AI #DataEngineering
www.databricks.com/blog/introdu...
Andrej Karpathy just put out a tool that looks at AI's impact on jobs.
Basically, he pulled 342 job types from the Bureau of Labor Statistics and had an LLM score each one from 0 to 10 based on AI exposure.
The average exposure score is 5.3. Move the score, and you move the probability the job gets wiped out by AI.
- Software developers: 9/10
- Medical transcriptionists: 10/10
- Lawyers: 8/10
- General office clerks: 9/10
$3.7T in annual wages sits in high-exposure jobs (score 7+), pre-computed as ∑(BLS employment count × BLS median annual wage) over exactly those occupations whose Gemini Flash score is ≥ 7.
Basically, any screen-based job is in trouble.
He also deleted the original GitHub repo very quickly.
#AI #Employment #Jobs
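The $3.7T figure is a straightforward exposure-weighted sum. With a few occupation rows (the counts, wages, and scores below are illustrative, not actual BLS figures; the real tool covers 342 occupations), the computation looks like this:

```python
# (occupation, BLS employment count, BLS median annual wage, exposure score 0-10)
# All numbers below are made up for illustration.
occupations = [
    ("Software developers",        1_500_000, 120_000,  9),
    ("Medical transcriptionists",     50_000,  35_000, 10),
    ("General office clerks",      2_500_000,  38_000,  9),
    ("Electricians",                 700_000,  60_000,  2),
]

# Total wages in occupations with exposure score >= 7.
high_exposure_wages = sum(
    count * wage for _, count, wage, score in occupations if score >= 7
)
avg_score = sum(score for *_, score in occupations) / len(occupations)
print(f"${high_exposure_wages / 1e12:.2f}T in high-exposure jobs, mean score {avg_score}")
```

Note that the whole result hinges on the LLM-assigned scores: nudge a big occupation from 6 to 7 and its entire wage bill crosses into the "at risk" total.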
AI-Powered Coding
"If you're still typing for i in range every day, brace yourself: within 24 months, the market will demand your ability to orchestrate fleets of agents — not produce loops."
#AI #DataEngineering
rentry.co/svuxfxis
The future of software engineering
AI is changing software engineering by shifting the focus from writing code to supervising AI agents.
The future requires new tools, practices, and roles that help humans and AI work together effectively.
www.thoughtworks.com/content/dam/...
End-to-End Data Engineering Project: Food Order ETL Pipeline using MySQL & Power BI
This project shows a full ETL/analytics flow for a food‑ordering business, from raw operational data in MySQL to interactive dashboards in Power BI.
Source OLTP DB
Tables like customers, restaurants, menu items, orders, order_items, and payments hold raw, highly normalized data optimized for the ordering app, not reporting.
Staging schemas
Raw tables may be copied or materialized into staging tables where basic cleaning, type fixes, and simple joins happen.
Data warehouse / reporting schema
Dimensional or star‑like tables (e.g., dim_customer, dim_restaurant, dim_date, fact_orders) are built for analytics.
Extract
Periodic jobs (e.g., stored procedures, scripts, or an external tool) read new/changed rows from the OLTP MySQL database.
Data is loaded into staging tables without heavy logic, often as 1‑to‑1 copies of source tables plus load metadata.
Transform
Data quality: handle nulls, fix invalid values, standardize timestamps and currencies.
Business logic: derive status (completed/cancelled), order duration, delivery time, etc.
Dimensional modeling: create dimensions and facts with surrogate keys.
Load
Insert/update into warehouse tables, usually with upsert logic for slowly changing data like restaurant or menu details.
Create indexes and possibly summary/aggregate tables to speed up BI queries.
Power BI
Power BI connects to the warehouse SQL instance using a gateway or direct connection.
It builds relationships between dimension and fact tables, defines measures like Total Sales, Orders, and Avg Order Value, and filters by date/restaurant/region.
#dataengineering
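The upsert in the Load step can be sketched with sqlite3 standing in for MySQL (MySQL itself would use `INSERT ... ON DUPLICATE KEY UPDATE`; the `dim_restaurant` columns here are my guess at the project's schema). This is the type 1 overwrite variant of slowly changing dimensions — changed attributes simply replace the old values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE dim_restaurant (
        restaurant_id INTEGER PRIMARY KEY,   -- natural key from the OLTP source
        name TEXT,
        city TEXT
    )
""")

def upsert(rows):
    """Insert new restaurants, overwrite changed ones (SCD type 1)."""
    con.executemany(
        """
        INSERT INTO dim_restaurant (restaurant_id, name, city)
        VALUES (?, ?, ?)
        ON CONFLICT(restaurant_id) DO UPDATE
        SET name = excluded.name, city = excluded.city
        """,
        rows,
    )

upsert([(1, "Pasta Place", "Rome"), (2, "Sushi Spot", "Osaka")])
upsert([(2, "Sushi Spot", "Tokyo")])   # restaurant 2 moved: city is overwritten

rows = con.execute(
    "SELECT restaurant_id, city FROM dim_restaurant ORDER BY 1"
).fetchall()
print(rows)  # restaurant 2 now shows Tokyo
```

If the business needed to report on history ("sales by the city the restaurant was in at order time"), the dimension would instead get type 2 treatment: a new row per change plus effective-date columns.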
Flow of data and where people experience the problems
Image by Matt Arderne (Forbes)
Prompting is temporary.
Structure is permanent.
When your repo is organized this way, Claude stops behaving like a chatbot…
…and starts acting like a project-native engineer.