Posts by Data Code 101

Apache Iceberg v3 just entered Public Preview on Databricks, marking a new era for the open lakehouse and open table formats.
Until now, many incremental and semi-structured workloads depended on fragile workarounds in data lakes. Iceberg v3 removes those hacks with a design built for modern workloads.
Iceberg v3 brings advances for incremental data processing: better support for updates, deletes, and CDC via features like deletion vectors and row-level lineage.
This means more efficient ingestion, faster commits, and more scalable table operations in the open lakehouse.
The real game changer: one copy of data, multiple formats/engines. Iceberg v3 brings Iceberg, Delta, and Parquet closer together, cutting lock-in and avoiding data rewrites in interoperable pipelines.
With the VARIANT type, logs, API events, and streams become first-class citizens: you query ever-changing data without schema migrations, but still get near-columnar performance.
#dataengineering #databricks #iceberg
www.databricks.com/blog/next-er...
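The schema-on-read idea behind VARIANT — querying heterogeneous events by path without ever migrating a schema — can be sketched in plain Python. This is a toy analogy, not the Databricks implementation; the event fields and the `get_path` helper are invented for illustration:

```python
import json

# Heterogeneous events, as they might arrive from logs or an API stream.
# No two records need to share a schema.
raw_events = [
    '{"user": "a1", "action": "click", "meta": {"button": "buy"}}',
    '{"user": "b2", "action": "scroll", "depth": 0.8}',
    '{"user": "a1", "action": "click"}',
]

def get_path(doc, path, default=None):
    """Walk a dotted path into a parsed JSON document, VARIANT-style."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return default
        doc = doc[key]
    return doc

events = [json.loads(e) for e in raw_events]
clicks = [e for e in events if get_path(e, "action") == "click"]
buttons = [get_path(e, "meta.button") for e in clicks]
print(len(clicks), buttons)  # two clicks; one carries meta.button, one does not
```

The point of the real VARIANT type is that the engine shreds these paths into a columnar layout under the hood, which is where the near-columnar performance comes from.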
Iceberg & SlateDB

Open table formats (Iceberg, Hudi, and Delta) define the rules for organizing a set of files in Parquet or other formats as efficient analytical tables. SlateDB is an embedded key-value store built on an object store.
The post covers Iceberg and SlateDB, describing how each uses the object store to track table state for basic operations.
These systems have a lot in common: both treat the object store as the primary, or even the only, state.
#dataengineering #database
datapapers.substack.com/p/exploring-...
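Both designs rest on the same primitive: table state lives in immutable manifest objects, and a commit means writing a new manifest and swapping a single pointer. A minimal sketch of that idea — an in-memory dict stands in for S3/GCS, and the file layout is invented, not Iceberg's or SlateDB's actual format:

```python
import json

object_store = {}  # path -> contents; stand-in for S3/GCS
CURRENT_PTR = "table/metadata/current"

def commit(added_files):
    """Write a new immutable manifest, then swap the current pointer."""
    prev_path = object_store.get(CURRENT_PTR)
    prev = json.loads(object_store[prev_path]) if prev_path else {"version": 0, "files": []}
    manifest = {
        "version": prev["version"] + 1,
        "files": prev["files"] + added_files,
    }
    path = f"table/metadata/v{manifest['version']}.json"
    object_store[path] = json.dumps(manifest)
    object_store[CURRENT_PTR] = path  # the only mutable object in the store
    return manifest["version"]

commit(["data/part-000.parquet"])
commit(["data/part-001.parquet"])
current = json.loads(object_store[object_store[CURRENT_PTR]])
print(current)  # version 2, both data files listed
```

Because every manifest is immutable, old versions stay readable (time travel), and the only operation that needs atomicity is the pointer swap — which is exactly what both systems lean on a catalog or conditional write for.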
Stop slow ingestion and high costs. Learn advanced patterns for high-throughput data ingestion using Spark, Delta Lake, and Zero-Trust security. #dataengineering
Data Engineers don’t just move data; they engineer the trustworthy, intelligent foundations that power the AI revolution.
#DataEngineering #AIEngineering
pinei.github.io/Data/Fundame...
Agentic Design Patterns

A senior Google engineer dropped a 421-page doc called Agentic Design Patterns.
Every chapter is code-backed and covers the frontier of AI systems:
→ Prompt chaining, routing, memory
→ MCP & multi-agent coordination
→ Guardrails, reasoning, planning
#Agentic #AI
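Prompt chaining, the first pattern on the list, needs no framework to demonstrate: each step's output becomes the next step's input. The `llm` function below is a stub standing in for a real model call, and the three-step decomposition is an invented example:

```python
def llm(prompt: str) -> str:
    """Stub model call; a real system would hit an LLM API here."""
    return f"[answer to: {prompt[:40]}]"

def chain(question: str) -> str:
    # Step 1: extract what the question is really asking.
    intent = llm(f"Rephrase as a precise task: {question}")
    # Step 2: draft an answer to the clarified task.
    draft = llm(f"Complete this task: {intent}")
    # Step 3: self-review the draft before returning it.
    return llm(f"Review and tighten this answer: {draft}")

result = chain("Why is my Spark job slow?")
print(result)
```

Routing and multi-agent coordination are elaborations of the same shape: instead of a fixed sequence, an earlier model call decides which prompt (or which agent) runs next.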
Databricks has introduced Genie Code, an AI agent set to fundamentally change how data teams work.
Instead of merely assisting developers in writing code, the agent is said to independently take on complex tasks: building data pipelines, troubleshooting production systems, creating dashboards, and maintaining ongoing systems.
According to Ali Ghodsi, co-founder and CEO of Databricks, Genie Code points the way toward "agent-based data work."
#AI #DataEngineering
www.databricks.com/blog/introdu...
Andrej Karpathy just put out a tool that looks at AI's impact on jobs.
Basically, he pulled 342 job types from the Bureau of Labor Statistics and had an LLM score each one from 0 to 10 based on AI exposure.
The average exposure score is 5.3. Move the score, and you move the probability the job gets wiped out by AI.
- Software developers: 9/10
- Medical transcriptionists: 10/10
- Lawyers: 8/10
- General office clerks: 9/10
$3.7T in annual wages sits in high-exposure jobs (score 7+), pre-computed as ∑(BLS employment count × BLS median annual wage) over exactly those occupations whose Gemini Flash score is ≥ 7.
Basically, any screen-based job is in trouble.
He also deleted the original GitHub repo very quickly.
#AI #Employment #Jobs
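The $3.7T figure is a straightforward exposure-weighted sum. With a few occupation rows (the counts, wages, and scores below are illustrative, not actual BLS figures; the real tool covers 342 occupations), the computation looks like this:

```python
# (occupation, BLS employment count, BLS median annual wage, exposure score 0-10)
# All numbers below are made up for illustration.
occupations = [
    ("Software developers",        1_500_000, 120_000,  9),
    ("Medical transcriptionists",     50_000,  35_000, 10),
    ("General office clerks",      2_500_000,  38_000,  9),
    ("Electricians",                 700_000,  60_000,  2),
]

# Total wages in occupations with exposure score >= 7.
high_exposure_wages = sum(
    count * wage for _, count, wage, score in occupations if score >= 7
)
avg_score = sum(score for *_, score in occupations) / len(occupations)
print(f"${high_exposure_wages / 1e12:.2f}T in high-exposure jobs, mean score {avg_score}")
```

Note that the whole result hinges on the LLM-assigned scores: nudge a big occupation from 6 to 7 and its entire wage bill crosses into the "at risk" total.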
AI-Powered Coding
"If you're still typing for i in range every day, brace yourself: within 24 months, the market will demand your ability to orchestrate fleets of agents — not produce loops."
#AI #DataEngineering
rentry.co/svuxfxis
The future of software engineering
AI is changing software engineering by shifting the focus from writing code to supervising AI agents.
The future requires new tools, practices, and roles that help humans and AI work together effectively.
www.thoughtworks.com/content/dam/...
End-to-End Data Engineering Project: Food Order ETL Pipeline using MySQL & Power BI
This project shows a full ETL/analytics flow for a food‑ordering business, from raw operational data in MySQL to interactive dashboards in Power BI.
Source OLTP DB
Tables like customers, restaurants, menu items, orders, order_items, and payments hold raw, highly normalized data optimized for the ordering app, not reporting.
Staging schemas
Raw tables may be copied or materialized into staging tables where basic cleaning, type fixes, and simple joins happen.
Data warehouse / reporting schema
Dimensional or star‑like tables (e.g., dim_customer, dim_restaurant, dim_date, fact_orders) are built for analytics.
Extract
Periodic jobs (e.g., stored procedures, scripts, or an external tool) read new/changed rows from the OLTP MySQL database.
Data is loaded into staging tables without heavy logic, often as 1‑to‑1 copies of source tables plus load metadata.
Transform
Data quality: handle nulls, fix invalid values, standardize timestamps and currencies.
Business logic: derive status (completed/cancelled), order duration, delivery time, etc.
Dimensional modeling: create dimensions and facts with surrogate keys.
Load
Insert/update into warehouse tables, usually with upsert logic for slowly changing data like restaurant or menu details.
Create indexes and possibly summary/aggregate tables to speed up BI queries.
Power BI
Power BI connects to the warehouse SQL instance using a gateway or direct connection.
It builds relationships between dimension and fact tables, defines measures like Total Sales, Orders, and Avg Order Value, and filters by date/restaurant/region.
#dataengineering
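The upsert in the Load step can be sketched with sqlite3 standing in for MySQL (MySQL itself would use `INSERT ... ON DUPLICATE KEY UPDATE`; the `dim_restaurant` columns here are my guess at the project's schema). This is the type 1 overwrite variant of slowly changing dimensions — changed attributes simply replace the old values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE dim_restaurant (
        restaurant_id INTEGER PRIMARY KEY,   -- natural key from the OLTP source
        name TEXT,
        city TEXT
    )
""")

def upsert(rows):
    """Insert new restaurants, overwrite changed ones (SCD type 1)."""
    con.executemany(
        """
        INSERT INTO dim_restaurant (restaurant_id, name, city)
        VALUES (?, ?, ?)
        ON CONFLICT(restaurant_id) DO UPDATE
        SET name = excluded.name, city = excluded.city
        """,
        rows,
    )

upsert([(1, "Pasta Place", "Rome"), (2, "Sushi Spot", "Osaka")])
upsert([(2, "Sushi Spot", "Tokyo")])   # restaurant 2 moved: city is overwritten

rows = con.execute(
    "SELECT restaurant_id, city FROM dim_restaurant ORDER BY 1"
).fetchall()
print(rows)  # restaurant 2 now shows Tokyo
```

If the business needed to report on history ("sales by the city the restaurant was in at order time"), the dimension would instead get type 2 treatment: a new row per change plus effective-date columns.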
Flow of data and where people experience the problems
Image by Matt Arderne (Forbes)
Prompting is temporary.
Structure is permanent.
When your repo is organized this way, Claude stops behaving like a chatbot…
…and starts acting like a project-native engineer.