Kostas Pardalis (@cpard) Bsky

That’s a different concept though, right? As you said, here you have a proxy and you pick a different query engine over the same storage. I think using the term federation in this case will confuse people. I can see how this pattern can work.

6 months ago 2 0 0 0

Full scans on different data sources that then need to be joined are much closer to an ETL workload. This will kill every federated query engine.

Plus, what do you do when the semantics differ between query engines? For example, how do you handle decimal overflows?
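
To make the decimal overflow point concrete, here is a minimal sketch of two hypothetical engines that disagree on what DECIMAL(5, 2) addition does when the result does not fit. The function names and the strict/lenient split are assumptions for illustration and do not describe any specific engine.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# DECIMAL(5, 2): five digits in total, two after the decimal point.
MAX_VALUE = Decimal("999.99")
QUANTUM = Decimal("0.01")


def _add(a: Decimal, b: Decimal) -> Decimal:
    """Add two values and round the result to two decimal places."""
    return (a + b).quantize(QUANTUM, rounding=ROUND_HALF_UP)


def add_strict(a: Decimal, b: Decimal) -> Decimal:
    """Hypothetical engine A: fail the query on overflow."""
    result = _add(a, b)
    if abs(result) > MAX_VALUE:
        raise OverflowError(f"{result} does not fit DECIMAL(5, 2)")
    return result


def add_lenient(a: Decimal, b: Decimal) -> Optional[Decimal]:
    """Hypothetical engine B: return NULL (None) on overflow."""
    result = _add(a, b)
    return result if abs(result) <= MAX_VALUE else None
```

The same expression fails in one engine and silently yields NULL in the other, so a proxy that routes a query to either engine can change the answer.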

6 months ago 3 0 0 0

Oh no. Trino tried to do that. You really can’t do it. The problem with federated queries is that they work well only when you can push computation down to the engine you federate to and get back a highly reduced dataset. That’s not the case with ETL though.
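
A rough way to see the difference is to count the rows that have to leave each source. The sketch below is a toy simulation, not how Trino or any real engine is implemented; `RemoteSource`, the table shapes, and the row counts are all made up for illustration.

```python
from typing import Callable, Iterable, Optional

Row = dict


class RemoteSource:
    """Hypothetical remote engine that counts rows shipped to the coordinator."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = list(rows)
        self.rows_shipped = 0

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None) -> list:
        # With a predicate, filtering happens at the source (pushdown),
        # so only matching rows cross the network.
        selected = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(selected)
        return selected


def make_orders() -> RemoteSource:
    return RemoteSource(
        {"order_id": i, "customer_id": i % 100, "amount": i} for i in range(10_000)
    )


def make_customers() -> RemoteSource:
    return RemoteSource({"customer_id": i, "name": f"c{i}"} for i in range(100))


def join(orders: list, customers: list) -> list:
    """Hash join on customer_id, executed in the coordinator."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**o, **by_id[o["customer_id"]]} for o in orders if o["customer_id"] in by_id]


def is_customer_7(row: Row) -> bool:
    return row["customer_id"] == 7


# Selective lookup: the filter is pushed down to both sources.
orders, customers = make_orders(), make_customers()
selective = join(orders.scan(is_customer_7), customers.scan(is_customer_7))
selective_shipped = orders.rows_shipped + customers.rows_shipped

# ETL-style job: nothing to push down, so both tables are shipped in full.
orders, customers = make_orders(), make_customers()
full = join(orders.scan(), customers.scan())
full_shipped = orders.rows_shipped + customers.rows_shipped
```

In this toy setup the selective query ships 101 rows while the full join ships 10,100, and the gap grows with table size.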

6 months ago 3 0 2 0

fenic 0.4.0 brings fenic and its expressive API for working with data to agents.

With tooling becoming a catalog artifact, and MCP servers and toolsets available through a single CLI command, you can turn any dataset you have into well-curated context for your agents.

check it out!

7 months ago 0 0 0 0

New episode: chatting with bauplan founders Jacopo Tagliabue and Ciro Greco on shipping AI with real-world data constraints.

Why listen

1. Data pipelines determine model effectiveness, far more than most teams admit.

7 months ago 4 3 1 0

8 months ago 2 0 0 0

https://github.com/typedef-ai/fenic

7/7

Give it a try, ⭐ the repo, open issues and join the community!

👉 t.co/zDj8rBO5Ce

8 months ago 1 0 0 0

6/7

Performance & DX

Rust optimizations plus leaner default configs deliver performance gains and a frictionless setup experience, so you spend less time tuning and more time building.

8 months ago 0 0 1 0

5/7

New Functions & Models

Access built-in summarization, new semantic APIs, and multiple embedding providers (e.g. Cohere, Google Gemini) out of the box.

This broadens your toolkit, so you can prototype and productionize a wider range of AI workflows quickly.

8 months ago 0 0 1 0

4/7

Composable Pipelines

Save intermediate DataFrames as persistent views in the fenic catalog.

Reuse and chain complex transformations across jobs without rewriting or rerunning upstream logic, accelerating iteration and collaboration.

8 months ago 0 0 1 0

3/7

Typed Semantics

Define your output schema once with Pydantic and get back validated, strongly typed results.

This enforces consistency, surfaces errors early, and eliminates manual parsing of LLM responses.
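
The general idea of "schema once, validated results back" can be sketched with the standard library alone. This is not fenic's API and it uses a plain dataclass as a stand-in for a Pydantic model; the `Ticket` schema and `parse_response` helper are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Hypothetical output schema for an LLM extraction step."""

    product: str
    severity: int


# Expected field types, declared once next to the schema.
TICKET_FIELDS = {"product": str, "severity": int}


def parse_response(raw: str) -> Ticket:
    """Parse a raw LLM response and fail early if it does not match the schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    values = {}
    for name, expected in TICKET_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        # Exact type check, so that e.g. True is not accepted as an int.
        if type(data[name]) is not expected:
            raise ValueError(f"field {name} must be {expected.__name__}")
        values[name] = data[name]
    return Ticket(**values)
```

Downstream code then works with `Ticket` objects instead of raw strings, and a malformed response surfaces as an error at the parsing step rather than later in the pipeline.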

8 months ago 1 0 1 0

2/7

Robust Fuzzy Text Matching

Ground LLM outputs against your existing data: record linkage, deduplication, and typo-tolerant joins become first-class operations.

This improves precision in extraction pipelines and slashes downstream error rates.
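
As a rough illustration of a typo-tolerant join, the sketch below grounds extracted names against a list of known values using `difflib` from the Python standard library. It is not fenic's implementation; the canonical list, the `fuzzy_join` helper, and the 0.7 cutoff are assumptions chosen for the example.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical reference data to ground extracted values against.
CANONICAL = ["snowflake", "databricks", "cloudflare"]


def ground(value: str, cutoff: float = 0.7) -> Optional[str]:
    """Return the closest canonical name, or None if nothing is similar enough."""
    matches = get_close_matches(value.lower(), CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def fuzzy_join(extracted: list) -> dict:
    """Map each extracted string to its canonical match (or None)."""
    return {value: ground(value) for value in extracted}


result = fuzzy_join(["Snowflkae", "Cloudfare", "splunk"])
```

Values with small typos resolve to a canonical entry, while unrelated values stay unmatched instead of being forced onto the nearest name.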

8 months ago 0 0 1 0

Here's a bit more information on each of the new 🦊 fenic 🦊 features.

1/7 🧵

Dynamic Templating

Turn any struct or array column into a live prompt fragment. No more string concatenation hacks. You get per-row, data-driven prompts with minimal code, boosting relevance and reducing boilerplate.
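
The per-row idea can be sketched without any library beyond the standard one. The example below uses `string.Template` as a simple stand-in for Jinja and plain dicts as a stand-in for DataFrame rows; it is not fenic's API, and the row shape and `render_prompt` helper are hypothetical.

```python
from string import Template

# One template shared by all rows; $product and $comments are filled per row.
PROMPT = Template("Product: $product\nComments:\n$comments\nIs the feedback positive?")


def render_prompt(row: dict) -> str:
    """Build a prompt from a struct field (product) and an array field (comments)."""
    comments = "\n".join(f"- {c}" for c in row["comments"])
    return PROMPT.substitute(product=row["product"]["name"], comments=comments)


rows = [
    {"product": {"name": "fenic"}, "comments": ["easy setup", "fast"]},
    {"product": {"name": "other"}, "comments": ["confusing docs"]},
]
prompts = [render_prompt(row) for row in rows]
```

Each row produces its own prompt from the same template, which is the part that replaces manual string concatenation.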

8 months ago 0 0 1 0

GitHub - typedef-ai/fenic: Build reliable AI and agentic applications with DataFrames

check the repo for more information and give it a try!

github.com/typedef-ai/f...

8 months ago 1 1 0 0

Using Jinja templates to dynamically create prompts for semantic filtering in fenic.

fenic v0.3.0 is out and it's a release I'm really excited about!

Here are a few of the things that this release is introducing.

Jinja as a column function
Robust Fuzzy Text Matching
Full Pydantic support in all semantic operators
Persistent views
More Functions & Models
Perf & DX improvements

8 months ago 4 1 1 0

@steveklabnik.com joined us for an episode where we discussed why:
• Cargo & friendly errors > benchmarks
• 6-week releases > years-long committees
• How Rust united Ruby, FP & C++ devs
• Next-gen picks

and many more!

Check the episode on your favorite platform!

8 months ago 4 1 0 0

Everyone’s heads down on AI these days, but please take a break and soak in some deep systems wisdom from Josh Howards.

He’s one of the folks behind R2 at Cloudflare.

After all, whatever you build in AI will sit on top of these foundations.

check @totrrocks.bsky.social for the episode link.

10 months ago 2 0 0 0

Startups and new products increasingly prioritize serverless models to reduce user friction and accelerate adoption.

@philippemnoel.bsky.social from ep.12

11 months ago 2 1 0 0

The value proposition of formal methods becomes clear when dealing with complex distributed transactions involving multiple independent services.

Jayaprabhakar(JP) Kadarkarai from ep.5

11 months ago 1 1 0 0

User experience and developer interaction with complex data abstractions remain a significant challenge beyond the technical integration.

Nikhil Simha & Varant Zanoyan from ep.2

11 months ago 2 1 0 0

Successful AI developer tools must balance synchronous co-pilot style assistance with asynchronous autonomous agent workflows.

@ivanburazin.bsky.social from ep.9

11 months ago 2 1 0 0

Managing AI access and permissions requires careful role-based controls to prevent over-privileged AI actions in enterprise environments.

Well said, even before #MCP was as popular as it is today.
@ivanburazin.bsky.social from ep.9

11 months ago 1 1 0 0

I had the rare opportunity to sit down and chat with someone who helped shape that story of Splunk, co-founder Erik Swan.

There's a lot to learn from him, but what inspired me the most is his energy. Even after a success like Splunk, he is still learning and building.

listen here @totrrocks.bsky.social

11 months ago 1 0 0 0

Incremental materialization has stumped the industry for decades.

Epsio, led by Gilad, is changing that: product-first, real-world incremental views.

If real-time data infra matters to you, check out my chat with Gilad on @totrrocks.bsky.social

11 months ago 5 0 0 0

Streaming Democratized: Ease Across the Latency Spectrum with Delayed View Semantics and Snowflake Dynamic Tables
Streaming data pipelines remain challenging and expensive to build and maintain, despite significant advancements in stronger consistency, event time semantics, and SQL support over the last decade. P...

Just came across this! "transaction isolation in the presence of IVM remains underspecified." I was literally talking about this with @frankmcsherry.bsky.social 6 hours ago.

1 year ago 7 1 1 0

You should definitely check the project!

1 year ago 0 0 0 0

Lakekeeper is an open source data catalog built on the Apache Iceberg REST catalog API.

If data infrastructure drives you, check out the project and catch Viktor Kessler's insights on the latest @TotrRocks episode!

1 year ago 1 0 1 0

Tech on the Rocks | Reinventing Stream Processing: From LinkedIn to Responsive with Apurva Mehta
Summary: In this episode, Apurva Mehta, co-founder and CEO of Responsive, recounts his extensive journey in stream processing, from his early work at LinkedIn and Confluent to his current venture at R...

I'm always excited to chat with @apurvamehta.com about what @responsive.dev is building.

"Streaming" and "real time" are terms that keep being reinvented as market needs change rapidly, and Apurva is one of the best people to talk about that.

Check the conversation here: techontherocks.show/15

1 year ago 5 2 0 0

Peninsula Data Happy Hour · Luma 🔥 An Unmissable Evening of Data & Magic! 🔥 🎉 Back by popular demand, it's time for the March edition of our Peninsula Data Happy Hour! This time we've got…

We'll be hosting another event at our offices in San Mateo. We want to bring together people who are interested in data and infra, from systems engineers who build data platforms, AI engineers, VCs and everything in between.

Connect and have fun while we learn from each other.

lu.ma/2hc1qm1v

1 year ago 1 0 0 0

Tech on the Rocks | Semantic Layers: The Missing Link Between AI and Data with David Jayatillake from Cube
In this episode, we chat with David Jayatillake, VP of AI at Cube, about semantic layers and their crucial role in making AI work reliably with data. We explore how semantic layers act as a bridge ...

New episode: “Semantic Layers: The Missing Link Between AI and Data” with @jayatillake.bsky.social.

We discuss how semantic layers bridge raw data and AI, achieving 100% accuracy for natural language queries, and what’s next for LLM-powered data pipelines.

🎧 https://techontherocks.show/14 🎧

1 year ago 6 4 1 1
