Exploring Apache Iceberg and SlateDB formats - with a repo link for additional exploration.
datapapers.substack.com/p/exploring-...
Posts by Vignesh Chandramohan
Tests are more comprehensive than spec-level validation.
Isn't a spec still more deterministic, more token-efficient, and in theory faster for coding agents to converge on? Would specs being more approachable for human reviewers, and generated by coding agents themselves, change the equation?
Related items:
arxiv.org/abs/2509.00997 - talks about agents' query patterns and how data systems should adapt.
www.malloydata.dev - another promising query language; the claim is that complex queries can be expressed in simpler form than SQL, making LLMs make fewer mistakes.
@jayaprabhakar.bsky.social Interesting read on formal verification.
Short talk on Iceberg use cases in last week's Seattle Iceberg meetup.
youtu.be/F7qpOVVnxek?...
It has three increasingly verbose levels of description. They are probably trying to optimize the initial set of searches. And with markdown-based skills, custom workflows are approachable for a broader audience compared to MCP, and probably safer too.
It is only available in Claude Desktop.
If you find yourself in SF next week, @almog.xyz is talking about SlateDB at the SF Systems Meetup on Wednesday!
Tools, prompts, sampling - all of these seem to be a result of generalizing how Claude Code / the research feature was built over time, and extracting the ask out of those patterns. Uniformity is the biggest, and probably only, benefit.
And maybe the goal is not solutions that spend fewer tokens?
Intend to follow this along. Done with chapter 1, looking forward to the next one!
2/ This mindset improves productivity and outcomes significantly over time.
1/ Nice read:
medium.com/@xiafan/time...
Software engineer growth in the AI era:
* Build composable tools.
* Assume non-deterministic outcomes from a group of AI agents.
* Understand how LLMs work at the next level of depth, the way you would learn to read a query plan.
Just added sqlync.com to SlateDB's adopters list! They're building a streaming system that speaks MQTT or PostgreSQL across millions of connected users and devices. 🤯
3/ And as an extension, how it handles maintenance operations such as vacuum on Iceberg tables, done out of band.
2/ I assume the Iceberg writes use the Iceberg open source libraries. That would ensure the write path continues to evolve with Iceberg advancements.
I don't yet know if this handles compacted topics (which would introduce deletes on Iceberg).
1/ Leveraging the Remote Storage Manager and storing Kafka segments as Parquet files + Iceberg metadata is really good. It avoids having to consume, serialize, and manage a separate process.
I wonder if Confluent's TableFlow, launched about a year back, has a similar design. www.confluent.io/blog/introdu...
Love the idea. Could some of these eventually become subprojects hosted in the SlateDB organization as separate repos? Starting projects with that potential as GitHub issues with a specific tag would make them easy to track.
Insane amount of SlateDB work going on:
- snapshot reads
- split/merge DBs (zero copy)
- deterministic simulation testing
And someone just pushed Python bindings in a PR! 🤯
My Data council talk on SlateDB.
youtu.be/gcTRXZeKbNg?...
Got it. So, if I wanted a view to update incrementally, say once an hour, would I create an "hourly view" that uses now() and join against it?
Clock tick as an input is indeed a way to model it! Would the clock tick table be joined in all views that need this property?
Finally got to read this.
One additional aspect of IVM is reasoning about the data in the computed view. For a lot of use cases, it is easy to think of a view/table as moving in predictable increments (day, hour, 15 minutes, etc.). This notion is not modeled as a first-class concept in many systems.
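A minimal Python sketch of the clock-tick idea discussed above (all names here are hypothetical, not any specific IVM engine's API): the tick stream is the only "changing" input, so the view advances in predictable hourly increments and only the windows whose tick fired since the last run are recomputed.

```python
import datetime as dt

def hourly_ticks(start, end):
    """Yield hourly tick timestamps in [start, end)."""
    t = start
    while t < end:
        yield t
        t += dt.timedelta(hours=1)

def incremental_view(events, last_tick, now):
    """Recompute only the hourly windows whose tick fired since last_tick.

    events: iterable of event timestamps.
    Returns {window_start: event_count} for each newly closed window.
    """
    results = {}
    for tick in hourly_ticks(last_tick, now):
        window_end = tick + dt.timedelta(hours=1)
        # Aggregate events falling in [tick, window_end).
        results[tick] = sum(1 for e in events if tick <= e < window_end)
    return results
```

The design point: because the tick table is explicit, every view that needs "advance once an hour" semantics joins (here: iterates) against the same tick stream, instead of each view calling now() ad hoc.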
SlateDB 0.6.0 is out!
github.com/slatedb/slat...
Highlights include a hybrid cache (using Foyer), a lot of internal cleanup, and more groundwork for transactions.
Oh, and put performance jumped ~80% for write-heavy workloads :)
slatedb.io/performance/...
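For readers unfamiliar with the term, here is a toy sketch of what a hybrid (memory + disk) cache does in general; this is not Foyer's or SlateDB's actual implementation or API, and all names are made up. Hot entries stay in memory; evicted entries spill to a disk tier and get promoted back on access.

```python
import os
import tempfile

class HybridCache:
    """Toy two-tier cache: a bounded in-memory dict backed by a disk directory."""

    def __init__(self, mem_capacity, disk_dir):
        self.mem_capacity = mem_capacity
        self.mem = {}  # hot tier; dict preserves insertion order
        self.disk_dir = disk_dir

    def _disk_path(self, key):
        return os.path.join(self.disk_dir, key)

    def put(self, key, value):
        self.mem[key] = value
        if len(self.mem) > self.mem_capacity:
            # Evict the oldest in-memory entry to the disk tier
            # instead of dropping it outright.
            old_key = next(iter(self.mem))
            old_val = self.mem.pop(old_key)
            with open(self._disk_path(old_key), "wb") as f:
                f.write(old_val)

    def get(self, key):
        if key in self.mem:
            return self.mem[key]
        path = self._disk_path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                value = f.read()
            self.put(key, value)  # promote back to the hot tier
            return value
        return None
```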
Today marks SlateDB’s one year anniversary! It’s been a lot of fun. Thanks to @rohanpd.bsky.social @flaneur2024.bsky.social @almog.ai @vigneshc.bsky.social @paulbutler.org Jason Gustafson, David Moravek, and many others for joining the project. 😀
Commonhaus is 1! 🎂
14 projects, solid foundations, and more on the way.
If you believe in light governance, shared care, and thoughtful support for open source, come see what we’re building.
www.commonhaus.org/activity/253...
Yo SF Bay Area #databs crew, want to talk lakehouses at a real Lake House? :)
Next week after Data Council, join the founders of @clickhouse.com, @motherduck.com, @startreedata.bsky.social, and @tobikodata.com to talk real-time databases and next-generation ETL.
www.rilldata.com/events/data-...
SlateDB 0.5.0 is out!
Features:
- Checkpoints
- Clones
- Read-only client
- Split/merge database foundation
- TTL filtering on reads
- Last version with breaking byte format changes
By the numbers:
- 62 commits
- 2 new contributors
- 10 total contributors
github.com/slatedb/slat...
The DEBS conference hosts a grand challenge every year. This year's challenge is detecting outliers in a stream of images from laser powder bed fusion.
The challenge involves submitting a Kubernetes app (constraint: 2 cores, 8 GB). Interesting to try if you have the time!
2025.debs.org/call-for-gra...
Great episode!
Towards the end @vanlightly.bsky.social mentions alloytools.org finding a data model bug.
Never thought of the intersection between data modeling and formal verification. Do you have more details on this?
Python Folks - which data/workflow engine has the best developer experience for packaging code? We have looked into - Modal, Beam, Airflow, Flyte, AWS Lambda, Prefect, Dagster and Spark. Haven’t seen any approach which is fast, reliable and intuitive.