I'll be speaking at the upcoming Voxel51 event in Stuttgart this Tuesday!
I'll talk about the anatomy of AI agents, with a focus on document agents and building good harnesses for them.
Swing by if you're interested, and check the official page for more details: voxel51.com/events/stut...
Posts by Clelia Astra Bertelli
I decided to write a blog post about it, mostly to document the building journey for myself, but also to share my thinking and coding process around sunbears: clelia.dev/blog/2026-0...
Enjoy!
I've been building sunbears, a TypeScript library for CSV data loading, written in Rust.
Explore more:
• Blog: www.llamaindex.ai/blog/parseb...
• Code: github.com/run-llama/P...
• Dataset: huggingface.co/datasets/ll...
What makes it different?
ParseBench optimizes for semantic correctness, not exact text matching.
That means evaluating whether parsed outputs are actually useful for humans and AI agents making downstream decisions.
It includes:
• 2,000+ human-reviewed enterprise documents
• 167,000 evaluation rules
• Coverage across 5 key areas: tables, charts, content faithfulness, semantic formatting, and visual grounding
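Semantic correctness can be approximated with rules that check whether required values appear in the normalized output, rather than demanding an exact string match. A minimal sketch of the idea (the rule shape and function names here are my own illustration, not ParseBench's actual rule format):

```typescript
// A toy "semantic correctness" rule: the parse passes if every required
// value appears in the output after normalizing case and whitespace,
// regardless of how the table or chart was formatted.
type Rule = { requiredValues: string[] };

function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

function passesRule(parsedOutput: string, rule: Rule): boolean {
  const haystack = normalize(parsedOutput);
  return rule.requiredValues.every((v) => haystack.includes(normalize(v)));
}

// Two parsers emit the same content with different formatting; both pass.
const markdownTable = "| Revenue | Q3 2024 |\n|---|---|\n| $1.2M | up 8% |";
const plainText = "Revenue (Q3 2024): $1.2M, up 8%";
const rule: Rule = { requiredValues: ["Revenue", "Q3 2024", "$1.2M"] };

console.log(passesRule(markdownTable, rule)); // true
console.log(passesRule(plainText, rule));     // true
```

The point of this framing is that a parser isn't penalized for rendering a table as Markdown versus prose, as long as the facts a downstream agent needs are present.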
ParseBench is here!
We've just released ParseBench, an open benchmark + dataset for evaluating document parsing at scale.
sunbears uses DataFrame as its primary data structure: a columnar format with strict typing, null and NaN filtering, and convenient column-to-array transformations.
Get started now: npm install @cle-does-things/sunbears
I'm building sunbears, a CSV data loader library for TypeScript, written in Rust.
In Node, it can read a file with 1,000,000 rows in 0.3s and write the same number of rows in 0.15s, respectively 4x and 2x faster than the `csv` package.
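Throughput claims like these are easy to sanity-check with a small timing harness. A rough sketch using an in-memory CSV and a naive split-based parse as the measured workload (the naive parser is mine, not sunbears or the `csv` package, and it ignores quoting and escaping; a real benchmark should run the actual libraries on the same file):

```typescript
// Generate a CSV in memory, then time a naive parse of it.
// Real comparisons should parse identical input with each library
// under identical conditions and average several runs.
const ROWS = 100_000; // scaled down from 1,000,000 for a quick run

const header = "id,name,value";
const lines: string[] = [header];
for (let i = 0; i < ROWS; i++) {
  lines.push(`${i},row_${i},${(i * 0.5).toFixed(2)}`);
}
const csv = lines.join("\n");

const start = performance.now();
// Naive parse: split into rows and cells (no quote handling!).
const parsed = csv.split("\n").slice(1).map((line) => line.split(","));
const elapsed = performance.now() - start;

console.log(`parsed ${parsed.length} rows in ${elapsed.toFixed(1)} ms`);
```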
Visually rich documents are especially challenging for agents.
Tables, charts, and images often break traditional document pipelines, making complex reasoning difficult.
So we teamed up with LanceDB to build a structure-aware PDF QA pipeline.
Hereโs how it works:
- Parse files and take page-level screenshots with LiteParse, the parser we just open sourced at LlamaIndex
- Chunk and embed text, and store everything (text, image bytes, vector data) in a local LanceDB instance
- Expose text and image retrieval tools to a Claude agent, and let it reason on both
With our eval dataset, the agent got near-perfect scores on most complex QA tasks, showing how a strong parsing foundation and multimodal retrieval can really improve your search.
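The chunk-and-embed step can be as simple as fixed-size windows with overlap, so content near a boundary (say, a table row) keeps some surrounding context in two adjacent chunks. A minimal sketch (the sizes and function name are illustrative, not the pipeline's actual settings):

```typescript
// Split text into fixed-size chunks with overlap so that content near
// chunk boundaries appears in two adjacent chunks.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}

const doc = "Page 1: revenue table. Page 2: growth chart. Page 3: summary.";
const chunks = chunkText(doc, 30, 10);
console.log(chunks.length); // 3
```

Each chunk (plus its page screenshot) would then be embedded and written to the vector store alongside the raw text and image bytes.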
How can you improve your agentic search pipeline?
I just wrote a blog post in collab with LanceDB to answer exactly that.
TLDR:
Learn how it works in the blog post: auth0.com/blog/securi...
Get started with LlamaParse: cloud.llamaindex.ai/signup
That's why we teamed up with @auth0byokta.bsky.social to build a real-world demo of a secure document processing and retrieval pipeline, powered by fine-grained authorization so only trusted actors can access specific content.
That starts with powerful document processing building blocks like LlamaParse and LlamaExtract, but great agents also need the right access controls, as they should only see the documents they're authorized to use.
At @llamaindex.bsky.social, we're committed to building the most capable document agents.
PS: I'll follow up with a blog post on my experience while creating this library!
For now, sunbears focuses on fast CSV reading, but I'm planning to expand the library further and keep improving performance over time.
Give it a star: github.com/AstraBert/s...
Install with npm install @cle-does-things/sunbears
In benchmarks, sunbears can load a CSV with 1 million rows in about 0.4 seconds, making it roughly 3× faster than csv-parse, although still about 2× slower than Polars in Python.
sunbears converts CSV files into a DataFrame, a tabular data structure with strictly typed columns whose values can be easily extracted as arrays and used with familiar operations like map and filter.
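The column-to-array pattern is easy to picture with a toy columnar structure. A self-contained sketch of the concept (this is my illustration, not sunbears' actual API; class and method names are hypothetical):

```typescript
// A toy columnar DataFrame: each column is a plain array, and nulls/NaNs
// are filtered out when a column is extracted for use with map/filter.
type Cell = number | string | null;

class TinyDataFrame {
  constructor(private columns: Record<string, Cell[]>) {}

  // Extract a column as a plain array, dropping nulls and NaNs.
  column(name: string): Cell[] {
    const col = this.columns[name];
    if (!col) throw new Error(`unknown column: ${name}`);
    return col.filter(
      (v) => v !== null && !(typeof v === "number" && Number.isNaN(v))
    );
  }
}

const df = new TinyDataFrame({
  city: ["Stuttgart", "Berlin", "Munich"],
  temp: [21.5, NaN, 18.0],
});

// Familiar array operations on an extracted column:
const temps = df.column("temp") as number[];
const warm = temps.filter((t) => t > 20).map((t) => `${t} °C`);
console.log(warm); // [ "21.5 °C" ]
```

The columnar layout is what makes this cheap: extracting a column is just handing back (a filtered view of) one array, with no row-by-row restructuring.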
I just published a TypeScript library for loading CSV data, with an API inspired by Pandas and @pola.rs, but fully written in Rust.
Our OSS engineer @cle-does-things.bsky.social recently built litesearch, a fully local document ingestion and retrieval CLI/TUI application powered by LiteParse.
litesearch demonstrates how developers can assemble a high-performance, local-first pipeline using tools from across the ecosystem:
- Parse your unstructured documents with LiteParse, the lightning-fast parser that we just open sourced at @llamaindex.bsky.social
- Chunk with @chonkie.bsky.social
- Embed with a local model through @hf.co transformers.js
- Store embeddings in a local @qdrant.bsky.social edge shard (custom-built in Rust and compiled as a native add-on)
- Retrieve from stored files with (optional) path-based filtering and a relevance threshold
The app runs on @bun.sh, so make sure you have it installed.
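The retrieval step, optional path-based filtering plus a relevance threshold, boils down to cosine similarity over stored vectors. A self-contained sketch of that logic (the in-memory store and function names are mine, standing in for the actual Qdrant-backed shard):

```typescript
// In-memory stand-in for the vector store: each entry keeps the source
// path alongside its embedding.
type Entry = { path: string; text: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Retrieve entries scoring above a relevance threshold, optionally
// restricted to paths under a given prefix, best matches first.
function retrieve(
  store: Entry[],
  query: number[],
  threshold: number,
  pathPrefix?: string
): Entry[] {
  return store
    .filter((e) => !pathPrefix || e.path.startsWith(pathPrefix))
    .map((e) => ({ entry: e, score: cosine(e.vector, query) }))
    .filter((s) => s.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .map((s) => s.entry);
}

const store: Entry[] = [
  { path: "docs/report.pdf", text: "quarterly revenue", vector: [1, 0, 0] },
  { path: "docs/notes.md", text: "meeting notes", vector: [0, 1, 0] },
  { path: "img/chart.png", text: "growth chart", vector: [0.9, 0.1, 0] },
];

const hits = retrieve(store, [1, 0, 0], 0.5, "docs/");
console.log(hits.map((h) => h.path)); // [ "docs/report.pdf" ]
```

The threshold keeps low-relevance chunks out of the agent's context, and the path filter scopes a query to one folder of ingested files.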
Hey there, I built litesearch, a fully local document ingestion and retrieval CLI and TUI app, powered by LiteParse.