Raw financial PDFs → structured agent-ready data. We'll build it live.
Register → landing.llamaindex.ai/liteparse
LiteParse hit 4K+ GitHub stars in 3 weeks. ~500 pages in 2 seconds. No GPU. No API keys. 50+ file formats.
Now @LoganMarkewich, our Head of Open Source, will show you how to build with it.
Live workshop — April 28, 9 AM PT: Build a Financial Due Diligence Agent with LiteParse.
📚Learn more about the problem, and how the skills solve it: <blog-link>
🦙 Get started with LlamaParse: cloud.llamaindex.ai/signup?utm_...
That’s why we created LlamaParse and the LiteParse Agent Skills: they give agents access to a deeper layer of document understanding, enabling more reliable knowledge extraction and automation across complex documents📝
When it comes to PDFs and other unstructured documents, most agents struggle. The tools they rely on often return only raw text, losing critical context like layout, tables, and images❌
Agents like OpenClaw are incredibly powerful, as long as the information they receive is clean and structured🦞
📚 Full breakdown: www.lancedb.com/blog/smart-...
🦙 Learn more about LiteParse: developers.llamaindex.ai/liteparse/?...
In our evaluations, the agent achieved near-perfect scores across most tasks, showing how strong parsing (LiteParse) plus multimodal storage (LanceDB) can significantly improve agentic search pipelines📈
1. LiteParse extracts structured text and captures page screenshots
2. We embed the text with Gemini 2 Embedding
3. Text, vectors, and images are stored in LanceDB
4. A Claude agent retrieves the relevant context and, if text isn’t enough, it falls back to image-based reasoning on the screenshots
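The four steps above can be sketched end to end. This is a minimal, self-contained mock — parsing, embedding, and storage are stubbed in plain Python, where the real pipeline would call LiteParse, the Gemini embedding API, and LanceDB — but it shows the text-first, image-fallback retrieval logic of step 4:

```python
import math

# Toy embedding: map text to a small fixed vector by character counts.
# Stands in for the Gemini embedding call in step 2.
def embed(text: str) -> list[float]:
    vec = [0.0] * 8
    for ch in text.lower():
        vec[ord(ch) % 8] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Step 1 (stub): "parsed" pages with text plus a screenshot reference.
pages = [
    {"text": "Q3 revenue grew 12% year over year", "screenshot": "page_1.png"},
    {"text": "See the chart for the segment breakdown", "screenshot": "page_2.png"},
]

# Steps 2-3 (stub): keep text, vector, and image path in one row,
# the way a LanceDB table would hold all three together.
table = [{**p, "vector": embed(p["text"])} for p in pages]

# Step 4: retrieve by vector similarity; if the best text match is weak,
# fall back to image-based reasoning on the stored screenshot.
def retrieve(query: str, threshold: float = 0.8) -> dict:
    qv = embed(query)
    best = max(table, key=lambda row: cosine(qv, row["vector"]))
    score = cosine(qv, best["vector"])
    if score >= threshold:
        return {"mode": "text", "answer_context": best["text"]}
    return {"mode": "image", "answer_context": best["screenshot"]}

print(retrieve("revenue growth"))
```

The key design point is storing the screenshot path alongside the text and vector, so the fallback needs no second lookup.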
Visually rich documents are especially challenging for agents.
Tables, charts, and images often break traditional document pipelines, making complex reasoning difficult📄
So we teamed up with LanceDB to build a structure-aware PDF QA pipeline🚀
Here’s how it works:
Open call to fintech leaders in NYC 🏦 May 13, in-person workshop with @jerryjliu0 on turning complex financial docs into LLM-ready data using agentic OCR. Build real pipelines. Hear from a Top 5 PE firm about the agent they run in production.
Make sure to bring your laptops → luma.com/updli8i6
Try Extract v2 today → cloud.llamaindex.ai
And for those who need a transition period: Extract v1 will remain accessible via the UI under 'Settings → General' for a limited time.
✦ 𝗖𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝗯𝗹𝗲 𝗱𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗽𝗮𝗿𝘀𝗶𝗻𝗴: now you can control how your documents get parsed before extraction, giving you more flexibility and better results end to end.
✦ 𝗣𝗿𝗲-𝘀𝗮𝘃𝗲𝗱 𝗲𝘅𝘁𝗿𝗮𝗰𝘁 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗶𝗼𝗻𝘀: load your saved extraction configs directly, so you can skip the setup and get straight to extracting.
✦ 𝗦𝗶𝗺𝗽𝗹𝗶𝗳𝗶𝗲𝗱 𝘁𝗶𝗲𝗿𝘀: we've replaced modes with cleaner, more intuitive tiers. (And stay tuned: agentic plus is coming to Extract too, very soon.)
After the release of Parse v2, Extract is also getting an upgrade — 𝗶𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗘𝘅𝘁𝗿𝗮𝗰𝘁 𝘃2! 🎉
We've been reworking the experience from the ground up to make document extraction more powerful and easier to use than ever.
Here's what's new:
LlamaIndex is proud to be named to the 2026 Enterprise Tech 30, #3 in the Early Stage category.
The ET30 is an annual list by @Wing_VC and Eric Newcomer, voted on by 90+ leading investors and corporate development leaders. It recognizes the private companies with the most potential to shape the future of enterprise technology.
Thank you to Wing Venture Capital and Eric Newcomer, and congratulations to all the companies honored this year.
We’ve moved to a new office and it’s time to celebrate. Swing by this Thursday to meet our team, grab a bite, and make new friends. Note: Space is limited, so please RSVP early. luma.com/mkh44c7w
• Retrieval: Query stored files with optional path-based filtering and configurable relevance thresholds
• Runtime: @bun.sh for speed and versatility
💻 Check out the repository and try it yourself: github.com/AstraBert/l...
📚 LiteParse docs: developers.llamaindex.ai/liteparse?u...
• Parsing: LiteParse, the fast and accurate document parser we recently open sourced
• Chunking: @chonkie.bsky.social
• Embeddings: A local model via @hf.co transformers.js
• Vector storage: A local @qdrant.bsky.social edge shard (custom-built in Rust and compiled as a native add-on)
Our OSS engineer @cle-does-things.bsky.social recently built 𝗹𝗶𝘁𝗲𝘀𝗲𝗮𝗿𝗰𝗵, a fully local document ingestion and retrieval CLI/TUI application powered by LiteParse ⚡
litesearch demonstrates how developers can assemble a high-performance, local-first pipeline using tools from across the ecosystem:
We show a complete invoice processing example where complex line-item tables get converted to clean JSON with preserved relationships and validated totals - ready for immediate ERP integration.
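As a sketch of the "validated totals" step: once line items land in JSON, checking that quantities × unit prices reconcile with the stated invoice total is a small guard worth running before anything reaches the ERP. The schema below is illustrative, not LlamaParse's actual output format:

```python
import json

# Illustrative extracted invoice -- the field names are assumptions,
# not LlamaParse's actual output schema.
extracted = json.loads("""
{
  "invoice_id": "INV-1042",
  "line_items": [
    {"description": "Widget A", "quantity": 3, "unit_price": 19.99},
    {"description": "Widget B", "quantity": 2, "unit_price": 5.00}
  ],
  "stated_total": 69.97
}
""")

def validate_totals(invoice: dict, tolerance: float = 0.01) -> bool:
    """Check that line items sum to the stated total before
    handing the record to a downstream ERP system."""
    computed = sum(
        item["quantity"] * item["unit_price"] for item in invoice["line_items"]
    )
    return abs(computed - invoice["stated_total"]) <= tolerance

print(validate_totals(extracted))  # 3*19.99 + 2*5.00 = 69.97 -> True
```

A failed check is a useful signal that a merged cell or multi-line row was mis-parsed, which is exactly where table extraction tends to go wrong.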
💼 Real-world applications across financial services, healthcare, and logistics - from invoice processing to lab results
⚡ How LlamaParse handles multi-line rows, merged cells, and borderless tables while maintaining logical consistency
📊 Why table extraction is fundamentally harder than standard text OCR - spatial relationships matter more than character recognition
🔧 The three core phases: detection, structure recognition, and data extraction with validation
Tables in PDFs aren't just text - they're structured data trapped in visual formats. Our new deep dive explains how modern OCR for tables reconstructs spatial relationships, preserves header hierarchies, and ensures data integrity across complex documents.
Transform your document processing with intelligent table extraction that goes beyond basic OCR.