Raw financial PDFs → structured agent-ready data. We'll build it live.
Register → landing.llamaindex.ai/liteparse
LiteParse hit 4K+ GitHub stars in 3 weeks. ~500 pages in 2 seconds. No GPU. No API keys. 50+ file formats.
Now @LoganMarkewich, our Head of Open Source, will show you how to build with it.
Live workshop — April 28, 9 AM PT: Build a Financial Due Diligence Agent with LiteParse.
📚Learn more about the problem, and how the skills solve it: <blog-link>
🦙 Get started with LlamaParse: cloud.llamaindex.ai/signup?utm_...
That’s why we created LlamaParse and the LiteParse Agent Skills: they give agents access to a deeper layer of document understanding, enabling more reliable knowledge extraction and automation across complex documents📝
When it comes to PDFs and other unstructured documents, most agents struggle. The tools they rely on often return only raw text, losing critical context like layout, tables, and images❌
Agents like OpenClaw are incredibly powerful, as long as the information they receive is clean and structured🦞
📚 Full breakdown: www.lancedb.com/blog/smart-...
🦙 Learn more about LiteParse: developers.llamaindex.ai/liteparse/?...
In our evaluations, the agent achieved near-perfect scores across most tasks, showing how strong parsing (LiteParse) plus multimodal storage (LanceDB) can significantly improve agentic search pipelines📈
1. LiteParse extracts structured text and captures page screenshots
2. We embed the text with Gemini 2 Embedding
3. Text, vectors, and images are stored in LanceDB
4. A Claude agent retrieves the relevant context and, if text isn’t enough, it falls back to image-based reasoning on the screenshots
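The four steps above can be sketched end to end. This is a minimal, self-contained mock — parsing, embedding, and storage are stubbed in plain Python, where the real pipeline would call LiteParse, the Gemini embedding API, and LanceDB — but it shows the text-first, image-fallback retrieval logic of step 4:

```python
import math

# Toy embedding: map text to a small fixed vector by character counts.
# Stands in for the Gemini embedding call in step 2.
def embed(text: str) -> list[float]:
    vec = [0.0] * 8
    for ch in text.lower():
        vec[ord(ch) % 8] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Step 1 (stub): "parsed" pages with text plus a screenshot reference.
pages = [
    {"text": "Q3 revenue grew 12% year over year", "screenshot": "page_1.png"},
    {"text": "See the chart for the segment breakdown", "screenshot": "page_2.png"},
]

# Steps 2-3 (stub): keep text, vector, and image path in one row,
# the way a LanceDB table would hold all three together.
table = [{**p, "vector": embed(p["text"])} for p in pages]

# Step 4: retrieve by vector similarity; if the best text match is weak,
# fall back to image-based reasoning on the stored screenshot.
def retrieve(query: str, threshold: float = 0.8) -> dict:
    qv = embed(query)
    best = max(table, key=lambda row: cosine(qv, row["vector"]))
    score = cosine(qv, best["vector"])
    if score >= threshold:
        return {"mode": "text", "answer_context": best["text"]}
    return {"mode": "image", "answer_context": best["screenshot"]}

print(retrieve("revenue growth"))
```

The key design point is storing the screenshot path alongside the text and vector, so the fallback needs no second lookup.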
Visually rich documents are especially challenging for agents.
Tables, charts, and images often break traditional document pipelines, making complex reasoning difficult📄
So we teamed up with LanceDB to build a structure-aware PDF QA pipeline🚀
Here’s how it works:
Open call to fintech leaders in NYC 🏦 May 13, in-person workshop with @jerryjliu0 on turning complex financial docs into LLM-ready data using agentic OCR. Build real pipelines. Hear from a Top 5 PE firm about the agent they run in production.
Make sure to bring your laptops → luma.com/updli8i6
Try Extract v2 today → cloud.llamaindex.ai
And for those who need a transition period: Extract v1 will remain accessible via the UI under 'Settings → General' for a limited time.
✦ 𝗖𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝗯𝗹𝗲 𝗱𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗽𝗮𝗿𝘀𝗶𝗻𝗴: now you can control how your documents get parsed before extraction, giving you more flexibility and better results end to end.
✦ 𝗣𝗿𝗲-𝘀𝗮𝘃𝗲𝗱 𝗲𝘅𝘁𝗿𝗮𝗰𝘁 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗶𝗼𝗻𝘀: load your saved extraction configs directly, so you can skip the setup and get straight to extracting.
✦ 𝗦𝗶𝗺𝗽𝗹𝗶𝗳𝗶𝗲𝗱 𝘁𝗶𝗲𝗿𝘀: we've replaced modes with cleaner, more intuitive tiers. (And stay tuned: agentic plus is coming to Extract too, very soon.)
After the release of Parse v2, Extract is also getting an upgrade — 𝗶𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗘𝘅𝘁𝗿𝗮𝗰𝘁 𝘃2! 🎉
We've been reworking the experience from the ground up to make document extraction more powerful and easier to use than ever.
Here's what's new:
LlamaIndex is proud to be named to the 2026 Enterprise Tech 30, #3 in the Early Stage category.
The ET30 is an annual list by @Wing_VC and Eric Newcomer, voted on by 90+ leading investors and corporate development leaders. It recognizes the private companies with the most potential to shape the future of enterprise technology.
Thank you to Wing Venture Capital and Eric Newcomer, and congratulations to all the companies honored this year.
We’ve moved to a new office and it’s time to celebrate. Swing by this Thursday to meet our team, grab a bite, and make new friends. Note: Space is limited, so please RSVP early. luma.com/mkh44c7w
• Retrieval: Query stored files with optional path-based filtering and configurable relevance thresholds
• Runtime: @bun.sh for speed and versatility
💻 Check out the repository and try it yourself: github.com/AstraBert/l...
📚 LiteParse docs: developers.llamaindex.ai/liteparse?u...
• Parsing: LiteParse, the fast and accurate document parser we recently open sourced
• Chunking: @chonkie.bsky.social
• Embeddings: A local model via @hf.co transformers.js
• Vector storage: A local @qdrant.bsky.social edge shard (custom-built in Rust and compiled as a native add-on)
Our OSS engineer @cle-does-things.bsky.social recently built 𝗹𝗶𝘁𝗲𝘀𝗲𝗮𝗿𝗰𝗵, a fully local document ingestion and retrieval CLI/TUI application powered by LiteParse ⚡
litesearch demonstrates how developers can assemble a high-performance, local-first pipeline using tools from across the ecosystem:
We show a complete invoice processing example where complex line-item tables get converted to clean JSON with preserved relationships and validated totals - ready for immediate ERP integration.
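As a sketch of the "validated totals" step: once line items land in JSON, checking that quantities × unit prices reconcile with the stated invoice total is a small guard worth running before anything reaches the ERP. The schema below is illustrative, not LlamaParse's actual output format:

```python
import json

# Illustrative extracted invoice -- the field names are assumptions,
# not LlamaParse's actual output schema.
extracted = json.loads("""
{
  "invoice_id": "INV-1042",
  "line_items": [
    {"description": "Widget A", "quantity": 3, "unit_price": 19.99},
    {"description": "Widget B", "quantity": 2, "unit_price": 5.00}
  ],
  "stated_total": 69.97
}
""")

def validate_totals(invoice: dict, tolerance: float = 0.01) -> bool:
    """Check that line items sum to the stated total before
    handing the record to a downstream ERP system."""
    computed = sum(
        item["quantity"] * item["unit_price"] for item in invoice["line_items"]
    )
    return abs(computed - invoice["stated_total"]) <= tolerance

print(validate_totals(extracted))  # 3*19.99 + 2*5.00 = 69.97 -> True
```

A failed check is a useful signal that a merged cell or multi-line row was mis-parsed, which is exactly where table extraction tends to go wrong.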
💼 Real-world applications across financial services, healthcare, and logistics - from invoice processing to lab results
⚡ How LlamaParse handles multi-line rows, merged cells, and borderless tables while maintaining logical consistency
📊 Why table extraction is fundamentally harder than standard text OCR - spatial relationships matter more than character recognition
🔧 The three core phases: detection, structure recognition, and data extraction with validation
Tables in PDFs aren't just text - they're structured data trapped in visual formats. Our new deep dive explains how modern OCR for tables reconstructs spatial relationships, preserves header hierarchies, and ensures data integrity across complex documents.
Transform your document processing with intelligent table extraction that goes beyond basic OCR.