#DataIndexing hashtag - Bluesky

@cocoindex.bsky.social

1 year ago

On-premise structured extraction with LLM using Ollama | CocoIndex
Learn how to use CocoIndex to extract structured data from PDF/Markdown with Ollama's local LLM models, all running on premises without sending data to external APIs.

🚀 Extract Structured Data from PDFs with CocoIndex & Ollama! Convert PDFs into structured data using CocoIndex and Ollama, with LLM-powered extraction that works locally.

📖 Read the full guide 👉 cocoindex.io/blogs/cocoin...

#AI #PDFExtraction #DataIndexing #MachineLearning #CocoIndex #Ollama #LLM

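A minimal sketch of local structured extraction in plain Python. It calls a local Ollama server directly rather than going through CocoIndex, whose API is not shown in the post. It assumes Ollama is running on its default port 11434, that a model named `llama3.2` has been pulled, and that the Ollama release is recent enough to accept a JSON schema in the `format` field. The schema, prompt, and helper names are hypothetical, and the PDF-to-Markdown step is left out.

```python
import json
import urllib.request

# Assumptions: default local Ollama endpoint and an already-pulled model.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2"  # hypothetical choice; use any local model you have

# Hypothetical target schema for the fields we want out of a document.
PAPER_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "authors": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "authors"],
}


def build_request(markdown: str) -> dict:
    """Build a non-streaming generate request constrained to PAPER_SCHEMA."""
    return {
        "model": MODEL,
        "prompt": "Extract the title and authors from this document:\n\n" + markdown,
        "format": PAPER_SCHEMA,  # JSON schema for structured output
        "stream": False,         # one JSON reply instead of a token stream
    }


def parse_response(body: bytes) -> dict:
    """The generated text sits in the "response" field and is itself JSON."""
    return json.loads(json.loads(body)["response"])


def extract(markdown: str) -> dict:
    """POST to the local server; the document never leaves the machine."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(markdown)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_response(resp.read())
```

Models can still return fields that are technically valid but wrong, so validate the returned dict before indexing it.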

CocoIndex

@cocoindex.bsky.social

1 year ago

Customizable Data Indexing Pipelines | CocoIndex
Explains what customizable data indexing pipelines are, through comparisons and examples.

🚀 Want to supercharge data indexing with custom logic? Check out our latest blog post on CocoIndex! Learn how to structure, transform, and query data efficiently with AI-powered workflows.

📖 Read more 👉 cocoindex.io/blogs/data-i...

#AI #DataIndexing #MachineLearning #CocoIndex #GPT4o


CocoIndex

@cocoindex.bsky.social

1 year ago

Data Consistency in Indexing Pipelines | CocoIndex

Data freshness is often overlooked in indexing systems, leading to stale data exposure, compliance risks, AI missteps, and business disruptions. Be especially careful with RAG applications, where indexed data is exposed directly to end users. cocoindex.io/blogs/indexi...

#DataIndexing #DataOps #AI #GDPR


CocoIndex

@cocoindex.bsky.social

1 year ago

Handling System Updates and CocoIndex Automatic Schema Inference | CocoIndex
Explore how CocoIndex handles system updates in indexing flows and our approach to automatic schema inference. Learn about the challenges of managing data and logic evolution, infrastructure setup, an...

When building data indexing systems, one of the key challenges is handling system updates gracefully. These systems maintain state across multiple components (Pinecone, PostgreSQL, ...) and need to evolve over time.
cocoindex.io/blogs/handle...
#DataIndexing #AI #DataEngineering #RAG #CocoIndex

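A toy illustration of one piece of handling updates: comparing the schema a target store currently has with the schema a new pipeline version expects, and listing what has to change. The helper name, the `{column: type}` representation, and the type strings are hypothetical and not CocoIndex's approach. With automatic schema inference, the desired side would come from inference rather than being written by hand.

```python
def plan_schema_changes(existing: dict[str, str], desired: dict[str, str]) -> list[str]:
    """Diff two {column: type} schemas into a list of change descriptions.

    Hypothetical helper: reconciles a target's current schema with the one
    the updated pipeline expects. Output order is additions, drops, then type
    changes, each sorted by column name so the plan is deterministic.
    """
    changes: list[str] = []
    for col in sorted(desired.keys() - existing.keys()):
        changes.append(f"add column {col} {desired[col]}")
    for col in sorted(existing.keys() - desired.keys()):
        changes.append(f"drop column {col}")
    for col in sorted(existing.keys() & desired.keys()):
        if existing[col] != desired[col]:
            changes.append(f"change column {col}: {existing[col]} -> {desired[col]}")
    return changes
```

Note that a type change on an embedding column (for example, after switching embedding models) usually means recomputing every row, not just altering the column.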

CocoIndex

@cocoindex.bsky.social

1 year ago

Concurrent updates can be tricky for indexing pipelines, especially when processes stop and restart mid-execution. How do you handle partial states, out-of-date versions, or data that was updated in the meantime?
cocoindex.io/blogs/indexi...
#DataIndexing #AI #DataEngineering #TechInsights

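One common way to stop a restarted or slow worker from overwriting fresher results is to store the source version next to each derived row and make every write conditional on it. The sketch below does this with SQLite's upsert from the Python standard library. The table, column names, and integer versions are assumptions for illustration, not how CocoIndex implements it. It requires SQLite 3.24+ for `ON CONFLICT ... DO UPDATE`.

```python
import sqlite3

# Hypothetical target table: one derived row per source key, plus the
# version of the source it was computed from.
SCHEMA = """
CREATE TABLE IF NOT EXISTS doc_index (
    source_key TEXT PRIMARY KEY,
    source_version INTEGER NOT NULL,
    payload TEXT NOT NULL
)
"""

# Conditional upsert: the update only lands if it comes from a strictly newer
# source version, so stale work is silently dropped.
UPSERT = """
INSERT INTO doc_index (source_key, source_version, payload)
VALUES (?, ?, ?)
ON CONFLICT(source_key) DO UPDATE SET
    source_version = excluded.source_version,
    payload = excluded.payload
WHERE excluded.source_version > doc_index.source_version
"""


def write_derived(conn: sqlite3.Connection, key: str, version: int, payload: str) -> None:
    """Write a derived row unless a newer version is already stored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (key, version, payload))


def read_payload(conn: sqlite3.Connection, key: str):
    """Return the stored payload for key, or None if absent."""
    row = conn.execute(
        "SELECT payload FROM doc_index WHERE source_key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]
```

Because the version check runs inside a single statement, two workers racing on the same key cannot interleave between the check and the write. This covers out-of-date versions only; partial states still need their own handling, such as writing all rows for a source in one transaction.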

CocoIndex

@cocoindex.bsky.social

1 year ago

What is data indexing? It's all about transforming raw data into a retrieval-optimized format while staying true to the original source. This derivative nature poses unique challenges and requirements.

Learn more: cocoindex.io/blogs/data-i...
#DataIndexing #AI #DataEngineering #TechInsights

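To make "staying true to the original source" concrete: a derived index has to remember which source produced each row, so that when a source changes or disappears, its old derived rows go with it. This is a minimal in-memory sketch that assumes fixed-size text chunks as the transformation. `DerivedIndex` and `chunk` are hypothetical names, not CocoIndex APIs.

```python
from dataclasses import dataclass, field


def chunk(text: str, size: int) -> list[str]:
    """Hypothetical transformation: split text into fixed-size pieces."""
    return [text[i:i + size] for i in range(0, len(text), size)]


@dataclass
class DerivedIndex:
    """Toy index whose rows are derived from source documents.

    `lineage` maps each source key to the row ids it produced, so updating
    or deleting a source also removes everything derived from it.
    """
    chunk_size: int = 8
    rows: dict[tuple[str, int], str] = field(default_factory=dict)
    lineage: dict[str, set[tuple[str, int]]] = field(default_factory=dict)

    def upsert_source(self, key: str, text: str) -> None:
        self.delete_source(key)  # drop rows derived from the old version first
        new_ids: set[tuple[str, int]] = set()
        for i, piece in enumerate(chunk(text, self.chunk_size)):
            self.rows[(key, i)] = piece
            new_ids.add((key, i))
        self.lineage[key] = new_ids

    def delete_source(self, key: str) -> None:
        for row_id in self.lineage.pop(key, set()):
            del self.rows[row_id]
```

Without the lineage map, shrinking a document from two chunks to one would leave the second, stale chunk in the index, which is exactly the kind of drift from the source that derived data must avoid.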

Javier Santoyo

@jsantoyo.bsky.social

1 year ago

Fig. 1: Analysis of Super-k-mer Statistics and Strategy Differences (source: https://www.biorxiv.org/content/10.1101/2024.11.26.625346v1.full)

Brisk: Exact resource-efficient dictionary for k-mers. #Kmers #Genomics #DataIndexing #KmerDictionary @biorxivpreprint.bsky.social
www.biorxiv.org/content/10.1...

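For readers new to the term in the figure title: a super-k-mer is a maximal run of consecutive k-mers in a sequence that share the same minimizer, so they can be stored together as one substring. This is a deliberately simple sketch that uses the lexicographically smallest m-mer on the forward strand as the minimizer. It is a textbook simplification, not Brisk's actual scheme.

```python
def minimizer(kmer: str, m: int) -> str:
    """Lexicographically smallest m-mer of a k-mer (no hashing, forward strand only)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def super_kmers(seq: str, k: int, m: int) -> list[tuple[str, str]]:
    """Split seq into maximal runs of consecutive k-mers sharing a minimizer.

    Returns (minimizer, super-k-mer substring) pairs. Recomputes each
    minimizer from scratch; real tools use rolling hashes instead.
    """
    if not 0 < m <= k <= len(seq):
        raise ValueError("need 0 < m <= k <= len(seq)")
    result: list[tuple[str, str]] = []
    start = 0  # index of the first k-mer in the current run
    current = minimizer(seq[0:k], m)
    for i in range(1, len(seq) - k + 1):
        mz = minimizer(seq[i:i + k], m)
        if mz != current:
            # k-mers start..i-1 share `current`; together they span seq[start : i-1+k]
            result.append((current, seq[start:i - 1 + k]))
            start, current = i, mz
    result.append((current, seq[start:]))  # close the final run
    return result
```

For example, `super_kmers("AAACGT", 4, 2)` groups the first two 4-mers (both with minimizer `AA`) into `AAACG` and leaves `ACGT` on its own. Practical tools typically order m-mers by a hash rather than lexicographically and use canonical (strand-independent) k-mers, which changes the grouping.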