I am running a workshop at QConSF on Nov 20, in SF.
"Open Source RAG Pipeline With Docling + Data Prep Kit + Milvus + Open LLMs"
You will walk away with working code you can build on.
qconsf.com/training/no...
#QConSF @qconferences.com #Milvus #RAG #DataPrepKit #Docling #NebiusAIStudio
- PDF processing with DPK
Code walkthrough on how to process PDF documents (parse, dedupe, filter out spam)
🎥 video: youtu.be/u4OgkmG94fs?...
💻 Data Prep Kit examples: github.com/data-prep-ki...
#dataprepkit
Data Prep Kit videos:
- Data Prep Kit Intro
Introduction and feature walkthrough (document parsing, exact and fuzzy de-duping, chunking, vectorizing, PII removal, document quality)
🎥 video: www.youtube.com/watch?v=wCbM...
💻 Data Prep Kit examples: github.com/data-prep-ki...
#dataprepkit
Good read: "Mastering Data Cleaning for Fine-Tuning LLMs and RAG Architectures"
thealliance.ai/blog/masteri...
@aialliance.bsky.social @davenielsen.bsky.social
#dataprepkit #RAG #dataprep #finetuning
Check out Data Prep Kit (DPK), an open-source tool to simplify your data wrangling tasks.
📺 Intro video: www.youtube.com/watch?v=wCbM...
GitHub: github.com/data-prep-ki...
#dataprepkit
My upcoming talk: Create High-Quality Datasets by Filtering Out Spam, HAP (Hate, Abuse, Profanity) Speech, and Sensitive Data
🗓️: Thursday Mar 27, 2025
⏰: 9am PT / 12pm ET
Register: lnkd.in/gfRQyzKZ
#dataprepkit #LLM #AIAlliance