OMG, I wish. Sad, I will not be able to attend this year. Let's meet up soon. Looking forward to meeting Henk, too.
Remember, only use vector databases when you actually need them. Most of the time, you can probably get away with something like zvec.org/en/
Posts by David Berenstein
π₯ Bespoke curator: Synthetic Data Curation for Post-Training & Structured Data Extraction
Create synthetic data pipelines with easy!
- Retries and caching included
- inference via LiteLLM, vLLM, and popular batch APIs
- asynchronous operations
π URL: buff.ly/ajPRT1l
π₯One > token > at > a > time < a < at < token < One π₯
token-explorer is a simple tool that lets you explore different possible paths that an LLM might sample!
- Arrow keys to navigate, pop and append tokens
- View the token probabilities and entropies.
GitHub: buff.ly/FQgsczM
π½οΈ Letβs dissect the Synthetic Dataset Generator
π¬ Natural language prompt to data
π¦ Ollama ensures secure local LLM inference
βπΌ Argillaβs data curation capabilities complete the workflow
π GitHub: buff.ly/5pX49Xc
π₯ Text2SQL, explore and share any data analysis!
π€ Hugging Face - Dataset Studio is an amazing new feature.
π Start yourself: buff.ly/pjpOKav
π₯ Vicinity: SEVEN semantic search BACK-ENDS, ONE single INTERFACE!
π«Έ New release to push vector search to the Hub and work with any serialisable objects.
π§βπ« KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER.
π Library:
π₯ NEW cool NO-CODE solution for clicking together AI WEB APPS!
π¨ Gradio released "gradio sketch"
πΌ Really easy way to create web apps with minimal code.
βοΈ Start with `pip install gradio` & `gradio sketch`
π Release: https://buff.ly/41aeLoA
Vector Search - let's keep it clean and lightweight! β‘οΈ
<100K records, no problem!
>100K, some scaling issues
ANN DuckDB index, sub-second response times
Notebook:
π₯ The smolagents module has arrived in the agents course!
π» Code agents optimised for software development
π§ Tool calling agents that create modular, function-driven workflows
π Retrieval agents designed to access and synthesise information
Course: https://buff.ly/4kcj6Ai
π§βπ« Awesome. My talk for PyCon Italy 2025 got accepted!
Got data problems? Relax. Synthetic data is here to help.
Talk: https://buff.ly/3QzoZKj
π³ Announcing docker support to Quickly set up your Synthetic Data Generator with (Gradio + Ollama + Argilla)!
π₯ Build genuinely useful datasets using natural language!
βοΈ Scale however you need.
π Use them privately or share them with the world!
π§βπ» GitHub: https://buff.ly/49IDSmd
With 80K agent builders joining the agents course, it is time to make agents explorable on the Hub!
You can now search and find the perfect agents and tools for your needs!
Powered by @Gradio!
Start searching:
Image Generation has landed in Arena form π¨π€!
1. Describe your desired imageπ¨
2. Two anonymous models output images
3. Vote for the winner!
Images have been sourced from our Open Image Preference dataset!
Dataset: https://buff.ly/4il0du9
Arena: https://buff.ly/4142NwH
Are you, the top of the Agents class?!
We just released a bonus unit on function calling (FC).
You will learn:
β΄ What is FC?
β΅ Thought β Act β Observe Cycle in FC
βΆ lightweight and efficient fine-tuning
Course: https://buff.ly/3Qn1DHB
πΉ In case you've missed the hype around smolagents, here is a presentation I gave yesterday at an MLOps community event!
library: https://buff.ly/4hj6PrJ
slides: https://buff.ly/3WUzZ8D
video:
Slides for my MLOps community talk on smolagents!
Slides: https://buff.ly/3WUzZ8D
π Find banger tools for your smolagents!
I created the Tools gallery, which makes tools specifically developed by/for smolagents searchable and visible. This will help with:
- inspiration
- best practices
- finding cool tools
Space: https://buff.ly/41cYctx
π₯ Come and get those AI agents certificates!
Join the cohort of 66K students: https://buff.ly/4hxb6rK
Documents or images to structured data using Vision Language Models
Outlines has an integration with transformers, which facilitates structured generation based on limiting token sampling probabilities.
Blog: https://buff.ly/4jFHMkr
Local docker deployments for the synthetic data generator π«±πΎβπ«²πΌ
We would love to hear your thoughts!
PR: https://buff.ly/4hRMny6
Curious about "Why π", you may wonder?
smolagents effortlessness combined with the power of 400,000 AI tools available on the Hub!
library: https://buff.ly/4hj6PrJ
WOW, this will rock the world! Hibiki is a model for simultaneous speech2speech translation.
And it actually works.
Available in French-English but super excited to see what the community will do.
Hub: https://buff.ly/3EtmM0f
Paper: https://buff.ly/4jIXNGd
Agentic RAG: Applied, visual, and step-by-step! πΎ
Get familiar with the Agents and tools, not the bells and whistles!
Retrieve - Augment and now GENERATE.
Parts:
1: https://buff.ly/40XNIxM
2: https://buff.ly/40HkB0x
3:
π€― Bring your own AI data, even if you have none!
Describe your dataset for RAG, LLMs or Text Classification
Bring your own context!
Press play and wait
Space: https://buff.ly/3Y1S99z
GitHub: https://buff.ly/49IDSmd
Anyone can create free hosted tools for their AI agents! π₯
Agentic RAG stack part 2 - augment
Augment retrieval results by reranking optimises content without increasing time too much
part2: https://buff.ly/40HkB0x
part1: https://buff.ly/40XNIxM
code: https://buff.ly/4hEajpj
π₯ How to find and install the latest AI apps from the AI app store
1. go to https://buff.ly/42CnUbU
2. search the app you like
3. go to the bottom settings
4. open the URL
5. press the search bar to install
More info: https://buff.ly/3Csqc2J