If you want to really assess your RAG system —
you need to go deeper.
Ask:
✅ Is your retriever surfacing the right chunks?
✅ Is your generator actually using them — or just hallucinating? 🤔
Here is how I like to think about my RAG metrics (inspired by RAGAS)
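A rough sketch of those two questions in code. This is not the RAGAS API: the function names are made up for illustration, the retriever check assumes you have hand-labeled relevant chunk IDs, and the generator check is a crude word-overlap proxy (RAGAS-style faithfulness typically uses an LLM judge instead).

```python
def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Retriever check: did we surface the right chunks?"""
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def grounded_fraction(answer_sentences, context_chunks, threshold=0.5):
    """Generator check (crude proxy): share of answer sentences whose
    words mostly appear in the retrieved context."""
    context_words = set()
    for chunk in context_chunks:
        context_words.update(chunk.lower().split())

    scored = 0
    supported = 0
    for sentence in answer_sentences:
        words = sentence.lower().split()
        if not words:
            continue  # skip empty sentences
        scored += 1
        overlap = sum(w in context_words for w in words) / len(words)
        if overlap >= threshold:
            supported += 1
    return supported / scored if scored else 0.0


precision, recall = retrieval_precision_recall(
    retrieved_ids=["a", "b", "c", "d"],
    relevant_ids=["a", "c", "e"],
)
grounded = grounded_fraction(
    ["paris is the capital of france", "it has ten million unicorns"],
    ["Paris is the capital of France ."],
)
print(precision, recall, grounded)
```

Low retrieval recall points at the retriever; a low grounded fraction with good retrieval points at the generator.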
Posts by rajistics.bsky.socia
These are common patterns from Hugging Face and Anthropic
Hugging Face - SmolAgents: huggingface.co/blog/smolage...
Anthropic - Building effective agents: www.anthropic.com/research/bui...
Are you going to chat all day with LLMs? 🤐
Here are the essential agentic workflows. 👇
Dive into Parquet
It's a leading format for data engineering, data science, and machine learning.
youtube.com/shorts/_CnEK...
PyPI Name Squatting
This didn't happen to me recently :)
To learn more:
An Empirical Analysis of the Python Package Index (PyPI) - arxiv.org/pdf/1907.11073
blog.checkpoint.com/securing-the...
blog.orsinium.dev/posts/py/pyp...
My Video:
youtube.com/shorts/H1Uja...
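A minimal sketch of one way to sanity-check a package name before installing it, using only the standard library. The short list of popular packages and the 0.8 similarity cutoff are assumptions for illustration, not anything PyPI itself does.

```python
import difflib

# Illustrative list only; a real check would use a much larger list.
POPULAR_PACKAGES = ["requests", "numpy", "pandas", "urllib3"]


def possible_typosquat_targets(name, popular=POPULAR_PACKAGES, cutoff=0.8):
    """Return popular package names that `name` closely resembles.

    An exact match returns an empty list, since the name is the
    real package and not a lookalike.
    """
    name = name.lower()
    if name in popular:
        return []
    return difflib.get_close_matches(name, popular, n=3, cutoff=cutoff)


print(possible_typosquat_targets("reqeusts"))  # lookalike of a popular name
print(possible_typosquat_targets("requests"))  # exact match, nothing flagged
```

A non-empty result only means the name is close to a popular one; it does not prove the package is malicious.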
Why do language models think 9.11 is greater than 9.9? 🤔
Mechanistic interpretability is a useful tool for investigating and fixing the issue.
I am using Transluce's Monitor here:
My video summary: www.youtube.com/shorts/Kuh-i...
Try Monitor: monitor.transluce.org/dashboard
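A tiny illustration of why the comparison is ambiguous on its face: the same two strings order differently depending on whether you read them as decimals or as dotted version-style numbers. This only shows the two readings; it makes no claim about what a particular model does internally, which is the kind of question a tool like Monitor is meant to probe.

```python
def as_decimal(text):
    """Read '9.11' as the decimal number 9.11."""
    return float(text)


def as_version(text):
    """Read '9.11' as dotted parts (9, 11), like a software version."""
    return tuple(int(part) for part in text.split("."))


a, b = "9.11", "9.9"
print(as_decimal(a) > as_decimal(b))  # decimal reading
print(as_version(a) > as_version(b))  # version-style reading
```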
My ranking of the top 26 algorithms for practical data science, breaking down their strengths, quirks, and when (or if) you should use them.
youtube.com/shorts/dt4uX...
Polars versus pandas
What is the best single-node dataframe library?
For Polars check out:
github.com/pola-rs/polars
Polars vs. pandas: What’s the Difference?
blog.jetbrains.com/pycharm/2024...
Database-like ops benchmark - duckdblabs.github.io/db-benchmark/
Short Video:
youtube.com/shorts/8DkIR...
The Physics of Language Models
Check out a scientific approach that experiments with model architecture, synthetic datasets, and tasks to understand how language models work.
My short intro: youtu.be/9saXkwHKaLs
Longer Video: ICML 2024 Tutorial by Zeyuan Allen-Zhu - youtu.be/yBL7J0kgldU
Communications here 🙋♂️
Ai2's 700k examples > Meta's 6B examples
the importance of data quality
My video: youtube.com/shorts/-_DGp...
Background:
Hannaneh Hajishirzi - OLMo: Accelerating the Science of Language Modeling (COLM)
www.youtube.com/watch?v=qMTz...
Molmo and PixMo paper -
arxiv.org/pdf/2409.17146
Are you smarter than GPT-3? (You don't have a chance against GPT-4.)
Test yourself:
Are you smarter than a language model? -
joel.tools/smarter/
Language modeling game!
rr-lm-game.herokuapp.com
Are You Smarter Than An LLM?
d.erenrich.net/are-you-smar...
My video on the topic:
youtu.be/kXQGivEAF1U
Why do we use LogLoss as an error metric?
Exploring Mean Error, Mean Squared Error, and Log Loss
youtu.be/S_zxVfKI55c
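A small sketch comparing the three metrics on binary labels with predicted probabilities, using only the standard library. The toy data is made up for illustration; the `eps` clipping is a common convention to avoid taking log(0).

```python
import math


def mean_error(y_true, y_prob):
    """Average signed error; positive and negative errors can cancel."""
    return sum(p - y for y, p in zip(y_true, y_prob)) / len(y_true)


def mean_squared_error(y_true, y_prob):
    """Average squared error; bounded by 1 for probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood; grows without bound as a
    prediction becomes confidently wrong."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Two predictions that are both confidently wrong.
y_true = [1, 0]
y_prob = [0.0, 1.0]

print(mean_error(y_true, y_prob))          # errors cancel out
print(mean_squared_error(y_true, y_prob))  # capped at 1
print(log_loss(y_true, y_prob))            # very large penalty
```

Mean error hides the mistakes because they cancel, MSE caps the penalty, and log loss punishes confident wrong answers much more heavily.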
Wow. I had no idea that on Bluesky you can enable external media so you can watch YouTube videos on this platform. It overcomes the problem of only being able to upload 60 seconds of video here. It gets better and better.
🔹 5th Place:
LightGBM + Time Series Foundation Model (TFM)
🔹 4th Place:
Temporal Fusion Transformer (TFT) from Neural Forecast
🔹 3rd Place:
🚀 LightGBM with recursive & direct forecasting
🔹 2nd Place:
🌠 LightGBM + Seasonal Theta
🔹 1st Place:
🧮 Anchored Multiplicative Seasonal Index + Seasonal ARIMA + LightGBM
What are the cutting-edge time series approaches? 📈✨
The VN1 Forecasting Competition showed winning techniques including time series foundation models, deep learning, statistical methods, machine learning, and ensembling.
Check out the techniques of the top 5 teams.
www.youtube.com/watch?v=CRGA...
4 Techniques for Dimensionality Reduction: PCA, AutoEncoder, TSNE, and UMAP
youtu.be/EHWBP-OQwHk