The hardest part about finetuning is that people don't have labeled data. Today, @databricks.bsky.social introduced TAO, a new finetuning method that only needs inputs, no labels necessary. Best of all, it actually beats supervised finetuning on labeled data. www.databricks.com/blog/tao-usi...
Posts by Ivan Zhou
Two years ago at Mosaic we had an idea for an "RLXF as a service"
One year and an acquisition later, the prototypes went into preview at Databricks
Today we're sharing some results and findings on what it actually takes to do enterprise RL and put it into a real product.
The vibe of sunset time in DC
Profiling code and hunting for 10x latency gains is pure joy 👨🏻‍💻
Nevertheless, the team has developed an exceptional inference system and commendably shared their expertise with the community. Kudos to them!
DeepSeek's inference system overview is truly impressive. However, we should not overinterpret its eye-popping profit margin. The key takeaway is that with high traffic volumes, you can create extremely large batch sizes to maximize GPU utilization. The reported 545% profit margin comes with caveats.
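The batching intuition above can be made concrete with some back-of-envelope arithmetic (the numbers here are my own illustrative assumptions, not DeepSeek's): in a memory-bound decode step, the cost of streaming the weights from HBM is paid once per forward pass and amortized across every sequence in the batch.

```python
# Toy model of decode cost vs. batch size. Assumes one fixed cost to
# read the weights per step (shared by the whole batch) plus a small
# marginal compute cost per sequence. All numbers are illustrative.

def cost_per_token(batch_size, weight_read_ms=10.0, per_seq_ms=0.05):
    """Approximate per-token decode cost for one forward step.

    weight_read_ms: fixed cost to stream weights from HBM, paid once per step
    per_seq_ms:     marginal cost per sequence in the batch
    """
    step_ms = weight_read_ms + per_seq_ms * batch_size
    return step_ms / batch_size

print(cost_per_token(1))    # 10.05 ms per token at batch size 1
print(cost_per_token(256))  # ~0.089 ms per token at batch size 256
```

Under these assumed numbers, batch size 256 is roughly 113x cheaper per token than batch size 1, which is why the margin math only works out at very high traffic volumes.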
New open source OCR VLM!
We're probably a little too obsessed with zero-shot retrieval. If you have documents (you do), then you can generate synthetic data and finetune your embedding model. A blog post led by @jacobianneuro.bsky.social shows how well this works in practice.
www.databricks.com/blog/improvi...
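A minimal sketch of the recipe the post above describes (the details here are my assumptions, not taken from the linked blog): synthesize a query for each document you already have, yielding (query, positive document) pairs for contrastive finetuning of an embedding model. `make_training_pairs` and `generate_query` are hypothetical names; in practice the generator would be an LLM call.

```python
# Build (synthetic query, positive doc) pairs from unlabeled documents.
# The query generator is a stub standing in for an LLM prompt such as
# "Write a search query that this passage answers."

def make_training_pairs(documents, generate_query):
    """Pair each document with a synthetic query it should answer."""
    return [{"query": generate_query(doc), "positive": doc} for doc in documents]

docs = [
    "The merge procedure requires an approving review before CI runs.",
    "Set SPARK_LOCAL_DIRS to a fast disk to speed up shuffles.",
]
# Trivial stand-in generator: pick a keyword from the document.
pairs = make_training_pairs(docs, lambda d: d.split()[1].lower())
```

During training, the other documents in each batch can then serve as in-batch negatives for a contrastive loss, so no human labels are needed at any point.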
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
It's remarkable to see the brightest minds and top talents from the US, China, and numerous other countries all working to push AI's frontiers: advancing reasoning, efficiency, applications, and more.
While competition certainly exists, I'm finding a more collaborative spirit in this state of coopetition.
Qwen 2.5 VL seems to place great emphasis on document image analysis -- layout detection, a special HTML output format, object localization -- and its performance on DocVQA looks very strong.
Alibaba's Qwen group just shipped Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M - two Apache 2 licensed LLMs with an impressive 1 million token context limit!
Here are my notes, including my so-far unsuccessful attempts to run large context prompts on my Mac simonwillison.net/2025/Jan/26/...
I don’t care when “AGI” arrives; I’m just out here having a good time with AI anyway.
I was reading @natolambert.bsky.social's RLHF book when I came across an unexpected chapter about his experience working with data labeling vendors: rlhfbook.com/c/06-prefere... He shared several realistic, frustrating stories and data points, and his experiences strongly resonated with me.
My girlfriend and I recently traveled to Europe for a conference and a vacation. We had a great time walking around and exploring each of the cities we visited. Here are my favorite photos from the trip:
www.ivanzhou.me/blog/2024/12...
The vision capability enriches interactions with the real world. The experience is quite delightful when it works.
I’m traveling in Vienna now and I found ChatGPT to be a fabulous travel companion. I can tap into its knowledge to learn about Austrian history and ask questions about historical figures or moments.
I gave a presentation at the Ray Summit on my work building Multimodal Foundation Models for Document Automation at Uber. It is always a great pleasure to publicly share what I have been building over the past year!
www.ivanzhou.me/blog/2024/11...