Posts by Wing Lian
The sandbox uses WebAssembly + Python multiprocessing to safely execute model-generated code in parallel, fully locally. This enables scalable, automated reward signals for GRPO fine-tuning without the complexity of Docker or remote eval infra.
Blog Post 🤗: hf.co/blog/axolotl...
🧵(2/3)
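A minimal sketch of the parallel-reward idea, assuming a binary pass/fail reward per completion: 1.0 if the completion plus its test asserts run cleanly, 0.0 otherwise. The names `code_rewards` and `_run_candidate` are hypothetical and are not the toolkit's API. The sketch covers only the Python multiprocessing half. Each completion runs in its own forked process under a shared timeout, but the WebAssembly isolation is NOT reproduced, so this is not a security sandbox for untrusted code. It also assumes the POSIX-only "fork" start method.

```python
import multiprocessing as mp
import time

# Assumption: "fork" start method (POSIX only), so no __main__ guard is needed.
# NOTE: plain processes, no WebAssembly isolation; NOT safe for untrusted code.
_CTX = mp.get_context("fork")


def _run_candidate(code: str, tests: str, conn) -> None:
    """Child process: exec one completion, then its tests; send 1.0 or 0.0."""
    try:
        namespace: dict = {}
        exec(code, namespace)   # define the model-generated functions
        exec(tests, namespace)  # failing asserts raise AssertionError
        reward = 1.0
    except BaseException:       # syntax errors, failed asserts, sys.exit(), ...
        reward = 0.0
    conn.send(reward)
    conn.close()


def code_rewards(completions: list[str], tests: str, timeout: float = 1.0) -> list[float]:
    """Score all completions in parallel (one process each) within one deadline."""
    jobs = []
    for code in completions:
        recv_conn, send_conn = _CTX.Pipe(duplex=False)
        proc = _CTX.Process(target=_run_candidate, args=(code, tests, send_conn))
        proc.start()
        send_conn.close()  # parent keeps only the receiving end
        jobs.append((proc, recv_conn))

    deadline = time.monotonic() + timeout
    rewards = []
    for proc, recv_conn in jobs:
        remaining = max(0.0, deadline - time.monotonic())
        reward = 0.0  # default when the child times out or dies silently
        if recv_conn.poll(remaining):
            try:
                reward = recv_conn.recv()
            except EOFError:  # child exited without reporting
                pass
        recv_conn.close()
        if proc.is_alive():
            proc.terminate()  # kill infinite loops and other stragglers
        proc.join()
        rewards.append(reward)
    return rewards
```

Using one shared deadline and terminating stragglers keeps a single infinite loop from stalling a whole batch of GRPO rollouts.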
We've implemented a simple toolkit for fine-tuning powerful coding models using only RL, with an entirely local, zero-setup sandboxed code interpreter. We found very promising results using a tiny fraction of the data & training time needed for SFT. Check out our blogpost for more details!
🧵(1/3)
Some of my fav engineers / researchers in AI/ML.
If you're a cracked eng / researcher I missed, please comment and I'll add you! 🦋
go.bsky.app/H9nj9nJ
Needs backward kernels so we can use it for fine-tuning.