Posts by Wing Lian

GitHub - axolotl-ai-cloud/grpo_code Contribute to axolotl-ai-cloud/grpo_code development by creating an account on GitHub.

GitHub: github.com/axolotl-ai-c...
Model 🤗🤖: huggingface.co/axolotl-ai-c...

1 year ago 1 0 1 0
Training Large Language Models with Interpreter Feedback using WebAssembly A Blog post by Axolotl AI on Hugging Face

The sandbox uses WebAssembly + Python multiprocessing to safely execute model-generated code in parallel, fully locally. This enables scalable, automated reward signals for GRPO fine-tuning without the complexity of Docker or remote eval infra.

Blog Post 🤗: hf.co/blog/axolotl...
🧵(2/3)

1 year ago 2 0 1 0
We've implemented a simple toolkit for fine-tuning powerful coding models using only RL with an entirely local, zero-setup sandboxed code interpreter. We found very promising results using a tiny fraction of data & training time vs SFT. Check out our blog post for more details! 👇
🧵(1/3)

1 year ago 3 1 1 0

Some of my fav engineers / researchers in AI/ML.

If you’re a cracked eng / researcher I missed please comment and I’ll add you! 🦋

go.bsky.app/H9nj9nJ

1 year ago 28 5 9 2

Needs backwards kernels so we can use it for finetuning 😅

1 year ago 1 0 0 0