Complex cell-like structures in Flow Lenia
Posts by Gautier Hamon
🚀 Introducing 🧭MAGELLAN, our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural-language goal spaces, enabling efficient exploration of complex domains. 🌍✨ Learn more: 🔗 arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL
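As a rough illustration of the LP signal MAGELLAN estimates, here is a common operationalization of learning progress as the change in empirical competence over a sliding window. This sketch is mine, not the paper's method (MAGELLAN *learns* to predict LP over a huge goal space rather than tracking it per goal); `successes` and `window` are illustrative names.

```python
import numpy as np

def learning_progress(successes, window=50):
    """Absolute learning progress for one goal: competence change between
    the older and the more recent half of a sliding window of binary
    success outcomes (0/1)."""
    recent = successes[-window:]
    older = successes[-2 * window:-window]
    if len(older) == 0:
        return 0.0  # not enough history yet
    return abs(np.mean(recent) - np.mean(older))

# Goals with high LP are neither mastered nor impossible,
# which makes them good targets for exploration.
```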
We are recruiting interns for a few projects with @pyoudeyer in Bordeaux:
> studying LLM-mediated cultural evolution with @nisioti_eleni and @Jeremy__Perez
> balancing exploration and exploitation with autotelic RL with @ClementRomac
Details and links in 🧵
Please share!
[Figure: Craftax achievement success rates with TransformerXL-PPO, trained for 1e9 steps (left) and 4e9 steps (right)]
8/ For the curious, here are the achievement success rates on Craftax across training, after 1e9 steps (left) and 4e9 steps (right).
7/ The JAX ecosystem in RL is currently blooming with wonderful open-source projects from others, which I linked at the bottom of the repository. github.com/Reytuag/tran...
This work was done at @FlowersINRIA.
Also, feel free to reach out to me if you have questions or suggestions!
6/ Potential next steps could be to test it on Xland-Minigrid, an open-ended meta-RL environment: github.com/dunnolab/xla...
I'm also curious to implement Muesli (arxiv.org/abs/2104.06159) with TransformerXL, as in arxiv.org/abs/2301.07608.
5/ Here is the training curve obtained from training for 1e9 steps, alongside the PPO and PPO-RNN scores reported in the Craftax repo.
Note that PPO-RNN was already beating other baselines that use Unsupervised Environment Design and intrinsic motivation. arxiv.org/pdf/2402.16801
4/ Testing it on the challenging Craftax from github.com/MichaelTMatt... (with little hyperparameter tuning), it obtained higher returns in 1e9 steps than PPO-RNN.
Training it for longer led to reaching the 3rd floor in Craftax, making it the first method to get advanced achievements.
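For context, here is a minimal random-policy loop on Craftax under its gymnax-style API. The `make_craftax_env_from_name` entry point follows my reading of the Craftax README, so double-check the exact import against the repo.

```python
import jax
from craftax.craftax_env import make_craftax_env_from_name  # per the Craftax README

# Symbolic observations; auto_reset restarts episodes on termination.
env = make_craftax_env_from_name("Craftax-Symbolic-v1", auto_reset=True)
params = env.default_params

rng = jax.random.PRNGKey(0)
rng, reset_key = jax.random.split(rng)
obs, state = env.reset(reset_key, params)

for _ in range(10):
    rng, act_key, step_key = jax.random.split(rng, 3)
    action = env.action_space(params).sample(act_key)  # random policy
    obs, state, reward, done, info = env.step(step_key, state, action, params)
```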
3/ Training a 3M-parameter Transformer for 1e6 steps in MemoryChain-bsuite (from gymnax) takes 10s on an A100 (with 512 envs).
Training a 5M-parameter Transformer for 1e9 steps in Craftax takes ~6h on a single A100 (with 1024 envs).
We also support multi-GPU training; see the sketch below.
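Much of the speed comes from vectorizing the environment itself. A hedged sketch of the pattern (gymnax-style `env.step` signature; names are illustrative, not the repo's actual code):

```python
import jax

def batched_step(env, params, rng, states, actions):
    """Step e.g. 1024 environments in lockstep on a single device."""
    keys = jax.random.split(rng, actions.shape[0])
    # Vectorize over (key, state, action); share params across all envs.
    return jax.vmap(env.step, in_axes=(0, 0, 0, None))(keys, states, actions, params)

# For multi-GPU, the same jitted update can be replicated across devices
# with jax.pmap, averaging gradients via jax.lax.pmean.
```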
2/ We implement TransformerXL-PPO following "Stabilizing Transformers for Reinforcement Learning": arxiv.org/abs/1910.06764
The code follows the template from PureJaxRL github.com/luchris429/p...
⚡️Training is fast thanks to JAX
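The core architectural change in "Stabilizing Transformers for Reinforcement Learning" (GTrXL) is replacing each residual connection with a GRU-style gate initialized to act like the identity. A minimal Flax sketch of that gate, transcribed from the paper's equations rather than taken from the repo:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class GRUGate(nn.Module):
    """GRU-type gating from GTrXL (arXiv:1910.06764): combines the sublayer
    input x with the sublayer output y instead of a plain residual x + y."""
    d_model: int

    @nn.compact
    def __call__(self, x, y):
        # x: sublayer input (residual stream), y: sublayer output.
        dense = lambda: nn.Dense(self.d_model, use_bias=False)
        r = jax.nn.sigmoid(dense()(y) + dense()(x))        # reset gate
        # The -2.0 plays the role of the paper's bias b_g > 0, pushing the
        # update gate toward 0 so the layer starts close to the identity.
        z = jax.nn.sigmoid(dense()(y) + dense()(x) - 2.0)  # update gate
        h = jnp.tanh(dense()(y) + dense()(r * x))          # candidate state
        return (1.0 - z) * x + z * h
```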
1/ ⚡️ Looking for a fast and simple Transformer baseline for your RL environment in JAX?
Sharing my implementation of transformerXL-PPO: github.com/Reytuag/tran...
The implementation is the first to attain the 3rd floor and obtain advanced achievements in the challenging Craftax.
The video encoding might not do it full justice.
Putting some Flow Lenia here too.
Paper: direct.mit.edu/isal/proceed...
Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, time for an Open-Endedness starter pack.
go.bsky.app/MdVxrtD
🚨New preprint🚨
When testing LLMs with questions, how can we know they did not see the answer in their training data? In this new paper, we propose a simple, fast, out-of-the-box method to spot contamination on short texts, with @stepalminteri.bsky.social and Pierre-Yves Oudeyer!
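The paper's actual method is in the preprint; purely as a generic illustration of the contamination-probing idea (not necessarily what we do in the paper), one can compare a model's per-token log-likelihood on the verbatim text against a paraphrase, since memorized text scores anomalously high. The model and texts below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def avg_logprob(model, tok, text):
    """Mean per-token log-likelihood the model assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return -loss.item()

tok = AutoTokenizer.from_pretrained("gpt2")                 # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

seen = "To be, or not to be, that is the question."
paraphrase = "Existing or not existing, that is what I wonder."
# A suspiciously large gap hints the first string was memorized.
print(avg_logprob(model, tok, seen) - avg_logprob(model, tok, paraphrase))
```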