Complex cell-like structures in Flow Lenia
Posts by Gautier Hamon
🚀 Introducing 🧭MAGELLAN, our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural-language goal spaces, enabling efficient exploration of complex domains. 🌍✨ Learn more: 🔗 arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL
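As a rough illustration of the LP signal MAGELLAN estimates, here is a common operationalization of learning progress as the change in empirical competence over a sliding window. This sketch is mine, not the paper's method (MAGELLAN *learns* to predict LP over a huge goal space rather than tracking it per goal); `successes` and `window` are illustrative names.

```python
import numpy as np

def learning_progress(successes, window=50):
    """Absolute learning progress for one goal: competence change between
    the older and the more recent half of a sliding window of binary
    success outcomes (0/1)."""
    recent = successes[-window:]
    older = successes[-2 * window:-window]
    if len(older) == 0:
        return 0.0  # not enough history yet
    return abs(np.mean(recent) - np.mean(older))

# Goals with high LP are neither mastered nor impossible,
# which makes them good targets for exploration.
```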
We are recruiting interns for a few projects with @pyoudeyer in Bordeaux:
> studying LLM-mediated cultural evolution with @nisioti_eleni and @Jeremy__Perez
> balancing exploration and exploitation with autotelic RL with @ClementRomac
Details and links in 🧵
Please share!
[Figure: Craftax achievement success rates with TransformerXL-PPO, trained for 1e9 steps (left) and 4e9 steps (right)]
8/ For the curious, here are the achievement success rates on Craftax across training, after 1e9 steps (left) and 4e9 steps (right).
7/ The JAX ecosystem in RL is currently blooming with wonderful open-source projects from others, which I linked at the bottom of the repository. github.com/Reytuag/tran...
This work was done at @FlowersINRIA.
Also, feel free to reach out to me if you have questions or suggestions!
6/ Potential next steps could be to test it on Xland-Minigrid, an open-ended meta-RL environment: github.com/dunnolab/xla...
I'm also curious to implement Muesli (arxiv.org/abs/2104.06159) with TransformerXL, as in arxiv.org/abs/2301.07608.
5/ Here is the training curve obtained from training for 1e9 steps, alongside the PPO and PPO-RNN scores reported in the Craftax repo.
Note that PPO-RNN was already beating other baselines that use Unsupervised Environment Design and intrinsic motivation. arxiv.org/pdf/2402.16801
4/ Testing it on the challenging Craftax from github.com/MichaelTMatt... (with little hyperparameter tuning), it obtained higher returns in 1e9 steps than PPO-RNN.
Training it for longer led to reaching the 3rd floor in Craftax, making it the first method to get advanced achievements.
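For context, here is a minimal random-policy loop on Craftax under its gymnax-style API. The `make_craftax_env_from_name` entry point follows my reading of the Craftax README, so double-check the exact import against the repo.

```python
import jax
from craftax.craftax_env import make_craftax_env_from_name  # per the Craftax README

# Symbolic observations; auto_reset restarts episodes on termination.
env = make_craftax_env_from_name("Craftax-Symbolic-v1", auto_reset=True)
params = env.default_params

rng = jax.random.PRNGKey(0)
rng, reset_key = jax.random.split(rng)
obs, state = env.reset(reset_key, params)

for _ in range(10):
    rng, act_key, step_key = jax.random.split(rng, 3)
    action = env.action_space(params).sample(act_key)  # random policy
    obs, state, reward, done, info = env.step(step_key, state, action, params)
```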
3/ Training a 3M-parameter Transformer for 1e6 steps in MemoryChain-bsuite (from gymnax) takes 10s on an A100 (with 512 envs).
Training a 5M-parameter Transformer for 1e9 steps in Craftax takes ~6h on a single A100 (with 1024 envs).
We also support multi-GPU training; see the sketch below.
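Much of the speed comes from vectorizing the environment itself. A hedged sketch of the pattern (gymnax-style `env.step` signature; names are illustrative, not the repo's actual code):

```python
import jax

def batched_step(env, params, rng, states, actions):
    """Step e.g. 1024 environments in lockstep on a single device."""
    keys = jax.random.split(rng, actions.shape[0])
    # Vectorize over (key, state, action); share params across all envs.
    return jax.vmap(env.step, in_axes=(0, 0, 0, None))(keys, states, actions, params)

# For multi-GPU, the same jitted update can be replicated across devices
# with jax.pmap, averaging gradients via jax.lax.pmean.
```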
2/ We implement TransformerXL-PPO following "Stabilizing Transformers for Reinforcement Learning": arxiv.org/abs/1910.06764
The code follows the template from PureJaxRL github.com/luchris429/p...
⚡️Training is fast thanks to JAX
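The core architectural change in "Stabilizing Transformers for Reinforcement Learning" (GTrXL) is replacing each residual connection with a GRU-style gate initialized to act like the identity. A minimal Flax sketch of that gate, transcribed from the paper's equations rather than taken from the repo:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class GRUGate(nn.Module):
    """GRU-type gating from GTrXL (arXiv:1910.06764): combines the sublayer
    input x with the sublayer output y instead of a plain residual x + y."""
    d_model: int

    @nn.compact
    def __call__(self, x, y):
        # x: sublayer input (residual stream), y: sublayer output.
        dense = lambda: nn.Dense(self.d_model, use_bias=False)
        r = jax.nn.sigmoid(dense()(y) + dense()(x))        # reset gate
        # The -2.0 plays the role of the paper's bias b_g > 0, pushing the
        # update gate toward 0 so the layer starts close to the identity.
        z = jax.nn.sigmoid(dense()(y) + dense()(x) - 2.0)  # update gate
        h = jnp.tanh(dense()(y) + dense()(r * x))          # candidate state
        return (1.0 - z) * x + z * h
```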
1/ ⚡️ Looking for a fast and simple Transformer baseline for your RL environment in JAX?
Sharing my implementation of transformerXL-PPO: github.com/Reytuag/tran...
The implementation is the first to attain the 3rd floor and obtain advanced achievements in the challenging Craftax.
The video encoding might not do it full justice.
Putting some Flow Lenia here too.
Paper: direct.mit.edu/isal/proceed...
Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, time for an Open-Endedness starter pack.
go.bsky.app/MdVxrtD
🚨New preprint🚨
When testing LLMs with questions, how can we know they did not see the answer in their training data? In this new paper, we propose a simple, fast, out-of-the-box method to spot contamination on short texts, with @stepalminteri.bsky.social and Pierre-Yves Oudeyer!
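The paper's actual method is in the preprint; purely as a generic illustration of the contamination-probing idea (not necessarily what we do in the paper), one can compare a model's per-token log-likelihood on the verbatim text against a paraphrase, since memorized text scores anomalously high. The model and texts below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def avg_logprob(model, tok, text):
    """Mean per-token log-likelihood the model assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return -loss.item()

tok = AutoTokenizer.from_pretrained("gpt2")                 # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

seen = "To be, or not to be, that is the question."
paraphrase = "Existing or not existing, that is what I wonder."
# A suspiciously large gap hints the first string was memorized.
print(avg_logprob(model, tok, seen) - avg_logprob(model, tok, paraphrase))
```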