👋 We're building a new type of word processor at Marker, and we're hiring for React/ProseMirror engineers and full-stack AI engineers to join the team in London.
Are you an engineer who cares about writing? Or do you know someone who does?
See: writewithmarker.com/jobs
More details below 👇
Posts by Sian Gooding
Sorted, thanks!
We all want LLMs to collaborate with humans to help them achieve their goals. But LLMs are not trained to collaborate; they are trained to imitate. Can we teach LLM agents to help humans by first making them help each other?
arxiv.org/abs/2503.14481
You’ll collaborate with a kind, curious, research-driven team, including the brilliant @joao.omg.lol & @martinklissarov.bsky.social, and get to shape work at the frontier of multi-agent learning.
If that sounds like you, apply!
DM me if you're curious or have questions
Some big questions we’re thinking about:
1⃣ How do communication protocols emerge?
2⃣ What inductive biases help coordination?
3⃣ How can language improve generalisation and transfer?
We’re interested in:
🤖🤖 Multi-agent RL
🔠 Emergent language
🎲 Communication games
🧠 Social & cognitive modelling
📈 Scaling laws for coordination
The project explores how agents can learn to communicate and coordinate in complex, open-ended environments—through emergent protocols, not hand-coded rules.
🚨 I’m hosting a Student Researcher @GoogleDeepMind!
Join us on the Autonomous Assistants team (led by
@egrefen.bsky.social) to explore multi-agent communication—how agents learn to interact, coordinate, and solve tasks together.
DM me for details!
Our full paper:
arxiv.org/pdf/2503.19711
Our work highlights the need for LLMs to improve in areas like action selection, self-evaluation, and goal alignment to perform robustly in open-ended tasks
Implications of this work extend beyond writing assistance to autonomous workflows for LLMs in general open-ended use cases
Finding: LLMs can lose track of the original goal during iterative refinement, leading to "semantic drift" - a divergence from the author's intent. This is a key challenge for autonomous revision. ✍️
Finding: LLMs struggle to reliably filter their own suggestions. They need better self-evaluation to work effectively in autonomous revision workflows. ⚖️
Finding: Gemini 1.5 Pro produced the highest quality editing suggestions, according to human evaluators, outperforming Claude 3.5 Sonnet and GPT-4o 🦾
Finding: LLMs tend to favour adding content, whereas human editors remove or restructure more. This suggests LLMs are sycophantic, reinforcing existing text rather than critically evaluating it. ➕
Why? There are many possible solutions and no single 'right' answer. Success is difficult to gauge!
We examine how LLMs generate + select text revisions, comparing their actions to human editors. We focus on action diversity, alignment with human preferences, and iterative improvement
Our paper explores this by analysing LLMs as autonomous co-writers. Work done with Lucia Lopez Rivilla & @egrefen.bsky.social 🫶
Open-ended tasks like writing are a real challenge for LLMs (even powerful ones like Gemini 1.5 Pro, Claude 3.5 Sonnet, and GPT-4o).
New paper from our team @GoogleDeepMind!
🚨 We've put LLMs to the test as writing co-pilots – how good are they really at helping us write? LLMs are increasingly used for open-ended tasks like writing assistance, but how do we assess their effectiveness? 🤔
arxiv.org/pdf/2503.19711
you're telling me I cherry-picked this example?
Instead of listing my publications as the year draws to an end, I want to shine a spotlight on the commonplace assumption that productivity must always increase. Good research is disruptive, and thinking time is central to high-quality scholarship; it is also necessary for that kind of disruption.