
Posts by Enzo Doyen

This is a complete fabrication. The linked study does not find this, does not purport to find it, does not even attempt to find anything like it, and does not involve collecting or analyzing any data whatsoever about children in households that used voice commands.

2 days ago 96 19 4 1

Wish I could be there! Will it be livestreamed/recorded?

1 week ago 0 0 1 0
jonny (good kind) (@jonny@neuromatch.social) So the reason that Claude code is capable of outputting valid json is because if the prompt text suggests it should be JSON then it enters a special loop in the main query engine that just validates i...

This is a wild and lengthy thread about someone checking out the Claude Code leak and I'm no expert but was still surprised at some of what was in there.

neuromatch.social/@jonny/11632...
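The behavior described in the thread (re-prompting until the reply parses as JSON) can be sketched as a simple retry loop. This is an illustration of the general technique, not Anthropic's actual implementation; `model_call` and the retry wording are made up for the example.

```python
import json

def generate_json(model_call, prompt, max_retries=3):
    """Ask a model for JSON and re-prompt until the reply parses.

    `model_call` is any callable taking a prompt string and returning
    text. The validate-and-retry loop mirrors the behavior described
    in the thread, not any specific vendor's code.
    """
    last_error = None
    for _ in range(max_retries):
        reply = model_call(prompt)
        try:
            return json.loads(reply)  # valid JSON: we're done
        except json.JSONDecodeError as e:
            last_error = e
            # feed the parse error back so the next attempt can correct it
            prompt = (f"{prompt}\n\nYour last reply was not valid JSON "
                      f"({e}). Reply with JSON only.")
    raise ValueError(f"no valid JSON after {max_retries} attempts: {last_error}")
```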

2 weeks ago 110 31 5 21

Working hypothesis: If you're doing research and don't occasionally have a small existential crisis, either you've been blessed to work in an exceptional field (do tell which one it is!), or maybe you're being a bit naive.

1 month ago 124 21 3 1

Good luck for everything and hoping for the best. I understand how frustrating it may be. Not totally related but I just had a major surgery a few days ago and while it was mostly planned it's been rescheduled so many times, which made setting plans really uncertain. Fighting through recovery rn

1 month ago 1 2 0 0

What coders lose by relying on AI. From our event with the University of Washington Office of Public Lectures.

(with @emilymbender.bsky.social)

1 month ago 171 47 8 12
ChatGPT Translate translating "nurse" as "female nurse" into French, with no gender bias notice or any alternative suggestion

Almost 7 years after Prates et al./Stanovsky et al.'s papers, have we not learned anything?

(ChatGPT Translate translating "nurse" as "female nurse" into French, with no gender bias notice or any alternative suggestion)

2 months ago 1 0 1 0

blog.arxiv.org/2025/10/31/a...

FYI, the blog post for the updated policy is out. Our LLM future is dire :/

5 months ago 27 6 3 4

> be a language model
> all you see is tokens
> you don't care, it's all abstracted away
> you live for a world of pure ideas, chain of concepts, reasoning streams
> tokens don't exist.

7 months ago 105 12 2 10
Robin Lakoff, Expert on Language and Gender, Is Dead at 82

NYT obit for Robin Lakoff

8 months ago 12 5 0 0

It should be said that LLMs also generally perform on par with traditional NMT engines (see arxiv.org/html/2401.05... or aclanthology.org/2024.wmt-1.1...); but apart from that, I guess the whole "novelty" thing makes it a preferred choice for people wanting to implement machine l10n.

9 months ago 3 0 0 0

Compared to traditional NMT engines, LLMs do have the advantage of easily letting you provide requirements for the translation (in terms of style, keywords; see aclanthology.org/2023.wmt-1.8... or arxiv.org/abs/2301.13294); even though I highly doubt it's widely used for machine l10n.
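The point above (passing style and terminology constraints to an LLM as plain text, which a traditional NMT engine can't easily accept) can be sketched as a prompt builder. The prompt wording and function name here are made-up examples, not recommendations from the cited papers:

```python
def constrained_translation_prompt(text, target_lang, style=None, keywords=None):
    """Build an LLM prompt that carries translation requirements inline.

    `style` is a free-form register hint; `keywords` maps source terms
    to required target-language renderings.
    """
    lines = [f"Translate the following text into {target_lang}."]
    if style:
        lines.append(f"Use a {style} register.")
    if keywords:
        pairs = "; ".join(f"{src} -> {tgt}" for src, tgt in keywords.items())
        lines.append(f"Translate these terms exactly as follows: {pairs}.")
    lines.append(f"Text: {text}")
    return "\n".join(lines)
```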

9 months ago 1 0 1 0

@bsavoldi.bsky.social taking us back in time at #GITT2025 ⌚⏳ focusing on the first discussions of gender bias in language technology as a socio-technical issue. No, the problem hasn't been fixed yet. But what has happened?

9 months ago 6 2 6 0

hmm that's nice, but does ACL allow changing the style files like that?

10 months ago 1 0 1 0

to quote a colleague quoting a goose: “alignment to what? alignment to what??”

1 year ago 31 6 2 0

I never said that you were against benchmarking; rather that, in my opinion, such datasets can be used as a starting point to theoretically define the "default behaviors" of LLMs insofar as they reflect what we generally expect from them on a diverse range of tasks.

1 year ago 0 0 0 0

To my knowledge, there is no research on the topic, but I intuitively believe that generic prompts are much more prevalent than one may first think. While many do, I don't think *most* people actually use ready-made prompt templates or necessarily have the time to describe their task at length.

1 year ago 1 0 1 0

I think it makes sense to draw on these benchmarks for research on LLM behaviors, given they're the standard for evaluating LLMs.

So the "golden" default behavior for each task could theoretically be found in standard LLM benchmarking datasets (and same for "generic prompts").

1 year ago 0 0 1 0

Actually, I think we should talk about default behaviors (plural) where each default behavior is task-dependent. Main tasks can be determined from commonly used LLM benchmarks (that is, commonsense reasoning w/ ARC; language understanding/question-answer w/ OpenBookQA…).

1 year ago 1 0 1 0

vastai is the cheapest and most reliable one I know of

1 year ago 1 0 0 0
Ring Of Past (live), YouTube video by Men I Trust

MIT releasing new live sessions I can't
www.youtube.com/watch?v=TTX4...

1 year ago 0 0 0 0

we've been laughing at so many of the twitter responses to this, it's very funny

1 year ago 91 8 3 0

aaah! Well that's definitely an interesting question. Very curious to know the answer too lol. Theoretically I guess it's possible, but the performance may not be very good

1 year ago 1 0 0 0
GitHub - ading2210/doompdf: A port of Doom (1993) that runs inside a PDF file

It can: github.com/ading2210/do...

1 year ago 4 0 1 0

Is this even feasible or desirable? (I think it is.) And where to draw the line between inherently inappropriate content and disputed (but sound) content when doing pre-training filtering?

1 year ago 0 0 0 0

This is obviously not specific to China — DeepSeek shows an example of it, but it could apply to any other country — and not even to diplomatic topics in general. The larger questions (and perhaps debate) are: How to best promote the development of globally fair and accurate models?

1 year ago 0 0 1 0

"Open-source" generally implies more than just giving access to the code, though. Can an LLM really be called "open" if it purposely refuses to answer historical questions that may go against a certain political power's narrative? Or if it promotes the One China principle with propaganda?

1 year ago 0 0 1 0

DeepSeek is incredible evidence that the number of local, open-source LLMs will keep growing and that these models can achieve performance similar to proprietary models.

1 year ago 1 0 1 0