
Posts by Colin Doyle

Every day in my work with LLMs, I go back and forth between being astonished by their brilliance and astonished by their stupidity.

1 year ago

In my limited experience, the faculty who understand and interact with A.I. the most have the most mercurial opinions, not just on what A.I. will be able to do in the short term but on what it can even do now.

1 year ago
Language models know Tom Cruise's mother, but not her son
An experiment shows that language models cannot generalize the simple formula "A is B" to "B is A". But why is that?

I also wonder if it might be an example of this kind of effect:

the-decoder.com/language-mod...

Notably, o1-preview doesn't seem to share this problem with Connections puzzles. Very curious whether this was a particular problem OpenAI targeted, and how.

1 year ago

Yes, GPT-4o struggled the most with linguistic puzzles and puzzles in which the connection between the words was in the form of another word that could appear either immediately before or immediately after each of the four puzzle words.

1 year ago
LLMs as Method Actors: A Model for Prompt Engineering and Architecture
We introduce "Method Actors" as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors; prompts as scripts and cues; an...

I just wrote a paper related to this: arxiv.org/abs/2411.057...

With a complicated prompt system, GPT-4o was able to solve 86% of puzzles. OpenAI's new o1 model is the strongest. When prompted to make one guess at a time and given feedback on bad guesses, it could solve 100%.

1 year ago
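The one-guess-at-a-time setup described above can be sketched as a simple loop: the solver proposes one group of four words, the game grades it, and failed guesses feed back into the next attempt. Everything below is an illustrative assumption, not the paper's code: the toy puzzle, the feedback strings, and the brute-force `propose_group` stub, which in the paper's setup would be an LLM (e.g. o1) conditioned on the guess history.

```python
# Minimal sketch of a guess/feedback loop for a Connections-style puzzle.
# The puzzle data and the rule-based propose_group stub are invented for
# illustration; in the setup the post describes, an LLM plays the solver
# and receives the feedback on each bad guess.
from itertools import combinations

PUZZLE = {
    "fish": {"bass", "trout", "pike", "carp"},
    "colors": {"red", "blue", "green", "pink"},
}
WORDS = sorted(w for group in PUZZLE.values() for w in group)

def check_guess(guess):
    """Grade a 4-word guess: 'correct', 'one away', or 'wrong'."""
    for group in PUZZLE.values():
        overlap = len(guess & group)
        if overlap == 4:
            return "correct"
        if overlap == 3:
            return "one away"
    return "wrong"

def propose_group(remaining, history):
    # Stub solver: try each untried 4-word combination in a fixed order.
    # An LLM solver would instead be prompted with `remaining` plus the
    # feedback accumulated in `history`.
    for combo in combinations(sorted(remaining), 4):
        if frozenset(combo) not in history:
            return frozenset(combo)
    return None

def solve():
    remaining = set(WORDS)
    history = set()
    solved = []
    while remaining:
        guess = propose_group(remaining, history)
        feedback = check_guess(guess)
        history.add(guess)  # bad guesses are remembered, not repeated
        if feedback == "correct":
            solved.append(guess)
            remaining -= guess
    return solved
```

The key design point is that the solver only ever commits to one group at a time, so feedback on each failed guess can constrain every later attempt, rather than grading a full four-group answer all at once.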