
Posts by Matthew Finlayson

Thank you for your kind words :) I'll take a look at TRAP; it looks very cool.

5 months ago

As always, a big thank you to my stalwart advisors @swabhs.bsky.social and Xiang Ren

5 months ago

Implications for AI accountability, forensics, and regulation. As LLMs become more powerful and opaque, having natural verification methods becomes crucial. Our proposed signature fills a new niche in this ecosystem. 7/

5 months ago

This opens the door to a verification system analogous to cryptographic message authentication—where the model ellipse functions as a secret key. Providers could verify outputs to trusted third parties without revealing model parameters. 6/
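
A minimal sketch of what that check could look like, assuming the verifier holds the secret ellipse as a center c and a positive-definite matrix A (the names and API here are hypothetical, not the paper's implementation):

    import numpy as np

    def verify_output(logits: np.ndarray, A: np.ndarray, c: np.ndarray,
                      tol: float = 1e-6) -> bool:
        # Accept iff the output lies on the secret ellipse (x-c)^T A (x-c) = 1.
        x = logits - c
        return abs(x @ A @ x - 1.0) < tol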

5 months ago

Forgery resistance comes from the prohibitive cost of extracting an ellipse through an API: O(d³ log d) queries and O(d⁶) time to fit it. For a 70B model, that's ~$16M in API costs and millennia of computation time 💸⏰ 5/
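
A back-of-envelope version of that estimate (the hidden size, per-query price, and throughput below are illustrative assumptions, not the paper's exact accounting):

    import math

    d = 8192                        # assumed hidden size for a ~70B model
    queries = d**3 * math.log2(d)   # O(d^3 log d) queries to extract the ellipse
    ops = d**6                      # O(d^6) work to fit it

    print(f"~{queries:.1e} queries -> ~${queries * 2e-6 / 1e6:.0f}M at $2e-6/query")
    print(f"~{ops:.1e} ops -> ~{ops / 1e12 / 3.15e7:,.0f} years at 1e12 ops/s")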

5 months ago

We tested this on models like Llama 3.1, Qwen 3, and GPT-OSS. Even when we copied their linear signatures onto other models' outputs, the ellipse signature cleanly identified the true source, separating it from forgeries by orders of magnitude. 4/

5 months ago

Why is this exciting? Four unique properties:
🔨 Forgery-resistant (computationally hard to fake)
🌱 Naturally occurring (no setup needed)
🫙 Self-contained (works without input/full weights)
🤏 Compact (detectable in a single generation step)
3/

5 months ago

The key insight is that LLMs with normalization layers produce outputs that lie on the surface of a high-dimensional ellipse. This geometric constraint acts as a signature unique to each model. 2/
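
A toy numpy illustration of the geometry (toy sizes, not the paper's code): a final RMSNorm puts every hidden state on a fixed sphere up to the learned gain, and the linear unembedding maps that surface to a single model-specific ellipsoid, so every logit vector inherits the constraint.

    import numpy as np

    rng = np.random.default_rng(0)
    d, V = 16, 100                          # toy hidden and vocab sizes
    g = rng.uniform(0.5, 2.0, d)            # learned RMSNorm gain
    W = rng.normal(size=(V, d))             # unembedding matrix

    def rms_norm(x):
        return g * x / np.sqrt(np.mean(x**2))

    H = rng.normal(size=(1000, d))          # arbitrary hidden states
    normed = np.apply_along_axis(rms_norm, 1, H)

    # Every normalized state satisfies ||x / g|| = sqrt(d): one fixed ellipsoid.
    print(np.allclose(np.linalg.norm(normed / g, axis=1), np.sqrt(d)))  # True

    # Logits are a linear image of that surface, so they carry the constraint:
    logits = normed @ W.T
    x = np.linalg.pinv(W) @ logits[0]       # recover the pre-image from logits
    print(np.isclose(np.linalg.norm(x / g), np.sqrt(d)))                # True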

5 months ago

Every Language Model Has a Forgery-Resistant Signature
The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying...

We discovered that language models leave a natural "signature" on their API outputs that's extremely hard to fake. Here's how it works 🔍

📄 arxiv.org/abs/2510.14086 1/

5 months ago

The project was led by Murtaza Nazir, an independent researcher with serious engineering chops. It's his first paper. He's a joy to work with and is applying to PhD programs. Hire him!

It's great to finally collab with Jack Morris, and a big thanks to @swabhs.bsky.social and Xiang Ren for advising.

9 months ago

Our technical insight is that logprob vectors can be linearly encoded as a much smaller vector. We make prompt stealing both *more accurate* and *cheaper* by compactly encoding logprob outputs over multiple generation steps, resulting in massive gains over previous SoTA methods.
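
A toy numpy illustration of why this works (not our released encoder): logits are W @ h, so the V-dimensional logprob vector has only about d degrees of freedom and can be re-encoded losslessly with ~d numbers.

    import numpy as np

    rng = np.random.default_rng(1)
    d, V = 32, 50_000                    # toy hidden size and vocab size
    W = rng.normal(size=(V, d))          # unembedding matrix

    h = rng.normal(size=d)               # one hidden state
    logits = W @ h
    m = logits.max()
    logprobs = logits - (m + np.log(np.exp(logits - m).sum()))

    # logprobs = W @ h - logZ * 1, so they lie in span(columns of W, ones vector).
    B = np.column_stack([W, np.ones(V)])
    code = np.linalg.pinv(B) @ logprobs      # d + 1 numbers instead of V
    print(code.shape)                        # (33,)
    print(np.allclose(B @ code, logprobs))   # True: lossless linear re-encoding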

9 months ago

We noticed that existing methods don't fully use LLM outputs:
either they ignore logprobs (text only), or they only use logprobs from a single generation step.

The problem is that next-token logprobs are big--the size of the entire LLM vocabulary *for each generation step*.
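
Concretely (using Llama 3's vocabulary size and a 256-token response, both just for illustration):

    vocab, steps = 128_256, 256
    n = vocab * steps
    print(f"{n:,} logprobs (~{n * 4 / 1e6:.0f} MB as float32) for one response")
    # 32,833,536 logprobs (~131 MB)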

9 months ago

When interacting with an AI model via an API, the API provider may secretly change your prompt or inject a system message before feeding it to the model.

Prompt stealing--also known as LM inversion--tries to reverse engineer the prompt that produced a particular LM output.

9 months ago

I didn't believe it when I first saw it, but:
We trained a prompt stealing model that gets >3x SoTA accuracy.
The secret is representing LLM outputs *correctly*.

🚲 Demo/blog: mattf1n.github.io/pils
📄: arxiv.org/abs/2506.17090
🤖: huggingface.co/dill-lab/pi...
🧑‍💻: github.com/dill-lab/PILS

9 months ago

I wish the ML community would stop trying to turn every technique into a brand name. Just give the thing a descriptive name and call it what it is.

Forced backronyms like this are counterproductive.

10 months ago

It appears that the only fonts with optical sizes that work with pdflatex are the Computer/Latin Modern fonts. I would kill for a free pdflatex-compatible Times clone with optical sizes so my small text can look good in arXiv/conference submissions.

10 months ago

Screenshot of inconsistent line height to make way for a superscript.

Screenshot of text with consistent line height.

If you are writing a paper for #colm2025 and LaTeX keeps increasing your line height to accommodate things like superscripts, consider using $\smash{2^d}$, but beware of character overlaps.
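
For example (a minimal snippet; \vphantom is one way to guard against the overlap caveat):

    % The superscript stretches the line height:
    The set has $2^d$ subsets.
    % \smash discards the height, keeping the leading uniform:
    The set has $\smash{2^d}$ subsets.
    % If the smashed text now collides with the line above, reserve normal height:
    The set has $\smash{2^d}\vphantom{2}$ subsets.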

1 year ago

This project was made feasible by the excellent open-source LLM training library @fairseq2.bsky.social; I highly recommend giving it a look! It made both SFT and DPO a piece of cake 🍰

1 year ago

6/ Our method is general, and we are excited to see how it might be used to better adapt LLMs to other tasks in the future.

A big shout-out to my collaborators at Meta: Ilia, Daniel, Barlas, Xilun, and Aasish (of whom only @uralik.bsky.social is on Bluesky)

1 year ago

5/ Training on self-demos, our model learns to better leverage the context to answer questions, and to refuse questions that it is likely to answer incorrectly. This results in consistent, large improvements across several knowledge-intensive QA tasks.

1 year ago

4/ To obtain self-demos we generate candidate responses with an LLM, then use the same LLM to compare these responses to the gold one, choosing the one that best matches (or refuses to answer). Thus we retain the gold supervision from the original responses while aligning the training data with the model's own distribution.
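
In pseudocode the loop looks roughly like this (the function names are hypothetical, not our released API):

    def make_self_demo(llm, question, context, gold, n_candidates=8):
        # Sample candidates from the model being adapted (its own distribution).
        candidates = [llm.generate(question, context) for _ in range(n_candidates)]
        candidates.append("I don't know.")  # allow an explicit refusal
        # The same LLM judges which candidate best matches the gold response.
        best = max(candidates, key=lambda c: llm.score_match(c, gold))
        return {"question": question, "context": context, "response": best}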

1 year ago

3/ OOD responses encourage the model to answer questions it does not know the answer to, and since retrievals are added post-hoc, the responses tend to ignore or even contradict the retrieved context. Instead of training on these low-quality responses, we use the LLM to generate "self-demos".

1 year ago

2/ A popular recipe for adapting LLMs for RAG involves adding retrievals post-hoc to an existing instruction-tuning dataset. The hope is that the LLM learns to leverage the added context to respond to instructions. Unfortunately, the gold responses in these datasets tend to be OOD for the model.
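
The recipe in a nutshell (retrieve is a hypothetical retriever call; the point is that the gold response never sees the added context):

    def add_retrievals_post_hoc(dataset, retrieve, k=5):
        return [
            {"instruction": ex["instruction"],
             "context": retrieve(ex["instruction"], k=k),  # added after the fact
             "response": ex["response"]}  # gold response, written without the context
            for ex in dataset
        ]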

1 year ago

🧵 Adapting your LLM for new tasks is dangerous! A bad training set degrades models by encouraging hallucinations and other misbehavior. Our paper remedies this for RAG training by replacing gold responses with self-generated demonstrations. Check it out here: https://arxiv.org/abs/2502.10

1 year ago

The USC style guide list of formats for "cardinal" (see main post for list)

The RGB and CMYK colors side by side. The CMYK is considerably pinker.

Putting together an unofficial USC Beamer template, I noticed that the USC style guide lists 4 formats for "cardinal red", but each of them is a different color:

PMS 201 C is #9D2235
CMYK: 7, 100, 65, 32 is #A1003D
RGB: 135, 27, 30 is #991B1E
HEX: #990000

Is this normal? The CMYK is especially egregious.
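
A quick check with the naive CMYK-to-RGB formula (an assumption; real conversions are ICC-profile-dependent) reproduces the mismatch:

    def cmyk_to_hex(c, m, y, k):
        c, m, y, k = (v / 100 for v in (c, m, y, k))
        r = round(255 * (1 - c) * (1 - k))
        g = round(255 * (1 - m) * (1 - k))
        b = round(255 * (1 - y) * (1 - k))
        return f"#{r:02X}{g:02X}{b:02X}"

    print(cmyk_to_hex(7, 100, 65, 32))  # #A1003D, nowhere near #990000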

1 year ago

If you are registered for NeurIPS, it should already be available online.

1 year ago

NeurIPS should make them available online after one month :)

1 year ago
A diagram demonstrating text generation with beam search. One of the paths reads "Taylor Swift is the only person to…"

In Vancouver for NeurIPS but don't have Taylor Swift tickets?

You can still spend the day going through our tutorial reading list:
cmu-l3.github.io/neurips2024-...

Tuesday December 10, 1:30-4:00pm @ West Exhibition Hall C, NeurIPS

1 year ago

Shout-out to our organizers @wellecks.bsky.social @abertsch.bsky.social @hails.computer @uralik.bsky.social @gneubig.bsky.social, Alex Xie, Konstantin Golobokov, and Zaid Harchaoui

1 year ago
Panelist photos: Rishabh Agarwal (Google, McGill), Noam Brown (OpenAI), Beidi Chen (CMU), Nouha Dziri (AI2), Jakob Foerster (Oxford, Meta)

Curious about all this inference-time scaling hype? Attend our NeurIPS tutorial: Beyond Decoding: Meta-Generation Algorithms for LLMs (Tue. 1:30)! We have a top-notch panelist lineup.

Our website: cmu-l3.github.io/neurips2024-...

1 year ago