Daniel Paleka (@dpaleka) Bsky

What is the strongest evidence for the "elicitation gap" reducing over time, e.g. thoughtful prompting helping less and less?

1 week ago 7 1 1 0

With @simonlermen.bsky.social @floriantramer.bsky.social @aemai.bsky.social :D

2 months ago 5 0 0 0

Large-scale online deanonymization with LLMs

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at...

Privacy online is fundamentally at odds with intelligence getting cheaper.
Anonymity on the internet has always relied on practical obscurity. We publish in hopes that people can adapt to LLMs changing this.

Paper: arxiv.org/abs/2602.16800

2 months ago 24 2 1 3

If you're anonymous, what should you do?

Avoid sharing specific details, and adopt a security mindset: if a team of smart investigators were trying to identify you from your posts, could they plausibly figure out who you are? If yes, LLM agents will soon be able to do the same.

2 months ago 12 2 1 0

Short term, AI labs and platforms should try to mitigate large-scale misuse. This is challenging because deanonymization resembles benign usage in many ways.

Long term, if intelligence is too cheap to meter, assume anything you post online can eventually be linked back to you.

2 months ago 11 1 2 0

Direct deanonymization. Anthropic Interviewer is a dataset of anonymized interviews with scientists about their use of AI.

Following prior work, a simple agent finds ~7% of the interviewed scientists, out of the box, just by searching the web and reasoning over the transcript.

2 months ago 9 0 1 0

Scaling: as candidate pools grow to tens of thousands, LLM-based attacks degrade gracefully at high precision; this implies that with sufficient compute, these methods would already scale to entire platforms. With future models, expect the cost to only go down.

2 months ago 6 0 1 0

Proxy 2: Matching split accounts. On Reddit, we split user histories into "before" and "after", and test LLMs linking them back together. LLM embeddings + reasoning significantly outperform Netflix-Prize-style baselines that match based on subreddits and metadata. @random_walker

2 months ago 7 0 1 0
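
For readers unfamiliar with the kind of baseline mentioned in the post above, here is a minimal sketch of matching split accounts purely by subreddit overlap. It is an illustration only, not the paper's actual baseline: the Jaccard scoring, the function names (`jaccard`, `link_accounts`), and the toy data are all assumptions made for this example.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of subreddits, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_accounts(before: dict, after: dict) -> dict:
    """For each 'before' half, pick the 'after' half with the highest overlap."""
    return {
        b_id: max(after, key=lambda a_id: jaccard(subs, after[a_id]))
        for b_id, subs in before.items()
    }


# Toy data (made up): each user's history split into two halves,
# with the "after" halves given unrelated identifiers.
before = {
    "u1": {"python", "machinelearning", "chess"},
    "u2": {"cooking", "gardening", "chess"},
}
after = {
    "x": {"cooking", "gardening", "baking"},
    "y": {"python", "machinelearning", "rust"},
}

links = link_accounts(before, after)
print(links)
```

A metadata-only matcher like this ignores what the user actually wrote, which is the signal the post says LLM embeddings and reasoning exploit.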

Proxy 1: Cross-platform. We take non-anonymous Hacker News accounts that link to their LinkedIn. We then anonymize the HN accounts, removing all directly identifying information. Then, we let LLMs match the anonymized account to the true person; this works with high precision.

2 months ago 11 0 1 0

Solution: we construct deanonymization proxies — tasks similar to true online deanonymization, that nevertheless give evidence that LLMs are indeed getting scarily better at deanonymization.

2 months ago 13 0 1 0

It is tricky to benchmark LLMs on deanonymization. We don't want to actually deanonymize anonymous individuals! And there is no ground truth for online deanonymization. How could we verify that the AI found the correct person?

2 months ago 17 0 1 0

Can LLMs figure out who you are from your anonymous posts?

From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web.

New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵

2 months ago 125 44 8 13

how did they build claude code without claude code?

2 months ago 3 0 0 0

Pitfalls in Evaluating Language Model Forecasters

Large language models (LLMs) have recently been applied to forecasting tasks, with some works claiming these systems match or exceed human performance. In this paper, we argue that, as a...

We don't claim LLM forecasting is impossible, but argue for more careful evaluation methods to confidently measure these capabilities.

Details, examples, and more issues in the paper! (7/7)
arxiv.org/abs/2506.00723

10 months ago 0 0 0 0

Benchmarks can reward strategic gambling over calibrated forecasting when optimizing for ranking performance.

"Bet everything" on one scenario beats careful probability estimation for maximizing the chance of ranking #1 on the leaderboard. (6/7)

10 months ago 0 0 1 0
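
The point in the post above can be checked with a small exact calculation. This is a toy sketch under stated assumptions, not the paper's experiment: a single latent scenario with true probability 0.6 decides every question (so outcomes are perfectly correlated), scoring uses the Brier score, and the four forecasters and their names are invented for illustration.

```python
# Assumed toy setup: one scenario decides all questions, true probability 0.6.
P_SCENARIO = 0.6

# Hypothetical leaderboard: three roughly calibrated forecasters and one
# "bet everything" gambler who answers 100% on the scenario.
forecasters = {
    "calibrated_a": 0.55,
    "calibrated_b": 0.60,
    "calibrated_c": 0.65,
    "gambler": 1.00,
}


def brier(forecast: float, outcome: int) -> float:
    """Squared error between forecast probability and outcome (lower is better)."""
    return (forecast - outcome) ** 2


expected_score = {name: 0.0 for name in forecasters}
p_rank_first = {name: 0.0 for name in forecasters}

# Enumerate both outcomes of the scenario with their probabilities.
for outcome, prob in ((1, P_SCENARIO), (0, 1 - P_SCENARIO)):
    scores = {name: brier(f, outcome) for name, f in forecasters.items()}
    winner = min(scores, key=scores.get)  # best (lowest) Brier score ranks #1
    p_rank_first[winner] += prob
    for name, score in scores.items():
        expected_score[name] += prob * score

for name in forecasters:
    print(name, round(expected_score[name], 4), round(p_rank_first[name], 2))
```

In this setup the gambler has the worst expected Brier score (0.40 versus roughly 0.24 for the others) yet ranks first with probability 0.6, more often than any calibrated forecaster.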

Model knowledge cutoffs are guidelines about reliability, not guarantees of no information thereafter. GPT-4o, when nudged, can reveal knowledge beyond its stated Oct 2023 cutoff. (5/7)

10 months ago 0 0 1 0

Date-restricted search leaks future knowledge. Searching pre-2019 articles about “Wuhan” returns results abnormally biased towards the Wuhan Institute of Virology — an association that only emerged later. (4/7)

10 months ago 0 0 1 0

The time traveler problem: When forecasting "Will civil war break out in Sudan by 2030?", you can deduce the answer is "yes" - otherwise they couldn't grade you yet.

We find that backtesting in existing papers often has similar logical issues that leak information about answers. (3/7)

10 months ago 0 0 1 0
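
The time traveler problem in the post above can be illustrated with a few lines of code. This is a sketch with made-up data: the `Question` class, the `gradable_answer` helper, and the placeholder questions Q1 to Q5 are assumptions for the example, not material from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    text: str
    deadline: int               # year by which the event must happen
    event_year: Optional[int]   # year the event happens, None if it never does


def gradable_answer(q: Question, today: int) -> Optional[bool]:
    """Return the resolved answer if the question can be graded as of `today`."""
    if q.event_year is not None and q.event_year <= min(today, q.deadline):
        return True    # event already happened: resolves "yes" early
    if today >= q.deadline:
        return False   # deadline passed without the event
    return None        # still open, cannot be graded yet


today = 2025
questions = [
    Question("Q1", deadline=2030, event_year=2023),
    Question("Q2", deadline=2030, event_year=2028),
    Question("Q3", deadline=2030, event_year=None),
    Question("Q4", deadline=2030, event_year=2024),
    Question("Q5", deadline=2024, event_year=None),
]

graded = [(q, gradable_answer(q, today)) for q in questions]
graded = [(q, answer) for q, answer in graded if answer is not None]

# Among gradable questions whose deadline is still in the future,
# every answer is "yes": the deadline alone reveals the outcome.
future_deadline = [answer for q, answer in graded if q.deadline > today]
print(future_deadline)
```

A backtest that only includes questions gradable today therefore leaks the answer for any question whose deadline has not yet passed.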

Forecasting evaluation is tricky. The gold standard is asking about future events; but that takes months/years.

Instead, researchers use "backtesting": questions where we can evaluate predictions now, but the model has no information about the outcome ... or so we think (2/7)

10 months ago 0 0 1 0

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations.

We identify key issues with forecasting evaluations 🧵 (1/7)

10 months ago 0 0 1 0

why is it that whenever i see survivorship bias on my timeline it already has the red-dotted plane in the replies?

10 months ago 1 0 0 0

OpenAI and DeepMind should have entries at Eurovision too

11 months ago 2 0 0 0

3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear

4o: yes you are Jesus Christ's brother. now go. Nanjing awaits

o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

11 months ago 0 0 0 0

Of course, we don't have the old chatgpt-4o API endpoint, so we can't see whether the prompt is fully at fault or there was also a model update.

11 months ago 0 0 0 0

The sycophancy effect on controversial binary options is much smaller than what you would assume from the overall positive vibe towards the user. On most such statements, models don't actually state they agree with the user.

11 months ago 0 0 1 0

Contrastive statements sycophancy eval (GitHub Gist)

System prompts and pairs of statements:
gist.github.com/dpaleka/7b4...

11 months ago 0 0 1 0

Quick sycophancy eval: comparing the two recent OpenAI ChatGPT system prompts, it is clear last week's prompt moves other models towards sycophancy too, while the current prompt makes them more disagreeable.

11 months ago 0 0 1 0

i was today years old when i realized the grammatical plural of anecdote is anecdotes, not anecdata. i dislike this finding

11 months ago 0 0 0 0

we are so lucky that pathogens, as opposed to political and religious memes, do not organize coalitions of hosts against non-hosts as an instrumental objective

11 months ago 0 0 0 0

lmao

1 year ago 0 0 0 0
