
Posts by Weijie Su

We're excited to announce the call for papers for #ICML 2026:

icml.cc/Conferences/...

See you in Seoul next summer!

5 months ago
Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know? Accurate evaluation of large language models (LLMs) is crucial for understanding their capabilities and guiding their development. However, current evaluations often inconsistently reflect the actual ...

Great minds think alike!

Alan Turing cracked Enigma in WWII; Brad Efron asked how many words Shakespeare knew. They used the same method.

We apply the same method to LLM evaluation, to estimate certain unseen capabilities of LLMs:

arxiv.org/abs/2506.02058
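The shared method behind both stories is Good-Turing estimation: the number of items observed exactly once estimates the probability mass of everything you have not yet seen. A minimal sketch of that basic idea (illustrative only, not the paper's estimator; `sample` is made-up data):

```python
from collections import Counter

def good_turing_unseen_mass(observations):
    """Good-Turing estimate of the probability mass of unseen items:
    (number of items observed exactly once) / (total observations)."""
    counts = Counter(observations)
    n1 = sum(1 for c in counts.values() if c == 1)  # singletons
    return n1 / len(observations)

# Example: 10 draws, of which 4 items ("c", "d", "e", "f") appear exactly once.
sample = ["a", "a", "a", "b", "b", "c", "d", "e", "f", "b"]
print(good_turing_unseen_mass(sample))  # → 0.4
```

The striking feature is that the estimate uses only what was observed, yet speaks about what was not: rare singletons stand in as proxies for the unseen.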

10 months ago
Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching Nash Learning from Human Feedback is a game-theoretic framework for aligning large language models (LLMs) with human preferences by modeling learning as a two-player zero-sum game. However, using raw ...

Another new paper, a follow-up:

arxiv.org/abs/2505.20627

It studies an alternative to RLHF: Nash learning from human feedback.

10 months ago
Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium Aligning large language models (LLMs) with diverse human preferences is critical for ensuring fairness and informed outcomes when deploying these models for decision-making. In this paper, we seek to ...

A (not so) new paper on #LLM alignment from a social choice theory viewpoint:

arxiv.org/abs/2503.10990

It reveals fundamental impossibility results concerning representing (diverse) human preferences.
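The impossibility flavor traces back to Condorcet's paradox, which the paper's title references: voters with individually consistent rankings can produce a cyclic majority preference, so no single aggregate ranking represents everyone. A minimal illustration (hypothetical voters, not an example from the paper):

```python
# Condorcet's paradox: each voter has a consistent ranking,
# yet pairwise majority voting yields a cycle A > B > C > A.
voters = [["A", "B", "C"],   # voter 1: A > B > C
          ["B", "C", "A"],   # voter 2: B > C > A
          ["C", "A", "B"]]   # voter 3: C > A > B

def majority_prefers(x, y):
    """True if a strict majority of voters ranks x above y."""
    wins = sum(v.index(x) < v.index(y) for v in voters)
    return wins > len(voters) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"majority prefers {x} over {y}: {majority_prefers(x, y)}")
# All three pairwise majorities hold, closing the cycle.
```

Any alignment scheme that aggregates diverse preferences into one reward or ranking has to confront exactly this kind of cycle.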

10 months ago

Our analysis shows that the polar decomposition arises naturally from a defining viewpoint. This gives rise to nuclear norm scaling: the update automatically vanishes as the gradient becomes small. In contrast, Muon needs a manually tuned scaling factor for the orthogonalized matrix to achieve this.
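A rough numerical sketch of that contrast, as I read it (illustrative only; `polar_update` is my naming, not the paper's algorithm): the update is the orthogonal polar factor of the gradient, scaled by the gradient's nuclear norm, so small gradients yield small updates with no manual tuning.

```python
import numpy as np

def polar_update(grad):
    """Sketch of nuclear-norm-scaled polar update: grad = U_p @ P, with the
    orthogonal polar factor U_p obtained from the SVD, then scaled by the
    nuclear norm (sum of singular values) so the step shrinks with the
    gradient. (My reading of the idea, not the paper's exact algorithm.)"""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    polar_factor = U @ Vt   # orthogonal factor of the polar decomposition
    return s.sum() * polar_factor  # nuclear-norm scaling

g = np.array([[1e-3, 0.0], [0.0, 2e-3]])
print(np.linalg.norm(polar_update(g)))  # tiny gradient -> tiny update
```

A Muon-style update would use the orthogonal factor alone, whose norm stays fixed regardless of gradient magnitude, which is why its scale must be tuned by hand.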

10 months ago
PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective The ever-growing scale of deep learning models and datasets underscores the critical importance of efficient optimization methods. While preconditioned gradient methods such as Adam and AdamW are the ...

We posted a paper on optimization for deep learning:

arxiv.org/abs/2505.21799

Recently there has been a surge of interest in *structure-aware* optimizers: Muon, Shampoo, SOAP. In this paper, we propose a unifying preconditioning perspective and offer insights into these matrix-gradient methods.

10 months ago
Statistical Foundations of Large Language Models

Some context: www.weijie-su.com/llm/

10 months ago
Do Large Language Models (Really) Need Statistical Foundations? Large language models (LLMs) represent a new paradigm for processing unstructured data, with applications across an unprecedented range of domains. In this paper, we address, through two arguments, wh...

I just wrote a position paper on the relation between statistics and large language models:

Do Large Language Models (Really) Need Statistical Foundations?

arxiv.org/abs/2505.19145

Any comments are welcome. Thx!

10 months ago
How to Prevent a Tragedy of the Commons for AI Research?

The ranking method was tested at ICML in 2023, 2024, and 2025. I hope it will soon be used to improve ML/AI review processes. Here's an article about the method, from its conception to experimentation:

www.weijie-su.com/openrank/

10 months ago
The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML), asking authors with multiple submissions to rank their papers based on perceived q...

Our paper "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review" will appear in JASA as a Discussion Paper:

arxiv.org/abs/2408.13430

It's a privilege to work with such a wonderful team: Buxin, Jiayao, Natalie, Yuling, Didong, Kyunghyun, Jianqing, and Aaron.

10 months ago
Statistical Foundations of Large Language Models

We're hiring a postdoc focused on the statistical foundations of large language models, starting this fall. Join our team exploring the theoretical and statistical underpinnings of LLMs. If interested, check out our work: weijie-su.com/llm/ and drop me an email. #AIResearch #PostdocPosition

11 months ago
Tips on How to Connect at Academic Conferences I was a kinda awkward teenager. If you are a CS researcher reading this post, then chances are, you were too. How to navigate social situations and make friends is not always intuitive, and has to …

I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them.

I'm late for #ICLR2025 #NAACL2025, but in time for #AISTATS2025 #ICML2025! 1/3
kamathematics.wordpress.com/2025/05/01/t...

11 months ago

The #ICML2025 @icmlconf.bsky.social deadline has just passed!

Peer review is vital to advancing AI research. We've been conducting a survey experiment at ICML since 2023. Pls take a few minutes to participate; the survey was sent via email with the subject "[ICML 2025] Author Survey". Thx!

1 year ago
Stat

A special issue on large language models (LLMs) and statistics at Stat (onlinelibrary.wiley.com/journal/2049...). We're seeking submissions examining LLMs' impact on statistical methods, practice, education, and more. @amstatnews.bsky.social

1 year ago
Departmental Postdoctoral Researcher Position

A departmental postdoc position opening in my dept: statistics.wharton.upenn.edu/recruiting/d...

1 year ago

Heading to Vancouver tomorrow for #NeurIPS2024, Dec 10-14! Excited to reconnect with colleagues and enjoy Vancouver's seafood! 🦐

1 year ago

Add me plz. Thx!

1 year ago
How Is AI Changing the Science of Prediction? Podcast Episode · The Joy of Why · 11/07/2024 · 37m

Machine learning has led to predictive algorithms so obscure that they resist analysis. Where does the field of traditional statistics fit into all of this? Emmanuel Candès asks the question, "Can I trust this?" Tune in to this week's episode of "The Joy of Why": listen.quantamagazine.org/jow-321-s

1 year ago

Knew nothing about Bluesky until today. Immediately stop using X, or gradually migrate to Bluesky? Is there an optimal switching strategy?

1 year ago