Zining Zhu (@zhuzining) Bsky

LLMs are now widely used for reasoning. On familiar tasks (more specifically, those covered by the model's training data), this works fine. However, using these models with new information, new rules, or new capabilities calls for more caution. 5/n, n=5

1 year ago

Can LLMs reason? When the problems are grounded in the real world, performance is good. Otherwise, performance drops sharply. 4/n

1 year ago

These properties are common in formal reasoning datasets but have been very hard to incorporate into commonsense reasoning datasets (commonsense reasoning is usually considered a type of informal reasoning). 3/n

1 year ago

ACCORD provides (1) controllable reasoning path length and (2) a controllable number of distractors on the reasoning tree. These controls are (3) automatic and (4) scalable. 2/n

1 year ago

ACCORD: Closing the Commonsense Measurability Gap. We present ACCORD, a framework and benchmark suite for disentangling the commonsense grounding and reasoning abilities of large language models (LLMs) through controlled, multi-hop counterf...

Let's bring more formal reasoning properties into commonsense reasoning datasets! Introducing ACCORD arxiv.org/abs/2406.02804, to be presented at #NAACL2025 w/ François Roewer-Després, Jinyue Feng and Frank Rudzicz. 1/n

1 year ago

A uniquely interesting book, full of new information. I feel the urge to take notes (either to echo or to debate) while reading. Highly recommend.

1 year ago

Behind the graduate mental health crisis in science (Nature Biotechnology). Survey results identify how scientific research and teaching contribute to the graduate student mental health crisis.

www.nature.com/articles/s41...

1 year ago

I know there are already plenty of tips out there on how to write an effective rebuttal, but I thought I’d share mine as well. I’m not claiming to be an expert or to have a perfect success rate, but I hope these suggestions might be helpful for anyone who could use them.

1 year ago

What are some recent papers that show making models explainable can also make them safer?

1 year ago

Hi, I'm starting to use Bluesky!

1 year ago
