
Posts by Joshua Ong

MMLU-Redux Poster at NAACL 2025

MMLU-Redux just touched down at #NAACL2025! 🎉
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋

11 months ago

Thanks @nolovedeeplearning.bsky.social for the picture!!! 🥰

1 year ago

Very cool work! 👏🚀 Unfortunately, errors in the original dataset will propagate to all new languages 😕

We investigated the issue of existing errors in the original MMLU in
arxiv.org/abs/2406.04127

@aryopg.bsky.social @neuralnoise.com

1 year ago

For clarity -- great project, but most of the MMLU errors we found (and fixed) in our MMLU Redux paper (arxiv.org/abs/2406.04127) are also present in this dataset. We also provide a curated version of MMLU, so it's easy to fix 😊

1 year ago

Super Cool work from Cohere for AI! 🎉 However, this highlights a concern raised by our MMLU-Redux team (arxiv.org/abs/2406.04127): **error propagation to many languages**. Issues in MMLU (e.g., "rapid intervention to solve ebola") seem to persist in many languages. Let's solve the root cause first?

1 year ago

Sohee (@soheeyang.bsky.social) in the house! 🚀🚀🚀

1 year ago
The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.

Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B — As always, we released our data, code, recipes and more 🎁

1 year ago

This paper's findings about testing LLMs on NLI align with many of my personal thoughts:

1) NLI remains a difficult task for LLMs
2) Having more few-shot examples is helpful (in my view, helping LLMs better understand class boundaries)
3) Incorrect predictions are often a result of ambiguous labels
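On point 2, the idea of few-shot examples marking out class boundaries can be sketched with a minimal prompt builder. The labels follow the standard three-way NLI scheme; the demonstration pairs below are my own illustrations, not examples from the paper:

```python
# Minimal few-shot NLI prompt builder. The demonstration pairs are
# hypothetical; in practice you'd sample them from a labelled NLI set.
FEW_SHOT = [
    ("A man is playing a guitar.", "A person is making music.", "entailment"),
    ("A dog sleeps on the couch.", "The dog is running outside.", "contradiction"),
    ("A woman reads a book.", "She enjoys the story.", "neutral"),
]

def build_nli_prompt(premise: str, hypothesis: str) -> str:
    """Prepend labelled demonstrations so the model can infer class boundaries."""
    lines = []
    for p, h, label in FEW_SHOT:
        lines.append(f"Premise: {p}\nHypothesis: {h}\nLabel: {label}\n")
    lines.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nLabel:")
    return "\n".join(lines)
```

Each demonstration shows one class, so the model sees all three boundaries before the query pair.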

1 year ago

Hey John! Thanks for reaching out—I’ve sent you a DM to discuss this further!

1 year ago
rebuttal template

Since friends are doing NAACL / ICLR rebuttals, I'm sharing my rebuttal template.
It works for me because it lets me visually break down comments across reviewers into common themes, separate the things I can easily address vs. those I can't, and filter across these.

You all've got this!!!

1 year ago

Hii I’d love to join as well!!!🙋🏼‍♀️

1 year ago

Hii I’d love to join as well!!

1 year ago

Check out our CoMAT: Chain of Mathematically Annotated Thought, which improves mathematical reasoning by converting mathematical questions into structured symbolic representations and performing step-by-step reasoning 🎉 It works across various languages and challenging benchmarks.

arxiv.org/pdf/2410.103...
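The convert-then-execute idea can be illustrated with a toy sketch (my own, not the CoMAT code): a word problem is rewritten as named symbolic steps, and the answer comes from executing those steps rather than from free-form text generation. The example problem is hypothetical:

```python
# Toy illustration of the convert-then-execute idea: a word problem
# ("A shirt costs 40 and is discounted by 25%; what is the final price?")
# rewritten as named symbolic steps, then evaluated step by step.
from fractions import Fraction

steps = [
    ("price", Fraction(40)),                                 # given quantity
    ("discount", lambda env: env["price"] * Fraction(1, 4)), # 25% of price
    ("final", lambda env: env["price"] - env["discount"]),   # price minus discount
]

def run_steps(steps):
    """Evaluate each step in order, binding its result by name."""
    env = {}
    for name, rule in steps:
        env[name] = rule(env) if callable(rule) else rule
    return env

result = run_steps(steps)
print(result["final"])  # 30
```

Because every intermediate quantity is named and computed exactly, each step of the chain can be inspected or re-checked on its own.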

1 year ago

The main question about current LLM "reasoning" research is what to do next. Most efforts go into synthetic data generation and training, maybe with self-refinement, in hopes that the model becomes better. I think we are missing controlled task formalization, step-by-step reasoning, and strict step verification.
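The "strict step verification" part can be made concrete with a toy sketch (my own illustration): each step of a reasoning chain declares its claim as checkable arithmetic, and a verifier re-executes every step instead of trusting the generated chain:

```python
# Toy "strict step verification": each reasoning step pairs an
# expression with its claimed result, and the verifier recomputes
# every step independently instead of trusting the chain.
chain = [
    ("12 * 3", 36),
    ("36 + 4", 40),
    ("40 / 5", 8.0),
]

def verify_chain(chain):
    """Return the index of the first wrong step, or -1 if all steps check out."""
    for i, (expr, claimed) in enumerate(chain):
        if eval(expr) != claimed:  # recompute the step independently
            return i
    return -1

print(verify_chain(chain))  # -1 (every step verified)
```

A real verifier would use a safe expression parser rather than `eval`, but the point stands: verification localizes the first wrong step instead of only judging the final answer.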

1 year ago

Thanksss!!!!!

1 year ago

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai

1 year ago

Hi I’d love to be added as well!🙋🏼‍♀️

1 year ago

Hey, I'm available! However, I can't send you a DM since it's restricted to followers. If you could send me a message instead, that'd be great!

1 year ago

I'll be travelling to London from Wednesday to Friday for an upcoming event and would be very happy to meet up! 🚀
I'd love to chat about my recent works (DeCoRe, MMLU-Redux, etc.). DM me if you’re around! 👋

DeCoRe: arxiv.org/abs/2410.18860
MMLU-Redux: arxiv.org/abs/2406.04127

1 year ago

dm-ed you!

1 year ago

Added! Thanks!!

1 year ago

I made a starter pack with the people doing something related to Neurosymbolic AI that I could find.

Let me know if I missed you!
go.bsky.app/RMJ8q3i

1 year ago

Hi I would love to be added as well!!

1 year ago

Hi, I would love to be added as well!

1 year ago

Hi, I’d love to be added as well!

1 year ago

Hi, I’d love to be added, thanks!!!

1 year ago