
Posts by Dan Roy

Tian and Karolina and team are at ICLR. Come say hi.

1 year ago 8 1 0 0

Curious. Didn’t know Meta had a PPL team.

1 year ago 1 0 0 0

I like to think about non-reasoning model responses as vibes.

1 year ago 2 0 0 0

So who’s read the 2027 article? What do you think?

1 year ago 1 0 3 0

Someone has suggested I check out bsky again. So I'm back looking around here. Notification list is kinda boring. So any good conversations going on? Perhaps about LLM/AI reasoning?

1 year ago 34 0 8 1

Of course.

1 year ago 0 0 0 0

Anyone else have the worry that a lot of LLM research is .... just bad psychology?

1 year ago 69 2 10 2

And, to achieve the results in this paper, what was the most challenging part? Why had previous attempts fallen short? What was your key new insight?

1 year ago 2 0 2 0

Very interesting. So, what was the biggest hole to fill, in terms of hypotheses?

1 year ago 2 0 1 0

Okay, so just a few* thoughts (*this got longer as I wrote 😅….long thread)-

1 year ago 52 32 1 7

Acknowledgments.

1 year ago 7 0 0 0
Post image

I got to ski Revelstoke this winter break.

Couple observations: the price of receiving 600 cm of snow by Jan 8 is that it is constantly snowing. Saw almost no sun the whole time and the peak was often in whiteout conditions (though North Bowl was always clear…).

See image for more.

1 year ago 10 0 0 0

Multiple friends have likely lost their homes in Los Angeles. Can’t imagine how disorienting this would be. They had only minutes to flee and grab belongings.

1 year ago 7 0 0 0

What are the key papers to read?

1 year ago 6 0 1 0

OK. Practical question time. How are you adjusting your research given progress in reasoning-style models? Also, how are you adjusting the way you work?

1 year ago 60 4 11 1

A $100,000,000 experiment is no longer consequence-free. Ilya is saying "scaling is over", but this may simply be that the scaling "laws" (not laws) are no longer accurate. Also, those laws are tied to hyperparameter tunings.

1 year ago 3 0 1 0
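The point that scaling "laws" are empirical fits, not laws, can be made concrete: a minimal sketch (all constants hypothetical, not measurements from any real run) that recovers a power-law exponent from synthetic loss data by linear regression in log-log space.

```python
import numpy as np

# Hypothetical losses generated from L(N) = 2.5 * N**(-0.076)
# (illustrative constants, not real training runs).
N = np.array([1e6, 1e7, 1e8, 1e9])   # model sizes
L = 2.5 * N ** (-0.076)              # losses

# A scaling "law" is just a least-squares fit in log-log space:
# log L = log a - b log N.
slope, log_a = np.polyfit(np.log(N), np.log(L), 1)
print(f"b = {-slope:.3f}, a = {np.exp(log_a):.3f}")  # → b = 0.076, a = 2.500
```

The fitted constants are only as good as the runs behind them; change the hyperparameter regime and the fit changes with it.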

Sure some were empirical. Some were not.

1 year ago 0 0 0 0

I'd say no in a sense. Xavier-He initialization was theoretical work. And that was absolutely critical.

1 year ago 2 0 1 0
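As background on that point: Xavier and He initialization came out of a variance-propagation analysis; He et al.'s rule draws weights from N(0, 2/fan_in) so that activation variance is preserved through ReLU layers. A minimal NumPy sketch (layer sizes are illustrative):

```python
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    """He initialization: draw weights from N(0, 2/fan_in) so the
    variance of activations stays stable through ReLU layers."""
    rng = rng or np.random.default_rng(0)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(512, 256)
print(W.std())  # ≈ sqrt(2/512) ≈ 0.0625
```

Xavier's variant uses 1/fan_in (or 2/(fan_in + fan_out)), derived for roughly linear activations; He's factor of 2 accounts for ReLU zeroing half the inputs.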

Pretraining is not done. It's just that theorists haven't told the hackers how to do it better.

1 year ago 35 0 3 0

Annoying. If it could be automatic, sure.

1 year ago 0 0 1 0

I'd say wait then.

1 year ago 1 0 0 0

That's part of the spec. I don't think this is too problematic. The example they give is problems in NP, where there is a polynomial-time checker (i.e., a polytime EV), but generating an instance that passes the checker is hard in the worst case.

1 year ago 1 0 0 0
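The NP example can be made concrete with a toy SAT verifier: checking a candidate assignment is linear in the formula size, while finding a satisfying assignment is hard in the worst case. A minimal sketch (the clause encoding is my own convention, not from the spec under discussion):

```python
def check_assignment(clauses, assignment):
    """Polynomial-time verifier for SAT. Each clause is a list of
    signed ints: positive = variable, negative = its negation.
    Returns True iff every clause has at least one satisfied literal."""
    return all(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )

# (x1 or not x2) and (x2 or x3)
clauses = [[1, -2], [2, 3]]
print(check_assignment(clauses, {1: True, 2: False, 3: True}))   # True
print(check_assignment(clauses, {1: False, 2: True, 3: False}))  # False
```

The asymmetry is the point: the checker runs in one pass over the clauses, but no polynomial-time procedure for producing a passing assignment is known in general.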

Now that I've had a taste of X without post length limitations, I've got to say that it is quite annoying having to fit tweets into 256 characters here on bsky. On X, when they get too long, they go below the fold, and so you're still incentivized to keep them short. Can't we have that here?

1 year ago 24 1 9 2

Lottery ticket?

1 year ago 3 0 1 0

@gkdziugaite.bsky.social. Works at GDM and Mila. Influential, technical work.

1 year ago 3 0 1 0

OK

1 year ago 0 0 1 0

Many of these sound very problematic if you hope the result will be accepted by the mathematical community. E.g., "The proof appears to use computational evidence (listing out cases) as a substitute for theoretical proof." It seems you're not meeting the usual standard.

1 year ago 0 0 1 0

Please ask Claude: what would likely be the chief criticisms of my argument above were I to submit it to a traditional mathematical journal?

1 year ago 1 0 1 0

I've now read this paper carefully if anyone wants to discuss it.

1 year ago 7 0 4 0

Great analogy.

1 year ago 0 0 0 0