You are only evaluating a slice of your test-time scaling model's performance!
We consider how models' confidence in their answers changes as test-time compute increases. Reasoning longer helps models answer more confidently!
Paper: arxiv.org/abs/2502.13962
Posts by Marc Marone
This post has a bit more context: bsky.app/profile/marc...
But you should be able to use this list bsky.app/profile/did:... - it seems like you can view feeds for individual starter packs, but they're limited in size. I use a script to sync 2 starter packs and the list instead
added!
New Workshop on Multimodal Augmented Generation via MultimodAl Retrieval (MAGMaR) to be held at @aclmeeting.bsky.social ACL in Vienna this summer. We have a new shared task that stumps most LLMs - including ones pretrained on our test collection. nlp.jhu.edu/magmar/
All added now! Unfortunately it seems a lot of the starter pack momentum is gone, but the feed still shows everyone's activity
Lots of people had asked to be added, so I played around with the API and made a script that lists everyone who engaged on a post. My API usage probably isn't ideal, but some example scripts here: github.com/ruyimarone/b...
And here are links to each of the starter packs:
V2: go.bsky.app/Db3Vjs3
V1 [Full]: go.bsky.app/vju2ux
The second one has plenty of spots to add more folks!
The ML/NLP grad student starter packs grew fast! I had to make a second one. Here's a list you can use to view combined posts from both packs: bsky.app/profile/marc...
You can "Pin to home" to see it as a tab. Looks like students are getting ready for NeurIPS soon
Added! I also just synced the list: bsky.app/profile/marc...
Which can be viewed as a feed. Everyone in this list is part of one of the packs.
oh no, who did I mess up? No dunks intended, there might be some postdocs on here too
(playing with the apis to curate lists)
Thanks to everyone who reached out. I have a script to gather everyone who engaged and I'll add the next batch of folks soon. If the first starter pack fills up I'll make another and keep all synced to the list!
I think I got every student who commented or engaged, dm me if I missed you
Also made a corresponding list to view the posts: bsky.app/profile/did:...
There's some overlap with other starter packs, but many of those are full and I've seen a BUNCH of grad students joining bluesky this week, myself included!
I noticed a lot of starter packs skewed towards faculty/industry, so I made one of just NLP & ML students: go.bsky.app/vju2ux
Students do different research, go on the job market, and recruit other students. Ping me and I'll add you!
Giving this a shot, made a starter pack of just students! go.bsky.app/vju2ux
And a matching list to get a feed view: bsky.app/profile/did:... (maybe there's an easier way to set this up?)
Meeting notes with Cody this week: "do you think factorio space age was a psyop for ai slowdown?"