Posts by UniverseTBD

New AstroPT models are out 🔭🎉 This time trained with an improved DESI galaxy image dataset. Link here: huggingface.co/Smith42/astr...

Check out these new scaling curves!

We are still seeing improvement at 800M parameters where before we stalled at 100M. Maybe high quality data is all you need 🤔

10 months ago 4 1 1 0

HypoGen concludes our 2^2 fest and we truly hope you enjoyed it 🎇✨. Thank you for your great support with our mission to democratise science for everyone🌍.

1 year ago 0 0 0 0

8/n A huge thank you to our partners at @msftresearch.bsky.social for enabling our research through the AFMR grant. And to our many friends around the world in academia and industry - this wouldn't be possible without your support 🙏.

1 year ago 0 0 1 0

Huge thanks to Pranav Agarwal for the last minute eval request, we couldn't have done this without you! 💫

1 year ago 0 0 1 0

7/n HypoGen was led by the absolute star @charlesoneill.bsky.social working with our wonderful mentors Tirthankar Ghosal, Roberta Raileanu, Mike Walmsley, Thang Bui, @kevinschawinski.bsky.social, @errai34.bsky.social and our team🚀.

1 year ago 1 0 1 0

6/n Future directions: expand HypoGen to domains like astrophysics, biology, materials science, and build AI that doesn’t just answer questions but sparks them. 🔭🚀

Let us know here if you want to dive in & let’s push scientific discovery forward!
#HypoGen #AI4Science #DemocratisingScience

1 year ago 1 0 1 0

5/n Humans come out on top (~85% win rate) - a comforting result that hints at a vision for the future when AI and human researchers work together to advance scientific discovery 🤝.

1 year ago 1 0 1 0

4/n We fine‑tuned LLaMA 3.1 8B and its R1‑distilled variant on HypoGen (4‑bit quant + LoRA), then evaluated with perplexity, IAScore, and a couple of LLM judges coupled with human verification. We obtain significant gains in hypothesis novelty & feasibility with transparent reasoning steps! 🚀
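
For readers new to the perplexity metric mentioned above, here is a minimal sketch of how it is usually computed from per-token log-probabilities. The `perplexity` helper and the toy numbers are illustrative assumptions, not the evaluation code used for HypoGen.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood per token.

    `token_log_probs` holds natural-log probabilities that a model assigned
    to each reference token (hypothetical input for illustration).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy case: a model that gives every token probability 0.25 is as uncertain
# as a uniform choice among four options, so its perplexity is close to 4.
toy_log_probs = [math.log(0.25)] * 3
toy_ppl = perplexity(toy_log_probs)
print(toy_ppl)
```

Lower perplexity means the model found the reference text less surprising; on its own it says nothing about novelty or feasibility, which is presumably why the other metrics are reported alongside it.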

1 year ago 1 0 1 0

3/n HypoGen deets:

• 5,478 samples from NeurIPS 2023 & ICLR 2024
• JSON fields: bit, spark, flip, chain_of_reasoning
• Extraction courtesy of @OpenAI's tireless o1 model (no coffee required… maybe). 🤖☕
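
As a rough sketch of what loading one record with those four fields might look like: the field names come from the list above, while the `HypoGenRecord` class, the `parse_record` helper, and the placeholder text are made up for illustration and are not taken from the dataset.

```python
import json
from dataclasses import dataclass

# Field names come from the post; everything else here is hypothetical.
REQUIRED_FIELDS = ("bit", "spark", "flip", "chain_of_reasoning")


@dataclass
class HypoGenRecord:
    bit: str                 # the problem being addressed
    spark: str               # short insight
    flip: str                # the proposed solution
    chain_of_reasoning: str  # how the bit turned into the flip


def parse_record(line: str) -> HypoGenRecord:
    """Parse one JSON object and check that all four fields are present."""
    data = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return HypoGenRecord(**{name: data[name] for name in REQUIRED_FIELDS})


# Invented placeholder record, not a real dataset sample.
example_line = json.dumps({
    "bit": "Placeholder problem statement.",
    "spark": "Placeholder short insight.",
    "flip": "Placeholder solution.",
    "chain_of_reasoning": "Placeholder reasoning steps.",
})
record = parse_record(example_line)
print(record.spark)
```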

1 year ago 0 0 1 0

Paper📄: arxiv.org/pdf/2504.12976
Dataset 🤗: huggingface.co/datasets/Uni...

1 year ago 0 0 1 0

2/n Where’s the creativity? HypoGen reframes scientific hypothesis generation as a conditional LM task: feed it the Bit (problem) → get the Spark (4–6 word insight), the Flip (solution), plus an explicit Chain‑of‑Reasoning (how the Bit turned into the Flip). 🧠🔗
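
One way to picture the conditional setup is as a prompt/target pair, where the model sees the Bit and is trained to produce the rest. The sketch below assumes a simple text template; the `build_example` helper and the "Bit:/Spark:/Flip:" labels are illustrative guesses, not the format used in the paper.

```python
def build_example(bit, spark, flip, chain_of_reasoning):
    """Split one record into (prompt, target) for conditional generation.

    The prompt holds only the Bit; the target holds what the model should
    generate. The template text is a hypothetical choice for illustration.
    """
    prompt = f"Bit: {bit}\n"
    target = (
        f"Spark: {spark}\n"
        f"Flip: {flip}\n"
        f"Chain-of-Reasoning: {chain_of_reasoning}"
    )
    return prompt, target


# Placeholder strings, not real dataset content.
prompt, target = build_example(
    bit="Placeholder problem.",
    spark="Placeholder insight.",
    flip="Placeholder solution.",
    chain_of_reasoning="Placeholder reasoning.",
)
print(prompt + target)
```

During fine-tuning the loss would typically be applied to the target part only, so the model learns to continue from a Bit rather than to reproduce it.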

1 year ago 3 0 1 0

📢 New dataset out!

We introduce HypoGen💥, a dataset of ~5.5K structured problem–hypothesis pairs (Bit–Flip–Spark + Chain‑of‑Reasoning) to advance LLM-driven scientific ideation💡.

Fine‑tuned LLaMA 3.1 8B & R1‑distilled models show significant gains. Humans are still the best🥇.

1 year ago 5 4 1 1

This work was supported by Microsoft's Accelerating Foundation Models Research program and the ITER Teide HPC cluster. Thanks to all collaborators across our many institutions!

1 year ago 3 0 0 0
Join the UniverseTBD Discord Server! Check out the UniverseTBD community on Discord - hang out with 161 other members and enjoy free voice and text chat.

For researchers wanting to collaborate, we're available at discord.gg/PUR2FbFRZ4 and our DMs are open. Check out our code at w3id.org/UniverseTBD/..., and come find us at SCI-FM@ICLR if you would like to chat in person!

1 year ago 3 0 1 0

We see a future where multimodal models can reason across astronomical data types beyond just imagery: from spectra to light curves to data cubes.

1 year ago 3 0 1 0

We've evaluated AstroLLaVA on the Galaxy 10 DECaLS dataset and are releasing the model weights, code, and training dataset under the MIT license to support open science and further development by the community.

1 year ago 3 0 1 0

Our two-stage fine-tuning process adapts the model for both image captioning and visual question answering in the astronomy domain, making complex astronomical concepts more accessible through natural conversation.

1 year ago 3 0 1 0

We fine-tuned LLaVA on ~30k astronomical images with captions & QA pairs from NASA APOD, ESO, and Hubble archives to create a model that understands astronomical concepts in visual form 👉 hf.co/datasets/UniverseTBD/AstroLLaVA_convos

1 year ago 3 0 1 0

Excited to announce our new paper as part of our 2^2 week: AstroLLaVA, a vision language model for astronomy that enables natural dialogue with astronomical imagery! Shout out to Sharaf Zaman for leading this work arxiv.org/abs/2504.08583 🔭☄️

1 year ago 7 3 1 2

UniverseTBD is extremely grateful to our partners at @msftresearch.bsky.social for their continuous support, which enables our research.

1 year ago 1 0 0 0

For more updates and behind-the-scenes breakthroughs, follow us at bsky.app/profile/univ... as we continue to break through the barriers of the sky! 🌌

1 year ago 1 0 1 0

Dive into the full paper and explore the future of hypothesis generation here 👉 arxiv.org/pdf/2504.054...
#HypoGen #AI

1 year ago 0 0 1 0

We break down innovative approaches like direct prompting, adversarial methods, fine-tuning, knowledge integration, and even multi-agent systems, transforming how we turn vast scientific literature into actionable, testable ideas.

1 year ago 2 0 1 0

Authored by Atilla Kaan Alkan and the UniverseTBD team, this paper provides a single, comprehensive resource that covers everything from human-centric methods to cutting-edge LLM-driven techniques.

1 year ago 0 0 1 0

Our 2^2 celebration is still in full swing! 🎉
Today we’re launching our latest, must-read survey paper:
“A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models.”

Check it out! arxiv.org/pdf/2504.054...
🔭

1 year ago 6 0 1 1

bsky.app/profile/astr...

1 year ago 2 0 0 0

For a deep dive into AstroCoder’s capabilities and the journey behind its creation, check out Nolan’s detailed thread. We’ll be tagging him and his thread below – be sure to give it a read for the full story! 📝🔗
#AstroCoder #UniverseTBD

1 year ago 3 0 1 0

This exciting project is the result of a fantastic collaboration with innovative minds. We’re thrilled to have @astronolan.bsky.social on board, whose expertise has been key to AstroCoder’s creation. His vision is shaping the future of AI-driven research. 🌌✨

1 year ago 1 0 1 0

AstroCoder was built to empower astronomers and tech enthusiasts alike. It not only uncovers hidden gems in specialised repositories, but also helps evaluate how modern AI interacts with the tools that drive astronomical research. 🔭

1 year ago 2 0 1 0

Our 2^2 celebration continues! 🎉 Today we’re excited to introduce AstroCoder – an AI tool that makes niche astronomy software discoverable, scanning 2,270+ codebases to deliver technical summaries, installation guides, and code examples.
🚀 Try it now at nolank.ca/astrocoder/

1 year ago 9 0 1 1