
Posts by Mike Smith

Preview
g-Harmony - a Hugging Face Space by astrohayley. Which galaxy is right for you?

language AI in space sciences hackathon, day 1: given access to real Euclid telescope data

day 2: we made the galaxies flirt

👉 hf.co/spaces/astrohayley/gHarmony

1 month ago 1 0 0 0
Post image

On Apple M3, a Linux KDE Plasma desktop under Fedora Asahi Remix is now WORKING! Super excited to share this update and happy to answer any questions! Co-credits to noopwafel and Shiz. :)

2 months ago 412 69 15 11
Post image

New AstroPT models are out 🔭🎉 This time trained with an improved DESI galaxy image dataset. Link here: huggingface.co/Smith42/astr...

Check out these new scaling curves!

We are still seeing improvement at 800M parameters where before we stalled at 100M. Maybe high quality data is all you need 🤔
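
Rough sketch of how a scaling curve like this is usually summarised: a power-law fit of validation loss against parameter count. The numbers below are invented for illustration and are not AstroPT results.

```python
# Sketch: summarise a scaling curve with a power-law fit in log-log space,
# loss ~ C * N**(-alpha). The values below are made up for illustration --
# they are NOT AstroPT measurements.
import numpy as np

params = np.array([1e6, 1e7, 1e8, 8e8])        # model sizes (parameters)
val_loss = np.array([1.20, 0.95, 0.80, 0.72])  # placeholder validation losses

slope, intercept = np.polyfit(np.log(params), np.log(val_loss), 1)
alpha, C = -slope, np.exp(intercept)
print(f"loss ~ {C:.2f} * N^(-{alpha:.3f})")
# A fit that still tracks the points at 8e8 parameters (rather than
# flattening out) is the "still seeing improvement" claim above.
```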

10 months ago 4 1 1 0
Preview
Euclid Quick Data Release (Q1): Exploring galaxy properties with a multi-modal foundation model. Modern astronomical surveys, such as the Euclid mission, produce high-dimensional, multi-modal data sets that include imaging and spectroscopic information for millions of galaxies. These data serve a...

Anyways, here's the paper - it's one of the first big uses of foundation models in astronomy that I'm aware of, and it seems to have worked really well! #extragalactic #astrocode 🧪
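
For context on what "exploring galaxy properties with a foundation model" usually looks like in practice, here's a toy linear-probe sketch on placeholder data. This is not the Euclid Q1 pipeline or its model, just the generic recipe of regressing a property from frozen embeddings.

```python
# Toy "frozen embeddings + linear probe" sketch on placeholder data.
# Not the Euclid Q1 pipeline or model -- just the generic recipe of
# predicting a galaxy property from precomputed foundation-model embeddings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))    # stand-in for per-galaxy embeddings
redshift = rng.uniform(0.0, 2.0, size=1000)  # stand-in for a target property

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, redshift, test_size=0.2, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("probe R^2:", probe.score(X_te, y_te))  # ~0 here because the data are random
```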

1 year ago 14 3 0 0
Post image

Ooh we scalin'

10 months ago 0 0 0 0
This side-by-side plot compares two visualizations of the same dataset’s embeddings: on the left, the original embeddings; on the right, latent representations generated via a transformation method (here labeled vec2vec).

Left: Original Embeddings
• Two clearly separated clusters of red and green points.
• The clusters represent two distinct groups (e.g., classes, domains, or modalities).
• Gray lines show strong alignment or correspondences between red and green points, suggesting some shared structure or matched pairs.
• However, the clusters are far apart, meaning the original embedding space encodes strong domain-specific separation (e.g., red and green are treated as different).

Right: Latent Representations (vec2vec)
• The same points are now more uniformly mixed in latent space.
• The tight clustering by color is gone; red and green points are distributed throughout.
• This suggests the vec2vec method has projected both groups into a shared latent space, removing domain bias and aligning semantically similar items regardless of origin.
• It's indicative of embedding alignment, domain adaptation, or representation unification, where cross-domain items are mapped closer together based on semantic similarity.

Implication:

vec2vec successfully transforms the original domain-specific embeddings into a common space where structural similarity dominates over origin (color), enabling better transfer, comparison, or fusion between domains.


Strong Platonic Representation Hypothesis

Given large enough scale, embeddings from any model can be translated into any other model's embedding space, without paired data

Security implication: Embeddings aren’t encryption, they’re basically plain text

arxiv.org/abs/2505.12540
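
To make the "shared latent space" picture above concrete, here's a toy alignment sketch. Important caveat: it uses orthogonal Procrustes with known pairs, which is much weaker than what the vec2vec paper does (their method needs no paired data at all); it only illustrates the idea of mapping one embedding space onto another.

```python
# Toy illustration of aligning two embedding spaces. NOTE: this uses
# orthogonal Procrustes with KNOWN pairs, unlike vec2vec, which learns the
# translation without any paired data. It only shows the "shared space" idea.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(1)
green = rng.normal(size=(200, 64))                     # embeddings from model A (placeholder)
rotation = np.linalg.qr(rng.normal(size=(64, 64)))[0]  # hidden relation between the spaces
red = green @ rotation + 0.01 * rng.normal(size=(200, 64))  # model B = rotated A + noise

R, _ = orthogonal_procrustes(red, green)               # best rotation mapping red -> green
aligned = red @ R

before = np.linalg.norm(red - green, axis=1).mean()
after = np.linalg.norm(aligned - green, axis=1).mean()
print(f"mean matched-pair distance: before={before:.3f}, after={after:.3f}")
```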

10 months ago 49 8 6 4
Post image
11 months ago 1 0 0 0
Post image Post image

If you add "also Cthulhu-y" to the prompt, the results are pretty great.

11 months ago 65 6 3 0

A great dataset to round out UTBD's 2^2 week 😎

1 year ago 0 0 0 0
Post image

📢 New dataset out!

We introduce HypoGen💥, a dataset of ~5.5K structured problem–hypothesis pairs (Bit–Flip–Spark + Chain‑of‑Reasoning) to advance LLM-driven scientific ideation💡.

Fine‑tuned LLaMA 3.1 8B & R1‑distilled models show significant gains. Humans are still the best🥇.
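
If you're wondering what a Bit–Flip–Spark record looks like in practice, here's a guess at the shape of one entry and one way to flatten it into a fine-tuning pair. The field names and text below are invented for illustration, not the released schema (check the dataset itself for the real format).

```python
# Guessed shape of a HypoGen-style record (Bit-Flip-Spark + Chain-of-Reasoning)
# and one way to flatten it into a fine-tuning prompt/completion pair.
# Field names and text are invented for illustration, not the released schema.
example = {
    "bit": "Scaling laws are usually fit only to a model's final validation loss.",
    "flip": "Fit scaling laws jointly across intermediate checkpoints as well.",
    "spark": "Checkpoint-aware scaling fits could flag saturation much earlier.",
    "chain_of_reasoning": "Intermediate losses constrain the curve's shape, so ...",
}

prompt = f"Bit: {example['bit']}\nFlip: {example['flip']}\n"
completion = f"Spark: {example['spark']}\nReasoning: {example['chain_of_reasoning']}"
print(prompt + completion)
```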

1 year ago 5 4 1 1

Was great fun cooking this up with Sharaf and team! Check out all the code at github.com/UniverseTBD/... and paper at arxiv.org/abs/2504.08583

1 year ago 2 2 0 0
Post image

🎉 HAPPY BIRTHDAY, UniverseTBD! 🚀
As we turn 2, we’re going 2^2.
Launching a new project per day for the next four days.
We hope that you all enjoy these works as much as we have enjoyed working on them. Stay tuned for the big reveals!

1 year ago 3 1 1 0

tariffs getting so bad you can't even import numpy 🥲

1 year ago 2 0 0 0
Post image

me: i didn't know you were cool like that
val loss:

1 year ago 0 0 0 0

me: go left! ←←
my computer: best i can do is ^[[D

1 year ago 0 0 0 0

Going to be a great talk 😎

1 year ago 1 0 0 0
Preview
Multiband Embeddings of Light Curves. In this work, we propose a novel ensemble of recurrent neural networks (RNNs) that considers the multiband and non-uniform cadence without having to compute complex features. Our proposed model consis...

arxiv.org/abs/2501.12499 super cool paper! Extracting useful information from astro time series via RNNs
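
For anyone curious how an RNN can handle irregular, multiband cadence at all, here's a generic sketch (not the paper's architecture): feed each observation as (time gap since the last point, magnitude, one-hot band id) and let a GRU summarise the sequence.

```python
# Generic sketch, not the paper's model: a GRU over (time gap, magnitude,
# one-hot band) steps handles non-uniform, multiband cadence directly,
# with no hand-crafted features.
import torch
import torch.nn as nn

N_BANDS = 2
rnn = nn.GRU(input_size=2 + N_BANDS, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)  # e.g. one classification score per light curve

# One toy light curve: irregular times, magnitudes, alternating bands.
times = torch.tensor([0.0, 1.3, 1.9, 5.2, 9.8])
mags = torch.tensor([18.1, 18.0, 17.7, 18.3, 18.2])
bands = torch.tensor([0, 1, 0, 1, 0])

dt = torch.cat([torch.zeros(1), times[1:] - times[:-1]])  # gap since previous obs
band_onehot = nn.functional.one_hot(bands, N_BANDS).float()
x = torch.cat([dt.unsqueeze(1), mags.unsqueeze(1), band_onehot], dim=1).unsqueeze(0)

_, h = rnn(x)               # final hidden state, shape (1, 1, 32)
score = head(h.squeeze(0))  # shape (1, 1)
print(score)
```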

1 year ago 0 0 0 0
Post image

With r1 and o1, Yann LeCun's cake is now baked and ready

1 year ago 0 0 0 0

The final frontier for AI will be anything that can't be captured via a quantitative benchmark

1 year ago 0 0 0 0

broke: reading 500 page AI safety papers

woke: learning AI alignment best practices from "wallace & gromit: vengeance most fowl"

1 year ago 3 0 0 0
Post image

first time spotting AI art in the wild

just look at that floating ship!

1 year ago 1 0 0 0
Post image Post image

the nvidia digits case is so british-housing-core

1 year ago 0 0 0 0
Preview
Patience is a virtue: Cooperative people have lower discount rates. Reciprocal altruism involves foregoing an immediate benefit for the sake of a greater long-term reward. It follows that individuals who exhibit a stro…

This makes so much sense – more cooperation from people who are more willing to wait for rewards (~= less risk averse).

www.sciencedirect.com/science/arti...

1 year ago 6 2 0 0
An M dwarf star with the text: "I am not a toy. I am not a Christmas present. I am a 10 trillion year commitment"

Think twice before gifting someone an M dwarf this holiday season

1 year ago 7691 1390 117 52

I move to refer to 'Gold OA' as 'Pay to Publish'

1 year ago 13 5 0 1

o3 got me thinking about the future of selling our labour as code... how many more iterations until that's transformed? 😅

1 year ago 1 0 0 0
Post image

new shoggoth just dropped 😤😤

1 year ago 2 0 0 0
Hatfield: The Fifth Most Boring 'City' in the World | Documentary YouTube video by Callum Oakaby Wright

lets goo hatfield, great doc about my hometown: www.youtube.com/watch?v=IQYt...

1 year ago 0 0 0 0
Post image

POV: me getting on social media

1 year ago 13 1 0 0