Announcing AlphaEvolve, our new LLM coding agent that has
- made new scientific discoveries
- discovered algorithms that are now deployed at Google (in Gemini, Transformers, TPU hardware design & data centers)
Blog: deepmind.google/discover/blo...
White paper:
storage.googleapis.com/deepmind-med...
Posts by Swarat Chaudhuri
One of my PhD students got their visa revoked. I know of other cases amongst my AI colleagues. This is not what investing in US leadership in AI looks like.
www.aljazeera.com/news/2025/4/...
Congrats to UT computer scientist Swarat Chaudhuri & UT physicist Feliciano Giustino who were named as Guggenheim Fellows for 2025!
#GuggFellows2025 @guggfellows.bsky.social @utaustin.bsky.social @swarat.bsky.social
cns.utexas.edu/news/accolad...
I am honored to be part of the #guggfellows2025 class. My Guggenheim project is on AI systems that can discover new math in an open-ended way. Many thanks to my students, colleagues, and mentors, who inspire me every day and without whom this work wouldn't be possible. www.gf.org/stories/anno...
Harvard has set an example for other higher-ed institutions - rejecting an unlawful and ham-handed attempt to stifle academic freedom, while taking steps to make sure students can benefit from an environment of intellectual inquiry, rigorous debate and mutual respect. Let’s hope others follow suit.
The #LeanLang Standard Library, under active development at the Lean FRO, envisions providing a reliable and extensible basis for #softwaredevelopment, #softwareverification and #mathematics through verified components, a high-quality API, performance optimization, and best-in-class documentation.
Calling all scientists and students based in London!
standupforscience2025.org and local groups are organizing rallies around the US to protest against the new administration’s massive and indiscriminate funding cuts to all manner of scientific research…👉🏼🧵
#sciencematters #standupforscience #london
Congrats to @amitayush.bsky.social for leading this effort. And thanks to my student George Tsoukalas and collaborator extraordinaire @gregdnlp.bsky.social, who made critical contributions to the work. (3/3)
It also has built-in machinery for large-scale, neurally guided proof search. We show that Proofwala's multilingual capabilities can enable transfer across proof assistants. Specifically, our multilingual model can outperform Coq- and Lean-only models at standard proof synthesis metrics. (2/3)
Excited about Proofwala, @amitayush.bsky.social's new framework for ML-aided theorem-proving.
* Paper: arxiv.org/abs/2502.04671
* Code: github.com/trishullab/p...
Proofwala allows the collection of proof-step data from multiple proof assistants (Coq and Lean) and multilingual training. (1/3)
Upon learning that yesterday would be my last day as a program officer at the National Science Foundation, I shared this parting message with my colleagues. The next few months will be frenetic and stressful for them. Here are some things that you can do to help them with the mission ahead. (1)
DARPA released a Request for Information (RFI) that seeks community feedback on the draft DARPA Guide to Formal Methods to Deliver Resilient Systems for Proposals (“the FMDRS Guide”). You can find the RFI here on Sam.gov.
Details in the image...
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation
“We introduce the Formally Verified Automated Programming Progress Standards, or FVAPPS, a benchmark of 4715 samples […] including 1083 curated and quality controlled samples”
arxiv.org/abs/2502.05714
Can LLMs be used to discover interpretable models of human and animal behavior?🤔
Turns out: yes!
Thrilled to share our latest preprint where we used FunSearch to automatically discover symbolic cognitive models of behavior.
1/12
This is the most relevant article to NIH and research cuts I’ve seen.
Imagine if this was today , how many people would be saying “Why are we studying Gila Monsters and their impact on diabetes ? That’s wasted money !”
globalnews.ca/news/9793403...
Super excited: my new @darpa program on AI for pure mathematics!
Exponentiating Mathematics (expMath) aims to accelerate the rate of progress in pure math through the development of an AI collaborator and new professional-level math benchmarks.
sam.gov/opp/4def3c13...
Mathematical proof assistants like Coq and Lean were made possible by a correspondence that established the equivalence between proofs and computation. Read the explainer from our archive:
New: The largest medical A.I. randomized controlled trial yet performed, enrolling >100,000 women undergoing mammography screening
The use of AI led to 29% higher detection of cancer, no increase of false positives, and reduced workload compared with radiologists w/o AI thelancet.com/journals/lan...
This is one more, and such a profound, way of distinguishing between science and technology: "Technology shouts for itself; science [does not]." (And these days, some technologies truly do themselves shout…)
2025 will be #mathsky interesting year!
@ayushkhaitan.bluesky.social, Amitayush Thakur, and I are organizing an #AI4Math panel at the Joint Mathematics Meeting this month. Please spread the word among your math friends! We will post a summary of the discussion after the event.
I really enjoy NASA Administrator (!!!) Michael Griffin on the "Real Reasons" versus the "Acceptable Reasons" to go to the moon: spaceref.com/status-repor...
You make a good point. Alphaproof will evolve just as the informal approaches have, though.
Yeah, I think so, especially if search is permitted at test-time.
From what I have seen, LLMs are quite good at that. There are plenty of examples of definitions being used in various contexts in the training data.
An excellent post by Kevin Buzzard on informal reasoning methods like o3. The key point, one I wholeheartedly agree with, is that informal methods continue to struggle with proof even when they give the correct answers, and this is a critical liability. xenaproject.wordpress.com/2024/12/22/c...
Hmm, I wasn’t imagining they would be connected to the account security people at X. But maybe worth a shot. Thank you!
We are excited about the potential of this approach in
✅ hard, research-level math tasks
✅ deep assurance of software and hardware systems.
This was a team effort with Kaiyu Yang, Gabriel Poesia, Jingxuan He, Wenda Li, Kristin Lauter, and Dawn Song. Please reach out to us with feedback! (2/2)