Hardware performance counters are great, but you usually only use them on one-off, whole-program debug runs. I've prototyped sampling them at every OxCaml Runtime Events span instead, at almost no cost. Eventually this could surface issues across a whole fleet of services. toao.com/blog/free-pe...
Posts by Sadiq Jaffer
Last week Frank Feng and I joined Robin Cole on his satellite image deep learning podcast to talk about Tessera: how it works, how it differs from existing models, and our future plans. Thanks for hosting us, Robin!
www.satellite-image-deep-learning.com/p/tessera-a-...
It still surprises me you can get good performance with even tiny models on top of TESSERA embeddings. Here's how to find solar farms in the UK with a small ~42k CNN: toao.com/blog/earth-o...
There's relatively little LLM training data for niche languages, and this causes poorer coding agent performance. I think this is an existential threat for smaller language communities like OCaml.
My talk at the OCaml workshop gave some actionable steps to mitigate that: toao.com/blog/ai-exis...
Every OCaml talk needs a pun, and @sadiq.toao.com is no exception #icfpsplash25
Our lightning talks session opens with @sadiq.toao.com demonstrating TESSERA, their new geospatial foundation model that is FAIR and global #icfpsplash25
Not how I expected to make my @arstechnica.com debut but I'll take it arstechnica.com/ai/2025/09/c...
Fun field trip today trying to validate a colleague's bramble detecting model: toao.com/blog/can-we-... with @anil.recoil.org
A good point. Was being generated but not linked anywhere. Fixed now. Thanks!
Some fun OCaml GC projects here with @sadiq.toao.com and @kcsrk.info if any students are looking for projects involving programming languages toao.com/blog/ocaml-0...
The most incredibly fun part of this Nature comment on evidence synthesis we published today is that the cartoonist (David Parkins) also did Beano and Dennis the Menace (!) A true legend. www.nature.com/articles/d41...
The rapid rise in AI-generated fraudulent academic papers is "poisoning" scientific literature, say Cambridge researchers in Nature magazine today. But though AI is the problem, it could also help in ensuring the integrity of scientific discovery... buff.ly/AuSNcGd
@anil.recoil.org @sadiq.toao.com
I'm pleased to announce OxCaml!
OxCaml is Jane Street's branch of OCaml. We've given it a new name and a snazzy logo, and done a bunch of work to make it easy for people to try.
New paper out today on how the careful design of LLMs is crucial for expert-level evidence retrieval in conservation (but with implications for any evidence synthesis pipeline across other fields) 🌍 doi.org/10.1371/jour... and anil.recoil.org/news/2024-ce... for a summary
One thing I probably should highlight more in the post is that the proprietary models (like Claude and Gemini) that most students currently have access to can already ace the assignments.
This is a thorny question and mostly comes down to what we're trying to teach. I wonder if a progressive approach would work: no automatic tooling at the early stages of teaching, with more automation allowed as critical skills are learnt. It's a bit of a moving target at the moment though.
Just how good are locally hostable code models on Cambridge first year OCaml assignments? @anil.recoil.org, @jon.recoil.org and I wanted to find out, so ran some tests. TL;DR Qwen3 means we might need new assignments. toao.com/blog/ocaml-l...
If you are using llama.cpp, here's a workaround using grammars for getting JSON structured output from Deepseek R1 and distills: toao.com/blog/json-ou...
Part of our @ai.cam.ac.uk project on AI in Conservation was published in TREE today. We gathered conservation scientists and AI experts and looked at the key conservation areas AI could revolutionise: www.cell.com/trends/ecolo...
Working to surface challenges faced by folks at the coal face.
Data in research contributions from @orbenamy.bsky.social @sadiq.toao.com @scotthosking.bsky.social Stefan Scholtes, Vasco Carvalho, Mireia Crispin and a foreword with Jess Montgomery @dianecoyle1859.bsky.social @ginasue.bsky.social
New preprint from our work on using LLMs to accelerate conservation evidence synthesis across millions of papers. We cross-check 3 retrieval strategies against 10 LLMs, benchmark against human experts, and find quite a bit of variance: https://www.researchsquare.com/article/rs-5409185/v1