Daphne Kontogiorgos-Heintz (@daphnekh) Bsky

RawBench: A Comprehensive Benchmarking Framework for Raw Nanopore Signal Analysis Techniques www.biorxiv.org/content/10.1101/2025.10....

6 months ago 4 2 0 0

Happy to share that ShapeEmbed has been accepted at @neuripsconf.bsky.social 🎉 SE is a self-supervised framework to encode 2D contours from microscopy & natural images into a latent representation invariant to translation, scaling, rotation, reflection & point indexing
📄 arxiv.org/pdf/2507.01009 (1/N)

6 months ago 71 26 3 5

We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks. For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations. Then, we collect 13 million LLM annotations across plausible LLM configurations. These annotations feed into 1.4 million regressions testing the hypotheses. For a hypothesis with no true effect (ground truth $p > 0.05$), different LLM configurations yield conflicting conclusions. Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors. Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models. Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

7 months ago 303 106 6 23

For the many of you who were asking about BLOW5 vs POD5 for nanopore signal data, here, finally, is a detailed benchmark we did:
biorxiv.org/content/10.1...
Summary: performance of BLOW5 is >= POD5 (from ~= to 100X, see below), with the benefit of having ~3 dependencies instead of >50.

9 months ago 14 9 1 0

Lossless data compression by large models - Nature Machine Intelligence Effective lossless compression requires that frequent patterns in the data can be identified. Li et al. explore using deep learning models to more effectively compress text, audio and video data.

"LMCompress shatters all previous lossless compression records on four media types: text, images, video and audio."

www.nature.com/articles/s42...

11 months ago 30 6 0 1

Analysis by Altmetric shows increasing posting of research content on Bluesky but more sharing (reposting) on X.

We need to increase Bluesky connectivity and share more.

1 year ago 16 8 0 0

Toward single-molecule protein sequencing using nanopores - Nature Biotechnology Maglia and colleagues discuss advances in nanopore technology en route to single-molecule protein sequencing

Toward single-molecule protein sequencing using nanopores
www.nature.com/articles/s41...

1 year ago 17 5 0 0

Sequencing by Expansion (SBX) — a novel, high-throughput single-molecule sequencing technology Remarkable advances in high-throughput sequencing have enabled major biological discoveries and clinical applications, but achieving wider distribution and use depends critically on further improvemen...

Roche SBX preprint out

www.biorxiv.org/content/10.1...

1 year ago 31 15 0 1

Roche Xpounds on New Sequencing Technology Bar bets can be a powerful force in human society. One of the best known books on the planet, The Guinness Book of World Records, originate...

Roche Xpounds on New Sequencing Technology

My deep dive on this exciting new entrant

omicsomics.blogspot.com/2025/02/roch...

🧬🖥️

1 year ago 19 11 1 0

Verena Rukes telling us about: Charge-based fingerprinting of unlabeled full-length proteins using an aerolysin nanopore

1 year ago 2 2 0 0

Timescales in Cell Biology

1 year ago 366 112 8 17

Dr. Margaret Oakley Dayhoff

I took biochem in 2001, and for nearly 20 years read amino acid sequences daily… and I never knew Dayhoff named them or even the logic behind things like Q until last Friday (h/t Mike Janech). Also, this is another big Dayhoff moment for me. She was incredible!

#proteomics #bioinformatics

1 year ago 198 79 14 7

Sky Follower Bridge - Chrome Web Store Instantly find and follow the same users from your Twitter follows on Bluesky.

This worked like a charm to import the accounts I follow on Twitter: chromewebstore.google.com/detail/sky-f...

1 year ago 3 0 1 0
