🎉 That’s a wrap! The International CRG-BI Postdoc Symposium has come to an end after three incredible days of inspiring talks, engaging discussions, and valuable connections. I have to say I couldn’t be happier! #CRGBIpostdocs
A 🧵 below 👇🏼
Posts by Mathys Grapotte
Super impressive work by @angelp.bsky.social and colleagues at AWS / ARM on porting #Bioconda packages to #arm64 - there's potential for some big savings on compute as we scale this up!
Check out the @nf-co.re #arm64 channel for more and to get involved in the effort...
The funny thing is that if you type TRUE + !TRUE you get the proper answer (1).
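(The R expression above relies on logical values coercing to numbers under arithmetic; Python's booleans behave the same way — a minimal check:)

```python
# Booleans coerce to integers under arithmetic:
# the negation flips 1 -> 0, so the sum is exactly 1.
result = True + (not True)  # 1 + 0
print(result)  # → 1
```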
Here is the talk if you are curious:
Wow, honored to get a shoutout on the podcast! Speaking at the Nextflow Summit was a great experience, and I encourage you to apply for a talk at the next iterations.
Thanks, Ian!
So no stable release yet (still in dev). We are trying to make this extremely easy to contribute to and add things to: we have an active Slack channel, open dev hours every Wednesday from 2 pm to 6 pm CET, and we participate in all nf-core hackathons. It's a big community effort and super easy to join in!
Happy to discuss this further :) maybe a Slack/Discord channel, a Zoom call, etc.
(Also, we have been trying to build such a framework for a while at github.com/nf-core/deep... — the pipeline is currently changing a lot because we are in the process of porting our code to nf-core.)
I think @nextflow.io and @nf-co.re are the best place to build this, because they have all the qualities (open source, large existing community — 8k developers on Slack — performant, easy to use, etc.) and already have all the bio software we need to process raw data (mappers, aligners, etc.).
If the framework is performant, easy to use, flexible, easy to contribute to, easy to understand, good looking, etc., I bet proper guidelines will come naturally (and will vary based on use cases, as has always been the case in software).
Such a framework shouldn't impose guidelines on users; it should only provide a convenient way to run all kinds of tests on a research prototype (which is different from the models folks use in clinical applications).
So I think the best solution will come in the form of a robust test framework (analogous to pytest in Python, for instance), which could do both unit tests (theory tests on architecture + training "soundness") and integration tests (downstream tests).
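A minimal sketch of what those two test tiers could look like in pytest style — `build_model` and `train_step` are hypothetical stand-ins, not real framework entry points:

```python
# Sketch of the two test tiers, assuming pytest-style conventions.
# `build_model` / `train_step` are toy stand-ins for illustration.
import math

def build_model():
    # toy "model": a single weight, trained by gradient descent on (w - 3)^2
    return {"w": 0.0}

def train_step(model, lr=0.1):
    grad = 2 * (model["w"] - 3.0)
    model["w"] -= lr * grad
    return (model["w"] - 3.0) ** 2  # loss after the step

def test_architecture_soundness():
    # unit test: the model builds and all parameters are finite
    model = build_model()
    assert all(math.isfinite(v) for v in model.values())

def test_training_soundness():
    # unit test: loss decreases over a few steps ("soundness" of training)
    model = build_model()
    losses = [train_step(model) for _ in range(5)]
    assert losses[-1] < losses[0]
```

Downstream (integration) tests would sit alongside these, running the trained model end-to-end on held-out biological tasks.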
I also disagree with these takes. Seeing Meta's talk at the Ray Summit, for instance, I think those companies have robust stat eval pipelines in place (one does not press a $100m button without knowing what will come out of it).
However (probably an unpopular opinion), I think that a paper is not a good enough vessel for those "guidelines". I agree with the OP's point here, even though I think the guidelines of that paper are quite superficial.
The Principles of Deep Learning Theory book is also a gold mine of tests. Generally, I think NTK is a good probabilistic framework for designing evaluations of DL models.
Or here: cosine similarity between sorted singular vectors detecting structural shift after fine-tuning.
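A rough sketch of that check — the weight matrices here are synthetic stand-ins, and the "fine-tuned" matrix is just the original plus noise:

```python
# Sketch: compare singular vectors of a layer's weight matrix before and
# after fine-tuning; low cosine similarity on leading vectors suggests a
# structural shift. Matrices below are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
W_before = rng.standard_normal((64, 32))
W_after = W_before + 0.8 * rng.standard_normal((64, 32))  # "fine-tuned"

# np.linalg.svd returns singular vectors already sorted by singular value
U0, _, _ = np.linalg.svd(W_before, full_matrices=False)
U1, _, _ = np.linalg.svd(W_after, full_matrices=False)

# Cosine similarity between same-rank singular vectors; abs() because
# singular vectors are only defined up to sign.
cos = np.abs(np.sum(U0 * U1, axis=0))
print(cos[:5])  # leading vectors drift away from 1.0 as structure shifts
```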
Thankfully, with theory making progress, there are many things we can test prior to and during training. For example, here: the evolution of matrix rank as a proxy for information quantity in layers.
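A toy version of that rank check — the "checkpoints" below are synthetic (a fixed low-rank matrix plus shrinking noise), standing in for a layer's weights at successive training stages:

```python
# Sketch: track the effective rank of a layer's weight matrix across
# training checkpoints, as a cheap proxy for the information it encodes.
import numpy as np

def effective_rank(W, tol=0.1):
    # numerical rank: count singular values above a fraction of the largest
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(1)
# a rank-4 matrix plus shrinking noise, mimicking a layer converging
# toward low-rank structure as "training" progresses
base = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 30))
ranks = [effective_rank(base + noise * rng.standard_normal((50, 30)))
         for noise in (1.0, 0.1, 1e-8)]
print(ranks)  # rank collapses toward 4 as the noise is "trained away"
```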
So, I think we should have a "software" approach, i.e. deep learning code being vetted before or while it is running. This could still be useful because training from scratch on a couple of batches might be enough to detect issues (and cost effective).
However, downstream tests do not pinpoint issues well enough (i.e. is the training data the issue? Is it the code? Instability?...)
First, from this post and @cwognum.bsky.social, @ihaque.bsky.social:
I agree that downstream test development is useful, and extra convenient (no need to retrain, can treat the model as a black box, gives interesting bio insights, etc.).
I saw the discussion on #BioMLeval pop up thanks to this post and @ianholmes.org. I think this is an interesting + extremely valuable discussion - super happy to see people interested in bioML eval.
I am very interested. Actually, at the CRG and within the @nf-co.re organisation, we are building an open-source framework that will have all those tests built in (it's in my bio). For that purpose, I collected many papers from various ML fields and would love to share/discuss.
There are many more intricate hypotheses I could think of; I think this is the right application of LLMs in bioML.
There are lots of things we could do with this:
- Are pathogenic variants less expected by *insert LLM method* than non-pathogenic variants?
- Is perplexity lower for notably conserved regions?
- Can this be used to find conserved regions in new genomes?
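All of these boil down to comparing perplexity between groups of sequences. A minimal sketch, with hand-written hypothetical per-token log-probabilities where the genomic LLM's output would go:

```python
# Sketch: perplexity comparison between sequence groups.
# `token_logprobs` would come from the LLM; values here are
# hypothetical, for illustration only.
import math

def perplexity(token_logprobs):
    # perplexity = exp(mean negative log-likelihood per token)
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# a uniform model over 4 nucleotides assigns log(1/4) per token...
uniform = [math.log(0.25)] * 10
# ...while a model confident on a conserved region assigns higher probs
conserved = [math.log(0.9)] * 10

print(perplexity(uniform))    # 4.0: no information beyond the alphabet
print(perplexity(conserved))  # ~1.11: lower perplexity on conserved regions
```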
Imo, one of the most interesting figures (S6.b) is buried in the supplementary data.
As I understand it: "if evo is good at predicting the next token in that sequence, then when it makes a mistake, it is likely due to an unexpected and impactful variant".
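That reading can be sketched as a surprisal scan — positions where the model's negative log-probability spikes are candidate impactful variants. The probabilities below are hypothetical stand-ins for model output:

```python
# Sketch: flag positions where a normally accurate model is "surprised"
# (high negative log-probability of the true base) as candidate variants.
# `probs` are hypothetical model outputs, for illustration only.
import math

probs = [0.9, 0.88, 0.92, 0.05, 0.91, 0.89]  # model prob of the true base
surprisal = [-math.log(p) for p in probs]

# flag positions well above the typical surprisal level
mean = sum(surprisal) / len(surprisal)
flagged = [i for i, s in enumerate(surprisal) if s > 2 * mean]
print(flagged)  # → [3], the one position where the model was "surprised"
```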