Posts by Andrew Roger
This will be really interesting!
New OpenFold3 preview out! (OF3p2)
It closes the gap to AlphaFold3 for most modalities.
Most critically, we're releasing everything, including training sets & configs, making OF3p2 the only current AF3-based model that is functionally trainable & reproducible from scratch🧵1/9
Never mind... in NCBI it says ~105 Mb in 1,400 scaffolds.
What is the Trichoplax genome size estimate?
New preprint out! Using ~75k environmental OTUs + 77 fossil calibrations, we reconstructed a Proterozoic timeline of eukaryote evolution. Our results show crown eukaryotes were already diversifying >1.6 Ga, long before the first undisputed fossils (~1.05 Ga).
🔗 DOI: www.biorxiv.org/content/10.6...
illustration of various aspects of the biology of the microbe discussed in the article
New #ISEPpapers! The genome sequence of the gastric gland parasite, #Cryptosporidium proliferans: Monika Wiśniewska et al. www.sciencedirect.com/science/arti...
#Protists #Parasites #Microbes #Genomics
a simplified evolutionary tree of eukaryotes with pictures of various microbes
New #ISEPpapers #preprint by @deemteam.bsky.social: Re-evaluating the eukaryotic Tree of Life with independent phylogenomic data www.biorxiv.org/content/10.6...
#Protists #Microbes #Evolution #Eukaryotes #TreeOfLife #Phylogeny #Phylogenomics #Bioinformatics #Algae
We stumbled upon an unusual gene called 'rqua' in the genome of some freshwater sponges. In microbes, this gene confers the ability to make a ubiquinone analog - rhodoquinone - which can help the respiratory chain run without oxygen.
From Pisani et al: CAT-GTR is one of the most flexible models in the phylogenomic arsenal.
Phylogenomic mixture models outperform homogeneous and partitioned models
academic.oup.com/mbe/advance-...
I thought maybe @nicolasgaltier.bsky.social was asking about whether Canadian French accent when speaking French has similarities to Canadian English accent when speaking English? Is that right Nicolas?
My guess is "no", but I'm not an expert on the various French accents in Canada (Québécois, Acadien, Franco-Albertan etc.). @christianlandry.bsky.social what are your thoughts on this?
Yes!
Welcome. The Canadian phonetics explanation was an unnecessary addition.
Differently. "Bias" -> "Buy Ass". "Bayes" -> "Baes". For Canadians "Bayes" = "B-ehs".
Very useful article.
MBE Call for Papers on the Major Transitions of Life
MBE is excited to launch our newest Call for Papers on the Major Transitions of Life, covering all aspects of phylogenomic research.
🔗 academic.oup.com/mbe/pages/call-for-papers-on-the-major-transitions-of-life
Guest Editors:
Davide Pisani
Anja Spang
#evobio #molbio #phylogenetics
This looks interesting!
Anecdotally, I remember explaining parsimony methods to a statistician colleague in the early 2000s...and they just looked bewildered: parsimony didn't seem like a sensible statistical approach to estimation, so they wondered why on Earth someone would propose it.
I suppose all of what I said above depends on what might be considered a "relevant difference". I've heard the cladist arguments about this ad nauseam...
I'd guess that statisticians would not see them as guilty until proven innocent in the phylogenetic setting. Rather they'd think that, unless there is a relevant difference between the phylogenetic setting and more classical statistical problems, these methods should perform well 'off the shelf'.
For example, I think that there would be an a priori expectation that, based on their properties in more classical statistical settings, both maximum likelihood and Bayesian approaches to phylogenetic estimation would perform well if the models adequately described real data.
Although I get the context of this quote (and agree with it in that context), if we just set aside the whole cladistics/phenetics/probabilistic debates in the early days, I'm not sure we'd think this way about modern phylogenetic approaches.
If you want to know more about the connection between AIC and cross-validation, check the following paper out. Warning though -- some of the math is a bit tough going (for me too, even though I'm a co-author): academic.oup.com/mbe/article/...
AIC is an approximation to the expected predictive log-likelihood and is therefore getting at the same thing as cross-validation. In practice, if a model has the best AIC and the best BIC out of competing models (including simpler ones), then usually you don't have to worry about overfitting.
In practice if you are still worried about overfitting, you can always try splitting datasets into training and test sets to see how well your model fitted on the training data predicts the test data. Cross validation approaches are really helpful in this regard.
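To make the AIC-vs-predictive-fit connection concrete, here is a minimal numpy sketch on hypothetical toy data (a quadratic trend with Gaussian noise; all names and settings are made up for illustration, not from any real analysis). It fits polynomials of increasing degree to a training set, then compares each model's AIC with its log-likelihood on a held-out test set: the badly underfitting degree-1 model loses on both criteria, illustrating that AIC and held-out predictive fit are getting at the same thing.

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical toy data: a quadratic trend with Gaussian noise
def simulate(n):
    x = rng.uniform(-3, 3, n)
    y = 1 + 2 * x - 1.5 * x**2 + rng.normal(0, 1, n)
    return x, y

x_tr, y_tr = simulate(40)    # training set
x_te, y_te = simulate(200)   # held-out test set

def gauss_loglik(resid, sigma2):
    # Gaussian log-likelihood of residuals given an error variance
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + resid**2 / sigma2)

aic, test_ll = {}, {}
for deg in (1, 2, 5):
    coef = np.polyfit(x_tr, y_tr, deg)        # ML fit of the mean curve
    resid_tr = y_tr - np.polyval(coef, x_tr)
    sigma2 = np.mean(resid_tr**2)             # ML estimate of the error variance
    k = deg + 2                               # (deg + 1) coefficients + variance
    aic[deg] = 2 * k - 2 * gauss_loglik(resid_tr, sigma2)
    # predictive fit: log-likelihood of held-out data under the trained model
    test_ll[deg] = gauss_loglik(y_te - np.polyval(coef, x_te), sigma2)
    print(f"degree {deg}: AIC {aic[deg]:8.1f}   test log-lik {test_ll[deg]:9.1f}")
```

The true model (degree 2) should beat the underfitted degree-1 model on both AIC and test log-likelihood; with small training sets the overfitted degree-5 model also tends to lose on both, though by a smaller margin.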
The above situation is quite similar in spirit to conducting penalized likelihood analysis to control overfitting. So I think you can roughly think of the summation constraint for weights of mixture models as functioning as a penalty for overfitting.
For example, to optimize mixture model weights with a quasi-Newton approach like L-BFGS, you can actually just constrain the weights to be > 0 and instead optimize the likelihood with a penalty of n*(Sum_i w_i - 1). This ends up guaranteeing that in the optimal solution the weights sum to 1.
So the model 'self-compacts' in a sense if there isn't much evidence for particular mixture components. Parameters in components with weights at zero don't factor into the parameter count and don't contribute to model flexibility. This sounds mysterious, but it comes from the summation constraint.
This ends up turning the optimization of the mixture into something akin to a penalized likelihood setting. If the mixture is too rich, then often one or several of the components will end up with weights of 0 (or close to the lower bound on weights in the software implementation you are using).
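Here is a minimal scipy sketch of that penalty trick, with a toy two-component Gaussian mixture standing in for a phylogenetic site mixture (all data and component densities here are invented for illustration). The objective is Sum_s log(Sum_i w_i f_i(x_s)) - n*(Sum_i w_i - 1), the weights are bounded below by a small positive value but never explicitly constrained to sum to 1, and L-BFGS-B is left to do the rest:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# toy data: a 70/30 mixture of two fixed Gaussian components
x = np.concatenate([rng.normal(0, 1, 700), rng.normal(4, 1, 300)])
n = len(x)

# fixed per-site component densities f_i(x_s): K components x n sites
F = np.vstack([norm.pdf(x, 0, 1), norm.pdf(x, 4, 1)])

def neg_penalized_loglik(w):
    # penalized objective: sum_s log(sum_i w_i f_i(x_s)) - n*(sum_i w_i - 1)
    site_lik = w @ F
    return -(np.sum(np.log(site_lik)) - n * (np.sum(w) - 1.0))

# only a positivity bound on each weight; no explicit sum-to-1 constraint
res = minimize(neg_penalized_loglik, x0=np.array([0.5, 0.5]),
               method="L-BFGS-B", bounds=[(1e-8, None)] * 2)

print(res.x, res.x.sum())
```

At the optimum the weights come out near (0.7, 0.3) and sum to 1 on their own: setting the gradient to zero gives Sum_s f_i(x_s)/p_s = n for every component with positive weight, and multiplying by w_i and summing over components forces Sum_i w_i = 1. With a too-rich mixture, superfluous components get pushed to the 1e-8 lower bound, which is the "self-compacting" behaviour described above.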