It’s that wonderful time of year again. A new GTDB release is out :)
Posts by Augustus (Gus) Pendleton
Unbelievably grateful that I get to work with these folks at the Lake Superior National Estuarine Research Reserve!
I defended my PhD 10 years ago today. That was the least remarkable thing that happened that day. Sharing something I wrote about it last year. Since then "the horrors persist but so do we." And with dignity.
Just a moment of gratitude: every industry scientist I've reached out to (even cold-calling on LinkedIn!) has been willing to chat with me, and been incredibly generous with their time, contacts, and advice. Makes a scary job search feel so much more welcoming!
COOL!
I just had that conversation earlier this week. The college dean is deciding if they will (dis)continue the bioinformatics training program for grad students. An argument for discontinuation is that students will use GenAI to help them code, so they don’t need to learn bioinformatics.
Doing a lot of testing of AI agents with my team these days. Pulled together a new package, wf, which helps install and manage agent skills from within #rstats
(A great time to be at Princeton DDSS; our team has an intro to Claude workshop today and an intro to Posit Assistant workshop next week!)
Those who know me know how big a deal this is for me...
Spent a few hours using GitHub Copilot in VS Code to prepare a new Shiny app that complements our recent UniFrac paper! Excited to release this soon, including my code, skill files, and reflections on AI coding (from a certified AI-grump).
I always try and present like I'm about to tell you the juiciest gossip about bacteria
WOAH!! Catherine Lozupone and Rob Knight cited my paper!! this is actually mind blowing to the microbial ecologist nerd that I am 🤯🥳🤓 Huge credit to Jianshu Zhao for an incredible innovation
www.biorxiv.org/content/10.6...
I take data types very seriously
I finally spent some time sprucing up my personal website. Feel free to see what I've been up to!
gus-pendleton.github.io
A Shapiro-Wilk test of the response variable indicates a highly significant deviation from Normality, but the residuals of the linear model are consistent with a Normal distribution.
Visual check of the linear model with DHARMa
Periodic reminder that we should avoid testing the Normality of the response variable.
For a linear model, what matters is the Normality of the residuals (and even that not very much). Visual checks are better than tests. #statistics
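To make the point above concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration (the exponential predictor, the coefficients, and the use of sample skewness as a rough stand-in for a formal Normality test); it is not the analysis behind the figures described in the post. It shows how a response can look clearly non-Normal while the residuals of the fitted line look fine.

```python
import random

def skewness(values):
    """Sample skewness: close to 0 for a symmetric (e.g. Normal) sample."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return sum(((v - mean) / sd) ** 3 for v in values) / n

random.seed(1)
n = 5000

# Hypothetical right-skewed predictor.
x = [random.expovariate(1.0) for _ in range(n)]
# Response = linear signal + Normal noise, so y inherits the skew of x.
y = [2.0 + 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares for one predictor, computed by hand.
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"skewness of response:  {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

The response is strongly skewed only because the predictor is; once the predictor is accounted for, the residuals are roughly symmetric, which is the quantity the model's assumptions refer to.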
Where I ended up instead is binarizing the two variables and testing the probability of them both "peaking" together against a set of permuted datasets
I stumbled into defining a variable ("Active" vs. "Inactive") based on high X OR high Y, and got a beautiful example of anticorrelation in the "active" group. Of course, even in random data like above, you get great anticorrelation when you "cut the corner" like this (glad I checked!)
Yup - self-imposed! The real data (above is a simulation) is characterized by mostly low levels of both variables, with periods of "peaky" highs. Those peaks look anti-correlated, but the two variables have an overall positive correlation! ...
Learned a very painful statistics lesson today (after investing a whole day of analysis)
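The "cut the corner" effect described in this thread is easy to reproduce. The sketch below is an illustration under stated assumptions, not the original analysis: the variables are independent standard Normals, the cutoff of 1.0 for "high" is arbitrary, and `copeak_pvalue` is a hypothetical helper name for one way to binarize both variables and compare joint peaks against permuted data.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

def copeak_pvalue(x_high, y_high, n_perm=999, rng=random):
    """Permutation p-value for 'both high together' happening at least
    as often as observed, shuffling one variable's high/low labels."""
    observed = sum(a and b for a, b in zip(x_high, y_high))
    shuffled = list(y_high)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        both = sum(a and b for a, b in zip(x_high, shuffled))
        if both >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

random.seed(7)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]  # independent of x by construction

threshold = 1.0  # assumed cutoff for "high"
# "Active" = high X OR high Y: this removes the low-low corner of the plot.
active = [(a, b) for a, b in zip(x, y) if a > threshold or b > threshold]
ax, ay = zip(*active)

r_all = pearson(x, y)
r_active = pearson(ax, ay)
print(f"correlation, all points:    {r_all:.2f}")
print(f"correlation, 'active' only: {r_active:.2f}")

# Binarize and test joint peaks against permutations instead.
x_high = [a > threshold for a in x]
y_high = [b > threshold for b in y]
p_value = copeak_pvalue(x_high, y_high)
print(f"permutation p-value for co-peaking: {p_value:.3f}")
```

Selecting on "high X OR high Y" manufactures a negative correlation even when none exists, while the permutation comparison is not affected by that selection. One caveat: a plain shuffle treats observations as exchangeable, so autocorrelated time series would likely need a block or circular-shift permutation instead.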
I am seeking a postdoc to join my group at UCLA -- ideally the candidate would have some experience in either population genetics or microbes/microbiome (computational background needed). We have a range of projects and are happy to tailor them to your interests. Please dm/email me if interested.
Survived the 50km Winona Tourathon last Saturday, all in sub-zero temps! Skied an hour longer than planned but super happy with my 5th-place finish.
First paper of my PhD @doerrlab.bsky.social is up! We characterized meropenem tolerance in Enterobacterales species, and then further dissected tolerance mechanisms in Klebsiella pneumoniae. journals.plos.org/plospathogen...
“We found that using AI assistance led to a statistically significant decrease in mastery.”
Props to Anthropic for studying the effects of their creation and reporting results that are probably not what they wished for
www.anthropic.com/research/AI-...
We are thrilled to announce the first official release (v0.1.8) of #𝗯𝗲𝗱𝗱𝗲𝗿, the successor to one of our flagship tools, #𝗯𝗲𝗱𝘁𝗼𝗼𝗹𝘀! Based on ideas we conceived of long ago (!), this was achieved thanks to the dedication of Brent Pedersen.
1/n
We have an open position for a bioinformatics/theoretical microbial ecology PhD student to study viral strategies in the global Microverse! Join us at the @microverse.bsky.social at @uni-jena.de. Please apply through this link:
jobs.uni-jena.de/jobposting/9...
Network diagram showing correlations between cyanobacterial taxa. Microcystis forms a cluster, Pseudanabaena and Dolichospermum form a cluster, and Cyanobium forms a cluster.
I am having fun with networks!
I feel like I can use more classical/macroecology methods of calculating correlations/networks, but still want to control for false discovery rate. I'm having difficulty finding ideas/examples since most microbe-tools assume compositionality. Ideas? (2/2)
Ecologists 🗣️
I want to study co-occurrence of abundant ASVs (~1000) with a subset of abundant ASVs (~100). Tons of tools for this, with specific corrections for sparsity and compositionality. But my data aren't sparse (all have high prevalence) nor compositional (absolute-abundance corrected). (1/2)
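On the false-discovery-rate part of the question in this thread: one common option, independent of how the pairwise correlations and their p-values are computed, is the Benjamini-Hochberg adjustment. The sketch below is a plain-Python illustration, not a recommendation from the thread; the function name and the example p-values are made up. With roughly 100 x 1000 ASV pairs, the input would be on the order of 100,000 p-values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from four pairwise correlation tests.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)

# Keep pairs whose adjusted p-value is at or below the chosen FDR level.
fdr_level = 0.05
significant = [i for i, q in enumerate(example_q) if q <= fdr_level]
print(example_q)
print(significant)
```

Only the pairs that pass the adjusted threshold would then become edges in the network.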