Michael Hall (@mbhall88) Bsky

Release v4.0.0 · mbhall88/rasusa 4.0.0 (2026-04-01) ⚠ BREAKING CHANGES reads: --output-type has been renamed to --compress-type (-Z) for the reads subcommand. The -O short flag is now used for --output-format. Features reads: s...

v4.0.0 of rasusa is out now in all major retailers 🛍️
github.com/mbhall88/ras...

Big new feature is support for unaligned SAM/BAM/CRAM as input (and output) to the reads subcommand.

You can even pass in BAM and ask for FASTQ out the other side if you're into that kind of thing.

2 weeks ago 5 0 0 0

GitHub - snakemake/snakefmt: The uncompromising Snakemake code formatter The uncompromising Snakemake code formatter. Contribute to snakemake/snakefmt development by creating an account on GitHub.

🐍 For all ye #snakemake users out there who want consistent, opinionated formatting of their workflows: we have just released v1.0.0 of snakefmt 🥳
The major update is that it now sorts rule directives (e.g., input, output, resources, params, shell etc.).
See github.com/snakemake/sn... for more

4 weeks ago 18 7 0 0

There are a lot more interesting things we looked at:
- We find some credible transmission links in India
- We also see some kind of (likely) reduction IS-mediated funny business going on in the linear plasmid.

Check out the preprint for all the details.

1 month ago 0 0 0 0

We deemed this transposon Tn8026. We did some global screening and found Tn8026 in a variety of countries, with the earliest evidence being Norway in 2012. We also found it in 2 E. gallinarum isolates from S. Korea. PLUS it was also in the chromosome of one of our isolates!

1 month ago 0 0 1 0

To add to the intrigue, the linezolid resistance mechanism, a gene called poxtA-Ef, was located on this linear plasmid, along with Tn1546, which carries the vancomycin resistance gene cluster.
Upon further inspection, we realised poxtA-Ef was in what turns out to be an uncharacterised transposon

1 month ago 0 0 1 0

Turns out most of these isolates had a LINEAR plasmid. Really showing my inexperience here as I did not know that was a thing.
After doing some more reading I found that Jia Beh from the Doherty in Melbourne, Aus. had also found a linear plasmid in some LREs (as had a couple of others globally)

1 month ago 0 0 1 0

The dataset was linezolid-resistant Enterococcus (LRE), which are very concerning pathogens that are resistant to nearly everything.
We sequenced all these on ONT and I started by making assemblies. First shout out to @rrwick.bsky.social for the beautiful piece of software that is Autocycler!

1 month ago 0 0 1 0

Novel transposon Tn8026 acts as a global driver of transmissible linezolid resistance in Enterococcus via a linear plasmid Linezolid is a critical last-resort antimicrobial for multidrug-resistant Enterococcus faecium , particularly against vancomycin-resistant lineages where therapeutic options are severely limited. Whil...

Until joining @loolibear.bsky.social's lab in July, I embarrassingly hadn't had much experience with plasmids.
So when I started, Leah said "here you go, have a look at this dataset".
What a fun ride this has been.
Preprint out today and thread below
www.medrxiv.org/content/10.6...

1 month ago 9 3 1 1

Does anyone else think they are seeing post-acceptance editorial changes at proof stage which are error-prone and probably due to adoption of AI?

4 months ago 4 2 4 1

GitHub - mbhall88/nohuman: Remove human reads from a sequencing run Remove human reads from a sequencing run. Contribute to mbhall88/nohuman development by creating an account on GitHub.

So nohuman now ships an unmasked HPRC.r2 DB by default, with optional dataset selection.

If you’ve used nohuman before, I highly recommend updating to v0.5.0 and re-downloading the new DB.

Repo: github.com/mbhall88/nohuman
Keep your metagenomes clean 🧹🧬

5 months ago 1 1 0 0

At the same time, I realised the Human Pangenome Reference Consortium had made a second release of genomes.
So I rebuilt release 1 without masking, and added a release 2 database with no masking. The improvement in detection accuracy was substantial:

5 months ago 0 0 1 0

🚨 Update to nohuman 🚨

While testing against the standard Kraken DB, I noticed Kraken was detecting far more human reads than nohuman. I realised Kraken masks low-complexity regions by default during DB construction and that setting had been left on in nohuman, leading to missing human reads.

5 months ago 2 1 1 0

Stars are level of p value (description is in the figure caption in the paper)

5 months ago 1 0 0 0

True.
Thanks for the great questions and discussion

5 months ago 1 0 1 0

Correct. Yeah I guess mash on a random subset should perform similarly. Haven’t looked at that though.

5 months ago 1 0 0 0

It’s a decent sample size at 3000. But I guess more would always be better. I wanted to use RefSeq genomes that have long-read data to be as sure as possible about the true size
There are likely inherent biases, though, based on error rates in reads for the k-mer-based methods

5 months ago 0 0 1 0

- Overlaps are pairwise alignment with minimap2 (FFI)
- Thanks!
- See other thread where I have answered this

5 months ago 2 0 0 0

I just used mash v2.3. The supplement has an exploration of the best parameters to use for mash to estimate genome size. Mash was the fastest tool though.

5 months ago 0 0 0 0

GitHub - mbhall88/cud: Color Universal Design colourblind-friendly python matplotlib palette Color Universal Design colourblind-friendly python matplotlib palette - mbhall88/cud

Thanks for appreciating the plots. I obsessed a lot over them. I created a repo for the colour palette too if you’re interested in that github.com/mbhall88/cud

5 months ago 1 0 1 0

The bars are pairwise statistical comparisons. I only show the significant ones so as not to over-clutter the plot

5 months ago 1 0 1 0

And lastly, a HUGE thank you to @lachlanjmc.bsky.social for a lot of the methodological heavy lifting when we were coming up with the idea

5 months ago 0 0 0 0

GitHub - mbhall88/lrge: Genome size estimation from long read overlaps Genome size estimation from long read overlaps. Contribute to mbhall88/lrge development by creating an account on GitHub.

Try LRGE here: github.com/mbhall88/lrge
(installable from wherever you get your podcasts 😉)

5 months ago 3 0 1 0

You might remember the preprint from late last year... Reviews/Publication were delayed while I was on parental leave. We extended validation to include H. sapiens, which led to smarter handling of contained overlaps in repetitive genomes. Big shout-out to Chenxi Zhou for leading that part

5 months ago 0 0 1 0

However, the computational resource usage (runtime/memory) of LRGE was MUCH better than assembling

5 months ago 0 0 1 0

We benchmarked >3,000 bacterial genomes and found that LRGE (our method) achieves significantly better accuracy than k-mer-based methods like Mash and GenomeScope and performs on par with full genome assembly (Raven)

5 months ago 0 0 2 0

Genome size estimation from long read overlaps Abstract. Motivation. Accurate genome size estimation is an important component of genomic analyses such as assembly and coverage calculation, though existin

Our method for genome size estimation from long-read overlaps is now published 🥳
academic.oup.com/bioinformati...

5 months ago 37 16 1 1

AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data Abstract. Background. Most viral genome sequences generated during the latest pandemic have presented new challenges for computational analysis. Analyzing mi

New from @dgpratas.bsky.social et al. for analyzing multiple sequences in multi-FASTA format using alignment-free methodologies. Scalable to millions of sequences for pandemic research and more

AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data doi.org/10.1093/giga...

1 year ago 4 4 0 0

How the Web of Science takes a step back The Web of Science, a major commercial indexing service of scientific journals operated by Clarivate, recently decided to remove eLife from its Science Citation Index Expanded (SCIE). eLife will on...

“Clarivate’s decision rewards journals for continuing the unhelpful practice of keeping peer review information hidden and unintentionally presenting incomplete and inadequate studies as sound science and punishes those journals that are more transparent.” 👏🙌

www.coalition-s.org/blog/how-the...

1 year ago 3 0 0 0

The DOI URL doesn't seem to be working for the preprint currently. You can find it here: www.biorxiv.org/content/10.1...

1 year ago 0 0 0 0

GitHub - mbhall88/lrge: Genome size estimation from long read overlaps Genome size estimation from long read overlaps. Contribute to mbhall88/lrge development by creating an account on GitHub.

8/ Try it out!
LRGE is open-source and ready to integrate into your workflows as a Rust library or CLI application. Whether you’re on a high-performance cluster or a basic laptop, LRGE delivers fast and reliable genome size estimates. Get it here: github.com/mbhall88/lrge

1 year ago 3 0 0 0

Posts by Michael Hall