Posts by Katharina Hoff

Don't port without support!!

4 days ago 9 2 0 0

If anyone wants to port our code to yet another language, go for it. Nextflow would be nice… just an example.

3 days ago 1 0 0 0

I did not port BRAKER & GALBA to stop others from doing so; the primary reason was to make the monsters maintainable. I never learnt software development; I am a molecular biologist. I didn’t know any better when those projects started.

3 days ago 2 0 1 0

Porting code has, luckily, become easy now. I am saying that from the perspective of sitting in a Perl/C++ codebase that has grown over many years and is very difficult to maintain; Perl in particular is just outdated. Not sure I will become a Rust coder, but AI would make it easier.

3 days ago 2 0 2 0

Blog post on "The AI Rewrite Dilemma": lh3.github.io/2026/04/17/t...

4 days ago 54 29 3 4

I am so excited to share our new findings with you! We provide the structural evidence for a direct protein-to-DNA information pathway, showing how a bacterial enzyme 'reads' its own structure to 'write' DNA. www.science.org/doi/10.1126/...

5 days ago 216 97 6 11
Job Summary: The FlyBase project is an international collaboration of ~35 people distributed at several sites. This position is located at Harvard University, Cambridge, MA. All FlyBase staff work as a part of a team, in which curators and software engineers collaborate extensively. Each site has its own set of responsibilities. The Harvard FlyBase curators focus on curation/annotation of literature and high-throughput data pertaining to the Drosophila genome, transcriptome and proteome. FlyBase is constantly evolving, seeking both to improve and to expand its role within the Drosophila and wider scientific communities. The ideal applicant will be enthusiastic about participating in this process, bringing to the FlyBase group expertise and ideas concerning emerging directions in Drosophila biology and genomic/proteomic analysis. FlyBase is increasingly integrating artificial intelligence and machine learning tools into its curation workflows. The ideal candidate will be open-minded and enthusiastic about exploring AI-assisted approaches to biological data curation, including the use of large language models and AI coding assistants to accelerate and enhance curation tasks.


Job-Specific Responsibilities: 
	•	Reading and abstracting of data from the current Drosophila literature, including its relationship to human disease, physical interactions, and gene expression 
	•	Evaluating and validating AI generated annotations and curation suggestions, ensuring accuracy and biological relevance before integration into the database 
	•	Using AI coding assistants (e.g. Claude Code, Codex, Gemini) to write scripts for data wrangling, format conversion, and routing curation tasks 
	•	Providing feedback on AI model outputs and refining prompts to help improve automated curation pipelines and annotation quality over time
	•	Annotation and analysis of the Drosophila melanogaster genome, including gene models, mapped mutations, and regulatory features
	•	Together with curators and developers at other sites, interact with broad research community by answering helpmail and giving presentations and tutorials at research conferences 
	•	Handling high-throughput datasets and associated metadata

FlyBase is seeking a new scientific curator at our Harvard University site.
More information here: wiki.flybase.org/wiki/FlyBase...

1 week ago 9 15 1 1

BRAKER4 and GALBA2 took a walk today. They learnt how to run RED for repeat masking and minisplice for alignment. github.com/Gaius-August... github.com/Gaius-August...

1 week ago 4 1 0 0

BRAKER4 and GALBA2 are obviously related. GALBA has always been the little brother of the original BRAKER, too. And it's broccoli, not pot!

1 week ago 2 1 0 0
GitHub - Gaius-Augustus/GALBA2: Snakemake port of the original Galba pipeline

GALBA2 doesn't just predict genes — it evaluates them too.

Built-in QC: compleasm, BUSCO, OMArk, gffcompare vs reference.

Optional: FANTASIA-Lite adds GO term functional annotation via ProtT5 protein language model embeddings. All in one pipeline run.

github.com/Gaius-Augustus/GALBA2

1 week ago 2 1 1 0

GALBA2 matches the accuracy of the original galba.pl on A. thaliana.

Where it really shines: large genomes (>1 Gbp). miniprot handles repeat-rich genomes better than BRAKER2, so GALBA outperforms BRAKER2 EP mode on big assemblies — in both accuracy and speed.

github.com/Gaius-Augustus/GALBA2

1 week ago 0 0 1 0

What's new in GALBA2 vs the old galba.pl?

- Snakemake + Singularity (no tool installation)
- Auto-resume after failures
- SLURM-native (each step = cluster job)
- Multi-sample CSV input
- Under 400 MB containers (down from 2 GB)
- HTML report with plots & citations

github.com/Gaius-Augustus/GALBA2

1 week ago 1 0 1 0

GALBA2 walks into the arena. We rewrote our protein-based genome annotation pipeline in Snakemake.

Give it a genome + proteins from close relatives → get gene predictions. No RNA-Seq, no GeneMark needed.

miniprot → AUGUSTUS, fully containerised, HPC-ready.

github.com/Gaius-Augustus/GALBA2

1 week ago 22 10 1 0

GALBA2 is walking into the arena. github.com/Gaius-August... Fully ported to Snakemake, accuracy matches the old galba.pl, similar features to BRAKER4, but smaller containers.

1 week ago 2 2 0 0

GALBA is the short-lived emperor. To our surprise, GALBA is still used. GALBA2 is on his way... #genomeannotation

1 week ago 4 1 0 0

Do you have a suggestion for a well annotated protist with a nonstandard genetic code that I can benchmark on?

1 week ago 0 0 0 0

8/ Also baked in the weird translation tables for you, @dumack.bsky.social. In ETP mode, unfortunately, GeneMark does not support them; ET and EP mode should work. We are working on a very different solution for protists...

1 week ago 3 1 1 1

7/ One pipeline. To annotate - maybe not all - but many!

1 week ago 8 1 1 0

6/ Check out the HTML report file! I love that one.

1 week ago 0 0 1 0
GitHub - Gaius-Augustus/BRAKER4: BRAKER re-implemented with snakemake

5/ Honest caveat: optional FANTASIA-Lite (ProtT5 to GO) validated only on an NVIDIA A100. Experimental on other GPUs; structural annotation unaffected if you leave it off.

MIT. github.com/Gaius-Augustus/BRAKER4
(Tiberius for the rest: github.com/Gaius-Augustus/Tiberius)

1 week ago 0 0 1 0

4/ QC built in: AGAT, BUSCO, compleasm, OMArk, optional gffcompare. Benchmark on A. thaliana (TAIR10) vs Araport11: BRAKER4 matches or slightly beats native braker.pl on locus and exon F1 across ET, EP, ETP.

1 week ago 2 0 1 0

3/ Repeat masking baked in. Evidence types first-class: RNA-Seq, PacBio IsoSeq (BAM/FASTQ), proteins, plus a dual mode fusing all three. VARUS pulls SRA from just genus + species. UTRs via StringTie2. Optional ncRNA: rRNA, tRNA, Rfam, lncRNA merged into one GFF3.

1 week ago 4 1 1 0

2/ BRAKER4: a Snakemake rewrite of BRAKER3 (GeneMark + AUGUSTUS + TSEBRA). Singularity-only, resumable after any crash. Multi-genome in one shot via samples.csv. Native SLURM spreads rules across nodes instead of timing out on one.

1 week ago 3 0 1 0

Bring us your genomes. All 1.5 million of them. A lot can now go through Tiberius (github.com/Gaius-August...), and if not: go BRAKER4! github.com/Gaius-August...

1 week ago 0 0 0 0

BRAKER4 is MIT-licensed (GeneMark keeps its own terms for commercial use). Repo, migration guide from braker.pl, and tutorials: github.com/Gaius-August...

1 week ago 1 0 1 0

Treat FANTASIA-Lite as experimental on other GPUs for now, and please report back. Structural annotation is completely unaffected if you leave it off.

1 week ago 0 0 1 0

One honest caveat: BRAKER4 ships an optional functional-annotation step via FANTASIA-Lite (ProtT5 embeddings to GO terms). Powerful, but so far validated only on a single NVIDIA A100.

1 week ago 0 0 1 0

Accuracy? Benchmarked on A. thaliana (TAIR10) vs Araport11 with gffcompare, BRAKER4 matches or slightly beats native braker.pl on locus and exon F1 across ET, EP, and ETP modes.

1 week ago 0 0 1 0
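
For readers less familiar with the metric in the benchmark post above: F1 is the harmonic mean of sensitivity and precision, the two values that gffcompare reports per feature level (for example exon and locus). A minimal sketch of that calculation follows; the function name and the input numbers are hypothetical and are not taken from the BRAKER4 benchmark.

```python
def f1_score(sensitivity: float, precision: float) -> float:
    """Harmonic mean of sensitivity and precision.

    Both inputs must use the same scale (both percentages or both fractions);
    the result is on that same scale.
    """
    if sensitivity + precision == 0:
        return 0.0  # avoid division by zero when nothing was predicted correctly
    return 2 * sensitivity * precision / (sensitivity + precision)


# Hypothetical values for illustration only, not real benchmark results.
exon_f1 = f1_score(sensitivity=80.0, precision=90.0)
print(f"exon F1 = {exon_f1:.2f}")
```

Because it is a harmonic mean, F1 is pulled towards the lower of the two inputs, so a pipeline cannot score well by maximising sensitivity at the expense of precision or the reverse.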

QC built in: AGAT GFF3 conversion, BUSCO, compleasm, OMArk, and optional gffcompare against a reference. All in the same run, no separate scripts to glue on afterward.

1 week ago 0 0 1 0