Posts by Pierre Peterlongo

Good Friday Evening news: we updated back_to_sequences
(find the origin of kmers)

- Faster
- Can consider multiline fasta files
- Much easier installation: see github.com/pierrepeterl...

5 months ago 4 1 0 0
Efficient and accurate search in petabase-scale sequence repositories - Nature MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.

The Metagraph paper is out in Nature; it showed up in my feeds today! Congratulations to Mikhail Karasikov, @gxxxr.bsky.social, @akkah21.bsky.social and all of the other authors (whom I'd love to follow on Bluesky if I can find you ;P) www.nature.com/articles/s41...

6 months ago 36 15 1 0

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

7 months ago 114 80 5 5

❗ I clearly consider this result as THE most important result achieved over this last decade for exploiting and democratizing genomic data.
I think there will be a "before" and an "after" logan and logan-search
github.com/IndexThePlan...
logan-search.org
Have a look at this thread

7 months ago 9 3 0 0

🀝 Amazing collaboration with @jermp.bsky.social, @yhhshb.bsky.social, @robp.bsky.social, Victor Levallois, and Bertrand Le Gal, and the help of β€ͺ@yoann.bsky.social‬. 8/8

10 months ago 3 0 0 0

🌊 On metagenomic data, other tools such as kmindex are good alternatives. At the same time, Kaminari consistently ranks as one of the fastest tools across all data types, generating the smallest indexes (or the lowest FPR). 7/8

10 months ago 1 0 1 0

πŸ’Ύ For fixed False Positive rates, it uses up to 37x less space than COBS while being an order of magnitude faster to build and query. 6/8

10 months ago 2 0 1 0

πŸ“Š Experimental results show Kaminari's superiority in index size and query performance across various genomic datasets. 5/8

10 months ago 1 0 1 0

🧬 Kaminari's design leverages properties of k-mer minimizers for compact space and fast query time, as inspired by the techniques proposed in Fulgor. 4/8
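
For readers unfamiliar with minimizers, here is a minimal sketch of the general idea, not Kaminari's or Fulgor's actual implementation. It assumes the minimizer of a k-mer is its lexicographically smallest m-mer (real tools typically use a hash-based ordering instead), and the function names are hypothetical. Consecutive k-mers often share a minimizer, so grouping by minimizer lets an index store information once per group rather than once per k-mer.

```python
def minimizer(kmer, m):
    """Return the lexicographically smallest m-mer of `kmer` (assumed ordering)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def group_by_minimizer(seq, k, m):
    """Map each minimizer to the list of k-mers of `seq` that share it."""
    buckets = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        buckets.setdefault(minimizer(kmer, m), []).append(kmer)
    return buckets
```

For example, `group_by_minimizer("GATTACA", k=5, m=2)` places the overlapping k-mers `ATTAC` and `TTACA` in the same bucket because both have `AC` as their smallest 2-mer.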

10 months ago 1 0 1 0
GitHub - yhhshb/kaminari: 雷 - kaminari (thunder/lightning)

πŸ’» We implemented Kaminari in C++17, available under the MIT license at github.com/yhhshb/kaminari. Additional results and reproducibility info at github.com/vicLeva/benchmarks_kaminari. 3/8

10 months ago 1 0 1 0

🔍 Key findings include:
- Use of minimizers and integer compression for indexing.
- Lower memory footprint and faster query times.
- Minimal impact of false positives on result ranking, using the Rank-Biased Overlap (RBO) metric.
2/8
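
As an illustration of the Rank-Biased Overlap metric mentioned above, here is a minimal sketch of a truncated RBO between two rankings, for instance the documents returned by an exact index and by an approximate one. This is not the evaluation code from the paper: the function name and the persistence parameter `p=0.9` are assumptions, the rankings are assumed to contain no duplicates, and no extrapolation beyond the shared depth is applied, so identical lists of depth D score 1 - p^D rather than 1.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated Rank-Biased Overlap of two duplicate-free rankings.

    Computes (1 - p) * sum_{d=1..D} p^(d-1) * |A[:d] & B[:d]| / d,
    where D is the length of the shorter ranking.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two top-d prefixes
        score += p ** (d - 1) * agreement
    return (1 - p) * score
```

Because the weights decay geometrically with depth, a disagreement near the top of the ranking lowers the score more than one near the bottom.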

10 months ago 2 0 1 0

πŸ“œ Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 πŸ–₯️ 1/8

10 months ago 25 16 1 1

Thanks guys for your precious feedback. I modified the code accordingly.

1 year ago 1 0 0 0
GitHub - pierrepeterlongo/hyperloglog_kmer_counter Contribute to pierrepeterlongo/hyperloglog_kmer_counter development by creating an account on GitHub.

Hi @imartayan.bsky.social I wanted to run distinct-kmers, but I faced limitations as my input data contains non-ACGTacgt characters. Thus I created this github.com/pierrepeterl...
(again extremely simple)
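
To make the idea concrete, here is a minimal sketch of a HyperLogLog-based distinct k-mer estimator that skips k-mers containing non-ACGT characters. It is not the code of the linked repository: the class and function names, the precision `p=12`, and the use of BLAKE2b as the hash are all assumptions made for illustration. The `merge` method relies on a general property of HyperLogLog: sketches built with the same parameters can be combined by taking the register-wise maximum.

```python
import hashlib
import math

VALID = set("ACGT")


def valid_kmers(seq, k):
    """Yield upper-cased k-mers of `seq`, skipping those with a non-ACGT character."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:
            yield kmer


class HyperLogLog:
    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")              # 64-bit hash
        index = h >> (64 - self.p)                     # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Union with another sketch built with the same p."""
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:              # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw
```

A typical use would be to create one `HyperLogLog`, call `add` on every k-mer yielded by `valid_kmers` for each input sequence, and read `estimate()` at the end.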

1 year ago 2 0 1 0
GitHub - pierrepeterlongo/hyperloglog_kmer_counter Contribute to pierrepeterlongo/hyperloglog_kmer_counter development by creating an account on GitHub.

That's correct.
I just created this github.com/pierrepeterl... This is yet another HLL k-mer counter, but a very simple one. And I did not find a way to accumulate the k-mer counts for several input datasets.

1 year ago 0 0 1 0
GitHub - pierrepeterlongo/distinct-kmers: How many distinct k-mers are there in a sequence? How many distinct k-mers are there in a sequence? Contribute to pierrepeterlongo/distinct-kmers development by creating an account on GitHub.

@imartayan.bsky.social I needed a version of distinct_kmers for multiple fasta/fastq.
I created this fork github.com/pierrepeterl...
I'm almost ashamed that this code modification is public, but maybe it can be useful.

1 year ago 1 0 1 0

I added the notion of insertion order (mentioning your name). However, I don't get the point of the mergeability issue.

1 year ago 0 0 1 0

Note that the "conservative update" is also something we implemented (without describing it) in fimpera github.com/lrobidou/fim...

1 year ago 1 0 1 0

Thanks again for this pointer @benlangmead.bsky.social. What I described is the same idea, adapted to the case where items are added on the fly, without knowing their final abundance.
The "conservative update" technique is suited to the case where items are added together with their abundance.

1 year ago 1 0 1 0

Oh! Amazing results. That's the difference between you and a Rust beginner.
I'll try to understand your code.

1 year ago 1 0 0 0

Thanks Ben - I'll look at this.

1 year ago 2 0 1 0

Results: slightly longer insertion time, but 2 to 3 times lower abundance overestimations.

1 year ago 2 0 1 0

In short: when adding an element to a counting Bloom filter (cBF), increase only the counters that hold the minimal stored value for this element.
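
A minimal sketch of this idea, written for illustration and not taken from the blog post: the class name, the filter size, the number of hash functions, and the salted BLAKE2b hashing are all assumptions. With `minimal_increase=False` the filter behaves like a standard counting Bloom filter, which makes the two behaviours easy to compare.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, size=1024, num_hashes=3, minimal_increase=True):
        self.size = size
        self.num_hashes = num_hashes
        self.minimal_increase = minimal_increase
        self.counters = [0] * size

    def _positions(self, item):
        """Distinct counter positions of `item`, one salted hash per function."""
        positions = set()
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item.encode(), digest_size=8, salt=bytes([i])).digest()
            positions.add(int.from_bytes(digest, "big") % self.size)
        return positions

    def add(self, item):
        positions = self._positions(item)
        if self.minimal_increase:
            # Only the counters currently holding the minimum are incremented.
            lowest = min(self.counters[p] for p in positions)
            positions = {p for p in positions if self.counters[p] == lowest}
        for p in positions:
            self.counters[p] += 1

    def count(self, item):
        """Estimated abundance: the minimum over the item's counters."""
        return min(self.counters[p] for p in self._positions(item))
```

The estimate can still never fall below the true abundance, because every insertion of an item raises the minimum of its counters by one, and counters shared with other items only ever grow.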

1 year ago 1 0 1 0

Maybe the simplest idea to decrease overestimations of a counting bloom filter. A trivial observation + 10 lines of code.
I'm surprised it has not been described before. Please comment if this is not the case.
Blog post here:
pierrepeterlongo.github.io/2025/03/17/m... 🧪🧬🖥️

1 year ago 7 2 2 1

Yes, ntCard helps a lot and its precision is impressive on reads. Indeed, I wanted an exact number on a genome.

1 year ago 1 0 0 0

I wanted something that used as little memory as possible. I don't want to count kmers, but only know the number of unique kmers. So jellyfish, KMC, ... are too advanced for this simple task.
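
To pin down what "number of unique k-mers" means here, the sketch below counts distinct k-mers exactly with a plain Python set. It is only a definition-level illustration, not the linked tool: a set of strings is far too memory-hungry for a human genome, the function names are hypothetical, and whether k-mers should be merged with their reverse complements (the `canonical` flag) or skipped when they contain non-ACGT characters are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical_form(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_distinct_kmers(sequences, k, canonical=True):
    """Exact number of distinct k-mers over all `sequences`."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if not set(kmer) <= VALID:
                continue  # assumed behaviour: ignore k-mers with N or other symbols
            seen.add(canonical_form(kmer) if canonical else kmer)
    return len(seen)
```

For instance, `count_distinct_kmers(["ACGTACGT"], 4, canonical=False)` sees the five windows ACGT, CGTA, GTAC, TACG, ACGT and reports 4 distinct k-mers, while the canonical count is 3 because CGTA and TACG are reverse complements of each other.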

1 year ago 3 0 0 0
GitHub - pierrepeterlongo/unique_kmer_counter: Count number of unique kmers from fasta or fasta.gz files Count number of unique kmers from fasta or fasta.gz files - pierrepeterlongo/unique_kmer_counter

Today I wanted to know the number of unique 27-mers in the hg38 human genome (spoiler: there are 2.49 billion). I found no tool for doing this, so I wrote this one: github.com/pierrepeterl...

It may help.
Please use it / improve it.

πŸ§¬πŸ’» #bioinformatics

1 year ago 16 3 3 1

We are back in the Town Theatre for a great lecture on Alignment, by @rayanchikhi.bsky.social! 🧬💻 #evomics2025 #genomics #bioinformatics

1 year ago 22 7 1 0

bsky.app/profile/pier...
Applications for this position are still open. If you're passionate about large-scale science, we'd love to hear from you.
🧬 & 🖥️

1 year ago 2 2 0 0

🚨🚨🚨
We are hiring
🚨🚨🚨

After the creation of logan-search (see: bsky.app/profile/pier...) we are offering a 2-year engineer position to continue its development and optimization.

With @rayanchikhi.bsky.social and @tlemane.bsky.social

Details + applications: recrutement.inria.fr/public/class...

1 year ago 12 14 1 2