
Posts by Léαlinux 🐧

Linux is like Windows, only uglier.

They're hilarious, those @lesechosfr.bsky.social folks..

14 hours ago 22 4 3 1
What Makes System Calls Expensive: A Linux Internals Deep Dive
An explanation of how Linux handles system calls on x86-64 and why they show up as expensive operations in performance profiles

blog.codingconfessions.com/p/what-makes... : The cost of living.

15 hours ago 2 0 0 0

AI math:
GPU VRAM ≈ parameters (in billions) × (quantization bits ÷ 8)
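As a sketch, that rule of thumb translates directly to code (`weight_vram_gb` is a hypothetical helper name; the GGUF bits-per-weight figures are the approximate values from the thread quoted below):

```python
# Approximate VRAM needed for the model weights alone:
#   VRAM (GB) ≈ parameters (billions) × (effective bits per weight ÷ 8)
def weight_vram_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8

# Approximate effective bits per weight for common GGUF schemes
# (derived from the thread's "GB per 1B params" figures × 8):
GGUF_BITS = {"Q6_K": 6.56, "Q5_K": 5.52, "Q4_K": 4.48, "Q3_K": 3.44, "Q2_K": 2.64}

print(weight_vram_gb(7, 16))                             # FP16 7B → 14.0 GB
print(weight_vram_gb(70, 4))                             # 4-bit 70B → 35.0 GB
print(round(weight_vram_gb(7, GGUF_BITS["Q4_K"]), 2))    # Q4_K 7B → 3.92 GB
```

Weights only; the runtime adds its own overhead on top (KV cache, activations, framework tax).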

15 hours ago 2 0 1 0
GPU Memory Math for LLMs (2026 Edition) by @TheAhmadOsman (Ahmad) | Twitter Thread Reader

If you’re running models locally, thinking “model → VRAM” falls apart once you account for how the weights were trained and quantized in the first place. There’s a better way to think about it:

> VRAM (in GB) ≈ Parameters (in billions) x (effective bits per weight ÷ 8)

That’s it. This one formula explains everything across:
- FP16 / BF16
- FP8 / INT8
- GPTQ / AWQ / NF4
- GGUF variants
- basically every format you’ll use

The Only Conversion You Actually Need

Here’s the core intuition:
- FP16 / BF16 → 16 bits → ~2 GB per 1B params
- FP8 / INT8 → 8 bits → ~1 GB per 1B params
- 4-bit quants → ~4 bits → ~0.5 GB per 1B params

GGUF formats sit in between depending on the exact scheme:
- Q6_K → ~0.82 GB per 1B
- Q5_K → ~0.69 GB per 1B
- Q4_K → ~0.56 GB per 1B
- Q3_K → ~0.43 GB per 1B
- Q2_K → ~0.33 GB per 1B

Ultra-aggressive quants go even lower, but at a cost.

If you remember nothing else, remember this:
- FP16 = 2x model size
- FP8 = 1x model size
- 4-bit = 0.5x model size

Everything else is just variations on that theme.

Side Note: The VRAM Tax Nobody Talks About

Before you even think about weights, understand this: the model itself is only part of your VRAM bill. The real killer is everything around it. KV cache grows with context length and will quietly eat your memory alive at 32K, 128K, or higher. Activations vary by runtime and optimization level but can spike under certain execution paths. Batching and concurrency multiply memory usage fast, especially in agent-style workloads. Framework overhead adds its own tax depending on whether you’re using Transformers, vLLM, TensorRT-LLM, or llama.cpp. And then there’s CUDA Graphs, which trade extra reserved memory for much better latency and throughput stability.

Bottom line: if you only budget for weights, you’re already out of memory.
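To put a rough number on the KV-cache tax: a dense transformer caches one K and one V vector per layer per token, so the cache is about 2 × layers × KV heads × head_dim × context length × bytes per element. A minimal sketch, assuming a hypothetical 7B-class configuration (32 layers, 32 KV heads, head_dim 128, FP16 cache):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2, batch: int = 1) -> float:
    # One K and one V tensor per layer, per token, per sequence in the batch.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem * batch / 2**30

# Hypothetical 7B-class dense config: 32 layers, 32 KV heads, head_dim 128, FP16.
print(kv_cache_gib(32, 32, 128, 4096))    # 4K context → 2.0 GiB
print(kv_cache_gib(32, 32, 128, 131072))  # 128K context → 64.0 GiB
```

Real 7B-class models often use grouped-query attention with far fewer KV heads, which divides these figures accordingly; the point stands that this term scales with context length, not parameter count.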
What This Looks Like in Practice

Let’s translate that into real model sizes.

A 7B model:
- FP16 → ~14 GB
- FP8 → ~7 GB
- 4-bit → ~3.5–4 GB

A 13B model:
- FP16 → ~26 GB
- FP8 → ~13 GB
- 4-bit → ~6–7 GB

A 70B model:
- FP16 → ~140 GB
- FP8 → ~70 GB
- 4-bit → ~35–40 GB

A 405B model:
- FP16 → ~810 GB
- FP8 → ~405 GB
- 4-bit → ~200+ GB

Now you understand why people either:
- quantize aggressively
- shard across GPUs (e.g. Tensor Parallelism)
- or just give up and say “cloud it is”

GPU Reality: What Actually Fits

Here’s the practical translation into GPUs people actually own.

8 GB VRAM:
- ~3B in FP16
- ~6–7B in FP8
- ~12–13B in 4-bit

12 GB VRAM:
- ~5B FP16
- ~10B FP8
- ~18–20B 4-bit

16 GB VRAM:
- ~7B FP16
- ~13B FP8
- ~25B 4-bit

24 GB VRAM:
- ~10–12B FP16
- ~20B FP8
- ~35–40B 4-bit

48 GB VRAM:
- ~20–24B FP16
- ~40B FP8
- ~70–80B 4-bit

80 GB VRAM:
- ~35–40B FP16
- ~70B FP8
- ~140B-class 4-bit

This is the “what actually fits” version for model weights.

Why Your Model Still Crashes

As we said earlier, even if the math says it fits, you can still run out of memory, because weights are only part of the story. You also need memory for:
- KV cache (this explodes with long context)
- activations (depending on runtime)
- batching / concurrency
- framework overhead

Rule of thumb: add 10–30% extra VRAM for a safe run. If you’re doing:
- long context (32K, 128K, etc.)
- high concurrency
- agent workflows
…you’ll need even more.

The MoE Trap

Mixture-of-Experts models confuse people. Example:
- “8x7B” sounds like 56B
- but only a subset of experts run per token

So compute cost ≠ memory cost. What matters:
- total parameters → affects memory footprint
- active parameters → affects speed

Depending on how the model is loaded:
- you may still need memory for all experts
- or you can shard them across GPUs

If you treat MoE like dense, you’ll either overestimate or underestimate badly.

GGUF Is Not Magic

GGUF gets treated like a cheat code. It’s not.
It’s a container + quantization strategy optimized for:
- llama.cpp-style inference
- CPU + GPU hybrid setups
- ultra-efficient memory usage

But: those memory numbers only apply in that runtime. The moment you move into other frameworks:
- weights may be dequantized
- memory usage can jump dramatically

So “it fits in 6 GB” is not universal truth. It’s runtime-specific truth.

The Only Mental Model That Matters

There isn’t a giant compatibility matrix you need to memorize. There’s just this:

> VRAM ≈ B x (bits ÷ 8)

Then adjust for:
- runtime overhead
- KV cache
- concurrency

That’s it. Once you internalize this, you stop guessing. You start designing systems. And more importantly, you stop asking “Can I run this?” and start asking “How do I want to run this?” That’s when things get interesting. Until next time.
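That "adjust for overhead" step can be sketched as a one-line budget check (`fits_in_vram` is a hypothetical helper; the 1.25 factor is an assumed midpoint of the thread's 10–30% rule of thumb):

```python
def fits_in_vram(vram_gb: float, params_b: float, bits: float,
                 overhead: float = 1.25) -> bool:
    # Weights (B × bits ÷ 8), plus a 10–30% margin for KV cache,
    # activations and framework overhead (1.25 = assumed midpoint).
    return params_b * bits / 8 * overhead <= vram_gb

print(fits_in_vram(24, 20, 8))  # 20B FP8: 20 GB × 1.25 = 25 GB → False on a 24 GB card
print(fits_in_vram(24, 35, 4))  # 35B 4-bit: 17.5 GB × 1.25 ≈ 21.9 GB → True
```

Long context or high concurrency pushes the margin well past 1.3, so treat the default as a floor, not a guarantee.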

twitter-thread.com/t/2040103488... : GPU Memory Math for LLM

16 hours ago 0 1 0 0

"wE'vE cHaaangEd!"

16 hours ago 7 1 0 0

Fascinating, Microsoft Intune: to fix problems, you have to uninstall everything and reinstall everything (Edge + Intune + extras) for things to get back to normal. Yet another fine achievement from them.

21 hours ago 11 0 1 0
Alt: a man with a beard and white hair is wearing a white robe and a white hat.

"Ponceto Sensei
-- Twitch 2026"

22 hours ago 1 0 0 0

The introduction to the feature dossier of our special #gouvernance issue is available at connect.ed-diamond.com/misc/mischs-.... Do go take a look to discover its contents in more detail.

The issue is currently available at newsstands & at boutique.ed-diamond.com/nouveautes/1....

22 hours ago 1 2 0 0

Notifications History: Off

1 day ago 3 0 1 0
FBI Extracts Suspect’s Deleted Signal Messages Saved in iPhone Notification Database
The case was the first time authorities charged people for alleged “Antifa” activities after President Trump designated the umbrella term a terrorist organization.

NEW: The FBI was able to forensically extract copies of incoming Signal messages from a defendant’s iPhone, even after the app was deleted, because copies of the content were saved in the device’s push notification database, multiple people present for FBI testimony in a trial told 404 Media.

1 day ago 589 333 14 60

you have to be patient with the administration

1 day ago 0 0 1 0

Having sat in on IT procurement meetings with the EdNat, most of the "potential problems" the EdNat raised were of the "but we don't have this feature" kind, and when you dug in, it turned out they weren't using it, or in the end never used it at all (which was infuriating)

1 day ago 3 0 0 0

Reposting, but every word from Michel Paulin is gold, especially the blocking points over features that were never used:

1 day ago 13 3 1 0
1 day ago 6 1 1 0
FreeBSD Laptop Compatibility
Each laptop is scored based on an aggregate of:

Top Laptops for use with FreeBSD

freebsdfoundation.github.io/freebsd-lapt...

1 day ago 20 3 1 1

(can you believe it took a nutcase at the WH for the move to free software in Europe/France to finally speed up... 30 years of fighting, coming down to... plain politics)

1 day ago 13 2 1 0
1 day ago 45 12 6 3

We're old enough to remember...
(AAAAAHHH, SFR and its web proxy that injected ads...)

1 day ago 8 2 1 0
Alt: a man in a black shirt is standing in a living room.

but... but... BUT... NOOOOO!!!!

1 day ago 2 0 0 0
Alt: a woman is dancing in front of a refrigerator with the words "stop the madness" written below her.
1 day ago 0 0 0 0
DistroWatch.com: Put the fun back into computing. Use Linux, BSD. News and feature lists of Linux and BSD distributions.

distrowatch.com/dwres-mobile...
$ apt rollback

2 days ago 5 0 1 0

+1
Hence the attempts to break that with initiatives like WINE.

2 days ago 4 0 1 0

Yes, but "who"?

2 days ago 17 1 3 0

Given the number of times we've heard about it (especially on the "politics" side), maybe we'll have to get started one of these days :-D

2 days ago 1 0 1 0

(given the look of the designs *1: it has to come from a Star Wars spinoff)

(*1: and also the Lucasfilm copyright at the bottom ;)

2 days ago 4 1 2 0

Possibly the best example meme for the "If you have nothing to hide" line:

2 days ago 31 13 1 0

source: www.thibauld-feneuil.fr/docs/talks/2...

2 days ago 1 1 0 0
Example II: PBKDF2
Somewhere in PBKDF2 implementation …
Uses only 4 bytes of the password

Oh hell... your key derivation that doesn't derive much...
Laugh it up, but I've already seen this in another implementation, on crypto hardware where the "secret" passphrase was >24 bytes and, after disassembling the code, you could see the code only used byte [0]. There you go, an easy rainbow table.

2 days ago 13 5 2 0
Getting started with post-quantum cryptography with the ML-KEM key exchange
In this article, I invite you to discover the wonderful world of post-quantum cryptography with the key-exchange algorithm …

thibautprobst.fr/fr/posts/ml-... : Great documentation, "of a mighty fine caliber" (tm)

2 days ago 6 2 0 0
Guides ANSSI | MesServicesCyber

messervices.cyber.gouv.fr/guides/mecan... (and their PDFs)

2 days ago 1 0 0 0