phaleon (@phaleon) Bsky

I have said this before & I will say it again: taxpayers, who are the majority of folks in every country, should not pay a single penny to Microsoft. Taxpayer money should only go to FLOSS. Enough is enough with these big corporations. They have failed to protect user privacy & safety.

1 week ago 61 10 1 0

numerique.gouv.fr Digital technology in service of effective public action

Excellent news.

France Launches Government Linux Desktop Plan as Windows Exit Begins www.numerique.gouv.fr/sinformer/es...

Bye bye spyware and AI batshit crazy Windows 11.

1 week ago 109 30 2 11

#picoclaw powered #smolClaw now has a real website! smolbsd.org/smolclaw/

1 week ago 4 2 0 0

GPU Memory Math for LLMs (2026 Edition) by @TheAhmadOsman(Ahmad) | Twitter Thread Reader

GPU Memory Math for LLMs (2026 Edition)

If you’re running models locally, thinking “model → VRAM” falls apart once you account for how the weights were trained and quantized in the first place.

There’s a better way to think about it:

> VRAM (in GB) ≈ Parameters (in billions) x (effective bits per weight ÷ 8)

That’s it. This one formula explains everything across:
- FP16 / BF16
- FP8 / INT8
- GPTQ / AWQ / NF4
- GGUF variants
- basically every format you’ll use

The Only Conversion You Actually Need

Here’s the core intuition:
- FP16 / BF16 → 16 bits → ~2 GB per 1B params
- FP8 / INT8 → 8 bits → ~1 GB per 1B params
- 4-bit quants → ~4 bits → ~0.5 GB per 1B params

GGUF formats sit in between depending on the exact scheme:
- Q6_K → ~0.82 GB per 1B
- Q5_K → ~0.69 GB per 1B
- Q4_K → ~0.56 GB per 1B
- Q3_K → ~0.43 GB per 1B
- Q2_K → ~0.33 GB per 1B

Ultra-aggressive quants go even lower, but at a cost.

If you remember nothing else, remember this:
- FP16 = 2x model size
- FP8 = 1x model size
- 4-bit = 0.5x model size

Everything else is just variations on that theme.

Side Note: The VRAM Tax Nobody Talks About

Before you even think about weights, understand this: the model itself is only part of your VRAM bill. The real killer is everything around it.

KV cache grows with context length and will quietly eat your memory alive at 32K, 128K, or higher. Activations vary by runtime and optimization level but can spike under certain execution paths. Batching and concurrency multiply memory usage fast, especially in agent-style workloads. Framework overhead adds its own tax depending on whether you’re using Transformers, vLLM, TensorRT-LLM, or llama.cpp. And then there’s CUDA Graphs, which trade extra reserved memory for much better latency and throughput stability.

Bottom line: if you only budget for weights, you’re already out of memory.
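The formula above can be sketched in a few lines of Python. This is a minimal sketch, not the thread author's code: the names `BITS_PER_WEIGHT` and `weight_vram_gb` are hypothetical, the GGUF bit widths are simply back-computed from the thread's "GB per 1B" figures, and the result covers weights only (no KV cache, activations, or framework overhead).

```python
# Effective bits per weight. The GGUF entries are back-computed from the
# thread's "GB per 1B params" figures (GB per 1B x 8); they are
# approximations, not exact format specifications.
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "bf16": 16.0,
    "fp8": 8.0,
    "int8": 8.0,
    "4bit": 4.0,
    "q6_k": 0.82 * 8,
    "q5_k": 0.69 * 8,
    "q4_k": 0.56 * 8,
    "q3_k": 0.43 * 8,
    "q2_k": 0.33 * 8,
}


def weight_vram_gb(params_billions: float, fmt: str) -> float:
    """VRAM (GB) ~= parameters (billions) x (effective bits per weight / 8)."""
    return params_billions * BITS_PER_WEIGHT[fmt.lower()] / 8


if __name__ == "__main__":
    # Weights-only estimates for a 70B model in three formats.
    for fmt in ("fp16", "fp8", "q4_k"):
        print(f"70B @ {fmt}: ~{weight_vram_gb(70, fmt):.1f} GB")
```

Keeping the table as "bits per weight" rather than "GB per 1B" makes it easy to add another format: only its effective bit width is needed.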
What This Looks Like in Practice

Let’s translate that into real model sizes.

A 7B model:
- FP16 → ~14 GB
- FP8 → ~7 GB
- 4-bit → ~3.5–4 GB

A 13B model:
- FP16 → ~26 GB
- FP8 → ~13 GB
- 4-bit → ~6–7 GB

A 70B model:
- FP16 → ~140 GB
- FP8 → ~70 GB
- 4-bit → ~35–40 GB

A 405B model:
- FP16 → ~810 GB
- FP8 → ~405 GB
- 4-bit → ~200+ GB

Now you understand why people either:
- quantize aggressively
- shard across GPUs (e.g. Tensor Parallelism)
- or just give up and say “cloud it is”

GPU Reality: What Actually Fits

Here’s the practical translation into GPUs people actually own.

8 GB VRAM:
- ~3B in FP16
- ~6–7B in FP8
- ~12–13B in 4-bit

12 GB VRAM:
- ~5B FP16
- ~10B FP8
- ~18–20B 4-bit

16 GB VRAM:
- ~7B FP16
- ~13B FP8
- ~25B 4-bit

24 GB VRAM:
- ~10–12B FP16
- ~20B FP8
- ~35–40B 4-bit

48 GB VRAM:
- ~20–24B FP16
- ~40B FP8
- ~70–80B 4-bit

80 GB VRAM:
- ~35–40B FP16
- ~70B FP8
- ~140B-class 4-bit

This is the “what actually fits” version for model weights.

Why Your Model Still Crashes

As we said earlier, even if the math says it fits, you can still run out of memory. Because weights are only part of the story.

You also need memory for:
- KV cache (this explodes with long context)
- activations (depending on runtime)
- batching / concurrency
- framework overhead

Rule of thumb: Add 10–30% extra VRAM for a safe run.

If you’re doing:
- long context (32K, 128K, etc)
- high concurrency
- agent workflows

…you’ll need even more.

The MoE Trap

Mixture-of-Experts models confuse people.

Example:
- “8x7B” sounds like 56B
- but only a subset of experts run per token

So compute cost ≠ memory cost.

What matters:
- total parameters → affects memory footprint
- active parameters → affects speed

Depending on how the model is loaded:
- you may still need memory for all experts
- or you can shard them across GPUs

If you treat MoE like dense, you’ll either overestimate or underestimate badly.

GGUF Is Not Magic

GGUF gets treated like a cheat code. It’s not.
It’s a container + quantization strategy optimized for:
- llama.cpp-style inference
- CPU + GPU hybrid setups
- ultra-efficient memory usage

But: Those memory numbers only apply in that runtime.

The moment you move into other frameworks:
- weights may be dequantized
- memory usage can jump dramatically

So “it fits in 6 GB” is not universal truth. It’s runtime-specific truth.

The Only Mental Model That Matters

There isn’t a giant compatibility matrix you need to memorize. There’s just this:

VRAM ≈ B x (bits ÷ 8)

Then adjust for:
- runtime overhead
- KV cache
- concurrency

That’s it.

Once you internalize this, you stop guessing. You start designing systems. And more importantly, you stop asking: “Can I run this?” You start asking: “How do I want to run this?”

That’s when things get interesting.

Until next time.
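The "weights first, then adjust" model can be sketched as a rough budget check. Several things here are assumptions that do not come from the thread: the KV cache term uses the commonly cited estimate (2 tensors, K and V, per layer, per KV head, per token), the model shape in the example is hypothetical, the 20% default margin is just a midpoint of the 10–30% rule of thumb, and adding the KV cache on top of the margin is only one possible way to combine the terms. All function names are illustrative.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights only: B x (bits / 8), in GB."""
    return params_billions * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, concurrent_seqs: int = 1,
                bytes_per_elem: int = 2) -> float:
    """Common KV cache estimate (an assumption, not from the thread):
    K and V tensors per layer, one head_dim vector per KV head per token."""
    total_bytes = (2 * layers * kv_heads * head_dim
                   * context_tokens * concurrent_seqs * bytes_per_elem)
    return total_bytes / 1e9  # decimal GB, matching "2 GB per 1B params"


def fits(vram_gb: float, params_billions: float, bits_per_weight: float,
         kv_gb: float = 0.0, margin: float = 0.20) -> bool:
    """True if weights plus a safety margin plus KV cache fit in vram_gb."""
    needed = weights_gb(params_billions, bits_per_weight) * (1 + margin) + kv_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # Hypothetical 7B-class shape: 32 layers, 8 KV heads, head_dim 128,
    # 32K context, one sequence, 16-bit cache entries.
    kv = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, context_tokens=32768)
    print(f"KV cache: ~{kv:.1f} GB")
    print("7B FP16 on 24 GB:", fits(24, 7, 16, kv_gb=kv))
    print("7B FP16 on 16 GB:", fits(16, 7, 16, kv_gb=kv))
```

Because `concurrent_seqs` multiplies the KV cache directly, this also illustrates why high concurrency and agent workloads need more than the flat margin.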

twitter-thread.com/t/2040103488... : GPU Memory Math for LLM

1 week ago 0 1 0 0

happy +1 @ponceto91.bsky.social
🥳🎉

1 week ago 1 1 3 0

Happy birthday Olivier 🍾

1 week ago 0 1 0 0

I'm looking for a Lyon-based company (or one with a Lyon office) that makes fairly advanced use of AI in its product development workflow, to organize the next Claude Code meetup with them, in partnership with Anthropic.

Who would you recommend?

#Lyon #ClaudeCode #AI

2 weeks ago 6 7 4 0

Can You Run This LLM? VRAM Calculator (Nvidia GPU and Apple Silicon) Calculate the VRAM required to run any large language model.

There's this calculator, which is pretty good (or at least a bit more accurate :p ). It covers both inference and fine-tuning: apxml.com/tools/vram-c...

1 week ago 0 1 0 0

Linux is like Windows, only uglier.

The folks at @lesechosfr.bsky.social are just too funny..

1 week ago 24 6 5 1

Projets Libres season 4, episode 15: put some privacy in your smartphone!

Put some privacy in your smartphone!
Which Android-based alternatives exist? And can you install Linux on your phone?

👉www.projets-libres.org/podcast/s4e15-vieprivee-...

With fla from @framasoft.org

1 week ago 14 5 0 2

I built a Claude Code skill that aggregates French-language dev news and gives you a recap, sorted by day, right in your terminal.

/veille

👉 github.com/camilleroux/...

1 week ago 19 6 2 0

The Insane Size of the Universe Will Blow Your Mind YouTube video by Balade Mentale

The Insane Size of the Universe Will Blow Your Mind youtu.be/Uo9vimXhoPk?...

1 week ago 6 2 1 0

If you're up for it, we'll do some hands-on exercises; there are plenty of new smols!

1 week ago 3 2 0 0

Gazette - Microlinux IT Solutions

Subscribe to the Microlinux Gazette:

www.microlinux.fr/gazette/

2 weeks ago 3 2 0 0

Teach Your Children | Playing For Change | Live in Australia YouTube video by Playing For Change

youtu.be/P5AuFDHdrrg?...

2 weeks ago 0 1 0 0

Event: Léalinux's very first tutorial video!
Today we present a step-by-step method on "How to debug your program with ease":

2 weeks ago 12 4 4 1

i'm so sorry we've sent these souls to the moon and they're using outlook?

2 weeks ago 4006 489 47 46

🎤 My talk on AI agents has been accepted at @devlille.fr 😍

Building an AI agent that queries data in natural language

→ NL → SQL → answer
→ ADK + MCP + #BigQuery
→ #Docker + #uv + local dev
→ Deployment on #CloudRun & #AgentEngine

Can't wait to share this 🚀

#GoogleCloud @docker.com

2 weeks ago 5 3 1 0

Everyone knows that you shouldn't/mustn't/can't open windows in space ¯\_(ツ)_/¯
hence the consequence.

2 weeks ago 0 1 0 0

South Korea 🇰🇷

Facing the energy crisis, the South Korean President is asking the entire population to "save every drop of fuel". Remote work is mandatory on Fridays for civil servants, and public lighting is switched off from 9 pm on weekends.

2 weeks ago 53 29 1 2

what the h

$ du -sh ~/.cache/pip
46G /home/imil/.cache/pip

2 weeks ago 5 1 6 0

<GCU employment agency>we're now looking for TWO people for my team: infra, OpenShift (~k8s)/NetApp/NVidia/LLM environment, level: troubleshooting
We're also looking for someone to work on forecasting, dashboarding (grafana/prom), presentations, C-level topics...
</>

3 weeks ago 2 6 0 0

€86 to renew pkgin.net at gandi
8-6 euros
EIGHTY-SIX FUCKING EUROS
are you completely insane or what?
so yeah, I'm moving the domain to bookmyname.
Like all the others before it.
don't come crying when everyone has left and you have to close up shop.

3 weeks ago 21 3 6 0

Tutorial:
SmolBSD, a container with a Picoclaw agent (based on @imil.net's streams).
Running lightweight open-source models locally with very little GPU RAM.

m.twitch.tv/videos/27248...

github.com/vvgnshs/smol...

3 weeks ago 8 4 0 2

The Claude Code "leak" is a big fake, hastily vibe-coded, and we all fell for it.

I have no proof, but it could well be possible.

My prediction: if this leak is indeed real, it still won't take long for fake leaks to multiply.

2 weeks ago 2 2 1 1

Twitch stream of 29/03/2026. FPGA dev: we add LEDs and code a first HelloWorld in C YouTube video by LefStream

The replay of this morning's stream (LEDs and C code on #FPGA with #LiteX) is now available on YouTube (plus the updated playlist with the 5 videos in the series):
www.youtube.com/watch?v=w4RY...
www.youtube.com/watch?v=KIH3...

3 weeks ago 8 9 0 1

Nicholas Carlini - Black-hat LLMs | [un]prompted 2026 YouTube video by unprompted

www.youtube.com/watch?v=1sd2...

3 weeks ago 1 0 0 0

Organizing your git worktrees If you've ever maintained several versions of the same software, you surely know that juggling branches can be painful. Very often we resort to a mix of temporary branches...

Do you know about git worktrees (git worktree)? If not, go read my latest article, which will no doubt revolutionize the way you use git.

rodolphe.breard.tf/article/orga...

3 weeks ago 6 4 3 0

Lefinnois_ - Twitch Lefinnois_ streams live on Twitch! Check out their videos, sign up to chat, and join their community.

In 5 min we'll wake up gently by blinking some LEDs, before having our code run by the RISC-V SoC built with #LiteX on our Lattice ECP5 #FPGA!
www.twitch.tv/lefinnois_

3 weeks ago 4 4 0 0

If you haven't tried it yet, I recommend Open Code; in TUI or web version, both modes are very good (at least on a qwen 35b A3 the results are beyond my expectations)

3 weeks ago 1 1 1 0
