📢 The State of Model Serving Communities: April Edition is out!
Our goal with this newsletter is to give a clear, community-driven view of what's happening across the model serving ecosystem, including updates from projects like vLLM, KServe, @llm-d.ai, @kubernetes.io, Llama Stack, and more.
Posts by llm-d
Check out the latest newsletter to stay up to speed on the changes happening in the model serving communities!
ICYMI: llm-d is officially a @CNCF Sandbox project! 🎉
We're evolving #Kubernetes into SOTA AI infrastructure through a powerhouse coalition including Red Hat, Google Cloud, IBM Research, NVIDIA, Mistral AI, Hugging Face, and many more.
www.cncf.io/blog/2026/03...
It's official: llm-d has joined the cncf.io! 🎉
Our mission to evolve Kubernetes into SOTA AI infrastructure just got a massive boost. This milestone belongs to our amazing community.
Thank you for building this with us. 🙏
We're just getting started!
🔗 www.cncf.io/blog/2026/03...
Deploying or scaling LLM inference? This is the room to be in.
The vLLM Inference Meetup hits Boston on March 31! Join us for an evening of deep technical sessions, live demos, and real conversations with the community.
📅 Mar 31, 5PM
📍 314 Main St, Cambridge
🔗 luma.com/4rmkrrb7
LLMInferenceService is now fully production-ready and built on the high-performance @llm-d.ai framework.
What's included?
- KV-cache aware routing and disaggregated prefill-decode to maximize throughput.
The results with Llama 3.1 8B:
✅ Lower TTFT on cache hits.
✅ Full visibility into scoring decisions.
✅ Improved throughput & GPU utilization.
Watch the full walkthrough: youtu.be/NN-1JvnMMrU
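For intuition on the disaggregated prefill-decode pattern mentioned above, here is a minimal, self-contained Python sketch. Everything in it (the worker names, the block size, the handle type) is an illustrative assumption rather than llm-d's or KServe's actual API: the compute-heavy prompt pass runs once on a prefill worker, and its KV cache is handed to a separate decode worker so long prompts don't stall the token-by-token decode batch.

```python
# Illustrative sketch of disaggregated prefill/decode (not llm-d's actual API).
from dataclasses import dataclass, field


@dataclass
class KVCacheHandle:
    """Opaque reference to KV blocks produced during prefill."""
    request_id: str
    num_blocks: int


@dataclass
class PrefillWorker:
    block_size: int = 16  # tokens per KV block (assumed)

    def prefill(self, request_id: str, prompt_tokens: list[int]) -> KVCacheHandle:
        # Run the full prompt forward pass once; in a real system this is the
        # GPU-heavy step whose KV cache is then transferred to a decode pod.
        num_blocks = -(-len(prompt_tokens) // self.block_size)  # ceil division
        return KVCacheHandle(request_id=request_id, num_blocks=num_blocks)


@dataclass
class DecodeWorker:
    received: dict = field(default_factory=dict)

    def attach(self, handle: KVCacheHandle) -> None:
        # Receive the prefilled KV blocks before token-by-token decode starts.
        self.received[handle.request_id] = handle

    def decode(self, request_id: str, max_new_tokens: int) -> list[str]:
        assert request_id in self.received, "KV cache must arrive before decode"
        return [f"token_{i}" for i in range(max_new_tokens)]  # placeholder output


if __name__ == "__main__":
    prefiller, decoder = PrefillWorker(), DecodeWorker()
    handle = prefiller.prefill("req-1", prompt_tokens=list(range(100)))
    decoder.attach(handle)
    print(decoder.decode("req-1", max_new_tokens=4))
```

In a real deployment the handoff is a network transfer of KV blocks (e.g. via NIXL), which is exactly why transfer bandwidth and routing matter so much.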
Watch this preview of distributed tracing (llm-d 0.6) and Prefix Cache-Aware Routing.
🔹 State Tracking: llm-d tracks KV-cache state via ZMQ.
🔹 Smart Scoring: EPP pods tokenize prompts and query the cache index to find cached blocks.
🔹 Optimal Routing: Requests go to the pod with the best cache hit.
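To make the scoring step concrete, here is a toy Python sketch of prefix cache-aware pod selection. It is not the EPP's actual implementation: the block size, the hashing scheme, and the pod_blocks index (which in llm-d would be kept current from the KV-cache state tracked over ZMQ, as described above) are assumptions for illustration only.

```python
# Toy sketch of prefix cache-aware scoring (assumed names and hashing scheme).
import hashlib

BLOCK_SIZE = 16  # tokens per KV block (assumption)


def block_hashes(token_ids: list[int]) -> list[str]:
    """Chained hashes of full prefix blocks, mirroring prefix-cache keying."""
    hashes, prev = [], ""
    for start in range(0, len(token_ids) - len(token_ids) % BLOCK_SIZE, BLOCK_SIZE):
        block = token_ids[start:start + BLOCK_SIZE]
        prev = hashlib.sha256((prev + ",".join(map(str, block))).encode()).hexdigest()
        hashes.append(prev)
    return hashes


def score_pods(prompt_tokens: list[int], pod_blocks: dict[str, set[str]]) -> dict[str, int]:
    """Score each pod by how many leading prompt blocks it already caches."""
    prompt_hashes = block_hashes(prompt_tokens)
    scores = {}
    for pod, cached in pod_blocks.items():
        hits = 0
        for h in prompt_hashes:  # only a contiguous prefix counts
            if h in cached:
                hits += 1
            else:
                break
        scores[pod] = hits
    return scores


if __name__ == "__main__":
    prompt = list(range(64))  # 4 full blocks
    index = {
        "pod-a": set(block_hashes(prompt[:32])),  # 2 prefix blocks cached
        "pod-b": set(block_hashes(prompt)),       # all 4 blocks cached
    }
    scores = score_pods(prompt, index)
    print(scores, "->", max(scores, key=scores.get))  # routes to pod-b
```

The key design point is that only a contiguous prefix of cached blocks counts, because decode can only reuse KV entries up to the first miss.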
The first llm-d NYC Meetup is live!
Deep dive into the open-source stack for cloud-native inference with @IBMResearch, @AMD, and @RedHat
State-aware scheduling & KV cache reuse
P/D disaggregation
Scaling MoE models
AMD ROCm & llm-d
Watch: www.youtube.com/watch?v=_ZBQ...
📢 The State of Model Serving Communities: March Edition is out!
We launched our newsletter publicly last year to share our Red Hat AI teams' contributions to upstream communities. We've gained over 1,300 subscribers!
Final Call: NYC 🗽
Registration for the llm-d Meetup closes Tuesday, March 10.
Join the community this Wednesday at the IBM 1 Madison office for a deep dive into llm-d 0.5, MoE scaling, and KV-cache offloading.
Don't miss a night of high-signal technical talks.
🎟️ Register now: luma.com/0crwqwg4
Planning to join us in NYC next week?
Registration for the llm-d Distributed Inference Meetup closes this Tuesday, March 10th.
Don't miss out on a night of technical talks and networking with the community at the IBM 1 Madison office. Grab your spot now!
🎟️ luma.com/0crwqwg4
#llmd #NYCMeetup
What's on the agenda for next Wednesday's NYC meetup?
🛠️ Intro to llm-d 0.5
⚡️ Distributed LLM serving on AMD
🧠 Lessons scaling Wide-EP and MoE
💾 KV-cache offloading & prefix scheduling
Join the engineers building the future of open-source inference.
Details: luma.com/0crwqwg4
Join us next week in NYC with the llm-d community for a deep dive into distributed inference.
We're talking llm-d 0.5, scaling MoE models, and KV-cache offloading.
If you're building LLM infra, don't miss this.
📅 March 11th
📍 1 Madison Ave
Register: luma.com/0crwqwg4
In the latest llm-d release, we're tackling high hardware costs with the new GPU Recommendation Tool!
Evaluate throughput, latency, and cost-effectiveness before requesting expensive cluster resources.
Check out the full demo: www.youtube.com/watch?v=Y26i...
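As a rough illustration of the kind of trade-off such a tool weighs (not the tool's actual inputs, metrics, or numbers), here is a back-of-the-envelope cost-per-million-tokens comparison across two hypothetical GPU profiles:

```python
# Illustrative only: hypothetical GPU profiles, not output from the llm-d tool.
from dataclasses import dataclass


@dataclass
class GpuProfile:
    name: str
    tokens_per_sec: float   # estimated decode throughput per GPU (assumed)
    usd_per_hour: float     # on-demand price (assumed)
    p50_ttft_ms: float      # median time-to-first-token (assumed)


def cost_per_million_tokens(p: GpuProfile) -> float:
    tokens_per_hour = p.tokens_per_sec * 3600
    return p.usd_per_hour / tokens_per_hour * 1_000_000


candidates = [
    GpuProfile("gpu-small", tokens_per_sec=900, usd_per_hour=1.10, p50_ttft_ms=420),
    GpuProfile("gpu-large", tokens_per_sec=2600, usd_per_hour=3.50, p50_ttft_ms=180),
]

for p in candidates:
    print(f"{p.name}: ${cost_per_million_tokens(p):.2f} per 1M tokens, "
          f"TTFT ~{p.p50_ttft_ms} ms")
```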
There are many more sessions and community meetups happening throughout the year.
Check the full calendar for session details, room numbers, and the complete list of talks from the llm-d community:
🔗 llm-d.ai/docs/communi...
📍 Stop 3: PyTorch Conference Europe
📅 April 7–8 | Paris
Deep technical tracks on chunked decoding, preemptive scheduling, and disaggregated tokenization. We'll be sharing the latest on state-aware serving with vLLM + llm-d.
Full Schedule: events.linuxfoundation.org/pytorch-conf...
📍 Stop 2: KubeCon Europe
📅 March 23–26 | Amsterdam
From Istio Day to the main stage, we're talking AI-aware routing and KV-cache scheduling. Don't miss our tutorial on building resilient LLM gateways with Kubernetes.
Details: events.linuxfoundation.org/kubecon-clou...
📍 Stop 1: NYC Distributed Inference Meetup
📅 March 11 | IBM Innovation Studio
We're diving into the weeds of llm-d 0.5, Wide-EP, and MoE model scaling. Perfect for anyone in the city looking to optimize LLM serving on AMD and beyond.
Register: luma.com/0crwqwg4
Where to find the llm-d community over the next 2 months 🧵
We have a busy Spring ahead with sessions in NYC, Amsterdam, and Paris. If you're building open-source infrastructure for distributed inference, come join the conversation. ⬇️
The agenda is still evolving, and we've got even more awesomeness in the works!
Whether you're running GenAI in production or building the platforms to support it, this is the room to be in.
📅 March 11 | 4:30 PM
📍 1 Madison Ave, NYC
🎟️ RSVP: luma.com/0crwqwg4
Hosted by Red Hat AI, IBM Research, and AMD. 🤝
If you're building or scaling models, this event is for you.
We're bringing together maintainers and engineers working on:
🔹 llm-d project roadmap
🔹 Optimizing for AMD hardware
🔹 Scaling MoE (Mixture-of-Experts)
🔹 KV-Cache & Prefix-caching performance
NYC: Ready to go deep on Distributed Inference? 🗽
The llm-d community is hitting Manhattan on March 11th!
Join us at the IBM Innovation Studio for a technical deep dive into the infra powering the next generation of LLM serving. 🧵
We'd like to announce that @kubernetes.io WG Serving has succeeded and will be disbanded! Thank you to everyone who has participated and contributed to the discussions and initiatives!
More details: groups.google.com/a/kubernetes...
In case you missed it, last week the llm-d community shipped the v0.5 release.
Check out the post from the llm-d project owners to learn more about all the features we've included in this release.
llm-d.ai/blog/llm-d-v...
Check out the February newsletter here: inferenceops.substack.com/p/state-of-the-model-ser...
Subscribe to get future issues in your inbox: https://inferenceops.substack.com/
🙏 Thanks to everyone who subscribed so far!
Kudos to all contributors to this edition!
Our goal with this newsletter is to give a clear, community-driven view of what's happening across the model serving ecosystem, including updates from vLLM, KServe, @llm-d.ai, @kubernetes.io, and Llama Stack.
This release is built on collaboration, from NIXL 0.9 merges to vLLM integrations.
We are building an open, hardware-agnostic inference control plane.
Ready to build? 🧱
GitHub: github.com/llm-d/llm-d
Website: llm-d.ai
Community Calls: Wed 12:30pm ET
In disaggregated serving, network congestion kills tail latency.
We've integrated the UCCL backend into NIXL, demonstrating 2.4x greater resilience to network contention than standard transports.
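For a feel of why that matters, here is a purely illustrative calculation (all sizes and bandwidths are made-up assumptions, not UCCL or NIXL measurements): in disaggregated serving the prefill-to-decode KV transfer sits on the time-to-first-token critical path, so bandwidth lost to contention shows up almost directly in tail latency.

```python
# Rough intuition only; assumed numbers, not a benchmark.
kv_cache_gb = 2.0             # KV cache to move per request (assumption)
link_gbps_idle = 100.0        # bandwidth with no contention (assumption)
link_gbps_congested = 20.0    # bandwidth left under heavy contention (assumption)


def transfer_ms(size_gb: float, gbps: float) -> float:
    # GB -> Gb, then seconds -> milliseconds
    return size_gb * 8 / gbps * 1000


print(f"idle link:      {transfer_ms(kv_cache_gb, link_gbps_idle):.0f} ms KV transfer")
print(f"congested link: {transfer_ms(kv_cache_gb, link_gbps_congested):.0f} ms KV transfer")
# A 5x bandwidth drop turns a ~160 ms handoff into ~800 ms, which lands directly
# in the request's TTFT unless the transport adapts to the contention.
```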