
Posts by llm-d

📢 The State of Model Serving Communities: April Edition is out!

Our goal with this newsletter is to give a clear, community-driven view of what's happening across the model serving ecosystem, including updates from projects like vLLM, KServe, @llm-d.ai, @kubernetes.io, Llama Stack, and more.

9 hours ago

Check out the latest newsletter to stay up to speed on the changes happening in the model serving communities!

8 hours ago

ICYMI: llm-d is officially a @CNCF Sandbox project! 🚀

We're evolving #Kubernetes into SOTA AI infrastructure through a powerhouse coalition including Red Hat, Google Cloud, IBM Research, NVIDIA, Mistral AI, Hugging Face, and many more.

www.cncf.io/blog/2026/03...

2 weeks ago
Welcome llm-d to the CNCF: Evolving Kubernetes into SOTA AI infrastructure
We are thrilled to announce that llm-d has officially been accepted as a Cloud Native Computing Foundation (CNCF) Sandbox project! As generative AI transitions from research labs to production…

It's official: llm-d has joined cncf.io! 🚀

Our mission to evolve Kubernetes into SOTA AI infrastructure just got a massive boost. This milestone belongs to our amazing community.

Thank you for building this with us. 💜

We're just getting started!

🔗 www.cncf.io/blog/2026/03...

3 weeks ago
vLLM Inference Meetup · Boston · Luma
Deep technical sessions. Live demos. Real conversations. If you're deploying or scaling LLM inference, this is the room to be in. Join Red Hat AI, IBM,…

Deploying or scaling LLM inference? This is the room to be in. 📈

The vLLM Inference Meetup hits Boston on March 31! Join us for an evening of deep technical sessions, live demos, and real conversations with the community.

📅 Mar 31, 5 PM
📍 314 Main St, Cambridge
🔗 luma.com/4rmkrrb7

3 weeks ago

LLMInferenceService is now fully production-ready and built on the high-performance @llm-d.ai framework.

๐—ช๐—ต๐—ฎ๐˜โ€™๐˜€ ๐—ถ๐—ป๐—ฐ๐—น๐˜‚๐—ฑ๐—ฒ๐—ฑ?

- KV-cache aware routing and disaggregated prefill-decode to maximize throughput.
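The prefill/decode split mentioned above can be sketched in a few lines of toy Python. This is a minimal illustration of the idea only; the function names and data shapes are assumptions, not the actual llm-d implementation:

```python
# Toy sketch of disaggregated serving: a prefill worker runs the expensive
# pass over the whole prompt once, then hands its KV cache to a separate
# decode worker that generates one token per step without re-reading the prompt.

def prefill(prompt_tokens):
    # Stand-in for the real forward pass: one "KV entry" per prompt token.
    return [("kv", tok) for tok in prompt_tokens]

def decode(kv_cache, max_new_tokens):
    # The decode worker extends the transferred cache token by token.
    generated = []
    for _ in range(max_new_tokens):
        next_token = len(kv_cache)  # stand-in for sampling a real token
        kv_cache = kv_cache + [("kv", next_token)]
        generated.append(next_token)
    return generated

cache = prefill([101, 102, 103])
print(decode(cache, 3))  # [3, 4, 5]
```

The point of the split is that the two phases have different hardware profiles: prefill is compute-bound, decode is memory-bound, so running them on separate pools lets each be sized independently.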

3 weeks ago
Precise Prefix Cache-Aware Routing & Distributed Tracing in llm-d
In this technical demo, we explore how llm-d optimizes distributed inference by using Precise Prefix Cache-Aware Routing and how you can gain full visibility into these decisions using Distributed…

The results with Llama 3.1 8B:

✅ Lower TTFT on cache hits.
✅ Full visibility into scoring decisions.
✅ Improved throughput & GPU utilization.

Watch the full walkthrough: youtu.be/NN-1JvnMMrU

4 weeks ago

Watch this preview of distributed tracing (llm-d 0.6) and Prefix Cache-Aware Routing.

🔹 State Tracking: llm-d tracks the KV cache via ZMQ.
🔹 Smart Scoring: EPP pods tokenize prompts and query for cached blocks.
🔹 Optimal Routing: requests go to the pod with the best cache hit.
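The scoring and routing steps above can be condensed into a toy scorer. This is a minimal sketch; `prefix_cache_hits`, `pick_pod`, and the block-set data structure are illustrative assumptions, not the real EPP API:

```python
def prefix_cache_hits(prompt_tokens, cached_blocks, block_size=16):
    """Count how many leading fixed-size blocks of the prompt are already cached."""
    hits = 0
    for i in range(0, len(prompt_tokens), block_size):
        block = tuple(prompt_tokens[i:i + block_size])
        if block not in cached_blocks:
            break  # prefix reuse stops at the first uncached block
        hits += 1
    return hits

def pick_pod(prompt_tokens, pods):
    """Route the request to the pod whose KV cache covers the longest prompt prefix."""
    return max(pods, key=lambda pod: prefix_cache_hits(prompt_tokens, pod["blocks"]))

pods = [
    {"name": "pod-a", "blocks": {tuple(range(16))}},                       # 1 cached block
    {"name": "pod-b", "blocks": {tuple(range(16)), tuple(range(16, 32))}}, # 2 cached blocks
]
print(pick_pod(list(range(40)), pods)["name"])  # pod-b
```

A real scheduler would blend this cache score with load and queue-depth signals rather than using it alone.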

4 weeks ago
llm-d NYC 2026 Meetup (YouTube video by llm-d Project)

The first llm-d NYC Meetup is live!

Deep dive into the open-source stack for cloud-native inference with @IBMResearch, @AMD, and @RedHat

State-aware scheduling & KV cache reuse
P/D disaggregation
Scaling MoE models
AMD ROCm & llm-d

Watch: www.youtube.com/watch?v=_ZBQ...

1 month ago

📢 The State of Model Serving Communities: March Edition is out!

We launched our newsletter publicly last year to share our contributions to upstream communities from our Red Hat AI teams. We've gained over 1300 subscribers!

1 month ago
Distributed Inference Meetup NYC · Luma
llm-d Distributed Inference Meetup NYC. Hosted by Red Hat AI, IBM Research, and AMD, this event takes place on March 11, 2026 in New York City. What to…

Final Call: NYC 🗽

Registration for the llm-d Meetup closes Tuesday, March 10.

Join the community this Wednesday at the IBM 1 Madison office for a deep dive into llm-d 0.5, MoE scaling, and KV-cache offloading.

Don't miss a night of high-signal technical talks.

🎟️ Register now: luma.com/0crwqwg4

1 month ago

Planning to join us in NYC next week? 🏙️

Registration for the llm-d Distributed Inference Meetup closes this Tuesday, March 10th.

Don't miss out on a night of technical talks and networking with the community at the IBM 1 Madison office. Grab your spot now!

🎟️ luma.com/0crwqwg4

#llmd #NYCMeetup

1 month ago

Whatโ€™s on the agenda for next Wednesday's NYC meetup?

🛠️ Intro to llm-d 0.5
⚡️ Distributed LLM serving on AMD
🧠 Lessons scaling Wide-EP and MoE
💾 KV-cache offloading & prefix scheduling

Join the engineers building the future of open-source inference.

Details: luma.com/0crwqwg4

1 month ago

Join us next week in NYC with the llm-d community for a deep dive into distributed inference.

We're talking llm-d 0.5, scaling MoE models, and KV-cache offloading.

If you're building LLM infra, don't miss this.

📅 March 11th
📍 1 Madison Ave
Register: luma.com/0crwqwg4

1 month ago
Optimizing LLM Workloads: A Deep Dive into the GPU Recommendation Tool & Configuration Explorer (YouTube video by llm-d Project)

In the latest llm-d release, we're tackling high hardware costs with the new GPU Recommendation Tool! 📈

Evaluate throughput, latency, and cost-effectiveness before requesting expensive cluster resources.

Check out the full demo: www.youtube.com/watch?v=Y26i...
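As a back-of-envelope illustration of the throughput-versus-cost trade-off such a tool weighs, here is a tiny helper. The formula and every number below are hypothetical, not output of the actual GPU Recommendation Tool:

```python
def cost_per_million_tokens(gpu_hourly_usd, tokens_per_second):
    """Rough serving cost: hourly GPU price divided by tokens produced per hour."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Hypothetical comparison: a cheaper GPU at lower throughput vs. a pricier, faster one.
print(round(cost_per_million_tokens(2.0, 800), 3))   # 0.694
print(round(cost_per_million_tokens(4.0, 2000), 3))  # 0.556
```

Even this toy calculation shows why raw hourly price is misleading: the more expensive GPU can be cheaper per token once sustained throughput is factored in.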

1 month ago
Upcoming llm-d Events | llm-d
Meet the llm-d community at upcoming talks, meetups, and conferences

There are many more sessions and community meetups happening throughout the year.

Check the full calendar for session details, room numbers, and the complete list of talks from the llm-d community:

🔗 llm-d.ai/docs/communi...

1 month ago
PyTorch Conference Europe | LF Events
Join top-tier researchers, developers, and academics for a deep dive into PyTorch, the cutting-edge open-source machine learning framework.

๐Ÿ“ Stop 3: PyTorch Conference Europe
๐Ÿ“… April 7โ€“8 | Paris

Deep technical tracks on chunked decoding, preemptive scheduling, and disaggregated tokenization. We'll be sharing the latest on state-aware serving with vLLM + llm-d.

Full Schedule: events.linuxfoundation.org/pytorch-conf...

1 month ago
KubeCon + CloudNativeCon Europe | LF Events
The Cloud Native Computing Foundation's flagship conference gathers adopters and technologists from leading open source and cloud native communities.

๐Ÿ“ Stop 2: KubeCon Europe
๐Ÿ“… March 23โ€“26 | Amsterdam

From Istio Day to the main stage, weโ€™re talking AI-aware routing and KV-cache scheduling. Don't miss our tutorial on building resilient LLM gateways with Kubernetes.

Details: events.linuxfoundation.org/kubecon-clou...

1 month ago

๐Ÿ“ Stop 1: NYC Distributed Inference Meetup
๐Ÿ“… March 11 | IBM Innovation Studio

Weโ€™re diving into the weeds of llm-d 0.5, Wide-EP, and MoE model scaling. Perfect for anyone in the city looking to optimize LLM serving on AMD and beyond.

Register: luma.com/0crwqwg4

1 month ago 0 0 1 0

Where to find the llm-d community over the next 2 months 🧵

We have a busy spring ahead with sessions in NYC, Amsterdam, and Paris. If you're building open-source infrastructure for distributed inference, come join the conversation. ⬇️

1 month ago

The agenda is still evolving, and we've got even more awesomeness in the works! 📈

Whether you're running GenAI in production or building the platforms to support it, this is the room to be in.

📅 March 11 | 4:30 PM
📍 1 Madison Ave, NYC
🎟️ RSVP: luma.com/0crwqwg4

1 month ago

Hosted by Red Hat AI, IBM Research, and AMD. 🤝

If you're building or scaling models, this event is for you.

We're bringing together maintainers and engineers working on:
🔹 llm-d project roadmap
🔹 Optimizing for AMD hardware
🔹 Scaling MoE (Mixture-of-Experts)
🔹 KV-cache & prefix-caching performance

1 month ago

NYC: Ready to go deep on Distributed Inference? 🗽

The llm-d community is hitting Manhattan on March 11th!

Join us at the IBM Innovation Studio for a technical deep dive into the infra powering the next generation of LLM serving. 🧵

1 month ago
[Announcement] WG Serving Has Succeeded and Will Be Disbanded

We'd like to announce that @kubernetes.io WG Serving has succeeded and will be disbanded! Thank you to everyone who has participated and contributed to the discussions and initiatives!

More details: groups.google.com/a/kubernetes...

2 months ago

In case you missed it, last week the llm-d community shipped the v0.5 release.

Check out the post from the llm-d project owners to learn more about all the features we've included in this release.

llm-d.ai/blog/llm-d-v...

2 months ago

👉 Check out the February newsletter here: inferenceops.substack.com/p/state-of-the-model-ser...
👉 Subscribe to get future issues in your inbox: https://inferenceops.substack.com/

🚀 Thanks to everyone who subscribed so far!

Kudos to all contributors to this edition!

2 months ago

Our goal with this newsletter is to give a clear, community-driven view of what's happening across the model serving ecosystem, including updates from vLLM, KServe, @llm-d.ai, @kubernetes.io, and Llama Stack.

2 months ago
GitHub - llm-d/llm-d
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes

This release is built on collaboration, from NIXL 0.9 merges to vLLM integrations.

We are building an open, hardware-agnostic inference control plane.
Ready to build? 🧱

GitHub: github.com/llm-d/llm-d
Website: llm-d.ai
Community Calls: Wed 12:30pm ET

2 months ago

🌐 In disaggregated serving, network congestion kills tail latency.

We've integrated the UCCL backend into NIXL, demonstrating 2.4x greater resilience to network contention than standard transports.

2 months ago