WOOT! #LMCache in the CNCF Technology Radar. cncf.io/reports/cncf...
That's golden for our community and everyone at
@tensormesh
#kubecon #cncf #AI #LLM #inference
Do you want to compare the caching performance of your LLM serving stack? We've put together a simple command-line tool to do just that. Introducing Tensormesh Benchmark.
tensormesh.ai/blog-posts/t...
#llm #ai #kvcache #lmcache #vllm #benchmarking
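The gist of such a benchmark, as a generic Python sketch (not the Tensormesh CLI; the endpoint URL and model name are placeholders): send the same long-prefix request twice to an OpenAI-compatible server and compare latencies. With KV-cache reuse (e.g. LMCache) enabled, the warm request should be markedly faster.

```python
# Generic caching benchmark sketch (NOT the Tensormesh CLI).
# Assumptions: a local OpenAI-compatible server (e.g. vLLM) at
# localhost:8000 and a model name you actually have loaded.
import time
import requests

BASE_URL = "http://localhost:8000/v1/completions"   # assumption
MODEL = "meta-llama/Llama-3.1-8B-Instruct"          # assumption

# A long shared prefix makes cache reuse measurable.
PROMPT = ("You are a helpful assistant. " * 400) + "Summarize KV caching in one line."

def timed_request() -> float:
    """Return end-to-end latency of one completion request."""
    start = time.perf_counter()
    resp = requests.post(
        BASE_URL,
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": 32},
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

cold = timed_request()  # first call: prefix is prefilled from scratch
warm = timed_request()  # second call: prefix can be served from the KV cache
print(f"cold: {cold:.2f}s  warm: {warm:.2f}s  speedup: {cold/warm:.1f}x")
```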
Join me and Yuhan Liu for our talk at the upcoming #Kubecon NA 2025 in Atlanta: sched.co/27FcQ We'll cover increasing efficiency while serving #LLMs using #vLLM & #LMCache!
How to Reduce KV Cache Bottlenecks with NVIDIA Dynamo | NVIDIA Technical Blog developer.nvidia.com/blog/how-to-...
#LMCache
🚀#NewBlog #vLLM
📖 𝐯𝐋𝐋𝐌 𝐩𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧-𝐬𝐭𝐚𝐜𝐤: AI inference for enterprises💫
🏢Production-stack is the K8s-native, enterprise-ready setup that supercharges vLLM inference at scale, across clouds.
👉Start here: cloudthrill.ca/vllm-product...
#AI #LLM #vLLM #Kubernetes #MLOps #KVCache #LMCache
📦#vLLM for 𝐁𝐞𝐠𝐢𝐧𝐧𝐞𝐫𝐬 𝐛𝐮𝐧𝐝𝐥𝐞: from basics to deployment! 👇Missed our vLLM series this summer? Here’s the full recap:
Part1️⃣: 𝐅undamentals cloudthrill.ca/what-is-vllm
Part2️⃣: 𝐊ey 𝐅eatures cloudthrill.ca/what-is-vllm...
Part3️⃣: 𝐃eployment 𝐎ptions cloudthrill.ca/vllm-deloyment
#vllm_project #lmcache #LLMs
🚀#NewBlog #vllm🔥
𝐯𝐋𝐋𝐌 𝐟𝐨𝐫 𝐁𝐞𝐠𝐢𝐧𝐧𝐞𝐫𝐬 𝐏𝐚𝐫𝐭 𝟐:📖𝐊𝐞𝐲 𝐅𝐞𝐚𝐭𝐮𝐫𝐞𝐬 & 𝐎𝐩𝐭𝐢𝐦𝐢𝐳𝐚𝐭𝐢𝐨𝐧𝐬
💎 What makes #vLLM the Rolls-Royce of inference?
👉check it out: cloudthrill.ca/what-is-vllm...
✅ #PagedAttention #PrefixCaching #ChunkedPrefill
✅ #SpeculativeDecoding #FlashAttention #lmcache
✅ Tensor & #PipelineParallelism⚡
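A minimal sketch of flipping these switches on in vLLM's offline API (PagedAttention is vLLM's core and always on; the model name is an assumption, and speculative decoding/LMCache need extra config not shown here):

```python
# Minimal vLLM sketch: prefix caching, chunked prefill, tensor parallelism.
# Assumption: the model name; swap in any HF model you can run.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumption
    enable_prefix_caching=True,    # reuse KV blocks across shared prefixes
    enable_chunked_prefill=True,   # interleave prefill chunks with decode steps
    tensor_parallel_size=2,        # shard weights across 2 GPUs
)
out = llm.generate(["What is PagedAttention?"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```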