
Posts by Hugo Larcher

This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron and TPU).

We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤩!

Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference

We are introducing multi-backend support in Hugging Face 🤗 Text Generation Inference!
With the new TGI architecture, we can now plug in new modeling backends to get the best performance for a given model and the available hardware.

huggingface.co/blog/tgi-mul...
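The idea of picking a backend per model and hardware can be sketched roughly as follows. This is an illustrative toy registry, not TGI's actual API; the names `Backend` and `select_backend` are made up for the example.

```python
# Hypothetical sketch of a pluggable generation-backend registry.
# Names and selection logic are illustrative, not TGI's real internals.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Backend:
    name: str
    # Predicate: does this backend support (model_id, hardware)?
    supports: Callable[[str, str], bool]

REGISTRY: Dict[str, Backend] = {}

def register(backend: Backend) -> None:
    REGISTRY[backend.name] = backend

def select_backend(model_id: str, hardware: str) -> str:
    # Return the first registered backend compatible with this pair.
    for backend in REGISTRY.values():
        if backend.supports(model_id, hardware):
            return backend.name
    raise RuntimeError("no compatible backend")

# Toy registrations mirroring the backends named in the post.
register(Backend("trt-llm", lambda m, hw: hw == "nvidia-gpu"))
register(Backend("vllm", lambda m, hw: hw in ("nvidia-gpu", "amd-gpu")))
```

With this shape, adding llama.cpp, Neuron, or TPU support is just another `register` call; the serving layer stays unchanged.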

From Files to Chunks: Improving HF Storage Efficiency

When XetHub joined Hugging Face, we brainstormed how to share our tech with the community.

The magic? Versioning chunks, not files, giving rise to:

🧠 Smarter storage
⏩ Faster uploads
🚀 Efficient downloads

Curious? Read the blog and let us know how it could help your workflows!
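The "versioning chunks, not files" idea can be sketched with content-defined chunking: split files at positions determined by their content, then store each unique chunk once by hash. The boundary rule and chunk sizes below are simplified stand-ins, not Xet's actual algorithm.

```python
# Illustrative content-defined chunking and chunk-level dedup.
# The rolling-hash boundary rule is a toy; real systems tune it carefully.
import hashlib
from typing import Dict, List

def chunk(data: bytes, mask: int = 0x3F) -> List[bytes]:
    """Split data where a rolling hash's low bits match the mask."""
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        # Old bytes shift out of the 32-bit window, so boundaries
        # depend only on recent content and resync after local edits.
        rolling = ((rolling << 1) + b) & 0xFFFFFFFF
        if (rolling & mask) == mask:
            chunks.append(data[start : i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedup_store(files: List[bytes]) -> Dict[str, bytes]:
    """Store each unique chunk once, keyed by its SHA-256."""
    store: Dict[str, bytes] = {}
    for f in files:
        for c in chunk(f):
            store[hashlib.sha256(c).hexdigest()] = c
    return store
```

Because boundaries come from content rather than fixed offsets, editing one region of a file only changes the chunks near the edit; re-uploading a new version shares all the unchanged chunks with the old one.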
