
Posts by kb

MLX, Llama.cpp, and Candle are performing about equally on an M3 Max now.

🕯️🔥 [Candle](github.com/huggingface/...) is now much faster on macOS thanks to a contribution by @EricLBuehler, which brings major speed improvements to the Metal backend. 🍎📈
Try it out by running some of our examples with the `--features metal` flag.

#Candle #RustLang #macOS #Metal #HuggingFace

9 months ago
Preview: Building Tensors from Scratch in Rust (Part 2): View Operations, a blog post by Kyle Birnbaum on Hugging Face

I just published part 2 of my article series about creating tensors from scratch in Rust. This one is about view operations.
#tensors #machine-learning #ml #ai

Take a look here:
huggingface.co/blog/KeighBe...

10 months ago
Preview: Building Tensors From Scratch in Rust: Part 1, Core Structure and Indexing, a blog post by Kyle Birnbaum on Hugging Face

I'm writing an article series about creating tensors from scratch in Rust. #tensors #machine-learning #ml #ai

huggingface.co/blog/KeighBe...

10 months ago

The mixture-of-experts model is also an option:

```
cargo run --example qwen --features metal --release -- --prompt "Write a poem about butterflies. <think></think>." --model "3-moe-a3b"
```

10 months ago
Preview: GitHub - huggingface/candle: Minimalist ML framework for Rust

Qwen 3 is now supported in Candle!
Run the 3-4B model locally with:

```
cargo run --example qwen --release -- --model 3-4b --prompt 'The capital of France is '
```

On macOS, enable Metal for faster inference:

```
--features metal
```

Clone the repo and test it out. github.com/huggingface/...

10 months ago
Preview: microsoft/rifts · Datasets at Hugging Face

RIFTS Dataset: Solving Critical LLM Conversation Failures

- LLMs 3x less likely to clarify than humans
- 16x less likely to make follow-up requests
- Early failures predict later breakdowns
- Includes preliminary intervention strategies

huggingface.co/datasets/mic...

1 year ago
Preview: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Google just released Gemma 3, an open, on-device LLM with vision capabilities and support for over 140 languages. Models range from 1B to 27B parameters.

Zero-day support for multiple frameworks including transformers, MLX, llama.cpp, and more! 💼 🚀

Read more here:
huggingface.co/blog/gemma3

1 year ago
Preview: LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters! Large reasoning models (LRMs) tackle complex reasoning problems by following long chain-of-thoughts (Long CoT) that incorporate reflection, backtracking, and self-validation. However, the training tec...
1 year ago

Made some significant updates to the @hf.co semantic datasets search app. If you love falling into a wiki black hole, you might like this...

huggingface.co/spaces/libra...

1 year ago
How DeepSeek Changes the LLM Story, a YouTube video by Sasha Rush 🤗

What to know about DeepSeek

youtu.be/0eMzc-WnBfQ?...

In which we attempt to figure out MoE, o1, scaling, tech reporting, modern semiconductors, microeconomics, and international geopolitics.

1 year ago

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou

tl;dr: increasing the input vocabulary is always good; increasing the output vocabulary is good only for bigger models.
arxiv.org/abs/2501.16975

1 year ago

It’s a green light for the Frugal AI Challenge! 🚀
For the next month, we invite all members of the AI community to participate in one of our 3 AI for Climate tasks, with the goal of developing a highly accurate model while consuming as little energy as possible ⚡

1 year ago
Preview: GitHub - huggingface/coreml-examples: Swift Core ML Examples

We’ve got great examples of PyTorch-to-Core ML conversion in the Hugging Face coreml-examples repo. Currently there’s one tutorial, but more are coming soon. After converting, you can choose which compute units you want the model to run on!

1 year ago

Christmas came early! 🎅🏻 Today marks the newest release of the HuggingChat 🤗 update with some really exciting capabilities! First up, automatic context injection!

1) Open a file in a supported app, summon HFChat, and it pre-populates the context window. No more copy-pasting. /cc @hf.co

1 year ago

Or: my laptop has a 72 Wh battery (~207,360 J, assuming only 80% is usable). Running Llama3.2-1B would drain the battery after processing:

- CPU: 674,249 tokens (~518,653 words, ~7 novels)
- GPU: 2,799,550 tokens (~2,153,500 words, ~30 novels)
- ANE: 11,273,184 tokens (~8,671,679 words, ~123 novels)
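A sketch of that battery math (my own back-of-envelope, using the rounded ~6 / ~1.4 / ~0.3 J-per-20-tokens figures quoted in the companion post rather than the exact measurements, so the totals only land in the same ballpark as the ones above):

```
// Estimate how many tokens a battery charge can power.
fn tokens_on_battery(battery_wh: f64, usable_frac: f64, joules_per_token: f64) -> f64 {
    let usable_j = battery_wh * 3600.0 * usable_frac; // Wh -> J
    usable_j / joules_per_token
}

fn main() {
    // 72 Wh battery, 80% usable (~207,360 J).
    let (wh, frac) = (72.0, 0.80);
    // Rounded per-20-token energies; exact measured values will differ.
    for (name, j_per_20) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        let tokens = tokens_on_battery(wh, frac, j_per_20 / 20.0);
        println!("{name}: ~{:.2}M tokens per charge", tokens / 1e6);
    }
}
```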

1 year ago

To put it in perspective: Llama3.2-1B uses ~280 GFLOPs per 20 tokens. My laptop (~2 kg) running the model would use the energy equivalent of:

- CPU (6 J): dropping it from 1 foot (31 cm)
- GPU (1.4 J): dropping it from 3 inches (7 cm)
- ANE (0.3 J): dropping it just over half an inch (1.5 cm)!
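The equivalence is just E = mgh solved for height; a quick check with g = 9.81 m/s² and the ~2 kg mass above reproduces those drop heights:

```
// Convert an energy (J) into the height (cm) from which a mass
// would have to be dropped to release it: h = E / (m * g).
const G: f64 = 9.81; // m/s^2

fn drop_height_cm(energy_j: f64, mass_kg: f64) -> f64 {
    energy_j / (mass_kg * G) * 100.0
}

fn main() {
    for (name, e) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        println!("{name}: {:.1} cm", drop_height_cm(e, 2.0));
    }
    // Prints: CPU: 30.6 cm, GPU: 7.1 cm, ANE: 1.5 cm
}
```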

1 year ago
Chart: Model Hardware vs. Energy per GigaFLOP (vertical axis: mJ/GFLOP, log scale; horizontal axis: hardware type). Boxplot statistics:

| Hardware  | Min | Q1   | Median | Q3   | Max  |
|-----------|-----|------|--------|------|------|
| CPU       | 6.9 | 11.7 | 13.4   | 35.6 | 53.1 |
| CPU + GPU | 4.6 | 4.6  | 4.7    | 6.2  | 9.6  |
| CPU + ANE | 0.9 | 1.0  | 1.1    | 1.4  | 1.8  |

Preliminary data shows the Apple Neural Engine uses ~94% less energy than the CPU and ~75% less than the GPU 🤯
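A rough sanity check using just the boxplot medians (13.4, 4.7, and 1.1 mJ/GFLOP) lands near those figures; the quoted ~94% / ~75% presumably come from the full per-model measurements rather than these summary statistics:

```
// Percent energy saving of `candidate` relative to `baseline`,
// both in mJ/GFLOP (medians from the chart above).
fn percent_saving(baseline: f64, candidate: f64) -> f64 {
    (1.0 - candidate / baseline) * 100.0
}

fn main() {
    println!("ANE vs CPU: ~{:.0}% less energy", percent_saving(13.4, 1.1)); // ~92%
    println!("ANE vs GPU: ~{:.0}% less energy", percent_saving(4.7, 1.1));  // ~77%
}
```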

On the On-Device team at Hugging Face, we've been profiling energy usage for CoreML models. Here’s some data I collected:

1 year ago