We have Nvidia B200s ready to go for you in Hugging Face Inference Endpoints 🔥
I tried them out myself and the performance is amazing.
On top of that, we just got a fresh batch of H100s as well. At $4.5/hour, it's a clear winner in price/perf compared to the A100.
Posts by Erik
We just refreshed our analytics in @hf.co endpoints. More info below!
Morning workout at the @hf.co Paris office is imo one of the best perks.
Gemma 3 is live 🔥
You can deploy it from endpoints directly, with optimally selected hardware and configuration.
Give it a try!
Apparently, mom is a better engineer than I am.
Today, as part of a course, I implemented a program that takes a bit stream like so:
10001001110111101000100111111011
and decodes the intel 8088 assembly from it like:
mov si, bx
mov bx, di
It only works on the mov instruction, register-to-register.
code: github.com/ErikKaum/bit...
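A minimal Rust sketch of such a decoder, assuming the standard two-byte MOV reg,reg encoding (opcode `100010dw` followed by a MOD-REG-R/M byte with MOD = 11). This is an illustrative reconstruction, not the code in the linked repo:

```rust
// 16-bit register names indexed by the 3-bit REG field (used when w = 1).
const REGS_W1: [&str; 8] = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"];
// 8-bit register names (used when w = 0).
const REGS_W0: [&str; 8] = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"];

/// Decode a bit string of back-to-back MOV reg,reg instructions
/// (two bytes each) into assembly lines.
fn decode(bits: &str) -> Vec<String> {
    // Turn the '0'/'1' text into actual bytes, 8 bits at a time.
    let bytes: Vec<u8> = bits
        .as_bytes()
        .chunks(8)
        .map(|c| u8::from_str_radix(std::str::from_utf8(c).unwrap(), 2).unwrap())
        .collect();

    let mut out = Vec::new();
    for pair in bytes.chunks(2) {
        let (b1, b2) = (pair[0], pair[1]);
        // First byte: 100010 | d | w
        assert_eq!(b1 >> 2, 0b100010, "only MOV is supported");
        let (d, w) = ((b1 >> 1) & 1, b1 & 1);
        // Second byte: MOD (11 = register mode) | REG | R/M
        assert_eq!(b2 >> 6, 0b11, "only register-to-register mode");
        let regs = if w == 1 { REGS_W1 } else { REGS_W0 };
        let reg = regs[((b2 >> 3) & 0b111) as usize];
        let rm = regs[(b2 & 0b111) as usize];
        // d = 1 means REG is the destination; d = 0 means R/M is.
        let (dst, src) = if d == 1 { (reg, rm) } else { (rm, reg) };
        out.push(format!("mov {dst}, {src}"));
    }
    out
}

fn main() {
    for line in decode("10001001110111101000100111111011") {
        println!("{line}");
    }
    // Prints:
    // mov si, bx
    // mov bx, di
}
```

Running it on the bit stream from the post reproduces the two instructions shown above.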
Ambition is a paradox.
You should always aim higher, but that easily becomes a state where you're never satisfied. Just reached 10k MRR. Now there's the next goal of 20k.
Sharif has a good talk on this: emotional runway.
How do you deal with this paradox?
video: www.youtube.com/watch?v=zUnQ...
There's some deep wisdom in that as well!
Qui-Gon Jinn sharing some insightful prompting wisdom
Exactly.
Suppose we have an algorithm that is guaranteed to give output according to a structure, with the caveat that it might run out of tokens.
Should this still be classified as structured generation?
🤔
CUDA libraries..? So they have access to GPUs as well?
A video series on how to develop, profile, and compare CUDA kernels would be such a banger.
And allow a lot more tinkerers to enter the field.
Hell yeah 🔥
How would you classify the edge case of running out of tokens?
E.g. if the model goes into a "\n" loop and runs out of tokens.
Hah, fair!
Interesting, for me it's snappy as hell, maybe things aren't cached as well in Costa Rica? 🤔
pro tip for the borrow checker: using .clone() everywhere is okay
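A tiny sketch of what the tip above means in practice (hypothetical names, not from any real project): cloning hands the callee its own copy, so the original stays usable and the borrow checker has nothing to complain about.

```rust
// A function that takes ownership of its argument.
fn consume(names: Vec<String>) -> usize {
    names.len()
}

fn main() {
    let names = vec!["ada".to_string(), "grace".to_string()];
    // Without .clone(), `names` would be moved into consume()
    // and be unusable afterwards; cloning keeps the original alive.
    let n = consume(names.clone());
    println!("{n} names, first is {}", names[0]);
}
```

The trade-off is an extra allocation and copy per clone, which is usually fine while prototyping and easy to optimize away later by switching to borrows.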
It's this time of the year
Or you can let the model run free in a constrained environment.
I'm tinkering on this: bsky.app/profile/erik...
Hugging Face Inference Endpoints now support CPU deployment for llama.cpp
Why is this a huge deal? llama.cpp is well known for running very well on CPU. If you're running small models like Llama 1B or embedding models, this will definitely save tons of money 💰💰
Nice! This is so neat
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
SmolVLM can be fine-tuned on a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
Is it just me or does it intuitively align that chat bars are at the bottom of the page and search bars at the top?
I've noticed that Perplexity positions the question at the top and generates the text below it.
Is it because they want to position more as a search engine?
The hope I have with Bluesky is that I, as a user, can do moderation more efficiently than I could on Twitter
Feeds and starter packs helped me a lot, at least. E.g: bsky.app/profile/did:...
Indeed, the beauty of open source 🔥
Can't wait to have that feature!
It's kinda mind-blowing that it's not a thing on other social media platforms 🤷🏼‍♂️
code boxes with syntax highlighting