The current approach is literally like doing precision-critical maths, then rounding to the nearest integer and complaining about a lack of accuracy
Posts by magnesit
Reasoning LLMs are the biggest compromise in machine learning - converting all those rich, contextual embeddings into discrete token IDs and then feeding them back. Awful.
I'm confident "they" will come up with something more efficient soon.
Please.
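To make the "rounding" analogy concrete, here's a toy numpy sketch (invented sizes and values, not any particular model): the full next-token distribution carries far more information than the single token ID that gets re-embedded and fed back in.

```python
import numpy as np

# Toy illustration only: sizes and values are invented, this is not any real model.
rng = np.random.default_rng(0)
vocab, d_model = 8, 4
embedding = rng.normal(size=(vocab, d_model))    # token ID -> embedding row

logits = rng.normal(size=vocab)                  # the rich, continuous "state"
probs = np.exp(logits) / np.exp(logits).sum()    # full next-token distribution

token_id = int(probs.argmax())                   # the "rounding to an integer" step
fed_back = embedding[token_id]                   # all the next step ever sees

# One way to see what got thrown away: the distribution-weighted embedding
# differs from the single re-embedded token.
expected = probs @ embedding
print("discarded information (L2 distance):", np.linalg.norm(expected - fed_back))
```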
the U.S. made tax exemptions for WHAT???
Wowza, that actually sounds pretty cool!
Thanks for deciding to make it open-source! 🫶
😭😭
These are tokens, generated by the model, paid for by the users. I don't know if that's supposed to drive profits up for the providers, but it feels so arbitrary to spend time waiting for the model to say that it's about to start thinking.
Like botched training. Strong models, but weirdly trained.
[Image: chain-of-thought of Gemma 4 E2B generating "Thinking process: 1. **Analyze the request**: ..."]
I don't like the chains of thought of the newer open-weight LLMs on the market. They just don't try to be efficient anymore. I know, it's supposed to be more structured and stuff, but I think leaving all the distillation artifacts from bigger models like the one marked in the image is unacceptable.
Really cool stuff. Haven't given much *Attention* (see what I did there?) to the bigger ones in the family, but I guess it only gets better from here!
Gemma 4 E2B is extremely impressive. I tried it out a little bit and must admit that it's got that "feeling" of being much stronger than many of its bigger counterparts.
I am especially blown away by the inference speed on my phone, because it's genuinely a bit faster than reading speed on a Pixel 7a.
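Quick back-of-envelope on what "faster than reading speed" means in tokens per second; the numbers below are assumptions, not measurements from the Pixel 7a.

```python
# Back-of-envelope only; reading speed and tokens-per-word are assumed values.
words_per_minute = 250            # typical adult silent-reading speed
tokens_per_word = 1.3             # rough rule of thumb for English tokenizers
reading_speed_tps = words_per_minute * tokens_per_word / 60
print(f"reading speed ~= {reading_speed_tps:.1f} tokens/s")   # ~5.4 tokens/s
# so a model decoding noticeably above ~6 tokens/s outpaces most readers
```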
Rare European L, unfortunately: the system has been heavily criticized. It certainly has upsides though; let's see what happens
When doing the exam, all you need to perform is a few matmuls here and there, and you'll have your own chatbot to support you with the toughest questions!
Follow for more undetectable life hacks.
I came up with a tremendous, nearly undetectable method to cheat in exams; instead of trying to smuggle in an AI assistant akin to ChatGPT to help you out, simply *remember* (!) the weights of an open, frontier LLM like, say, GLM 5.1.
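For the curious, the "few matmuls" in question, as a toy numpy sketch: tiny invented dimensions, one attention head, and no claim whatsoever about GLM 5.1's real architecture.

```python
import numpy as np

# Toy dimensions and random weights; a single attention head plus an MLP,
# i.e. "a few matmuls". Not GLM 5.1's actual architecture.
rng = np.random.default_rng(0)
seq_len, d_model = 3, 8

x = rng.normal(size=(seq_len, d_model))                      # the embeddings you "remembered"
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))

q, k, v = x @ Wq, x @ Wk, x @ Wv                             # matmuls 1-3
scores = q @ k.T / np.sqrt(d_model)                          # matmul 4
scores += np.triu(np.full((seq_len, seq_len), -1e9), k=1)    # causal mask
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
attn = (weights @ v) @ Wo                                    # matmuls 5-6

W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
out = np.maximum(attn @ W1, 0) @ W2                          # matmuls 7-8 (ReLU MLP)
print(out.shape)                                             # (3, 8): one block, done "by hand"
```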
I welcome the two-week ceasefire the US and Iran agreed last night. It brings much-needed de-escalation.
I thank Pakistan for its mediation.
Now it is crucial that negotiations for an enduring solution to this conflict continue.
We will continue coordinating with our partners to this end.
The number of nihilistic comments really shows that as long as there's authority, Bluesky will find something negative about it.
Social media accounts of corpos, countries, and, in this case, the European Commission exist. Deal with it.
You're doing great, EU, keep it up. 🇪🇺
But the more I read things like these, the more my brain goes "ugh, federalize already."
This is such an astronomically great thing! 🥹
He didn't get away with it.
Why in the name of fuck is Gemini 3.1 Flash Lite priced at $1.50/M output tokens? lil bro it's not that good
Volla...
funny how it's the same 3 companies that come up with stuff like that every single time
I'm really, really looking forward to Deepseek V4! Let's just hope it releases soon, because the competition is evolving a lot right now...
We're happy to announce a long-term partnership with Motorola. We're collaborating on future devices meeting our privacy and security standards with official GrapheneOS support.
motorolanews.com/motorola-thr...
There's just no better option security-wise. But rest assured, they're working on their own phone together with a large OEM, likely coming in 2027.
Flash (Fast) avoids mistakes that a pure instruct model at the current SOTA simply cannot avoid. Even situations where far too much weight would land on the first token of the response, and where every other instruct model fails, are handled well by Flash.
Gemini 3 Flash (Fast mode) is literally just a reasoning model that pretends it's not, and any comparison with instruct models is inherently unfair. Even minimal reasoning is still reasoning, and you can clearly feel the difference in response quality, in my opinion.
Graphene Is All You Need.
Every VLM implementation except for Qwen's and Gemini's feels botched.
Update: support for the model has improved with newer versions of llama.cpp, and it hits >60 t/s decode speed now.
I still don't believe this thing will run well on a phone though.
Doubling the number of active parameters but cutting the number of experts in half feels arbitrary and has not shown an improvement in output quality so far. The model runs slower though.
We'll have to see what Magistral can make out of it.
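To put toy numbers on the active-parameter point above (all invented, not from any model card): halving the expert count at constant total expert parameters makes each expert twice as large, so the parameters actually run per token roughly double.

```python
# All numbers invented for illustration; nothing here comes from the actual model card.
def active_params(total_expert_params, n_experts, top_k, shared_params):
    """Parameters actually run per token: shared layers + the k routed experts."""
    per_expert = total_expert_params / n_experts
    routed = top_k * per_expert
    return shared_params + routed, routed

old_total, old_routed = active_params(48e9, n_experts=128, top_k=4, shared_params=2e9)
new_total, new_routed = active_params(48e9, n_experts=64, top_k=4, shared_params=2e9)

print(f"128 experts: {old_total / 1e9:.1f}B active ({old_routed / 1e9:.2f}B routed)")
print(f" 64 experts: {new_total / 1e9:.1f}B active ({new_routed / 1e9:.2f}B routed)")
# Routed parameters per token double (1.50B -> 3.00B): more FLOPs per step and a
# slower decode, without any increase in total capacity.
```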
I'm glad the RL slop has been reduced in the Ministral series of models. The models aren't absolutely SOTA, but at least they don't seem to spam \boxed for every single problem anymore.
Their Magistral pipeline could probably continue to scale extremely well in the future though.
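And in case anyone else is cleaning up such outputs: a minimal sketch for stripping \boxed{...} wrappers. The regex is my own naive assumption (no nested braces handled), not anything from the Ministral or Magistral tooling.

```python
import re

# Naive by design: assumes the boxed content contains no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def unbox(text: str) -> str:
    """Replace \\boxed{...} with its bare content."""
    return BOXED.sub(r"\1", text)

print(unbox(r"The answer is \boxed{42}."))   # -> The answer is 42.
```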