Re-distilling a distilled model (Qwen-DeepSeek R1 1.5B). Getting a few percentage points of improvement on benchmarks.
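For context, here is a minimal sketch of a standard distillation objective (a generic Hinton-style KD loss in NumPy). This is purely illustrative; the post does not specify the actual recipe, and all function names and the temperature value are my assumptions.

```python
import numpy as np

# Generic knowledge-distillation loss (illustrative sketch, not the post's
# recipe): soften teacher/student logits with a temperature T and minimize
# the KL divergence between the resulting distributions.
def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, T=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits / T)  # soft teacher targets
    q = softmax(student_logits / T)  # student predictions
    # T^2 factor keeps gradient magnitudes comparable across temperatures
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Identical logits give zero loss; diverging logits give a positive loss.
loss_same = distill_kl(np.array([[2.0, 0.0, -1.0]]), np.array([[2.0, 0.0, -1.0]]))
loss_diff = distill_kl(np.array([[2.0, 0.0, -1.0]]), np.array([[0.0, 2.0, -1.0]]))
```

Re-distilling just means the teacher here is itself a distilled model rather than the original large one.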
Posts by Appu Shaji
Super thrilled to release a new version of gemlite, delivering up to 7–8x faster prefill and 3–6x faster batch decoding speed 🚀🚀🚀🚀🚀 compared to PyTorch's tinygemm.
Interestingly, GluGlu activations are demonstrating significant gains on Winograd-like datasets, with performance curiously peaking as we approach the winter holiday period. 🍷 Inspired by this, we will release a new dataset called GluWine once extensive experimentation wraps up.
"Many years later, as he faced the firing squad, Colonel Aureliano Buendía was to remember that distant afternoon when his father took him to discover ice."
Are there other examples of such tense melding in literature?
P.S.: Anticipatory apologies for not using the Spanish version.
Agree. Long-form podcasts are often consumed while multitasking—at least for me (e.g., driving, cooking, working out, etc.)—unlike the focused attention typical of books or TV shows from yesteryear. What’s absorbed becomes a mix of the discussion and one’s mental interludes.
Considering @roydanroy.bsky.social's little one has finished the Harry Potter series, The Hobbit should definitely be accessible. I believe Tolkien originally wrote it for his own children. The Lord of the Rings, however, can be a bit trickier.
The Hobbit and The Lord of the Rings series. I read The Hobbit aloud with my daughter a few years back, and I'm not sure which of us enjoyed it more. Tolkien’s use of language is really beautiful.
The script of serious researchers fading away while nefarious actors take over as technology becomes practical is as old as time. Are we unintentionally contributing to this cycle by staying out of it?
100% agree on the continuum—it applies broadly across other aspects of AI (e.g., models enabling bio-weapons, surveillance, etc.). Why not redirect your efforts to safety research? I.e., the entry barrier for setting up a tracking system has become so low that tackling FP and safety concerns is both relevant & open.
I suggest you keep a version private/commercial. With your data moat, you might be able to raise a better valuation than these guys: techcrunch.com/2014/07/18/y...
arxiv.org/abs/1606.04474 was getting a lot of eyeballs a few years back, and it spun out a fair amount of meta-learning work (especially in the in-context and few-shot settings).
Is there a better estimator here?
With stochastic gradients, as the mini-batch size grows (assuming i.i.d. samples), the Central Limit Theorem kicks in, making the gradient estimate more robust. (Ergo, if you have scale, this is a sensible thing to do.)
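A quick numerical illustration of the point (my own toy sketch, not from the post): averaging more i.i.d. per-sample gradients shrinks the estimator's standard error like 1/sqrt(batch size).

```python
import numpy as np

# Toy model (illustrative only): each per-sample "gradient" is the true
# gradient 1.0 plus unit-variance Gaussian noise.  The mini-batch gradient
# is their average, so its std should fall like 1/sqrt(batch_size), and by
# the CLT its distribution approaches a Gaussian.
rng = np.random.default_rng(0)

def grad_std(batch_size: int, trials: int = 2000) -> float:
    """Empirical std of the mini-batch gradient estimate across trials."""
    samples = rng.normal(loc=1.0, scale=1.0, size=(trials, batch_size))
    return samples.mean(axis=1).std()

s32, s512 = grad_std(32), grad_std(512)
# Going from batch 32 to 512 shrinks the noise by roughly sqrt(512/32) = 4x.
```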
Still remember the time when people were freaking out as #papers breached 1000 (sometime in the 2000s).
It was not uncommon for the submission servers to go down, and for everyone to get more time to iterate and submit past the deadline.
Treat it as noise. These are low entropy/utility interactions and not worth the headspace.
Anti-censorship is something I agree with in principle, like Elon (though I don’t think he practices what he preaches). As adults, we should be able to block and disengage ourselves; but platform maintainers jumping in to censor feels like going from the frying pan into the fire (or vice versa 🤔).
This is incredibly unfortunate and sets a very bad precedent. What’s even more appalling is the complete lack of explanation for the ban.
Let’s not make this us (ML folks) vs them (anti-AI crowd). Many are anxious about being replaced by AI, and their frustration is often misdirected in loud, mob-like ways at the wrong targets. While we shouldn’t tolerate toxicity (thanks for the list, btw), siloing ourselves can be equally harmful.
This! In general, the goal of any review system should be to verify, reproduce, and push the boundaries of our collective scientific knowledge. Compared to openly reproducible code and evaluations, the merits of rushed and often opinionated reviews are frequently inferior. (Note: very ML-specific.)
False negatives are also a major issue. My most cited paper, which I coauthored and is widely used by practitioners, was rejected four times before we turned it into a technical report and finally published it in PAMI (scholar.google.de/scholar?q=sl...). We had almost given up.
Really happy to contribute to the batched version of faster-whisper that is 4x faster and more accurate 🚀🚀🚀
github.com/SYSTRAN/fast...
Imho, peer review is a system designed to verify, reproduce, and push the boundaries of our collective scientific knowledge. My two cents: seeking out a better system that aligns with these goals is the need of the hour.
I agree that this is, unfortunately, the era of anti-intellectualism. That said, if the glaring inefficiencies in current peer-review systems are not addressed, it will only add fuel to the fire. It is far better to hash things out collectively and proactively.
The paths my colleagues took were through intermediate adjunct roles. Given the current climate & the pay scale differences in ML, this rarely happens.
That said, such avenues should be normalized, as they ultimately represent valid contributions to scientific knowledge and push the field forward.
🫡
For forward-looking ideas, there should be an equivalent platform. Limiting papers per reviewer is low-hanging fruit—I’ve seen reviewers rush through 10+ papers a few hrs before deadlines. Further, with arXiv preprints, double-blind review feels like a smokescreen; it might work better as open peer or community feedback.
100% agree this needs reworking. In commercial research labs (like ours), where publications aren’t a currency as in academia, we focus on blogs and open-source projects to disseminate ideas. For evals & reproducible research, this approach is more effective at pushing knowledge boundaries imho.
Really appreciate the work and opening up the recipe. Open source FTW!
While I partially agree, ultimately, from a product usage perspective, final speed and memory usage matter a lot. However, from a research standpoint, it likely needs categorization—for example, similar to how we classify models by parameter count, architecture type, and so on.
Working on multimodality 🙋‍♂️. Been in the field for the past 20 years or so.
Hello, everyone! I love the AI community on X, though not so much the constant squabbling and bickering. I'm here with a faint hope to find more of the former and less of the latter.