Re-distilling a distilled model (Qwen-DeepSeek R1 1.5B). Getting a few percentage points of improvement on benchmarks.
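For context, here is a minimal sketch of a standard distillation objective (a generic Hinton-style KD loss in NumPy). This is purely illustrative; the post does not specify the actual recipe, and all function names and the temperature value are my assumptions.

```python
import numpy as np

# Generic knowledge-distillation loss (illustrative sketch, not the post's
# recipe): soften teacher/student logits with a temperature T and minimize
# the KL divergence between the resulting distributions.
def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, T=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits / T)  # soft teacher targets
    q = softmax(student_logits / T)  # student predictions
    # T^2 factor keeps gradient magnitudes comparable across temperatures
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Identical logits give zero loss; diverging logits give a positive loss.
loss_same = distill_kl(np.array([[2.0, 0.0, -1.0]]), np.array([[2.0, 0.0, -1.0]]))
loss_diff = distill_kl(np.array([[2.0, 0.0, -1.0]]), np.array([[0.0, 2.0, -1.0]]))
```

Re-distilling just means the teacher here is itself a distilled model rather than the original large one.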
Posts by Appu Shaji
Super thrilled to release a new version of gemlite, delivering up to 7–8x faster prefill and 3–6x faster batch decoding speed 🚀🚀🚀🚀🚀 compared to PyTorch's tinygemm.
Interestingly, GluGlu activations are demonstrating significant gains on Winograd-like datasets, with performance curiously peaking as we approach the winter holiday period. 🍷 Inspired by this, we will release a new dataset called GluWine once extensive experimentation wraps up.
"Many years later, as he faced the firing squad, Colonel Aureliano Buendía was to remember that distant afternoon when his father took him to discover ice."
Are there other examples of such tense melding in literature?
P.S.: Anticipatory apologies for not using the Spanish version.
Agree. Long-form podcasts are often consumed while multitasking—at least for me (e.g., driving, cooking, working out, etc.)—unlike the focused attention typical of books or TV shows from yesteryear. What’s absorbed becomes a mix of the discussion and one’s mental interludes.
Considering @roydanroy.bsky.social's little one has finished the Harry Potter series, The Hobbit should definitely be accessible. I believe Tolkien originally wrote it for his own children. The Lord of the Rings, however, can be a bit trickier.
The Hobbit and The Lord of the Rings series. I read The Hobbit aloud with my daughter a few years back, and I'm not sure which of us enjoyed it more. Tolkien’s use of language is really beautiful.
The script of serious researchers fading away while nefarious actors take over as technology becomes practical is as old as time. Are we unintentionally contributing to this cycle by staying out of it?
100% agree on the continuum—it applies broadly across other aspects of AI (e.g., models enabling bio-weapons, surveillance, etc.). Why not redirect your efforts to safety research? I.e., the entry barrier for setting up a tracking system has become so low that tackling FP and safety concerns is both relevant & open.
I suggest you keep a version private/commercial. With your data moat, you might be able to raise a better valuation than these guys: techcrunch.com/2014/07/18/y...
arxiv.org/abs/1606.04474 was getting a lot of eyeballs a few years back, and it spun out a fair amount of meta-learning work (especially in the in-context and few-shot settings).
Is there a better estimator here?
With stochastic gradients, as the mini-batch size grows (assuming i.i.d. samples), the Central Limit Theorem kicks in, making the gradient estimate more robust. (Ergo, if you have scale, this is a sensible thing to do.)
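A quick numerical illustration of the point (my own toy sketch, not from the post): averaging more i.i.d. per-sample gradients shrinks the estimator's standard error like 1/sqrt(batch size).

```python
import numpy as np

# Toy model (illustrative only): each per-sample "gradient" is the true
# gradient 1.0 plus unit-variance Gaussian noise.  The mini-batch gradient
# is their average, so its std should fall like 1/sqrt(batch_size), and by
# the CLT its distribution approaches a Gaussian.
rng = np.random.default_rng(0)

def grad_std(batch_size: int, trials: int = 2000) -> float:
    """Empirical std of the mini-batch gradient estimate across trials."""
    samples = rng.normal(loc=1.0, scale=1.0, size=(trials, batch_size))
    return samples.mean(axis=1).std()

s32, s512 = grad_std(32), grad_std(512)
# Going from batch 32 to 512 shrinks the noise by roughly sqrt(512/32) = 4x.
```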
Still remember the time when people were freaking out as #papers breached 1000 (sometime in the 2000s).
It was not uncommon for the submission servers to go down, and for everyone to get more time to iterate and submit past the deadline.
Treat it as noise. These are low entropy/utility interactions and not worth the headspace.
Anti-censorship is something I agree with in principle, like Elon (though I don’t think he practices what he preaches). As adults, we should be able to block and disengage ourselves; but platform maintainers jumping in to censor feels like going from the frying pan into the fire (or vice versa 🤔).
This is incredibly unfortunate and sets a very bad precedent. What’s even more appalling is the complete lack of explanation for the ban.
Let’s not make this us (ML folks) vs them (anti-AI crowd). Many are anxious about being replaced by AI, and their frustration is often misdirected in loud, mob-like ways at the wrong targets. While we shouldn’t tolerate toxicity (thanks for the list, btw), siloing ourselves can be equally harmful.
This! In general, the goal of any review system should be to verify, reproduce, and push the boundaries of our collective scientific knowledge. Compared to openly reproducible code and evaluations, the merits of rushed and often opinionated reviews are frequently inferior. (Note: very ML-specific.)
False negatives are also a major issue. My most cited paper, which I coauthored and is widely used by practitioners, was rejected four times before we turned it into a technical report and finally published it in PAMI (scholar.google.de/scholar?q=sl...). We had almost given up.
Really happy to contribute to the batched version of faster-whisper that is 4x faster and more accurate 🚀🚀🚀
github.com/SYSTRAN/fast...
Imho, peer review is a system designed to verify, reproduce, and push the boundaries of our collective scientific knowledge. My two cents: seeking out a better system that aligns with these goals is the need of the hour.
I agree that this is, unfortunately, the era of anti-intellectualism. That said, if the glaring inefficiencies in current peer-review systems are not addressed, it will only add fuel to the fire. It is far better to hash things out collectively and proactively.
The paths my colleagues took were through intermediate adjunct roles. Given the current climate & the pay scale differences in ML, this rarely happens.
That said, such avenues should be normalized, as they ultimately represent valid contributions to scientific knowledge and push the field forward.
🫡
For forward-looking ideas, there should be an equivalent platform. Limiting papers per reviewer is low-hanging fruit—I’ve seen reviewers rush through 10+ papers a few hrs before deadlines. Further, with arXiv preprints, double-blind review feels like a smokescreen; it might work better as open peer or community feedback.
100% agree this needs reworking. In commercial research labs (like ours), where publications aren’t a currency as in academia, we focus on blogs and open-source projects to disseminate ideas. For evals & reproducible research, this approach is more effective at pushing knowledge boundaries imho.
Really appreciate the work and opening up the recipe. Open source FTW!
While I partially agree, ultimately, from a product usage perspective, final speed and memory usage matter a lot. However, from a research standpoint, it likely needs categorization—for example, similar to how we classify models by parameter count, architecture type, and so on.
Working on multimodality 🙋‍♂️. Been in the field for the past 20 years or so.
Hello, everyone! I love the AI community on X, though not so much the constant squabbling and bickering. I'm here with a faint hope to find more of the former and less of the latter.