We couldn't have done this without amazing authors: Shai Satran, Will Kidder, Jason D'Cruz, @krvarshney.bsky.social, Sean Laurent, Sooyun Iris Chung, Ariel Goldstein, @gabistanovsky.bsky.social, Austin Beattie, @andyhigh.bsky.social, @mohammadatari.bsky.social, @firatseker.bsky.social, Aliah Zewail. 5/n
Posts by Kush Varshney कुश वार्ष्णेय
The latest Stanford University Foundation Model Transparency Index was released today, and IBM took the top spot!
In a year when other major AI players retreated from transparency, we doubled down and received the highest score in the Index’s history:
research.ibm.com/blog/ibm-gra...
"When language no longer requires belief, AI’s fluency becomes a kind of anesthesia. And we are the ones it sedates. I’m reminded of T. S. Eliot’s ghostly image of a “patient etherized upon a table,” alive yet emptied of agency." www.psychologytoday.com/us/blog/the-...
Bar chart titled "Retrieval Augmented Generation (RAG)", showing MTRAG mean accuracy (0–80 scale) by model, Granite models in blue and others in green:
- Granite-4.0-H-Small: 73 (highest)
- Granite-4.0-Micro: 72
- GPT-OSS-20B: 68
- Llama-3.3-70B-Instruct: 61
- Qwen3-8B: 55
- Llama-3.2-Instruct: 53
- Mistral-Small-3.2-Instruct: 48 (lowest)

Key takeaway: the Granite-4.0 models (H-Small and Micro) outperform all others at ~73 accuracy, with GPT-OSS-20B third at 68. The weakest performance is from Mistral-Small-3.2-Instruct at 48.
Granite-4.0-H-Small: a 32B-A9B MoE Mamba for high efficiency
Damn! IBM is on the map. The American Qwen? I barely even knew IBM made LLMs, this is solid
www.ibm.com/new/announce...
Recently got to have a super interesting conversation with the infinitely fascinating @krvarshney.bsky.social about why we need to make AI safe, and the very nature of ethics in a disaggregated digital world. Have a watch!
www.youtube.com/watch?v=g2A7...
Check out IBM's latest open source tools for trustworthy AI on GitHub:
In-Context Explainability 360
FactReasoner
Contextual Privacy
Links from here: research.ibm.com/blog/debuggi...
"In my own interactions with ChatGPT, it has often responded, with patently insincere flattery: “That’s a great question.” It has never responded: “That’s the wrong question.” It has never challenged my moral convictions or asked me to justify myself."
www.nytimes.com/2025/08/02/o...
"Until we recognise that the debate about AI is not just about what machines can do but also about how humans should value education and knowledge, it will remain mired in confusion." observer.co.uk/news/opinion...
"The true measure of progress in AI lies not in the sophistication of algorithms but in whether it genuinely serves the people and communities it seeks to empower. Without grounding in human dignity and local contexts, AI risks creating technological subjugation."
www.brookings.edu/articles/ai-...
What do authorship, copyright, and creativity mean in the age of AI? @krvarshney.bsky.social talks to us about it:
research.ibm.com/blog/kush-va...
"Training yourself to observe and challenge these automatic thoughts—what psychologists call metacognition—is strikingly similar to the Buddhist concept of yoniso manasikāra, or wise attention." www.forbes.com/councils/for...
"The next decade will be shaped by innovators using AI to solve real problems in real communities. The future won’t be written in Silicon Valley, but in Lagos, Jakarta, Cairo and Dubai. AI-powered solutions fused with local knowledge will unlock this future." www.weforum.org/stories/2025...
Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
https://arxiv.org/abs/2506.20430
📣 Today we open-sourced EvalAssist, a web-based tool that makes it super easy to develop criteria for LLM judges. You can run it locally now and then scale up with notebooks using Unitxt. Check out the AI Alliance article to get the scoop:
thealliance.ai/blog/llm-as-...
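To make the LLM-as-judge idea above concrete, here is a minimal sketch of evaluating a response against a named criterion. The `Criterion` structure and `judge` function are illustrative stand-ins, not the actual EvalAssist or Unitxt API; a real judge call to a model replaces the toy heuristic.

```python
# Hypothetical sketch of the LLM-as-judge pattern: a criterion maps
# option labels to scores, and a judge picks one option per response.
# Names below are assumptions for illustration, not the EvalAssist API.
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    description: str
    options: dict  # option label -> numeric score


def judge(response: str, criterion: Criterion, llm=None) -> str:
    """Return the option label a judge selects for this response.
    `llm` would be a real model call; a keyword heuristic stands in here."""
    if llm is not None:
        return llm(criterion.description, response)
    # Toy deterministic stand-in so the sketch runs without a model
    return "concise" if len(response.split()) <= 20 else "verbose"


conciseness = Criterion(
    name="conciseness",
    description="Is the response direct and to the point?",
    options={"concise": 1.0, "verbose": 0.0},
)

label = judge("The capital of France is Paris.", conciseness)
score = conciseness.options[label]
```

Keeping criteria as data (rather than hard-coding them into prompts) is what lets the same judging loop scale from a local UI to batch evaluation in notebooks.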
🚨 Announcing our #keynote speakers for the 3rd Trustworthy AI #Workshop @deeplearningindaba.bsky.social ! We are excited to welcome thought leaders pushing the boundaries of #ResponsibleAI
@krvarshney.bsky.social is a Fellow at IBM Research
Djallel Bouneffouf, Matthew Riemer, Kush Varshney: The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships? https://arxiv.org/abs/2506.01813 https://arxiv.org/pdf/2506.01813 https://arxiv.org/html/2506.01813
Announcing our keynote speakers for #FAccT2025! 🎉
Suresh Venkatasubramanian (Brown)
Nathalie Smuha (KU Leuven)
Kristian Lum (Google DeepMind)
Molly Crockett (Princeton)
And the plenary panel will be on “Pathways of Change and the Future of Responsible AI"
Frying gulab jamuns helps you understand the phenomenon of tidal locking between moons and planets.
🔗 Want to connect your agents together wherever they are🌎?
See what's possible with ACP! This video will show:
🎁 How to wrap an agent with the SDK
🔈 Calling out with a standardized client
⛓️Chaining ACP calls to different agents
📲 Prototype of ACPCallingAgent
👉 www.youtube.com/watch?v=Nzaq...
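The chaining step above can be sketched in a few lines. `AgentClient` below is a hypothetical stand-in for a standardized client, not the real ACP SDK; a real implementation would call each agent's endpoint over the network rather than a local callable.

```python
# Hypothetical sketch of chaining calls to different agents behind one
# standardized client interface, in the spirit of the ACP demo above.
class AgentClient:
    """Illustrative stand-in for an ACP-style client (not the real SDK)."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler  # callable standing in for a remote agent

    def run(self, message: str) -> str:
        # A real client would send the message to the agent's endpoint here
        return self.handler(message)


# Two toy "agents" wrapped behind the same client interface
summarizer = AgentClient("summarizer", lambda m: m.split(".")[0] + ".")
shouter = AgentClient("shouter", lambda m: m.upper())

# Chaining: the output of one agent becomes the input of the next
result = shouter.run(summarizer.run("ACP connects agents. It is open."))
```

The point of a standard protocol is exactly this: once every agent speaks the same client interface, composing them is ordinary function chaining regardless of where each agent runs.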
Happy to see @bhoov.bsky.social recognized in this article about spin glasses and associative memory.
www.quantamagazine.org/the-strange-...
🤖 ✏️ There is a better way to explain how you used AI in your {research paper, college essay, blog posts, …}. Check out our new AI Attribution Toolkit and look for us at #CHI2025!
aiattribution.github.io
dl.acm.org/doi/full/10....
I appreciated the framing in terms of governors (research.ibm.com/blog/AI-gove...) and the discussion of many strategies for pursuing safety (doi.org/10.1089/big....). Now that we're moving to agentic AI, I think systems theories will be even more important for control (arxiv.org/abs/2503.00237).
“If we think about how human beings are in the world, we do see bad things, so it’s not about allowing the language model to see only the good things. It’s about understanding the full spectrum — both good and bad,” says Ko, “and choosing to uphold our values when we speak.”
news.mit.edu/2025/trainin...
LLMs need not engage in a coloniality of knowledge by treating one culture's ethics or moral philosophy as universally correct. Instead, open LLMs should be aligned to value systems from different epistemologies and not assume universal values. 🌍🤖 #ai #hcai #alignment