Salesforce AI Research (@sfresearch) Bsky

Beyond 100K Tokens: Evaluating AI Agents in Long-Context Software Engineering
As codebases grow to millions of lines of code, can AI agents still understand, reason, and code effectively? LoCoBench-Agent delivers the answer: a comprehensive benchmark for evaluating AI coding as...

Can AI coding assistants maintain effectiveness as codebases scale 100×? LoCoBench-Agent evaluates agents across 10K to 1M token contexts, spanning 8,000 scenarios in 10 programming languages and four difficulty tiers.

https://sforce.co/4txCDj4

👥: Jielin Qiu & Huan Wang

#EnterpriseAI #AgenticAI

9 hours ago 0 0 0 0

The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation: bit.ly/48iccVY

On-policy distillation boosts accuracy but causes severe overconfidence. CaOPD uses a student-grounded empirical target for Pareto-optimal calibration.

Code: bit.ly/4cUtCKO

1 day ago 0 0 0 0

TDX Live Blog: All the Highlights You Missed
On-the-ground reporting from the must-attend developer event for the Agentic Enterprise

At #TDX26, Itai Asseo @iiitaiii.bsky.social on Enterprise General Intelligence: refining generic LLMs into reliable enterprise agents through AI Foundry and Agentforce Labs, including eVerse, the Learning Engine, and Agent Startup. https://sforce.co/3OrRYCb #FutureOfAI #EnterpriseAI

1 day ago 0 0 0 0

(7/7) Lost in Translation: Do LVLM Judges Generalize Across Languages?

Authors: Md Tahmid Rahman Laskar, Mohammed Saidul Islam, Mir Tafseer Nayeem, Amran Bhuiyan, Mizanur Rahman, Shafiq Joty, Enamul Hoque, Jimmy Huang

Accepted to #ACL2026

2 days ago 0 0 0 0

(6/7) Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction: bit.ly/4mQq0gj

Authors: Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty

Accepted to #ACL2026

2 days ago 0 0 1 0

(5/7) J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization: bit.ly/48egJZp

Authors: Austin Xu, Yilun Zhou, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty

Accepted to #ACL2026

2 days ago 0 0 1 0

(4/7) Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math: bit.ly/4ofghQs

Authors: Shrey Pandit, Austin Xu, Xuan-Phi Nguyen, Yifei Ming, Caiming Xiong, Shafiq Joty

Accepted to #ACL2026

2 days ago 0 0 1 0

(3/7) From Passive Metric to Active Signal: A Survey on the New Paradigm of Uncertainty in Large Language Models: bit.ly/3O4NIrI

Authors: Jiaxin Zhang, Wendi Cui, Zhuohang Li, Lifu Huang, Brad Malin, Caiming Xiong, Chien-Sheng Wu

Accepted to #ACL2026

2 days ago 0 0 1 0

(2/7) GTA: Generating Long-horizon Tasks for Web Agents at Scale

Authors: Tenghao Huang, Kung-Hsiang Huang, Prafulla Kumar Choubey, Yilun Zhou, Chien-Sheng Wu

Accepted to #ACL2026

2 days ago 0 0 1 0

(1/7) We have 6 papers accepted to ACL 2026, advancing work across web agent evaluation, LLM reasoning verification, uncertainty quantification, long-context efficiency, and multilingual judge systems.

ACL 2026 takes place July 2-7 in San Diego, California.

#ACL2026 #FutureOfAI #EnterpriseAI

2 days ago 0 0 1 0

That's a wrap on #TDX26! The @salesforce.com AI Research team was on the ground in San Francisco, connecting with the community and sharing the latest from our labs. From research demos to conversations about what's next in #EnterpriseAI, it was great to be part of the energy!

5 days ago 0 0 0 0

Salesforce AI: Reliability Trumps Raw Model Capability
As AI matures, enterprise success hinges on integrated systems that deliver consistent performance across the most complex professional business workflows

Reliability over raw power. 🔬 @aimagazine.bsky.social features Silvio Savarese on why enterprise AI value comes from integrated systems—not bigger models—and how AI Foundry is building that foundation. https://bit.ly/41z0G4E

#SystemLevelAI #AIFoundry

6 days ago 0 0 0 0

Salesforce launches AI Foundry as it doubles down on three "big bets" to accelerate enterprise AI
Details on Salesforce AI Foundry, launched today to help AI researchers, customers and partners to collaborate in a safe environment

AI Foundry: Turning foundational research into enterprise AI products faster. Silvio Savarese and Itai Asseo in TechFinitive: bit.ly/4ccYmFy

#FutureOfAI #EnterpriseAI #AgenticAI

1 week ago 1 0 0 0

Salesforce AI Foundry: System Reliability Beats Model Power
The era of the model wars is over, with enterprise AI success occurring at the system level, demanding reliability and full integration over model power

The model wars are over. Enterprise AI success now lives at the system level. Silvio Savarese and Itai Asseo in @technologymag.bsky.social: bit.ly/4ttSBu4

#FutureOfAI #EnterpriseAI #AgenticAI

1 week ago 1 0 0 0

Salesforce Brings Ambient Intelligence to Sales Calls
Salesforce showcases ambient intelligence for real-time sales workflows and highlights Agentforce upgrades focused on scale, governance, and enterprise outcomes.

Ambient intelligence is moving from research into live sales and service workflows. Silvio Savarese discusses with @cxtoday.com: bit.ly/47DUhJ1

#FutureOfAI #EnterpriseAI #AgenticAI

1 week ago 1 0 0 0

Salesforce AI Research identifies trends shaping agentic AI
Simulation environments, agent-to-agent ecosystems, and ambient intelligence will be at the heart of the Salesforce product roadmap through AI Foundry initiative.

Three agentic AI trends shaping the enterprise through 2027. Silvio Savarese and Itai Asseo discuss AI Foundry with @cio.com: bit.ly/4sg6iMf

#FutureOfAI #EnterpriseAI #AgenticAI

1 week ago 2 0 0 0

One demo. Reliable replay. No cloud calls. GPA turns a single recorded workflow into deterministic desktop automation, entirely on-device.

🔎 Explore GPA: bit.ly/48r7Onp

📖 Read the blog: sforce.co/4sYdhu8

#EnterpriseAI #GUIAutomation

1 week ago 2 1 0 0

The big bets are on as Salesforce pitches the need for enterprise transition from model to system level AI
Itai Asseo, VP of Salesforce AI Research, explains some new enterprise realities on the way.

Why #EnterpriseAI demands a shift from models to systems. Itai Asseo discusses AI Foundry with @diginomica.com: bit.ly/4scZwGT

#FutureOfAI #AgenticAI

2 weeks ago 2 1 0 0

From One Demo to Reliable Automation: How GPA Reimagines GUI Process Automation
Tired of GUI automation that breaks after one demo? Discover how GPA reimagines process automation to deliver stable, scalable, and truly reliable results for enterprise workflows.

From One Demo to Reliable Automation: How GPA Reimagines GUI Process Automation https://sforce.co/4sYdhu8

Show it a workflow once. GPA replays it reliably, locally, and without brittle scripts to maintain.

#FutureOfAI #EnterpriseAI #AgenticAI #GUIAutomation

2 weeks ago 2 0 0 0

(5/5) ✍️ Authors: Jielin Qiu, Zixiang Chen, Liangwei Yang, @mingzhu0527.bsky.social, Zhiwei Liu, Juntao Tan, @wenting088.bsky.social, Rithesh Murthy, Roshan Ram, @aksh555.bsky.social, @shelbyhai.bsky.social, @caimingxiong.bsky.social, Silvio Savarese, Huan Wang

#FutureOfAI #EnterpriseAI

2 weeks ago 2 0 0 0

(4/5) 🏎️ Using Deepgram, vLLM, and ElevenLabs, the team hit a P50 time-to-first-audio of 947ms (best case 729ms) — ~17× faster than native speech-to-speech. Full 9-chapter tutorial with working code 📖

2 weeks ago 1 0 1 0

(3/5) ⚡ The key insight: "realtime" isn't one fast model. It's streaming + pipelining across components. A cascaded STT → LLM → TTS pipeline where each stage streams output to the next achieves sub-1-second response.
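The streaming-plus-pipelining idea can be sketched with chained Python generators. This is only a toy illustration, not the tutorial's code: the `stt`, `llm`, and `tts` functions are hypothetical stand-ins that transform strings instead of calling real speech or language services, and "latency" is counted in stage invocations rather than milliseconds. It shows why the first audio arrives sooner when each stage hands its output to the next immediately instead of waiting for the previous stage to finish.

```python
from typing import Iterable, Iterator, List, Tuple


def stt(audio_chunks: Iterable[str], log: List[str]) -> Iterator[str]:
    """Hypothetical speech-to-text stage: one transcript piece per audio chunk."""
    for chunk in audio_chunks:
        log.append(f"stt:{chunk}")
        yield chunk.upper()


def llm(text_pieces: Iterable[str], log: List[str]) -> Iterator[str]:
    """Hypothetical LLM stage: one reply token per transcript piece."""
    for piece in text_pieces:
        log.append(f"llm:{piece}")
        yield f"reply({piece})"


def tts(reply_tokens: Iterable[str], log: List[str]) -> Iterator[str]:
    """Hypothetical text-to-speech stage: one audio frame per reply token."""
    for token in reply_tokens:
        log.append(f"tts:{token}")
        yield f"audio[{token}]"


def run_streaming(chunks: List[str]) -> Tuple[str, int]:
    """Each stage streams into the next; return first audio and work done before it."""
    log: List[str] = []
    pipeline = tts(llm(stt(chunks, log), log), log)
    first_audio = next(pipeline)
    return first_audio, len(log)


def run_batch(chunks: List[str]) -> Tuple[str, int]:
    """Each stage finishes completely before the next one starts."""
    log: List[str] = []
    transcripts = list(stt(chunks, log))
    replies = list(llm(transcripts, log))
    first_audio = next(tts(replies, log))
    return first_audio, len(log)


if __name__ == "__main__":
    chunks = ["a", "b", "c"]
    print(run_streaming(chunks))  # first audio after 3 stage invocations
    print(run_batch(chunks))      # first audio after 7 stage invocations
```

Both runs produce the same first audio frame. The streaming version reaches it after one invocation per stage, while the batch version must process every chunk in the first two stages before any audio is produced.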

2 weeks ago 1 0 1 0

(2/5) 🐌 Native speech-to-speech models like Qwen2.5-Omni produce quality audio but are too slow for realtime (~13s time-to-first-audio) and don't support function calling — a must for enterprise agents.

2 weeks ago 1 0 1 0

(1/5) 🎙️ Building Enterprise Realtime Voice Agents from Scratch: A Technical Tutorial

Paper: bit.ly/4seq7Ee

25+ open-source speech-to-speech models exist, but none shows how to build a complete streaming voice agent with function calling.

2 weeks ago 2 0 1 0

5/5 ✍️ Jielin Qiu, Liangwei Yang, @mingzhu0527.bsky.social, @wenting088.bsky.social, Zhiwei Liu, Juntao Tan, Zixiang Chen, Roshan Ram, @aksh555.bsky.social, Rithesh Murthy, @shelbyhai.bsky.social, @caimingxiong.bsky.social, Silvio Savarese, Huan Wang

#FutureOfAI #EnterpriseAI

2 weeks ago 2 0 0 0

4/5 📊 Tested on 50 insurance products across 10 categories with 2,490 FAQs, 290 coverage details, and 162 pricing tiers. Domain-agnostic and adaptable to any enterprise sales environment.

2 weeks ago 2 0 1 0

3/5 ⚡ 2.8-second mean response time with 100% question detection, a 14× speedup over manual search. Cross-product comparisons see the biggest gains at 23×.

2 weeks ago 1 0 1 0

2/5 🔍 The system streams live audio through speech-to-text, detects customer questions via LLM, then retrieves answers using hybrid FAQ matching and text-to-SQL over a structured product database.
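A rough sketch of the retrieval half of that flow follows, under heavy assumptions. Nothing here comes from the paper's code: the FAQ entry, the `pricing` table, the product name, and the prices are made-up illustration data; question detection is a trailing-question-mark check standing in for the LLM; and a fixed parameterized query stands in for real text-to-SQL generation. It only shows the routing idea: try an FAQ match first, then fall back to a structured database query.

```python
import sqlite3
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical FAQ store (illustration data only).
FAQS = {
    "what is the waiting period for dental coverage":
        "Illustrative answer: six months.",
}


def build_db() -> sqlite3.Connection:
    """Create an in-memory product database with made-up pricing rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pricing (product TEXT, tier TEXT, monthly_usd REAL)")
    conn.executemany(
        "INSERT INTO pricing VALUES (?, ?, ?)",
        [("DentalPlus", "basic", 19.0), ("DentalPlus", "premium", 39.0)],
    )
    return conn


def is_question(utterance: str) -> bool:
    """Stand-in for LLM question detection: checks for a trailing '?'."""
    return utterance.strip().endswith("?")


def match_faq(question: str, threshold: float = 0.8) -> Optional[str]:
    """Return the best FAQ answer if its string similarity clears the threshold."""
    q = question.lower().strip(" ?")
    best_answer, best_score = None, 0.0
    for faq_question, faq_answer in FAQS.items():
        score = SequenceMatcher(None, q, faq_question).ratio()
        if score > best_score:
            best_answer, best_score = faq_answer, score
    return best_answer if best_score >= threshold else None


def answer(conn: sqlite3.Connection, utterance: str) -> Optional[Tuple[str, object]]:
    """Route a transcript line: ignore non-questions, try FAQ, then the database."""
    if not is_question(utterance):
        return None
    faq_answer = match_faq(utterance)
    if faq_answer is not None:
        return ("faq", faq_answer)
    products = conn.execute("SELECT DISTINCT product FROM pricing").fetchall()
    for (product,) in products:
        if product.lower() in utterance.lower():
            # Fixed query template standing in for generated SQL.
            rows = conn.execute(
                "SELECT tier, monthly_usd FROM pricing "
                "WHERE product = ? ORDER BY monthly_usd",
                (product,),
            ).fetchall()
            return ("sql", rows)
    return ("none", None)


if __name__ == "__main__":
    db = build_db()
    print(answer(db, "What is the waiting period for dental coverage?"))
    print(answer(db, "How much does DentalPlus cost per month?"))
    print(answer(db, "Sounds good."))
```

A production system would replace the similarity check and the query template with the learned components the post describes; the sketch only makes the two-path routing concrete.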

2 weeks ago 1 0 1 0

Enterprise Sales Copilot: Enabling Real-Time AI Support with Automatic Information Retrieval in Live Sales Calls
During live sales calls, customers frequently ask detailed product questions that require representatives to manually search internal databases and CRM systems. This process typically takes 25-65 seco...

1/5 Enterprise Sales Copilot: Enabling Real-Time AI Support with Automatic Information Retrieval in Live Sales Calls bit.ly/4tamShg

🎙️ Reps lose 25–65 sec per query searching CRM systems mid-call. SalesCopilot fixes that.

2 weeks ago 3 0 1 0

GPA: GUI Process Automation | Salesforce AI Research
GUI Process Automation (GPA) is a demo-based RPA framework for automating desktop GUI tasks on macOS — LLM-free, vision-based, fully local.

GPA exposes workflows as MCP/CLI tools — your AI agent handles reasoning while GPA handles deterministic GUI execution. Works with Claude Code, Claude Desktop, Cursor, and more.

Learn more at www.salesforceairesearch.com/gpa

2 weeks ago 2 0 0 0
