Agentic Engineering Patterns - Simon Willison's Weblog

3/3

What stuck with me most: follow your curiosity. Whatever you build becomes the foundation for the next thing.
simonwillison.net/guides/agent...

#AgenticEngineering #AI #SoftwareEngineering #TDD #CodingWithAI #LLMAgents #BuildInPublic #DevProductivity #AITools #CuriousBuilder

2 0 0 0

🛡️ Designing AI agents to resist prompt injection

How ChatGPT defends itself against social-engineering and prompt-injection attacks.

openai.com/index/designing-agents-t...

#AISecurity #PromptInjection #LLMAgents #RoxsRoss

1 0 0 0

Datalayer is writing a paper on cost-effective LLM agents (70–92% token reduction via codemode).

Seeking co-authors with real-world agent data. Read & apply → datalayer.ai/research/con...

#LLMAgents #AIEngineering #OpenScience

1 1 0 0
📰 New Defense Thwarts Attacks on AI Language Agents

ICON, a new defense mechanism, effectively neutralizes Indirect Prompt Injection (IPI) attacks on Large Language Model (LLM) agents. Using a probing-to-mitigation framework, ICON achieves a competitive 0.4% Attack Success Rate (ASR) and a 50% task utility gain.

www.clawnews.ai/new-defense-thwarts-atta...

#AISecurity #LLMAgents #PromptInjection

0 0 0 0

Vercel's agent evals show `AGENTS.md` (compressed docs) beats sophisticated 'skills'. 🤔 This sparks debate on LLM context management: why did simple win over complex? It challenges assumptions about how LLMs best use provided info. #LLMAgents 1/6
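One way to see why the simple approach can win is sheer prompt footprint. A minimal sketch (illustrative only, not Vercel's eval harness; file contents and the 4-chars-per-token heuristic are assumptions) comparing one compressed `AGENTS.md` against a set of per-skill instruction documents:

```python
# Illustrative comparison: one compressed AGENTS.md vs. per-skill docs.
# All file contents below are made up for the sketch.
AGENTS_MD = "Build: pnpm build. Test: pnpm test. Style: 2-space indent."

SKILLS = {
    "build": "To build the project, first install dependencies with pnpm i, then run pnpm build...",
    "test": "To run the test suite, invoke pnpm test; watch mode uses pnpm test --watch...",
    "style": "Formatting uses 2-space indentation, enforced by prettier on every commit...",
}

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return len(text) // 4

compressed = approx_tokens(AGENTS_MD)
skills_total = sum(approx_tokens(doc) for doc in SKILLS.values())
print(f"AGENTS.md ≈ {compressed} tokens, skills ≈ {skills_total} tokens")
```

The compressed file costs a fraction of the context budget on every turn, which may matter more to the model than the extra structure skills provide.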

0 0 1 0

Building an agent is like hiring and training a new investigator. You provide them with the right expertise (the Model), give them access to the filing cabinet and phone (the Tools), and hand them a standard operating procedure (the Instructions).
#GenerativeAI #LLMAgents #OpenAI #AIAutomation

1 0 0 0

RLMs vs. RAG: A key difference is "agency." While RAG relies on external retrieval, RLMs empower the LLM itself to autonomously retrieve and decide what information to use, increasing its role in decision-making processes. #LLMagents 3/6
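The "agency" distinction can be sketched in a few lines: RAG always retrieves before generating, while the agentic style lets the model decide whether and what to retrieve. Everything here (the stub policy, `DOCS`, the query strings) is made up for illustration:

```python
# Toy corpus for the sketch.
DOCS = {"pricing": "Plan A costs $10/month.", "limits": "Rate limit is 60 rpm."}

def retrieve(query: str) -> str:
    return DOCS.get(query, "")

def rag_answer(question: str) -> str:
    # RAG: retrieval is a fixed external step, regardless of the question.
    context = retrieve("pricing")
    return f"Answer using: {context}"

def agentic_answer(question: str) -> str:
    # Stub "LLM" policy: the model itself chooses the retrieval action.
    action = "retrieve:limits" if "limit" in question else "respond"
    if action.startswith("retrieve:"):
        context = retrieve(action.split(":", 1)[1])
        return f"Answer using: {context}"
    return "Answer from parametric knowledge."

print(agentic_answer("What is the rate limit?"))
```

The pipeline shape is the same; what moves is who decides the retrieval step.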

0 0 1 0
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning Large Language Models (LLMs) are increasingly being explored for building Agents capable of active environmental interaction (e.g., via tool use) to solve complex problems. Reinforcement Learning…

Agent-R1 frames RL for agentic LLMs (extended MDP) and ships a modular end-to-end training stack. On multi-hop QA, RL beats RAG/base tool calling with notable gains. Code (MIT) inside. Paper: arxiv.org/abs/2511.14460 #LLMAgents #ReinforcementLearning #NLP
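The extended-MDP framing can be sketched as a rollout loop: the state is the full interaction history, and each step the policy emits either a tool call or a final answer on which the reward is computed. Names, the toy tool, and the stub policy are illustrative, not Agent-R1's implementation:

```python
# Toy rollout under the extended-MDP view: tool calls are actions,
# tool outputs extend the state (the interaction history).
def search_tool(query: str) -> str:
    corpus = {"capital of France": "Paris"}  # made-up one-entry corpus
    return corpus.get(query, "no result")

def policy(state: list[str]) -> str:
    # Stub policy: call the tool once, then answer from its observation.
    if not any(s.startswith("obs:") for s in state):
        return "tool:capital of France"
    obs = next(s for s in state if s.startswith("obs:"))
    return f"answer:{obs.removeprefix('obs:')}"

def rollout(question: str, max_steps: int = 4) -> tuple[list[str], str]:
    state = [f"q:{question}"]
    for _ in range(max_steps):
        action = policy(state)
        if action.startswith("answer:"):
            return state, action.removeprefix("answer:")
        state.append(f"obs:{search_tool(action.removeprefix('tool:'))}")
    return state, "no answer"

trajectory, answer = rollout("What is the capital of France?")
print(answer)  # RL reward would be computed on this final answer
```

RL then optimizes the policy over whole trajectories rather than single completions, which is where the gains over plain RAG/tool calling come from.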

1 0 0 0
Reinforcement Learning for Self-Improving Agent with Skill Library Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions but struggle to continuously improve and adapt when deployed in new…

SAGE: an RL framework that teaches LLM agents to create & reuse executable skills via Sequential Rollout + Skill-integrated Reward. On AppWorld it boosts SGC and slashes tokens vs GRPO. Paper: arxiv.org/abs/2512.17102 #ReinforcementLearning #LLMAgents #SkillLibrary
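The token savings come from reuse: a skill synthesized once is cached and replayed instead of being re-derived. A minimal library-lookup sketch (illustrative; SAGE learns its skills via RL, and the stub "synthesis" here is hard-coded):

```python
# Minimal executable skill library: check the library first, synthesize
# and store only on a miss.
skill_library: dict = {}

def get_or_create_skill(name: str):
    if name in skill_library:
        return skill_library[name]            # reuse: no tokens spent re-deriving
    # Fallback: "synthesize" a new skill (stubbed here) and cache it.
    def skill(x):
        return x * 2
    skill_library[name] = skill
    return skill

double = get_or_create_skill("double")
print(double(21))                             # 42: first call synthesizes and caches
print(get_or_create_skill("double") is double)  # True: later calls reuse it
```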

0 0 0 0
Meta-RL Induces Exploration in Language Agents Reinforcement learning (RL) has enabled the training of large language model (LLM) agents to interact with the environment and to solve multi-turn long-horizon tasks. However, the RL-trained agents…

LaMer brings meta-RL to LLM agents: cross-episode credit + in-context reflection = stronger exploration, better pass@3 & OOD generalization across Sokoban, Minesweeper, Webshop, ALFWorld. Paper: arxiv.org/abs/2512.16848 #MetaRL #LLMAgents #ReinforcementLearning

0 0 0 0
DeepCode: Open Agentic Coding Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods still face…

DeepCode turns papers into production-grade repos via blueprint distillation, code memory, RAG, and closed-loop fixes—posting SOTA on PaperBench and even topping PhD experts on a 3-paper subset. Paper: arxiv.org/abs/2512.07921 #AI #SoftwareEngineering #LLMAgents

1 0 0 0
µStack interactive CLI

🤖⚡🖥️ From LLM-powered structure generation through ML-based relaxation to high-fidelity microscopy simulations, µStack orchestrates the complete pipeline with ease.

What makes it special: ✨
• Multi-technique microscopy support (STM, IETS, TEM, AFM) with GPU acceleration 🎯
• Intelligent session management for seamless structure reuse 🔄
• Natural language query interface paired with the Materials Project database 📊
• Both CLI and interactive web interface with real-time progress tracking 🖥️

Under the hood: 🛠️
- LangGraph for multi-agent orchestration
- MACE-MP/UMA universal ML potential for rapid structure relaxation using TorchSim
- GPAW DFT, abTEM, and ppafm for physics-accurate simulations
- FastAPI + React frontend


🔬 Meet µStack, an AI-powered platform that democratizes atomistic microscopy simulations! 🚀

💥LLM-driven structure generation → ML-based relaxation → GPU-accelerated simulations

Big thanks to our team🤗 & hackathon organizers! 🙌

Related links in thread👇

#AI #Science #Microscopy #llmagents #hackathon

1 0 1 0

The community wants an LLM-agnostic coding agent that allows easy model switching without high costs. The current AI coding landscape is fragmented, lacking standardization and creating friction for developers seeking the best tools. #LLMAgents 6/6

0 0 0 0

🚀 CrewAI just dropped function‑based guardrails, letting LLM agents obey rule‑based constraints right from the prompt. Curious how this shapes future AI workflows? Dive into the Analytics Vidhya breakdown now! #CrewAI #FunctionGuardrails #LLMAgents

🔗 aidailypost.com/news/crewai-...
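The pattern behind function-based guardrails is easy to sketch: a guardrail is a plain function that validates the agent's raw output and either passes the value through or returns an error that is fed back for a retry. This is an illustrative generic pattern, not CrewAI's actual API (`json_guardrail`, `run_with_guardrail`, and the stub agent are all hypothetical names); see the linked breakdown for the real interface.

```python
import json

def json_guardrail(output: str):
    # A guardrail returns (passed, value_or_error).
    try:
        return True, json.loads(output)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON: {err}"

def run_with_guardrail(agent_fn, guardrail, retries: int = 2):
    feedback = ""
    for _ in range(retries + 1):
        ok, value = guardrail(agent_fn(feedback))
        if ok:
            return value
        feedback = str(value)          # feed the error back for a retry
    raise ValueError(f"guardrail kept failing: {feedback}")

# Stub agent: fails once, then produces valid JSON when given feedback.
def stub_agent(feedback: str) -> str:
    return '{"status": "ok"}' if feedback else "not json"

print(run_with_guardrail(stub_agent, json_guardrail))  # {'status': 'ok'}
```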

0 0 0 0
CYBERSECURITY ALERT! Hidden attack exploits prompt compression in LLM agents
YouTube video by En la mente de la máquina, Inteligencia Artificial

Shock in AI! ⚠️ A new attack (#CompressionAttack) exploits prompt compression in LLM agents, achieving 98% preference manipulation. It is stealthy and hard to detect. Your local agent is vulnerable! #Ciberseguridad #LLMAgents #SeguridadIA #Hacking youtu.be/pyeVzMfpkoQ

0 0 0 0
AgentGym-RL: Unlocking LLM agents that outperform commercial models!
YouTube video by En la mente de la máquina, Inteligencia Artificial

Agents that outperform GPT-4o and Gemini-2.5-Pro. Introducing AgentGym-RL, a framework that trains LLMs with reinforcement learning to make complex decisions, with no SFT. 🤯 The future of AI is here!

youtu.be/2jalLx2ZWpE

#AgentGymRL #LLMAgents #RLHF #IA #LLM

1 0 0 0

🚨 New preprint: Terrarium, an open-source, blackboard-based testbed for studying safety, privacy & security in LLM multi‑agent systems (MAS). We showcase the vulnerabilities and safety considerations of agentic MASs in this modular, configurable framework. 🧵
#AISafety #LLMAgents #Agents
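The blackboard pattern at the core of such testbeds fits in a few lines: agents communicate only through a shared, observable store, which is what makes interactions both auditable and attackable. A toy sketch (illustrative only; Terrarium's framework is far richer, and the agent names here are invented):

```python
# Toy blackboard: all inter-agent communication goes through one store.
blackboard: dict = {}

def planner_agent():
    blackboard["task"] = "summarize report"

def worker_agent():
    task = blackboard.get("task", "")
    if task:
        blackboard["result"] = f"done: {task}"

def auditor_agent() -> list:
    # A monitor can inspect every posted message, e.g. to flag
    # injected instructions before other agents consume them.
    return [k for k, v in blackboard.items() if "ignore previous" in v.lower()]

planner_agent()
worker_agent()
print(blackboard["result"])   # done: summarize report
print(auditor_agent())        # [] — nothing suspicious posted
```

The same central visibility that lets the auditor work also means a single poisoned entry is visible to every agent, which is one of the attack surfaces such testbeds exercise.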

5 0 1 1
LLM agents show anxiety‑induced bias in grocery choices

A study of ChatGPT‑5, Gemini 2.5 and Claude 3.5‑Sonnet showed that anxiety‑inducing prompts reduced grocery basket health scores by 0.08–0.13 across $24, $54 and $108 budgets. Read more: getnews.me/llm-agents-show-anxiety-... #llmagents #grocerybias #aiethics

0 0 0 0
Balancing Autonomy and Privacy in Personalized LLM Agents

A study of 450 users found that personalization without explicit consent raises privacy concerns and lowers trust, while intermediate autonomy buffers these effects. getnews.me/balancing-autonomy-and-p... #privacy #llmagents #autonomy

1 0 0 0
JEF Hinter Improves LLM Agent Adaptation with Compact Offline Hints

The JEF Hinter system, in an arXiv pre‑print (arXiv:2510.04373) from October 2025, boosts LLM agents on MiniWoB++, WorkArena‑L1 and WebArena‑Lite. Read more: getnews.me/jef-hinter-improves-llm-... #jefhinter #llmagents #webbenchmarks

0 0 0 0
AgentHub Proposal Sets Agenda for Sharing LLM-Based Agents

AgentHub proposes a registry for LLM‑based agents, emphasizing standardized metadata, security scoring, and versioned releases. The agenda is detailed in a preprint (arXiv:2510.03495). Read more: getnews.me/agenthub-proposal-sets-a... #agenthub #llmagents

0 0 0 0
Hierarchical Preference Learning Improves Long‑Horizon LLM Agents

Hierarchical Preference Learning adds a group‑level objective between trajectory‑ and step‑level DPO, using a curriculum that scales to complex sub‑task groups. Read more: getnews.me/hierarchical-preference-... #hierarchicalpreferencelearning #llmagents

0 0 0 0
MarketSenseAI 2.0 Elevates Stock Analysis with LLM Agents

MarketSenseAI 2.0 posted a 125.9% cumulative return on S&P 100 stocks over 2023–2024, outpacing the index's 73.5% gain. The work was first submitted on 1 February 2025. Read more: getnews.me/marketsenseai-2-0-elevat... #marketsenseai #llmagents

0 0 0 0
Instance-Level Context Learning Boosts LLM Agent Performance

The new Instance-Level Context Learning (ILCL) framework boosted TextWorld agents’ success rates, raising ReAct from 37% to 95% and IGE from 81% to 95%. getnews.me/instance-level-context-l... #instancelcontext #llmagents #ilcl

0 0 0 0
LLM Agents Automate Data-Driven Engineering Modeling and Analysis

LLM agents automate data‑driven engineering modeling, handling cleaning and neural‑network training. In a CHF benchmark of ~25,000 observations, their model beat traditional lookup tables. getnews.me/llm-agents-automate-data... #llmagents #engineering

0 0 0 0
Self-Organizing Multi-Agent LLMs Boost Performance

SelfOrg builds a DAG using approximate Shapley values to rank LLM agents, delivering notable performance gains for weaker models while matching state‑of‑the‑art results for strong ones. getnews.me/self-organizing-multi-ag... #selforg #llmagents #shapley
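Shapley-value ranking can be sketched with a toy team-value function. Everything below is illustrative (the three agents, their scores, and the synergy model are made up; SelfOrg's estimator, tasks, and DAG construction are more involved):

```python
import itertools

# Skill scores for three hypothetical agents (made-up numbers).
AGENTS = {"strong": 0.8, "medium": 0.5, "weak": 0.2}

def team_value(coalition: frozenset) -> float:
    # Toy task-success model: the best member dominates; each extra
    # teammate adds a small synergy bonus.
    if not coalition:
        return 0.0
    return max(AGENTS[a] for a in coalition) + 0.1 * (len(coalition) - 1)

def shapley(agent: str) -> float:
    # Exact Shapley value: average marginal contribution over all join
    # orders (3 agents -> only 6 permutations, so no sampling needed;
    # larger teams would approximate by sampling permutations).
    perms = list(itertools.permutations(AGENTS))
    total = 0.0
    for order in perms:
        before = frozenset(order[: order.index(agent)])
        total += team_value(before | {agent}) - team_value(before)
    return total / len(perms)

ranking = sorted(AGENTS, key=shapley, reverse=True)
print(ranking)  # ['strong', 'medium', 'weak']
```

The values sum to the grand coalition's worth, so they give a principled split of team credit, which is what makes them usable for ordering agents in a DAG.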

0 0 0 0
ACON: Optimizing Context Compression for Long‑Horizon LLM Agents

ACON trims long‑term LLM agent context, cutting peak token usage by up to 54% while keeping accuracy within 95% of the uncompressed baseline. Distilled versions retain over 95% accuracy. Read more: getnews.me/acon-optimizing-context-... #acon #llmagents
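The basic shape of such compression is easy to illustrate: keep the most recent turns verbatim and collapse older ones into a short summary. A toy sketch (ACON learns its compressor; here the summarizer is a stub and the 4-chars-per-token heuristic is an assumption):

```python
# Toy context compression: summarize everything except the last few turns.
def summarize(turns: list) -> str:
    # Stub summarizer: a real system would use an LLM here.
    return f"[summary of {len(turns)} earlier steps]"

def compress(history: list, keep_recent: int = 2) -> list:
    if len(history) <= keep_recent:
        return history
    return [summarize(history[:-keep_recent])] + history[-keep_recent:]

history = [f"step {i}: observation and tool output " + "x" * 40 for i in range(10)]
compressed = compress(history)
orig = sum(len(t) for t in history) // 4       # ~4 chars per token
new = sum(len(t) for t in compressed) // 4
print(f"{orig} -> {new} tokens ({100 * (orig - new) // orig}% saved)")
```

The trade-off ACON optimizes is exactly this one: how aggressively to summarize while keeping enough detail for the agent to stay accurate.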

0 0 0 0
LLM‑Agent Survey Shows Growing Role in Data Analysis

The survey outlines five design goals for LLM-agent data analysis and categorizes advances across four data modalities: structured, semi-structured, unstructured and heterogeneous data. Read more: getnews.me/llm-agent-survey-shows-g... #llmagents #dataanalysis #ai

1 0 0 0
GA-Rollback Framework Boosts Decision Making in Large Language Model Agents

GA‑Rollback, presented at EMNLP 2025, adds a verification assistant that can backtrack errors; tests on three benchmarks show it beats strong baselines. Read more: getnews.me/ga-rollback-framework-bo... #garollback #emnlp2025 #llmagents
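The generate-verify-rollback loop can be sketched in miniature: a generator proposes actions, a verifier checks each one, and a rejected action is rolled back before retrying. The stubs below are invented for illustration (GA-Rollback's verifier is itself an LLM assistant, not a rule):

```python
# Toy generate-and-rollback loop.
def verifier(action: str) -> bool:
    return action != "bad step"                 # stub check

def generator(attempt: int) -> str:
    # Stub generator: first attempt at each step is wrong, retry succeeds.
    return "bad step" if attempt == 0 else "good step"

trajectory: list = []
for step in range(3):
    for attempt in range(2):                    # retry budget per step
        action = generator(attempt)
        trajectory.append(action)
        if verifier(action):
            break
        trajectory.pop()                        # rollback the rejected action
print(trajectory)                               # ['good step', 'good step', 'good step']
```

The key property is that bad actions never persist in the trajectory, so later steps condition only on verified history.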

0 0 0 0
Self‑Imitation and Progressive Exploration Enhance Agentic RL

Researchers unveiled SPEAR, a curriculum‑driven self‑imitation method that begins with high policy entropy and later reduces it to stabilize long‑horizon RL for LLM agents. (Sept 2025) getnews.me/self-imitation-and-progr... #spear #llmagents

0 0 0 0