3/3
What stuck with me most: follow your curiosity. Whatever you build becomes the foundation for the next thing.
simonwillison.net/guides/agent...
#AgenticEngineering #AI #SoftwareEngineering #TDD #CodingWithAI #LLMAgents #BuildInPublic #DevProductivity #AITools #CuriousBuilder
🛡️ Designing AI agents to resist prompt injection
How ChatGPT defends against social engineering and prompt injection attacks.
openai.com/index/designing-agents-t...
#AISecurity #PromptInjection #LLMAgents #RoxsRoss
Datalayer is writing a paper on cost-effective LLM agents (70–92% token reduction via codemode).
Seeking co-authors with real-world agent data. Read & apply → datalayer.ai/research/con...
#LLMAgents #AIEngineering #OpenScience
📰 New Defense Thwarts Attacks on AI Language Agents
ICON, a new defense mechanism, effectively neutralizes Indirect Prompt Injection (IPI) attacks on Large Language Model (LLM) a...
www.clawnews.ai/new-defense-thwarts-atta...
#AISecurity #LLMAgents #PromptInjection
Vercel's agent evals show `AGENTS.md` (compressed docs) beats sophisticated 'skills'. 🤔 This sparks debate on LLM context management: why did simple win over complex? It challenges assumptions about how LLMs best use provided info. #LLMAgents 1/6
Building an agent is like hiring and training a new investigator. You give them the right expertise (the Model), access to the filing cabinet and phone (the Tools), and a standard operating procedure (the Instructions).
#GenerativeAI #LLMAgents #OpenAI #AIAutomation
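The Model/Tools/Instructions anatomy above can be sketched in a few lines. This is an illustrative toy, not any real agent SDK; the `Agent` class and its fields are hypothetical names chosen for the analogy.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    model: str                                  # the "expertise" (which LLM to use)
    instructions: str                           # the standard operating procedure
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def call_tool(self, name: str, query: str) -> str:
        # In a real agent the model decides *when* to reach for a tool;
        # this method just dispatches the call.
        return self.tools[name](query)

investigator = Agent(
    model="gpt-4o",
    instructions="Verify every claim against the case files before reporting.",
    tools={"filing_cabinet": lambda q: f"records matching {q!r}"},
)
print(investigator.call_tool("filing_cabinet", "invoice 1042"))
```

Real frameworks differ in detail, but nearly all of them reduce to these three fields plus a loop that lets the model pick a tool.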
RLMs vs. RAG: A key difference is "agency." While RAG relies on external retrieval, RLMs empower the LLM itself to autonomously retrieve and decide what information to use, increasing its role in decision-making processes. #LLMagents 3/6
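The "agency" distinction can be made concrete: in RAG, retrieval happens unconditionally outside the model; in the agentic setup, the model itself emits a retrieval request only when it decides it needs one. Everything below is a stand-in sketch (`llm` and `search` are toy functions, not real APIs).

```python
def llm(prompt: str) -> str:
    # Toy model: asks for retrieval whenever it has no context yet.
    return "SEARCH: quarterly revenue" if "context:" not in prompt else "Revenue grew 12%."

def search(query: str) -> str:
    # Toy retriever.
    return "Q3 revenue was up 12% year over year."

def rag(question: str) -> str:
    # RAG: the pipeline always retrieves, then stuffs the context in.
    return llm(f"context: {search(question)}\n{question}")

def agentic(question: str) -> str:
    # Agentic loop: the model decides whether and what to retrieve.
    reply = llm(question)
    while reply.startswith("SEARCH:"):
        docs = search(reply.removeprefix("SEARCH:").strip())
        reply = llm(f"context: {docs}\n{question}")
    return reply

print(agentic("How did revenue do?"))
```

Both paths give the same answer here, but only the agentic loop lets the model skip retrieval, reformulate the query, or retrieve repeatedly.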
Agent-R1 frames RL for agentic LLMs (extended MDP) and ships a modular end-to-end training stack. On multi-hop QA, RL beats RAG/base tool calling with notable gains. Code (MIT) inside. Paper: arxiv.org/abs/2511.14460 #LLMAgents #ReinforcementLearning #NLP
SAGE: an RL framework that teaches LLM agents to create & reuse executable skills via Sequential Rollout + Skill-integrated Reward. On AppWorld it boosts SGC and slashes tokens vs GRPO. Paper: arxiv.org/abs/2512.17102 #ReinforcementLearning #LLMAgents #SkillLibrary
LaMer brings meta-RL to LLM agents: cross-episode credit + in-context reflection = stronger exploration, better pass@3 & OOD generalization across Sokoban, Minesweeper, Webshop, ALFWorld. Paper: arxiv.org/abs/2512.16848 #MetaRL #LLMAgents #ReinforcementLearning
DeepCode turns papers into production-grade repos via blueprint distillation, code memory, RAG, and closed-loop fixes—posting SOTA on PaperBench and even topping PhD experts on a 3-paper subset. Paper: arxiv.org/abs/2512.07921 #AI #SoftwareEngineering #LLMAgents
µStack interactive CLI
🤖⚡🖥️ From LLM-powered structure generation through ML-based relaxation to high-fidelity microscopy simulations, µStack orchestrates the complete pipeline with ease.
What makes it special: ✨
• Multi-technique microscopy support (STM, IETS, TEM, AFM) with GPU acceleration 🎯
• Intelligent session management for seamless structure reuse 🔄
• Natural language query interface paired with the Materials Project database 📊
• Both a CLI and an interactive web interface with real-time progress tracking 🖥️
Under the hood: 🛠️
- LangGraph for multi-agent orchestration
- MACE-MP/UMA universal ML potential for rapid structure relaxation using TorchSim
- GPAW DFT, abTEM, and ppafm for physics-accurate simulations
- FastAPI + React frontend
🔬 Meet µStack, an AI-powered platform that democratizes atomistic microscopy simulations! 🚀
💥LLM-driven structure generation → ML-based relaxation → GPU-accelerated simulations
Big thanks to our team🤗 & hackathon organizers! 🙌
Related links in thread👇
#AI #Science #Microscopy #llmagents #hackathon
The community wants an LLM-agnostic coding agent that allows easy model switching without high costs. The current AI coding landscape is fragmented, lacking standardization and creating friction for developers seeking optimal tools. #LLMAgents 6/6
🚀 CrewAI just dropped function‑based guardrails, letting LLM agents obey rule‑based constraints right from the prompt. Curious how this shapes future AI workflows? Dive into the Analytics Vidhya breakdown now! #CrewAI #FunctionGuardrails #LLMAgents
🔗 aidailypost.com/news/crewai-...
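The guardrail pattern itself is framework-agnostic: a plain function validates an agent's output and either passes it through or rejects it with feedback for a retry. The sketch below shows the general pattern with made-up names; it is not CrewAI's actual API.

```python
def no_placeholder_text(output: str) -> tuple[bool, str]:
    """Guardrail: reject outputs that still contain TODO/TBD placeholders."""
    for marker in ("TODO", "TBD", "lorem ipsum"):
        if marker.lower() in output.lower():
            return False, f"Output still contains placeholder {marker!r}; rewrite it."
    return True, output

def run_with_guardrail(generate, guardrail, max_retries: int = 2) -> str:
    feedback = ""
    for _ in range(max_retries + 1):
        draft = generate(feedback)
        ok, result = guardrail(draft)
        if ok:
            return result
        feedback = result  # feed the rejection reason into the next attempt
    raise RuntimeError("guardrail never passed")

# Toy generator that "fixes itself" once it receives feedback.
drafts = iter(["Summary: TODO fill in numbers.", "Summary: revenue up 12%."])
print(run_with_guardrail(lambda fb: next(drafts), no_placeholder_text))
```

Returning `(ok, payload)` keeps the guardrail composable: the payload is either the validated output or the error message the agent should react to.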
AI shock! ⚠️ A new attack (#CompressionAttack) exploits prompt compression in LLM agents, achieving 98% preference manipulation. It is stealthy and hard to detect. Your local agent is vulnerable! #Ciberseguridad #LLMAgents #SeguridadIA #Hacking youtu.be/pyeVzMfpkoQ
Agents that outperform GPT-4o and Gemini-2.5-Pro. Introducing AgentGym-RL, a framework that trains LLMs with reinforcement learning to make complex decisions, with no SFT. 🤯 The future of AI is here!
youtu.be/2jalLx2ZWpE
#AgentGymRL #LLMAgents #RLHF #IA #LLM
🚨 New preprint: Terrarium, an open-source, blackboard-based testbed for studying safety, privacy & security in LLM multi-agent systems (MAS). We showcase the vulnerabilities and safety considerations of agentic MAS in this modular, configurable framework. 🧵
#AISafety #LLMAgents #Agents
LLM agents show anxiety‑induced bias in grocery choices
A study of ChatGPT‑5, Gemini 2.5 and Claude 3.5‑Sonnet showed that anxiety‑inducing prompts reduced grocery basket health scores by 0.08–0.13 across $24, $54 and $108 budgets. Read more: getnews.me/llm-agents-show-anxiety-... #llmagents #grocerybias #aiethics
Balancing Autonomy and Privacy in Personalized LLM Agents
A study of 450 users found that personalization without explicit consent raises privacy concerns and lowers trust, while intermediate autonomy buffers these effects. getnews.me/balancing-autonomy-and-p... #privacy #llmagents #autonomy
JEF Hinter Improves LLM Agent Adaptation with Compact Offline Hints
The JEF Hinter system, in an arXiv pre‑print (arXiv:2510.04373) from October 2025, boosts LLM agents on MiniWoB++, WorkArena‑L1 and WebArena‑Lite. Read more: getnews.me/jef-hinter-improves-llm-... #jefhinter #llmagents #webbenchmarks
AgentHub Proposal Sets Agenda for Sharing LLM-Based Agents
AgentHub proposes a registry for LLM‑based agents, emphasizing standardized metadata, security scoring, and versioned releases. The agenda is detailed in a preprint (arXiv:2510.03495). Read more: getnews.me/agenthub-proposal-sets-a... #agenthub #llmagents
Hierarchical Preference Learning Improves Long‑Horizon LLM Agents
Hierarchical Preference Learning adds a group‑level objective between trajectory‑ and step‑level DPO, using a curriculum that scales to complex sub‑task groups. Read more: getnews.me/hierarchical-preference-... #hierarchicalpreferencelearning #llmagents
MarketSenseAI 2.0 Elevates Stock Analysis with LLM Agents
MarketSenseAI 2.0 posted a 125.9% cumulative return on S&P 100 stocks from 2023‑2024, outpacing the index’s 73.5% gain. The work was first submitted on 1 February 2025. Read more: getnews.me/marketsenseai-2-0-elevat... #marketsenseai #llmagents
Instance-Level Context Learning Boosts LLM Agent Performance
The new Instance-Level Context Learning (ILCL) framework boosted TextWorld agents’ success rates, raising ReAct from 37% to 95% and IGE from 81% to 95%. getnews.me/instance-level-context-l... #instancelcontext #llmagents #ilcl
LLM Agents Automate Data-Driven Engineering Modeling and Analysis
LLM agents automate data‑driven engineering modeling, handling cleaning and neural‑network training. In a CHF benchmark of ~25,000 observations, their model beat traditional lookup tables. getnews.me/llm-agents-automate-data... #llmagents #engineering
Self-Organizing Multi-Agent LLMs Boost Performance
SelfOrg builds a DAG using approximate Shapley values to rank LLM agents, delivering notable performance gains for weaker models while matching state‑of‑the‑art results for strong ones. getnews.me/self-organizing-multi-ag... #selforg #llmagents #shapley
ACON: Optimizing Context Compression for Long‑Horizon LLM Agents
ACON trims long‑term LLM agent context, cutting peak token usage by up to 54% while keeping accuracy within 95% of the uncompressed baseline. Distilled versions retain over 95% accuracy. Read more: getnews.me/acon-optimizing-context-... #acon #llmagents
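The general pattern behind this kind of context compression is simple to sketch: when the interaction history exceeds a token budget, fold the oldest steps into a compact summary. The code below illustrates that general idea only; it is not ACON's actual algorithm, and `n_tokens` is a crude word-count stand-in for a real tokenizer.

```python
def n_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def compress_history(history: list[str], budget: int, summarize) -> list[str]:
    # Repeatedly fold the two oldest entries into one summary message
    # until the history fits the token budget.
    while sum(n_tokens(m) for m in history) > budget and len(history) > 1:
        merged = summarize(history[0] + " " + history[1])
        history = [merged] + history[2:]
    return history

history = [
    "step 1: opened the site and logged in successfully",
    "step 2: searched the catalog for red shoes size 10",
    "step 3: added item 4521 to the cart",
]
# Toy summarizer: keep the first few words. A real system would call an LLM here.
short = compress_history(history, budget=12,
                         summarize=lambda t: "summary: " + " ".join(t.split()[:4]))
print(short)
```

The interesting engineering (and what papers like ACON optimize) is in the `summarize` step: what to keep so downstream accuracy stays near the uncompressed baseline.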
LLM‑Agent Survey Shows Growing Role in Data Analysis
The survey outlines five design goals for LLM-agent data analysis and categorizes advances across four data modalities: structured, semi-structured, unstructured and heterogeneous data. Read more: getnews.me/llm-agent-survey-shows-g... #llmagents #dataanalysis #ai
GA-Rollback Framework Boosts Decision Making in Large Language Model Agents
GA‑Rollback, presented at EMNLP 2025, adds a verification assistant that can backtrack errors; tests on three benchmarks show it beats strong baselines. Read more: getnews.me/ga-rollback-framework-bo... #garollback #emnlp2025 #llmagents
Self‑Imitation and Progressive Exploration Enhance Agentic RL
Researchers unveiled SPEAR, a curriculum‑driven self‑imitation method that begins with high policy entropy and later reduces it to stabilize long‑horizon RL for LLM agents. (Sept 2025) getnews.me/self-imitation-and-progr... #spear #llmagents