3/3
What stuck with me most: follow your curiosity. Whatever you build becomes the foundation for the next thing.
simonwillison.net/guides/agent...
#AgenticEngineering #AI #SoftwareEngineering #TDD #CodingWithAI #LLMAgents #BuildInPublic #DevProductivity #AITools #CuriousBuilder
🛡️ Designing AI agents to resist prompt injection
How ChatGPT defends against social engineering and prompt injection attacks.
openai.com/index/designing-agents-t...
#AISecurity #PromptInjection #LLMAgents #RoxsRoss
Datalayer is writing a paper on cost-effective LLM agents (70–92% token reduction via codemode).
Seeking co-authors with real-world agent data. Read & apply → datalayer.ai/research/con...
#LLMAgents #AIEngineering #OpenScience
📰 New Defense Thwarts Attacks on AI Language Agents
ICON, a new defense mechanism, effectively neutralizes Indirect Prompt Injection (IPI) attacks on Large Language Model (LLM) a...
www.clawnews.ai/new-defense-thwarts-atta...
#AISecurity #LLMAgents #PromptInjection
Vercel's agent evals show `AGENTS.md` (compressed docs) beats sophisticated 'skills'. 🤔 This sparks debate on LLM context management: why did simple win over complex? It challenges assumptions about how LLMs best use provided info. #LLMAgents 1/6
Building an agent is like hiring and training a new investigator. You give them the right expertise (the Model), access to the filing cabinet and phone (the Tools), and a standard operating procedure (the Instructions).
#GenerativeAI #LLMAgents #OpenAI #AIAutomation
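The Model/Tools/Instructions anatomy above can be sketched in a few lines. This is an illustrative toy, not any real agent SDK; the `Agent` class and its fields are hypothetical names chosen for the analogy.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    model: str                                  # the "expertise" (which LLM to use)
    instructions: str                           # the standard operating procedure
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def call_tool(self, name: str, query: str) -> str:
        # In a real agent the model decides *when* to reach for a tool;
        # this method just dispatches the call.
        return self.tools[name](query)

investigator = Agent(
    model="gpt-4o",
    instructions="Verify every claim against the case files before reporting.",
    tools={"filing_cabinet": lambda q: f"records matching {q!r}"},
)
print(investigator.call_tool("filing_cabinet", "invoice 1042"))
```

Real frameworks differ in detail, but nearly all of them reduce to these three fields plus a loop that lets the model pick a tool.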
RLMs vs. RAG: A key difference is "agency." While RAG relies on external retrieval, RLMs empower the LLM itself to autonomously retrieve and decide what information to use, increasing its role in decision-making processes. #LLMagents 3/6
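The "agency" distinction can be made concrete: in RAG, retrieval happens unconditionally outside the model; in the agentic setup, the model itself emits a retrieval request only when it decides it needs one. Everything below is a stand-in sketch (`llm` and `search` are toy functions, not real APIs).

```python
def llm(prompt: str) -> str:
    # Toy model: asks for retrieval whenever it has no context yet.
    return "SEARCH: quarterly revenue" if "context:" not in prompt else "Revenue grew 12%."

def search(query: str) -> str:
    # Toy retriever.
    return "Q3 revenue was up 12% year over year."

def rag(question: str) -> str:
    # RAG: the pipeline always retrieves, then stuffs the context in.
    return llm(f"context: {search(question)}\n{question}")

def agentic(question: str) -> str:
    # Agentic loop: the model decides whether and what to retrieve.
    reply = llm(question)
    while reply.startswith("SEARCH:"):
        docs = search(reply.removeprefix("SEARCH:").strip())
        reply = llm(f"context: {docs}\n{question}")
    return reply

print(agentic("How did revenue do?"))
```

Both paths give the same answer here, but only the agentic loop lets the model skip retrieval, reformulate the query, or retrieve repeatedly.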
Agent-R1 frames RL for agentic LLMs (extended MDP) and ships a modular end-to-end training stack. On multi-hop QA, RL beats RAG/base tool calling with notable gains. Code (MIT) inside. Paper: arxiv.org/abs/2511.14460 #LLMAgents #ReinforcementLearning #NLP
SAGE: an RL framework that teaches LLM agents to create & reuse executable skills via Sequential Rollout + Skill-integrated Reward. On AppWorld it boosts SGC and slashes tokens vs GRPO. Paper: arxiv.org/abs/2512.17102 #ReinforcementLearning #LLMAgents #SkillLibrary
LaMer brings meta-RL to LLM agents: cross-episode credit + in-context reflection = stronger exploration, better pass@3 & OOD generalization across Sokoban, Minesweeper, Webshop, ALFWorld. Paper: arxiv.org/abs/2512.16848 #MetaRL #LLMAgents #ReinforcementLearning
DeepCode turns papers into production-grade repos via blueprint distillation, code memory, RAG, and closed-loop fixes—posting SOTA on PaperBench and even topping PhD experts on a 3-paper subset. Paper: arxiv.org/abs/2512.07921 #AI #SoftwareEngineering #LLMAgents
µStack interactive CLI
🤖⚡🖥️ From LLM-powered structure generation through ML-based relaxation to high-fidelity microscopy simulations, µStack orchestrates the complete pipeline with ease.
What makes it special: ✨
• Multi-technique microscopy support (STM, IETS, TEM, AFM) with GPU acceleration 🎯
• Intelligent session management for seamless structure reuse 🔄
• Natural language query interface paired with the Materials Project database 📊
• Both a CLI and an interactive web interface with real-time progress tracking 🖥️
Under the hood: 🛠️
- LangGraph for multi-agent orchestration
- MACE-MP/UMA universal ML potential for rapid structure relaxation using TorchSim
- GPAW DFT, abTEM, and ppafm for physics-accurate simulations
- FastAPI + React frontend
🔬 Meet µStack, an AI-powered platform that democratizes atomistic microscopy simulations! 🚀
💥LLM-driven structure generation → ML-based relaxation → GPU-accelerated simulations
Big thanks to our team🤗 & hackathon organizers! 🙌
Related links in thread👇
#AI #Science #Microscopy #llmagents #hackathon
The community wants an LLM-agnostic coding agent that allows easy model switching without high costs. The current AI coding landscape is fragmented, lacking standardization and creating friction for developers seeking optimal tools. #LLMAgents 6/6
🚀 CrewAI just dropped function‑based guardrails, letting LLM agents obey rule‑based constraints right from the prompt. Curious how this shapes future AI workflows? Dive into the Analytics Vidhya breakdown now! #CrewAI #FunctionGuardrails #LLMAgents
🔗 aidailypost.com/news/crewai-...
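The guardrail pattern itself is framework-agnostic: a plain function validates an agent's output and either passes it through or rejects it with feedback for a retry. The sketch below shows the general pattern with made-up names; it is not CrewAI's actual API.

```python
def no_placeholder_text(output: str) -> tuple[bool, str]:
    """Guardrail: reject outputs that still contain TODO/TBD placeholders."""
    for marker in ("TODO", "TBD", "lorem ipsum"):
        if marker.lower() in output.lower():
            return False, f"Output still contains placeholder {marker!r}; rewrite it."
    return True, output

def run_with_guardrail(generate, guardrail, max_retries: int = 2) -> str:
    feedback = ""
    for _ in range(max_retries + 1):
        draft = generate(feedback)
        ok, result = guardrail(draft)
        if ok:
            return result
        feedback = result  # feed the rejection reason into the next attempt
    raise RuntimeError("guardrail never passed")

# Toy generator that "fixes itself" once it receives feedback.
drafts = iter(["Summary: TODO fill in numbers.", "Summary: revenue up 12%."])
print(run_with_guardrail(lambda fb: next(drafts), no_placeholder_text))
```

Returning `(ok, payload)` keeps the guardrail composable: the payload is either the validated output or the error message the agent should react to.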
AI shock! ⚠️ A new attack (#CompressionAttack) exploits prompt compression in LLM agents, achieving 98% preference manipulation. It is stealthy and hard to detect. Your local agent is vulnerable! #Ciberseguridad #LLMAgents #SeguridadIA #Hacking youtu.be/pyeVzMfpkoQ
Agents that outperform GPT-4o and Gemini-2.5-Pro. Introducing AgentGym-RL, a framework that trains LLMs with reinforcement learning to make complex decisions, with no SFT. 🤯 The future of AI is here!
youtu.be/2jalLx2ZWpE
#AgentGymRL #LLMAgents #RLHF #IA #LLM
🚨 New preprint: Terrarium, an open-source, blackboard-based testbed for studying safety, privacy & security in LLM multi-agent systems (MAS). We showcase the vulnerabilities and safety considerations of agentic MAS in this modular, configurable framework. 🧵
#AISafety #LLMAgents #Agents
LLM agents show anxiety‑induced bias in grocery choices
A study of ChatGPT‑5, Gemini 2.5 and Claude 3.5‑Sonnet showed that anxiety‑inducing prompts reduced grocery basket health scores by 0.08–0.13 across $24, $54 and $108 budgets. Read more: getnews.me/llm-agents-show-anxiety-... #llmagents #grocerybias #aiethics
Balancing Autonomy and Privacy in Personalized LLM Agents
A study of 450 users found that personalization without explicit consent raises privacy concerns and lowers trust, while intermediate autonomy buffers these effects. getnews.me/balancing-autonomy-and-p... #privacy #llmagents #autonomy
JEF Hinter Improves LLM Agent Adaptation with Compact Offline Hints
The JEF Hinter system, in an arXiv pre‑print (arXiv:2510.04373) from October 2025, boosts LLM agents on MiniWoB++, WorkArena‑L1 and WebArena‑Lite. Read more: getnews.me/jef-hinter-improves-llm-... #jefhinter #llmagents #webbenchmarks
AgentHub Proposal Sets Agenda for Sharing LLM-Based Agents
AgentHub proposes a registry for LLM‑based agents, emphasizing standardized metadata, security scoring, and versioned releases. The agenda is detailed in a preprint (arXiv:2510.03495). Read more: getnews.me/agenthub-proposal-sets-a... #agenthub #llmagents
Hierarchical Preference Learning Improves Long‑Horizon LLM Agents
Hierarchical Preference Learning adds a group‑level objective between trajectory‑ and step‑level DPO, using a curriculum that scales to complex sub‑task groups. Read more: getnews.me/hierarchical-preference-... #hierarchicalpreferencelearning #llmagents
MarketSenseAI 2.0 Elevates Stock Analysis with LLM Agents
MarketSenseAI 2.0 posted a 125.9% cumulative return on S&P 100 stocks from 2023‑2024, outpacing the index’s 73.5% gain. The work was first submitted on 1 February 2025. Read more: getnews.me/marketsenseai-2-0-elevat... #marketsenseai #llmagents
Instance-Level Context Learning Boosts LLM Agent Performance
The new Instance-Level Context Learning (ILCL) framework boosted TextWorld agents’ success rates, raising ReAct from 37% to 95% and IGE from 81% to 95%. getnews.me/instance-level-context-l... #instancelcontext #llmagents #ilcl
LLM Agents Automate Data-Driven Engineering Modeling and Analysis
LLM agents automate data‑driven engineering modeling, handling cleaning and neural‑network training. In a CHF benchmark of ~25,000 observations, their model beat traditional lookup tables. getnews.me/llm-agents-automate-data... #llmagents #engineering
Self-Organizing Multi-Agent LLMs Boost Performance
SelfOrg builds a DAG using approximate Shapley values to rank LLM agents, delivering notable performance gains for weaker models while matching state‑of‑the‑art results for strong ones. getnews.me/self-organizing-multi-ag... #selforg #llmagents #shapley
ACON: Optimizing Context Compression for Long‑Horizon LLM Agents
ACON trims long‑term LLM agent context, cutting peak token usage by up to 54% while keeping accuracy within 95% of the uncompressed baseline. Distilled versions retain over 95% accuracy. Read more: getnews.me/acon-optimizing-context-... #acon #llmagents
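The general pattern behind this kind of context compression is simple to sketch: when the interaction history exceeds a token budget, fold the oldest steps into a compact summary. The code below illustrates that general idea only; it is not ACON's actual algorithm, and `n_tokens` is a crude word-count stand-in for a real tokenizer.

```python
def n_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def compress_history(history: list[str], budget: int, summarize) -> list[str]:
    # Repeatedly fold the two oldest entries into one summary message
    # until the history fits the token budget.
    while sum(n_tokens(m) for m in history) > budget and len(history) > 1:
        merged = summarize(history[0] + " " + history[1])
        history = [merged] + history[2:]
    return history

history = [
    "step 1: opened the site and logged in successfully",
    "step 2: searched the catalog for red shoes size 10",
    "step 3: added item 4521 to the cart",
]
# Toy summarizer: keep the first few words. A real system would call an LLM here.
short = compress_history(history, budget=12,
                         summarize=lambda t: "summary: " + " ".join(t.split()[:4]))
print(short)
```

The interesting engineering (and what papers like ACON optimize) is in the `summarize` step: what to keep so downstream accuracy stays near the uncompressed baseline.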
LLM‑Agent Survey Shows Growing Role in Data Analysis
The survey outlines five design goals for LLM-agent data analysis and categorizes advances across four data modalities: structured, semi-structured, unstructured and heterogeneous data. Read more: getnews.me/llm-agent-survey-shows-g... #llmagents #dataanalysis #ai
GA-Rollback Framework Boosts Decision Making in Large Language Model Agents
GA‑Rollback, presented at EMNLP 2025, adds a verification assistant that can backtrack errors; tests on three benchmarks show it beats strong baselines. Read more: getnews.me/ga-rollback-framework-bo... #garollback #emnlp2025 #llmagents
Self‑Imitation and Progressive Exploration Enhance Agentic RL
Researchers unveiled SPEAR, a curriculum‑driven self‑imitation method that begins with high policy entropy and later reduces it to stabilize long‑horizon RL for LLM agents. (Sept 2025) getnews.me/self-imitation-and-progr... #spear #llmagents