Posts by Andrew Gross

Eh, looks like it's possible

2 weeks ago 0 0 0 0

Maybe it's time to write my own Slack (no clue if they even support that) bsky.app/profile/simo...

2 weeks ago 0 0 1 0

Somehow Slack is using more memory for text-only office chat than an entire Docker container.

2 weeks ago 3 0 1 0

Would feel nice to be able to have an agent run off to do some investigation if I knew that it didn't have perms to mess things up too badly. It doesn't solve the lethal trifecta, but can help. Definitely needs to be self service and easy to use. Maybe part of MCP/skills and harnesses?

2 weeks ago 0 0 0 0

With all the agent stuff heating up, one thing that would be really nice is to be able to create sessions with dynamically downscoped creds. Ex: instead of my full SQL perms, the session just has "read only for X,Y,Z and write to A,B,C". The key being that I can generate any subset of my perms.

2 weeks ago 0 0 1 0
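
The downscoping idea in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Grant` type and the `downscope` helper are hypothetical names, permissions are modeled as plain sets of table names, and nothing here talks to a real database or credential service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission set: tables a principal may read or write."""
    read: frozenset = frozenset()
    write: frozenset = frozenset()

    def covers(self, other: "Grant") -> bool:
        # A grant covers another when every requested permission is already held.
        return other.read <= self.read and other.write <= self.write


def downscope(parent: Grant, read=(), write=()) -> Grant:
    """Return a session grant that is a subset of `parent`, or raise."""
    requested = Grant(frozenset(read), frozenset(write))
    if not parent.covers(requested):
        extra_read = sorted(requested.read - parent.read)
        extra_write = sorted(requested.write - parent.write)
        raise PermissionError(
            f"not held by parent: read={extra_read} write={extra_write}"
        )
    return requested


# Table names follow the example in the post; the full grant is made up.
mine = Grant(read=frozenset("XYZABC"), write=frozenset("XYZABC"))
session = downscope(mine, read=["X", "Y", "Z"], write=["A", "B", "C"])
```

Because a session grant is itself a `Grant`, it can be downscoped again for a sub-agent, and a request for anything outside the parent raises `PermissionError` instead of silently widening access.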

I'll believe we have AGI when companies start shipping native apps instead of Electron everywhere.

3 weeks ago 0 0 0 0

I'm afraid to ask an LLM this due to sycophancy concerns. "That's a great observation! etc", especially if they don't know. I suppose I could build up how TPUs handle MoE weights now and then pose it.

3 weeks ago 0 0 0 0

In TPUs, my impression is that weights are "stationary" and data is (systolically) moved over them, while a GPU has weights and data brought together, combined, and flushed. In a world of large MoE models, do TPUs have a disadvantage because they will need to move around more weights than before?

3 weeks ago 0 0 1 0
A series of agentic Read calls to the user's home dir, except one in the middle says anthropic instead of andrew

Every once in a while you get to see the outside of the sampling curve on LLMs

3 weeks ago 1 0 0 0

I definitely think it is interesting and has some cool outputs for this domain. I am curious how they will be able to handle cases where there are multiple values to optimize for. Here it seems purely around speed. Obviously, in the real world there are more goals, and they are harder to balance.

1 month ago 1 0 0 0

Context Poisoning issues are unsolved in this domain.

1 month ago 0 0 1 0

I have had some great success building Skills w/ scripts around things like JIRA where I don't need the whole API, just a few basic functions. Plus, you can make the scripts super easy to use for the agent, doing things like auto-translating user names to IDs without putting the burden on the agent.

1 month ago 0 0 0 0
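
As a rough sketch of the kind of script the post above describes, the Python below resolves a human-friendly user name to an ID before building a request, so the agent never has to look IDs up itself. Everything here is hypothetical: the `USER_IDS` directory, the ID format, and the payload shape are made up for illustration, and no real JIRA endpoint or client library is called.

```python
import argparse

# Hypothetical directory; a real script might load this from a local cache.
USER_IDS = {"andrew": "u-1001", "sam": "u-1002"}


def resolve_user(name: str, directory: dict = USER_IDS) -> str:
    """Map a user name such as '@Andrew' to an account ID, case-insensitively."""
    key = name.strip().lstrip("@").lower()
    if key not in directory:
        known = ", ".join(sorted(directory))
        # A readable error gives the agent something to act on when it retries.
        raise KeyError(f"unknown user {name!r}; known users: {known}")
    return directory[key]


def build_assign_request(issue_key: str, assignee: str) -> dict:
    """Build the payload the script would send; no network call is made here."""
    return {"issue": issue_key, "assignee_id": resolve_user(assignee)}


def main(argv=None) -> dict:
    """Command-line entry point: assign.py ISSUE_KEY ASSIGNEE_NAME."""
    parser = argparse.ArgumentParser(description="Assign an issue by user name.")
    parser.add_argument("issue_key")
    parser.add_argument("assignee")
    args = parser.parse_args(argv)
    request = build_assign_request(args.issue_key, args.assignee)
    print(request)
    return request
```

Calling `main(["PROJ-1", "@Andrew"])` builds the payload with the ID already filled in, which is the part that keeps the burden off the agent.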

The combination of Skills w/ the ease of creating your own "bespoke" software feels a lot better than MCP servers (or even using most skills from other folks). Seems way more effective to create some tight minimal skills/scripts for things than plug in a whole MCP API surface.

1 month ago 0 0 1 0

There have been times that I was using an agent to add features to my own OS library, where it referred me to use... my own open source library.

1 month ago 2 0 0 0

I can see companies wanting to expose a way for your agent to hook into their systems/data, but with a better interface than just some query APIs. They would have their own prompting, skills, agents etc, but would be "prompted" by your local agent. Maybe the A2A protocol has something like this.

1 month ago 0 0 0 0

Has anyone seen support for things like "remote skills" for Claude Code or similar? It's a bit different from MCP in that it's not just calling a regular API endpoint. I'm thinking something closer to how the Web Search tool works, where it's really a mini remote agentic system returning results.

1 month ago 0 0 1 0
Auto memory and MEMORY.md in Claude Code YouTube video by Adam Hennings

Looks like at least 10 days ago www.youtube.com/watch?v=l3O4...

1 month ago 1 0 0 0
Manage Claude's memory - Claude Code Docs Learn how to manage Claude Code's memory across sessions with different memory locations and best practices.

code.claude.com/docs/en/memo...

1 month ago 0 0 1 0
Picture of the auto memory description text from https://code.claude.com/docs/en/memory

Picture of a claude code session where the agent decides to write to the MEMORY.md file for the project.

When did Claude Code auto memory start getting rolled out? Cool as hell.

1 month ago 0 0 1 0
SWE-rebench Leaderboard SWE-rebench: A Continuously Evolving and Decontaminated Benchmark for Software Engineering LLMs.

Have you been following model performance on SWE-rebench to attempt to identify contamination? swe-rebench.com

1 month ago 3 0 1 0

When agents have metrics, output tests, and input data, they can iterate like crazy. Being able to generate this loop for any problem would be a huge timesaver and remove a lot of mental effort.

2 months ago 1 0 0 0

For the next big jump for coding agents, I think the frontier labs are going to spend a lot of effort on making them good at creating their own problem-specific harness for iterating.

2 months ago 0 0 1 0

I want to make the concept of cells, output, and state in an ipykernel legible to them as well for the same reason. Much easier to collaborate when the end goal isn't just a top-to-bottom script every time.

2 months ago 1 0 0 0

I have a sneaking suspicion time travel debuggers are going to come back into vogue once they are made legible to agents.

2 months ago 1 0 1 0

Maybe the harness needs to start with setting a target iteration speed and using it as an optimization metric.

2 months ago 0 0 0 0

Getting agents to run experiments where you need to trade off feature optimization with running time optimization is tough. Agents don't experience linear time and don't have a "feeling" that they need to spend time optimizing to iterate faster.

2 months ago 0 0 1 0
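
One possible reading of the two posts above, sketched in Python: give the harness a target iteration time and fold any overrun into the number the agent is asked to maximize. The function name, the linear penalty, and the default weight are all assumptions made for illustration, not something the posts specify.

```python
def iteration_objective(quality: float, elapsed_s: float,
                        target_s: float, penalty_per_s: float = 0.001) -> float:
    """Score one experiment run: quality minus a penalty for time over target.

    Hypothetical scoring rule: runs at or under the target are not penalized,
    and each second over the target costs `penalty_per_s` points of quality.
    """
    overrun = max(0.0, elapsed_s - target_s)
    return quality - penalty_per_s * overrun


# A slightly better result that takes much longer can lose to a faster run.
fast = iteration_objective(quality=0.90, elapsed_s=30, target_s=60)
slow = iteration_objective(quality=0.95, elapsed_s=160, target_s=60)
best = "fast" if fast > slow else "slow"
```

With these made-up numbers the slow run is 100 seconds over target and loses 0.1 points, so the faster run wins; the penalty weight is the knob that encodes how much iteration speed matters.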

There's no reason the "Movie" Toothbrushing song on the Yoto has to go that hard.

2 months ago 2 0 0 0

One of those cases where it would have been astounding to me if someone hadn't already investigated these problems deeply, I just didn't know how to find it.

2 months ago 0 0 0 0

I was doing a bit of work with taking disjoint subgraphs and wanting to separate some of the larger ones based on overlapping connections and got introduced to the Louvain method and bridging.

2 months ago 0 0 1 0
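
To make the "bridging" half of the post above concrete, here is a minimal pure-Python sketch that finds bridge edges (edges whose removal disconnects a component) with a depth-first search and then splits a component on them. It assumes a small, simple undirected graph given as an edge list; the function names and the example graph are made up. Louvain community detection is usually taken from a graph library rather than written by hand, so it is left out here.

```python
from collections import defaultdict


def find_bridges(edges):
    """Return the bridge edges of a simple undirected graph, as sorted tuples."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    disc, low, bridges = {}, {}, []

    def dfs(node, parent):
        disc[node] = low[node] = len(disc)  # discovery order doubles as a timestamp
        for nxt in adj[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: node can reach an earlier-discovered vertex.
                low[node] = min(low[node], disc[nxt])
            else:
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > disc[node]:
                    # Nothing below nxt reaches node or above: the edge is a bridge.
                    bridges.append(tuple(sorted((node, nxt))))

    for start in list(adj):
        if start not in disc:
            dfs(start, None)
    return sorted(bridges)


def split_on_bridges(edges):
    """Drop bridge edges and return the remaining connected groups of nodes."""
    cut = set(find_bridges(edges))
    adj, nodes = defaultdict(set), set()
    for u, v in edges:
        nodes.update((u, v))
        if tuple(sorted((u, v))) not in cut:
            adj[u].add(v)
            adj[v].add(u)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Two triangles joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("c", "d"), ("d", "e"), ("e", "f"), ("f", "d")]
bridges = find_bridges(edges)     # [('c', 'd')]
groups = split_on_bridges(edges)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The recursive search is fine for small graphs; a large component would need an iterative version or a library routine to stay under Python's recursion limit.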

LLMs can be great for those cases (in coding at least) where you assume there is a body of work around a problem, but you don't know the terminology to find it. In the past you just had to Google and hope, or ask a coworker. Still possible to fail with LLMs but can be easier.

2 months ago 0 0 1 0