
Posts by Derek Abdine

Fair enough. Way more conspiracy theories on X these days than there used to be, though. Also had about 11 bots follow me yesterday after a single tweet. One created a meme coin for my startup. Maybe dead internet theory is actually real…

1 year ago 1 0 1 0

Perfectly describes the current state of Xhitter

1 year ago 1 0 1 0

Turns out I was right on the nose. I didn’t see his talk (I even posted this hours before I saw it pop up), but it’s not hard to see based on the rate of enhancements over the past 24 months.

1 year ago 0 0 0 0

They’re waiting for you, Gordon. In the tessssssst chamberrrrrr.

1 year ago 2 0 1 0

I keep forgetting they’re still doing this

1 year ago 1 0 0 0

Manual. YMMV with prepared stuff like AutoGPT, but base LLMs at a fundamental level are just token emitters, so you have to string them together with other stuff to make them useful. Like a brain without a body.

1 year ago 1 0 1 0
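The "brain without a body" point above can be sketched as a minimal harness loop. Everything here is illustrative: `fake_llm` stands in for a real model call, and the `CALL_TOOL`/`FINAL` convention is a made-up protocol, not any particular framework's.

```python
# Minimal agent loop: the model only emits text, so the harness must
# parse tool requests and feed results back. `fake_llm` is a stub that
# demonstrates the "token emitter" contract: text in, text out.

def fake_llm(prompt: str) -> str:
    # A real call would hit an LLM API; this stub always asks for a
    # tool first, then answers once it sees the tool's result.
    if "TOOL_RESULT" in prompt:
        return "FINAL: 4"
    return "CALL_TOOL: add 2 2"

def add_tool(args: str) -> str:
    a, b = (int(x) for x in args.split())
    return str(a + b)

def run_agent(task: str, max_steps: int = 5) -> str:
    prompt = task
    for _ in range(max_steps):
        out = fake_llm(prompt)
        if out.startswith("FINAL:"):
            return out.removeprefix("FINAL:").strip()
        # The model asked for a tool; run it and loop the result back in.
        _, call = out.split("CALL_TOOL:")
        name, args = call.strip().split(" ", 1)
        result = {"add": add_tool}[name](args)
        prompt = f"{task}\nTOOL_RESULT: {result}"
    return "gave up"

print(run_agent("What is 2 + 2?"))  # → 4
```

The loop, not the model, owns control flow; the model just emits the next chunk of text.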

Another fun thought: I could give furl an agent that knows how furl itself is designed, its code framework, etc., and make it self-generate new agents and tools in case it can’t accomplish a task itself. Even an agent/tool to (re)train its own model.

1 year ago 1 0 1 0

- Anthropic released a computer use model which seems like it would rely on tools combined with image processing (which has already existed).

To name a few. In other words, innovation seems to be on price per token and specific application now rather than on overall accuracy of base models.

1 year ago 1 0 0 0

It seems like there’s credence to the idea that LLMs are at a point where we will see less significant gains on base models alone:

- Amazon’s stats at re:invent were underwhelming compared to most existing models.
- OpenAI’s o1 appears to just be an agent arch applying a critic.

1 year ago 1 0 1 1

AI layer to research details about software like vendor website, docs, etc that a human could do but would take forever. Useful for remediation to have all the details about a particular software / package / whatever available when deciding what to do.

1 year ago 1 0 0 0

This setup is used as the backing AI to furl.ai’s autonomous patching. We expose it all as a REST API internally to our other services which rely on our AI layer to gen the scripts/instructions/research details on software for us (software inventory info databases suck so we also use our 1/2

1 year ago 0 0 1 0

For executing scripts we basically just boot a clean macOS / windows / Linux (rhel, Ubuntu) host and ship the script, execute and return stdout/stderr. Lots of ways to do that (some cheaper than others). 2/2

1 year ago 1 0 0 0
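The runner described above (ship script, execute, return stdout/stderr) can be sketched locally with `subprocess`; in the setup described, this would instead target a freshly booted macOS/Windows/Linux host. The function name is illustrative.

```python
# Sketch of the "runner" idea: take a generated script, execute it
# (here, in a local subprocess), and hand back stdout/stderr/exit code.
import subprocess

def run_script(script: str, timeout: int = 60) -> dict:
    proc = subprocess.run(
        ["sh", "-c", script],   # on Windows you'd ship PowerShell instead
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
    }

result = run_script("echo hello")
print(result["stdout"].strip())  # → hello
```

The interesting production work is in the "lots of ways to do that" part: provisioning throwaway hosts cheaply and shipping the script to them, not the execution call itself.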

Nope, those tools were built by us in-house. You can use scraperapi or other headless browser scraping services for content extraction (note: this is a slightly dumb way to do it, there are more intelligent ways to extract text from websites). 1/2

1 year ago 1 0 1 0

to use with the web_scrape tool. If we find that it isn't doing that well enough, we can make a google_search agent (agents have a system prompt, samples, own model, etc that tools don't have. Tools are just functions.) that is specialized for this task. 5/5

1 year ago 0 0 1 0

The research_from_internet tool actually calls our "internet_researcher" agent, which itself has web_scrape and search_google tools. The former will use services to extract text from rendered websites, the latter will use Google's customsearch api. internet_researcher must also gen search terms 4/5

1 year ago 0 0 1 0

For example, the "upgrade_script_developer" agent uses OpenAI's base gpt-4o model, but itself knows about two tools: execute_script_on_runner and research_from_internet. The execute_script_on_runner tool runs a script that is generated by the LLM on a host and simply returns the response. 3/5

1 year ago 0 0 1 0

with its own system prompt and tool knowledge. Each agent can be configured to use its own model if we want (but we don't right now). When we build out a new agent, we can make the agent use other agents to achieve its goal.
2/5

1 year ago 0 0 1 0

We use OpenAI's base models with RAG (later, fine tuned) essentially. So, in this case gpt-4o. Our "cognition" framework (which follows the NVIDIA blog post) contains agents and tools. Agents know about tools. Agents can be tools themselves. So basically each agent is the specialist 1/5

1 year ago 0 0 1 0
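The "agents know about tools; agents can be tools themselves" structure from this thread can be sketched as follows. All class and agent names here are illustrative stand-ins, not furl's actual code; a real agent would let its model choose tools rather than running them all.

```python
# Toy version of the agents-and-tools idea: a Tool is just a callable;
# an Agent has its own system prompt and tool table, and exposes the
# same callable interface, so an agent can be registered as another
# agent's tool.
from typing import Callable, Dict, Optional

class Agent:
    def __init__(self, name: str, system_prompt: str,
                 tools: Optional[Dict[str, Callable[[str], str]]] = None):
        self.name = name
        self.system_prompt = system_prompt
        self.tools = tools or {}

    def __call__(self, task: str) -> str:
        # A real agent would let its model pick which tool to call;
        # this toy runs every tool just to show the wiring.
        results = [tool(task) for tool in self.tools.values()]
        return f"[{self.name}] " + "; ".join(results)

# A plain function tool.
def web_scrape(q: str) -> str:
    return f"scraped({q})"

# A specialist agent...
researcher = Agent("internet_researcher", "You research software.",
                   {"web_scrape": web_scrape})

# ...used as a tool by another agent.
developer = Agent("upgrade_script_developer", "You write upgrade scripts.",
                  {"research_from_internet": researcher})

print(developer("openssl 3.0"))
# → [upgrade_script_developer] [internet_researcher] scraped(openssl 3.0)
```

Because agents and tools share one calling convention, composing a new specialist into an existing agent is just another table entry.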

Right now we just use OpenAI, though our design allows us to plug any LLM in (we have support for Gemini, Azure OpenAI, Grok, and Anthropic). Only very few support tool calls. For those that do, I still haven’t seen accuracy or reliability as high as OpenAI. Tool calls can be added to any LLM tho.

1 year ago 1 0 1 0
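The pluggable-provider design mentioned above (OpenAI today, Gemini/Azure OpenAI/Grok/Anthropic pluggable) typically looks like a single interface per backend. The stub classes below are placeholders, not the real SDKs.

```python
# Sketch of a provider-agnostic LLM layer: each backend implements one
# `complete` method, so swapping providers becomes a config change.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIProvider(LLMProvider):
    def complete(self, prompt: str) -> str:
        return f"openai:{prompt}"   # real code would call the OpenAI SDK

class GeminiProvider(LLMProvider):
    def complete(self, prompt: str) -> str:
        return f"gemini:{prompt}"   # real code would call the Gemini SDK

PROVIDERS = {"openai": OpenAIProvider, "gemini": GeminiProvider}

def get_provider(name: str) -> LLMProvider:
    return PROVIDERS[name]()

print(get_provider("openai").complete("hi"))  # → openai:hi
```

Tool calling is the leaky part of this abstraction: providers that lack native tool-call support need the harness to emulate it in the prompt, which is why reliability varies across backends.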
Introduction to LLM Agents | NVIDIA Technical Blog Consider a large language model (LLM) application that is designed to help financial analysts answer questions about the performance of a company. With a well-designed retrieval augmented generation…

More or less implement the components here, though the agent graph is not detailed:

developer.nvidia.com/blog/introdu...

1 year ago 1 0 0 0

Haven’t written a guide, but open to doing that. LangGraph may be the closest framework to what we’ve built.

Most of what we have now is the culmination of trial & error + arXiv papers + blog posts + security/scanning backgrounds + some major conceptual contributions from our former chief of AI

1 year ago 1 0 2 0

Definitely is. I’ve found accuracy improves greatly as you add more “specialists” that work in concert with each other (i.e. a true multi-agent architecture), not just tools and not just prompt engineering. Accuracy scales fairly well and much faster than with prompt tweaks alone.

1 year ago 1 0 1 0

Dunno. I’ve built one that uses agents to reason through creating upgrade scripts, by giving it access to search Google, scrape content from websites, and execute stuff in a sandbox. If it fails it’ll correct itself and try again. Knowing when to stop is key, tho not hard for narrow use cases

1 year ago 1 0 1 0
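The generate → execute → self-correct loop described above, with an explicit budget so the agent knows when to stop, can be sketched like this. The generate/execute functions are stand-ins for an LLM call and a sandbox run, not real implementations.

```python
# Self-correcting retry loop: generate a script, run it, and feed the
# error back into the next generation until it works or the budget runs out.

def generate_script(task: str, last_error):
    # Stand-in for an LLM call; a real one conditions on the error text.
    return "bad" if last_error is None else "good"

def execute_in_sandbox(script: str):
    # Stand-in for running the script on a throwaway host.
    return (True, "") if script == "good" else (False, "syntax error")

def solve(task: str, max_attempts: int = 3):
    error = None
    for _ in range(max_attempts):
        script = generate_script(task, error)
        ok, error = execute_in_sandbox(script)
        if ok:
            return script      # success: stop here
    return None                # budget exhausted: know when to stop

print(solve("upgrade openssl"))  # → good
```

For narrow use cases the stop condition is simple (the script ran cleanly, or the attempt budget is spent); open-ended tasks are where termination gets hard.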

Yep. Basically run the original request and response through a “critic” which attempts to refute hallucinated bullshit. LLMs are pretty damn good at text extraction, so you are sort of leaning on that to provide some level of error correction.

1 year ago 1 0 1 0
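The critic pass described above can be sketched as a second checking step over the request, the source context, and the draft answer. In practice the critic is another LLM call; the substring check below is a trivial stand-in, and all names are illustrative.

```python
# Toy "critic" pass: a second check whose only job is to refute claims
# not supported by the supplied context.

def critic(request: str, context: str, answer: str) -> bool:
    # A real critic is another LLM call leaning on the model's strength
    # at text extraction; this stub rejects any token absent from context.
    return all(word in context for word in answer.split())

def answer_with_critic(request: str, context: str, draft: str) -> str:
    if critic(request, context, draft):
        return draft
    return "unverified"   # in a real pipeline: reject or regenerate

ctx = "nginx 1.24 fixes CVE-2023-44487"
print(answer_with_critic("what does nginx 1.24 fix?", ctx, "CVE-2023-44487"))
# → CVE-2023-44487
```

The point is the separation of roles: the first pass generates freely, the second pass only verifies against source text, which is a task LLMs handle far more reliably.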
Scaling up the Prime Video audio/video monitoring service and reducing costs by 90% The move from a distributed microservices architecture to a monolith application helped achieve higher scale, resilience, and reduce costs.

Relevant:

www.primevideotech.com/video-streaming/scaling-...

2 years ago 1 0 0 0

Good to be on BS (are we calling it that)? Guess I should update my profile image...

2 years ago 2 0 0 0