
Posts by Scott Condron

How do I get Bluesky to show me less politics and more AI/ML things? I have followed mostly people who work in AI/ML

1 year ago 3 0 0 0

Maybe they could tell you what they’ve learned like “it seems you’re interested in staying up to date with recommender systems, want to add that to your feed?”

1 year ago 0 0 0 0

Thanks Scott! Very exciting

1 year ago 0 0 0 0

Prompts within a complex system are brittle

I have seen some teams be successful by replacing prompts with smaller, more deterministic components and improved reliability with fine-tuning. Anyone else have success with this approach?
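A hedged sketch of that idea (not any team's actual code; the rules and intents are invented): replace a brittle "classify this request" prompt with deterministic keyword rules, and only fall back to a model for the ambiguous cases.

```python
import re

# Hypothetical deterministic intent router. Unambiguous inputs take a
# cheap, testable, reproducible path; only the rest hit a model.
RULES = {
    "refund": re.compile(r"\b(refund|money back|chargeback)\b", re.I),
    "cancel": re.compile(r"\b(cancel|unsubscribe)\b", re.I),
}

def route(text, fallback_classifier):
    """Return an intent label, preferring deterministic rules."""
    for intent, pattern in RULES.items():
        if pattern.search(text):
            return intent  # deterministic path: no prompt involved
    return fallback_classifier(text)  # e.g. a fine-tuned small model
```

The fallback could be a fine-tuned classifier, which is where the reliability gains from fine-tuning come in.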

Seems to help a lot with agents

1 year ago 1 0 0 0

I collected some folk knowledge for RL and stuck them in my lecture slides a couple weeks back: web.mit.edu/6.7920/www/l... See Appendix B... sorry, I know, appendix of a lecture slide deck is not the best for discovery. Suggestions very welcome.

1 year ago 114 18 3 3

If you’re taking time to enjoy your family and not building with LLMs, you’re ngmi.
America is cooked

1 year ago 2 0 0 0

LLM app dev broke our comparison tools because tiny diffs can cause large behaviour change.

At wandb, we've spent years thinking about experiment comparison. We've added new tools for LLM app dev: code, prompts, models, configs, outputs, eval metrics, eval predictions, eval scores…
wandb.me/weave

1 year ago 8 0 1 1

The art of referring to model behaviour with tasteful non-person metaphors: say “stochastic” and you’re in one camp, say “emergent” and you’re in another.
It’s a minefield out there, people

1 year ago 0 0 0 0

People ask for an iOS app but maybe we shouldn’t as it would cause more misery on-the-go

1 year ago 0 0 0 0

Being logged into wandb on your phone is a recipe for misery

1 year ago 74 4 9 0

Would be happy to schedule a chat to hear more about your experience with W&B

1 year ago 0 0 0 0

hey, sorry to hear your complaints about wandb. Have you seen the big response in that issue with options? Tables is built on parquet, so it’s difficult from an architectural perspective. With the recent release of Weave, there may be a path forward by using the Weave backend instead of parquet…

1 year ago 0 0 1 0

Agreed, fellow competitor.

It’s the biggest hurdle I see from teams trying to build GenAI features

We need tools to lower the barrier to entry with LLM judges, existing benchmarks, manual annotation as eval collection, synthetic data… anything else?

1 year ago 1 0 0 0

I think these small models are not for day-to-day use; instead, they’re for B2C applications of LLMs, where it’s cost/latency prohibitive to use anything else

1 year ago 0 0 0 0
chore: Add llms.txt by scottire · Pull Request #3045 · wandb/weave Adds a script to generate llms.txt file. Features Generates Docs & Optional sections Links to Github markdown Includes logic to remove certain files Includes generated markdown To generate ll...

- it really works to teach an LLM about your tool, thank you long context!

Link for the curious:
github.com/wandb/weave/...

1 year ago 4 0 0 0

- it's much better for scraping if the links included are .md files
- you need to be clear about which files to include and which are optional, because context blows up quickly
- automating the creation of your docs' llms.txt is pretty easy
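That automation could be as small as this sketch (the folder layout, URL, and title are made up; it just walks a docs directory and emits a markdown link list):

```python
from pathlib import Path

def build_llms_txt(docs_dir, base_url, title="Project docs"):
    """Assemble a minimal llms.txt index linking to markdown doc pages."""
    lines = [f"# {title}", "", "## Docs", ""]
    for md in sorted(Path(docs_dir).rglob("*.md")):
        rel = md.relative_to(docs_dir).as_posix()  # path relative to docs root
        name = md.stem.replace("-", " ").title()   # crude page title from filename
        lines.append(f"- [{name}]({base_url}/{rel})")
    return "\n".join(lines) + "\n"
```

A real version would also split required vs optional sections and exclude certain files, as in the PR linked below.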

1 year ago 3 0 1 0

Lessons from creating an llms.txt file
An llms.txt file is a way to tell an LLM about your website. In the .txt file, you include links to other files with info to learn more.
- the llms.txt file isn't the file you send to an LLM; you use it to generate an llms.md file
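For the curious, an llms.txt file is just markdown: a title, a short summary, then link lists (everything in this example is invented):

```markdown
# Example Project

> Short summary an LLM can read first.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API Reference](https://example.com/docs/api.md): full reference

## Optional

- [Changelog](https://example.com/docs/changelog.md)
```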

1 year ago 3 0 1 0
Creating a LLM-as-a-Judge That Drives Business Results – A step-by-step guide with my learnings from 30+ AI implementations.

hamel.dev/blog/posts/l...

1 year ago 0 0 0 0

from @hamel.bsky.social’s hamel.dev/blog/posts/llm…

We're building LLM / Human "scorers" in @weightsbiases.bsky.social to have the same data model for this reason

1 year ago 2 0 1 0

Your human and LLM judges should follow the same criteria.

Then, you can transition from manual to automated evaluation once you have inter-annotator agreement between LLM & human. You now have a faster iteration speed and the annotator can focus on finding edge cases!

1 year ago 4 0 1 0
glif - MSPaint (Flux) by fab1an

glif.app/@fab1an/glif...

1 year ago 0 0 0 0

Put glue on pizza

1 year ago 2 0 1 0

The most bizarre AI interview I've ever done was at wandb when, as usual, I asked a candidate to build an AI classifier in any language/framework of their choice..

And they nonchalantly said "I'll write it in Redstone", to which I almost let loose a chuckle until...

1 year ago 2 0 0 0

Claude defaults to concise responses when there's high demand: a clever way to smooth peaks

1 year ago 5 0 0 0

We've been working on just that at @weightsbiases.bsky.social with Weave!

Weave is a lightweight LLM tracing and evaluation toolkit that focuses on letting you iterate fast and making sure your production LLM-based application doesn't degrade when you change prompts or models!

1 year ago 14 3 4 2