
Posts by Mikyo

Logs are for coding agents' eyes. Connect your agents to infra if you want them to be effective.

Surprised Claude hasn't added a debug mode like Cursor. I think it should be a first-class citizen.

4 weeks ago
Automating Evals With Claude Code + Phoenix: As AI agents like Claude Code write more of our code, they need the same visibility into system behavior that human developers have relied on. Observability data (traces, spans, latency, and errors) is no longer just for dashboards and human eyes. This hands-on session covers how to give Claude Code direct access to your Phoenix observability data via the CLI.

RSVP just opened up for our workshop with @mikeldking + Hamel: maven.com/p/2c8410/au...

Learn how to:
⚡ Connect Claude Code to Phoenix observability data
⚡ Use CLI commands to fetch traces and debug agents
⚡ Prompt AI to analyze system behavior in real time
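To make the third bullet concrete, here's a minimal sketch of the kind of trace triage an agent might run after fetching spans. The span shape (`name`, `status`, `latency_ms`) is an assumption for illustration, not the actual Phoenix payload:

```python
# Hypothetical span records, loosely shaped like OTel span data
# (name, status, latency). The real Phoenix payload differs.
from collections import Counter

def triage(spans):
    """Summarize error spans so an agent (or a human) can spot hot spots."""
    errors = [s for s in spans if s["status"] == "ERROR"]
    by_name = Counter(s["name"] for s in errors)
    slowest = max(spans, key=lambda s: s["latency_ms"]) if spans else None
    return {
        "error_count": len(errors),
        "errors_by_span": dict(by_name),
        "slowest_span": slowest["name"] if slowest else None,
    }

spans = [
    {"name": "llm.call", "status": "OK", "latency_ms": 812},
    {"name": "tool.search", "status": "ERROR", "latency_ms": 95},
    {"name": "tool.search", "status": "ERROR", "latency_ms": 101},
]
summary = triage(spans)
```

A summary like this is exactly the kind of compact context an agent can reason over instead of raw logs.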

2 months ago

Phoenix 13.0

Phoenix 13 is a major release centered around Dataset Evaluators, a new system that turns your datasets into reusable evaluation suites. This release also introduces custom model providers, OpenAI Responses API support, and dozens of Playground and experiment UX improvements.

2 months ago

Phoenix Evals now supports message-based LLM-as-a-judge prompts, an upgrade that aligns evals with how modern models actually expect instructions.

🧵👇
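"Message-based" here means the judge prompt is a list of chat messages rather than one flat template string. A purely illustrative sketch of that shape (not the Phoenix Evals API):

```python
# Illustrative only: a judge prompt expressed as chat messages,
# the format modern chat models expect, instead of one big string.
def build_judge_messages(question: str, answer: str) -> list[dict]:
    return [
        {"role": "system",
         "content": "You are a strict grader. Reply with exactly "
                    "'correct' or 'incorrect'."},
        {"role": "user",
         "content": f"Question: {question}\nAnswer: {answer}\n"
                    "Is the answer correct?"},
    ]

messages = build_judge_messages("2 + 2?", "4")
```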

4 months ago

New Evals for TypeScript agent builders 🔥

With Mastra now integrating directly with Phoenix, you can trace your TypeScript agents with almost zero friction.

And now… you can evaluate them too, directly from TypeScript using Phoenix Evals.

5 months ago

This is why, going forward, all AI features I help build will be natively instrumented with #OTEL. Telemetry data is the "fossil fuel" that feeds understanding and future improvement. AI cannot be treated as a black box. It has to be inspected and understood.
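The idea behind native instrumentation can be sketched in a few lines: wrap each unit of work in a span that records its name, duration, and attributes. This is a stdlib-only toy, not the real OpenTelemetry API (which lives in `opentelemetry-api` and exports to a collector):

```python
# Stdlib sketch of span-based instrumentation. In a real setup,
# opentelemetry-api creates spans and an exporter ships them off.
import time
from contextlib import contextmanager

SPANS = []  # stand-in for an exporter/collector

@contextmanager
def span(name, **attributes):
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
            "attributes": attributes,
        })

with span("llm.call", model="some-model"):  # hypothetical model name
    time.sleep(0.01)  # stand-in for the actual model call
```

Instrumented this way, every AI feature leaves behind data you can inspect instead of a black box.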

9 months ago

Telemetry while testing and developing has been critical for me. It lets me hook into and inspect how systems like Vercel's AI SDK and LiteLLM work under the hood, and figure out what prompts are being used for judgment.

9 months ago

Take evals. You might pick an eval and trust that it works, but that would be a mistake: it's rare that an off-the-shelf eval works for you across the board. Previously it would have been crazy to enable telemetry during testing, but with evals you are going to want to inspect how your tests "operate".
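One cheap way to see how an eval "operates" is to wrap whatever callable it uses to reach the model and record every prompt that goes out. Everything below (`toxicity_eval`, `model`) is a stand-in, not a real library:

```python
# Intercept the model callable an eval uses, so the judge prompt
# it actually sends becomes visible. All names here are invented.
captured = []

def recording(model):
    def wrapped(prompt):
        captured.append(prompt)  # the telemetry: every outgoing prompt
        return model(prompt)
    return wrapped

def model(prompt):               # stand-in for a real LLM call
    return "non-toxic"

def toxicity_eval(text, model):  # stand-in for an off-the-shelf eval
    return model(f"Is the following text toxic? {text}") == "toxic"

result = toxicity_eval("hello there", recording(model))
```

Seeing the exact judge prompt is usually enough to tell whether the eval fits your use case at all.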

9 months ago

Tracing and telemetry have traditionally been an operational requirement, not a development one. But I've found that with AI applications this fundamentally changes.

9 months ago

🐳
@arize.bsky.social OSS Prompt Playground
@arize-phoenix.bsky.social gets DeepSeek support! Now you can compare outputs of all the top-tier reasoning models.

Which LLM provider would you like to see next? Let us know on GitHub!

github.com/Arize-ai/pho...

10 months ago

πŸ‘¨β€πŸ³ @arize-phoenix.bsky.social continues to cook

Announcing OpenInference instrumentation for Agno, Mastra, Bedrock Agents, and AutoGen AgentChat!

At @arize.bsky.social we believe observability deserves to be built in the open

s/o @anthonypowell.me and many others

github.com/Arize-ai/ope...

11 months ago

🧪 📊 The @arize-phoenix.bsky.social TS/JS client now supports Experiments and Datasets!

You can now create datasets, run experiments, and attach evaluations to experiments using the Phoenix TS/JS client.

Shoutout to @anthonypowell.me and @mikeldking.bsky.social for the work here!
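The concepts are simple to sketch in miniature: a dataset of examples, a task run over each one, and evaluators attached to every run. The names below are invented for illustration; the actual Phoenix TS/JS client has its own API:

```python
# Toy experiment runner: dataset in, task over each example,
# evaluators attached to every run. Purely illustrative names.
def run_experiment(dataset, task, evaluators):
    runs = []
    for example in dataset:
        output = task(example["input"])
        evals = {name: fn(output, example["expected"])
                 for name, fn in evaluators.items()}
        runs.append({"input": example["input"],
                     "output": output,
                     "evals": evals})
    return runs

dataset = [{"input": "ping", "expected": "pong"}]
runs = run_experiment(
    dataset,
    task=lambda text: "pong" if text == "ping" else "?",
    evaluators={"exact_match": lambda out, exp: out == exp},
)
```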

11 months ago

@arizeai/phoenix-client@1.3.0 -
@arize-phoenix.bsky.social JavaScript client gets experiments 🧪

s/o @anthonypowell.me !

- native tracing of AI tasks and evaluators
- async concurrency queues
- support for any evaluator (e.g. bring your own evals), and more!

11 months ago

OpenTelemetry instrumentation for Agno is published! Huge s/o to Dirk Brand.

A true testament that AI observability should be built in the open 👏

@arize-phoenix.bsky.social

pypi.org/project/open...

11 months ago
annotating an llm call

πŸ“Annotation Configs in @arize-phoenix.bsky.social

Part of the "Look at the Data" initiative: create custom rubrics and forms to annotate your spans.

s/o to @anthonypowell.me here who built out all the rich UI features.

11 months ago

9️⃣ @arize-phoenix.bsky.social is gonna turn 9 today.

Project Retention Policies

Customize the data retention of your projects by number of days or by trace count. No more cron jobs or manual trace deletion needed!

A much-requested feature from our on-prem and Phoenix Cloud users alike.

11 months ago

Learn to prompt better

11 months ago
A speaker announcement card showing that Ben McHone is going to be presenting at Arize: Observe 2025 on June 25th, 2025.

I'll be speaking at Arize:Observe at SHACK15 on June 25! Looking forward to exploring what's next for AI agents & assistants. More details on my session to come. @arize.bsky.social

arize.com/observe-2025

1 year ago

I still own plenty of pencils but no erasers. What does that say about me?

11 months ago

Just dropped a tutorial on using the OpenAI Agents SDK + @arize-phoenix.bsky.social to go from building to evaluating agents.

βœ”οΈ Trace agent decisions at every step
βœ”οΈ Offline and Online Evals using LLM as a Judge

If you're building agents, measuring them is essential.

Full vid and cookbook below

1 year ago
Text reads: Building AI? Demo your app. Arize:Observe community demos. Submit by 4.30.25. Apply.

Demo your app at this year's Observe! Fill out a short application by 4.30 to be considered for our Demo Den. Great opportunity to showcase your work to the AI community in SF.

Apply here: docs.google.com/forms/d/e/1F...

1 year ago

"The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise." - Edsger W. Dijkstra. Just read this and I am going to be using it a LOT.

1 year ago

In case you missed it, Arize AI Phoenix crossed the 5k GitHub star mark last week! ⭐️

Phoenix has changed a TON since its first iteration.

I'm constantly in awe of the execution speed and quality of this team. Here's to the next 5k and beyond!

1 year ago

Love the community we're building!

1 year ago
LLM Evals Office Hours with Arize · Luma: Join us for an open coworking session focused on LLM and Agent Evaluations! Whether you're actively working on evaluation strategies or just exploring the…

For all my NYC friends! 🗽🍎

We're hosting an in-person office hours tomorrow all around LLM and Agent Evals.

Join for the free snacks/drinks, stay for the heated discussions about the validity of Pokémon-based model evaluations ⚡️

1 year ago

How much more data does an LLM app really need?

In my latest tutorial, I explore how few-shot prompting boosts accuracy without massive datasets or retraining, using @arize-phoenix.bsky.social prompts and experiments to break it down.

This kicks off my prompting series... more to come!
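The core mechanic of few-shot prompting is simple enough to show inline: prepend a handful of worked examples so the model infers the task from them. The formatting below is purely illustrative:

```python
# Minimal few-shot prompt builder: worked examples first, then the
# new query, so the model completes the pattern. Illustrative format.
def few_shot_prompt(examples, query):
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\nInput: {query}\nOutput:"

prompt = few_shot_prompt(
    [("great movie!", "positive"), ("total waste of time", "negative")],
    "loved every minute",
)
```

Two or three good examples often move accuracy more than paragraphs of instructions.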

1 year ago

🤖 OpenAI's agent framework openai-agents provides a rich set of composable primitives that enable you to build agents.

Introducing openinference-instrumentation-openai-agents, an OpenTelemetry instrumentor that is compatible with any OTel backend, like @arize-phoenix.bsky.social. Fully OSS and free to use!

1 year ago

How can you programmatically improve your prompts? 🤔 🤖

Forget manual prompt engineering - there are better (read: "more automatic") ways to improve your prompts.

This video and notebook break down these techniques.

Featuring:
- DSPy
- @arize-phoenix.bsky.social
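The loop behind "automatic" prompt improvement, stripped to its bones: score candidate prompts against a small labeled dev set and keep the winner. Optimizers like DSPy's do something far more sophisticated; the "model" here is a toy stand-in, not a real LLM call:

```python
# Toy prompt search: evaluate each candidate prompt on a labeled
# dev set and keep the best scorer. The model below is a stand-in
# that only answers correctly when the prompt asks for a digit.
def score(prompt, dev_set, model):
    return sum(model(prompt, x) == y for x, y in dev_set) / len(dev_set)

def best_prompt(candidates, dev_set, model):
    return max(candidates, key=lambda p: score(p, dev_set, model))

def model(prompt, x):
    return str(x) if "digit" in prompt else "?"

dev_set = [(1, "1"), (2, "2")]
winner = best_prompt(
    ["Answer the question.", "Reply with a single digit."],
    dev_set, model,
)
```

Swap in a real model call and a real metric and this becomes the skeleton of prompt optimization.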

1 year ago
Prompt Management from First Principles: In Phoenix 8.0, we built a prompt management system to ensure reproducibility and empower developers with better testing and control.

Learn how we built a holistic prompt management system that preserves developer freedom.

With Phoenix 8.0, we built a prompt management system that prioritizes LLM reproducibility, prompt versioning & tracking, and developer flexibility: no vendor lock-in.

arize.com/blog/prompt-...
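Prompt versioning in miniature: every save gets an immutable version number that code can pin, which is what makes runs reproducible. This registry is invented for illustration; Phoenix's actual prompt management API is documented in the post above:

```python
# Toy versioned prompt registry: saves are append-only, so any
# pinned version keeps returning the exact same template forever.
class PromptRegistry:
    def __init__(self):
        self._versions = {}

    def save(self, name, template):
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)  # 1-based version number

    def get(self, name, version=None):
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]

reg = PromptRegistry()
v1 = reg.save("summarize", "Summarize: {text}")
v2 = reg.save("summarize", "Summarize in one sentence: {text}")
```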

1 year ago

AI is all about vibes lately

1 year ago