
Posts by arize-phoenix

An OpenTelemetry span tree illustrates ATIF steps of a finance assistant calling a financial search tool and processing results.

Get visibility into agent benchmarking execution using ATIF

22 minutes ago
Four feature descriptions showcase a user interface for managing experiments, including tracking progress, completion, stopping, and resuming tasks.

Experiments kicked off from the UI have all the bells and whistles - live progress, a durable runtime, cancel, and resumption.

19 hours ago
Three model experiments display settings, user prompts, and evaluation results for AI support agents' performance.

A dashboard displays multiple ExperimentJobs queued and running on various workers, showing progress percentages and completed tasks.

Phoenix 14 introduces Experiment Jobs. Hit Run and Phoenix queues an ExperimentJob per instance. A background daemon fans them out to workers and streams progress back.
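
The queue-daemon-worker fan-out described above can be sketched in a few lines. This is a toy illustration, not Phoenix internals: the names (`jobs`, `daemon_loop`, `worker`) are mine, and a real daemon would track workers concurrently rather than join each one.

```python
# Toy sketch of the ExperimentJob fan-out (illustrative, not Phoenix code):
# a daemon thread drains a queue of jobs, hands each to a worker thread,
# and the workers report progress back into shared state.
import queue
import threading

jobs = queue.Queue()
progress = {}

def worker(job_id):
    progress[job_id] = "running"
    # ... run the experiment instance here ...
    progress[job_id] = "done"

def daemon_loop():
    while True:
        job_id = jobs.get()
        if job_id is None:          # sentinel: shut down the daemon
            break
        t = threading.Thread(target=worker, args=(job_id,))
        t.start()
        t.join()                    # a real daemon would track, not block

for i in range(3):
    jobs.put(f"experiment-job-{i}")
jobs.put(None)

daemon = threading.Thread(target=daemon_loop)
daemon.start()
daemon.join()
print(progress)  # all three jobs marked "done"
```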

19 hours ago

This propagates to the clients as well.

1 day ago

phoenix CLI and REST API now support pulling spans by OTEL attribute values so you can start debugging targeted parts of your agent topology.
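
The kind of query this enables can be sketched client-side. This is an illustrative filter over exported OTel spans, not the Phoenix CLI or REST surface; the span shape is a flat `attributes` mapping, and `openinference.span.kind` is a real OpenInference attribute.

```python
# Illustrative sketch (not the Phoenix API): select spans by an OTel
# attribute value, the same kind of predicate the new CLI/REST queries run.
def filter_spans_by_attribute(spans, key, value):
    """Return spans whose OTel attributes contain key == value."""
    return [s for s in spans if s.get("attributes", {}).get(key) == value]

spans = [
    {"name": "agent.plan", "attributes": {"openinference.span.kind": "AGENT"}},
    {"name": "search_tool", "attributes": {"openinference.span.kind": "TOOL"}},
    {"name": "llm.call", "attributes": {"openinference.span.kind": "LLM"}},
]

tool_spans = filter_spans_by_attribute(spans, "openinference.span.kind", "TOOL")
print([s["name"] for s in tool_spans])  # ['search_tool']
```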

1 day ago

When debugging agents, you need to get to the problematic issues QUICKLY! Phoenix now ships quick list-details navigation for agent conversations, plus resizable drawers so you can jump between conversations fast. Sessions also now support Vim bindings for pagination!

1 day ago
@arizeai/phoenix-otel - Phoenix Register OpenTelemetry with Phoenix and use the full OpenInference helper surface from one package

arize.com/docs/phoeni...

1 week ago

Also: attribute builders for raw OTel spans, OITracer for redacting sensitive data before export, and full openinference-semantic-conventions re-exported so you never need a second dependency.
No config ceremony. Just wrap and ship.

1 week ago

traceTool, traceAgent, traceChain, withSpan for functions. @observe for class methods. Context setters propagate session IDs, user IDs, and metadata to every child span.

1 week ago

@arizeai/phoenix-otel 1.0 is out.
One import to trace TypeScript agents. Wrap a function, get a span — inputs, outputs, errors, span kind, all recorded automatically.

1 week ago
Release Notes - Phoenix

Some quality of life improvements

Arize Phoenix now supports Python 3.14 across all SDKs and adds new REST endpoints, including API / secret rotation APIs.

arize.com/docs/phoeni...

2 weeks ago

Guess how many 🌟 Phoenix has on GitHub?

Yes, this meme has been all over our internal slack today.

github.com/Arize-ai/ph...

3 weeks ago

We just added streamdown rendering for our LLM outputs, and they now look and perform so well. Particularly loving the bash and code rendering 🤟

4 weeks ago
[Security]: litellm PyPI package (v1.82.7 + v1.82.8) compromised — full timeline and status · Issue #24518 · BerriAI/litellm [LITELLM TEAM UPDATES] Compromised packages have been deleted (v1.82.7, v1.82.8) Compromise came from trivvy security scan dependency All maintainer accounts have been rotated (new maintainer accou...

Full details: github.com/BerriAI/lit... (edited)

4 weeks ago

The compromised packages have been pulled from PyPI and the LiteLLM team has rotated maintainer credentials. The situation is still developing, and further lateral movement has been reported. We're monitoring and will update if anything changes for Phoenix users.

4 weeks ago
fix: pin litellm <1.82.7 in requirements files by mikeldking · Pull Request #12347 · Arize-ai/phoenix Summary Pin litellm<1.82.7 in requirements/unit-tests.txt, requirements/type-check.txt, and requirements/packages/phoenix-evals.txt The pyproject.toml files were already pinned; these requi...

LiteLLM is NOT a core Phoenix dep. It's an optional extra for phoenix-evals. We've already pinned it below compromised versions and shipped a fix.

PR: github.com/Arize-ai/ph...

The malicious code runs on Python interpreter startup (no import needed). Docker image users of LiteLLM Proxy unaffected.

4 weeks ago

Phoenix users:
→ Check your installed version: pip show litellm
→ If you're on 1.82.7 or 1.82.8, uninstall and reinstall at litellm<1.82.7
→ Rotate any credentials in your environment. The payload targeted env vars and secrets
→ Hold off on upgrading litellm until the maintainers confirm all-clear
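
The version check in the steps above can be automated. This is a quick sketch with helper names of my own (not a Phoenix or pip API) that you could drop into CI or a pre-deploy script to fail fast on the pulled releases.

```python
# Sketch of the checklist's version check: flag the compromised litellm
# releases (1.82.7 and 1.82.8) so an automated check fails fast.
COMPROMISED = {"1.82.7", "1.82.8"}

def litellm_is_compromised(version: str) -> bool:
    """True if the given litellm version is one of the pulled releases."""
    return version in COMPROMISED

def installed_litellm_version():
    """Return the installed litellm version string, or None if absent."""
    try:
        from importlib.metadata import version
        return version("litellm")
    except Exception:
        return None

print(litellm_is_compromised("1.82.7"))  # True
print(litellm_is_compromised("1.82.6"))  # False
```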

4 weeks ago


PSA: LiteLLM versions 1.82.7 and 1.82.8 on PyPI were compromised with a credential-stealing payload.

Are you using Phoenix's optional LiteLLM extra for phoenix-evals directly or through DSPy, Smolagents, or CrewAI?
Did you install or upgrade to litellm 1.82.7 or 1.82.8 from PyPI?

See: 🧵

4 weeks ago

Track exactly what changed in your prompts with diffs!

4 weeks ago
Arize Phoenix announces new AI providers, highlighting features like prompt running, model comparison, and output evaluation.

The interface displays various outputs responding to the question about the meaning of life, offering philosophical perspectives and insights.

Arize Phoenix 13.11.0 — Adds @perplexity_ai and @togethercompute as built-in providers for benchmarking and evaluation.

1 month ago
Release arize-phoenix: v13.10.0 · Arize-ai/phoenix Highlights 🎉 New First-Class Providers in Playground Phoenix now supports Cerebras, Fireworks AI, Groq, and Moonshot (Kimi) as first-class providers in the playground. All four use OpenAI-compatibl...

For more details, visit the release page: github.com/Arize-ai/ph...

1 month ago

Arize AI Phoenix v13.10 now supports Cerebras, Fireworks AI, Groq, and Moonshot (Kimi), as well as OpenAI's GPT 5.4 models, allowing you to compare hundreds more models side by side for benchmarking, task evaluation, or LLM judge building.

#AI #LLM #OpenSource #Observability #Evals

1 month ago
Claude Agent SDK (Python) - Phoenix Trace Anthropic's Claude Agent SDK applications in Python with Phoenix

Anthropic's Claude Agent SDK lets you build AI agents that autonomously read files, run commands, search the web, and edit code — the same tools and agent loop that power Claude Code, now programmable in Python and TypeScript.

Docs: arize.com/docs/phoeni...

1 month ago
How to Evaluate Tool-Calling Agents When you give an LLM access to tools, you introduce a new surface area for failure — and it breaks in two distinct ways: The model selects the wrong tool...

Once you separate those two, debugging agent behavior becomes dramatically easier. Full demo/blog from Elizabeth Hutton on how to evaluate tool calling agents: arize.com/blog/how-to...

1 month ago

You need to measure two different behaviors: did the agent choose the correct tool, and did it call the tool correctly?
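
Those two measurements can be computed separately from the same set of examples. A minimal sketch, with illustrative names rather than the Phoenix evals API: a call only scores as a full match when both the tool choice and its arguments are correct.

```python
# Minimal sketch of the two separate measurements (names are illustrative,
# not the Phoenix evals API): tool-selection accuracy vs. call correctness.
def evaluate_tool_calls(examples):
    """Each example carries expected/actual tool name and arguments."""
    selection_hits = call_hits = 0
    for ex in examples:
        picked_right_tool = ex["actual_tool"] == ex["expected_tool"]
        selection_hits += picked_right_tool
        # A call only counts as correct if BOTH tool and arguments match.
        call_hits += picked_right_tool and ex["actual_args"] == ex["expected_args"]
    n = len(examples)
    return {"tool_selection": selection_hits / n, "call_match": call_hits / n}

examples = [
    {"expected_tool": "search", "actual_tool": "search",
     "expected_args": {"q": "AAPL earnings"}, "actual_args": {"q": "AAPL earnings"}},
    {"expected_tool": "search", "actual_tool": "search",
     "expected_args": {"q": "AAPL", "date": "2026-01-05"},
     "actual_args": {"q": "AAPL", "date": "2026-05-01"}},  # right tool, wrong date
]

scores = evaluate_tool_calls(examples)
print(scores)  # {'tool_selection': 1.0, 'call_match': 0.5}
```

A perfect selection score alongside a low call-match score is exactly the pattern that a single blended metric would hide.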

1 month ago

From the outside everything looks fine because the right tool is triggered. But a single incorrect argument can make the entire action wrong.

This is why evaluating tool-using agents can’t be reduced to a single score.

1 month ago

At first glance that looks contradictory. It isn’t. The agent was consistently choosing the correct tool. The failures were happening after the decision, in how the tool was called. Common examples looked like wrong dates, missing parameters, incorrect values, or schema mismatches.

1 month ago

This is the chart that should make every AI engineer pause. In the demo agent we evaluated:

⚪Tool selection: 100%
⚪Matches expected tool calls: 36%

1 month ago
03.05.2026 SDK Session Retrieval - Phoenix Get and list sessions programmatically from Python and TypeScript.

Sessions are how your users actually experience your AI app — not as isolated calls, but as multi-turn conversations.

Now available in the Phoenix SDKs.

arize.com/docs/phoeni...

1 month ago
02.27.2026 Sessions API and CLI Support - Phoenix

New Phoenix release: Sessions API & CLI 🔭

LLM apps are multi-turn — your observability should be too.
Sessions group related traces into a conversation timeline so you can see why a chat went wrong, not just what happened on one step.


arize.com/docs/phoeni...
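
The grouping itself is simple to picture. A rough sketch (illustrative data shapes, not the Phoenix API): bucket traces by session ID, then order each bucket by start time to get a conversation timeline.

```python
# Illustrative sketch: group traces into per-session conversation timelines
# by session ID and start time (not the Phoenix API).
from collections import defaultdict

traces = [
    {"session_id": "s1", "start": 2, "name": "turn-2"},
    {"session_id": "s1", "start": 1, "name": "turn-1"},
    {"session_id": "s2", "start": 1, "name": "turn-1"},
]

sessions = defaultdict(list)
for t in traces:
    sessions[t["session_id"]].append(t)
for sid in sessions:
    sessions[sid].sort(key=lambda t: t["start"])

print([t["name"] for t in sessions["s1"]])  # ['turn-1', 'turn-2']
```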

1 month ago