Evaluation Samples: add multi-turn conversation evaluation samples S1-S4 by kseager · Pull Request #47034 · Azure/azure-sdk-for-python

kseager · 2026-05-20T23:55:44Z

Add sample_multiturn_conversation_evaluation.py demonstrating:

Custom data source config with messages/tool_definitions schema
Conversation-level evaluators (customer_satisfaction, task_completion, coherence, groundedness)
Dataset upload and evaluation run with evaluation_level=conversation
Polling for results

Includes sample JSONL with 3 conversations: basic multi-turn, tool-calling, and extended support dialog.

Description

Please add an informative description that covers that changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

The pull request does not introduce [breaking changes]
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

Add sample_multiturn_conversation_evaluation.py demonstrating: - Custom data source config with messages/tool_definitions schema - Conversation-level evaluators (customer_satisfaction, task_completion, coherence, groundedness) - Dataset upload and evaluation run with evaluation_level=conversation - Polling for results Includes sample JSONL with 3 conversations: basic multi-turn, tool-calling, and extended support dialog. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- S2: sample_multiturn_trace_evaluation_by_id.py Evaluate traces by conversation_id or trace_id - S3: sample_multiturn_trace_evaluation_agent_filter.py Evaluate traces by agent name/version/id with optional smart filtering - S4: sample_multiturn_conversation_simulation.py Simulate multi-turn conversations against an agent and evaluate - Data: sample_data_simulation_scenarios.jsonl (3 seed scenarios for S4) All samples use 4 conversation-level evaluators: customer_satisfaction, task_completion, coherence, groundedness. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The .id property on upload_file() returns Optional[str], which pyright flags when passed to SourceFileID(id=...) which expects str. Split the chain and add an assert to narrow the type. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

howieleung · 2026-05-21T18:24:33Z

+
+    Before running the sample:
+
+    pip install "azure-ai-projects>=2.0.0" python-dotenv


I believe 2.0.0 is OK. Please confirm.

howieleung

comments

kseager requested review from bobogogo1990, dargilco, glharper, howieleung, kingernupur, nick863, trangevi and trrwilson as code owners May 20, 2026 23:55

github-actions Bot added the AI Projects label May 20, 2026

kseager changed the title ~~feat(samples): add multi-turn conversation evaluation sample (S1)~~ Evaluation Samples: add multi-turn conversation evaluation samples S1-S4 May 21, 2026

howieleung reviewed May 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluation Samples: add multi-turn conversation evaluation samples S1-S4#47034

Evaluation Samples: add multi-turn conversation evaluation samples S1-S4#47034
kseager wants to merge 3 commits into
feature/azure-ai-projects/2.2.0from
kaseager/multiturn-eval-samples

kseager commented May 20, 2026 •

edited

Loading

Uh oh!

howieleung May 21, 2026

Uh oh!

howieleung left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		Before running the sample:

		pip install "azure-ai-projects>=2.0.0" python-dotenv

Conversation

kseager commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

All SDK Contribution checklist:

General Guidelines and Best Practices

Testing Guidelines

Uh oh!

howieleung May 21, 2026

Choose a reason for hiding this comment

Uh oh!

howieleung left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kseager commented May 20, 2026 •

edited

Loading