Skip to content

Evaluation Samples: add multi-turn conversation evaluation samples S1-S4#47034

Open
kseager wants to merge 3 commits into
feature/azure-ai-projects/2.2.0from
kaseager/multiturn-eval-samples
Open

Evaluation Samples: add multi-turn conversation evaluation samples S1-S4#47034
kseager wants to merge 3 commits into
feature/azure-ai-projects/2.2.0from
kaseager/multiturn-eval-samples

Conversation

@kseager
Copy link
Copy Markdown
Contributor

@kseager kseager commented May 20, 2026

Add sample_multiturn_conversation_evaluation.py demonstrating:

  • Custom data source config with messages/tool_definitions schema
  • Conversation-level evaluators (customer_satisfaction, task_completion, coherence, groundedness)
  • Dataset upload and evaluation run with evaluation_level=conversation
  • Polling for results

Includes sample JSONL with 3 conversations: basic multi-turn, tool-calling, and extended support dialog.

Description

Please add an informative description that covers that changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Add sample_multiturn_conversation_evaluation.py demonstrating:
- Custom data source config with messages/tool_definitions schema
- Conversation-level evaluators (customer_satisfaction, task_completion, coherence, groundedness)
- Dataset upload and evaluation run with evaluation_level=conversation
- Polling for results

Includes sample JSONL with 3 conversations: basic multi-turn, tool-calling, and extended support dialog.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- S2: sample_multiturn_trace_evaluation_by_id.py
  Evaluate traces by conversation_id or trace_id
- S3: sample_multiturn_trace_evaluation_agent_filter.py
  Evaluate traces by agent name/version/id with optional smart filtering
- S4: sample_multiturn_conversation_simulation.py
  Simulate multi-turn conversations against an agent and evaluate
- Data: sample_data_simulation_scenarios.jsonl (3 seed scenarios for S4)

All samples use 4 conversation-level evaluators: customer_satisfaction,
task_completion, coherence, groundedness.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kseager kseager changed the title feat(samples): add multi-turn conversation evaluation sample (S1) Evaluation Samples: add multi-turn conversation evaluation samples S1-S4 May 21, 2026
The .id property on upload_file() returns Optional[str], which pyright
flags when passed to SourceFileID(id=...) which expects str. Split the
chain and add an assert to narrow the type.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Before running the sample:

pip install "azure-ai-projects>=2.0.0" python-dotenv
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe 2.0.0 is OK. Please confirm.

Copy link
Copy Markdown
Member

@howieleung howieleung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants