Add Supavisor pooler metrics to supabase integration #23749
The Supabase customer Metrics API (`/customer/v1/privileged/metrics`) exposes ~24 Supavisor-prefixed Prometheus metrics from Supavisor's PromEx Tenant plugin -- pool client/server connection counts, pool checkout latency histograms, client query/connection latency, network throughput, and client/db handler lifecycle counters. None of these were in the integration's allowlist, so the scraper was dropping them. The gauge `supavisor.connections.active` corresponds to the "Shared Pooler (Supavisor) Client Connections" graph in the Supabase Studio Observability section and is the metric needed to alert on pooler client-slot exhaustion -- a common failure mode in serverless / Vercel Fluid Compute setups where pools leak across function terminations. Reference: https://github.com/supabase/supavisor/blob/main/lib/supavisor/monitoring/tenant.ex
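The allowlist behaviour described above can be illustrated with a minimal sketch. It assumes the scraper keeps only samples whose raw Prometheus name appears in a name-mapping dict; the dict name `SUPAVISOR_METRICS`, the helper `filter_scraped`, the unmapped metric name, and the sample values are all hypothetical, and only the `supavisor_connections_active` to `supavisor.connections.active` pairing comes from this PR.

```python
# Hypothetical allowlist: raw Prometheus name -> integration metric name.
# Only this one pairing is taken from the PR text; the dict name is illustrative.
SUPAVISOR_METRICS = {
    "supavisor_connections_active": "supavisor.connections.active",
}


def filter_scraped(samples, allowlist):
    """Keep only allowlisted samples, renamed; everything else is dropped."""
    kept = {}
    for raw_name, value in samples.items():
        mapped = allowlist.get(raw_name)
        if mapped is not None:
            kept[mapped] = value
    return kept


# Made-up scrape payload for illustration only.
scraped = {"supavisor_connections_active": 42.0, "some_unmapped_metric": 1.0}

before = filter_scraped(scraped, {})                 # no mapping: sample is dropped
after = filter_scraped(scraped, SUPAVISOR_METRICS)   # mapping present: sample is kept
```

With an empty allowlist the Supavisor sample disappears without any error, which is the silent-drop behaviour this PR addresses.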
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 778ee7054b
@@ -0,0 +1 @@
Add support for Supavisor pooler metrics exposed by the Supabase customer metrics endpoint. Covers pool client/server connection counts, pool checkout durations, query/connection latency histograms, and network throughput counters. Enables alerting on pooler client-connection saturation, which previously had no Datadog coverage despite being a common failure mode in serverless / Fluid Compute setups.
Rename changelog fragment to use numeric PR prefix
This fragment filename (`+supavisor.added`) will break changelog validation in PR CI, because `ddev/src/ddev/utils/scripts/check_pr.py:get_core_repo_changelog_errors` parses the part before the first `.` as an integer PR number (`int(entry_pr_num)`), which raises `ValueError` for `+supavisor`. In practice, any PR containing this file will fail the changelog check before review/release automation can proceed; the fragment should be renamed to `<PR_NUMBER>.<type>`.
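The failure mode in this comment can be reproduced with a small sketch. It is not the `ddev` implementation: `fragment_pr_number` and `is_valid_fragment` are hypothetical helpers that only mirror the described behaviour of converting the text before the first `.` with `int()`.

```python
def fragment_pr_number(filename: str) -> int:
    """Return the PR number encoded before the first '.' of a fragment name."""
    prefix, _, _ = filename.partition(".")
    # int() raises ValueError when the prefix is not a number, e.g. "+supavisor".
    return int(prefix)


def is_valid_fragment(filename: str) -> bool:
    """Report whether the fragment name starts with a numeric PR prefix."""
    try:
        fragment_pr_number(filename)
    except ValueError:
        return False
    return True
```

Under these assumptions `is_valid_fragment("+supavisor.added")` is false and `is_valid_fragment("23749.added")` is true, which matches the rename suggested here.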
[Claude on behalf of Gus] Already addressed — the next commit (156c5d3) renames +supavisor.added → 23749.added. Codex was reviewing the first commit only. The Check PR changelog workflow has since passed against the renamed fragment.
Codecov Report
✅ All modified and coverable lines are covered by tests.
What does this PR do?
Adds Supavisor pooler metrics to the Supabase integration's allowlist. Supabase Cloud's customer Metrics API (`/customer/v1/privileged/metrics`) exposes ~24 `supavisor_*` metrics from Supavisor's PromEx Tenant plugin, but none were mapped in `metrics.py`, so the scraper was silently dropping them.

This PR adds:
- `supavisor.connections.active`, `supavisor.proxy.connections.active`, `supavisor.pool.connections.{checked_out,idle}`, `supavisor.tenants.active`
- `supavisor.client.queries_count`, `supavisor.client.joins.{ok,fail}`, `supavisor.{client,db}.network.{recv,send}`
- `supavisor.{client_handler,db_handler}.{started_count,stopped_count}`, `supavisor.db_handler.db_connection_count`, `supavisor.db_handler.prepared_statements_evicted_count`
- `supavisor.pool.checkout.duration.{local,remote}`, `supavisor.client.query.duration`, `supavisor.client.connection.duration`, `supavisor.client.connection.lifetime_ms`, `supavisor.client_handler.state.duration`

Fixture (`tests/fixtures/privileged_metrics.txt`), assertion list (`tests/common.py`), and metric documentation (`metadata.csv`) are updated to match.

Motivation
The most operationally useful Supavisor metric, `supavisor_connections_active`, backs the "Shared Pooler (Supavisor) Client Connections" graph in Supabase Studio's Observability section. It's the right metric to alert on for pooler client-slot exhaustion, which is a common failure mode in serverless / Vercel Fluid Compute setups (e.g. supabase/discussions#40671) where pool sockets can leak across function terminations. Without these mappings, customers running the integration have no Datadog signal for this until the database starts rejecting connections.

Additional Notes
- `supabase_cloud` tile integration (no `supabase.cloud.supavisor.*` metrics ingested). Filing a separate ask with the `saas-integrations` team so that the SaaS scraper mirrors these mappings.
- `supabase/supavisor`.

Review checklist (to be filled by the PR author)
- `changelog/` label attached
- `qa/skip-qa` label