Observability
RPC Plane exposes Prometheus metrics and a health endpoint out of the box. No configuration needed.
Prometheus metrics
Metrics are served on :9401/metrics (configurable via server.metrics_listen).
Metrics reference
| Metric | Type | Labels | Description |
|---|---|---|---|
| rpc_plane_requests_total | Counter | method, provider, status | Total requests handled |
| rpc_plane_request_duration_seconds | Histogram | method, provider | Request latency |
| rpc_plane_provider_health_score | Gauge | provider | Current health score (0.0–1.0) |
| rpc_plane_provider_slot_height | Gauge | provider | Last observed slot height |
| rpc_plane_slot_drift | Gauge | provider | Slots behind network tip |
| rpc_plane_circuit_breaker_state | Gauge | provider | 0=closed, 1=half-open, 2=open |
| rpc_plane_failover_total | Counter | from_provider, to_provider | Failover events |
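As a rough illustration of consuming these metrics outside Prometheus, the sketch below parses a scrape payload and turns the numeric rpc_plane_circuit_breaker_state gauge back into a readable state, using the 0/1/2 mapping from the table. The parser is a deliberately simplified take on the Prometheus text format (labelled samples only, no escaped label values or timestamps), the helper names are hypothetical, and the sample values are made up.

```python
import re

# State mapping as documented for rpc_plane_circuit_breaker_state.
CIRCUIT_STATES = {0: "closed", 1: "half-open", 2: "open"}

# Matches `name{label="value",...} number`. This is a simplification of the
# Prometheus text format: no escapes in label values, no timestamps, and
# samples without labels are ignored.
SAMPLE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels))
            samples.append((name, labels, float(value)))
    return samples


def circuit_states(text):
    """Map each provider to the name of its circuit breaker state."""
    return {
        labels["provider"]: CIRCUIT_STATES.get(int(value), "unknown")
        for name, labels, value in parse_samples(text)
        if name == "rpc_plane_circuit_breaker_state"
    }


# Illustrative payload only; the values are invented for this example.
EXAMPLE = """\
# TYPE rpc_plane_circuit_breaker_state gauge
rpc_plane_circuit_breaker_state{provider="helius"} 0
rpc_plane_circuit_breaker_state{provider="triton"} 2
rpc_plane_provider_health_score{provider="helius"} 0.912
"""

print(circuit_states(EXAMPLE))  # {'helius': 'closed', 'triton': 'open'}
```

For anything beyond a quick script, a maintained parser for the exposition format is likely a safer choice than a regular expression.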
Prometheus scrape config
# prometheus.yml
scrape_configs:
  - job_name: rpc-plane
    static_configs:
      - targets: ["localhost:9401"]
For Prometheus Operator (Kubernetes):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rpc-plane
spec:
  endpoints:
    - port: metrics
      interval: 15s
  selector:
    matchLabels:
      app: rpc-plane
Health endpoint
The health endpoint returns a JSON snapshot of each provider's current state: health score, slot height, slot drift, circuit state, and latency.
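A snapshot like this lends itself to simple automated checks. The response schema is not spelled out here, so the sketch below works on a hypothetical payload: the providers list and the keys name, health_score, slot_height, slot_drift, circuit_state, and latency_ms are all assumptions, as are the thresholds. Adjust them to match the real response.

```python
import json

# Hypothetical payload: the real field names and layout may differ.
# The numbers mirror the sample `rpc-plane status` output.
SAMPLE = """
{
  "providers": [
    {"name": "helius", "health_score": 0.912, "slot_height": 341892471,
     "slot_drift": 0, "circuit_state": "closed", "latency_ms": 23.4},
    {"name": "quicknode", "health_score": 0.841, "slot_height": 341892469,
     "slot_drift": 2, "circuit_state": "closed", "latency_ms": 31.1},
    {"name": "triton", "health_score": 0.0, "slot_height": null,
     "slot_drift": null, "circuit_state": "open", "latency_ms": null}
  ]
}
"""


def degraded_providers(snapshot, min_score=0.5, max_drift=10):
    """Return the names of providers that look unhealthy in a snapshot.

    A provider is flagged when its circuit is not closed, its health score
    is below `min_score`, or it trails the tip by more than `max_drift` slots.
    """
    degraded = []
    for provider in snapshot["providers"]:
        drift = provider.get("slot_drift")
        if (
            provider.get("circuit_state") != "closed"
            or provider.get("health_score", 0.0) < min_score
            or (drift is not None and drift > max_drift)
        ):
            degraded.append(provider["name"])
    return degraded


snapshot = json.loads(SAMPLE)
print(degraded_providers(snapshot))               # ['triton']
print(degraded_providers(snapshot, max_drift=1))  # ['quicknode', 'triton']
```

In practice the payload would come from the running proxy rather than an embedded string.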
Live status CLI
rpc-plane status
# NAME SCORE SLOT DRIFT LATENCY CIRCUIT
# -------- ------- ------------ ------ ---------- -------
# helius 0.912 341892471 0 23.4ms closed
# quicknode 0.841 341892469 2 31.1ms closed
# triton 0.000 — — — open
The proxy must be running, because rpc-plane status reads from the health endpoint.
Grafana dashboard
The repository ships grafana/dashboard.json, which can be imported directly into any Grafana instance.
Import steps
- In Grafana, go to Dashboards → Import.
- Click Upload JSON file and select grafana/dashboard.json from the release archive or the repo.
- Select your Prometheus datasource.
- Click Import.
Dashboard panels
| Panel | Description |
|---|---|
| Provider health score | Score over time per provider (multi-line chart) |
| Slot height | Absolute slot height per provider |
| Slot drift | Slots behind network tip — gauge and line chart |
| Circuit breaker state | Current state per provider (table) |
| Request rate by method | Top 10 methods by request rate |
| Latency p50/p95/p99 | Percentile breakdown over time |
| Failovers | Failover events as timeline annotations |
| Error rate | Per-provider error rate |
| Period totals | Stat cards — today / 7d / 30d (follows dashboard time picker) |
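The latency percentiles are derived from a histogram (rpc_plane_request_duration_seconds), so they are estimates rather than exact values. The sketch below shows one common way to estimate a quantile from cumulative buckets, using linear interpolation inside the bucket that contains the target rank. This is similar in spirit to what Prometheus's histogram_quantile does, but it is not taken from the dashboard's actual queries, and the bucket bounds and counts are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so no meaningful quantile
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up cumulative bucket counts, with upper bounds in seconds.
BUCKETS = [(0.01, 10), (0.025, 60), (0.05, 90), (0.1, 99), (math.inf, 100)]

for q in (0.5, 0.95, 0.99):
    print(f"p{int(q * 100)}: {histogram_quantile(q, BUCKETS) * 1000:.1f} ms")
```

Because the result can never be finer than the bucket boundaries, the accuracy of p95 and p99 depends on how the histogram buckets are laid out around typical latencies.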