Observability

RPC Plane exposes Prometheus metrics and a health endpoint out of the box. No configuration needed.


Prometheus metrics

Metrics are served on :9401/metrics (configurable via server.metrics_listen).

curl http://localhost:9401/metrics

Metrics reference

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| rpc_plane_requests_total | Counter | method, provider, status | Total JSON-RPC calls handled. A batch request increments by its call count, so a 1000-call batch counts as 1000. |
| rpc_plane_request_duration_seconds | Histogram | method, provider | Request latency. One observation per request (a batch records its single round-trip latency once). |
| rpc_plane_provider_health_score | Gauge | provider | Current health score (0.0–1.0) |
| rpc_plane_provider_slot_height | Gauge | provider | Last observed slot height |
| rpc_plane_slot_drift | Gauge | provider | Slots behind network tip |
| rpc_plane_circuit_breaker_state | Gauge | provider | 0=closed, 1=half-open, 2=open |
| rpc_plane_failover_total | Counter | from_provider, to_provider | Failover events |
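
To make the gauge encoding concrete, here is a minimal Python sketch that reads Prometheus text-format samples and translates rpc_plane_circuit_breaker_state values into state names. The parser is deliberately simplified (no escaped label values, no timestamps), and the SAMPLE text is a hypothetical scrape, not captured output.

```python
import re

# Circuit-breaker gauge values, as listed in the metrics reference.
CIRCUIT_STATES = {0: "closed", 1: "half-open", 2: "open"}

# Matches simple samples of the form: name{label="value",...} 1.0
# Simplified on purpose: no escaped quotes in label values, no timestamps.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line in the text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def circuit_states(text):
    """Map each provider to its circuit-breaker state name."""
    return {
        labels["provider"]: CIRCUIT_STATES[int(value)]
        for name, labels, value in parse_samples(text)
        if name == "rpc_plane_circuit_breaker_state"
    }


# Hypothetical scrape output, for illustration only.
SAMPLE = '''
# TYPE rpc_plane_circuit_breaker_state gauge
rpc_plane_circuit_breaker_state{provider="provider-a"} 0
rpc_plane_circuit_breaker_state{provider="provider-b"} 2
rpc_plane_provider_health_score{provider="provider-a"} 0.912
'''

print(circuit_states(SAMPLE))
```

For anything beyond a quick check, a maintained Prometheus client library or PromQL is likely a better fit than hand-rolled parsing.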

Batch request labels

For a JSON-RPC batch, the method label is normalized to keep cardinality bounded: method names are deduplicated, and at most 5 distinct names are kept, with a +N suffix standing in for the rest. A homogeneous batch collapses to the bare method name (for example, a batch of 1000 getTransaction calls is labeled getTransaction), so it groups with single calls, while the counter still reflects all 1000 calls.
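
The normalization rule can be sketched in Python as follows. This is an illustration of the described behavior, not RPC Plane's actual implementation: the comma separator, the first-seen ordering, the exact placement of the +N suffix, and the use of a single status per batch are all assumptions.

```python
def batch_method_label(methods, cap=5, sep=","):
    """Build a bounded-cardinality label from the methods in one batch.

    Assumptions (not confirmed by the docs): first-seen order is kept,
    names are joined with `sep`, and the overflow suffix is "+N".
    """
    distinct = list(dict.fromkeys(methods))  # dedupe, keep first-seen order
    if len(distinct) <= cap:
        return sep.join(distinct)
    overflow = len(distinct) - cap
    return sep.join(distinct[:cap]) + f"+{overflow}"


def record_batch(counter, methods, provider, status):
    """Add the batch's call count to the counter under its normalized label."""
    key = (batch_method_label(methods), provider, status)
    counter[key] = counter.get(key, 0) + len(methods)
    return key


counter = {}
record_batch(counter, ["getTransaction"] * 1000, "provider-a", "ok")
print(counter)
print(batch_method_label(["a", "b", "c", "d", "e", "f", "g"]))
```

The homogeneous batch produces the same label as a single getTransaction call, yet adds 1000 to the counter, which is the behavior the paragraph above describes.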

Prometheus scrape config

# prometheus.yml
scrape_configs:
  - job_name: rpc-plane
    static_configs:
      - targets: ["localhost:9401"]

For Prometheus Operator (Kubernetes):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rpc-plane
spec:
  endpoints:
    - port: metrics
      interval: 15s
  selector:
    matchLabels:
      app: rpc-plane

Health endpoint

curl http://localhost:9401/health | jq

Returns a JSON snapshot of each provider's current state: health score, slot height, slot drift, circuit state, latency.
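
A script can consume the same snapshot. The sketch below is hedged: the JSON field names (providers, name, health_score, slot_drift, circuit_state) and the thresholds are hypothetical placeholders, since the exact response schema is not shown here. Check the real output of the curl command above and adjust the keys accordingly.

```python
import json
from urllib.request import urlopen

HEALTH_URL = "http://localhost:9401/health"  # default port from the docs


def fetch_health(url=HEALTH_URL, timeout=2.0):
    """Fetch and decode the JSON health snapshot (requires a running proxy)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def degraded_providers(snapshot, min_score=0.5, max_drift=10):
    """Return names of providers that look unhealthy in the snapshot.

    The field names and thresholds here are hypothetical placeholders.
    """
    degraded = []
    for provider in snapshot.get("providers", []):
        if (
            provider["circuit_state"] != "closed"
            or provider["health_score"] < min_score
            or provider["slot_drift"] > max_drift
        ):
            degraded.append(provider["name"])
    return degraded


# Hypothetical snapshot; a real one would come from fetch_health().
EXAMPLE = {
    "providers": [
        {"name": "provider-a", "health_score": 0.912, "slot_drift": 0, "circuit_state": "closed"},
        {"name": "provider-b", "health_score": 0.310, "slot_drift": 2, "circuit_state": "closed"},
        {"name": "provider-c", "health_score": 0.724, "slot_drift": 3, "circuit_state": "open"},
    ]
}

print(degraded_providers(EXAMPLE))
```

Against a live proxy, degraded_providers(fetch_health()) would apply the same check to the current snapshot, assuming the keys match.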


Live status CLI

rpc-plane status
#   NAME          SCORE          SLOT   DRIFT     LATENCY  CIRCUIT
#   ----------  -------  ------------  ------  ----------  -------
#   provider-a    0.912   341892471       0      23.4ms     closed
#   provider-b    0.841   341892469       2      31.1ms     closed
#   provider-c    0.724   341892468       3      38.7ms     closed

rpc-plane status reads from the health endpoint, so the proxy must be running.
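
The DRIFT column in the sample output is consistent with treating the highest slot observed across providers as the tip. The Python sketch below reproduces the sample numbers under that assumption. How RPC Plane actually determines the network tip is not stated here, so treat this as an illustration only.

```python
def slot_drift(heights):
    """Return each provider's distance, in slots, from the highest observed slot.

    Assumption: the network tip is the maximum slot height across providers.
    """
    if not heights:
        return {}
    tip = max(heights.values())
    return {name: tip - height for name, height in heights.items()}


# Slot heights copied from the sample status output.
heights = {
    "provider-a": 341892471,
    "provider-b": 341892469,
    "provider-c": 341892468,
}

print(slot_drift(heights))  # {'provider-a': 0, 'provider-b': 2, 'provider-c': 3}
```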


Grafana dashboard

The repository ships grafana/dashboard.json, which can be imported directly into any Grafana instance.

Import steps

  1. In Grafana, go to Dashboards → Import.
  2. Click Upload JSON file and select grafana/dashboard.json from the release archive or the repo.
  3. Select your Prometheus datasource.
  4. Click Import.

Dashboard panels

| Panel | Description |
| --- | --- |
| Provider health score | Score over time per provider (multi-line chart) |
| Slot height | Absolute slot height per provider |
| Slot drift | Slots behind network tip (gauge and line chart) |
| Circuit breaker state | Current state per provider (table) |
| Request rate by method | Top 10 methods by request rate |
| Latency p50/p95/p99 | Percentile breakdown over time |
| Failovers | Failover events as timeline annotations |
| Error rate | Per-provider error rate |
| Period totals | Stat cards: today / 7d / 30d (follows dashboard time picker) |