Observability

Observability in Satusky spans three layers that are easy to confuse:

Layer | Question
Workload | Are pods built, scheduled, and ready?
Endpoint | Can users reach the public hostname over valid HTTPS?
Machine | Is the underlying capacity healthy and usable?

A complete operator view needs all three.

Source | Current use
Kubernetes watch / informer data | deployment status and pod readiness
Kubernetes metrics API | CPU / memory observations
Prometheus | application, network, and machine/network time-series
Hubble-derived metrics paths | deployment network quality / latency inputs
Talos API | low-level machine resources and state
SideroLink | machine connectivity and discovery
WebSockets | live deployment, machine, log, and notification streams
PostgreSQL | historical metrics, billing records, persisted metadata

The backend already contains many useful observability pieces:

  • deployment live-status WebSockets,
  • machine status WebSockets,
  • deployment metrics endpoints,
  • Prometheus-backed application and network metrics,
  • machine health jobs,
  • Talos resource inspection,
  • machine logs and events.

The gap is less “no observability exists” and more “the user-facing model is not yet unified.”

status = concise current state
metrics = changing measurements over time
logs = emitted events / text streams
events = lifecycle and system transitions
check = cross-plane validation, especially for domains

That means:

  • deploy status should remain workload-focused,
  • domain checks should own public endpoint diagnostics,
  • machine commands should own fleet health,
  • dashboards may compose them, but the architecture should not blur them.
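The ownership split above can be sketched as a small lookup plus a composing function. This is only an illustration: the `Layer` enum, the `OWNERS` mapping, and `compose_dashboard` are hypothetical names, not part of the Satusky codebase. The point is that a dashboard lists each layer's report next to its owning surface and never merges them.

```python
from enum import Enum
from typing import Dict, List


class Layer(Enum):
    WORKLOAD = "workload"
    ENDPOINT = "endpoint"
    MACHINE = "machine"


# Hypothetical mapping: each layer has exactly one owning surface.
OWNERS: Dict[Layer, str] = {
    Layer.WORKLOAD: "deploy status",
    Layer.ENDPOINT: "domain checks",
    Layer.MACHINE: "machine commands",
}


def compose_dashboard(reports: Dict[Layer, str]) -> List[str]:
    """Render one line per layer, in layer order, without merging layers."""
    lines = []
    for layer in Layer:
        # A missing report stays visible as "no data" instead of being hidden.
        summary = reports.get(layer, "no data")
        lines.append(f"{layer.value} ({OWNERS[layer]}): {summary}")
    return lines


for line in compose_dashboard({Layer.WORKLOAD: "3/3 pods ready"}):
    print(line)
```

A layer with no report is shown explicitly, so a missing endpoint or machine view is not mistaken for a healthy one.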

A deployment can be:

  • pod-ready but publicly unreachable,
  • publicly routable but backed by a degraded node,
  • healthy in one cluster and unhealthy in another,
  • consuming resources normally while billing state is impaired.

The platform should preserve these dimensions instead of collapsing them into one opaque “healthy” bit.
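One way to keep these dimensions separate is a record with one field per dimension and no single aggregate flag. The sketch below is a minimal illustration under assumed names: `State`, `DeploymentHealth`, and the field names are hypothetical, and the choice of four states is an assumption, not something Satusky defines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    OK = "ok"
    DEGRADED = "degraded"
    FAILING = "failing"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class DeploymentHealth:
    """Hypothetical per-cluster health view with one field per dimension."""

    cluster: str
    workload: State  # are pods built, scheduled, and ready?
    endpoint: State  # is the public hostname reachable over valid HTTPS?
    machine: State   # is the underlying capacity healthy and usable?
    billing: State   # accounting state, kept apart from live telemetry

    def problems(self) -> List[str]:
        """Return the names of dimensions that are not OK, in field order."""
        dims = {
            "workload": self.workload,
            "endpoint": self.endpoint,
            "machine": self.machine,
            "billing": self.billing,
        }
        return [name for name, state in dims.items() if state is not State.OK]


# Pod-ready but publicly unreachable: only the endpoint dimension is flagged.
health = DeploymentHealth("cluster-a", State.OK, State.FAILING, State.OK, State.OK)
print(health.problems())  # ['endpoint']
```

Because the record is per cluster, "healthy in one cluster and unhealthy in another" is represented as two records rather than one averaged value.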

Gap | Target
Machine telemetry exists mostly behind APIs, not a mature CLI contract. | First-class machine status and metrics workflows.
Public endpoint readiness is not yet reported with the same rigor as pod readiness. | Route/DNS/TLS/HTTP checks become standard.
Metrics sources are rich but scattered. | One documented observability model with clear source-of-truth boundaries.
Historical billing observations and live operational metrics can be conflated. | Distinguish accounting records from live telemetry.
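The Route/DNS/TLS/HTTP checks named above could run as ordered stages that report which stage failed. The sketch below assumes that design: `check_endpoint`, `StageResult`, and the probe callables are hypothetical, and stopping at the first failing stage is an assumption made here, not documented Satusky behavior. Probes are passed in as functions so that real route, DNS, TLS, and HTTP lookups can be swapped for fakes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Stage order follows the Route/DNS/TLS/HTTP sequence.
STAGES = ("route", "dns", "tls", "http")

# A probe takes a hostname and returns (ok, detail).
Probe = Callable[[str], Tuple[bool, str]]


@dataclass(frozen=True)
class StageResult:
    stage: str
    ok: bool
    detail: str


def check_endpoint(hostname: str, probes: Dict[str, Probe]) -> List[StageResult]:
    """Run probes in stage order and stop at the first failure.

    Later stages are skipped after a failure (an assumption of this sketch),
    since for example TLS cannot be validated if DNS does not resolve.
    """
    results = []
    for stage in STAGES:
        ok, detail = probes[stage](hostname)
        results.append(StageResult(stage, ok, detail))
        if not ok:
            break
    return results


# Fake probes for illustration; real ones would query the router, DNS, and so on.
fake_probes = {
    "route": lambda host: (True, "route exists"),
    "dns": lambda host: (True, "resolves"),
    "tls": lambda host: (False, "certificate expired"),
    "http": lambda host: (True, "200"),
}
results = check_endpoint("app.example.com", fake_probes)
print([(r.stage, r.ok) for r in results])  # [('route', True), ('dns', True), ('tls', False)]
```

Reporting the failing stage gives endpoint readiness the same kind of specific answer that pod readiness already has.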