Grafana settings lost when host ethernet interface goes down and back up

Reporters

  • Mika Silvola (Infinera)
  • Lluis Gifre (CTTC)

Description

The host running TFS had network issues, and its Ethernet interface went down and came back up. As a result, Grafana restarted and came back with default settings, without the TFS-related templates. To recover from this situation, TFS must be restarted. The root cause is that the deploy/tfs.sh script configures the Grafana setup once, at deployment time, and these settings are not replayed when Grafana restarts.

Deployment environment

  • Ubuntu 22.04
  • MicroK8s (1.2.24):
  enabled:
    linkerd              # (community) Linkerd is a service mesh for Kubernetes and other frameworks
    community            # (core) The community addons repository
    dns                  # (core) CoreDNS
    ha-cluster           # (core) Configure high availability on the current node
    helm3                # (core) Helm 3 - Kubernetes package manager
    hostpath-storage     # (core) Storage class; allocates storage from host directory
    ingress              # (core) Ingress controller for external access
    metrics-server       # (core) K8s Metrics Server for API access to service metrics
    prometheus           # (core) Prometheus operator for monitoring and logging
    registry             # (core) Private image registry exposed on localhost:32000
    storage              # (core) Alias to hostpath-storage add-on, deprecated
  • TeraFlowSDN (include release/branch-name/commit-id): commit 7335ba27 (origin/develop) Date: Thu Feb 8 10:35:40 2024 +0000

TFS deployment settings

  • base/minimal settings, +ztp and +monitoring enabled

Sequence of actions that resulted in the bug

  • Easy to reproduce:

  • Create a setup that generates statistics events and monitor them via Grafana.

  • sudo ifconfig <interface> down

  • Grafana restarts and, once it comes back, has no TFS settings present.

  • sudo ifconfig <interface> up

  • Lluis gave a quick analysis:

    • The probable cause is that the failure makes Grafana redeploy itself, and Grafana settings are not persisted to disk (it runs as a Deployment, not a StatefulSet).
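
One way to check this analysis is to record the identity of the Grafana container before the interface goes down and compare it afterwards. The following is only a minimal sketch: the namespace `tfs`, the label `app=webuiservice`, the container name `grafana`, and the helper names are assumptions for illustration, not values confirmed in this report.

```shell
#!/usr/bin/env bash
# Sketch: detect that the Grafana container was restarted or its pod replaced.
# TFS_NAMESPACE, POD_SELECTOR and GRAFANA_CONTAINER are assumed values;
# adjust them to the actual deployment.
TFS_NAMESPACE="${TFS_NAMESPACE:-tfs}"
POD_SELECTOR="${POD_SELECTOR:-app=webuiservice}"
GRAFANA_CONTAINER="${GRAFANA_CONTAINER:-grafana}"

# Print "<pod-uid>/<restart-count>" for the first pod matching the selector.
# A new pod UID means the pod was re-created; a higher restart count means
# the container restarted in place.
grafana_identity() {
    kubectl get pods -n "$TFS_NAMESPACE" -l "$POD_SELECTOR" \
        -o jsonpath="{.items[0].metadata.uid}/{.items[0].status.containerStatuses[?(@.name==\"${GRAFANA_CONTAINER}\")].restartCount}"
}

# Succeed (exit 0) when the identity differs from the one recorded earlier.
identity_changed() {
    [ "$1" != "$2" ]
}
```

A possible use is to run `before="$(grafana_identity)"` ahead of the interface down/up sequence, then `identity_changed "$before" "$(grafana_identity)" && echo "Grafana was restarted or re-created"` afterwards. On MicroK8s, `kubectl` may need to be invoked as `microk8s kubectl` unless an alias is set.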

Document the explicit error

  • Grafana lost its settings.

Expected behaviour

  • Grafana should either not restart on an interface down/up transition or, if it does restart, the TFS-related settings should be replayed.
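
The second option could be approximated by a periodic check that re-applies the settings when they are found missing. The sketch below is an assumption-laden illustration, not the project's implementation: `GRAFANA_URL`, `GRAFANA_AUTH`, `TFS_DATASOURCE`, and the `provision_grafana` placeholder are hypothetical names, and the check relies on the Grafana HTTP API endpoint `/api/datasources` listing the configured data sources.

```shell
#!/usr/bin/env bash
# Sketch: re-apply TFS Grafana settings when they are found missing.
# GRAFANA_URL, GRAFANA_AUTH and TFS_DATASOURCE are hypothetical values.
GRAFANA_URL="${GRAFANA_URL:-http://127.0.0.1:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"
TFS_DATASOURCE="${TFS_DATASOURCE:-tfs-datasource}"

# Read a JSON data source listing on stdin and succeed if it contains
# a data source with the given name.
has_datasource() {
    local name="$1"
    grep -q "\"name\":[[:space:]]*\"${name}\""
}

# Placeholder: this is where the Grafana configuration steps performed
# once by deploy/tfs.sh would be repeated.
provision_grafana() {
    echo "re-applying TFS Grafana settings (placeholder)"
}

# Query Grafana and re-provision only if the expected data source is gone.
# Returns non-zero if Grafana cannot be reached.
check_and_reprovision() {
    local listing
    listing="$(curl -sf -u "$GRAFANA_AUTH" "${GRAFANA_URL}/api/datasources")" || return 1
    if ! printf '%s' "$listing" | has_datasource "$TFS_DATASOURCE"; then
        provision_grafana
    fi
}
```

Calling `check_and_reprovision` from a loop or a scheduled job would replay the settings after a restart without redeploying TFS. Persisting Grafana's data on a volume would be an alternative that avoids the replay altogether.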

References

Edited by Lluis Gifre Renom