Errors in automationservice deployment for OFC'22 test
Reporters
- Pooyan Safari (Fraunhofer HHI)
Description
While trying to run the OFC'22 demo, executing ./src/tests/ofc22/run_test_01_bootstrap.sh
fails as follows:
================================================ test session starts =================================================
platform linux -- Python 3.9.17, pytest-6.2.5, py-1.11.0, pluggy-1.2.0 -- /home/ubuntu/.pyenv/versions/3.9.17/envs/tfs/bin/python
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/ubuntu/tfs-ctrl
plugins: depends-1.0.1, benchmark-3.4.1, requests-mock-1.9.3
collected 3 items
src/tests/ofc22/tests/test_functional_bootstrap.py::test_scenario_bootstrap PASSED [ 33%]
src/tests/ofc22/tests/test_functional_bootstrap.py::test_scenario_devices_enabled PASSED [ 66%]
src/tests/ofc22/tests/test_functional_bootstrap.py::test_scenario_kpis_created FAILED [100%]
====================================================== FAILURES ======================================================
_____________________________________________ test_scenario_kpis_created _____________________________________________
context_client = <context.client.ContextClient.ContextClient object at 0x7fee50876bb0>
monitoring_client = <monitoring.client.MonitoringClient.MonitoringClient object at 0x7fee50896d90>
    def test_scenario_kpis_created(
        context_client : ContextClient,       # pylint: disable=redefined-outer-name
        monitoring_client: MonitoringClient,  # pylint: disable=redefined-outer-name
    ) -> None:
        """
        This test validates that KPIs related to the service/device/endpoint were created
        during the service creation process.
        """
        response = context_client.ListDevices(Empty())
        kpis_expected = set()
        for device in response.devices:
            device_uuid = device.device_id.device_uuid.uuid
            for endpoint in device.device_endpoints:
                endpoint_uuid = endpoint.endpoint_id.endpoint_uuid.uuid
                for kpi_sample_type in endpoint.kpi_sample_types:
                    kpis_expected.add((device_uuid, endpoint_uuid, kpi_sample_type))
        num_kpis_expected = len(kpis_expected)
        LOGGER.info('Num KPIs expected: {:d}'.format(num_kpis_expected))
        num_kpis_created, num_retry = 0, 0
        while (num_kpis_created != num_kpis_expected) and (num_retry < 10):
            response: KpiDescriptorList = monitoring_client.GetKpiDescriptorList(Empty())
            num_kpis_created = len(response.kpi_descriptor_list)
            LOGGER.info('Num KPIs created: {:d}'.format(num_kpis_created))
            time.sleep(.5)
            num_retry += 1
>       assert num_kpis_created == num_kpis_expected
E       assert 0 == 16
E         +0
E         -16
src/tests/ofc22/tests/test_functional_bootstrap.py:97: AssertionError
------------------------------------------------- Captured log call --------------------------------------------------
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:88 Num KPIs expected: 16
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
INFO src.tests.ofc22.tests.test_functional_bootstrap:test_functional_bootstrap.py:94 Num KPIs created: 0
============================================== short test summary info ===============================================
FAILED src/tests/ofc22/tests/test_functional_bootstrap.py::test_scenario_kpis_created - assert 0 == 16
============================================ 1 failed, 2 passed in 7.77s =============================================
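Once the automationservice issue described below is resolved, the failing check can be re-run in isolation instead of repeating the whole bootstrap script. A minimal sketch, assuming the runtime environment variables generated by the TFS deploy scripts are available (the tfs_runtime_env_vars.sh filename is an assumption; adjust to your setup):
# Re-run only the failing test via its pytest node-id.
# Assumes the TFS runtime variables are sourced so the test fixtures
# can resolve the context/monitoring service endpoints.
source tfs_runtime_env_vars.sh
pytest --verbose --log-level=INFO \
    src/tests/ofc22/tests/test_functional_bootstrap.py::test_scenario_kpis_created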
Deployment environment
- Operating System: Ubuntu 22.04.2 LTS
- MicroK8s: 1.24/stable (v1.24.16)
- TeraFlowSDN (commit-id): 8cd8e0d9
TFS deployment settings
- Components deployed: contextservice, deviceservice, serviceservice, sliceservice, pathcompservice, computeservice, monitoringservice, automationservice, and webuiservice (see pod list below).
Running kubectl get pods -n tfs returns:
NAME READY STATUS RESTARTS AGE
pathcompservice-d56c649bc-fx47q 3/3 Running 0 86m
serviceservice-7d7674d498-2shq2 2/2 Running 0 86m
deviceservice-96c665455-trnw6 2/2 Running 0 86m
sliceservice-74c9d84d9d-2drfw 2/2 Running 0 86m
computeservice-5dcc874689-v4n4l 2/2 Running 0 86m
monitoringservice-6b6fb8bcff-sc8h5 2/2 Running 0 86m
contextservice-7c778979fd-tz5ct 2/2 Running 0 86m
webuiservice-5bf5df4cc5-4c9ct 3/3 Running 0 86m
automationservice-5d7f8ffcbb-54qq4 1/2 CrashLoopBackOff 21 (2m5s ago) 84m
This is the build_automation.log.
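For a CrashLoopBackOff, two standard checks help before reading the full describe output: the pod events and the log of the previous (crashed) container. A short sketch, using the app=automationservice label visible in the pod metadata below:
POD=$(kubectl -n tfs get pods -l app=automationservice -o jsonpath='{.items[0].metadata.name}')
# Events show probe/scheduling failures; --previous prints the log of the
# last terminated container instead of the currently restarting one.
kubectl -n tfs get events --field-selector involvedObject.name="${POD}" --sort-by=.lastTimestamp
kubectl -n tfs logs "${POD}" -c automationservice --previous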
Running kubectl describe pod/automationservice-<id> -n tfs returns:
Name: automationservice-5d7f8ffcbb-54qq4
Namespace: tfs
Priority: 0
Node: dsp-ai-ubn17/192.168.127.158
Start Time: Tue, 01 Aug 2023 13:20:57 +0000
Labels: app=automationservice
app.kubernetes.io/name=automationservice
linkerd.io/control-plane-ns=linkerd
linkerd.io/proxy-deployment=automationservice
linkerd.io/workload-ns=tfs
pod-template-hash=5d7f8ffcbb
Annotations: app.quarkus.io/build-timestamp: 2022-09-19 - 10:48:18 +0000
cni.projectcalico.org/containerID: fb2079ea0531e0834116a0b82424bb1fcb830294ca4979a1cad045d86431590b
cni.projectcalico.org/podIP: 10.1.240.143/32
cni.projectcalico.org/podIPs: 10.1.240.143/32
config.linkerd.io/proxy-cpu-limit: 1
config.linkerd.io/proxy-cpu-request: 10m
config.linkerd.io/proxy-memory-limit: 256Mi
config.linkerd.io/proxy-memory-request: 64Mi
linkerd.io/created-by: linkerd/proxy-injector stable-2.11.1
linkerd.io/identity-mode: default
linkerd.io/inject: enabled
linkerd.io/proxy-version: stable-2.11.1
viz.linkerd.io/tap-enabled: true
Status: Running
IP: 10.1.240.143
IPs:
IP: 10.1.240.143
Controlled By: ReplicaSet/automationservice-5d7f8ffcbb
Init Containers:
linkerd-init:
Container ID: containerd://aafcb7292c24799d5607afcde023bbbd775e0159c3e64ce748852693f7f30633
Image: cr.l5d.io/linkerd/proxy-init:v1.4.0
Image ID: cr.l5d.io/linkerd/proxy-init@sha256:60d12fbb0b4a53962a5c2a59b496b3ee20052d26c0c56fd2ee38fd7fae62146e
Port: <none>
Host Port: <none>
Args:
--incoming-proxy-port
4143
--outgoing-proxy-port
4140
--proxy-uid
2102
--inbound-ports-to-ignore
4190,4191,4567,4568
--outbound-ports-to-ignore
4567,4568
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 01 Aug 2023 13:20:58 +0000
Finished: Tue, 01 Aug 2023 13:20:58 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 10m
memory: 10Mi
Environment: <none>
Mounts:
/run from linkerd-proxy-init-xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x87hh (ro)
Containers:
linkerd-proxy:
Container ID: containerd://ce1dad59ce5807618a61499d444fde33e395b5564c9043377372039a6cf25da2
Image: cr.l5d.io/linkerd/proxy:stable-2.11.1
Image ID: cr.l5d.io/linkerd/proxy@sha256:91b53d4b39e4c058e5fc63b72dd7ab6fe7f7051869ec5251dc9c0d8287b2771f
Ports: 4143/TCP, 4191/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Tue, 01 Aug 2023 13:20:59 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 256Mi
Requests:
cpu: 10m
memory: 64Mi
Liveness: http-get http://:4191/live delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:4191/ready delay=2s timeout=1s period=10s #success=1 #failure=3
Environment:
_pod_name: automationservice-5d7f8ffcbb-54qq4 (v1:metadata.name)
_pod_ns: tfs (v1:metadata.namespace)
_pod_nodeName: (v1:spec.nodeName)
LINKERD2_PROXY_CORES: 1
LINKERD2_PROXY_LOG: warn,linkerd=info
LINKERD2_PROXY_LOG_FORMAT: plain
LINKERD2_PROXY_DESTINATION_SVC_ADDR: linkerd-dst-headless.linkerd.svc.cluster.local.:8086
LINKERD2_PROXY_DESTINATION_PROFILE_NETWORKS: 10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
LINKERD2_PROXY_POLICY_SVC_ADDR: linkerd-policy.linkerd.svc.cluster.local.:8090
LINKERD2_PROXY_POLICY_WORKLOAD: $(_pod_ns):$(_pod_name)
LINKERD2_PROXY_INBOUND_DEFAULT_POLICY: all-unauthenticated
LINKERD2_PROXY_POLICY_CLUSTER_NETWORKS: 10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
LINKERD2_PROXY_INBOUND_CONNECT_TIMEOUT: 100ms
LINKERD2_PROXY_OUTBOUND_CONNECT_TIMEOUT: 1000ms
LINKERD2_PROXY_CONTROL_LISTEN_ADDR: 0.0.0.0:4190
LINKERD2_PROXY_ADMIN_LISTEN_ADDR: 0.0.0.0:4191
LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR: 127.0.0.1:4140
LINKERD2_PROXY_INBOUND_LISTEN_ADDR: 0.0.0.0:4143
LINKERD2_PROXY_INBOUND_IPS: (v1:status.podIPs)
LINKERD2_PROXY_INBOUND_PORTS: 5050,8080
LINKERD2_PROXY_DESTINATION_PROFILE_SUFFIXES: svc.cluster.local.
LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE: 10000ms
LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE: 10000ms
LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION: 25,587,3306,4444,5432,6379,9300,11211
LINKERD2_PROXY_DESTINATION_CONTEXT: {"ns":"$(_pod_ns)", "nodeName":"$(_pod_nodeName)"}
_pod_sa: (v1:spec.serviceAccountName)
_l5d_ns: linkerd
_l5d_trustdomain: cluster.local
LINKERD2_PROXY_IDENTITY_DIR: /var/run/linkerd/identity/end-entity
LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS: -----BEGIN CERTIFICATE-----
MIIBiTCCAS6gAwIBAgIBATAKBggqhkjOPQQDAjAcMRowGAYDVQQDExFpZGVudGl0
eS5saW5rZXJkLjAeFw0yMzA3MjgxMTE3MTZaFw0yNDA3MjcxMTE3MzZaMBwxGjAY
BgNVBAMTEWlkZW50aXR5LmxpbmtlcmQuMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcD
QgAEz3DKOGGWtZuQwe7KGGBeDRs9++YoTWY5DJb+6Z9nPCZwYCmQS2mfZ6zhHThG
gKSrL9Fwe6qqxkZwLzgjOK1E5KNhMF8wDgYDVR0PAQH/BAQDAgEGMB0GA1UdJQQW
MBQGCCsGAQUFBwMBBggrBgEFBQcDAjAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQW
BBSst/3DVP4w6eUnSREOuZUCrmqcrzAKBggqhkjOPQQDAgNJADBGAiEApPwW2MkY
O9JvtppHaTQEO7ysMCbkGQt+wS2L4slerA0CIQDJ+XPM3OZpeb66naP3bVmRbjtn
RfqDMwEr9KDVCdutLQ==
-----END CERTIFICATE-----
LINKERD2_PROXY_IDENTITY_TOKEN_FILE: /var/run/secrets/kubernetes.io/serviceaccount/token
LINKERD2_PROXY_IDENTITY_SVC_ADDR: linkerd-identity-headless.linkerd.svc.cluster.local.:8080
LINKERD2_PROXY_IDENTITY_LOCAL_NAME: $(_pod_sa).$(_pod_ns).serviceaccount.identity.linkerd.cluster.local
LINKERD2_PROXY_IDENTITY_SVC_NAME: linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local
LINKERD2_PROXY_DESTINATION_SVC_NAME: linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
LINKERD2_PROXY_POLICY_SVC_NAME: linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
LINKERD2_PROXY_TAP_SVC_NAME: tap.linkerd-viz.serviceaccount.identity.linkerd.cluster.local
Mounts:
/var/run/linkerd/identity/end-entity from linkerd-identity-end-entity (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x87hh (ro)
automationservice:
Container ID: containerd://a6acf8ad3a55fa0fbdcd26cbd39a7c65a56ecf55ee02f3a00257e60586edcce9
Image: localhost:32000/tfs/automation:dev
Image ID: localhost:32000/tfs/automation@sha256:6a83b0b3b3c39b5cd09b836645bc3b32bf78982e9cfbb812488db1ff76ecc5c3
Ports: 5050/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 01 Aug 2023 14:48:41 +0000
Finished: Tue, 01 Aug 2023 14:48:41 +0000
Ready: False
Restart Count: 22
Limits:
cpu: 2
memory: 2Gi
Requests:
cpu: 50m
memory: 512Mi
Liveness: http-get http://:8080/q/health/live delay=2s timeout=10s period=10s #success=1 #failure=3
Readiness: http-get http://:8080/q/health/ready delay=2s timeout=10s period=10s #success=1 #failure=3
Environment:
KUBERNETES_NAMESPACE: tfs (v1:metadata.namespace)
CONTEXT_SERVICE_HOST: contextservice
DEVICE_SERVICE_HOST: deviceservice
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x87hh (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-x87hh:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
linkerd-proxy-init-xtables-lock:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
linkerd-identity-end-entity:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 65s (x448 over 90m) kubelet Back-off restarting failed container
Running kubectl logs pod/automationservice-<id> -c automationservice -n tfs returns:
exec java -Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -XX:+ExitOnOutOfMemoryError -cp . -jar /deployments/quarkus-run.jar
Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:48)
at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:25)
Caused by: java.lang.Error: java.io.FileNotFoundException: /usr/lib/jvm/java-11-openjdk-11.0.20.0.8-2.el8.x86_64/lib/tzdb.dat (No such file or directory)
at java.base/sun.util.calendar.ZoneInfoFile$1.run(ZoneInfoFile.java:261)
at java.base/sun.util.calendar.ZoneInfoFile$1.run(ZoneInfoFile.java:251)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/sun.util.calendar.ZoneInfoFile.<clinit>(ZoneInfoFile.java:251)
at java.base/sun.util.calendar.ZoneInfo.getTimeZone(ZoneInfo.java:588)
at java.base/java.util.TimeZone.getTimeZone(TimeZone.java:577)
at java.base/java.util.TimeZone.setDefaultZone(TimeZone.java:682)
at java.base/java.util.TimeZone.getDefaultRef(TimeZone.java:653)
at java.base/java.util.TimeZone.getDefault(TimeZone.java:642)
at java.base/java.time.ZoneId.systemDefault(ZoneId.java:272)
at jdk.zipfs/jdk.nio.zipfs.ZipUtils.dosToJavaTime(ZipUtils.java:120)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.cen(ZipFileSystem.java:2299)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2286)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getEntry(ZipFileSystem.java:1427)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.newInputStream(ZipFileSystem.java:594)
at jdk.zipfs/jdk.nio.zipfs.ZipPath.newInputStream(ZipPath.java:721)
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.newInputStream(ZipFileSystemProvider.java:275)
at java.base/java.nio.file.Files.newInputStream(Files.java:156)
at io.smallrye.common.classloader.ClassPathUtils.readStream(ClassPathUtils.java:196)
at io.smallrye.config.source.yaml.YamlConfigSource.<init>(YamlConfigSource.java:68)
at io.quarkus.config.yaml.runtime.ApplicationYamlConfigSourceLoader.loadConfigSource(ApplicationYamlConfigSourceLoader.java:26)
at io.smallrye.config.AbstractLocationConfigSourceLoader.addConfigSource(AbstractLocationConfigSourceLoader.java:217)
at io.smallrye.config.AbstractLocationConfigSourceLoader.addConfigSource(AbstractLocationConfigSourceLoader.java:212)
at io.smallrye.config.AbstractLocationConfigSourceLoader.access$300(AbstractLocationConfigSourceLoader.java:49)
at io.smallrye.config.AbstractLocationConfigSourceLoader$ConfigSourcePathConsumer.accept(AbstractLocationConfigSourceLoader.java:303)
at io.smallrye.config.AbstractLocationConfigSourceLoader$ConfigSourcePathConsumer.accept(AbstractLocationConfigSourceLoader.java:290)
at io.smallrye.common.classloader.ClassPathUtils.lambda$consumeAsPath$0(ClassPathUtils.java:102)
at io.smallrye.common.classloader.ClassPathUtils.processAsPath(ClassPathUtils.java:133)
at io.smallrye.common.classloader.ClassPathUtils.consumeAsPath(ClassPathUtils.java:101)
at io.smallrye.common.classloader.ClassPathUtils.consumeAsPaths(ClassPathUtils.java:85)
at io.smallrye.config.AbstractLocationConfigSourceLoader.tryClassPath(AbstractLocationConfigSourceLoader.java:128)
at io.smallrye.config.AbstractLocationConfigSourceLoader.loadConfigSources(AbstractLocationConfigSourceLoader.java:93)
at io.smallrye.config.AbstractLocationConfigSourceLoader.loadConfigSources(AbstractLocationConfigSourceLoader.java:76)
at io.quarkus.config.yaml.runtime.ApplicationYamlConfigSourceLoader$InClassPath.getConfigSources(ApplicationYamlConfigSourceLoader.java:34)
at io.quarkus.config.yaml.runtime.ApplicationYamlConfigSourceLoader$InClassPath.getConfigSources(ApplicationYamlConfigSourceLoader.java:29)
at io.quarkus.runtime.configuration.RuntimeConfigSourceProvider.getConfigSources(RuntimeConfigSourceProvider.java:21)
at io.quarkus.runtime.configuration.ConfigUtils.addSourceProvider(ConfigUtils.java:172)
at io.quarkus.runtime.generated.Config.<clinit>(Config.zig:431)
at io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:60)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at java.base/java.lang.Class.newInstance(Class.java:584)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:65)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:42)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:119)
at io.quarkus.runner.GeneratedMain.main(GeneratedMain.zig:29)
... 6 more
Caused by: java.io.FileNotFoundException: /usr/lib/jvm/java-11-openjdk-11.0.20.0.8-2.el8.x86_64/lib/tzdb.dat (No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at java.base/sun.util.calendar.ZoneInfoFile$1.run(ZoneInfoFile.java:255)
... 53 more
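The root cause visible in the stack trace is that the JDK path baked into the image at build time (/usr/lib/jvm/java-11-openjdk-11.0.20.0.8-2.el8.x86_64) no longer exists inside the running container, which typically happens when a cached image layer was built against an older base-image JDK. A plausible remediation, sketched under the assumption that the standard TFS build/deploy flow and the image tag shown above are used (the Dockerfile path is an assumption; adjust to your checkout):
# Rebuild the automation image without cached layers so the JDK path
# resolved at build time matches the runtime base image, then redeploy.
docker build --no-cache -t localhost:32000/tfs/automation:dev -f ./src/automation/Dockerfile ./src/automation/
docker push localhost:32000/tfs/automation:dev
kubectl -n tfs rollout restart deployment/automationservice
kubectl -n tfs rollout status deployment/automationservice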
Sequence of actions that resulted in the bug
- Deploy TeraFlowSDN (commit 8cd8e0d9) on MicroK8s 1.24/stable running on Ubuntu 22.04.2 LTS.
- Run ./src/tests/ofc22/run_test_01_bootstrap.sh.
- test_scenario_kpis_created fails: 0 of the 16 expected KPIs are created.
- Inspecting the pods shows automationservice in CrashLoopBackOff.
Document the explicit error
- The pytest traceback, the pod description, and the container log with the Java stack trace are included above; build_automation.log is attached.
- Components affected: automationservice (crashes at startup) and, as a consequence, monitoringservice (no KPIs are created during bootstrap).
Expected behaviour
All tests in run_test_01_bootstrap.sh pass, with the automationservice pod running and ready.
References
N/A