This work is funded by the European Commission through the P2CODE - “Programming Platform for intelligent COllaborative DEployments over heterogeneous edge-IoT environments” under GA No. 101093069
A fix can be found in the branch feat/139-ubi-p4-driver-does-not-correctly-retrieve-resources.
The relevant MR (!217 (closed)) is marked as draft atm, to make sure that we do not have any further errors.
For the same reason, the current issue is kept open.
I will check that once again (either in a clean hackfest3 VM or the "normal" TFS VM). Maybe there was some merge error on my side, because even though I added these changes, the bootstrap functional test still fails (attachment).
@famelis how exactly did you manage to get the P4 devices onboarded to TFS and 01_bootstrap to pass? A log from your execution after the patch would be helpful. The part I'm not sure about is whether you used the fabric_v1model.p4 files or the default main.p4 that ships with TFS.
The logs that are already available use the fabric-int-v1model compilation files, the ones with which I started this thread. The common issue in these logs is this part, once again for the AddDevice function:
My findings so far: this error message, i.e. "A object has no attribute B" (a Python AttributeError), pertains to the __getattr__ method implemented in p4_manager, which is defined for the _MeterConfig and _IdleTimeout classes. From the highlighted screenshot I cannot identify which class actually caused it, so I need to look closer.
For the latter log I actually applied the pytest flag --full-trace to get even more verbose output, but it only gives slightly more detailed insight into how pytest is instantiated during test execution, and not much more about the root cause itself.
I can see that the P4Manager class defines a p4_objects field, whose purpose is self-explanatory.
__getattr__ is supposed to return getattr(self._msg, name). If this object is seen as 'NoneType', there is a chance that self._msg, which is assigned from p4runtime_pb2.MeterConfig(), is None (?)
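To make the hypothesis above easier to check, here is a minimal sketch of the delegation pattern, not the actual p4_manager code. FakeMeterConfig and MeterConfigWrapper are hypothetical stand-ins (the real classes wrap p4runtime_pb2 messages). It shows that the AttributeError text names the type of the object on which the lookup finally failed, so a None _msg and a None manager would produce the same message and only the traceback line can tell them apart.

```python
class FakeMeterConfig:
    """Hypothetical stand-in for p4runtime_pb2.MeterConfig (keeps the sketch dependency-free)."""

    def __init__(self):
        self.cir = 0


class MeterConfigWrapper:
    """Hypothetical wrapper that forwards unknown attributes to self._msg."""

    def __init__(self, msg):
        self._msg = msg

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; delegate to the wrapped message.
        return getattr(self._msg, name)


def attribute_error_text(obj, name):
    """Return the AttributeError message for obj.name, or None if the lookup succeeds."""
    try:
        getattr(obj, name)
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Wrapped message present: the error names the wrapped class.
    print(attribute_error_text(MeterConfigWrapper(FakeMeterConfig()), "p4_objects"))
    # Wrapped message is None: the error names NoneType.
    print(attribute_error_text(MeterConfigWrapper(None), "p4_objects"))
    # The object itself is None: same NoneType message.
    print(attribute_error_text(None, "p4_objects"))
```

Under these assumptions, the message alone does not prove that _msg is None; the file and line in the traceback show which object was actually dereferenced.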
This might be irrelevant, but I also noticed a behaviour of this bootstrap program that occurs at the first execution after a VM reboot / reinstantiation: test progress stops at 50%, as if it were waiting for something. A test rerun shows the error as described above.
On the left pane there is the functional test execution (with verbose output from pytest); on the right, kubectl logs for the Device pod. What can be seen is that during execution of the test_devices_bootstraping method, where 'NoneType' object has no attribute 'p4_objects' occurs, the Device pod reports an AddDevice exception, i.e.
File "/var/teraflow/device/service/drivers/p4/p4_driver.py", line 197, in GetConfig
    obj_name for obj_name, _ in self.__manager.p4_objects.items()
AttributeError: 'NoneType' object has no attribute 'p4_objects'
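The traceback points at self.__manager being None when p4_objects is read. A minimal sketch of that situation follows, with an explicit guard that fails with a clearer message. SketchDriver and FakeManager are hypothetical names used only for illustration; this is not the p4_driver.py implementation, and the reason the manager is None in TFS is not established here.

```python
class FakeManager:
    """Hypothetical stand-in for P4Manager; only exposes p4_objects."""

    def __init__(self, p4_objects):
        self.p4_objects = p4_objects


class SketchDriver:
    """Hypothetical driver sketch; the manager stays None until something creates it."""

    def __init__(self, manager=None):
        self.__manager = manager

    def get_object_names_unguarded(self):
        # Mirrors the expression in the traceback; fails if the manager is None.
        return [obj_name for obj_name, _ in self.__manager.p4_objects.items()]

    def get_object_names(self):
        # Guarded variant: report the missing manager instead of an AttributeError.
        if self.__manager is None:
            raise RuntimeError("P4 manager is not initialised; was the device connected?")
        return [obj_name for obj_name, _ in self.__manager.p4_objects.items()]


if __name__ == "__main__":
    try:
        SketchDriver().get_object_names_unguarded()
    except AttributeError as exc:
        print("unguarded:", exc)
    try:
        SketchDriver().get_object_names()
    except RuntimeError as exc:
        print("guarded:", exc)
    print(SketchDriver(FakeManager({"table_a": object()})).get_object_names())
```

A guard like this does not fix the root cause, but it would make the Device pod log say directly that the manager was never set up.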
I forgot to add this comment a couple of days ago. I need to verify that, but most likely that is the case. The error you see above comes from the "vanilla" TFS VM, i.e. there are no adjustments made specifically for the hackfest and, importantly device-wise, no mininet environment. The TFS VM out of the box does not include it; there is even an instruction from the previous hackfest event on how to set it up.
So there is a chance that simply not having mininet installed and started in the background (with the proper topology) could cause this AddDevice exception. After all, it would be naive at best to assume that a P4 device could be registered out of thin air, without the actual environment to run it. My bad for working on a different VM than the one described in the issue.
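One quick way to rule this out before rerunning the bootstrap test is a pre-flight check that something is actually listening where the P4 switch is expected. The sketch below only tests TCP reachability with the standard library; the host and port in the usage example are hypothetical and must be replaced with the values from the device descriptor.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical address of the mininet switch's gRPC endpoint.
    host, port = "127.0.0.1", 50001
    state = "reachable" if is_port_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A successful connection only shows that a listener exists, not that it speaks P4Runtime, but a failure here would already explain why the device cannot be onboarded.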
I know I'm not supposed to be working on a different branch, but simply out of curiosity I tried to recreate this behavior on the feat/hackfest3 branch, with the changes to p4driver/p4manager applied by hand.
Going down this road I found out that the files I want to work on (fabric_v1model compilation files) pass the tests just fine. I attach the logs to back it up: 2024-04-03-tsh.log
I am still trying to troubleshoot the above-mentioned bug, for which I will attach a separate log.
Hi @katsikasg, have you or Alex/Pantelis tried to recreate this setup and the described problem? After a short break I will resume my activities, but let me know how it's been going on your end so far.
To recreate this open issue it is also important to note that there is a drift between the state of the hackfest3 VM as it was delivered for the event and the current state of the feat/hackfest3 branch that can be accessed in the repo. This matters mainly for pytest behavior, because the latest version of the hackfest3 branch contains changes in the P4 service handler which are relevant for the last part of the demo, namely the telemetry toy case.
Therefore, in the attachment you can see a log for the main.p4 program, the very basic one presented at the beginning. There, the functional test which handles the service tries to look for INT-related tables, because the implementation of the P4 service handler contains simple JSON-based functions that operate on them, but these tables are not included in the main.p4 program.
It will not block the mininet ping, though; the functional test just will not be marked as PASSED, and in the web UI the service will have status SERVICESTATUS_PLANNED.
I am mentioning this because it is then propagated to the relevant feat/139-ubi-p4-driver-does-not-correctly-retrieve-resources branch, obviously.
Also in the log you will see that for the device bootstrap test I had some minor issues with experimental/irrelevant features, which I simply commented out in the DeviceClient: