01-19-2023, 12:21 AM (This post was last modified: 01-19-2023, 12:23 AM by sig.johnnson.)
I am attempting to use both the C and Python APIs to communicate with the same instance of RoboDK. While both are connected to RoboDK at the same time, they will not issue API calls at the same time. So far, I have tested with both APIs connected on the same port.
I have observed that the C API is significantly less reliable when the Python API is simultaneously connected. When using both, the Python API opened the RoboDK instance and connected first, ~10 seconds before the C API. When using just the C API, RoboDK was opened manually and the C API connected ~10 seconds later.
1. Should I have both APIs connect to the same port?
2. Is having two APIs connected at once a supported use case, or would you expect that it's going to be pretty much impossible to get it to work like this?
3. I noticed that in general, the C API connection to RoboDK is much less reliable than the Python connection. Calls often hang on struct Item_t item = _RoboDK_recv_Item(inst); or _RoboDK_check_status(inst). Is this expected? Any tips to improve the robustness, such as checking for timeouts and attempting to reconnect?
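On the timeout question in point 3, one general way to keep a blocking socket read from hanging forever is to give the socket a receive timeout. The sketch below is plain Python using a local socket pair, not RoboDK API code; recv_line and its parameters are hypothetical names used only for illustration. The same idea is available for C sockets through the SO_RCVTIMEO socket option.

```python
import socket


def recv_line(sock, timeout_s=2.0, max_bytes=4096):
    """Read one newline-terminated line, raising socket.timeout instead of hanging.

    Hypothetical helper for illustration; it is not part of the RoboDK API.
    The timeout applies to each recv() call, not to the line as a whole.
    """
    sock.settimeout(timeout_s)
    buf = bytearray()
    while not buf.endswith(b"\n"):
        if len(buf) >= max_bytes:
            raise ValueError("line exceeds max_bytes without a newline")
        chunk = sock.recv(1)  # raises socket.timeout if no data arrives in time
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf[:-1].decode("utf-8", errors="replace")


if __name__ == "__main__":
    a, b = socket.socketpair()  # local stand-in for a client/server connection
    b.sendall(b"ready\n")
    print(recv_line(a, timeout_s=0.5))
    try:
        recv_line(a, timeout_s=0.2)  # nothing was sent, so this times out
    except socket.timeout:
        print("timed out, the caller can now decide to reconnect")
    a.close()
    b.close()
```

With a timeout in place, a read that would otherwise block indefinitely turns into an exception the caller can handle, for example by closing the socket and reconnecting.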
My system:
- Windows 10
- C API, building an .exe through Visual Studio
- Python 3.7
- RoboDK 5.4.3
in the Python API at least 2 seconds before attempting to connect to the C API. This should prevent any complications.
I am now seeing what I believe are 2 possibly related issues.
Issue 1
When the C API starts, I see "The RoboDK API is connected!" (in _RoboDK_connect_smart) and then it just stops there without executing any more of my code. Tracing this back, it looks like I call RoboDK_getItem(), which succeeds 3 times, getting the following items:
- ROBOT
- FRAME
- OBJECT
On the 4th call, it attempts to get many targets, but after getting the item, it calls _RoboDK_check_status(inst). In that function, it follows else if (status < 100), calls _RoboDK_recv_Line(inst, strProblems), and hangs there. There is no crash, exception, or error; it just stops. In this routine, I am attempting to get many targets, the last of which doesn't exist and so will not be valid, which indicates that the routine is done.
What is really weird is that if I start and restart it several times, it sometimes works.
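The lookup pattern described above (request targets one after another until one comes back invalid) can be sketched in Python as follows. The lookup function is injected so the loop can be run without RoboDK; collect_targets, the "Target N" naming scheme, and FakeItem are hypothetical and exist only for illustration.

```python
def collect_targets(get_item, prefix="Target ", max_count=1000):
    """Collect consecutive targets until the first invalid one.

    get_item: callable taking a name and returning an object with Valid().
    The naming scheme (prefix + index) is an assumption for this sketch.
    """
    found = []
    for index in range(1, max_count + 1):
        item = get_item(prefix + str(index))
        if not item.Valid():
            break  # the first missing target marks the end of the sequence
        found.append(item)
    return found


class FakeItem:
    """Stand-in for an API item handle, used only to run the sketch offline."""

    def __init__(self, name, valid):
        self.name = name
        self._valid = valid

    def Valid(self):
        return self._valid


if __name__ == "__main__":
    station = {"Target 1", "Target 2", "Target 3"}
    targets = collect_targets(lambda name: FakeItem(name, name in station))
    print([t.name for t in targets])
```

With the RoboDK Python API, get_item could be something like lambda name: RDK.Item(name, ITEM_TYPE_TARGET), since item handles expose Valid(). The max_count bound keeps the loop finite even if the lookup never reports an invalid item.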
Issue 2
In this case, the C API starts successfully and executes hundreds of commands successfully. Unfortunately, it eventually just hangs. I have traced this back to a call to RoboDK_getItem() for a target. I believe that it is always (or nearly always) happening on a target that does not exist. My intent is to check whether the target exists by using Item_Valid() on the next line, but it seems that it just gets stuck when targets don't exist.
It's a little hard for me to tell due to how my code works right now, but I am pretty sure that it often successfully gets targets that don't exist and recognizes them as invalid (which is the behavior I intended to use).
Is there something weird happening when trying to return NULL through the API that could cause intermittent failure?
Can you try connecting the C API first, then the Python API? This issue could be related to the Safe mode and Auto Update flags which are defined with every new connection.
The API should work even if you have 2 or more clients connected to the same instance of RoboDK and the same port.
There may be a bug with a specific function such as getItem when you use multiple clients. Calling RDK.Render(False) may help so you won't provoke a new render event with every call of the API.
It would be great if you can share a RoboDK project to reproduce this issue. Otherwise, sharing a debug log file can also help us chase this crash. You can obtain this debug log file by following these steps:
1. Close RoboDK
2. Start RoboDK by double clicking the file:
C:/RoboDK/RoboDK-Debug.bat
3. Try to reproduce the problem in RoboDK until it crashes.
4. Right after the crash, send me this file:
C:/RoboDK/bin/RoboDK.debug.txt
I created a shareable version of my workstation that has proprietary object models removed. Hopefully this will not impact its usefulness for debugging. (Attached as "shareable.rdk")
When conducting the debugging you recommended, I am already running with the -NOUI flag, so I did not attempt to call RDK.Render(False). Let me know if I should go back and try that.
Since I am running with -NOUI, I added the command line arguments from RoboDK-Debug.bat directly to my code rather than starting RoboDK using the .bat file. Again, let me know if I should go back and try starting it that way. Command line arguments below:
["-NOSPLASH", "-SKIPMAINT", "-NOUI", "-DEBUG", "--enable-logging --log-level=3 --v=1", "^>logfile.stdout.txt"]
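One thing that may be worth checking in the list above: when arguments are passed as a list rather than through a shell, a token like "^>logfile.stdout.txt" is typically handed to the program as a literal argument instead of redirecting output, and the space-separated logging flags travel as a single argument. The sketch below shows one way to pass each flag separately and capture stdout from Python. The executable path and the function names are assumptions; the flags themselves are copied from the list above, and whether RoboDK treats them identically when split has not been verified here.

```python
import subprocess

# Assumed install location; adjust to the actual RoboDK executable path.
ROBODK_EXE = "C:/RoboDK/bin/RoboDK.exe"


def build_args(debug=True):
    """Return the command-line flags as one list entry per argument."""
    args = ["-NOSPLASH", "-SKIPMAINT", "-NOUI"]
    if debug:
        # Flags taken from the post, split so each one is its own argument.
        args += ["-DEBUG", "--enable-logging", "--log-level=3", "--v=1"]
    return args


def launch_robodk(log_path="logfile.stdout.txt", exe=ROBODK_EXE):
    """Start RoboDK and send its stdout/stderr to log_path (no shell involved)."""
    with open(log_path, "w") as log_file:
        # The child inherits the file handle, so the parent can close its copy.
        return subprocess.Popen([exe] + build_args(),
                                stdout=log_file, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    # Only print the command here; call launch_robodk() to actually start it.
    print([ROBODK_EXE] + build_args())
```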
Connecting with the Python API first provoked a hang, as described in issue 1. The log file is attached: "python_first_issue_1.txt". Note that:
1. Both the Python and C APIs are attempting to get handles for almost every item in the workstation. This is the intended behavior.
2. It looks like some of my console prints ended up in the debug output. I believe most lines without a timestamp are something that I added. Go ahead and ignore those.
Connecting with the C API first was interesting. The C API connected then provoked a hang, as described in issue 1. This produced the attached log file "c_first_issue_1_c_log.txt". (I removed some file path information from the log file that might be too confidential for the forums. The placeholder text should be obvious.) After the C API hung, I started the Python API, which was able to successfully connect to RoboDK and produced the attached log file "c_first_issue_1_python_log.txt". Note that as usual, the Python calls never hung. The Python program successfully retrieved handles for all desired workstation objects.
I did notice one more thing that may be useful. My current architecture involves farming out operations to multiple virtual machines running on my laptop (exploits multiple cores and gets around Python's GIL). My physical machine sends information to the virtual machine, then the virtual machine conducts RoboDK operations based on that and returns a result to the physical machine.
I noticed that the issues I am describing with hangs occur exclusively on virtual machines. When running code directly on my laptop, I am not able to reproduce any of the issues described in this post.
The virtual machines are on VMware Workstation 15 Pro, 15.5.7 build-17171714, running Windows 10, 64-bit. They network with the physical machine through a socket created in Python. The Python port does NOT conflict with the RoboDK port.
Thank you for such a detailed report. Looking at the log for the first issue, I see the following, which was a bug on our end in the C implementation of the RoboDK API:
Code:
Debug: Running API Command: "\x00"
This has just been fixed on GitHub with a small edit (tempString variable when establishing the connection): https://github.com/RoboDK/RoboDK-API/blo...dk_api_c.c
This may help establish the connection properly, without delays, and should make the C API more robust.
Also, on Linux we noticed that adding the -API_NODELAY argument helped speed things up. Example:
Code:
["-NOUI", "-DEBUG", ..., "-API_NODELAY"]
TCP no-delay can also be enabled on the C client side by changing this new variable in _RoboDK_connect:
Code:
int use_nodelay = 1; // Change from 0 to 1
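For context, "no delay" here presumably refers to the TCP_NODELAY socket option, which disables Nagle's algorithm so that small messages are sent immediately rather than being batched. The sketch below sets that option on a client socket in Python. It is a generic illustration, not the code from the C API, and make_nodelay_socket is a hypothetical helper name.

```python
import socket


def make_nodelay_socket():
    """Create a TCP client socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero TCP_NODELAY asks the stack to send small writes immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
    print("TCP_NODELAY enabled:", enabled)
    s.close()
```

This matters most for request/response protocols made of many short messages, where batching can add latency to every round trip.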
Your STDOUT/Debug prints should not do any harm to your project. Quite the opposite: they can help narrow down the issue.
After pulling the updated API and making the no delay modifications you mentioned, I am still experiencing the same issue.
Unfortunately, I have found that things are still working perfectly on my physical machine but badly on my virtual machines. I think this is likely due to a newer version of the RoboDK client and/or API that's on the virtual machines.
My physical machine is running RoboDK 5.4.3, while my virtual machines are running the one available on the website (5.5.2). Can you please provide the historical installer for 5.4.3 so that I can check whether the RoboDK version impacts the issues I am seeing?
By issue in your last post, do you mean the connection is very slow?
When you compare physical machine vs. virtual machine: can you provide more information about the system? (Windows, Linux, ...)
There are ways to speed up getting all items in your tree. This is possible with one API call (but not documented). What information about the tree do you need to retrieve?
I mean that I am still experiencing issue 1 from my earlier post, where calls to get tree items hang fairly often.
Physical machine:
- Windows 10 Pro | Version 20H2 | OS Build 19042.1706 | Windows Feature Experience Pack 120.2212.4170.0
- Python 3.11.1
- C code compiled to .exe (Visual Studio is installed on the physical machine only, so it is possible that Visual Studio is intervening somehow. I think this is unlikely, because the executable binary I am using is identical between virtual and physical machines.)
Virtual machine:
- Windows 10 Enterprise LTSC | Version 1809 | OS Build 17763.3887
- Python 3.11.1
- C code compiled to .exe
- VMware Workstation 15 Pro | Version 15.5.7 build-17171714
I could potentially try getting all tree items, but that seems like a sort of hacky workaround. I need to be able to leave this program running unattended for days at a time, so I would like to chase down all intermittent issues to prevent them from causing problems down the road. If I have to do an automated reconnection, I do not want it to hang.
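For unattended runs, one possible guard against a hung call is to run each API call in a worker thread, give up on it after a deadline, and reconnect before the next attempt. The sketch below is generic Python; call_with_timeout, call_with_retry, and the reconnect callback are hypothetical names, and nothing here is part of the RoboDK API. A timed-out worker thread is abandoned rather than killed, so a separate process would be a stronger isolation boundary if the hang never clears.

```python
import threading
import time


class CallTimeout(Exception):
    """Raised when a call does not finish within its deadline."""


def call_with_timeout(func, timeout_s):
    """Run func() in a daemon thread and wait at most timeout_s seconds."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func()
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The worker is abandoned, not killed; it may still finish later.
        raise CallTimeout("call exceeded %.1f s" % timeout_s)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def call_with_retry(func, reconnect, timeout_s=5.0, attempts=3):
    """Retry func() after calling reconnect() whenever an attempt times out."""
    for attempt in range(attempts):
        try:
            return call_with_timeout(func, timeout_s)
        except CallTimeout:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the timeout
            reconnect()  # hypothetical hook: drop and re-open the connection


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_lookup():
        calls["n"] += 1
        if calls["n"] == 1:
            time.sleep(1.0)  # simulate a call that hangs once
        return "item-handle"

    print(call_with_retry(flaky_lookup, lambda: print("reconnecting"), 0.2))
```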
I did some more testing with the link to V5.4.3 that you provided. It looks like I am continuing to experience the same issue on my virtual machine even when using RoboDK 5.4.3. However, I noticed that the version of RoboDK on my physical machine is the build from 2022-05-26, while the installer you shared seems to install the build from 2022-07-24. I have attached screenshots of the About window for each version.
It looks like there was some change to the API between those two versions (2022-07-13: "Improved API"). I suspect that this change may be related to the instability that I am seeing. Do you still have a way to build the V5.4.3 version from 2022-05-26 that is working for me? I would like to install it on my virtual machine to narrow down the issue.
This was causing too much trouble, so I decided to go back and write more stuff in C to avoid using virtual machines altogether. I am going to go ahead and mark this as resolved (even though the issue is still lurking out there somewhere). Feel free to reopen it, but I am out of time to commit to investigating the problem.
This refactor totally fixed my issue, but I am a little nervous that whatever underlying problem exists is still out there. If you end up finding a smoking gun at a later date, please let me know.
(Also, when doing the refactor, I found and opened an unrelated issue on the GitHub -- should be an easy fix.)
I'm sorry for the late reply. @Phillip and I are still chasing this issue.
We started implementing a function called RoboDK_getItemList which allows you to build the tree. For some reason it is failing to properly retrieve the item pointers. This function is available with the latest version of the API but still not working. Once we find the issue you should be able to retrieve all the tree information like this (or with your custom call):
Code:
// Test retrieving all items at once
#define MAX_ITEMS 1000
struct Item_t itemlist[MAX_ITEMS];
int size_out = 0;
RoboDK_getItemList(&rdk, itemlist, MAX_ITEMS, &size_out);
if (size_out > MAX_ITEMS) {
    // More items exist than the buffer can hold: warn and clamp the count
    fprintf(stderr, "Warning! Max item list size exceeded\n");
    size_out = MAX_ITEMS;
}
printf("Items in the station: %i\n", size_out);
for (int i = 0; i < size_out; i++) {
    char item_name[MAX_STR_LENGTH];
    Item_Name(&itemlist[i], item_name);
    printf("  %i -> %s\n", i, item_name);
}
Is this something that would work for you? We should fix the bug to make sure it is 100% robust.
Can you confirm if you are able to reproduce this issue in C but not in Python?