
cso-mark · 06-06-2023, 09:28 PM

We have a couple of objects in the scene (loaded from obj / stl files), say, a Coke can and a pen. We can already take a simulated RGB image of these objects. However, we would also like to simulate a semantic segmentation. Specifically, this is an image corresponding to the RGB image in which every pixel is black, red, or green. A pixel should be red if it corresponds to the Coke can, green if it corresponds to the pen, and black otherwise. I have attached a picture demonstrating the type of output I'm aiming for.

The dumb way to do this is to set visibility=False for every item in the scene except the Coke can, and then take an image. Repeat for the pen. Then binarize both images and composite them on top of each other. But I'm wondering if RoboDK has some existing functionality to accomplish this?
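The binarize-and-composite part of that per-object approach can be sketched with NumPy alone. This is a minimal sketch, assuming each object has already been rendered by itself against a black background; the names `binarize` and `composite_masks`, the threshold, and the synthetic images are illustrative assumptions, not RoboDK functionality.

```python
import numpy as np

def binarize(rgb, threshold=0):
    """Return a boolean mask that is True wherever any channel exceeds `threshold`."""
    return rgb.max(axis=2) > threshold

def composite_masks(masks_and_colors, shape):
    """Paint each boolean mask with its label color onto a black image.

    Later entries overwrite earlier ones wherever masks overlap.
    """
    out = np.zeros((*shape, 3), dtype=np.uint8)
    for mask, color in masks_and_colors:
        out[mask] = color
    return out

# Synthetic stand-ins for two single-object renders (hypothetical data, RGB order).
h, w = 4, 6
can_img = np.zeros((h, w, 3), dtype=np.uint8)
can_img[1:3, 0:2] = (180, 20, 30)   # pixels covered by the can
pen_img = np.zeros((h, w, 3), dtype=np.uint8)
pen_img[0:4, 4:5] = (40, 40, 200)   # pixels covered by the pen

RED, GREEN = (255, 0, 0), (0, 255, 0)
seg = composite_masks(
    [(binarize(can_img), RED), (binarize(pen_img), GREEN)], (h, w)
)
```

In this sketch the paint order, not the scene depth, decides which label an overlapping pixel receives.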

**Albert** · 06-07-2023, 06:41 AM

I understand you achieved the semantic segmentation and you want to speed it up?

Can you share the sample project or image you are referring to?

cso-mark · 06-08-2023, 05:04 PM

Sorry, it seems like the uploaded image did not attach. See the image labeled "RGB mask" at https://raw.githubusercontent.com/dmlc/w.../masks.png for the type of output we are trying to achieve.

Unfortunately I cannot share the project.

We have a real neural net that can generate these sorts of outputs on real image data. However, our RoboDK scene is not photorealistic, so we cannot run the neural net in simulation. But since we do have "ground truth" poses of all the objects during simulation, one can in principle generate the semantic segmentation by rendering each object as a pure color into the simulated RoboDK camera.

**Sam** · 06-13-2023, 12:44 PM

Doing each individual object won't work if you have overlaps between objects.

Here's my suggestion:
1. Set each object's color (all surfaces) to the desired segmentation color. You can do this with the API or manually.
2. Take a screenshot with a simulated camera, making sure to use the proper intrinsic parameters. The image can be retrieved with the API.
3. Using OpenCV, apply color segmentation and binary thresholds to create a mask for each color.
4. Merge the masks into one final image.
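The thresholding and merging steps might look like the following. It is a sketch under assumptions: the render is an RGB array in which each object was given a flat color, `LABEL_COLORS`, `color_mask`, `segment_by_color`, and the `tolerance` value are hypothetical names and settings, and plain NumPy comparisons stand in for OpenCV's `cv2.inRange` so the example runs without OpenCV installed. The tolerance is there in case lighting shifts the rendered color away from the exact value that was set.

```python
import numpy as np

# Hypothetical mapping from object name to its flat segmentation color (RGB order).
LABEL_COLORS = {
    "coke_can": (255, 0, 0),
    "pen": (0, 255, 0),
}

def color_mask(rgb, color, tolerance=40):
    """True where every channel is within `tolerance` of `color`."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(rgb.astype(np.int16) - np.array(color, dtype=np.int16))
    return (diff <= tolerance).all(axis=2)

def segment_by_color(rgb, label_colors, tolerance=40):
    """Return (segmentation image, per-object boolean masks) for one render."""
    out = np.zeros_like(rgb)  # unmatched pixels stay black
    masks = {}
    for name, color in label_colors.items():
        masks[name] = color_mask(rgb, color, tolerance)
        out[masks[name]] = color
    return out, masks

# Synthetic stand-in for a single render of the recolored scene (hypothetical data).
render = np.full((4, 6, 3), 90, dtype=np.uint8)   # gray background
render[1:3, 0:3] = (230, 10, 5)                   # slightly shaded red can
render[0:4, 2:3] = (12, 240, 8)                   # pen drawn in front of the can

seg, masks = segment_by_color(render, LABEL_COLORS)
```

Because all objects are rendered together in one image, the renderer has already resolved which object is in front, so each pixel matches at most one color and overlaps need no special handling.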

We used this principle to train a model in a synthetic environment based on ground truth, but we only needed binary masks.

Simulating Semantic Segmentation