IMEDNets: Image-to-Motion Encoder-Decoder Networks

This strand of research develops deep neural network architectures that let robots learn visuomotor skills, specifically translating visual inputs into robot motion trajectories. Early contributions used convolutional encoder-decoder networks to map digit images robustly to dynamic movement primitives (DMPs) for robot handwriting tasks, and improved generalization by adding spatial transformer modules to handle arbitrarily transformed inputs. A core contribution was a specialized trajectory-level loss function that optimizes trajectory similarity directly rather than error in the abstract DMP parameters, which leads to better-quality learned motions. More recent work employs recurrent neural architectures that predict complex interactions, such as human–robot or robot–robot handovers, and uses simulation-based training data augmentation to achieve strong real-world performance even with minimal calibration. Together, this body of work advances the ability of robots to translate visual information into precise motor actions flexibly and reliably across diverse environments.
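To make the distinction between a parameter-level and a trajectory-level loss concrete, the following is a minimal sketch in plain Python, not the implementation from the papers listed below. It assumes one common one-dimensional DMP formulation with Gaussian basis functions and Euler integration; the function names (`dmp_rollout`, `parameter_loss`, `trajectory_loss`) and the gain values are illustrative assumptions. In actual training the rollout would have to be differentiable so that gradients can pass through it, which this sketch does not attempt.

```python
import math


def dmp_rollout(weights, y0, goal, tau=1.0, dt=0.01,
                alpha_z=48.0, beta_z=12.0, alpha_x=2.0):
    """Euler-integrate a one-dimensional DMP and return the list of positions."""
    n = len(weights)  # number of radial basis functions, assumed >= 2
    # Basis centres evenly spaced in time, mapped onto the phase variable x.
    centres = [math.exp(-alpha_x * i / (n - 1)) for i in range(n)]
    # Widths from the gap between neighbouring centres (a common heuristic).
    widths = [1.0 / (centres[i + 1] - centres[i]) ** 2 for i in range(n - 1)]
    widths.append(widths[-1])

    y, z, x = float(y0), 0.0, 1.0  # position, scaled velocity, phase
    positions = [y]
    for _ in range(int(round(tau / dt))):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        # Forcing term: normalised weighted sum of basis functions, gated by x.
        forcing = x * sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        z_dot = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        y_dot = z / tau
        x_dot = -alpha_x * x / tau
        z, y, x = z + z_dot * dt, y + y_dot * dt, x + x_dot * dt
        positions.append(y)
    return positions


def parameter_loss(pred_w, target_w):
    """Mean squared error measured directly on the DMP weights."""
    return sum((p - t) ** 2 for p, t in zip(pred_w, target_w)) / len(target_w)


def trajectory_loss(pred_w, target_w, y0, goal):
    """Mean squared error between the trajectories the two weight sets produce."""
    pred = dmp_rollout(pred_w, y0, goal)
    target = dmp_rollout(target_w, y0, goal)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


if __name__ == "__main__":
    target = [100.0, -50.0, 30.0, 0.0, 0.0]  # made-up weights for illustration
    early = [target[0] + 20.0] + target[1:]  # error on the first basis function
    late = target[:-1] + [target[-1] + 20.0]  # same-sized error on the last one
    for name, w in (("early", early), ("late", late)):
        print(name, parameter_loss(w, target), trajectory_loss(w, target, 0.0, 1.0))
```

Two weight vectors at the same parameter-space distance from the target can produce different trajectory errors, because each basis function influences the motion at a different phase. That is the intuition behind scoring the trajectory rather than the weights.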

Extended prior work on image-to-motion encoder-decoder deep neural network architectures to handle sequential data in human–robot interaction scenarios, which ultimately led to the RIMEDNet (recurrent image-to-motion encoder-decoder network) model for handover prediction tasks.

Designed the novel STIMEDNet (spatial transformer image-to-motion encoder-decoder network) architecture, which allows a robot to learn handwriting trajectories from images of digits in different poses.
Extended the prior IMEDNet (image-to-motion encoder-decoder network) architecture to use convolutional input layers, demonstrating improved performance with the novel CIMEDNet (convolutional image-to-motion encoder-decoder network) model.
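
A spatial transformer module resamples its input through a transformation predicted by a small localisation network, so that later layers see the digit in a canonical pose. As a rough illustration of the resampling step only, here is a plain-Python sketch of affine grid generation with bilinear sampling. The function name `affine_sample` is hypothetical, the matrix `theta` is supplied by hand rather than predicted, and nothing here is taken from the STIMEDNet code.

```python
import math


def affine_sample(image, theta):
    """Resample a 2-D image (list of rows, at least 2x2) through a 2x3 affine matrix.

    Output pixel coordinates are normalised to [-1, 1], mapped through theta to
    source coordinates, and read with bilinear interpolation (zero outside).
    """
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Zero padding for coordinates that fall outside the source image.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for i in range(h):
        yn = 2.0 * i / (h - 1) - 1.0  # normalised output row coordinate
        row = []
        for j in range(w):
            xn = 2.0 * j / (w - 1) - 1.0  # normalised output column coordinate
            # Affine map from output coordinates to source coordinates.
            xs = theta[0][0] * xn + theta[0][1] * yn + theta[0][2]
            ys = theta[1][0] * xn + theta[1][1] * yn + theta[1][2]
            # Convert back to (fractional) pixel indices.
            c = (xs + 1.0) * (w - 1) / 2.0
            r = (ys + 1.0) * (h - 1) / 2.0
            c0, r0 = math.floor(c), math.floor(r)
            dc, dr = c - c0, r - r0
            # Bilinear blend of the four neighbouring source pixels.
            row.append((1 - dr) * (1 - dc) * pixel(r0, c0)
                       + (1 - dr) * dc * pixel(r0, c0 + 1)
                       + dr * (1 - dc) * pixel(r0 + 1, c0)
                       + dr * dc * pixel(r0 + 1, c0 + 1))
        out.append(row)
    return out


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    mirror = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # horizontal flip
    print(affine_sample(img, identity))
    print(affine_sample(img, mirror))
```

Because every step is a weighted sum, the same operation is differentiable with respect to `theta` when written in a deep learning framework, which is what allows the pose correction to be learned jointly with the rest of the network.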

Video: Matija Mavsar’s YouTube Channel. Credit: Matija Mavsar, Jožef Stefan Institute.

Video: Barry Ridge’s YouTube Channel. Credit: Barry Ridge, Rok Pahič, Jožef Stefan Institute.

Matija Mavsar, Barry Ridge, Rok Pahič, Jun Morimoto, Aleš Ude. Simulation-Aided Handover Prediction From Video Using Recurrent Image-to-Motion Networks. IEEE Transactions on Neural Networks and Learning Systems, 2024.

Rok Pahič, Barry Ridge, Andrej Gams, Jun Morimoto, Aleš Ude. Training of Deep Neural Networks for the Generation of Dynamic Movement Primitives. Neural Networks, 2020.

Barry Ridge, Rok Pahič, Aleš Ude, Jun Morimoto. Learning to Write Anywhere with Spatial Transformer Image-to-Motion Encoder-Decoder Networks. 2019 International Conference on Robotics and Automation (ICRA), 2019.

Barry Ridge, Rok Pahič, Aleš Ude, Jun Morimoto. Convolutional Encoder-Decoder Networks for Robust Image-to-Motion Prediction. Proceedings of the 28th International Conference on Robotics in Alpe-Adria-Danube Region (RAAD 2019), 2019.