IMEDNets: Image-to-Motion Encoder-Decoder Networks

This strand of research develops deep neural network architectures that enable robots to learn visuomotor skills, specifically translating visual inputs into robot motion trajectories. Early contributions used convolutional encoder-decoder networks to robustly map digit images to dynamic movement primitives (DMPs) for robot handwriting tasks, with spatial transformer modules added to handle arbitrarily transformed inputs and improve generalization. Subsequent work introduced a trajectory-level loss function that directly optimizes the similarity of the generated trajectories rather than the underlying DMP parameters, making training more effective and yielding higher-quality learned motions. More recent work employs recurrent architectures that predict complex interactions, such as human–robot or robot–robot handovers, and uses simulation-based training data augmentation to achieve strong real-world performance even with minimal calibration. Collectively, this body of work advances the ability of robots to flexibly and reliably translate visual information into precise motor actions in diverse environments.
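
As a rough illustration of the basic idea behind these networks, the sketch below (in PyTorch) shows a convolutional encoder that compresses an input image into a latent vector and a fully connected decoder that maps that vector to DMP parameters. The layer sizes, image resolution, and parameter layout are assumptions for illustration only, not the published architecture, and the sketch omits the spatial transformer module and the trajectory-level loss described above.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageToDMPNet(nn.Module):
    """Convolutional encoder + fully connected decoder mapping an image
    to DMP parameters (forcing-term weights, goal, duration)."""

    def __init__(self, n_basis=25, n_dof=2, latent_dim=64):
        super().__init__()
        self.n_basis, self.n_dof = n_basis, n_dof
        # Encoder for 40x40 grayscale input (resolution is an assumption).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2),   # 40 -> 20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2),  # 20 -> 10
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 10 * 10, latent_dim),
            nn.ReLU(),
        )
        # Decoder outputs n_dof * n_basis forcing-term weights,
        # n_dof goal coordinates, and one duration value.
        self.decoder = nn.Linear(latent_dim, n_dof * n_basis + n_dof + 1)

    def forward(self, img):
        out = self.decoder(self.encoder(img))
        w = out[:, : self.n_dof * self.n_basis].view(-1, self.n_dof, self.n_basis)
        goal = out[:, self.n_dof * self.n_basis : -1]
        tau = F.softplus(out[:, -1:])  # keep the predicted duration positive
        return w, goal, tau


# Example: predict DMP parameters for a batch of digit-like images.
net = ImageToDMPNet()
images = torch.randn(8, 1, 40, 40)            # stand-in for digit images
weights, goal, tau = net(images)
print(weights.shape, goal.shape, tau.shape)   # (8, 2, 25), (8, 2), (8, 1)

In the published work, training does not stop at the predicted parameters: the trajectory-level loss rolls the DMP out into a trajectory and compares it with the demonstrated motion, a step this sketch leaves out.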

Roles

Nov 1 2019 - Feb 7 2020: Senior Assistant | Postdoc. @ JSI

Aug 1 2019 - Oct 31 2019: Visiting Researcher @ ATR

May 1 2018 - Apr 30 2019: Guest Researcher | Postdoc. @ ATR

Awards

Videos

IEEE TNNLS 2024: Simulation-Aided Handover Prediction From Video Using Recurrent Image-to-Motion Networks

Video: Matija Mavsar’s YouTube Channel. Credit: Matija Mavsar, Jožef Stefan Institute.

ICRA 2019: Learning to Write Anywhere with Spatial Transformer Image-to-Motion Encoder-Decoder Networks

Video: Barry Ridge’s YouTube Channel. Credit: Barry Ridge, Rok Pahič, Jožef Stefan Institute.

Publications

Simulation-Aided Handover Prediction From Video Using Recurrent Image-to-Motion Networks. IEEE Transactions on Neural Networks and Learning Systems, 2024.
Training of Deep Neural Networks for the Generation of Dynamic Movement Primitives. Neural Networks, 2020.
Learning to Write Anywhere with Spatial Transformer Image-to-Motion Encoder-Decoder Networks. 2019 International Conference on Robotics and Automation (ICRA), 2019.
Convolutional Encoder-Decoder Networks for Robust Image-to-Motion Prediction. Proceedings of the 28th International Conference on Robotics in Alpe-Adria-Danube Region (RAAD 2019), 2019.