TY - GEN
T1 - Marker-Free Multi-Modal Motion Capture for 6-DoF Object Position and Orientation Estimation
AU - Jia, Fuhua
AU - Yang, Xiaoying
AU - Wang, Jiamin
AU - Xue, Ning
AU - Li, Jiawei
AU - Cui, Tianxiang
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - In this work, we present a novel multi-modal, end-to-end, marker-free motion capture framework designed to estimate the six degrees of freedom (6-DoF) states of objects. Traditional motion capture systems often rely on infrared optical, inertial, or magnetic markers to identify and track objects. However, in many application scenarios, such as outdoor environments and robotics development, the use of markers interferes with system operation, and the markers themselves are prone to environmental interference. Our proposed framework tackles these challenges using a two-stage approach, leveraging multimodal sensor fusion techniques. The framework integrates cameras and Light Detection and Ranging (LiDAR) sensors around the workspace, each operating at different frequencies. A data synchronizer controls the triggering of these sensors, ensuring synchronized data collection from multiple sensor streams. In Stage I, the framework focuses on multimodal feature extraction, utilizing multiple modules to process the sensor data streams and extract spatial features. In Stage II, the position and pose extraction module calculates the spatial state of the object by combining the extracted features with the object's spatial state context from previous frames. We validate the framework through experiments on the NVIDIA Isaac digital twin platform and in real-world environments, demonstrating its feasibility and robustness across a variety of test objects. This approach provides a reliable and flexible solution for motion capture in complex environments, eliminating the need for invasive markers. High real-time performance is achieved at each stage and within each submodule by using lightweight neural networks and time-aligned data synchronization. By integrating multimodal sensor fusion and context-based spatial state computation, the proposed method ensures high recognition accuracy, even in challenging cases involving symmetrical objects.
AB - In this work, we present a novel multi-modal, end-to-end, marker-free motion capture framework designed to estimate the six degrees of freedom (6-DoF) states of objects. Traditional motion capture systems often rely on infrared optical, inertial, or magnetic markers to identify and track objects. However, in many application scenarios, such as outdoor environments and robotics development, the use of markers interferes with system operation, and the markers themselves are prone to environmental interference. Our proposed framework tackles these challenges using a two-stage approach, leveraging multimodal sensor fusion techniques. The framework integrates cameras and Light Detection and Ranging (LiDAR) sensors around the workspace, each operating at different frequencies. A data synchronizer controls the triggering of these sensors, ensuring synchronized data collection from multiple sensor streams. In Stage I, the framework focuses on multimodal feature extraction, utilizing multiple modules to process the sensor data streams and extract spatial features. In Stage II, the position and pose extraction module calculates the spatial state of the object by combining the extracted features with the object's spatial state context from previous frames. We validate the framework through experiments on the NVIDIA Isaac digital twin platform and in real-world environments, demonstrating its feasibility and robustness across a variety of test objects. This approach provides a reliable and flexible solution for motion capture in complex environments, eliminating the need for invasive markers. High real-time performance is achieved at each stage and within each submodule by using lightweight neural networks and time-aligned data synchronization. By integrating multimodal sensor fusion and context-based spatial state computation, the proposed method ensures high recognition accuracy, even in challenging cases involving symmetrical objects.
KW - 6-DoF estimation
KW - Marker-free motion capture
KW - Sensor fusion
UR - https://www.scopus.com/pages/publications/105012090601
U2 - 10.1109/CISM64958.2025.11060859
DO - 10.1109/CISM64958.2025.11060859
M3 - Conference contribution
AN - SCOPUS:105012090601
T3 - 2025 IEEE Symposium on Computational Intelligence in Image, Signal Processing and Synthetic Media, CISM 2025
BT - 2025 IEEE Symposium on Computational Intelligence in Image, Signal Processing and Synthetic Media, CISM 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE Symposium on Computational Intelligence in Image, Signal Processing and Synthetic Media, CISM 2025
Y2 - 17 March 2025 through 20 March 2025
ER -