 motion signal


TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation

arXiv.org Artificial Intelligence

Human-centric motion control in video generation remains a critical challenge, particularly when jointly controlling camera movements and human poses in scenarios like the iconic Grammy Glambot moment. While recent video diffusion models have made significant progress, existing approaches struggle with limited motion representations and inadequate integration of camera and human motion controls. In this work, we present TokenMotion, the first DiT-based video diffusion framework that enables fine-grained control over camera motion, human motion, and their joint interaction. We represent camera trajectories and human poses as spatio-temporal tokens to enable local control granularity. Our approach introduces a unified modeling framework utilizing a decouple-and-fuse strategy, bridged by a human-aware dynamic mask that effectively handles the spatially-and-temporally varying nature of combined motion signals. Through extensive experiments, we demonstrate TokenMotion's effectiveness across both text-to-video and image-to-video paradigms, consistently outperforming current state-of-the-art methods in human-centric motion control tasks. Our work represents a significant advancement in controllable video generation, with particular relevance for creative production applications.


From Motion Signals to Insights: A Unified Framework for Student Behavior Analysis and Feedback in Physical Education Classes

arXiv.org Artificial Intelligence

Analyzing student behavior in educational scenarios is crucial for enhancing teaching quality and student engagement. Existing AI-based models often rely on classroom video footage to identify and analyze student behavior. While these video-based methods can partially capture and analyze student actions, they struggle to accurately track each student's actions in physical education classes, which take place in outdoor, open spaces with diverse activities, and they are difficult to generalize to the specialized technical movements involved in these settings. Furthermore, current methods typically cannot integrate specialized pedagogical knowledge, which limits their ability to provide in-depth insights into student behavior and to offer feedback for optimizing instructional design. To address these limitations, we propose a unified end-to-end framework that leverages human activity recognition technologies based on motion signals, combined with advanced large language models, to analyze student behavior in physical education classes in more detail and to provide feedback on it. Our framework begins with the teacher's instructional designs and the motion signals from students during physical education sessions, ultimately generating automated reports with teaching insights and suggestions for improving both learning and class instruction. This solution provides a motion signal-based approach for analyzing student behavior and optimizing instructional design tailored to physical education classes. Experimental results demonstrate that our framework can accurately identify student behaviors and produce meaningful pedagogical insights.


Cloud and IoT based Smart Agent-driven Simulation of Human Gait for Detecting Muscles Disorder

arXiv.org Artificial Intelligence

Motion disorders pose a significant global health concern and are often managed with pharmacological treatments that may lead to undesirable long-term effects. Current therapeutic strategies do not differentiate between healthy and unhealthy muscles in a patient, so a targeted approach is needed to distinguish between them. There is still no motion analyzer application for this purpose. Additionally, there is a deep gap in motion analysis software, as some studies prioritize simulation and neglect software needs, while others concentrate on computational aspects and disregard simulation nuances. We introduce a comprehensive five-phase methodology to analyze the neuromuscular system of the lower body during gait. The first phase employs an innovative IoT-based method for motion signal capture. The second and third phases involve an agent-driven biomechanical model of the lower body skeleton and a model of human voluntary muscle. Thus, using an agent-driven approach, motion-captured signals can be converted to neural stimuli. In the fourth phase, the simulation results are analyzed by our proposed ensemble neural network framework in order to detect abnormal motion in each joint. Finally, the results are shown in a user-friendly graphical interface, which promotes the usability of the method. Utilizing the developed application, we simulate the neuromusculoskeletal system of some patients during the gait cycle, enabling the classification of healthy and pathological muscle activity through joint-based analysis. This study leverages cloud computing to create an infrastructure-independent application that is globally accessible. The proposed application enables experts to differentiate between healthy and unhealthy muscles in a patient by simulating the patient's gait.


Neural Dynamics of Motion Segmentation and Grouping

Neural Information Processing Systems

A neural network model of motion segmentation by visual cortex is described. The model clarifies how preprocessing of motion signals by a Motion Oriented Contrast Filter (MOC Filter) is joined to long-range cooperative motion mechanisms in a motion Cooperative Competitive Loop (CC Loop) to control phenomena such as induced motion, motion capture, and motion aftereffects. The total model system is a motion Boundary Contour System (BCS) that is computed in parallel with a static BCS before both systems cooperate to generate a boundary representation for three-dimensional visual form perception. The present investigations clarify how the static BCS can be modified for use in motion segmentation problems, notably for analyzing how ambiguous local movements (the aperture problem) on a complex moving shape are suppressed and actively reorganized into a coherent global motion signal. INTRODUCTION: WHY ARE STATIC AND MOTION BOUNDARY CONTOUR SYSTEMS NEEDED?


Non-Negative Kernel Sparse Coding for the Classification of Motion Data

arXiv.org Machine Learning

We are interested in the decomposition of motion data into a sparse linear combination of base functions which enable efficient data processing. We combine two prominent frameworks: dynamic time warping (DTW), which offers particularly successful pairwise motion data comparison, and sparse coding (SC), which enables an automatic decomposition of vectorial data into a sparse linear combination of base vectors. We enhance SC as follows: an efficient kernelization which extends its application domain to general similarity data such as offered by DTW, and its restriction to non-negative linear representations of signals and base vectors in order to guarantee a meaningful dictionary. Empirical evaluations on motion capture benchmarks show the effectiveness of our framework regarding interpretation and discrimination concerns.
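To make the pairwise comparison step concrete, the following is a minimal sketch of textbook dynamic time warping on one-dimensional sequences. It is not the authors' implementation: the function names `dtw_distance` and `pairwise_dtw`, the absolute-difference local cost, and the toy sequences are all assumptions chosen for illustration, and the kernelized non-negative sparse coding step described in the abstract is not shown.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two 1-D sequences (illustrative sketch only).

    Uses |x - y| as the local cost (an assumption) and the standard
    step pattern: match, insertion, deletion.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def pairwise_dtw(sequences):
    """Symmetric matrix of DTW distances for a list of sequences."""
    return [[dtw_distance(s, t) for t in sequences] for s in sequences]


# Hypothetical toy data: the second sequence is a time-stretched copy of the first.
motions = [[1, 2, 3], [1, 2, 2, 3], [0, 0]]
distances = pairwise_dtw(motions)
```

A matrix like `distances` is the kind of general similarity data the abstract refers to: it compares sequences of different lengths directly, without first embedding them as fixed-length vectors.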


PerceptionNet: A Deep Convolutional Neural Network for Late Sensor Fusion

arXiv.org Machine Learning

Human Activity Recognition (HAR) based on motion sensors has drawn a lot of attention over the last few years, since perceiving the human status enables context-aware applications to adapt their services to users' needs. However, motion sensor fusion and feature extraction have not reached their full potential and remain an open issue. In this paper, we introduce PerceptionNet, a deep Convolutional Neural Network (CNN) that applies a late 2D convolution to multimodal time-series sensor data in order to automatically extract efficient features for HAR. We evaluate our approach on two publicly available HAR datasets to demonstrate that the proposed model effectively fuses multimodal sensors and improves the performance of HAR. In particular, PerceptionNet surpasses the performance of state-of-the-art HAR methods based on: (i) features extracted from humans, (ii) deep CNNs exploiting early fusion approaches, and (iii) Long Short-Term Memory (LSTM), by an average accuracy of more than 3%. The proliferation of the Internet of Things (IoT) over the last few years has contributed to the collection of huge amounts of time-series data. An IoT device with a high sampling rate, such as a wearable, produces hundreds of data points every second, resulting in a data explosion, considering the vast number of such devices connected over the internet. Through real-time or batch data processing, meaningful information is extracted, revealing daily patterns of individual owners or social groups.
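The general idea of late fusion can be sketched as follows: each sensor channel is first filtered along time on its own, and only afterwards does a 2D kernel that spans the channel axis combine them. This is a simplified illustration under stated assumptions, not the PerceptionNet architecture: the helper names, the fixed (not learned) kernels, and the toy signals are hypothetical, and nonlinearities, pooling, and the classifier are omitted.

```python
def conv1d_valid(signal, kernel):
    """Valid-mode 1-D cross-correlation of one sensor channel along time."""
    k = len(kernel)
    return [
        sum(signal[t + i] * kernel[i] for i in range(k))
        for t in range(len(signal) - k + 1)
    ]


def conv2d_valid(rows, kernel):
    """Valid-mode 2-D cross-correlation over a (channels x time) map."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(rows), len(rows[0])
    return [
        [
            sum(
                rows[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(w - kw + 1)
        ]
        for r in range(h - kh + 1)
    ]


def late_fusion_features(channels, temporal_kernel, fusion_kernel):
    """Filter every channel separately, then fuse with one 2-D kernel."""
    # Stage 1: per-channel temporal features; channels do not mix here.
    per_channel = [conv1d_valid(ch, temporal_kernel) for ch in channels]
    # Stage 2: the 2-D kernel spans the channel axis, so fusion happens late.
    return conv2d_valid(per_channel, fusion_kernel)


# Hypothetical toy input: two sensor channels, four time steps each.
channels = [[1, 2, 3, 4], [0, 1, 0, 1]]
temporal_kernel = [1, -1]   # assumed fixed difference filter
fusion_kernel = [[1], [1]]  # assumed 2x1 kernel summing across both channels
fused = late_fusion_features(channels, temporal_kernel, fusion_kernel)
```

In an early-fusion design, by contrast, the first convolution would already span the channel axis, so modalities would be mixed before any per-sensor features were extracted.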


Detection of First and Second Order Motion

Neural Information Processing Systems

A model of motion detection is presented. The model contains three stages. The first stage is unoriented and is selective for contrast polarities. The next two stages work in parallel. A phase insensitive stage pools across different contrast polarities through a spatiotemporal filter and thus can detect first and second order motion.


Detection of First and Second Order Motion

Neural Information Processing Systems

A model of motion detection is presented. The model contains three stages. The first stage is unoriented and is selective for contrast polarities. The next two stages work in parallel. A phase insensitive stage pools across different contrast polarities through a spatiotemporal filter and thus can detect first and second order motion. A phase sensitive stage keeps contrast polarities separate, each of which is filtered through a spatiotemporal filter, and thus only first order motion can be detected.


Neural Dynamics of Motion Segmentation and Grouping

Neural Information Processing Systems

A neural network model of motion segmentation by visual cortex is described. The model clarifies how preprocessing of motion signals by a Motion Oriented Contrast Filter (MOC Filter) is joined to long-range cooperative motion mechanisms in a motion Cooperative Competitive Loop (CC Loop) to control phenomena such as induced motion, motion capture, and motion aftereffects. The total model system is a motion Boundary Contour System (BCS) that is computed in parallel with a static BCS before both systems cooperate to generate a boundary representation for three-dimensional visual form perception. The present investigations clarify how the static BCS can be modified for use in motion segmentation problems, notably for analyzing how ambiguous local movements (the aperture problem) on a complex moving shape are suppressed and actively reorganized into a coherent global motion signal. 1 INTRODUCTION: WHY ARE STATIC AND MOTION BOUNDARY CONTOUR SYSTEMS NEEDED? Some regions, notably MT, of visual cortex are specialized for motion processing. However, even the earliest stages of visual cortex processing, such as simple cells in V1, require stimuli that change through time for their maximal activation and are direction-sensitive. Why has evolution generated regions such as MT, when even V1 is change-sensitive and direction-sensitive? What computational properties are achieved by MT that are not already available in V1?