


NeMF: Neural Motion Fields for Kinematic Animation

Neural Information Processing Systems

We present an implicit neural representation to learn the spatio-temporal space of kinematic motions. Unlike previous work that represents motion as discrete sequential samples, we propose to express the vast motion space as a continuous function over time, hence the name Neural Motion Fields (NeMF). Specifically, we use a neural network to learn this function for diverse sets of motions; the network is designed as a generative model conditioned on a temporal coordinate t and a random vector z that controls the style. The model is then trained as a Variational Autoencoder (VAE) with motion encoders to sample the latent space. We train our model on a diverse human motion dataset and a quadruped dataset to demonstrate its versatility, and finally deploy it as a generic motion prior for solving task-agnostic problems, showing its superiority in motion generation and editing applications such as motion interpolation, in-betweening, and re-navigating. More details can be found on our project page: https://cs.yale.edu/homes/
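To make the idea of a motion field concrete, the following Python sketch treats a motion as a function of continuous time: a small MLP maps a time coordinate t and a latent code z to a pose vector, so the same z can be queried at any set of times. Everything here is an assumption for illustration: the names `MotionField`, `positional_encoding`, and `reparameterize`, the sin/cos encoding of t, the layer sizes, and the random (untrained) weights are hypothetical and are not NeMF's actual architecture. Only the reparameterization step z = mu + sigma * eps is the standard VAE technique.

```python
import numpy as np

def positional_encoding(t, num_freqs=4):
    """Map times t (shape [T]) to sin/cos features (shape [T, 2 * num_freqs]).
    Assumption: a NeRF-style frequency encoding of the time coordinate."""
    t = np.asarray(t, dtype=np.float64)[:, None]        # [T, 1]
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # [F]
    angles = t * freqs                                    # [T, F]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

class MotionField:
    """Hypothetical motion field f(t, z) -> pose: a 2-layer MLP with random,
    untrained weights, used only to show the continuous-time interface."""

    def __init__(self, latent_dim=8, pose_dim=6, hidden=32, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * num_freqs + latent_dim
        self.num_freqs = num_freqs
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, pose_dim))
        self.b2 = np.zeros(pose_dim)

    def __call__(self, t, z):
        """Evaluate poses at arbitrary real-valued times t for one latent code z."""
        enc = positional_encoding(t, self.num_freqs)              # [T, 2F]
        zs = np.broadcast_to(z, (enc.shape[0], z.shape[0]))        # same z for every t
        h = np.tanh(np.concatenate([enc, zs], axis=1) @ self.W1 + self.b1)
        return h @ self.W2 + self.b2                               # [T, pose_dim]

def reparameterize(mu, log_var, rng):
    """Standard VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Usage: one latent code, sampled at two different frame rates over t in [0, 1].
rng = np.random.default_rng(1)
field = MotionField()
z = reparameterize(np.zeros(8), np.zeros(8), rng)
poses_30 = field(np.linspace(0.0, 1.0, 31), z)   # [31, 6]
poses_60 = field(np.linspace(0.0, 1.0, 61), z)   # [61, 6]
```

Because the field is continuous in t, every other sample of the 61-point grid coincides with the 31-point grid, so a motion can be resampled at a different frame rate by simply querying new times rather than interpolating stored frames.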




1b115b1feab2198dd0881c57b869ddb7-Supplemental-Conference.pdf

Neural Information Processing Systems

In order to expand polynomial surface fitting in 3D space into a high-dimensional feature space using a neural network with parameters $\Theta$, we define $f_1(g^{\omega}) := g$ and $f_2(c^{\upsilon}) := c$, where $f$ denotes an MLP layer. The multiplication of real numbers $g^{\omega} c^{\upsilon}$ in the polynomial function is then represented as $g\,c$, i.e., $g^{\omega} c^{\upsilon} := g\,c$, with orders $\omega, \upsilon \in [0, 1, \dots, \tau]$. The final bivariate function used in our hyper surface fitting is $N_{\theta,\tau}(G, C) = \Theta(G\,C)$, where $G$ and $C$ are high-dimensional features of the 3D point cloud extracted by two different modules, introduced in Sec. 3.3 and Sec. 3.4 of the paper, respectively. Terms of the polynomial other than the principal terms are not used to estimate the normal. Based on this, we use max-pooling over all features from the hyper surface fitting

Figure 1: Visualization of the contribution of each 3D point to estimate the normal of the query point (black).
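The network described here generalizes classical polynomial (n-jet) surface fitting. As a point of reference only, the sketch below fits the classical bivariate polynomial z = sum over i + j <= order of a_ij x^i y^j to a local neighborhood by least squares and reads the normal from the first-order terms alone, mirroring the statement that only the principal terms are used for the normal. It assumes the neighborhood is already centered at the query point and expressed in a local frame whose z-axis roughly follows the normal. `fit_jet_normal` is a hypothetical helper, not part of the paper's code.

```python
import numpy as np

def fit_jet_normal(points, order=2):
    """Classical jet fitting (not the paper's network): least-squares fit of
    z = sum_{i+j<=order} a_ij x^i y^j, then a unit normal from the first-order terms.
    Assumes points are centered at the query point in a local (x, y, z) frame."""
    pts = np.asarray(points, dtype=np.float64)
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    exps = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
    if order < 1 or len(pts) < len(exps):
        raise ValueError("need order >= 1 and at least one point per coefficient")
    A = np.stack([x**i * y**j for i, j in exps], axis=1)   # design matrix, one column per monomial
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    a = dict(zip(exps, coeffs))
    # At the origin, dz/dx = a_10 and dz/dy = a_01, so the normal of the surface
    # z - z(x, y) = 0 is proportional to (-a_10, -a_01, 1). Higher-order terms are ignored.
    n = np.array([-a[(1, 0)], -a[(0, 1)], 1.0])
    return n / np.linalg.norm(n)

# Usage: points sampled from a quadratic patch with known slope at the origin.
g = np.linspace(-0.1, 0.1, 5)
xx, yy = np.meshgrid(g, g)
px, py = xx.ravel(), yy.ravel()
pz = 0.5 * px - 0.2 * py + 0.3 * px**2 + 0.1 * px * py
normal = fit_jet_normal(np.stack([px, py, pz], axis=1), order=2)
```

The quadratic terms still help the fit absorb curvature, but they do not enter the normal; this is the classical counterpart of discarding non-principal terms.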



3D Pose Transfer with Correspondence Learning and Mesh Refinement

Neural Information Processing Systems

3D pose transfer aims to transfer the pose of a source mesh to a target mesh while preserving the identity (e.g., body shape) of the target mesh. Some previous works require keypoint annotations to build reliable correspondence between the source and target meshes, while other methods do not consider any shape correspondence between sources and targets, which limits generation quality. In this work, we propose a correspondence-refinement network to achieve 3D pose transfer for both human and animal meshes. The correspondence between source and target meshes is first established by solving an optimal transport problem. Then, we warp the source mesh according to the dense correspondence to obtain a coarse warped mesh. The warped mesh is then refined with our proposed Elastic Instance Normalization, a conditional normalization layer that helps generate high-quality meshes. Extensive experimental results show that the proposed architecture effectively transfers poses from source to target meshes and produces results of better visual quality than state-of-the-art methods.
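The correspondence and warping steps can be illustrated with entropic optimal transport solved by Sinkhorn iterations. This is a sketch under assumptions: the abstract does not say which OT solver, cost, or marginals are used, so the uniform marginals, the squared-distance cost on per-vertex features, the `eps` value, and the helpers `sinkhorn` and `warp_source_to_target` are hypothetical choices. The warp gives each target vertex a correspondence-weighted average of source vertex positions, producing a coarse mesh with the target's vertex layout and the source's pose.

```python
import numpy as np

def sinkhorn(cost, eps=0.05, n_iters=200):
    """Entropic optimal transport between uniform marginals (Sinkhorn-Knopp).
    Returns a plan P whose rows sum to ~1/n and whose columns sum to 1/m."""
    n, m = cost.shape
    K = np.exp(-cost / eps)              # Gibbs kernel
    a = np.full(n, 1.0 / n)              # assumed uniform target marginal
    b = np.full(m, 1.0 / m)              # assumed uniform source marginal
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                  # match row sums
        v = b / (K.T @ u)                # match column sums
    return u[:, None] * K * v[None, :]

def warp_source_to_target(src_feat, tgt_feat, src_verts, eps=0.05):
    """Hypothetical warp: each target vertex takes a correspondence-weighted
    average of source vertex positions (a coarse warped mesh)."""
    # Squared Euclidean distance between per-vertex features, scaled to [0, 1].
    cost = ((tgt_feat[:, None, :] - src_feat[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    P = sinkhorn(cost, eps)                       # [n_tgt, n_src]
    W = P / P.sum(axis=1, keepdims=True)          # soft assignment per target vertex
    return W @ src_verts                          # [n_tgt, 3]

# Usage: target features are a permutation of well-separated source features.
src_feat = np.eye(5)
perm = np.array([2, 0, 4, 1, 3])
src_verts = np.arange(15, dtype=np.float64).reshape(5, 3)
warped = warp_source_to_target(src_feat, src_feat[perm], src_verts)
```

With well-separated features the plan is nearly a permutation matrix and the warp simply reorders source vertices; with learned features the plan is softer, which blurs the warped mesh and is one reason a refinement stage is useful.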



Three reasons why DeepSeek's new model matters

MIT Technology Review

The long-awaited V4 is more efficient and a win for Chinese chipmakers.

On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than its predecessor, thanks to a new design that helps it handle large amounts of text more efficiently. Like DeepSeek's previous models, V4 is open source, meaning it is available for anyone to download, use, and modify. V4 marks DeepSeek's most significant release since R1, the reasoning model it launched in January 2025. R1, which was trained on limited computing resources, stunned the global AI industry with its strong performance and efficiency, turning DeepSeek from a little-known research team into China's best-known AI company almost overnight.


Gender: Female; Age: Young; Hair Color: Blonde; Skin: White; Emotion: Serious; Beard: No; Makeup: No

Neural Information Processing Systems

Machine learning models often produce systematic errors on critical subsets (or slices) of data that share common attributes. Discovering and explaining such model bugs is crucial for reliable model deployment. However, existing bug discovery and interpretation methods usually involve heavy human intervention and annotation, which is cumbersome and yields low bug coverage. In this paper, we propose HiBug, an automated framework for interpretable model debugging. Our approach uses large pre-trained models, such as ChatGPT, to suggest human-understandable attributes related to the targeted computer vision tasks. By leveraging pre-trained vision-language models, we can efficiently identify common visual attributes of underperforming data slices in human-understandable terms. This enables us to uncover rare cases in the training data, identify spurious correlations in the model, and use the interpretable debugging results to select or generate new training data for model improvement. Experimental results demonstrate the efficacy of the HiBug framework. Code is available at: https://github.com/cure-lab/HiBug.
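Once each sample carries human-readable attribute labels, finding underperforming slices reduces to comparing per-attribute accuracy against overall accuracy. In HiBug the attributes are proposed by a large language model and identified with the help of pre-trained vision-language models; in this sketch they are assumed to be given directly, and the `margin` threshold, the `min_size` filter, and the helper `find_underperforming_slices` are hypothetical choices for illustration, not HiBug's actual procedure.

```python
from collections import defaultdict

def find_underperforming_slices(records, margin=0.1, min_size=2):
    """Group samples by (attribute, value), compute per-slice accuracy, and return
    slices whose accuracy is more than `margin` below the overall accuracy.
    Assumption: each record is {"correct": bool, "attributes": {name: value}}."""
    overall = sum(r["correct"] for r in records) / len(records)
    stats = defaultdict(lambda: [0, 0])   # (attr, value) -> [num_correct, count]
    for r in records:
        for attr, value in r["attributes"].items():
            s = stats[(attr, value)]
            s[0] += int(r["correct"])
            s[1] += 1
    slices = []
    for (attr, value), (num_correct, count) in stats.items():
        acc = num_correct / count
        if count >= min_size and acc < overall - margin:
            slices.append((attr, value, acc, count))
    # Worst slices first; ties broken by larger slice size.
    slices.sort(key=lambda s: (s[2], -s[3]))
    return overall, slices

# Usage with toy records (hypothetical predictions and attribute labels).
records = [
    {"correct": True,  "attributes": {"Hair Color": "Black",  "Beard": "No"}},
    {"correct": True,  "attributes": {"Hair Color": "Black",  "Beard": "Yes"}},
    {"correct": True,  "attributes": {"Hair Color": "Black",  "Beard": "No"}},
    {"correct": False, "attributes": {"Hair Color": "Blonde", "Beard": "No"}},
    {"correct": False, "attributes": {"Hair Color": "Blonde", "Beard": "Yes"}},
    {"correct": True,  "attributes": {"Hair Color": "Blonde", "Beard": "No"}},
]
overall, slices = find_underperforming_slices(records)
```

Here the "Hair Color: Blonde" slice surfaces first because its accuracy is well below the overall rate; such a slice is the kind of interpretable bug description that can then guide selecting or generating new training data.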