
FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

Neural Information Processing Systems

Text-driven motion generation has achieved substantial progress with the emergence of diffusion models. However, existing methods still struggle to generate complex motion sequences that correspond to fine-grained descriptions depicting detailed and accurate spatio-temporal actions. This lack of fine controllability limits the usefulness of motion generation for a broader audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions with spatio-temporal composition following user instructions. Specifically, FineMoGen builds upon a diffusion model with a novel transformer architecture dubbed Spatio-Temporal Mixture Attention (SAMI). SAMI optimizes the generation of the global attention template from two perspectives: 1) explicitly modeling the constraints of spatio-temporal composition; and 2) utilizing sparsely-activated mixture-of-experts to adaptively extract fine-grained features. To facilitate a large-scale study of this new fine-grained motion generation task, we contribute the HuMMan-MoGen dataset, which consists of 2,968 videos and 102,336 fine-grained spatio-temporal descriptions. Extensive experiments validate that FineMoGen achieves superior motion generation quality over state-of-the-art methods. Notably, FineMoGen further enables zero-shot motion editing with the aid of modern large language models (LLMs), faithfully manipulating motion sequences according to fine-grained instructions.
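The abstract names two mechanisms inside SAMI: spatio-temporal composition constraints and sparsely-activated mixture-of-experts feature extraction. As a rough illustration of the second mechanism only, the sketch below is a generic top-k routed mixture-of-experts layer in PyTorch with placeholder dimensions and expert counts; it is not FineMoGen's SAMI module.

```python
# Minimal sketch of a sparsely-activated mixture-of-experts layer with standard
# top-k gating. Illustrative only; dimensions, expert count, and routing are
# placeholder choices, not FineMoGen's SAMI architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)          # routing scores per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x):                                 # x: (batch, seq, dim)
        scores = self.gate(x)                             # (batch, seq, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                       # dispatch tokens to their chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(2, 16, 256)                               # e.g. 16 motion frames, 256-dim features
print(SparseMoE()(x).shape)                               # torch.Size([2, 16, 256])
```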


Highlights From Starship's Test Flight 9: Everything That Happened in 17 Minutes

Mashable

Starship Test Flight 9 ends with "confirmation that the booster did demise." By Mashable Video on May 28, 2025. SpaceX conducted its ninth test flight of the Starship launch vehicle atop a Super Heavy booster from Starbase, Texas. See all the highlights from the test launch.



Beyond Slow Signs in High-fidelity Model Extraction
Hanna Foerster, Robert Mullins, Ilia Shumailov, and Jamie Hayes
University of Cambridge

Neural Information Processing Systems

Deep neural networks, costly to train and rich in intellectual property value, are increasingly threatened by model extraction attacks that compromise their confidentiality. Previous attacks have succeeded in reverse-engineering model parameters up to float64 precision for models trained on random data with at most three hidden layers, using cryptanalytic techniques. However, the process was identified to be very time-consuming and not feasible for larger and deeper models trained on standard benchmarks. Our study evaluates the feasibility of the parameter extraction methods of Carlini et al. [1], further enhanced by Canales-Martínez et al. [2], for models trained on standard benchmarks. We introduce a unified codebase that integrates previous methods and reveal that computational tools can significantly influence performance. We develop further optimisations to the end-to-end attack and improve the efficiency of extracting weight signs by up to 14.8 times compared to former methods through the identification of easier- and harder-to-extract neurons. Contrary to prior assumptions, we identify extraction of weights, not extraction of weight signs, as the critical bottleneck. With our improvements, a 16,721-parameter model with 2 hidden layers trained on MNIST is extracted within only 98 minutes compared to at least 150 minutes previously. Finally, addressing methodological deficiencies observed in previous studies, we propose new ways of robust benchmarking for future model extraction attacks.
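One building block of the cryptanalytic attacks referenced here ([1], [2]) is locating inputs at which an individual ReLU switches state, since a ReLU network's output is piecewise linear along any line of inputs. The sketch below searches for such a kink on a toy two-layer network; it assumes the searched interval brackets exactly one critical point and omits the subsequent weight- and sign-recovery stages, so it illustrates only the query primitive, not the full attack.

```python
# Minimal sketch of one primitive used in cryptanalytic parameter extraction:
# locating a "critical point" (an input where some hidden ReLU flips sign) along
# a 1-D line of inputs by binary-searching for a kink in the piecewise-linear
# output. Simplified illustration only, not the attack of Carlini et al.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)      # toy 2-4-1 ReLU "victim" network
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

def f(x):                                                  # black-box query access only
    return float(W2 @ np.maximum(W1 @ x + b1, 0) + b2)

def slope(a, d, t, eps=1e-6):                              # finite-difference slope at a + t*d
    return (f(a + (t + eps) * d) - f(a + t * d)) / eps

def find_kink(a, d, lo=0.0, hi=1.0, iters=40):
    """Binary search on [lo, hi], assuming exactly one kink lies in the interval."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if abs(slope(a, d, lo) - slope(a, d, mid)) > 1e-4:
            hi = mid                                       # slopes disagree: kink is in the left half
        else:
            lo = mid                                       # slopes agree: kink is in the right half
    return a + 0.5 * (lo + hi) * d

a, d = rng.normal(size=2), rng.normal(size=2)              # random input segment to probe
x_star = find_kink(a, d)
print("candidate critical point:", x_star)
print("hidden pre-activations:", W1 @ x_star + b1)         # one entry should be close to zero
```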


ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Neural Information Processing Systems

We present the ShareGPT4Video series, which aims to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. To achieve this, setting aside costly, non-scalable human annotation, we find that using GPT4V to caption videos with a naive multi-frame or frame-concatenation input strategy leads to less detailed and sometimes temporally confused results. We argue that the challenge of designing a high-quality video captioning strategy lies in three aspects: 1) precise understanding of inter-frame temporal change.
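For reference, the "frame-concatenation" input strategy the abstract critiques amounts to tiling sampled frames into a single image before captioning it. The sketch below shows that baseline only; the file paths and the captioner call are hypothetical placeholders, and this is not the ShareGPT4Video pipeline.

```python
# Minimal sketch of the naive frame-concatenation strategy: uniformly sampled
# frames are tiled into one grid image and captioned as a single picture.
# Paths and the captioner call are hypothetical; illustrative only.
from PIL import Image

def tile_frames(frame_paths, cols=4, thumb=(336, 336)):
    frames = [Image.open(p).convert("RGB").resize(thumb) for p in frame_paths]
    rows = (len(frames) + cols - 1) // cols
    grid = Image.new("RGB", (cols * thumb[0], rows * thumb[1]))
    for i, frame in enumerate(frames):
        grid.paste(frame, ((i % cols) * thumb[0], (i // cols) * thumb[1]))
    return grid

# grid = tile_frames([f"video/frame_{i:03d}.jpg" for i in range(0, 64, 8)])   # hypothetical paths
# caption = some_image_captioner(grid, prompt="Describe this video in detail.")  # hypothetical call
```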


Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models

Neural Information Processing Systems

While adversarial training has been extensively studied for ResNet architectures and low-resolution datasets like CIFAR-10, much less is known for ImageNet. Given the recent debate about whether transformers are more robust than convnets, we revisit adversarial training on ImageNet, comparing ViTs and ConvNeXts. Extensive experiments show that minor changes in architecture, most notably replacing the PatchStem with a ConvStem, and in the training scheme have a significant impact on the achieved robustness.
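The PatchStem/ConvStem distinction can be made concrete with two drop-in stems of equal overall stride. The PyTorch sketch below uses placeholder channel widths rather than the exact configuration from the paper.

```python
# Minimal sketch contrasting a ViT-style patchify stem with a convolutional stem,
# the architectural change the abstract highlights. Channel sizes are illustrative
# placeholders, not the configuration used in the paper.
import torch
import torch.nn as nn

patch_stem = nn.Conv2d(3, 384, kernel_size=16, stride=16)    # "PatchStem": one 16x16 patchify conv

conv_stem = nn.Sequential(                                    # "ConvStem": a stack of stride-2 3x3 convs
    nn.Conv2d(3, 48, 3, stride=2, padding=1), nn.GELU(),
    nn.Conv2d(48, 96, 3, stride=2, padding=1), nn.GELU(),
    nn.Conv2d(96, 192, 3, stride=2, padding=1), nn.GELU(),
    nn.Conv2d(192, 384, 3, stride=2, padding=1),              # overall stride 16, matching the patch stem
)

x = torch.randn(1, 3, 224, 224)
print(patch_stem(x).shape, conv_stem(x).shape)                # both: torch.Size([1, 384, 14, 14])
```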


OpenAI explores sign in with ChatGPT for other apps

Mashable

You may soon be able to sign in to third-party apps using ChatGPT -- but it probably won't be for a while yet. OpenAI recently shared a "Sign in with ChatGPT" interest form on its website, targeting developers who may be interested in the capability. "OpenAI is exploring ways for users to sign into third-party apps using their ChatGPT accounts," reads the page. "We're looking for developers interested in integrating this capability into their own apps." A preview of the experience is linked, along with a short form for interested developers to fill out.


Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture

Neural Information Processing Systems

Convolutional neural networks (CNNs) have recently emerged as promising models of the ventral visual stream, despite their lack of biological specificity. While current state-of-the-art models of the primary visual cortex (V1) have surfaced from training with adversarial examples and extensively augmented data, these models are still unable to explain key neural properties observed in V1 that arise from biological circuitry. To address this gap, we systematically incorporated neuroscience-derived architectural components into CNNs to identify a set of mechanisms and architectures that more comprehensively explain V1 activity. Upon enhancing task-driven CNNs with architectural components that simulate center-surround antagonism, local receptive fields, tuned normalization, and cortical magnification, we uncover models with latent representations that yield state-of-the-art explanation of V1 neural activity and tuning properties. Moreover, analyses of the learned parameters of these components and stimuli that maximally activate neurons of the evaluated networks provide support for their role in explaining neural properties of V1. Our results highlight an important advancement in the field of NeuroAI, as we systematically establish a set of architectural components that contribute to unprecedented explanation of V1. The neuroscience insights that could be gleaned from increasingly accurate in-silico models of the brain have the potential to greatly advance the fields of both neuroscience and artificial intelligence.
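As an illustration of one listed component, center-surround antagonism is commonly modeled with a fixed difference-of-Gaussians filter. The sketch below implements such a depthwise filter in PyTorch with illustrative kernel size and sigmas; it is only a generic stand-in, not the architecture evaluated in the paper.

```python
# Minimal sketch of a center-surround (difference-of-Gaussians) convolution:
# an excitatory narrow Gaussian minus an inhibitory wide Gaussian, applied
# depthwise. Kernel size and sigmas are illustrative assumptions.
import torch
import torch.nn as nn

def gaussian_kernel(size, sigma):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return g / g.sum()

class CenterSurround(nn.Module):
    """Fixed depthwise difference-of-Gaussians filter: center minus surround."""
    def __init__(self, channels, size=7, sigma_c=1.0, sigma_s=3.0):
        super().__init__()
        dog = gaussian_kernel(size, sigma_c) - gaussian_kernel(size, sigma_s)
        weight = dog.expand(channels, 1, size, size).clone()
        self.conv = nn.Conv2d(channels, channels, size, padding=size // 2,
                              groups=channels, bias=False)
        self.conv.weight = nn.Parameter(weight, requires_grad=False)

    def forward(self, x):
        return self.conv(x)

x = torch.randn(1, 64, 56, 56)
print(CenterSurround(64)(x).shape)    # torch.Size([1, 64, 56, 56])
```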


An Expectation-Maximization Algorithm for Training Clean Diffusion Models from Corrupted Observations
Weimin Bai, Yifei Wang, Wenzheng Chen, He Sun

Neural Information Processing Systems

Diffusion models excel at solving imaging inverse problems due to their ability to model complex image priors. However, their reliance on large, clean datasets for training limits their practical use where clean data is scarce. In this paper, we propose EMDiffusion, an expectation-maximization (EM) approach to train diffusion models from corrupted observations. Our method alternates between reconstructing clean images from corrupted data using a known diffusion model (E-step) and refining the diffusion model weights based on these reconstructions (M-step). This iterative process leads the learned diffusion model to gradually converge to a local optimum, that is, to approximate the true clean data distribution.
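The E-step/M-step alternation can be seen in miniature by replacing the diffusion prior with a one-dimensional Gaussian whose parameters play the role of the model weights. The runnable toy below applies that alternation to observations corrupted by known Gaussian noise; it shows the structure of the EM loop only, not the EMDiffusion algorithm.

```python
# Toy sketch of the EM alternation described in the abstract: a 1-D Gaussian
# prior stands in for the diffusion model. E-step reconstructs clean samples
# under the current prior; M-step refits the prior to the reconstructions.
import numpy as np

rng = np.random.default_rng(0)
true_x = rng.normal(loc=3.0, scale=1.0, size=5000)      # unseen clean data
tau = 2.0                                               # known corruption noise level
y = true_x + rng.normal(scale=tau, size=true_x.shape)   # only corrupted observations are available

mu, s2 = 0.0, 10.0                                      # initial "prior" (stand-in for model weights)
for _ in range(50):
    # E-step: reconstruct each clean sample as its posterior mean (and variance)
    # under the current prior and the known corruption model.
    post_var = 1.0 / (1.0 / s2 + 1.0 / tau**2)
    x_hat = post_var * (y / tau**2 + mu / s2)
    # M-step: refit the prior to the reconstructions, keeping the posterior
    # uncertainty so the variance is not underestimated.
    mu = x_hat.mean()
    s2 = x_hat.var() + post_var

print(f"estimated prior: mean={mu:.2f}, std={np.sqrt(s2):.2f}")   # approaches (3.0, 1.0)
```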


Robust Neural Contextual Bandit against Adversarial Corruptions

Neural Information Processing Systems

Contextual bandit algorithms aim to identify the optimal arm with the highest reward among a set of candidates, based on the accessible contextual information. Among these algorithms, neural contextual bandit methods have generally shown superior performance over linear and kernel-based ones, due to the representation power of neural networks. However, similar to other neural network applications, neural bandit algorithms can be vulnerable to adversarial attacks or corruptions of the received labels (i.e., arm rewards), which can lead to unexpected performance degradation without proper treatment. As a result, it is necessary to improve the robustness of neural bandit models against potential reward corruptions. In this work, we propose a neural contextual bandit algorithm named R-NeuralUCB, which utilizes a novel context-aware Gradient Descent (GD) training strategy to improve robustness against adversarial reward corruptions. Under over-parameterized neural network settings, we provide a regret analysis for R-NeuralUCB to quantify the impact of reward corruption, without the arm separateness assumption commonly adopted in existing neural bandit works. We also conduct experiments against baselines on real data sets under different scenarios, in order to demonstrate the effectiveness of our proposed R-NeuralUCB.
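For orientation, a generic NeuralUCB-style loop couples a small reward network with an exploration bonus derived from the network's gradient. The sketch below uses a toy linear reward and a plain squared-error gradient step; the robust, context-aware GD training strategy that defines R-NeuralUCB is not reproduced here.

```python
# Minimal sketch of a generic NeuralUCB-style selection loop (not R-NeuralUCB):
# a network estimates arm rewards and a gradient-based bonus drives exploration.
# Reward model, dimensions, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
dim, n_arms, lam, nu = 8, 5, 1.0, 0.1
net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
p = sum(w.numel() for w in net.parameters())
Z_inv = torch.eye(p) / lam                               # inverse confidence matrix

def grad_vector(x):
    net.zero_grad()
    net(x).backward()
    return torch.cat([w.grad.flatten() for w in net.parameters()])

for t in range(200):
    contexts = torch.randn(n_arms, dim)                  # observed arm contexts at round t
    ucb = []
    for a in range(n_arms):
        g = grad_vector(contexts[a])
        bonus = nu * torch.sqrt(g @ Z_inv @ g)           # gradient-based exploration bonus
        ucb.append(net(contexts[a]).item() + bonus.item())
    a_t = int(torch.tensor(ucb).argmax())                # play the arm with the highest UCB score
    reward = contexts[a_t, 0] + 0.1 * torch.randn(1)     # toy reward (could be corrupted in the robust setting)
    g = grad_vector(contexts[a_t]).unsqueeze(1)
    Z_inv -= (Z_inv @ g @ g.T @ Z_inv) / (1 + g.T @ Z_inv @ g)   # Sherman-Morrison rank-1 update
    loss = (net(contexts[a_t]) - reward) ** 2            # plain GD step on the observed reward
    opt.zero_grad()
    loss.backward()
    opt.step()
```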