

DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning

Neural Information Processing Systems

Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise, as it requires efficient exploration coupled with long-horizon credit assignment. Overcoming these challenges is key to building self-improving agents with superhuman ability. Prior work commonly explores with the objective of solving many sparse-reward tasks, which makes exploration of individual high-dimensional, long-horizon tasks intractable. We argue that solving such challenging tasks requires first solving simpler tasks that are relevant to the target task, i.e., tasks whose achievement will teach the agent the skills required for the target task. We demonstrate that this sense of direction, which effective exploration needs, can be extracted from existing RL algorithms without leveraging any prior information. To this end, we propose a method for directed sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects exploratory goals in the direction of the target task. We connect DISCOVER to principled exploration in bandits and formally bound the time until the target task becomes achievable. The bound depends on the agent's initial distance to the target but is independent of the volume of the space of all tasks. We then perform a thorough evaluation in high-dimensional environments. We find that the directed goal selection of DISCOVER solves exploration problems that are beyond the reach of prior state-of-the-art exploration methods in RL.
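The idea of choosing goals "in the direction of the target" can be illustrated with a minimal sketch in the spirit of upper-confidence-bound (UCB) exploration in bandits. It assumes goal-conditioned critics `critic(s, g)` that estimate the value of reaching goal `g` from state `s`. It also assumes that an ensemble of such critics is available and that their disagreement serves as the uncertainty estimate. The scoring rule has two parts. The first is achievability from the start plus estimated progress toward the target. The second is an optimism bonus weighted by `beta`. This rule and all names in it (`select_goal`, `critics`, `beta`) are hypothetical. It is not the exact DISCOVER selection rule.

```python
import math
from statistics import fmean, pstdev
from typing import Callable, Sequence, Tuple

Goal = Tuple[float, ...]
# Hypothetical critic signature: estimated value of reaching goal g from state s.
Critic = Callable[[Goal, Goal], float]


def select_goal(candidates: Sequence[Goal], critics: Sequence[Critic],
                start: Goal, target: Goal, beta: float = 1.0) -> Goal:
    """Pick the candidate goal with the highest optimistic, target-directed score.

    Assumed score per critic: value(start -> g) + value(g -> target).
    The ensemble mean favours goals that look reachable AND on the way to the
    target; the ensemble standard deviation acts as a UCB-style optimism bonus.
    """
    if not candidates or not critics:
        raise ValueError("need at least one candidate goal and one critic")
    best_goal, best_score = candidates[0], -math.inf
    for g in candidates:
        per_critic = [c(start, g) + c(g, target) for c in critics]
        score = fmean(per_critic) + beta * pstdev(per_critic)
        if score > best_score:
            best_goal, best_score = g, score
    return best_goal


if __name__ == "__main__":
    def toy_critic(s: Goal, g: Goal) -> float:
        # Toy stand-in only: negative Euclidean distance as the value of reaching g.
        return -math.dist(s, g)

    # A goal on the straight path to the target beats a detour.
    print(select_goal([(2.0, 0.0), (2.0, 3.0)], [toy_critic, toy_critic],
                      start=(0.0, 0.0), target=(5.0, 0.0)))  # (2.0, 0.0)
```

Raising `beta` shifts the choice toward goals on which the critics disagree. This mirrors how optimism drives exploration in bandit algorithms.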


'Positive' or 'unnecessary'? - UK teens on social media ban

BBC News

School children in Preston and Manchester had mixed feelings about a proposed social media ban for under-16s following an announcement from Prime Minister Sir Keir Starmer. On Monday, Starmer said under-16s will be banned from social media platforms such as Snapchat, TikTok, YouTube, Instagram, Facebook and X by spring 2027. Speaking to the BBC, some pupils described the ban as unnecessary and called for parents to take more responsibility. One student said she hoped the ban would have a positive impact on young people's lives and their mental health.


OmniTalker: One-shot Real-time Text-Driven Talking Audio-Video Generation With Multimodal Style Mimicking

Neural Information Processing Systems

Although significant progress has been made in audio-driven talking head generation, text-driven methods remain underexplored. In this work, we present OmniTalker, a unified framework that jointly generates synchronized talking audio-video content from input text. It emulates the target identity's speaking and facial movement styles, including speech characteristics, head motion, and facial dynamics. Our framework adopts a dual-branch diffusion transformer (DiT) architecture, with one branch dedicated to audio generation and the other to video synthesis. At the shallow layers, cross-modal fusion modules integrate information between the two modalities. In deeper layers, each modality is processed independently. The generated audio is decoded by a vocoder, and the video is rendered by a high-quality GAN-based visual renderer. We leverage DiT's in-context learning capability through a masked-infilling strategy. As a result, our model can simultaneously capture both audio and visual styles without requiring explicit style extraction modules. Thanks to the efficiency of the DiT backbone and the optimized visual renderer, OmniTalker achieves real-time inference at 25 FPS. To the best of our knowledge, OmniTalker is the first one-shot framework capable of jointly modeling speech and facial styles in real time. Extensive experiments demonstrate its superiority over existing methods in generation quality. It is particularly strong at preserving style consistency and ensuring precise audio-video synchronization, all while maintaining efficient inference.
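The layer routing described above can be illustrated with a structural sketch in plain Python. In this routing, cross-modal fusion happens in shallow layers and each modality is processed independently in deeper ones. The sketch is a toy built on several assumptions. Features are flat lists of floats, and each layer is an arbitrary callable. The fusion step adds a scaled mean of the other stream. That step is a hypothetical stand-in for OmniTalker's cross-modal fusion modules, whose internals are not described here. The names `cross_fuse`, `dual_branch_forward`, and `n_fused` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Layer = Callable[[Vec], Vec]


def cross_fuse(audio: Vec, video: Vec, alpha: float = 0.5) -> Tuple[Vec, Vec]:
    """Hypothetical fusion: each stream receives a scaled summary (mean) of the other."""
    audio_mean = sum(audio) / len(audio)
    video_mean = sum(video) / len(video)
    return ([x + alpha * video_mean for x in audio],
            [x + alpha * audio_mean for x in video])


def dual_branch_forward(audio: Vec, video: Vec,
                        audio_layers: Sequence[Layer], video_layers: Sequence[Layer],
                        n_fused: int, alpha: float = 0.5) -> Tuple[Vec, Vec]:
    """Run both branches layer by layer; exchange information only in the first n_fused layers."""
    if len(audio_layers) != len(video_layers):
        raise ValueError("both branches need the same depth in this sketch")
    for depth, (audio_layer, video_layer) in enumerate(zip(audio_layers, video_layers)):
        audio, video = audio_layer(audio), video_layer(video)
        if depth < n_fused:
            # Shallow layers: cross-modal fusion.
            audio, video = cross_fuse(audio, video, alpha)
        # Deeper layers fall through: each modality stays independent.
    return audio, video
```

At the reported 25 FPS, the whole pipeline has a budget of 1000 / 25 = 40 ms per video frame. That budget covers both branches, the vocoder, and the renderer.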


Approximate Domain Unlearning for Vision-Language Models

Neural Information Processing Systems

Pre-trained Vision-Language Models (VLMs) exhibit strong generalization capabilities, enabling them to recognize a wide range of objects across diverse domains without additional training. However, they often retain irrelevant information beyond the requirements of specific target downstream tasks, raising concerns about computational efficiency and potential information leakage. This has motivated growing interest in approximate unlearning, which aims to selectively remove unnecessary knowledge while preserving overall model performance. Existing approaches to approximate unlearning have primarily focused on class unlearning, where a VLM is retrained to fail to recognize specified object classes while maintaining accuracy for others. However, merely forgetting object classes is often insufficient in practical applications.


TrajMamba: An Efficient and Semantic-rich Vehicle Trajectory Pre-training Model

Neural Information Processing Systems

Vehicle GPS trajectories record how vehicles move over time and carry valuable travel semantics, including movement patterns and travel purposes. Learning these semantics effectively and efficiently is crucial for real-world applications of trajectory data, but two major challenges stand in the way. First, travel purposes are tied to the functions of the roads and points of interest (POIs) involved in a trip. This information is encoded in textual addresses and descriptions, which impose a heavy computational burden on modeling. Second, real-world trajectories often contain redundant points, which harm both computational efficiency and trajectory embedding quality.
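The second challenge, redundant points, can be made concrete with a classic line-simplification baseline, the Ramer-Douglas-Peucker algorithm, sketched below. It is swapped in for illustration and is not TrajMamba's approach. The sketch assumes planar (projected) coordinates and a tolerance `epsilon` in the same units. Raw GPS latitude/longitude would need projecting first.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / math.dist(a, b)


def simplify(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Drop points that deviate less than epsilon from the simplified path (Ramer-Douglas-Peucker)."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first -> last.
    split, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], first, last)
        if d > max_dist:
            split, max_dist = i, d
    if max_dist <= epsilon:
        # Every interior point is redundant at this tolerance.
        return [first, last]
    # Keep the farthest point and recurse on both halves; avoid duplicating it.
    left = simplify(points[: split + 1], epsilon)
    right = simplify(points[split:], epsilon)
    return left[:-1] + right
```

For example, `simplify([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1)` keeps only the corner and the endpoints: `[(0, 0), (2, 0), (2, 2)]`.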


Intel wants cheap Windows laptops to stop feeling cheap

PCWorld

PCWorld reports on Intel's Project Firefly initiative, which aims to bring premium laptop features such as all-metal construction and fanless design to budget-friendly devices. The project centers on Intel's new Core Series 3 'Wildcat Lake' processor, engineered with cost-reduction technologies and simplified motherboard designs to make laptops more affordable. Major manufacturers including Dell, HP, Lenovo, Acer, and Asus will ship these reimagined mainstream laptops, targeting students and small businesses. With Apple's cheaper MacBook Neo laptop drawing wide attention, the Windows PC camp is preparing to retake the mainstream laptop market with Project Firefly, an effort inspired by smartphone design.


Jumping spiders inspire wildly efficient 3D camera

Popular Science

The arachnids have multiple layers of retinas in each eye. Technological advancements sometimes feel like a perpetual fight against nature itself, with scientists working out how to do things faster, more powerfully, and more efficiently.


Overleaf Example

Neural Information Processing Systems

This section outlines the design and evaluation of distractor choices in our VQA dataset, which play a critical role in determining question difficulty and diagnostic value. We begin by examining the impact of introducing a "None of the Above" (NAB) option, which systematically increases task ambiguity and reduces model performance across the board (Figure 1). We then detail the principles and heuristics used to generate diverse, context-aware distractors for different question types. These include binary negations, categorical sampling, spatial reasoning perturbations, and content-aware language distractors. Special emphasis is placed on generating plausible incorrect choices that reflect partial knowledge, ambiguity, or visually confusable elements. Finally, we describe how randomized shuffling and probabilistic replacement with NAB options further strengthen the challenge by discouraging rote pattern matching. Together, these strategies enhance the dataset's ability to probe fine-grained reasoning, visual grounding, and robustness to uncertainty in large vision-language models.
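The final step, randomized shuffling plus probabilistic NAB replacement, can be sketched as follows. Several details are assumptions for illustration, not the dataset's documented procedure: the function name `build_choices`, the replacement probability `p_nab`, and the convention of placing the NAB option last. The sketch also assumes that `distractor_pool` holds at least `num_choices - 1` distinct strings and that none of them equals the answer.

```python
import random
from typing import List, Optional, Sequence, Tuple

NAB = "None of the above"


def build_choices(answer: str, distractor_pool: Sequence[str], num_choices: int = 4,
                  p_nab: float = 0.25,
                  rng: Optional[random.Random] = None) -> Tuple[List[str], int]:
    """Return (options, index_of_correct_option) for one multiple-choice question."""
    rng = rng if rng is not None else random.Random()
    options = [answer] + rng.sample(list(distractor_pool), num_choices - 1)
    correct = answer
    if rng.random() < p_nab:
        # Replace one randomly chosen option with NAB. If the true answer is the one
        # removed, NAB becomes correct, so a model cannot dismiss NAB by habit.
        removed = options.pop(rng.randrange(len(options)))
        if removed == answer:
            correct = NAB
        rng.shuffle(options)
        options.append(NAB)  # assumed convention: NAB last so "above" reads naturally
    else:
        rng.shuffle(options)  # randomized order discourages positional shortcuts
    return options, options.index(correct)
```

Because the removed slot is chosen at random, NAB is correct in only about 1/`num_choices` of the questions where it appears. A model therefore cannot succeed by always picking it or always ignoring it.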


Overleaf Example

Neural Information Processing Systems

Vision-Language Models (VLMs) acquire real-world knowledge and general reasoning ability from Internet-scale image-text corpora. They can augment robotic systems with scene understanding and task planning, and they can assist visuomotor policies trained on robot trajectory data. We explore the reverse paradigm: using rich, real, multi-modal robot trajectory data to enhance and evaluate VLMs.


ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation

Neural Information Processing Systems

The widespread adoption of Retrieval-Augmented Image Generation (RAIG) has raised significant concerns about the unauthorized use of private image datasets. While these systems have shown remarkable capabilities in enhancing generation quality through reference images, protecting visual datasets from unauthorized use in such systems remains a challenging problem. Traditional digital watermarking approaches face limitations in RAIG systems, as the complex feature extraction and recombination processes fail to preserve watermark signals during generation. To address these challenges, we propose ImageSentinel, a novel framework for protecting visual datasets in RAIG. Our framework synthesizes sentinel images that maintain visual consistency with the original dataset. These sentinels enable protection verification through randomly generated character sequences that serve as retrieval keys. To ensure seamless integration, we leverage vision-language models to generate the sentinel images. Experimental results demonstrate that ImageSentinel effectively detects unauthorized dataset usage while preserving generation quality for authorized applications.
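The key-based verification idea can be sketched as follows. It covers only the keys and the decision rule; generating the sentinel images with vision-language models is not shown. Several elements are hypothetical: the alphabet, the key length, the detection threshold, and the callable `retrieves_sentinel`. That callable would query the suspect system with a prompt containing a key and report whether the output reproduces the matching sentinel. None of these details comes from the ImageSentinel paper.

```python
import secrets
import string
from typing import Callable, List, Sequence

# Assumed key alphabet: 26 uppercase letters + 10 digits = 36 symbols.
ALPHABET = string.ascii_uppercase + string.digits


def make_sentinel_keys(n: int, length: int = 12) -> List[str]:
    """Generate n distinct random keys, one per sentinel image (cryptographically random)."""
    keys = set()
    while len(keys) < n:
        keys.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return sorted(keys)


def chance_match_probability(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Probability that one uniformly random key equals a fixed string of the same length."""
    return alphabet_size ** -length


def flag_unauthorized(keys: Sequence[str], retrieves_sentinel: Callable[[str], bool],
                      threshold: float = 0.5) -> bool:
    """Flag the suspect system if enough key queries surface their sentinel images."""
    if not keys:
        raise ValueError("need at least one sentinel key")
    hits = sum(1 for key in keys if retrieves_sentinel(key))
    return hits / len(keys) >= threshold
```

Random keys make accidental triggering negligible. With 36 symbols and 12 characters, a given key matches an arbitrary 12-character string with probability 36^-12, roughly 2e-19. A hit is therefore strong evidence that the sentinel was actually indexed.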