
Neural Information Processing Systems

The go-to strategy for applying deep networks in settings where uncertainty informs decisions--ensembling multiple training runs with random initializations--is ill-suited to the extremely large-scale models and practical fine-tuning workflows of today. We introduce a new cost-effective strategy for improving the uncertainty quantification and downstream decisions of a large model (e.g. a fine-tuned ViT-B): coupling it with a less accurate but much smaller "sidekick" (e.g. a fine-tuned ResNet-34) with a fraction of the computational cost. We propose aggregating the predictions of this Asymmetric Duo by simple learned weighted averaging. Surprisingly, despite their inherent asymmetry, the sidekick model almost never harms the performance of the larger model. In fact, across five image classification benchmarks and a variety of model architectures and training schemes (including soups), Asymmetric Duos significantly improve accuracy, uncertainty quantification, and selective classification metrics with only 10-20% more computation.
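The weighted averaging described above can be illustrated with a minimal sketch. It assumes the two models' logits are combined with one scalar weight each before a softmax. The function names, the weight values, and the choice to average logits rather than probabilities are illustrative assumptions, not details taken from the abstract, and the fitting of the weights on held-out data is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def duo_predict(large_logits, small_logits, w_large=1.0, w_small=0.5):
    """Combine two models' logits with scalar weights (hypothetical values).

    In practice the weights would be learned on a validation set;
    here they are fixed constants for illustration only.
    """
    combined = [w_large * a + w_small * b
                for a, b in zip(large_logits, small_logits)]
    return softmax(combined)


# Toy logits for a 3-class problem (made-up numbers).
probs = duo_predict([2.0, 1.0, 0.1], [0.5, 1.5, 0.2])
```

Setting `w_small=0.0` recovers the large model's own prediction, which gives one intuition for why a learned weighting need not hurt the larger model.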


A Differential and Pointwise Control Approach to Reinforcement Learning

Neural Information Processing Systems

Reinforcement learning (RL) in continuous state-action spaces remains challenging in scientific computing due to poor sample efficiency and lack of pathwise physical consistency. We introduce Differential Reinforcement Learning (Differential RL), a novel framework that reformulates RL from a continuous-time control perspective via a differential dual formulation. This induces a Hamiltonian structure that embeds physics priors and ensures consistent trajectories without requiring explicit constraints. To implement Differential RL, we develop Differential Policy Optimization (dfPO), a pointwise, stage-wise algorithm that refines local movement operators along the trajectory for improved sample efficiency and dynamic alignment. We establish pointwise convergence guarantees, a property not available in standard RL, and derive a competitive theoretical regret bound of $O(K^{5/6})$. Empirically, dfPO outperforms standard RL baselines on representative scientific computing tasks, including surface modeling, grid control, and molecular dynamics, under low-data and physics-constrained conditions.


Addressing Mark Imbalance in Integration-free Neural Marked Temporal Point Processes

Neural Information Processing Systems

Marked Temporal Point Process (MTPP) has been well studied to model the event distribution in marked event streams, which can be used to predict the mark and arrival time of the next event. However, existing studies overlook that the distribution of event marks is highly imbalanced in many real-world applications, with some marks being frequent but others rare. The imbalance poses a significant challenge to the performance of the next event prediction, especially for events of rare marks. To address this issue, we propose a thresholding method, which learns thresholds to tune the mark probability normalized by the mark's prior probability to optimize mark prediction, rather than predicting the mark directly based on the mark probability as in existing studies. In conjunction with this method, we predict the mark first and then the time. In particular, we develop a novel neural MTPP model to support effective time sampling and estimation of mark probability without computationally expensive numerical improper integration. Extensive experiments on real-world datasets demonstrate the superior performance of our solution against various baselines for the next event mark and time prediction.
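To make the thresholding idea concrete, here is a minimal sketch of one plausible decision rule: divide each mark's predicted probability by its prior, scale by a per-mark threshold, and pick the largest score. The function name, the exact form of the rule, and all numbers are assumptions for illustration. The abstract does not specify how the learned thresholds enter the prediction, so the paper's actual rule may differ.

```python
def predict_mark(mark_probs, mark_priors, thresholds):
    """Pick the mark with the largest prior-normalized, thresholded score.

    score_k = p_k / (prior_k * tau_k) is a hypothetical rule, not the
    paper's stated formula. The thresholds tau_k would be learned.
    """
    scores = [p / (prior * tau)
              for p, prior, tau in zip(mark_probs, mark_priors, thresholds)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: mark 0 is frequent (prior 0.9), mark 1 is rare (prior 0.1).
mark_probs = [0.7, 0.3]
mark_priors = [0.9, 0.1]

# With neutral thresholds the rare mark wins, because 0.3 is well above
# its prior of 0.1, whereas a plain argmax over mark_probs would pick mark 0.
rare_pick = predict_mark(mark_probs, mark_priors, [1.0, 1.0])

# A larger threshold on the rare mark makes the rule more conservative.
frequent_pick = predict_mark(mark_probs, mark_priors, [1.0, 5.0])
```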


Inference-Time Reward Hacking in Large Language Models

Neural Information Processing Systems

A common paradigm to improve the performance of large language models is optimizing for a reward model. Reward models assign a numerical score to an LLM's output that indicates, for example, how likely it is to align with user preferences or safety goals. However, reward models are never perfect. They inevitably function as proxies for complex desiderata such as correctness, helpfulness, and safety. By overoptimizing for a misspecified reward, we can subvert intended alignment goals and reduce overall performance - a phenomenon commonly referred to as reward hacking.


On Universality Classes of Equivariant Networks

Neural Information Processing Systems

Equivariant neural networks provide a principled framework for incorporating symmetry into learning architectures and have been extensively analyzed through the lens of their separation power, that is, the ability to distinguish inputs modulo symmetry. This notion plays a central role in settings such as graph learning, where it is often formalized via the Weisfeiler-Leman hierarchy. In contrast, the universality of equivariant models--their capacity to approximate target functions--remains comparatively underexplored. In this work, we investigate the approximation power of equivariant neural networks beyond separation constraints. We show that separation power does not fully capture expressivity: models with identical separation power may differ in their approximation ability. To demonstrate this, we characterize the universality classes of shallow invariant networks, providing a general framework for understanding which functions these architectures can approximate. Since equivariant models reduce to invariant ones under projection, this analysis yields sufficient conditions under which shallow equivariant networks fail to be universal. Conversely, we identify settings where shallow models do achieve separation-constrained universality. These positive results, however, depend critically on structural properties of the symmetry group, such as the existence of adequate normal subgroups, which may not hold in important cases like permutation symmetry.


Attention on the Sphere

Neural Information Processing Systems

We introduce a generalized attention mechanism for spherical domains, enabling Transformer architectures to natively process data defined on the two-dimensional sphere - a critical need in fields such as atmospheric physics, cosmology, and robotics, where preserving spherical symmetries and topology is essential for physical accuracy. By integrating numerical quadrature weights into the attention mechanism, we obtain a geometrically faithful spherical attention that is approximately rotationally equivariant, providing strong inductive biases and leading to better performance than Cartesian approaches. To further enhance both scalability and model performance, we propose neighborhood attention on the sphere, which confines interactions to geodesic neighborhoods. This approach reduces computational complexity and introduces the additional inductive bias for locality, while retaining the symmetry properties of our method. We provide optimized CUDA kernels and memory-efficient implementations to ensure practical applicability. The method is validated on three diverse tasks: simulating shallow water equations on the rotating sphere, spherical image segmentation, and spherical depth estimation. Across all tasks, our spherical Transformers consistently outperform their planar counterparts, highlighting the advantage of geometric priors for learning on spherical domains.
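As a rough illustration of folding quadrature weights into attention, the sketch below multiplies each key's exponentiated score by the quadrature weight of its grid point, so that the softmax sum approximates an integral over the sphere. Where exactly the weights enter, the function name, and the toy data are assumptions. This is plain Python for readability and is not the optimized CUDA implementation the abstract mentions.

```python
import math


def quadrature_attention(queries, keys, values, quad_weights):
    """Softmax attention with per-key quadrature weights.

    out_i = sum_j w_j exp(q_i . k_j / sqrt(d)) v_j
            / sum_j w_j exp(q_i . k_j / sqrt(d))
    The placement of w_j is an illustrative assumption.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        weighted = [w * math.exp(s - m)
                    for w, s in zip(quad_weights, scores)]
        norm = sum(weighted)
        out = [sum(wt * v[c] for wt, v in zip(weighted, values)) / norm
               for c in range(len(values[0]))]
        outputs.append(out)
    return outputs


# With a zero query all scores are equal, so the output reduces to the
# quadrature-weighted mean of the values.
out = quadrature_attention(
    queries=[[0.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0], [3.0]],
    quad_weights=[0.25, 0.75],
)
```

On an equiangular latitude-longitude grid, points near the poles cover less area, so weights of this kind keep densely sampled polar regions from dominating the attention sum.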


You Only Communicate Once: One-shot Federated Low-Rank Adaptation of MLLM

Neural Information Processing Systems

Multimodal Large Language Models (MLLMs) with Federated Learning (FL) can quickly adapt to privacy-sensitive tasks, but face significant challenges such as high communication costs and increased attack risks, due to their reliance on multi-round communication. To address this, One-shot FL (OFL) has emerged, aiming to complete adaptation in a single client-server communication. However, existing adaptive ensemble OFL methods still need more than one round of communication, because correcting heterogeneity-induced local bias relies on aggregated global supervision, so they do not achieve true one-shot communication. In this work, we make the first attempt to achieve true one-shot communication for MLLMs under OFL, by investigating whether implicit (i.e., initial rather than aggregated) global supervision alone can effectively correct local training bias. Our key finding from the empirical study is that imposing directional supervision on local training substantially mitigates client conflicts and local bias. Building on this insight, we propose YOCO, in which directional supervision with a sign-regularized LoRA B matrix enforces global consistency, while a sparsely regularized LoRA A matrix preserves client-specific adaptability. Experiments demonstrate that YOCO cuts communication to 0.03% of multi-round FL while surpassing those methods in several multimodal scenarios and consistently outperforming all one-shot competitors.


$O(\sqrt{T})$ Static Regret and Instance Dependent Constraint Violation for Constrained Online Convex Optimization

Neural Information Processing Systems

The constrained version of the standard online convex optimization (OCO) framework, called COCO, is considered, in which on every round a convex cost function and a convex constraint function are revealed to the learner after it chooses its action for that round. The objective is to simultaneously minimize the static regret and the cumulative constraint violation (CCV). An algorithm is proposed that guarantees a static regret of $O(\sqrt{T})$ and a CCV of $\min\{V, O(\sqrt{T}\log T)\}$, where $V$ depends on the distance between consecutively revealed constraint sets, the shape of the constraint sets, the dimension of the action space, and the diameter of the action space. When the constraint sets have additional structure, $V = O(1)$. Compared with the state-of-the-art results of $O(\sqrt{T})$ static regret and $O(\sqrt{T}\log T)$ CCV, which are universal, the new CCV bound is instance dependent and is derived by exploiting the geometric properties of the constraint sets.
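For readers unfamiliar with the two quantities being bounded, the following states them in the form commonly used in the constrained OCO literature. The symbols ($x_t$ for the action, $f_t$ for the cost, $g_t$ for the constraint function, $\mathcal{X}$ for the action set, $\mathcal{X}^\star$ for the set of actions feasible on every round) are assumed notation and do not appear in the abstract, so the paper's exact definitions may differ in detail.

```latex
% Static regret against the best fixed action that satisfies every constraint
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{X}^\star} \sum_{t=1}^{T} f_t(x),
\qquad
\mathcal{X}^\star \;=\; \{\, x \in \mathcal{X} : g_t(x) \le 0 \ \ \forall t \in \{1,\dots,T\} \,\}

% Cumulative constraint violation: only positive violations are counted
\mathrm{CCV}_T \;=\; \sum_{t=1}^{T} \max\{\, g_t(x_t),\, 0 \,\}
```

Under these definitions a strictly feasible round cannot offset a violation on another round, which is why CCV is a stricter measure than the signed sum of constraint values.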


MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild

Neural Information Processing Systems

In-the-wild photo collections often contain limited volumes of imagery and exhibit multiple appearances, e.g., taken at different times of day or seasons, posing significant challenges to scene reconstruction and novel view synthesis. Although recent adaptations of Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) have improved in these areas, they tend to oversmooth and are prone to overfitting. In this paper, we present MS-GS, a novel framework designed with Multi-appearance capabilities in Sparse-view scenarios using 3DGS. To address the lack of support due to sparse initializations, our approach is built on the geometric priors elicited from monocular depth estimations. The key lies in extracting and utilizing local semantic regions with a Structure-from-Motion (SfM) point-anchored algorithm for reliable alignment and geometry cues. Then, to introduce multi-view constraints, we propose a series of geometry-guided supervision steps at virtual views at the pixel and feature levels to encourage 3D consistency and reduce overfitting. We also introduce a dataset and an in-the-wild experiment setting to set up more realistic benchmarks. We demonstrate that MS-GS achieves photorealistic renderings under various challenging sparse-view and multi-appearance conditions, and outperforms existing approaches significantly across different datasets.


Who will win the World Cup? Mathematician's 11 models predict four possible champions (but NOT England!)

Daily Mail - Science & tech

England's World Cup journey begins tonight, but a mathematician warns that fans shouldn't get their hopes up. Dr Ari Joury, a particle physicist and founder of AI firm Wangari, created 11 different models to predict who will win this year's tournament. These digital tipsters crowned four different champions between them, but not a single one picked England. Seven models backed Spain, two singled out Argentina as the likeliest winner, while France and the Netherlands were each the favourite of one prediction system.