Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It

Neural Information Processing Systems

Does vision-and-language (VL) training change the linguistic representations of language models in meaningful ways? Most results in the literature have shown inconsistent or marginal differences, both behaviorally and representationally. In this work, we start from the hypothesis that the domain in which VL training could have a significant effect is lexical-conceptual knowledge, in particular its taxonomic organization. Through comparing minimal pairs of text-only LMs and their VL-trained counterparts, we first show that the VL models often outperform their text-only counterparts on a text-only question-answering task that requires taxonomic understanding of concepts mentioned in the questions. Using an array of targeted behavioral and representational analyses, we show that the LMs and VLMs do not differ significantly in terms of their taxonomic knowledge itself, but they differ in how they represent questions that contain concepts in a taxonomic relation vs. a non-taxonomic relation. This implies that the taxonomic knowledge itself does not change substantially through additional VL training, but VL training does improve the deployment of this knowledge in the context of a specific task, even when the presentation of the task is purely linguistic.


Conformal Prediction for Time-series Forecasting with Change Points

Neural Information Processing Systems

Conformal prediction has been explored as a general and efficient way to provide uncertainty quantification for time series. However, current methods struggle to handle time series data with change points, that is, sudden shifts in the underlying data-generating process. In this paper, we propose a novel Conformal Prediction for Time-series with Change points (CPTC) algorithm, addressing this gap by integrating a model to predict the underlying state with online conformal prediction to model uncertainties in non-stationary time series. We prove CPTC's validity and improved adaptivity in the time series setting under minimal assumptions, and demonstrate CPTC's practical effectiveness on 6 synthetic and real-world datasets, showing improved validity and adaptivity compared to state-of-the-art baselines.
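For readers unfamiliar with the online conformal prediction component mentioned above, the following is a minimal sketch of a generic adaptive update (in the style of adaptive conformal inference), not the CPTC algorithm itself. The function names, the absolute-residual score, and the values of `alpha` and `gamma` are illustrative assumptions; the state-prediction model described in the abstract is not included.

```python
def quantile(scores, level):
    """Empirical quantile of past scores (assumed helper, not from the paper)."""
    if not scores or level >= 1.0:
        return float("inf")   # no information yet, or full coverage requested
    if level <= 0.0:
        return 0.0
    ordered = sorted(scores)
    idx = min(int(level * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def online_conformal(y_true, y_pred, alpha=0.1, gamma=0.05):
    """Build one interval per time step and adapt the working miscoverage level.

    alpha is the target miscoverage rate; gamma is the step size of the update.
    Returns (intervals, misses), where misses[t] is 1 if y_true[t] fell outside.
    """
    alpha_t = alpha
    scores, intervals, misses = [], [], []
    for y, y_hat in zip(y_true, y_pred):
        width = quantile(scores, 1.0 - alpha_t)
        lo, hi = y_hat - width, y_hat + width
        miss = 0 if lo <= y <= hi else 1
        intervals.append((lo, hi))
        misses.append(miss)
        # After a miss alpha_t shrinks, so the next interval widens;
        # after a hit alpha_t grows slightly, so intervals tighten.
        alpha_t += gamma * (alpha - miss)
        scores.append(abs(y - y_hat))  # absolute residual as nonconformity score
    return intervals, misses


# Toy series with a change point at t = 20 and a forecaster that never adapts.
y_true = [0.0] * 20 + [5.0] * 20
y_pred = [0.0] * 40
intervals, misses = online_conformal(y_true, y_pred)
```

In this toy run the interval misses immediately after the shift and then widens as the working level adapts, which is the kind of lag at a change point that the abstract identifies as a weakness of existing methods.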


A Dynamic Learning Strategy for Dempster-Shafer Theory with Applications in Classification and Enhancement

Neural Information Processing Systems

Effective modelling of uncertain information is crucial for quantifying uncertainty. Dempster-Shafer evidence (DSE) theory is a widely recognized approach for handling uncertain information. However, current methods often neglect the inherent a priori information within data during modelling, and imbalanced data lead to insufficient attention to key information in the model. To address these limitations, this paper presents a dynamic learning strategy based on nonuniform splitting mechanism and Hilbert space mapping. First, the framework uses a nonuniform splitting mechanism to dynamically adjust the weights of data subsets and combines the diffusion factor to effectively incorporate the data a priori information, thereby flexibly addressing uncertainty and conflict. Second, the conflict in the information fusion process is reduced by Hilbert space mapping. Experimental results on multiple tasks show that the proposed method significantly outperforms state-of-the-art methods and effectively improves the performance of classification and low-light image enhancement (LLIE) tasks. The code is available at https://anonymous.4open.science/r/Third-ED16.


Global Prompt Refinement with Non-Interfering Attention Masking for One-Shot Federated Learning

Neural Information Processing Systems

Federated Prompt Learning (FPL) enables communication-efficient adaptation by tuning lightweight prompts on top of frozen pre-trained models. Existing FPL methods typically rely on global information, which is only available after the second training round, to facilitate collaboration among client models. Therefore, they are inherently dependent on multi-round communication to fully exhibit their strengths. Moreover, existing one-shot federated learning methods typically focus on fitting seen tasks, but lack cross-task generalization. To bridge this gap, we propose the Global Prompt Refinement with Non-Interfering Attention Masking (GPR-NIAM) method for one-shot FPL.


Domain Adaptation for Sim-and-Real Policy Co-Training

Neural Information Processing Systems

Behavior cloning has shown promise for robot manipulation, but real-world demonstrations are costly to acquire at scale. While simulated data offers a scalable alternative, particularly with advances in automated demonstration generation, transferring policies to the real world is hampered by various simulation and real domain gaps. In this work, we propose a unified sim-and-real co-training framework for learning generalizable manipulation policies that primarily leverages simulation and only requires a few real-world demonstrations. Central to our approach is learning a domain-invariant, task-relevant feature space. Our key insight is that aligning the joint distributions of observations and their corresponding actions across domains provides a richer signal than aligning observations (marginals) alone. We achieve this by embedding an Optimal Transport (OT)-inspired loss within the co-training framework, and extend this to an Unbalanced OT framework to handle the imbalance between abundant simulation data and limited real-world examples. We validate our method on challenging manipulation tasks, showing it can leverage abundant simulation data to achieve up to a 30% improvement in the real-world success rate and even generalize to scenarios seen only in simulation.


Feature Unlearning: Theoretical Foundations and Practical Applications with Shuffling

Neural Information Processing Systems

Machine unlearning has become a focal point in recent research, yet the specific area of feature unlearning has not been thoroughly explored. Feature unlearning involves eliminating specific features' effects from an already trained model, presenting distinct challenges that are not yet comprehensively addressed. This paper presents a novel and straightforward approach to feature unlearning that employs a tactical shuffling of the features designated for removal. By redistributing the values of the features targeted for unlearning throughout the original training dataset and subsequently fine-tuning the model with this shuffled data, our proposed method provides a theoretical guarantee for effective feature unlearning. Under mild assumptions, our method can effectively disrupt the established correlations between unlearned features and the label, while preserving the relationships between the remaining features and the label. Across both tabular and image datasets, our empirical results show that our method not only effectively and efficiently removes the influence of designated features but also preserves the information content of the remaining features.
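The shuffling step described above can be sketched as follows. This is only an illustration of permuting one feature column across the training set; the function name `shuffle_feature`, the list-of-rows data layout, and the fixed seed are assumptions, and the subsequent fine-tuning step from the abstract is not shown.

```python
import random


def shuffle_feature(rows, feature_idx, seed=0):
    """Return a copy of rows with one feature column randomly permuted.

    The permuted column keeps the same set of values, but its pairing
    with the other features and with the label is broken.
    """
    rng = random.Random(seed)
    values = [row[feature_idx] for row in rows]
    rng.shuffle(values)
    return [
        row[:feature_idx] + [value] + row[feature_idx + 1:]
        for row, value in zip(rows, values)
    ]


# Each row is [feature_0, feature_1, label]; feature_1 is the one to unlearn.
data = [[1, 10, 0], [2, 20, 1], [3, 30, 0], [4, 40, 1]]
shuffled = shuffle_feature(data, feature_idx=1)
```

A model would then be fine-tuned on `shuffled` in place of `data`; the other columns and the labels are left untouched.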


Let Brain Rhythm Shape Machine Intelligence for Connecting Dots on Graphs

Neural Information Processing Systems

In both neuroscience and artificial intelligence (AI), it is well-established that neural "coupling" gives rise to dynamically distributed systems. These systems exhibit self-organized spatiotemporal patterns of synchronized neural oscillations, enabling the representation of abstract concepts. By capitalizing on the unprecedented amount of human neuroimaging data, we propose that advancing the theoretical understanding of rhythmic coordination in neural circuits can offer powerful design principles for the next generation of machine learning models with improved efficiency and robustness. To this end, we introduce a physics-informed deep learning framework for Brain Rhythm Identification by Kuramoto and Control (coined BRICK) to characterize the synchronization of neural oscillations that shapes the dynamics of evolving cognitive states. Recognizing that brain networks are structurally connected yet behaviorally dynamic, we further conceptualize rhythmic neural activity as an artificial dynamical system of coupled oscillators, offering a shared mechanistic bridge to brain-inspired machine intelligence. By treating each node as an oscillator interacting with its neighbors, this approach moves beyond the conventional paradigm of graph heat diffusion and establishes a new regime of representation compression through oscillatory synchronization. Empirical evaluations demonstrate that this synchronization-driven mechanism not only mitigates over-smoothing in deep GNNs but also enhances the model's capacity for reasoning and solving complex graph-based problems.
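As background for the "node as oscillator" idea, here is a minimal sketch of the classical Kuramoto model on a graph, integrated with a plain Euler step. It is not the BRICK framework; the function names, the coupling strength, the step size, and the toy graph are illustrative assumptions.

```python
import cmath
import math


def kuramoto_step(theta, omega, adj, coupling, dt):
    """One Euler step of d(theta_i)/dt = omega_i + K * sum_j A_ij * sin(theta_j - theta_i)."""
    n = len(theta)
    updated = []
    for i in range(n):
        pull = sum(adj[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
        updated.append(theta[i] + dt * (omega[i] + coupling * pull))
    return updated


def order_parameter(theta):
    """Magnitude of the mean phase vector: 1.0 means full synchronization."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)


# Four fully connected oscillators with identical natural frequencies.
adj = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
theta = [0.0, 0.5, 1.0, 1.5]
omega = [0.0] * 4
r_start = order_parameter(theta)
for _ in range(200):
    theta = kuramoto_step(theta, omega, adj, coupling=1.0, dt=0.1)
r_end = order_parameter(theta)
```

With identical frequencies and positive coupling, the phases in this toy setup pull together and the order parameter rises toward 1, in contrast to heat diffusion, which averages feature values directly.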


GAMMA: Gated Multi-hop Message Passing for Homophily-Agnostic Node Representation in GNNs

Neural Information Processing Systems

The success of Graph Neural Networks (GNNs) leverages the homophily principle, where connected nodes share similar features and labels. However, this assumption breaks down in heterophilic graphs, where same-class nodes are often distributed across distant neighborhoods rather than immediate connections. Recent attempts expand the receptive field through multi-hop aggregation schemes that explicitly preserve intermediate representations from each hop distance. While effective at capturing heterophilic patterns, these methods require separate weight matrices per hop and feature concatenation, causing parameters to scale linearly with hop count. This leads to high computational complexity and GPU memory consumption. We propose Gated Multi-hop Message Passing (GAMMA), where nodes assess how relevant the aggregated information is from their k-hop neighbors. This assessment occurs through multiple refinement steps where the node compares each hop's embedding with its current representation, allowing it to focus on the most informative hops. During the forward pass, GAMMA finds the optimal mix of multi-hop information local to each node using a single feature vector without needing separate representations for each hop, thereby maintaining dimensionality comparable to single hop GNNs. In addition, we propose a weight sharing scheme that leverages a unified transformation for aggregated features from multiple hops so the global heterophilic patterns specific to each hop are learned during training.


Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation

Neural Information Processing Systems

Recent deep reinforcement learning methods have achieved remarkable success in solving multi-objective combinatorial optimization problems (MOCOPs) by decomposing them into multiple subproblems, each associated with a specific weight vector. However, these methods typically treat all subproblems equally and solve them using a single model, hindering the effective exploration of the solution space and thus leading to suboptimal performance. To overcome this limitation, we propose POCCO, a novel plug-and-play framework that enables adaptive selection of model structures for subproblems, which are subsequently optimized based on preference signals rather than explicit reward values.


Cascaded Language Models for Cost-Effective Human-AI Decision-Making

Neural Information Processing Systems

A challenge in human-AI decision-making is to balance three factors: the correctness of predictions, the cost of knowledge and reasoning complexity, and the confidence about whether to abstain from automated answers or escalate to human experts. In this work, we present a cascaded LLM decision framework that adaptively delegates tasks across multiple tiers of expertise: a base model for initial candidate answers, a more capable and knowledgeable (but costlier) large model, and a human expert for when the model cascade abstains.
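The three-tier delegation described above might be sketched as a simple threshold rule. This is a hypothetical illustration only: the `cascade` function, the confidence scores returned by each model, and the thresholds `tau_base` and `tau_large` are assumptions, and the abstract does not state how the actual framework decides when to defer or abstain.

```python
from typing import Callable, Tuple

# A model maps a question to (answer, confidence in [0, 1]); assumed interface.
Model = Callable[[str], Tuple[str, float]]


def cascade(
    question: str,
    base_model: Model,
    large_model: Model,
    ask_human: Callable[[str], str],
    tau_base: float = 0.8,
    tau_large: float = 0.6,
) -> Tuple[str, str]:
    """Return (answer, tier), escalating only when confidence is too low."""
    answer, confidence = base_model(question)
    if confidence >= tau_base:
        return answer, "base"
    # The base model is unsure: pay for the larger model.
    answer, confidence = large_model(question)
    if confidence >= tau_large:
        return answer, "large"
    # Both models are unsure: the cascade abstains and defers to a person.
    return ask_human(question), "human"


# Stub models standing in for real LLM calls.
result = cascade(
    "Is 17 prime?",
    base_model=lambda q: ("yes", 0.55),
    large_model=lambda q: ("yes", 0.90),
    ask_human=lambda q: "yes",
)
```

Here the base model's confidence falls below `tau_base`, so the question is passed to the larger model, whose answer is accepted without involving the human expert.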