AITopics

On Causal Discovery in the Presence of Deterministic Relations Loka Li

Neural Information Processing SystemsMar-27-2025, 13:49:17 GMT

Many causal discovery methods typically rely on the assumption of independent noise, yet real-life situations often involve deterministic relationships. In these cases, observed variables are represented as deterministic functions of their parental variables without noise. When determinism is present, constraint-based methods encounter challenges due to the violation of the faithfulness assumption. In this paper, we find, supported by both theoretical analysis and empirical evidence, that score-based methods with exact search can naturally address the issues of deterministic relations under rather mild assumptions. Nonetheless, exact scorebased methods can be computationally expensive.

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec (0.14)
North America > United States > California (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

964b1c8dd5667fd647c09c8772829fd1-Paper-Conference.pdf

Neural Information Processing SystemsMar-27-2025, 13:49:10 GMT

artificial intelligence, information, machine learning, (20 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

CONTRAST: Continual Multi-source Adaptation to Dynamic Distributions

Neural Information Processing SystemsMar-27-2025, 13:49:03 GMT

Adapting to dynamic data distributions is a practical yet challenging task. One effective strategy is to use a model ensemble, which leverages the diverse expertise of different models to transfer knowledge to evolving data distributions. However, this approach faces difficulties when the dynamic test distribution is available only in small batches and without access to the original source data. To address the challenge of adapting to dynamic distributions in such practical settings, we propose CONtinual mulTi-souRce Adaptation to dynamic diStribuTions (CONTRAST), a novel method that optimally combines multiple source models to adapt to the dynamic test data. CONTRAST has two distinguishing features. First, it efficiently computes the optimal combination weights to combine the source models to adapt to the test data distribution continuously as a function of time. Second, it identifies which of the source model parameters to update so that only the model which is most correlated to the target data is adapted, leaving the less correlated ones untouched; this mitigates the issue of "forgetting" the source model parameters by focusing only on the source model that exhibits the strongest correlation with the test batch distribution. Through theoretical analysis we show that the proposed method is able to optimally combine the source models and prioritize updates to the model least prone to forgetting. Experimental analysis on diverse datasets demonstrates that the combination of multiple source models does at least as well as the best source (with hindsight knowledge), and performance does not degrade as the test data distribution changes over time (robust to forgetting).

artificial intelligence, machine learning, source model, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.14)
North America > United States > California (0.14)
Asia > Middle East > Israel (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

b8362385b08348d21162310c5b4e9541-Supplemental-Conference.pdf

Neural Information Processing SystemsMar-27-2025, 13:48:55 GMT

artificial intelligence, machine learning, node, (15 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

b8362385b08348d21162310c5b4e9541-Paper-Conference.pdf

Neural Information Processing SystemsMar-27-2025, 13:48:52 GMT

artificial intelligence, inductive learning, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Education (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.30)

Add feedback

b7c12689a89e98a61bcaa65285a41b7c-Supplemental-Conference.pdf

Neural Information Processing SystemsMar-27-2025, 13:48:45 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

Neural Information Processing SystemsMar-27-2025, 13:48:41 GMT

Traditional knowledge distillation (KD) methods manually design student architectures to compress large models given pre-specified computational cost. This requires several trials to find viable students, and repeating the process with change in computational budget. We use Neural Architecture Search (NAS) to automatically distill several compressed students with variable cost from a large model. Existing NAS methods train a single SuperLM consisting of millions of subnetworks with weight-sharing, resulting in interference between subnetworks of different sizes. Additionally, many of these works are task-specific requiring task labels for SuperLM training. Our framework AutoDistil addresses above challenges with the following steps: (a) Incorporates inductive bias and heuristics to partition Transformer search space into K compact sub-spaces (e.g., K=3 can generate typical student sizes of base, small and tiny); (b) Trains one SuperLM for each sub-space using task-agnostic objective (e.g., self-attention distillation) with weight-sharing of students; (c) Lightweight search for the optimal student without re-training. Task-agnostic training and search allow students to be reused for fine-tuning on any downstream task. Experiments on GLUE benchmark demonstrate AutoDistil to outperform state-of-the-art KD and NAS methods with upto 41x reduction in computational cost. Code and models are available at aka.ms/autodistil.

machine learning, natural language, subnetwork, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs Matthieu Cord

Neural Information Processing SystemsMar-27-2025, 13:48:36 GMT

Large Language Models (LLMs) have demonstrated impressive performance on multimodal tasks, without any multimodal finetuning. They are the de facto building block for Large Multimodal Models (LMMs), yet, we still lack a proper understanding of their success. In this work, we expose frozen LLMs to image, video, audio and text inputs and analyse their internal representation aiming to understand their generalization beyond textual inputs. Our work provides the following findings. Perceptual tokens (1) are easily distinguishable from textual ones inside LLMs, with significantly different representations (e.g.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: