AITopics | Learning Graphical Models

Learning Dynamical Systems via Koopman Operator Regression in Reproducing Kernel Hilbert Spaces

Neural Information Processing SystemsApr-24-2026, 21:31:28 GMT

We study a class of dynamical systems modelled as Markov chains that admit an invariant distribution via the corresponding transfer, or Koopman, operator. While data-driven algorithms to reconstruct such operators are well known, their relationship with statistical learning is largely unexplored. We formalize a framework to learn the Koopman operator from finite data trajectories of the dynamical system. We consider the restriction of this operator to a reproducing kernel Hilbert space and introduce a notion of risk, from which different estimators naturally arise. We link the risk with the estimation of the spectral decomposition of the Koopman operator. These observations motivate a reduced-rank operator regression (RRR) estimator. We derive learning bounds for the proposed estimator, holding both in i.i.d. and non i.i.d.

artificial intelligence, machine learning, scientific computing, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Scientific Computing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Add feedback

147540e129e096fa91700e9db6588354-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 19:35:14 GMT

evolutionary algorithm, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.69)
(3 more...)

Add feedback

144a3f71a03ab7c4f46f9656608efdb2-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 19:34:09 GMT

artificial intelligence, conditioning, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Finite-Time Logarithmic Bayes Regret Upper Bounds

Neural Information Processing SystemsApr-24-2026, 19:33:57 GMT

We derive the first finite-time logarithmic Bayes regret upper bounds for Bayesian bandits. In a multi-armed bandit, we obtain O(c logn)and O(ch log2 n)upper bounds for an upper confidence bound algorithm, where ch and c are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively. The latter bound asymptotically matches the lower bound of Lai (1987). Our proofs are a major technical departure from prior works, while being simple and general. To show the generality of our techniques, we apply them to linear bandits. Our results provide insights on the value of prior in the Bayesian setting, both in the objective and as a side information given to the learner. They significantly improve upon existing O( n)bounds, which have become standard in the literature despite the logarithmic lower bound of Lai (1987).

bandit, data mining, machine learning, (22 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Robustifying Algorithms of Learning Latent Trees with Vector Variables

Neural Information Processing SystemsApr-24-2026, 18:50:09 GMT

We consider learning the structures of Gaussian latent tree models with vector observations when a subset of them are arbitrarily corrupted. First, we present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG) without the assumption that the effective depth is bounded in the number of observed nodes, significantly generalizing the results in Choi et al. (2011). We show that Chow-Liu initialization in CLRG greatly reduces the sample complexity of RG from being exponential in the diameter of the tree to only logarithmic in the diameter for the hidden Markov model (HMM).

artificial intelligence, machine learning, sample complexity, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Efficient Equivariant Transfer Learning from Pretrained Models

Neural Information Processing SystemsApr-24-2026, 18:48:58 GMT

Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks even with limited data. Recent works of Basu et al. (2023) and Kaba et al. (2022) propose group averaging (equitune) and optimizationbased methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While Kaba et al. (2022) are only concerned with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide better quality features for certain transformations than others and simply averaging them is deleterious. Hence, we propose λ-equitune that averages the features using importance weights, λs. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that λ-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of Kaba et al. (2022) used with appropriate loss functions, which we call equizero, also gives excellent zero-shot and finetuned performance.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

15349e1c554406b7719d047a498e7117-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 18:13:10 GMT

artificial intelligence, machine learning, proceedings, (11 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre:

Research Report > Experimental Study (0.94)
Research Report > Strength High (0.68)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

XXXXX

XXX

Neural Information Processing SystemsApr-24-2026, 17:52:21 GMT

In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.46)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Add feedback

0c79d6ed1788653643a1ac67b6ea32a7-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 17:31:40 GMT

artificial intelligence, exp null, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Appendix

Neural Information Processing SystemsApr-24-2026, 16:50:13 GMT

In this section we motivate the design choices and inductive biases that we encode into our neural encoder network e, which is the network that is used to model the relative accuracies of the weak supervision sources λ. Recall that we model the probability of a particular sample x X having the class label y Y = {1,...,C}as Pθ(y|λ) = softmax(s)yP(y), (4) s = θ(λ,x)Tλ RC . Connection to prior PGM models We now motivate this choice by deriving a less expressive variant of it from the standard Markov Random Field (MRF) used in the related work. If we view the attention scores θ(λ,x) Rm, that assign sample-dependent accuracies to each labeling function, as sample-independent parameters θ1 and, by that, drop the features from the equation - as is done in the related work [30, 32, 19, 11] - we can rewrite Eq. 4 as exp θT1 1 {λ = y} P We can recognize Pθ as a distribution from the exponential familiy, and more specifically as a pairwise MRF, or factor graph, with canonical parameters θ = (θ1,θ2) and corresponding sufficient statistics, or factors, φ(λ,y) = (φ1(λ,y),φ2(λ)), as well as the log partition function Zθ. The accuracy factors and parameters φ1,θ1 are the core component of this model and sometimes take the form φ1(λy) = λy in binary models as in [30, 19, 11]. The label-independent factors φ2(λ) have, as can be seen from the derivation above, no direct influence on the latent label posterior, but are often used to model labeling propensities 1 {λ 6= 0}and correlation dependencies 1 {λi = λj}, which can be important for PGM parameter learning, but are susceptible to misspecifications [39, 11, 8].

artificial intelligence, experiment, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology: