AITopics

2311.16769

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(6 more...)

arXiv.org Artificial IntelligenceNov-28-2023

Adaptive Bayesian Learning with Action and State-Dependent Signal Variance

Hou, Kaiwen

Bayesian learning, a fundamental concept in statistical inference and decision-making, has gained significant traction across various fields due to its ability to integrate prior knowledge with new information. As a robust methodology, Bayesian learning has been widely acknowledged for its adaptability and precision in handling uncertainty and updating beliefs (Gelman et al., 1995). This manuscript expands upon the Bayesian learning framework (Baley and Veldkamp, 2023) through uniquely addressing the action and state-dependent signal variance in the agents' information set. At the core of this framework is the concept that the precision of the signal received by an agent is contingent upon both the agent's action and the actual state, for example, based on their congruence or tracking error (Daly, 2018; du Sart and van Vuuren, 2021; Orlik and Veldkamp, 2014; Rompotis, 2011; Stone et al., 2013; Yang and Huang, 2022).

agent, variance, veldkamp, (13 more...)

2311.12878

Genre: Research Report (0.40)

Industry:

Banking & Finance > Trading (1.00)
Banking & Finance > Economy (0.96)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Donker, H. C., Neijzen, D., Lunter, G. A.

Multinomial belief networks

A Bayesian approach to machine learning is attractive when we need to quantify uncertainty, deal with missing observations, when samples are scarce, or when the data is sparse. All of these commonly apply when analysing healthcare data. To address these analytical requirements, we propose a deep generative model for multinomial count data where both the weights and hidden units of the network are Dirichlet distributed. A Gibbs sampling procedure is formulated that takes advantage of a series of augmentation relations, analogous to the Zhou-Cong-Chen model. We apply the model on small handwritten digits, and a large experimental dataset of DNA mutations in cancer, and we show how the model is able to extract biologically meaningful meta-signatures in a fully data-driven way.

artificial intelligence, bayesian inference, machine learning, (19 more...)

2311.16909

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(8 more...)

Genre:

Overview (0.67)
Research Report (0.50)
Workflow (0.45)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Optimal minimax rate of learning interaction kernels

Wang, Xiong, Seroussi, Inbar, Lu, Fei

Nonparametric estimation of nonlocal interaction kernels is crucial in various applications involving interacting particle systems. The inference challenge, situated at the nexus of statistical learning and inverse problems, comes from the nonlocal dependency. A central question is whether the optimal minimax rate of convergence for this problem aligns with the rate of $M^{-\frac{2\beta}{2\beta+1}}$ in classical nonparametric regression, where $M$ is the sample size and $\beta$ represents the smoothness exponent of the radial kernel. Our study confirms this alignment for systems with a finite number of particles. We introduce a tamed least squares estimator (tLSE) that attains the optimal convergence rate for a broad class of exchangeable distributions. The tLSE bridges the smallest eigenvalue of random matrices and Sobolev embedding. This estimator relies on nonasymptotic estimates for the left tail probability of the smallest eigenvalue of the normal matrix. The lower minimax rate is derived using the Fano-Tsybakov hypothesis testing method. Our findings reveal that provided the inverse problem in the large sample limit satisfies a coercivity condition, the left tail probability does not alter the bias-variance tradeoff, and the optimal minimax rate remains intact. Our tLSE method offers a straightforward approach for establishing the optimal minimax rate for models with either local or nonlocal dependency.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2311.16852

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Pseudo-Likelihood Inference

Gruner, Theo, Belousov, Boris, Muratore, Fabio, Palenicek, Daniel, Peters, Jan

Simulation-Based Inference (SBI) is a common name for an emerging family of approaches that infer the model parameters when the likelihood is intractable. Existing SBI methods either approximate the likelihood, such as Approximate Bayesian Computation (ABC), or directly model the posterior, such as Sequential Neural Posterior Estimation (SNPE). While ABC is efficient on low-dimensional problems, on higher-dimensional tasks, it is generally outperformed by SNPE which leverages function approximation. In this paper, we propose Pseudo-Likelihood Inference (PLI), a new method that brings neural approximation into ABC, making it competitive on challenging Bayesian system identification tasks. By utilizing integral probability metrics, we introduce a smooth likelihood kernel with an adaptive bandwidth that is updated based on information-theoretic trust regions. Thanks to this formulation, our method (i) allows for optimizing neural posteriors via gradient descent, (ii) does not rely on summary statistics, and (iii) enables multiple observations as input. In comparison to SNPE, it leads to improved performance when more data is available. The effectiveness of PLI is evaluated on four classical SBI benchmark tasks and on a highly dynamic physical system, showing particular advantages on stochastic simulations and multi-modal posterior landscapes.

artificial intelligence, machine learning, posterior, (17 more...)

2311.16656

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Lam, Kevin, Walder, Christian, Penev, Spiridon, Nock, Richard

LegendreTron: Uprising Proper Multiclass Loss Learning

Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as \emph{properness}, which asserts that Bayes' rule is optimal. Recent works have sought to \emph{learn losses} and models jointly. Existing methods do this by fitting an inverse canonical link function which monotonically maps $\mathbb{R}$ to $[0,1]$ to estimate probabilities for binary problems. In this paper, we extend monotonicity to maps between $\mathbb{R}^{C-1}$ and the projected probability simplex $\tilde{\Delta}^{C-1}$ by using monotonicity of gradients of convex functions. We present {\sc LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems. Tested on a benchmark of domains with up to 1,000 classes, our experimental results show that our method consistently outperforms the natural multiclass baseline under a $t$-test at 99% significance on all datasets with greater than 10 classes.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2301.11695

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

arXiv.org Machine LearningNov-27-2023

Asymptotic Bounds for Smoothness Parameter Estimates in Gaussian Process Interpolation

Karvonen, Toni

It is common to model a deterministic response function, such as the output of a computer experiment, as a Gaussian process with a Mat\'ern covariance kernel. The smoothness parameter of a Mat\'ern kernel determines many important properties of the model in the large data limit, including the rate of convergence of the conditional mean to the response function. We prove that the maximum likelihood estimate of the smoothness parameter cannot asymptotically undersmooth the truth when the data are obtained on a fixed bounded subset of $\mathbb{R}^d$. That is, if the data-generating response function has Sobolev smoothness $\nu_0 > d/2$, then the smoothness parameter estimate cannot be asymptotically less than $\nu_0$. The lower bound is sharp. Additionally, we show that maximum likelihood estimation recovers the true smoothness for a class of compactly supported self-similar functions. For cross-validation we prove an asymptotic lower bound $\nu_0 - d/2$, which however is unlikely to be sharp. The results are based on approximation theory in Sobolev spaces and some general theorems that restrict the set of values that the parameter estimators can take.

estimation, gaussian process, kernel, (15 more...)

2203.054

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.91)

arXiv.org Artificial IntelligenceNov-27-2023

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

Yang, Chenglin, Qiao, Siyuan, Cao, Yuan, Zhang, Yu, Zhu, Tao, Yuille, Alan, Yu, Jiahui

Generative training has been demonstrated to be powerful for building visual-language models. However, on zero-shot discriminative benchmarks, there is still a performance gap between models trained with generative and discriminative objectives. In this paper, we aim to narrow this gap by improving the efficacy of generative training on classification tasks, without any finetuning processes or additional modules. Specifically, we focus on narrowing the gap between the generative captioner and the CLIP classifier. We begin by analysing the predictions made by the captioner and classifier and observe that the caption generation inherits the distribution bias from the language model trained with pure text modality, making it less grounded on the visual signal. To tackle this problem, we redesign the scoring objective for the captioner to alleviate the distributional bias and focus on measuring the gain of information brought by the visual inputs. We further design a generative training objective to match the evaluation objective. We name our model trained and evaluated from the novel procedures as Information Gain (IG) captioner. We pretrain the models on the public Laion-5B dataset and perform a series of discriminative evaluations. For the zero-shot classification on ImageNet, IG captioner achieves $> 18\%$ improvements over the standard captioner, achieving comparable performances with the CLIP classifier. IG captioner also demonstrated strong performance on zero-shot image-text retrieval tasks on MSCOCO and Flickr30K. We hope this paper inspires further research towards unifying generative and discriminative training procedures for visual-language models.

captioner, classifier, ig captioner, (14 more...)

2311.17072

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

arXiv.org Machine LearningNov-27-2023

Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift

Ge, Jiawei, Tang, Shange, Fan, Jianqing, Ma, Cong, Jin, Chi

A key challenge of modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization -- generalizing to target data whose distribution differs from that of source data. Despite its significant importance, the fundamental question of ``what are the most effective algorithms for OOD generalization'' remains open even under the standard setting of covariate shift. This paper addresses this fundamental question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) purely using source data (without any modification) achieves the minimax optimality for covariate shift under the well-specified setting. That is, no algorithm performs better than MLE in this setting (up to a constant factor), justifying MLE is all you need. Our result holds for a very rich class of parametric models, and does not require any boundedness condition on the density ratio. We illustrate the wide applicability of our framework by instantiating it to three concrete examples -- linear regression, logistic regression, and phase retrieval. This paper further complement the study by proving that, under the misspecified setting, MLE is no longer the optimal choice, whereas Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in certain scenarios.

exp, nulli 1, nulli 1 2, (16 more...)

2311.15961

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Coulombe, Philippe Goulet, Frenette, Mikael, Klieber, Karin

From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks

arXiv.org Artificial IntelligenceNov-27-2023

We reinvigorate maximum likelihood estimation (MLE) for macroeconomic density forecasting through a novel neural network architecture with dedicated mean and variance hemispheres. Our architecture features several key ingredients making MLE work in this context. First, the hemispheres share a common core at the entrance of the network which accommodates for various forms of time variation in the error variance. Second, we introduce a volatility emphasis constraint that breaks mean/variance indeterminacy in this class of overparametrized nonlinear models. Third, we conduct a blocked out-of-bag reality check to curb overfitting in both conditional moments. Fourth, the algorithm utilizes standard deep learning software and thus handles large data sets - both computationally and statistically. Ergo, our Hemisphere Neural Network (HNN) provides proactive volatility forecasts based on leading indicators when it can, and reactive volatility based on the magnitude of previous prediction errors when it must. We evaluate point and density forecasts with an extensive out-of-sample experiment and benchmark against a suite of models ranging from classics to more modern machine learning-based offerings. In all cases, HNN fares well by consistently providing accurate mean/variance forecasts for all targets and horizons. Studying the resulting volatility paths reveals its versatility, while probabilistic forecasting evaluation metrics showcase its enviable reliability. Finally, we also demonstrate how this machinery can be merged with other structured deep learning models by revisiting Goulet Coulombe (2022)'s Neural Phillips Curve.

forecast, hemisphere, variance, (15 more...)

2311.16333

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Missouri > Jackson County > Kansas City (0.04)
North America > Cuba (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Economy (1.00)
Banking & Finance > Trading (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)