AITopics

2305.13301

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceJan-4-2024

Hyperparameter Estimation for Sparse Bayesian Learning Models

Yu, Feng, Shen, Lixin, Song, Guohui

Sparse Bayesian Learning (SBL) models are extensively used in signal processing and machine learning for promoting sparsity through hierarchical priors. The hyperparameters in SBL models are crucial for the model's performance, but they are often difficult to estimate due to the non-convexity and the high-dimensionality of the associated objective function. This paper presents a comprehensive framework for hyperparameter estimation in SBL models, encompassing well-known algorithms such as the expectation-maximization (EM), MacKay, and convex bounding (CB) algorithms. These algorithms are cohesively interpreted within an alternating minimization and linearization (AML) paradigm, distinguished by their unique linearized surrogate functions. Additionally, a novel algorithm within the AML framework is introduced, showing enhanced efficiency, especially under low signal noise ratios. This is further improved by a new alternating minimization and quadratic approximation (AMQ) paradigm, which includes a proximal regularization term. The paper substantiates these advancements with thorough convergence analysis and numerical experiments, demonstrating the algorithm's effectiveness in various noise conditions and signal-to-noise ratios.

algorithm, convergence, mk algorithm, (16 more...)

2401.02544

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Virginia > Norfolk City County > Norfolk (0.04)
(5 more...)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial IntelligenceJan-4-2024

Better and Simpler Lower Bounds for Differentially Private Statistical Estimation

Narayanan, Shyam

We provide optimal lower bounds for two well-known parameter estimation (also known as statistical estimation) tasks in high dimensions with approximate differential privacy. First, we prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which is tight up to logarithmic factors. This result improves over previous work which established this for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than previous work. Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples. Previous work for this problem was only able to establish this lower bound against pure differential privacy, or in the special case of $k = 2$. Our techniques follow the method of fingerprinting and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation utilizes a Bayesian approach to show that, under an Inverse Wishart prior distribution for the covariance matrix, no private estimator can be accurate even in expectation, without sufficiently many samples.

covariance estimation, differential privacy, estimation, (15 more...)

2310.06289

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

arXiv.org Machine LearningJan-4-2024

Robust bilinear factor analysis based on the matrix-variate $t$ distribution

Ma, Xuan, Zhao, Jianhua, Shang, Changchun, Jiang, Fen, Yu, Philip L. H.

Factor Analysis based on multivariate $t$ distribution ($t$fa) is a useful robust tool for extracting common factors on heavy-tailed or contaminated data. However, $t$fa is only applicable to vector data. When $t$fa is applied to matrix data, it is common to first vectorize the matrix observations. This introduces two challenges for $t$fa: (i) the inherent matrix structure of the data is broken, and (ii) robustness may be lost, as vectorized matrix data typically results in a high data dimension, which could easily lead to the breakdown of $t$fa. To address these issues, starting from the intrinsic matrix structure of matrix data, a novel robust factor analysis model, namely bilinear factor analysis built on the matrix-variate $t$ distribution ($t$bfa), is proposed in this paper. The novelty is that it is capable to simultaneously extract common factors for both row and column variables of interest on heavy-tailed or contaminated matrix data. Two efficient algorithms for maximum likelihood estimation of $t$bfa are developed. Closed-form expression for the Fisher information matrix to calculate the accuracy of parameter estimates are derived. Empirical studies are conducted to understand the proposed $t$bfa model and compare with related competitors. The results demonstrate the superiority and practicality of $t$bfa. Importantly, $t$bfa exhibits a significantly higher breakdown point than $t$fa, making it more suitable for matrix data.

artificial intelligence, data mining, machine learning, (19 more...)

2401.02203

Country:

North America > United States > New York (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

arXiv.org Machine LearningJan-4-2024

Entropy and the Kullback-Leibler Divergence for Bayesian Networks: Computational Complexity and Efficient Implementation

Scutari, Marco

Bayesian networks (BNs) are a foundational model in machine learning and causal inference. Their graphical structure can handle high-dimensional problems, divide them into a sparse collection of smaller ones, underlies Judea Pearl's causality, and determines their explainability and interpretability. Despite their popularity, there are almost no resources in the literature on how to compute Shannon's entropy and the Kullback-Leibler (KL) divergence for BNs under their most common distributional assumptions. In this paper, we provide computationally efficient algorithms for both by leveraging BNs' graphical structure, and we illustrate them with a complete set of numerical examples. In the process, we show it is possible to reduce the computational complexity of KL from cubic to quadratic for Gaussian BNs.

artificial intelligence, log 0, machine learning, (19 more...)

2312.0152

Country:

Oceania > New Zealand (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Cheng, Kai, Zimmermann, Ralf

Sliced gradient-enhanced Kriging for high-dimensional function approximation

arXiv.org Machine LearningJan-4-2024

Gradient-enhanced Kriging (GE-Kriging) is a well-established surrogate modelling technique for approximating expensive computational models. However, it tends to get impractical for high-dimensional problems due to the size of the inherent correlation matrix and the associated high-dimensional hyper-parameter tuning problem. To address these issues, a new method, called sliced GE-Kriging (SGE-Kriging), is developed in this paper for reducing both the size of the correlation matrix and the number of hyper-parameters. We first split the training sample set into multiple slices, and invoke Bayes' theorem to approximate the full likelihood function via a sliced likelihood function, in which multiple small correlation matrices are utilized to describe the correlation of the sample set rather than one large one. Then, we replace the original high-dimensional hyper-parameter tuning problem with a low-dimensional counterpart by learning the relationship between the hyper-parameters and the derivative-based global sensitivity indices. The performance of SGE-Kriging is finally validated by means of numerical experiments with several benchmarks and a high-dimensional aerodynamic modeling problem. The results show that the SGE-Kriging model features an accuracy and robustness that is comparable to the standard one but comes at much less training costs. The benefits are most evident for high-dimensional problems with tens of variables.

artificial intelligence, machine learning, sge-kriging, (17 more...)

doi: 10.1137/22M154315X

2204.03562

Country:

Europe > Denmark > Southern Denmark (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
(2 more...)

Balabanov, Oleksandr, Mehlig, Bernhard, Linander, Hampus

Bayesian posterior approximation with stochastic ensembles

arXiv.org Machine LearningJan-3-2024

To further reduce the computational effort in evaluating the approximate posterior, stochastic methods such as We introduce ensembles of stochastic neural networks to Monte Carlo dropout [47] and DropConnect [51] inference approximate the Bayesian posterior, combining stochastic have also been used extensively [13, 14, 40]. They benefit methods such as dropout with deep ensembles. The stochastic from computationally cheaper inference by virtue of sampling ensembles are formulated as families of distributions and stochastically from a single model. Formulated as a trained to approximate the Bayesian posterior with variational variational approximation to the posterior, dropout samples inference. We implement stochastic ensembles based from a family of parameter distributions where parameters on Monte Carlo dropout, DropConnect and a novel nonparametric can be randomly set to zero. Although this particular family version of dropout and evaluate them on a toy of distributions might seem unnatural [11], it turns out problem and CIFAR image classification. For both tasks, that the stochastic property can help to find more robust regions we test the quality of the posteriors directly against Hamiltonian of the parameter space, a fact well-known from a long Monte Carlo simulations. Our results show that history of using dropout as a regularization method.

ensemble, neural network, posterior, (15 more...)

doi: 10.1109/CVPR52729.2023.01317

2212.08123

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Chugg, Ben, Wang, Hongjian, Ramdas, Aaditya

A unified recipe for deriving (time-uniform) PAC-Bayes bounds

arXiv.org Machine LearningJan-3-2024

We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem which holds for a wide class of discrete stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also enables us to relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check if our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound.

inequality, sequence, supermartingale, (15 more...)

2302.03421

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

arXiv.org Artificial IntelligenceJan-3-2024

On the hierarchical Bayesian modelling of frequency response functions

Dardeno, T. A., Worden, K., Dervilis, N., Mills, R. S., Bull, L. A.

For situations that may benefit from information sharing among datasets, e.g., population-based SHM of similar structures, the hierarchical Bayesian approach provides a useful modelling structure. Hierarchical Bayesian models learn statistical distributions at the population (or parent) and the domain levels simultaneously, to bolster statistical strength among the parameters. As a result, variance is reduced among the parameter estimates, particularly when data are limited. In this paper, a combined probabilistic FRF model is developed for a small population of nominally-identical helicopter blades, using a hierarchical Bayesian structure, to support information transfer in the context of sparse data. The modelling approach is also demonstrated in a traditional SHM context, for a single helicopter blade exposed to varying temperatures, to show how the inclusion of physics-based knowledge can improve generalisation beyond the training data, in the context of scarce data. These models address critical challenges in SHM, by accommodating benign variations that present as differences in the underlying dynamics, while also considering (and utilising), the similarities among the domains.

frf, natural frequency, variance, (14 more...)

doi: 10.1016/j.ymssp.2023.111072

2307.06263

Country:

Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Machine LearningJan-2-2024

SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling

Vowels, Matthew J.

Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2308.04365

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)