AITopics

2411.06646

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Feng, Brandon R., Majumder, Reetam, Reich, Brian J., Abba, Mohamed A.

Amortized Bayesian Local Interpolation NetworK: Fast covariance parameter estimation for Gaussian Processes

Gaussian processes (GPs) are a ubiquitous tool for geostatistical modeling with high levels of flexibility and interpretability, and the ability to make predictions at unseen spatial locations through a process called Kriging. Estimation of Kriging weights relies on the inversion of the process' covariance matrix, creating a computational bottleneck for large spatial datasets. In this paper, we propose an Amortized Bayesian Local Interpolation NetworK (A-BLINK) for fast covariance parameter estimation, which uses two pre-trained deep neural networks to learn a mapping from spatial location coordinates and covariance function parameters to Kriging weights and the spatial variance, respectively. The fast prediction time of these networks allows us to bypass the matrix inversion step, creating large computational speedups over competing methods in both frequentist and Bayesian settings, and also provides full posterior inference and predictions using Markov chain Monte Carlo sampling methods. We show significant increases in computational efficiency over comparable scalable GP methodology in an extensive simulation study with lower parameter estimation error. The efficacy of our approach is also demonstrated using a temperature dataset of US climate normals for 1991--2020 based on over 7,000 weather stations.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2411.06324

Country: North America > United States (0.14)

Genre: Research Report (0.65)

Industry: Energy > Oil & Gas > Upstream (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Nguyen, Nicolas, Ridgway, James, Vernade, Claire

Variational Bayes Portfolio Construction

Portfolio construction is the science of balancing reward and risk; it is at the core of modern finance. In this paper, we tackle the question of optimal decision-making within a Bayesian paradigm, starting from a decision-theoretic formulation. Despite the inherent intractability of the optimal decision in any interesting scenarios, we manage to rewrite it as a saddle-point problem. Leveraging the literature on variational Bayes (VB), we propose a relaxation of the original problem. This novel methodology results in an efficient algorithm that not only performs well but is also provably convergent. Furthermore, we provide theoretical results on the statistical consistency of the resulting decision with the optimal Bayesian decision. Using real data, our proposal significantly enhances the speed and scalability of portfolio selection problems. We benchmark our results against state-of-the-art algorithms, as well as a Monte Carlo algorithm targeting the optimal decision.

artificial intelligence, decision support system, machine learning, (18 more...)

2411.06192

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Banking & Finance > Trading (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Decision Support Systems (0.88)

Dilworth, Robert, Gudla, Charan

Harnessing PU Learning for Enhanced Cloud-based DDoS Detection: A Comparative Analysis

arXiv.org Artificial IntelligenceNov-9-2024

This paper explores the application of Positive-Unlabeled (PU) learning for enhanced Distributed Denial-of-Service (DDoS) detection in cloud environments. Utilizing the $\texttt{BCCC-cPacket-Cloud-DDoS-2024}$ dataset, we implement PU learning with four machine learning algorithms: XGBoost, Random Forest, Support Vector Machine, and Na\"{i}ve Bayes. Our results demonstrate the superior performance of ensemble methods, with XGBoost and Random Forest achieving $F_{1}$ scores exceeding 98%. We quantify the efficacy of each approach using metrics including $F_{1}$ score, ROC AUC, Recall, and Precision. This study bridges the gap between PU learning and cloud-based anomaly detection, providing a foundation for addressing Context-Aware DDoS Detection in multi-cloud environments. Our findings highlight the potential of PU learning in scenarios with limited labeled data, offering valuable insights for developing more robust and adaptive cloud security mechanisms.

artificial intelligence, learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.1838

Country: North America > United States > Mississippi > Oktibbeha County > Starkville (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

Variational Low-Rank Adaptation Using IVON

Cong, Bai, Daheim, Nico, Shen, Yuesong, Cremers, Daniel, Yokota, Rio, Khan, Mohammad Emtiyaz, Möllenhoff, Thomas

We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in the cost. We replace AdamW by the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and expected calibration error by 4.6%. The accuracy is also better than the other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models.

ivon, large language model, machine learning, (16 more...)

2411.04421

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Mutual-energy inner product optimization method for constructing feature coordinates and image classification in Machine Learning

Wang, Yuanxiu

As a key task in machine learning, data classification is essentially to find a suitable coordinate system to represent data features of different classes of samples. This paper proposes the mutual-energy inner product optimization method for constructing a feature coordinate system. First, by analyzing the solution space and eigenfunctions of partial differential equations describing a non-uniform membrane, the mutual-energy inner product is defined. Second, by expressing the mutual-energy inner product as a series of eigenfunctions, it shows a significant advantage of enhancing low-frequency features and suppressing high-frequency noise, compared with the Euclidean inner product. And then, a mutual-energy inner product optimization model is built to extract data features, and convexity and concavity properties of its objective function are discussed. Next, by combining the finite element method, a stable and efficient sequential linearization algorithm is constructed to solve the optimization model. This algorithm only solves equations including positive definite symmetric matrix and linear programming with a few constraints, and its vectorized implementation is discussed. Finally, the mutual-energy inner product optimization method is used to construct feature coordinates, and multi-class Gaussian classifiers are trained on the MINST training set. Good prediction results of Gaussian classifiers are achieved on the MINST test set.

artificial intelligence, expression, machine learning, (16 more...)

2411.061

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
South America (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Sengupta, Ayan, Seth, Vaibhav, Pathak, Arinjay, Raman, Natraj, Gopalakrishnan, Sriram, Chakraborty, Tanmoy

Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation

arXiv.org Artificial IntelligenceNov-8-2024

Large Language Models (LLMs) are highly resource-intensive to fine-tune due to their enormous size. While low-rank adaptation is a prominent parameter-efficient fine-tuning approach, it suffers from sensitivity to hyperparameter choices, leading to instability in model performance on fine-tuning downstream tasks. This paper highlights the importance of effective parameterization in low-rank fine-tuning to reduce estimator variance and enhance the stability of final model outputs. We propose MonteCLoRA, an efficient fine-tuning technique, employing Monte Carlo estimation to learn an unbiased posterior estimation of low-rank parameters with low expected variance, which stabilizes fine-tuned LLMs with only O(1) additional parameters. MonteCLoRA shows significant improvements in accuracy and robustness, achieving up to 3.8% higher accuracy and 8.6% greater robustness than existing efficient fine-tuning methods on natural language understanding tasks with pre-trained RoBERTa-base. Furthermore, in generative tasks with pre-trained LLaMA-1-7B, MonteCLoRA demonstrates robust zero-shot performance with 50% lower variance than the contemporary efficient fine-tuning methods. The theoretical and empirical results presented in the paper underscore how parameterization and hyperpriors balance exploration-exploitation in the low-rank parametric space, therefore leading to more optimal and robust parameter estimation during efficient fine-tuning.

large language model, machine learning, monteclora, (18 more...)

arXiv.org Artificial Intelligence

2411.04358

Country:

Asia > India (0.14)
Europe (0.14)

Genre: Research Report (1.00)

Industry:

Education (0.46)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

da Silva, Tiago, de Souza, Daniel Augusto, Mesquita, Diego

Streaming Bayes GFlowNets

arXiv.org Artificial IntelligenceNov-8-2024

Bayes' rule naturally allows for inference refinement in a streaming fashion, without the need to recompute posteriors from scratch whenever new data arrives. In principle, Bayesian streaming is straightforward: we update our prior with the available data and use the resulting posterior as a prior when processing the next data chunk. In practice, however, this recipe entails i) approximating an intractable posterior at each time step; and ii) encapsulating results appropriately to allow for posterior propagation. For continuous state spaces, variational inference (VI) is particularly convenient due to its scalability and the tractability of variational posteriors. For discrete state spaces, however, state-of-the-art VI results in analytically intractable approximations that are ill-suited for streaming settings. To enable streaming Bayesian inference over discrete parameter spaces, we propose streaming Bayes GFlowNets (abbreviated as SB-GFlowNets) by leveraging the recently proposed GFlowNets -- a powerful class of amortized samplers for discrete compositional objects. Notably, SB-GFlowNet approximates the initial posterior using a standard GFlowNet and subsequently updates it using a tailored procedure that requires only the newly observed data. Our case studies in linear preference learning and phylogenetic inference showcase the effectiveness of SB-GFlowNets in sampling from an unnormalized posterior in a streaming setting. As expected, we also observe that SB-GFlowNets is significantly faster than repeatedly training a GFlowNet from scratch to sample from the full posterior.

artificial intelligence, bayesian inference, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2411.05899

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Fruytier, Quentin, Mokhtari, Aryan, Sanghavi, Sujay

Learning Mixtures of Experts with EM

arXiv.org Machine LearningNov-8-2024

Mixtures of Experts (MoE) are Machine Learning models that involve partitioning the input space, with a separate "expert" model trained on each partition. Recently, MoE have become popular as components in today's large language models as a means to reduce training and inference costs. There, the partitioning function and the experts are both learnt jointly via gradient descent on the log-likelihood. In this paper we focus on studying the efficiency of the Expectation Maximization (EM) algorithm for the training of MoE models. We first rigorously analyze EM for the cases of linear or logistic experts, where we show that EM is equivalent to Mirror Descent with unit step size and a Kullback-Leibler Divergence regularizer. This perspective allows us to derive new convergence results and identify conditions for local linear convergence based on the signal-to-noise ratio (SNR). Experiments on synthetic and (small-scale) real-world data show that EM outperforms the gradient descent algorithm both in terms of convergence rate and the achieved accuracy.

artificial intelligence, exp, machine learning, (20 more...)

2411.06056

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine LearningNov-8-2024

Discovering Latent Structural Causal Models from Spatio-Temporal Data

Wang, Kun, Varambally, Sumanth, Watson-Parris, Duncan, Ma, Yi-An, Yu, Rose

Many important phenomena in scientific fields such as climate, neuroscience, and epidemiology are naturally represented as spatiotemporal gridded data with complex interactions. For example, in climate science, researchers aim to uncover how large-scale events, such as the North Atlantic Oscillation (NAO) and the Antarctic Oscillation (AAO), influence other global processes. Inferring causal relationships from these data is a challenging problem compounded by the high dimensionality of such data and the correlations between spatially proximate points. We present SPACY (SPAtiotemporal Causal discoverY), a novel framework based on variational inference, designed to explicitly model latent time-series and their causal relationships from spatially confined modes in the data. Our method uses an end-to-end training process that maximizes an evidence-lower bound (ELBO) for the data likelihood. Theoretically, we show that, under some conditions, the latent variables are identifiable up to transformation by an invertible matrix. Empirically, we show that SPACY outperforms state-of-the-art baselines on synthetic data, remains scalable for large grids, and identifies key known phenomena from real-world climate data.

artificial intelligence, machine learning, time sery, (17 more...)

2411.05331

Country:

Oceania > Australia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.48)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
(2 more...)