AITopics

2407.04819

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.67)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.86)

van Zwol, Björn, Jefferson, Ro, Broek, Egon L. van den

Predictive Coding Networks and Inference Learning: Tutorial and Survey

arXiv.org Machine Learning Jul-4-2024

Recent years have witnessed a growing call for renewed emphasis on neuroscience-inspired approaches in artificial intelligence research, under the banner of $\textit{NeuroAI}$. This is exemplified by the recent attention gained by predictive coding networks (PCNs) within machine learning (ML). PCNs are based on the neuroscientific framework of predictive coding (PC), which views the brain as a hierarchical Bayesian inference model that minimizes prediction errors from feedback connections. PCNs trained with inference learning (IL) have potential advantages over traditional feedforward neural networks (FNNs) trained with backpropagation. While IL has historically been more computationally intensive, recent improvements have shown that it can be more efficient than backpropagation with sufficient parallelization, making PCNs promising alternatives for large-scale applications and neuromorphic hardware. Moreover, PCNs can be mathematically considered a superset of traditional FNNs, which substantially extends the range of possible architectures for both supervised and unsupervised learning. In this work, we provide a comprehensive review as well as a formal specification of PCNs, in particular placing them in the context of modern ML methods and positioning PC as a versatile and promising framework worthy of further study by the ML community.
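Here is a minimal sketch of inference learning under strong simplifying assumptions. The network is a scalar, linear three-node chain (input x0, latent x1, output x2) with unit-variance Gaussian prediction errors. Its energy is E = ½(x1 - w1·x0)² + ½(x2 - w2·x1)². With input and target clamped, x1 first relaxes by gradient descent on E. Each weight is then updated using only the prediction error and activity at its own connection. The function name, learning rates and step counts are illustrative choices and do not come from the survey.

```python
def train_pcn(data, w1=0.5, w2=0.5, epochs=500, infer_steps=50,
              infer_lr=0.1, weight_lr=0.05):
    """Inference learning for a scalar linear chain x0 -> x1 -> x2.

    Energy (unit variances assumed): E = 0.5*(x1 - w1*x0)**2 + 0.5*(x2 - w2*x1)**2
    """
    for _ in range(epochs):
        for x0, x2 in data:
            # Inference phase: input x0 and target x2 are clamped; only the
            # latent node x1 moves, by gradient descent on E.
            x1 = w1 * x0  # start from the feedforward prediction
            for _ in range(infer_steps):
                e1 = x1 - w1 * x0  # prediction error at the latent layer
                e2 = x2 - w2 * x1  # prediction error at the output layer
                x1 -= infer_lr * (e1 - w2 * e2)  # dE/dx1
            # Learning phase: each weight uses only the error and activity
            # at its own connection (a local, Hebbian-like rule).
            e1 = x1 - w1 * x0
            e2 = x2 - w2 * x1
            w1 += weight_lr * e1 * x0  # -dE/dw1
            w2 += weight_lr * e2 * x1  # -dE/dw2
    return w1, w2


# Hypothetical toy task: learn y = 2x.
data = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w1, w2 = train_pcn(data)
print(w1 * w2)  # feedforward gain of the trained chain
```

On this toy task the product w1·w2 should approach 2. This means the trained network reproduces the target map when run as an ordinary feedforward pass.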

artificial intelligence, machine learning, pcn, (14 more...)

2407.04117

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.67)

Genre:

Research Report (1.00)
Overview (0.86)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Energy > Oil & Gas (1.00)
Law > Litigation (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Kipnis, Alex, Voudouris, Konstantinos, Buschoff, Luca M. Schulze, Schulz, Eric

$\texttt{metabench}$ -- A Sparse Benchmark to Measure General Ability in Large Language Models

arXiv.org Artificial Intelligence Jul-4-2024

Large Language Models (LLMs) vary in their abilities on a range of tasks. Initiatives such as the $\texttt{Open LLM Leaderboard}$ aim to quantify these differences with several large benchmarks (sets of test items to which an LLM can respond either correctly or incorrectly). However, high correlations within and between benchmark scores suggest that (1) there exists a small set of common underlying abilities that these benchmarks measure, and (2) items tap into redundant information and the benchmarks may thus be considerably compressed. We use data from $n > 5000$ LLMs to identify the most informative items of six benchmarks, ARC, GSM8K, HellaSwag, MMLU, TruthfulQA and WinoGrande (with $d=28,632$ items in total). From them we distill a sparse benchmark, $\texttt{metabench}$, that has less than $3\%$ of the original size of all six benchmarks combined. This new sparse benchmark goes beyond point scores by yielding estimators of the underlying benchmark-specific abilities. We show that these estimators (1) can be used to reconstruct each original $\textit{individual}$ benchmark score with, on average, $1.5\%$ root mean square error (RMSE), (2) reconstruct the original $\textit{total}$ score with $0.8\%$ RMSE, and (3) have a single underlying common factor whose Spearman correlation with the total score is $r = 0.93$.
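Ability estimators of this kind come from a latent-variable view of benchmark items. The sketch below is an illustration only and assumes a two-parameter logistic (2PL) item response model, where each item has a discrimination a and a difficulty b. It estimates a single model's ability θ from its correct/incorrect responses by maximizing the likelihood over a grid. The item parameters and function names are hypothetical, and this is not metabench's actual fitting pipeline.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response vector; items are (a, b) pairs."""
    ll = 0.0
    for (a, b), r in zip(items, responses):
        p = p_correct(theta, a, b)
        ll += math.log(p) if r else math.log(1.0 - p)
    return ll


def estimate_ability(items, responses, lo=-4.0, hi=4.0, step=0.01):
    """Maximum-likelihood ability estimate by grid search over [lo, hi]."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


# Hypothetical items: equal discrimination, increasing difficulty.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
print(estimate_ability(items, [1, 1, 0]))  # passes the two easier items
```

If every answer is wrong (or every answer is right), the likelihood is monotone in θ. The estimate then sits at the edge of the grid, which is why practical pipelines usually add a prior or bound θ.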

benchmark, latent ability, llm, (15 more...)

2407.12844

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.68)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Nomura, Reika, Vermare, Louise A. Hirao, Fujita, Saneiki, Rim, Donsub, Moriguchi, Shuji, LeVeque, Randall J., Terada, Kenjiro

On the performance of sequential Bayesian update for database of diverse tsunami scenarios

arXiv.org Artificial Intelligence Jul-4-2024

Although the sequential tsunami scenario detection framework was validated in our previous work, several tasks remain to be resolved from a practical point of view. This study evaluates the performance of that framework using a diverse database of complex fault rupture patterns with heterogeneous slip distributions. Specifically, we compare the effectiveness of scenario superposition with that of the previous most-likely-scenario detection method. Additionally, we analyze how the length of the observation time window influences the accuracy of both methods. We use an existing database of 1771 tsunami scenarios targeting the city of Westport (WA, U.S.), which includes synthetic wave height records and inundation distributions resulting from fault ruptures in the Cascadia subduction zone. The heterogeneous slip patterns in the database increase the diversity of the scenarios and thus make it a suitable database for evaluating the performance of scenario superposition. To assess the performance, we consider various observation time windows shorter than 15 minutes and divide the database into five testing and learning sets. The accuracy of the maximum offshore wave, the inundation depth, and the inundation distribution is analyzed to examine the advantages of the scenario superposition method over the previous method. We also introduce the dynamic time warping (DTW) method as an additional benchmark and compare its results with those of the Bayesian scenario detection method.
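A sequential Bayesian update over a finite scenario database can be sketched as follows. The sketch rests on four assumptions:

- a Gaussian observation error with a single known σ;
- a uniform prior;
- one hypothetical scalar inundation depth per scenario;
- "scenario superposition" read as a posterior-weighted average of scenario predictions. This is an interpretation, not the paper's definition.

The DTW benchmark is not shown, and all names and numbers are made up.

```python
import math


def sequential_posterior(scenarios, observations, sigma):
    """Posterior over discrete scenarios after each new observation.

    scenarios: {name: predicted waveform at the gauge, one value per time step}
    Assumes a uniform prior and i.i.d. Gaussian errors with known sigma.
    """
    names = list(scenarios)
    logp = {s: -math.log(len(names)) for s in names}
    history = []
    for t, obs in enumerate(observations):
        for s in names:
            resid = obs - scenarios[s][t]
            logp[s] -= 0.5 * (resid / sigma) ** 2  # Gaussian log-likelihood, constants dropped
        m = max(logp.values())  # log-sum-exp normalisation for numerical stability
        z = sum(math.exp(v - m) for v in logp.values())
        history.append({s: math.exp(logp[s] - m) / z for s in names})
    return history


# Hypothetical toy database: three scenarios, four time steps.
scenarios = {"A": [0.0, 1.0, 2.0, 3.0], "B": [0.0, 0.5, 1.0, 1.5], "C": [0.0, 2.0, 4.0, 6.0]}
depth = {"A": 2.0, "B": 1.0, "C": 4.0}  # hypothetical inundation depth per scenario
history = sequential_posterior(scenarios, [0.0, 1.1, 1.9, 3.05], sigma=0.3)

post = history[-1]
most_likely = max(post, key=post.get)                    # most-likely-scenario estimate
superposed = sum(post[s] * depth[s] for s in scenarios)  # posterior-weighted superposition
print(most_likely, round(superposed, 3))
```

The first observation cannot separate the scenarios, so the posterior stays uniform. Each later time step sharpens it, which is why the length of the observation window matters.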

database, prediction, scenario, (14 more...)

2407.03631

Country:

South America > Chile (0.04)
North America > United States > Washington (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

arXiv.org Artificial Intelligence Jul-4-2024

Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land

Scardapane, Simone

Neural networks surround us, in the form of large language models, speech transcription systems, molecular discovery algorithms, robotics, and much more. Stripped of anything else, neural networks are compositions of differentiable primitives, and studying them means learning how to program and how to interact with these models, a particular example of what is called differentiable programming. This primer is an introduction to this fascinating field, imagined for someone who, like Alice, has just ventured into this strange differentiable wonderland. I give an overview of the basics of optimizing a function via automatic differentiation and of a selection of the most common designs for handling sequences, graphs, text, and audio. The focus is on an intuitive, self-contained introduction to the most important design techniques, including convolutional, attentional, and recurrent blocks, in the hope of bridging the gap between theory and code (PyTorch and JAX) and leaving the reader capable of understanding some of the most advanced models out there, such as large language models (LLMs) and multimodal architectures.
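The core mechanism behind automatic differentiation in libraries such as PyTorch and JAX is reverse-mode accumulation over a recorded computation graph. The toy scalar version below is only meant to show that mechanism. The class name `Value` and its methods are illustrative and unrelated to either library's API.

```python
class Value:
    """A scalar that records the operations applied to it (reverse-mode AD)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs of the op that produced this node
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Topologically order the graph, then apply the chain rule from the output back.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # accumulate over all paths


x, y = Value(3.0), Value(4.0)
f = x * y + x
f.backward()
print(x.grad, y.grad)  # df/dx = y + 1, df/dy = x
```

A single backward sweep yields the gradient with respect to every input. Reverse mode is preferred for neural networks because they have many parameters and one scalar loss.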

attention operation, convolutional network, directional derivative, (16 more...)

2404.17625

Country:

Europe > France (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Overview (0.92)
Summary/Review (0.92)
Research Report > New Finding (0.67)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Tan, Kevin, Hooker, Giles, Ionides, Edward L.

Accelerated Inference for Partially Observed Markov Processes using Automatic Differentiation

arXiv.org Machine Learning Jul-3-2024

Automatic differentiation (AD) has driven recent advances in machine learning, including deep neural networks and Hamiltonian Markov chain Monte Carlo methods. Partially observed nonlinear stochastic dynamical systems have proved resistant to AD techniques because widely used particle filter algorithms yield an estimated likelihood function that is discontinuous as a function of the model parameters. We show how to embed two existing AD particle filter methods in a theoretical framework that provides an extension to a new class of algorithms. This new class permits a bias/variance tradeoff and hence a mean squared error substantially lower than that of the existing algorithms. We develop likelihood maximization algorithms suited to the Monte Carlo properties of the AD gradient estimate. Our algorithms require only a differentiable simulator for the latent dynamic system; by contrast, most previous approaches to AD likelihood maximization for particle filters require access to the system's transition probabilities. Numerical results indicate that a hybrid algorithm that uses AD to refine a coarse solution from an iterated filtering algorithm shows substantial improvement over current state-of-the-art methods on a challenging scientific benchmark problem.
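The likelihood estimate in question comes from a particle filter. The sketch below implements a bootstrap particle filter and compares it with an exact Kalman filter likelihood. It assumes a linear-Gaussian AR(1) state-space model, chosen only because that exact likelihood is available for comparison.

The filter needs nothing more than the ability to simulate the latent process. Its multinomial resampling step picks discrete particle indices, which is why the estimate jumps rather than varying smoothly as the parameters change. It does not implement the authors' AD algorithms, and the parameter values and function names are hypothetical.

```python
import math
import random


def simulate(phi, q, r, T, seed=0):
    """Simulate x_t = phi*x_{t-1} + N(0, q), y_t = x_t + N(0, r), x_0 ~ N(0, 1)."""
    rng = random.Random(seed)
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(T):
        x = phi * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return ys


def kalman_loglik(ys, phi, q, r, m=0.0, P=1.0):
    """Exact log-likelihood of the linear-Gaussian model (reference value)."""
    ll = 0.0
    for y in ys:
        m, P = phi * m, phi * phi * P + q    # predict
        S = P + r                            # innovation variance
        ll += -0.5 * (math.log(2 * math.pi * S) + (y - m) ** 2 / S)
        K = P / S                            # Kalman gain
        m, P = m + K * (y - m), (1 - K) * P  # update
    return ll


def particle_loglik(ys, phi, q, r, n=2000, seed=1):
    """Bootstrap particle filter estimate of the same log-likelihood."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ll = 0.0
    for y in ys:
        # Propagate with the simulator only; no transition density is evaluated.
        xs = [phi * x + rng.gauss(0.0, math.sqrt(q)) for x in xs]
        ws = [math.exp(-0.5 * (y - x) ** 2 / r) / math.sqrt(2 * math.pi * r) for x in xs]
        ll += math.log(sum(ws) / n)
        # Multinomial resampling: a discrete choice of particle indices, the
        # step that makes the estimate discontinuous in (phi, q, r).
        xs = rng.choices(xs, weights=ws, k=n)
    return ll


ys = simulate(0.8, 1.0, 1.0, T=20)
print(kalman_loglik(ys, 0.8, 1.0, 1.0), particle_loglik(ys, 0.8, 1.0, 1.0))
```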

gradient estimate, particle filter, variance, (14 more...)

2407.03085

Country:

North America > Haiti (0.14)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)

arXiv.org Artificial Intelligence Jul-2-2024

Sparse Variational Contaminated Noise Gaussian Process Regression with Applications in Geomagnetic Perturbations Forecasting

Iong, Daniel, McAnear, Matthew, Qu, Yuezhou, Zou, Shasha, Toth, Gabor, Chen, Yang

GPR models can also incorporate prior knowledge through the selection of an appropriate kernel function. GPR commonly assumes a homoscedastic Gaussian distribution for observation noise because this yields an analytical form for the posterior predictive distribution. However, Bayesian inference based on Gaussian noise distributions is known to be sensitive to outliers, defined as observations that strongly deviate from model assumptions. In regression, outliers can arise from relevant inputs being absent from the model, from measurement error, and from other unknown sources. These outliers are associated with unconsidered sources of variation that affect the target variable sporadically. In this case, the observation model is unable to distinguish between random noise and systematic effects not captured by the model. In the context of GPR under Gaussian noise, outliers can heavily influence the posterior predictive distribution, resulting in a biased estimate of the mean function and overly confident prediction intervals. Therefore, robust observation models are desirable when outliers may be present.
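One common robust alternative is a contaminated normal observation model. With probability 1 - ε the noise is N(0, σ²), and with probability ε it comes from a much broader N(0, σ_out²).

The sketch below assumes that form; the paper's exact parameterization and its sparse variational treatment are not reproduced here. It computes the per-point log-likelihood and the posterior probability that a point is an outlier, given a latent function value f. The variances and ε are illustrative values.

```python
import math


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def contaminated_loglik(y, f, var, var_out, eps):
    """log[(1 - eps) * N(y; f, var) + eps * N(y; f, var_out)], computed stably."""
    a = math.log(1 - eps) + normal_logpdf(y, f, var)
    b = math.log(eps) + normal_logpdf(y, f, var_out)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def outlier_probability(y, f, var, var_out, eps):
    """Posterior probability that y came from the broad (outlier) component."""
    b = math.log(eps) + normal_logpdf(y, f, var_out)
    return math.exp(b - contaminated_loglik(y, f, var, var_out, eps))


# Illustrative values: latent function value f = 0, inlier variance 0.01,
# outlier variance 4.0, contamination rate 10%.
for y in (0.05, 1.5):
    print(y, outlier_probability(y, 0.0, 0.01, 4.0, 0.1))
```

Under a pure Gaussian likelihood, the point at 1.5 would be penalized so heavily that it would pull the fit toward itself. Under the mixture it is attributed almost entirely to the broad component.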

approximation, hyperparameter, outlier, (14 more...)

2402.1757

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Energy (0.67)
Transportation (0.48)
Consumer Products & Services > Travel (0.47)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Fellaji, Mohammed, Pennerath, Frédéric

The Epistemic Uncertainty Hole: an issue of Bayesian Neural Networks

arXiv.org Machine Learning Jul-2-2024

More precisely, we observe that the epistemic uncertainty literally collapses in the presence of large models and sometimes also of little training data, while we expect the exact opposite behaviour. This phenomenon, which we call the "epistemic uncertainty hole", is all the more problematic as it undermines the entire applicative potential of BDL, which is based precisely on the use of epistemic uncertainty. As an example, we evaluate the practical consequences of this uncertainty hole on one of the main applications of BDL, namely the detection of out-of-distribution samples.

In many applications of Machine Learning, optimizing solely the performance metrics of the predictive model, such as the accuracy, can result in overconfident interpretations of erroneous outcomes, and thus hazardous decisions in critical domains. Therefore, being able to map the model outputs to well-calibrated uncertainty quantification metrics is essential from a decision-making point of view. When dealing with Deep Learning models, Bayesian Deep Learning (BDL) [11, 12, 18, 10, 2], i.e. the application of Bayesian inference to deep neural networks, appears to be one of the keys to estimating such well-calibrated uncertainties.
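Epistemic uncertainty in BDL is often quantified as the mutual information between the prediction and the model parameters. It is estimated from posterior samples such as MC dropout passes or ensemble members. The sketch below assumes that standard decomposition, which may differ from the exact measure used in the paper. It splits predictive entropy into an expected-entropy (aleatoric) part and a mutual-information (epistemic) part. An "uncertainty hole" would show up as the epistemic term collapsing toward zero because the samples agree, even on inputs where they should not.

```python
import math


def entropy(p):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def uncertainty_decomposition(samples):
    """samples: one class-probability vector per posterior sample."""
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / len(samples) for c in range(k)]
    total = entropy(mean)                                        # predictive entropy
    aleatoric = sum(entropy(s) for s in samples) / len(samples)  # expected entropy
    epistemic = total - aleatoric                                # mutual information
    return total, aleatoric, epistemic


print(uncertainty_decomposition([[1.0, 0.0], [0.0, 1.0]]))  # samples disagree
print(uncertainty_decomposition([[0.7, 0.3], [0.7, 0.3]]))  # samples agree
```

Two confident but contradictory samples give the maximal epistemic value ln 2 for two classes. Identical samples give zero, however uncertain each individual sample is.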

cifar10, epistemic uncertainty, neural network, (15 more...)

2407.01985

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

arXiv.org Artificial Intelligence Jul-1-2024

Sequential Manipulation Against Rank Aggregation: Theory and Algorithm

Ma, Ke, Xu, Qianqian, Zeng, Jinshan, Liu, Wei, Cao, Xiaochun, Sun, Yingfei, Huang, Qingming

Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, a potential adversary has a strong motivation to manipulate the ranking list. However, existing methods are impractical because they assume an ideal attack opportunity and excessive adversarial capability. To fully explore the potential risks, we consider an online attack on the vulnerable data collection process. Since this process is independent of rank aggregation and lacks effective protection mechanisms, we disrupt it by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From a game-theoretic perspective, the confrontation between the online manipulator and the ranker, who controls the original data source, is formulated as a distributionally robust game that deals with the uncertainty of knowledge. We then demonstrate that the equilibrium of this game is potentially favorable to the adversary by analyzing the vulnerability of sampling algorithms such as Bernoulli and reservoir methods. Based on this theoretical analysis, we propose different sequential manipulation policies under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, empirical evidence corroborates that the proposed method manipulates the results of rank aggregation methods in a sequential manner.
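The Bradley-Terry model is one member of this parametric pairwise comparison family. The sketch below fits it with the classic minorization-maximization (Zermelo) iteration. It then shows, on made-up counts, how injecting fabricated comparisons can move an item to the top. It illustrates the vulnerability only: it is not the paper's sequential manipulation policy or its distributionally robust estimator, and all names and numbers are hypothetical.

```python
def bradley_terry(wins, iters=500):
    """Fit Bradley-Terry strengths with the MM (Zermelo) iteration.

    wins[i][j] is the number of comparisons in which item i beat item j.
    Assumes every item has at least one win and one loss.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new.append(sum(wins[i]) / denom)
        total = sum(new)
        p = [v * n / total for v in new]  # strengths are only defined up to scale
    return p


def ranking(p):
    """Item indices ordered from strongest to weakest."""
    return sorted(range(len(p)), key=lambda i: -p[i])


# Hypothetical honest data: item 0 > item 1 > item 2.
wins = [[0, 7, 8],
        [3, 0, 6],
        [2, 4, 0]]
# Hypothetical manipulation: fabricate 20 extra wins for item 2 over each rival.
attacked = [row[:] for row in wins]
attacked[2][0] += 20
attacked[2][1] += 20

print(ranking(bradley_terry(wins)), ranking(bradley_terry(attacked)))
```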

adversary, class file, pairwise comparison, (15 more...)

doi: 10.1109/TPAMI.2024.3416710

2407.01916

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States (0.04)

Genre:

Personal (0.92)
Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (0.67)
Government > Voting & Elections (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Shi, Xinxing, Baldwin-McDonald, Thomas, Álvarez, Mauricio A.

Adaptive RKHS Fourier Features for Compositional Gaussian Process Models

arXiv.org Machine Learning Jul-1-2024

Gaussian Processes (GPs) provide a principled Bayesian framework for function approximation, making them particularly useful in many applications requiring uncertainty calibration [Rasmussen and Williams, 2006], such as Bayesian optimisation [Snoek et al., 2012] and time-series analysis [Roberts et al., 2013]. Despite offering reasonable uncertainty estimation, shallow GPs often struggle to model complex, non-stationary processes present in practical applications. To overcome this limitation, Deep Gaussian Processes (DGPs) employ a compositional architecture by stacking multiple GP layers, thereby enhancing representational power while preserving the model's intrinsic capability to quantify uncertainty [Damianou and Lawrence, 2013]. However, the conventional variational formulation of DGPs heavily depends on local inducing point approximations across intermediate GP layers [Titsias, 2009, Salimbeni and Deisenroth, 2017], which hinder the model from capturing the global structures commonly found in real-world scenarios. Incorporating Fourier features into GP models has shown promise in addressing this challenge in GP inference due to the periodic nature of these features. A line of research uses Random Fourier Features (RFFs, [Rahimi and Recht, 2007]) of stationary kernels to convert the original (deep) GPs into Bayesian networks in weight space [Lázaro-Gredilla et al., 2010, Gal and Turner, 2015, Cutajar et al., 2017]. Building on this concept within a sparse variational GP framework, recent advancements in inter-domain GPs [Lázaro-Gredilla and Figueiras-Vidal, 2009a, Van der Wilk et al., 2020] directly approximate the posterior of the original GPs by introducing fixed Variational Fourier Features (VFFs) through process projection onto a Reproducing Kernel Hilbert Space (RKHS) [Hensman et al., 2018, Rudner et al., 2020]. VFFs are derived by projecting GPs onto a different domain.
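Random Fourier features replace a stationary kernel with an explicit finite feature map, and that is what turns a GP into a Bayesian model in weight space. Below is a minimal sketch for the one-dimensional RBF kernel k(x, y) = exp(-(x - y)²/(2ℓ²)). It assumes frequencies drawn from the kernel's Gaussian spectral density and uniform random phases. It shows the fixed RFF construction, not the adaptive RKHS features proposed in the paper. The function names and the feature count are illustrative.

```python
import math
import random


def make_rff(dim_features, lengthscale, seed=0):
    """Random Fourier features for the 1-D RBF kernel exp(-(x - y)^2 / (2 l^2))."""
    rng = random.Random(seed)
    # The RBF kernel's spectral density is Gaussian with standard deviation 1 / l.
    omegas = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim_features)]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(dim_features)]
    scale = math.sqrt(2.0 / dim_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]

    return features


def rbf(x, y, lengthscale):
    return math.exp(-((x - y) ** 2) / (2 * lengthscale ** 2))


z = make_rff(5000, lengthscale=1.0)
approx = sum(a * b for a, b in zip(z(0.3), z(1.1)))  # z(x) . z(y) ~ k(x, y)
print(approx, rbf(0.3, 1.1, 1.0))
```

With these features, a GP draw is approximated by f(x) ≈ w · z(x) with w ~ N(0, I). The kernel approximation error shrinks roughly like 1/√D in the number of features D.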

fourier feature, gaussian process, kernel, (14 more...)

2407.01856

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)