AITopics

2305.17557

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.67)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Jimenez, Felix, Katzfuss, Matthias

Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks

In recent years, deep neural networks (DNNs) have achieved remarkable success in various tasks such as image recognition, natural language processing, and speech recognition. However, despite their excellent performance, these models have certain limitations, such as their lack of uncertainty quantification (UQ). Much of UQ for DNNs is based on a Bayesian approach that models network weights as random variables [22] or has involved ensembles of networks [18]. Gaussian processes (GPs) provide natural UQ, but they lack the representation learning that makes DNNs successful. Standard GPs are known to scale poorly with large datasets, but GP approximations are plentiful and one such method is the Vecchia approximation [31, 16], which uses nearest-neighbor conditioning sets to exploit conditional independence among the data.

artificial intelligence, machine learning, representation, (20 more...)

2305.17063

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Lahoti, Aakash, Senapati, Spandan, Rajawat, Ketan, Koppel, Alec

Sharpened Lazy Incremental Quasi-Newton Method

We consider the finite sum minimization of $n$ strongly convex and smooth functions with Lipschitz continuous Hessians in $d$ dimensions. In many applications where such problems arise, including maximum likelihood estimation, empirical risk minimization, and unsupervised learning, the number of observations $n$ is large, and it becomes necessary to use incremental or stochastic algorithms whose per-iteration complexity is independent of $n$. Of these, the incremental/stochastic variants of the Newton method exhibit superlinear convergence, but incur a per-iteration complexity of $O(d^3)$, which may be prohibitive in large-scale settings. On the other hand, the incremental Quasi-Newton method incurs a per-iteration complexity of $O(d^2)$ but its superlinear convergence rate has only been characterized asymptotically. This work puts forth the Sharpened Lazy Incremental Quasi-Newton (SLIQN) method that achieves the best of both worlds: an explicit superlinear convergence rate with a per-iteration complexity of $O(d^2)$. Building upon the recently proposed Sharpened Quasi-Newton method, the proposed incremental variant incorporates a hybrid update strategy incorporating both classic and greedy BFGS updates. The proposed lazy update rule distributes the computational complexity between the iterations, so as to enable a per-iteration complexity of $O(d^2)$. Numerical tests demonstrate the superiority of SLIQN over all other incremental and stochastic Quasi-Newton variants.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2305.17283

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Chen, Daiwei, Chang, Weikai, Chaudhari, Pratik

Learning Capacity: A Measure of the Effective Dimensionality of a Model

We exploit a formal correspondence between thermodynamics and inference, where the number of samples can be thought of as the inverse temperature, to define a "learning capacity'' which is a measure of the effective dimensionality of a model. We show that the learning capacity is a tiny fraction of the number of parameters for many deep networks trained on typical datasets, depends upon the number of samples used for training, and is numerically consistent with notions of capacity obtained from the PAC-Bayesian framework. The test error as a function of the learning capacity does not exhibit double descent. We show that the learning capacity of a model saturates at very small and very large sample sizes; this provides guidelines, as to whether one should procure more data or whether one should search for new architectures, to improve performance. We show how the learning capacity can be used to understand the effective dimensionality, even for non-parametric models such as random forests and $k$-nearest neighbor classifiers.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2305.17332

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Lei, Mengying, Sun, Lijun

Bayesian Kernelized Tensor Factorization as Surrogate for Bayesian Optimization

Bayesian optimization (BO) primarily uses Gaussian processes (GP) as the key surrogate model, mostly with a simple stationary and separable kernel function such as the squared-exponential kernel with automatic relevance determination (SE-ARD). However, such simple kernel specifications are deficient in learning functions with complex features, such as being nonstationary, nonseparable, and multimodal. Approximating such functions using a local GP, even in a low-dimensional space, requires a large number of samples, not to mention in a high-dimensional setting. In this paper, we propose to use Bayesian Kernelized Tensor Factorization (BKTF) -- as a new surrogate model -- for BO in a $D$-dimensional Cartesian product space. Our key idea is to approximate the underlying $D$-dimensional solid with a fully Bayesian low-rank tensor CP decomposition, in which we place GP priors on the latent basis functions for each dimension to encode local consistency and smoothness. With this formulation, information from each sample can be shared not only with neighbors but also across dimensions. Although BKTF no longer has an analytical posterior, we can still efficiently approximate the posterior distribution through Markov chain Monte Carlo (MCMC) and obtain prediction and full uncertainty quantification (UQ). We conduct numerical experiments on both standard BO test functions and machine learning hyperparameter tuning problems, and our results show that BKTF offers a flexible and highly effective approach for characterizing complex functions with UQ, especially in cases where the initial sample size and budget are severely limited.

artificial intelligence, bayesian inference, machine learning, (14 more...)

2302.1451

Country:

North America > Canada > Quebec > Montreal (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Bloomfield, Robin, Rushby, John

Assessing Confidence with Assurance 2.0

An assurance case is intended to provide justifiable confidence in the truth of its top claim, which typically concerns safety or security. A natural question is then "how much" confidence does the case provide? We argue that confidence cannot be reduced to a single attribute or measurement. Instead, we suggest it should be based on attributes that draw on three different perspectives: positive, negative, and residual doubts. Positive Perspectives consider the extent to which the evidence and overall argument of the case combine to make a positive statement justifying belief in its claims. We set a high bar for justification, requiring it to be indefeasible. The primary positive measure for this is soundness, which interprets the argument as a logical proof. Confidence in evidence can be expressed probabilistically and we use confirmation measures to ensure that the "weight" of evidence crosses some threshold. In addition, probabilities can be aggregated from evidence through the steps of the argument using probability logics to yield what we call probabilistic valuations for the claims. Negative Perspectives record doubts and challenges to the case, typically expressed as defeaters, and their exploration and resolution. Assurance developers must guard against confirmation bias and should vigorously explore potential defeaters as they develop the case, and should record them and their resolution to avoid rework and to aid reviewers. Residual Doubts: the world is uncertain so not all potential defeaters can be resolved. We explore risks and may deem them acceptable or unavoidable. It is crucial however that these judgments are conscious ones and that they are recorded in the assurance case. This report examines the perspectives in detail and indicates how Clarissa, our prototype toolset for Assurance 2.0, assists in their evaluation.

argument, logic & formal reasoning, machine learning, (20 more...)

2205.04522

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Virginia > Hampton (0.04)
(24 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Air (1.00)
Law (1.00)
Health & Medicine > Therapeutic Area (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Software Engineering (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
(3 more...)

Coping with low data availability for social media crisis message categorisation

Wang, Congcong

During crisis situations, social media allows people to quickly share information, including messages requesting help. This can be valuable to emergency responders, who need to categorise and prioritise these messages based on the type of assistance being requested. However, the high volume of messages makes it difficult to filter and prioritise them without the use of computational techniques. Fully supervised filtering techniques for crisis message categorisation typically require a large amount of annotated training data, but this can be difficult to obtain during an ongoing crisis and is expensive in terms of time and labour to create. This thesis focuses on addressing the challenge of low data availability when categorising crisis messages for emergency response. It first presents domain adaptation as a solution for this problem, which involves learning a categorisation model from annotated data from past crisis events (source domain) and adapting it to categorise messages from an ongoing crisis event (target domain). In many-to-many adaptation, where the model is trained on multiple past events and adapted to multiple ongoing events, a multi-task learning approach is proposed using pre-trained language models. This approach outperforms baselines and an ensemble approach further improves performance...

large language model, machine learning, natural language, (26 more...)

2305.17211

Country:

North America > United States > Texas (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.13)
(46 more...)

Genre:

Workflow (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
(2 more...)

Industry:

Media (1.00)
Health & Medicine (1.00)
Education (1.00)
(6 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
(9 more...)

Abernethy, Jacob, Agarwal, Alekh, Marinov, Teodor V., Warmuth, Manfred K.

A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

We study the phenomenon of \textit{in-context learning} (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization. Our goal is to explain how a pre-trained transformer model is able to perform ICL under reasonable assumptions on the pre-training process and the downstream tasks. We posit a mechanism whereby a transformer can achieve the following: (a) receive an i.i.d. sequence of examples which have been converted into a prompt using potentially-ambiguous delimiters, (b) correctly segment the prompt into examples and labels, (c) infer from the data a \textit{sparse linear regressor} hypothesis, and finally (d) apply this hypothesis on the given test example and return a predicted label. We establish that this entire procedure is implementable using the transformer mechanism, and we give sample complexity guarantees for this learning framework. Our empirical findings validate the challenge of segmentation, and we show a correspondence between our posited mechanisms and observed attention maps for step (c).

large language model, machine learning, natural language, (21 more...)

2305.1704

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Baseball (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(2 more...)

MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

Zhang, Shiyue, Wu, Shijie, Irsoy, Ozan, Lu, Steven, Bansal, Mohit, Dredze, Mark, Rosenberg, David

Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P -- that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE). We have observed that models trained in this way may "over-generalize", in the sense that they produce non-human-like text. Moreover, we believe that reverse cross-entropy, i.e., the cross-entropy of P relative to Q, is a better reflection of how a human would evaluate text generated by a model. Hence, we propose learning with MixCE, an objective that mixes the forward and reverse cross-entropies. We evaluate models trained with this objective on synthetic data settings (where P is known) and real data, and show that the resulting models yield better generated text without complex decoding strategies. Our code and models are publicly available at https://github.com/bloomberg/mixce-acl2023

justification, machine learning, natural language, (19 more...)

2305.16958

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Chandra, Kartik, Li, Tzu-Mao, Tenenbaum, Josh, Ragan-Kelley, Jonathan

Acting as Inverse Inverse Planning

Great storytellers know how to take us on a journey. They direct characters to act -- not necessarily in the most rational way -- but rather in a way that leads to interesting situations, and ultimately creates an impactful experience for audience members looking on. If audience experience is what matters most, then can we help artists and animators *directly* craft such experiences, independent of the concrete character actions needed to evoke those experiences? In this paper, we offer a novel computational framework for such tools. Our key idea is to optimize animations with respect to *simulated* audience members' experiences. To simulate the audience, we borrow an established principle from cognitive science: that human social intuition can be modeled as "inverse planning," the task of inferring an agent's (hidden) goals from its (observed) actions. Building on this model, we treat storytelling as "*inverse* inverse planning," the task of choosing actions to manipulate an inverse planner's inferences. Our framework is grounded in literary theory, naturally capturing many storytelling elements from first principles. We give a series of examples to demonstrate this, with supporting evidence from human subject studies.

animation, artificial intelligence, machine learning, (17 more...)

doi: 10.1145/3588432.3591510

2305.16913

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.16)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
(5 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(3 more...)