AITopics

Deep Learning models have achieved remarkable performance in tasks such as image classification or generation, often surpassing human accuracy. However, they can struggle to learn new tasks and update their knowledge without access to previous data, leading to a significant loss of accuracy known as Catastrophic Forgetting (CF). This phenomenon was first observed by McCloskey and Cohen in 1989 and remains an active research topic. Incremental learning without forgetting is widely recognized as a crucial aspect in building better AI systems, as it allows models to adapt to new tasks without losing the ability to perform previously learned ones. This article surveys recent studies that tackle CF in modern Deep Learning models that use gradient descent as their learning algorithm. Although several solutions have been proposed, a definitive solution or consensus on assessing CF is yet to be established. The article provides a comprehensive review of recent solutions, proposes a taxonomy to organize them, and identifies research gaps in this area.

international conference, learning, proceedings, (11 more...)

2312.10549

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > New York > New York County > New York City (0.04)
(36 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.92)
Research Report > New Finding (0.67)

Industry:

Education > Educational Setting > Online (0.92)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Zhdanov, Maksim, Dereka, Stanislav, Kolesnikov, Sergey

Unveiling Empirical Pathologies of Laplace Approximation for Uncertainty Estimation

In this paper, we critically evaluate Bayesian methods for uncertainty estimation in deep learning, focusing on the widely applied Laplace approximation and its variants. Our findings reveal that the conventional method of fitting the Hessian matrix negatively impacts out-of-distribution (OOD) detection efficiency. We propose a different point of view, asserting that focusing solely on optimizing prior precision can yield more accurate uncertainty estimates in OOD detection while preserving adequate calibration metrics. Moreover, we demonstrate that this property is not connected to the training stage of a model but rather to its intrinsic properties. Through extensive experimental evaluation, we establish the superiority of our simplified approach over traditional methods in the out-of-distribution domain.

detection, laplace approximation, neural network, (13 more...)

2312.10464

Country: Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Diagnostic Medicine (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Horii, Shunsuke, Chikahara, Yoichi

Uncertainty Quantification in Heterogeneous Treatment Effect Estimation with Gaussian-Process-Based Partially Linear Model

Estimating heterogeneous treatment effects across individuals has attracted growing attention as a statistical tool for performing critical decision-making. We propose a Bayesian inference framework that quantifies the uncertainty in treatment effect estimation to support decision-making in a relatively small sample size setting. Our proposed model places Gaussian process priors on the nonparametric components of a semiparametric model called a partially linear model. This model formulation has three advantages. First, we can analytically compute the posterior distribution of a treatment effect without relying on the computationally demanding posterior approximation. Second, we can guarantee that the posterior distribution concentrates around the true one as the sample size goes to infinity. Third, we can incorporate prior knowledge about a treatment effect into the prior distribution, improving the estimation efficiency. Our experimental results show that even in the small sample size setting, our method can accurately estimate the heterogeneous treatment effects and effectively quantify its estimation uncertainty.

learner, linear model, treatment setup, (11 more...)

2312.10435

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Public Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Active Inference and Intentional Behaviour

Friston, Karl J., Salvatori, Tommaso, Isomura, Takuya, Tschantz, Alexander, Kiefer, Alex, Verbelen, Tim, Koudahl, Magnus, Paul, Aswin, Parr, Thomas, Razi, Adeel, Kagan, Brett, Buckley, Christopher L., Ramstead, Maxwell J. D.

Recent advances in theoretical biology suggest that basal cognition and sentient behaviour are emergent properties of in vitro cell cultures and neuronal networks, respectively. Such neuronal networks spontaneously learn structured behaviours in the absence of reward or reinforcement. In this paper, we characterise this kind of self-organisation through the lens of the free energy principle, i.e., as self-evidencing. We do this by first discussing the definitions of reactive and sentient behaviour in the setting of active inference, which describes the behaviour of agents that model the consequences of their actions. We then introduce a formal account of intentional behaviour, that describes agents as driven by a preferred endpoint or goal in latent state-spaces. We then investigate these forms of (reactive, sentient, and intentional) behaviour using simulations. First, we simulate the aforementioned in vitro experiments, in which neuronal cultures spontaneously learn to play Pong, by implementing nested, free energy minimising processes. The simulations are then used to deconstruct the ensuing predictive behaviour, leading to the distinction between merely reactive, sentient, and intentional behaviour, with the latter formalised in terms of inductive planning. This distinction is further studied using simple machine learning benchmarks (navigation in a grid world and the Tower of Hanoi problem), that show how quickly and efficiently adaptive behaviour emerges under an inductive form of active inference.

agent, free energy, inference, (15 more...)

2312.07547

Country:

Asia > Vietnam > Hanoi > Hanoi (0.25)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
(7 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Machine LearningDec-15-2023

Learning to Infer Unobserved Behaviors: Estimating User's Preference for a Site over Other Sites

Sinha, Atanu R, Anand, Tanay, Maheshwari, Paridhi, Lakshmy, A V, Jain, Vishal

A site's recommendation system relies on knowledge of its users' preferences to offer relevant recommendations to them. These preferences are for attributes that comprise items and content shown on the site, and are estimated from the data of users' interactions with the site. Another form of users' preferences is material too, namely, users' preferences for the site over other sites, since that shows users' base level propensities to engage with the site. Estimating users' preferences for the site, however, faces major obstacles because (a) the focal site usually has no data of its users' interactions with other sites; these interactions are users' unobserved behaviors for the focal site; and (b) the Machine Learning literature in recommendation does not offer a model of this situation. Even if (b) is resolved, the problem in (a) persists since without access to data of its users' interactions with other sites, there is no ground truth for evaluation. Moreover, it is most useful when (c) users' preferences for the site can be estimated at the individual level, since the site can then personalize recommendations to individual users. We offer a method to estimate individual user's preference for a focal site, under this premise. In particular, we compute the focal site's share of a user's online engagements without any data from other sites. We show an evaluation framework for the model using only the focal site's data, allowing the site to test the model. We rely upon a Hierarchical Bayes Method and perform estimation in two different ways - Markov Chain Monte Carlo and Stochastic Gradient with Langevin Dynamics. Our results find good support for the approach to computing personalized share of engagement and for its evaluation.

engagement, focal site, iet, (14 more...)

arXiv.org Machine Learning

2312.16177

Country:

Asia > India (0.04)
North America > United States > California (0.04)
North America > United States > Hawaii (0.04)
Europe (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
(2 more...)

Image Restoration Through Generalized Ornstein-Uhlenbeck Bridge

Yue, Conghan, Peng, Zhengwei, Ma, Junlong, Du, Shiyan, Wei, Pengxu, Zhang, Dongyu

Diffusion models possess powerful generative capabilities enabling the mapping of noise to data using reverse stochastic differential equations. However, in image restoration tasks, the focus is on the mapping relationship from low-quality images to high-quality images. To address this, we introduced the Generalized Ornstein-Uhlenbeck Bridge (GOUB) model. By leveraging the natural mean-reverting property of the generalized OU process and further adjusting the variance of its steady-state distribution through the Doob's h-transform, we achieve diffusion mappings from point to point with minimal cost. This allows for end-to-end training, enabling the recovery of high-quality images from low-quality ones. Additionally, we uncovered the mathematical essence of some bridge models, all of which are special cases of the GOUB and empirically demonstrated the optimality of our proposed models. Furthermore, benefiting from our distinctive parameterization mechanism, we proposed the Mean-ODE model that is better at capturing pixel-level information and structural perceptions. Experimental results show that both models achieved state-of-the-art results in various tasks, including inpainting, deraining, and super-resolution. Code is available at https://github.com/Hammour-steak/GOUB.

arxiv preprint arxiv, diffusion model, image restoration, (9 more...)

2312.10299

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Asset Ownership Identification: Using machine learning to predict enterprise asset ownership

Jacobik, Craig

Asset owner identification is an important first step for any information security organization, allowing organizations the ability to identify and detect data breaches and losses, vulnerabilities, possible attack surfaces, and define effective countermeasures. Using existing asset ownership data, the research utilized an assortment of machine learning algorithms to determine the best classification model to predict an asset's owner. The research ran separate analyses for each enumerated team, then ran a 100 iteration Monte Carlo Cross Validation across Adaboost, Logistic Regression, Naive Bayes, Classification and Regression Trees, and Random Forests. Finally, a visualization dashboard was created to help users understand the asset inventory through interactive exploratory data analysis as well as the ability to understand model evaluation metrics including accuracy, sensitivity, and specificity for each model. Overall, Adaboost performed the best across all owners with low testing errors below 5% while Naive Bayes performed the worst. The remaining models performed similarly. The fully qualified domain name (FQDN), Classless Inter-Domain Routing (CIDR) CIDR/16, and location were among the most important features.

dashboard, random forest, response variable, (14 more...)

2312.10266

Country:

North America > Canada > British Columbia (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Brand, Cornelius, Ganian, Robert, Kalyanasundaram, Subrahmanyam, Inerney, Fionn Mc

The Complexity of Optimizing Atomic Congestion

Atomic congestion games are a classic topic in network design, routing, and algorithmic game theory, and are capable of modeling congestion and flow optimization tasks in various application areas. While both the price of anarchy for such games as well as the computational complexity of computing their Nash equilibria are by now well-understood, the computational complexity of computing a system-optimal set of strategies -- that is, a centrally planned routing that minimizes the average cost of agents -- is severely understudied in the literature. We close this gap by identifying the exact boundaries of tractability for the problem through the lens of the parameterized complexity paradigm. After showing that the problem remains highly intractable even on extremely simple networks, we obtain a set of results which demonstrate that the structural parameters which control the computational (in)tractability of the problem are not vertex-separator based in nature (such as, e.g., treewidth), but rather based on edge separators. We conclude by extending our analysis towards the (even more challenging) min-max variant of the problem.

agent, arc, flow assignment, (15 more...)

2312.10219

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Bavaria > Regensburg (0.04)
Europe > Austria (0.04)
Asia > India > Telangana > Hyderabad (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Game Theory (0.87)
Information Technology > Communications > Networks (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Review of Unsupervised POS Tagging and Its Implications on Language Acquisition

Dickson, Niels

An ability that underlies human syntactic knowledge is determining which words can appear in the similar structures (i.e. grouping words by their syntactic categories). These groupings enable humans to combine structures in order to communicate complex meanings. A foundational question is how do children acquire this ability underlying syntactic knowledge. In exploring this process, we will review various engineering approaches whose goal is similar to that of a child's -- without prior syntactic knowledge, correctly identify the parts of speech (POS) of the words in a sample of text. In reviewing these unsupervised tagging efforts, we will discuss common themes that support the advances in the models and their relevance for language acquisition. For example, we discuss how each model judges success (evaluation metrics), the "additional information" that constrains the POS learning (such as orthographic information), and the context used to determine POS (only previous word, words before and after the target, etc). The identified themes pave the way for future investigations into the cognitive processes that underpin the acquisition of syntactic categories and provide a useful layout of current state of the art unsupervised POS tagging models.

category, information, syntactic category, (15 more...)

2312.10169

Country:

North America > United States > Massachusetts > Middlesex County > Somerville (0.04)
North America > United States > New Jersey > Bergen County > Mahwah (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(3 more...)

Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning

Tan, Wei, Du, Lan, Buntine, Wray

The effectiveness of active learning largely depends on the sampling efficiency of the acquisition function. Expected Loss Reduction (ELR) focuses on a Bayesian estimate of the reduction in classification error, and more general costs fit in the same framework. We propose Bayesian Estimate of Mean Proper Scores (BEMPS) to estimate the increase in strictly proper scores such as log probability or negative mean square error within this framework. We also prove convergence results for this general class of costs. To facilitate better experimentation with the new acquisition functions, we develop a complementary batch AL algorithm that encourages diversity in the vector of expected changes in scores for unlabeled data. To allow high-performance classifiers, we combine deep ensembles, and dynamic validation set construction on pretrained models, and further speed up the ensemble process with the idea of Monte Carlo Dropout. Extensive experiments on both texts and images show that the use of mean square error and log probability with BEMPS yields robust acquisition functions and well-calibrated classifiers, and consistently outperforms the others tested. The advantages of BEMPS over the others are further supported by a set of qualitative analyses, where we visualise their sampling behaviour using data maps and t-SNE plots.

active learning, classifier, dataset, (14 more...)

doi: 10.1109/TPAMI.2023.3343359

2312.10116

Country:

North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.90)