AITopics | edac

Supplementary Material for Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble. A Ensemble gradient diversification

Neural Information Processing Systems | Apr-25-2026, 13:16:29 GMT

Proposition 1. Suppose $Q_{\phi_j}(s,a) = Q(s,a)$ and $Q_{\phi_j}(s,\cdot)$ is locally linear in the neighborhood of $a$ for all $j \in [N]$. Let $\lambda_{\min}$ and $w_{\min}$ be the smallest eigenvalue and the corresponding normalized eigenvector of the matrix $\mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))$, and let $\epsilon > 0$ be the value such that $\min_{i \neq j} \langle \nabla_a Q_{\phi_i}(s,a), \nabla_a Q_{\phi_j}(s,a) \rangle = 1 - \epsilon$. We first prove that the smallest eigenvalue $\lambda_{\min}$ of $\mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))$ is upper-bounded by some constant multiple of $\epsilon$. By Lemma 1, the total variance of the matrix is less than or equal to $\frac{N-1}{N}\epsilon$. Note that, using the fact that the Q-values coincide at the action $a$ and the local linearity of the Q-functions, we have derived Equation (2): $\mathrm{Var}(Q_{\phi_j}(s, a + k w)) = k^2 w^\top \mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))\, w$. Plugging $w = w_{\min}$ into Equation (2) and using Equation (1), we have $\mathrm{Var}(Q_{\phi_j}(s, a + k w_{\min})) = k^2 w_{\min}^\top \mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))\, w_{\min} = k^2 \lambda_{\min}$.

A.2 Relationship between maximizing the total variance and maximizing the smallest eigenvalue. As we have shown in Section 4, maximizing the total variance of the matrix $\mathrm{Var}(\nabla_a Q_{\phi_i}(s,a))$ is equivalent to minimizing the cosine similarity of all distinct pairs of the gradients $\nabla_a Q_{\phi_i}(s,a)$, which makes the gradients uniformly distributed on the unit sphere $S^{|A|-1}$.
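The variance identity in this excerpt can be checked numerically. The sketch below is an illustration only, not code from the paper: it assumes each ensemble member is exactly linear around the action with a unit-norm gradient, uses randomly generated gradients, and takes the variance with 1/N normalization. All function and variable names are hypothetical.

```python
import numpy as np

def ensemble_variance_along(grads, w, k):
    """Variance over ensemble members of Q_j(s, a + k*w).

    Assumes every Q_j is linear near a with gradient grads[j] and that all
    members share the same value Q(s, a), which cancels in the variance.
    """
    q_shift = k * (grads @ w)      # per-member change in Q along direction w
    return q_shift.var()           # population variance (divides by N)

def quadratic_form(grads, w, k):
    """k^2 * w^T Var(grad) w, with Var taken as the population covariance."""
    cov = np.cov(grads, rowvar=False, bias=True)
    return k**2 * (w @ cov @ w)

# Hypothetical ensemble: N = 10 unit-norm gradients in a 3-dimensional action space.
rng = np.random.default_rng(0)
grads = rng.normal(size=(10, 3))
grads /= np.linalg.norm(grads, axis=1, keepdims=True)

# Smallest eigenvalue and its eigenvector (eigh returns eigenvalues in ascending order).
cov = np.cov(grads, rowvar=False, bias=True)
eigvals, eigvecs = np.linalg.eigh(cov)
lam_min, w_min = eigvals[0], eigvecs[:, 0]

k = 0.5
lhs = ensemble_variance_along(grads, w_min, k)
rhs = quadratic_form(grads, w_min, k)
print(lhs, rhs, k**2 * lam_min)
```

Under these assumptions the three printed numbers agree up to floating-point error, which is the chain of equalities stated for the direction of the smallest eigenvalue.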

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

3d3d286a8d153a4a58156d0e02d8570c-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 07:44:23 GMT

dataset, edac, variance, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

A Theoretical Analysis

Neural Information Processing Systems | Aug-22-2025, 01:03:20 GMT

In this section, we provide detailed theoretical analysis and proofs in linear MDPs [23]. A.1 LSVI Solution. In linear MDPs, we assume that the transition dynamics and reward function take the form of P [...] Theorem (Theorem 1, restated). [...] In experiments, we do not use explicit constraints (e.g., spectral regularization) for the upper bound [...] Corollary (Corollary 1, restated). [...] given in Corollary 1. To conclude, we obtain from Eq. (22) that |T V [...] First, we give the following lemma.

artificial intelligence, machine learning, rorl, (18 more...)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.67)

Efficient Offline Reinforcement Learning: The Critic is Critical

Jelley, Adam, McInroe, Trevor, Devlin, Sam, Storkey, Amos

arXiv.org Artificial Intelligence | Jun-19-2024

Recent work has demonstrated both benefits and limitations from using supervised approaches (without temporal-difference learning) for offline reinforcement learning. While off-policy reinforcement learning provides a promising approach for improving performance beyond supervised approaches, we observe that training is often inefficient and unstable due to temporal difference bootstrapping. In this paper we propose a best-of-both approach by first learning the behavior policy and critic with supervised learning, before improving with off-policy reinforcement learning. Specifically, we demonstrate improved efficiency by pre-training with a supervised Monte-Carlo value-error, making use of commonly neglected downstream information from the provided offline trajectories. We find that we are able to more than halve the training time of the considered offline algorithms on standard benchmarks, and surprisingly also achieve greater stability. We further build on the importance of having consistent policy and value functions to propose novel hybrid algorithms, TD3+BC+CQL and EDAC+BC, that regularize both the actor and the critic towards the behavior policy. This helps to more reliably improve on the behavior policy when learning from limited human demonstrations.
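A supervised Monte-Carlo value error of the kind mentioned in this abstract can be sketched as a regression onto discounted returns-to-go computed from a stored trajectory. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single complete (terminated) trajectory, a hypothetical discount `gamma`, and a plain mean-squared error. The function names are made up for this example.

```python
import numpy as np

def discounted_returns_to_go(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one trajectory."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Accumulate backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def monte_carlo_value_error(predicted_values, rewards, gamma=0.99):
    """Mean-squared error between critic predictions and Monte-Carlo targets."""
    targets = discounted_returns_to_go(rewards, gamma)
    return float(np.mean((np.asarray(predicted_values) - targets) ** 2))

# Three steps with reward 1 and gamma = 0.5 give returns [1.75, 1.5, 1.0].
example_returns = discounted_returns_to_go([1.0, 1.0, 1.0], gamma=0.5)
example_error = monte_carlo_value_error([1.75, 1.5, 0.0], [1.0, 1.0, 1.0], gamma=0.5)
```

Because the targets come directly from the logged rewards, this loss involves no bootstrapping from the critic's own predictions, which is the contrast with temporal-difference learning that the abstract draws.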

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2406.13376

Country:

Oceania > New Zealand (0.04)
North America > United States > Oregon (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

EDAC: Efficient Deployment of Audio Classification Models For COVID-19 Detection

Jovanović, Andrej, Mihaly, Mario, Donaldson, Lennon

arXiv.org Artificial Intelligence | Sep-11-2023

The global spread of COVID-19 had severe consequences for public health and the world economy. The quick onset of the pandemic highlighted the potential benefits of cheap and deployable pre-screening methods to monitor the prevalence of the disease in a population. Various researchers made use of machine learning methods in an attempt to detect COVID-19. The solutions leverage various input features, such as CT scans or cough audio signals, with state-of-the-art results arising from deep neural network architectures. However, larger models require more compute, a pertinent consideration when deploying to the edge. To address this, we first recreated two models that use cough audio recordings to detect COVID-19. Through applying network pruning and quantisation, we were able to compress these two architectures without reducing the models' predictive performance. Specifically, we were able to achieve a 105.76x and a 19.34x reduction in the compressed model file size with corresponding 1.37x and 1.71x reductions in the inference times of the two models.
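The abstract does not say which pruning or quantisation schemes were used, so the following is only a generic sketch of the two ideas on a bare weight matrix: unstructured magnitude pruning followed by symmetric 8-bit quantisation. The function names, the 50% sparsity level, and the example weights are assumptions made for illustration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of entries with the smallest magnitude.

    Ties at the threshold are all pruned, so slightly more entries than
    requested may be zeroed.
    """
    flat = np.abs(weights).ravel()
    n_prune = int(sparsity * flat.size)
    if n_prune == 0:
        return weights.copy()
    threshold = np.partition(flat, n_prune - 1)[n_prune - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric per-tensor quantisation to int8; returns (codes, scale)."""
    max_abs = np.max(np.abs(weights))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    """Map int8 codes back to approximate floating-point weights."""
    return codes.astype(np.float32) * scale

w = np.array([[0.1, -2.0], [0.5, 3.0]])
pruned = magnitude_prune(w, sparsity=0.5)   # zeros the entries 0.1 and 0.5
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
```

Storing `codes` as int8 takes a quarter of the space of float32 weights, and the zeros introduced by pruning compress further; the rounding error per weight is at most half of `scale`.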

audio classification model, dataset, inference time, (13 more...)

2309.05357

Country: Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

An, Gaon, Moon, Seungyong, Kim, Jang-Hyun, Song, Hyun Oh

arXiv.org Artificial Intelligence | Oct-5-2021

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a previously collected static dataset, bears algorithmic difficulties due to function approximation errors from out-of-distribution (OOD) data points. To this end, offline RL algorithms adopt either a constraint or a penalty term that explicitly guides the policy to stay close to the given dataset. However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which themselves can be a non-trivial problem. Moreover, these methods under-utilize the generalization ability of deep neural networks and often fall into suboptimal solutions too close to the given dataset. In this work, we propose an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the data distribution. We show that the clipped Q-learning, a technique widely used in online RL, can be leveraged to successfully penalize OOD data points with high prediction uncertainties. Surprisingly, we find that it is possible to substantially outperform existing offline RL methods on various tasks by simply increasing the number of Q-networks along with the clipped Q-learning. Based on this observation, we propose an ensemble-diversified actor-critic algorithm that reduces the number of required ensemble networks down to a tenth compared to the naive ensemble while achieving state-of-the-art performance on most of the D4RL benchmarks considered.
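Clipped Q-learning with an ensemble, as described in this abstract, takes the minimum over the N Q-networks when forming the bootstrapped target. The sketch below shows only that step on precomputed next-state Q-values; it is not the authors' code, and it leaves out any entropy term and the diversification loss. The array layout `(N, batch)` and all names are assumptions.

```python
import numpy as np

def clipped_q_target(rewards, dones, next_q_ensemble, gamma=0.99):
    """Bellman target using the minimum over an ensemble of Q-estimates.

    next_q_ensemble has shape (N, batch): one row of next-state Q-values
    per ensemble member. Terminal transitions (dones == 1) do not bootstrap.
    """
    min_next_q = next_q_ensemble.min(axis=0)
    return rewards + gamma * (1.0 - dones) * min_next_q

rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
next_q = np.array([[2.0, 5.0],
                   [3.0, 4.0],
                   [1.0, 6.0]])     # N = 3 ensemble members, batch of 2
targets = clipped_q_target(rewards, dones, next_q, gamma=0.9)
```

Where ensemble members disagree strongly, the minimum falls further below the mean, so the target is more pessimistic for inputs with high prediction uncertainty, such as out-of-distribution actions.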

dataset, q-value, variance, (15 more...)

2110.01548

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
