Safe and Efficient Off-Policy Reinforcement Learning

Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc Bellemare

Neural Information Processing Systems

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of "off-policyness"; and (3) it is efficient as it makes the best use of samples collected from near on-policy behaviour policies. We analyze the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. We believe this is the first return-based off-policy control algorithm converging a.s. to Q* without the GLIE assumption (Greedy in the Limit with Infinite Exploration).
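The abstract's three properties hinge on how Retrace(λ) truncates importance weights. A minimal sketch of the published recursion is below; variable names and the trajectory layout are illustrative, not from the paper.

```python
import numpy as np

def retrace_targets(q, pi, mu, actions, rewards, gamma=0.99, lam=1.0):
    """Retrace(lambda) targets for one trajectory of length T.

    q, pi, mu        : (T+1, A) action values / target / behaviour probabilities
    actions, rewards : (T,) behaviour actions and observed rewards
    """
    T = len(rewards)
    idx = np.arange(T)
    # Truncated importance weights c_s = lam * min(1, pi/mu): full weight
    # near on-policy (efficiency), capped at 1 far off-policy (low variance).
    c = lam * np.minimum(1.0, pi[idx, actions] / mu[idx, actions])
    ev = (pi * q).sum(axis=1)                  # E_pi[Q(x_s, .)] at each step
    delta = rewards + gamma * ev[1:] - q[idx, actions]   # one-step TD errors
    g = np.zeros(T)
    g[T - 1] = delta[T - 1]
    for t in range(T - 2, -1, -1):             # backward recursion
        g[t] = delta[t] + gamma * c[t + 1] * g[t + 1]
    return q[idx, actions] + g                 # corrected Q targets
```

When the behaviour policy equals the target policy and λ = 1, the weights are all 1 and the recursion reduces to an on-policy λ-return.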


Zap Q-Learning

Adithya M Devraj, Sean Meyn

Neural Information Processing Systems

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases.
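The key mechanism the abstract describes, a matrix gain estimated on a faster time scale than the parameter update, can be illustrated on a toy linear stochastic-approximation problem. This is a generic two-time-scale sketch in the spirit of the abstract, not the paper's algorithm; the toy system and step-size exponents are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
A_mean = np.array([[2.0, 0.3], [0.1, 1.5]])   # mean of the noisy linear system
b_mean = np.array([1.0, -1.0])
theta_star = np.linalg.solve(A_mean, b_mean)  # root of E[A] theta - E[b] = 0

theta = np.zeros(d)
A_hat = np.eye(d)                             # running matrix-gain estimate
for n in range(1, 20001):
    A_n = A_mean + 0.1 * rng.standard_normal((d, d))  # noisy observations
    b_n = b_mean + 0.1 * rng.standard_normal(d)
    A_hat += n ** -0.85 * (A_n - A_hat)       # faster time scale: gain matrix
    # Slower time scale: Newton-Raphson-flavoured step using the matrix gain.
    theta -= (1.0 / n) * np.linalg.solve(A_hat, A_n @ theta - b_n)
```

The inner step mimics a stochastic Newton-Raphson update, which is the "close match to a deterministic Newton-Raphson implementation" the abstract refers to.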


The Computation of Generalized Embeddings for Underwater Acoustic Target Recognition using Contrastive Learning

Hummel, Hilde I., Gansekoele, Arwin, Bhulai, Sandjai, van der Mei, Rob

arXiv.org Artificial Intelligence

The increasing level of sound pollution in marine environments poses a growing threat to ocean health, making it crucial to monitor underwater noise. By monitoring this noise, the sources responsible for the pollution can be mapped. Monitoring is performed by passively listening to these sounds, which generates a large number of recordings capturing a mix of sound sources such as ship activity and marine mammal vocalizations. Although machine learning offers a promising solution for automatic sound classification, current state-of-the-art methods rely on supervised learning, which requires a large amount of high-quality labeled data that is not publicly available. In contrast, a massive amount of lower-quality unlabeled data is publicly available, offering the opportunity to explore unsupervised learning techniques. This research explores this possibility by implementing an unsupervised Contrastive Learning approach: a Conformer-based encoder is optimized with the so-called Variance-Invariance-Covariance Regularization loss function on the lower-quality unlabeled data, and the result is then transferred to the labeled data. Through classification tasks involving recognizing ship types and marine mammal vocalizations, our method is shown to produce robust and generalized embeddings. This demonstrates the potential of unsupervised methods for a range of automatic underwater acoustic analysis tasks.


Valentine's Day dangers: Dating app killers lure love seekers in unsuspecting ways

FOX News

Kurt "The Cyberguy" Knutsson explains how facial recognition technology can help you find your perfect match. From a poisonous date to finding love with a serial killer, these six chilling cases show how unsuspecting dating app users on the quest for romance led them into the clutches of danger. Dating apps – from Tinder to Grindr – are the modern way for people to connect with potential partners from the comfort of their own space. Brace yourself for stories that blur the line between love and terror. Here is Fox News Digital's list of some recent cases where love went wrong.


Regularized Q-learning through Robust Averaging

Schmitt-Förster, Peter, Sutter, Tobias

arXiv.org Artificial Intelligence

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.
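The "underlying estimation bias" the abstract targets is the well-known upward bias of plugging sample means into a max. A quick simulation of that bias (illustrating the problem, not the 2RA estimator itself; all numbers are illustrative):

```python
import numpy as np

# Five actions with equal true means, so max_a E[X_a] = 0. The naive
# plug-in estimator max_a(sample mean of X_a) is biased upward, because
# the max picks whichever arm happened to draw high by chance (Jensen).
rng = np.random.default_rng(0)
true_means = np.zeros(5)
n_samples, n_trials = 10, 10_000
est = np.empty(n_trials)
for t in range(n_trials):
    sample_means = rng.normal(true_means, 1.0, size=(n_samples, 5)).mean(axis=0)
    est[t] = sample_means.max()   # plug-in estimate of max_a E[X_a]
print(est.mean())                 # clearly positive despite the true max of 0
```

This is the same mechanism that makes the max-over-next-actions term in Q-learning an overestimate, which the distributionally robust estimator in the abstract is designed to control.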


Shock of the old: 11 wild views of the future – from winged postmen to self-cleaning homes

The Guardian

"Things can only get better", D:Ream promised, but they were wrong, and so were most people in history who have tried to predict the future. It never stopped us from trying, though, and a few visionaries have been pretty good at it. There was Leonardo da Vinci, of course, with his helicopters and fridges, and Joseph Glanvill, who in 1661 suggested moon voyages and communication using "magnetic waves" might be a thing. Civil engineer John Elfreth Watkins, writing in 1900, predicted mobile phones, ready meals and global digital media ("Photographs will be telegraphed from any distance. If there be a battle in China a hundred years hence, snapshots of its most striking events will be published in the newspapers an hour later").


About to Break Down? You Might Be a Cybertruck.

Mother Jones

Tesla CEO Elon Musk stands in front of the damaged Cybertruck after it fails a demonstration of its durability. (Ringo H.W. Chiu / AP) At a live delivery event this November, where Elon Musk awkwardly opened the door for about a dozen new Cybertruck owners, he told the world: "The apocalypse can come along any moment, and here at Tesla, we have the best in apocalypse technology." Then he showed a video of the vehicle being pummeled by a machine gun, quipping, "If you're ever in an argument with another car, you will win." And then he sold a bunch of Cybertrucks. Two million have been preordered, and 500 delivered, at over $60,000 a pop. Some soon proved that they couldn't survive a test drive, let alone a ride with Mad Max.


Theoretical remarks on feudal hierarchies and reinforcement learning

AIHub

Reinforcement learning is a paradigm in which an agent interacts with its environment by trying out different actions in different states and observing the outcomes. Each of these interactions can change the state of the environment and can also provide rewards to the agent. The goal of the agent is to learn the value of performing each action in each state. By value, we mean the largest total reward the agent can possibly obtain after performing that action in that state. If the agent achieves this goal, it can then act optimally in its environment by choosing, in every state, the action with the largest value.
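The learning loop described above can be sketched with tabular Q-learning (Watkins' algorithm, which several of the papers on this page build on). The two-state MDP, rewards, and step sizes below are illustrative, not from the article.

```python
import numpy as np

# Toy MDP: 2 states, 2 actions. Action 0 stays in the current state,
# action 1 moves to the other state; reward 1 only for action 1 in state 0.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha, eps = 2, 2, 0.9, 0.1, 0.2
next_state = np.array([[0, 1], [1, 0]])
reward = np.array([[0.0, 1.0], [0.0, 0.0]])

Q = np.zeros((n_states, n_actions))   # value estimate for each (state, action)
s = 0
for _ in range(5000):
    # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s2, r = next_state[s, a], reward[s, a]
    # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2
```

After training, acting optimally means picking `Q[s].argmax()` in every state; here that is action 1 in both states, cycling through the rewarding transition.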