AITopics | softopc

Collaborating Authors

softopc

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

b5b03f06271f8917685d14cea7c6c50a-Paper.pdf

Neural Information Processing Systems Feb-13-2026, 18:35:36 GMT

evaluation, reinforcement learning, softopc

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Robots (0.96)


b5b03f06271f8917685d14cea7c6c50a-Paper.pdf

Neural Information Processing Systems Aug-20-2025, 00:09:02 GMT

evaluation, reinforcement learning, softopc

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Robots (0.96)


Google's AI picks which machine learning models will produce the best results

#artificialintelligence Jun-20-2019, 04:24:17 GMT

Leave it to the folks at Google to devise AI capable of predicting which machine learning models will produce the best results. In a newly published paper ("Off-Policy Evaluation via Off-Policy Classification") and blog post, a team of Google AI researchers proposes what they call "off-policy classification," or OPC, which evaluates the performance of AI-driven agents by treating evaluation as a classification problem. The team notes that their approach, which evaluates policies trained with reinforcement learning (a technique that uses rewards to drive software policies toward goals), works with image inputs and scales to tasks including vision-based robotic grasping. "Fully off-policy reinforcement learning is a variant in which an agent learns entirely from older data, which is appealing because it enables model iteration without requiring a physical robot," writes Robotics at Google software engineer Alex Irpan. "With fully off-policy RL, one can train several models on the same fixed dataset collected by previous agents, then select the best one."
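To make the classification framing concrete, here is one way logged robot data could be turned into classification examples. It is a minimal sketch, not the researchers' code: the `Episode` container and the `label_transitions` helper are hypothetical, and it assumes, as a simplification, deterministic dynamics and a single binary success reward per episode. State-action pairs from successful episodes become positives; pairs from failed episodes stay unlabeled rather than negative.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for illustration; states and actions are opaque values.
Transition = Tuple[object, object]  # (state, action)


@dataclass
class Episode:
    transitions: List[Transition]
    success: bool  # sparse binary reward: did the episode end in success?


def label_transitions(episodes: List[Episode]) -> List[Tuple[Transition, int]]:
    """Turn logged episodes into positive-unlabeled (PU) classification data.

    Assumption: with deterministic dynamics and a binary success reward, every
    (state, action) pair on a successful episode can still lead to success, so
    it is labeled positive (1). Pairs from failed episodes get 0, meaning
    "unlabeled" rather than "negative": a different later action might have
    succeeded from the same pair.
    """
    labeled: List[Tuple[Transition, int]] = []
    for episode in episodes:
        label = 1 if episode.success else 0
        for transition in episode.transitions:
            labeled.append((transition, label))
    return labeled


logged = [
    Episode(transitions=[("s0", "a0"), ("s1", "a1")], success=True),
    Episode(transitions=[("s0", "a2")], success=False),
]
print(label_transitions(logged))
# [(('s0', 'a0'), 1), (('s1', 'a1'), 1), (('s0', 'a2'), 0)]
```

Keeping failed-episode pairs as unlabeled, instead of calling them failures, is what makes this a positive-unlabeled problem rather than ordinary binary classification.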

best result, machine learning, reinforcement learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)


Off-Policy Evaluation via Off-Policy Classification

Irpan, Alex, Rao, Kanishka, Bousmalis, Konstantinos, Harris, Chris, Ibarz, Julian, Levine, Sergey

arXiv.org Artificial Intelligence Jun-4-2019

In this work, we consider the problem of model selection for deep reinforcement learning (RL) in real-world environments. Typically, the performance of deep RL algorithms is evaluated via on-policy interactions with the target environment. However, comparing models in a real-world environment for the purposes of early stopping or hyperparameter tuning is costly and often practically infeasible. This leads us to examine off-policy policy evaluation (OPE) in such settings. We focus on OPE for value-based methods, which are of particular interest in deep RL, with applications like robotics, where off-policy algorithms based on Q-function estimation can often attain better sample complexity than direct policy optimization. Existing OPE metrics either rely on a model of the environment or use importance sampling (IS) to correct for the data being off-policy. However, for high-dimensional observations, such as images, models of the environment can be difficult to fit, and value-based methods can make IS hard to use or even ill-conditioned, especially when dealing with continuous action spaces. In this paper, we focus on the specific case of MDPs with continuous action spaces and sparse binary rewards, which is representative of many important real-world applications. We propose an alternative metric that relies on neither models nor IS, by framing OPE as a positive-unlabeled (PU) classification problem with the Q-function as the decision function. We experimentally show that this metric outperforms baselines on a number of tasks. Most importantly, it can reliably predict the relative performance of different policies in a number of generalization scenarios, including the transfer to the real world of policies trained in simulation for an image-based robotic manipulation task.
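The abstract's central idea, scoring a Q-function by how well it separates positive state-action pairs from unlabeled ones, can be sketched in a few lines. This is a hedged, simplified illustration rather than the authors' implementation: it assumes a SoftOPC-style score computed as the average Q-value over pairs from successful episodes minus the average Q-value over all logged pairs (the paper's exact estimator may weight these terms differently, for example by a class prior). The names `soft_opc_score`, `select_best`, `q_informative`, and `q_constant` are hypothetical. Labels follow the PU convention: 1 for pairs from successful episodes, 0 for unlabeled.

```python
from typing import Callable, Dict, Sequence, Tuple

Transition = Tuple[object, object]  # (state, action)
QFunction = Callable[[object, object], float]  # estimated success probability


def soft_opc_score(q: QFunction, labeled: Sequence[Tuple[Transition, int]]) -> float:
    """Simplified SoftOPC-style score (higher is better).

    Assumption: mean Q over positive pairs (label 1, from successful episodes)
    minus mean Q over all logged pairs. The paper's exact estimator may weight
    these terms differently (e.g. by a class prior).
    """
    all_q = [q(s, a) for (s, a), _ in labeled]
    pos_q = [q(s, a) for (s, a), y in labeled if y == 1]
    if not pos_q:
        raise ValueError("need at least one transition from a successful episode")
    return sum(pos_q) / len(pos_q) - sum(all_q) / len(all_q)


def select_best(candidates: Dict[str, QFunction],
                labeled: Sequence[Tuple[Transition, int]]) -> str:
    """Offline model selection: pick the candidate with the highest score."""
    return max(candidates, key=lambda name: soft_opc_score(candidates[name], labeled))


# Toy data: two positive pairs and two unlabeled pairs.
labeled = [(("s0", "a0"), 1), (("s1", "a1"), 1), (("s0", "a2"), 0), (("s2", "a3"), 0)]


def q_informative(state, action):
    # Rates the actions seen in successful episodes much higher.
    return 0.9 if action in ("a0", "a1") else 0.1


def q_constant(state, action):
    # Cannot tell pairs apart, so its score is zero.
    return 0.5


print(select_best({"informative": q_informative, "constant": q_constant}, labeled))
# informative
```

Because the score needs only logged transitions and each candidate's Q-values, several policies trained on the same fixed dataset can be ranked without running any of them on a robot, which is the model-selection use case the abstract describes.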

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

1906.01624

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
