AITopics | pdis

Collaborating Authors

pdis

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

Zhaohan Guo, Philip S. Thomas, Emma Brunskill

Neural Information Processing Systems, Nov-21-2025, 10:13:49 GMT

Evaluating a policy by deploying it in the real world can be risky and costly.

estimator, evaluation policy, trajectory, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > Canada > Alberta (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(5 more...)

Industry:

Education (0.68)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.48)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

SOPE: Spectrum of Off-Policy Estimators

Neural Information Processing Systems, Aug-16-2025, 10:22:02 GMT

Consequently, if the parameterization is not rich enough, it may not be possible to represent the distribution ratios accurately, and when rich function approximators (such as neural networks) are used, the optimization procedure may get stuck in sub-optimal saddle points.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning

Chen, Claire, Liu, Shuze, Zhang, Shangtong

arXiv.org Artificial Intelligence, Oct-7-2024

In reinforcement learning, classic on-policy evaluation methods often suffer from high variance and require large amounts of online data to attain the desired accuracy. Previous studies attempt to reduce evaluation variance by searching for or designing proper behavior policies to collect data. However, these approaches ignore the safety of such behavior policies: the designed behavior policies have no safety guarantee and may lead to severe damage during online execution. In this paper, to address the challenge of reducing variance while ensuring safety, we propose an optimal variance-minimizing behavior policy under safety constraints. Theoretically, while satisfying the safety constraints, our evaluation method is unbiased and has lower variance than on-policy evaluation. Empirically, our method is the only existing method to achieve both substantial variance reduction and safety constraint satisfaction. Furthermore, we show our method is superior to previous methods in both variance reduction and execution safety.
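The variance/safety trade-off described above can be illustrated in a one-step (bandit) setting. The sketch below is not the authors' method: it assumes deterministic per-action rewards, a per-action cost, and a hypothetical budget max_cost, and it only searches along the line between the target policy pi and the unconstrained variance-minimizing proposal mu* proportional to pi(a)|r(a)|. Because the exact IS variance is convex in the behavior policy and minimized at mu*, it does not increase as the mixture moves toward mu*, so the largest mixing weight that satisfies the (linear) cost constraint is the best point on that line.

```python
from typing import Dict, Tuple

Dist = Dict[str, float]  # action -> probability (or per-action reward/cost)


def is_variance(pi: Dist, mu: Dist, r: Dist) -> float:
    """Exact variance of rho(A) * r(A), A ~ mu, with deterministic per-action rewards."""
    value = sum(pi[a] * r[a] for a in pi)
    return sum((pi[a] * r[a]) ** 2 / mu[a] for a in pi if mu[a] > 0) - value ** 2


def expected(dist: Dist, f: Dist) -> float:
    """Expectation of a per-action quantity f under dist."""
    return sum(dist[a] * f[a] for a in dist)


def safe_behavior(pi: Dist, r: Dist, cost: Dist, max_cost: float) -> Tuple[Dist, float]:
    """Hypothetical line search: mix pi with mu* ~ pi|r| and keep the largest weight lam
    whose expected cost stays <= max_cost. Requires E_pi[cost] <= max_cost."""
    z = sum(pi[a] * abs(r[a]) for a in pi)
    if z == 0:
        return dict(pi), 0.0  # all rewards zero: every behavior policy gives zero variance
    mu_star = {a: pi[a] * abs(r[a]) / z for a in pi}
    c_pi, c_star = expected(pi, cost), expected(mu_star, cost)
    if c_pi > max_cost:
        raise ValueError("the target policy itself violates the cost budget")
    # Expected cost is linear in lam, so the feasible set is an interval starting at 0.
    lam = 1.0 if c_star <= max_cost else (max_cost - c_pi) / (c_star - c_pi)
    mu = {a: (1 - lam) * pi[a] + lam * mu_star[a] for a in pi}
    return mu, lam


pi = {"a": 0.5, "b": 0.5}
r = {"a": 1.0, "b": 3.0}
cost = {"a": 0.0, "b": 1.0}
mu, lam = safe_behavior(pi, r, cost, max_cost=0.6)
```

With these toy numbers the on-policy variance is 1.0; the sketch picks lam of about 0.4, which lowers the exact variance to about 0.375 while the expected cost sits exactly at the budget.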

behavior policy, constraint, safety constraint, (12 more...)

arXiv.org Artificial Intelligence

2410.05655

Country:

North America > Canada > Alberta (0.14)
North America > United States > Virginia (0.04)
North America > United States > New Jersey (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

Zhaohan Guo, Philip S. Thomas, Emma Brunskill

Neural Information Processing Systems, Oct-3-2024, 13:16:02 GMT

Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy evaluation (OPE) algorithms use historical data collected from running a previous policy to evaluate a new policy, which provides a means for evaluating a policy without requiring it to ever be deployed. Importance sampling (IS) is a popular OPE method because it is robust to partial observability and works with continuous states and actions. However, the amount of historical data required by importance sampling can scale exponentially with the horizon of the problem: the number of sequential decisions that are made. We propose using policies over temporally extended actions, called options, and show that combining these policies with importance sampling can significantly improve performance for long-horizon problems. In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling. We further generalize these special cases to a general covariance testing rule that can be used to decide which weights to drop in an IS estimate, and derive a new IS algorithm called Incremental Importance Sampling that can provide significantly more accurate estimates for a broad class of domains.
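The abstract builds on standard importance sampling. A minimal sketch of the two classic estimators follows: trajectory-wise IS, which multiplies the whole return by the product of all likelihood ratios, and its per-decision variant (PDIS), which weights each reward only by the ratios of the actions taken up to that step. It assumes a finite action set and policies given as probability functions; the trajectory format and toy policies are hypothetical, and neither the options-based estimator nor Incremental Importance Sampling is implemented here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: one trajectory is a sequence of (state, action, reward) steps.
Step = Tuple[int, int, float]
# A policy returns the probability of taking `action` in `state`.
Policy = Callable[[int, int], float]


def is_return(traj: Sequence[Step], pi_e: Policy, pi_b: Policy, gamma: float = 1.0) -> float:
    """Trajectory-wise IS: the full discounted return times the product of all ratios."""
    weight, ret = 1.0, 0.0
    for t, (s, a, r) in enumerate(traj):
        weight *= pi_e(s, a) / pi_b(s, a)
        ret += gamma ** t * r
    return weight * ret


def pdis_return(traj: Sequence[Step], pi_e: Policy, pi_b: Policy, gamma: float = 1.0) -> float:
    """Per-decision IS: reward r_t is weighted only by the ratios of steps 0..t."""
    weight, ret = 1.0, 0.0
    for t, (s, a, r) in enumerate(traj):
        weight *= pi_e(s, a) / pi_b(s, a)
        ret += gamma ** t * weight * r
    return ret


def ope_estimate(trajs: List[Sequence[Step]], pi_e: Policy, pi_b: Policy,
                 per_decision: bool = True, gamma: float = 1.0) -> float:
    """Average the per-trajectory estimates over a batch of logged trajectories."""
    f = pdis_return if per_decision else is_return
    return sum(f(tr, pi_e, pi_b, gamma) for tr in trajs) / len(trajs)


# Toy two-action policies (state ignored): behavior is uniform, evaluation prefers action 1.
def pi_b(s: int, a: int) -> float:
    return 0.5


def pi_e(s: int, a: int) -> float:
    return 0.8 if a == 1 else 0.2


traj = [(0, 1, 1.0), (0, 0, 0.0)]
print(is_return(traj, pi_e, pi_b), pdis_return(traj, pi_e, pi_b))  # approximately 0.64 and 1.6
```

Both estimators are unbiased when pi_b(s, a) > 0 wherever pi_e(s, a) > 0. PDIS often has lower variance because ratios from later steps do not multiply earlier rewards, yet the product of ratios still grows or shrinks with the horizon, which is the long-horizon problem the abstract targets.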

estimator, evaluation policy, trajectory, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > Canada > Alberta (0.14)
(5 more...)

Industry:

Education (0.68)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.48)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Doubly Optimal Policy Evaluation for Reinforcement Learning

Liu, Shuze, Chen, Claire, Zhang, Shangtong

arXiv.org Artificial Intelligence, Oct-3-2024

Policy evaluation estimates the performance of a policy by (1) collecting data from the environment and (2) processing raw data into a meaningful estimate. Due to the sequential nature of reinforcement learning, any improper data-collecting policy or data-processing method substantially increases the variance of evaluation results over long time horizons. Thus, policy evaluation often suffers from large variance and requires large amounts of data to achieve the desired accuracy. In this work, we design an optimal combination of data-collecting policy and data-processing baseline. Theoretically, we prove our doubly optimal policy evaluation method is unbiased and guaranteed to have lower variance than the previous best-performing methods. Empirically, compared with previous works, we show our method reduces variance substantially and achieves superior empirical performance.
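The role of the data-processing baseline can be seen in a one-step sketch: the estimator rho(A) * (r(A) - b(A)) + sum_a pi(a) * b(a), with A drawn from the behavior policy mu, stays unbiased for any fixed baseline b, and its variance shrinks as b approaches r. This is only an illustration under assumed deterministic per-action rewards; the baseline values below are hypothetical, and the paper's jointly optimal behavior policy and baseline for multi-step problems are not derived here.

```python
from typing import Dict, Tuple

Dist = Dict[str, float]  # action -> probability, reward, or baseline value


def baseline_is_moments(pi: Dist, mu: Dist, r: Dist, b: Dist) -> Tuple[float, float]:
    """Exact mean and variance of X = rho(A) * (r(A) - b(A)) + sum_a pi(a) * b(a), A ~ mu,
    for deterministic per-action rewards r and a fixed baseline b (one-step setting)."""
    offset = sum(pi[a] * b[a] for a in pi)  # the baseline's known expectation under pi
    values = {a: pi[a] / mu[a] * (r[a] - b[a]) + offset for a in mu if mu[a] > 0}
    mean = sum(mu[a] * values[a] for a in values)
    var = sum(mu[a] * (values[a] - mean) ** 2 for a in values)
    return mean, var


pi = {"a": 0.5, "b": 0.5}
r = {"a": 1.0, "b": 3.0}                 # true value of pi is 2.0
no_baseline = {"a": 0.0, "b": 0.0}
rough_baseline = {"a": 0.8, "b": 3.3}    # hypothetical imperfect estimate of r
print(baseline_is_moments(pi, pi, r, no_baseline))     # mean 2.0, variance 1.0
print(baseline_is_moments(pi, pi, r, rough_baseline))  # mean 2.0, variance about 0.0625
```

Changing the behavior policy (for example to {"a": 0.25, "b": 0.75}) leaves the mean at 2.0 as well, which is why the choice of behavior policy and baseline can be optimized together for variance without introducing bias.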

behavior policy, estimator, variance, (14 more...)

arXiv.org Artificial Intelligence

2410.02226

Country:

North America > Canada > Alberta (0.14)
North America > United States > Virginia (0.04)
North America > United States > New Jersey (0.04)
(4 more...)

Genre: Research Report (0.81)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Efficient Multi-Policy Evaluation for Reinforcement Learning

Liu, Shuze, Chen, Yuxin, Zhang, Shangtong

arXiv.org Artificial Intelligence, Aug-16-2024

To evaluate multiple target policies without bias, the dominant approach among RL practitioners is to run and evaluate each target policy separately. However, this evaluation method is far from efficient because samples are not shared across policies, and running each target policy to evaluate itself is actually not optimal. In this paper, we address these two weaknesses by designing a tailored behavior policy to reduce the variance of estimators across all target policies. Theoretically, we prove that executing this behavior policy with manyfold fewer samples outperforms on-policy evaluation on every target policy under characterized conditions. Empirically, we show our estimator has substantially lower variance than previous best methods and achieves state-of-the-art performance in a broad range of environments.
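Sample sharing can be sketched in a one-step setting, assuming deterministic per-action rewards: minimizing the summed exact IS variance over all targets gives a single behavior policy mu(a) proportional to |r(a)| * sqrt(sum_i pi_i(a)^2), and the same samples then serve every target. This is a simplified analogue chosen for illustration, not the construction from the paper, and the toy policies and rewards are hypothetical.

```python
import math
from typing import Dict, List

Dist = Dict[str, float]  # action -> probability or per-action reward


def is_variance(pi: Dist, mu: Dist, r: Dist) -> float:
    """Exact variance of rho(A) * r(A) with A ~ mu and deterministic per-action rewards r."""
    value = sum(pi[a] * r[a] for a in r)
    return sum((pi[a] * r[a]) ** 2 / mu[a] for a in r if mu[a] > 0) - value ** 2


def shared_behavior(targets: List[Dist], r: Dist) -> Dist:
    """mu(a) proportional to |r(a)| * sqrt(sum_i pi_i(a)^2): the minimizer of the summed
    exact IS variance over all target policies in this one-step setting."""
    w = {a: abs(r[a]) * math.sqrt(sum(pi[a] ** 2 for pi in targets)) for a in r}
    z = sum(w.values())
    return {a: w[a] / z for a in r}


def total_variance(targets: List[Dist], mu: Dist, r: Dist) -> float:
    """Sum of per-target IS variances when one behavior policy mu serves all targets."""
    return sum(is_variance(pi, mu, r) for pi in targets)


r = {"a": 1.0, "b": 3.0}
targets = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]
mu = shared_behavior(targets, r)
print(mu, total_variance(targets, mu, r))  # mu near {a: 0.25, b: 0.75}, total about 3.84
```

Among single behavior policies, this mu beats uniform sampling (total 7.12) and reusing either target's own policy. Each target's on-policy variance is only 0.36 in this toy case, though, so sharing samples is not automatically better per target; the abstract's guarantee holds under its characterized conditions.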

estimator, target policy, variance, (15 more...)

arXiv.org Artificial Intelligence

2408.08706

Country:

North America > Canada > Alberta (0.14)
North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Near-Field Spot Beamfocusing: A Correlation-Aware Transfer Learning Approach

Fallah, Mohammad Amir, Monemi, Mehdi, Rasti, Mehdi, Latva-Aho, Matti

arXiv.org Artificial Intelligence, May-21-2024

3D spot beamfocusing (SBF), in contrast to conventional angular-domain beamforming, concentrates radiated power within a very small volume in both the radial and angular domains in the near-field zone. Recently, channel-state-information (CSI)-independent machine learning (ML)-based approaches have been developed for effective SBF using extremely large-scale programmable metasurfaces (ELPMs). These methods involve dividing the ELPMs into subarrays and independently training them with Deep Reinforcement Learning to jointly focus the beam at the Desired Focal Point (DFP). This paper explores near-field SBF using ELPMs, addressing the lengthy training times that result from training the subarrays independently. To achieve a faster CSI-independent solution, and inspired by the correlation between the beamfocusing matrices of the subarrays, we leverage transfer learning techniques. First, we introduce a novel similarity criterion based on the Phase Distribution Image of subarray apertures. Then we devise a subarray policy propagation scheme that transfers knowledge from trained to untrained subarrays. We further enhance learning by introducing Quasi-Liquid-Layers as a revised version of the adaptive policy reuse technique. We show through simulations that the proposed scheme improves training speed by about 5 times. Furthermore, for dynamic DFP management, we devise a DFP policy blending process, which increases the convergence rate up to 8-fold.

dfp, elpm, subarray, (17 more...)

arXiv.org Artificial Intelligence

2405.19347

Country:

Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
Asia > Middle East > Iran > Fars Province > Shiraz (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
(6 more...)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

Lobo, Elita, Singh, Harvineet, Petrik, Marek, Rudin, Cynthia, Lakkaraju, Himabindu

arXiv.org Artificial Intelligence, Apr-6-2024

Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive. However, the extent to which such methods can be trusted under adversarial threats to data quality is largely unexplored. In this work, we make the first attempt at investigating the sensitivity of OPE methods to marginal adversarial perturbations to the data. We design a generic data poisoning attack framework leveraging influence functions from robust statistics to carefully construct perturbations that maximize error in the policy value estimates. We carry out extensive experimentation with multiple healthcare and control datasets. Our results demonstrate that many existing OPE methods are highly prone to generating value estimates with large errors when subject to data poisoning attacks, even for small adversarial perturbations. These findings question the reliability of policy values derived using OPE methods and motivate the need for developing OPE methods that are statistically robust to train-time data poisoning attacks.
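The paper constructs its perturbations with influence functions; the brute-force sketch below is a much simpler stand-in that only illustrates the underlying sensitivity. Given importance weights and returns from logged trajectories (both hypothetical here), it finds the single return whose shift by plus or minus a budget moves an ordinary or weighted IS estimate the most. For ordinary IS that shift equals weight_i * budget / n, so trajectories with large importance weights are the most exposed.

```python
from typing import Callable, List, Optional, Tuple

Estimator = Callable[[List[float], List[float]], float]


def is_estimate(weights: List[float], returns: List[float]) -> float:
    """Ordinary importance sampling: average of weight * return."""
    return sum(w * g for w, g in zip(weights, returns)) / len(weights)


def wis_estimate(weights: List[float], returns: List[float]) -> float:
    """Weighted importance sampling: normalize by the sum of weights."""
    return sum(w * g for w, g in zip(weights, returns)) / sum(weights)


def worst_single_perturbation(weights: List[float], returns: List[float], budget: float,
                              estimator: Estimator) -> Tuple[Optional[int], float]:
    """Try shifting each return by +/- budget; return (index, poisoned estimate)
    for the shift that moves the estimate the most."""
    base = estimator(weights, returns)
    best: Tuple[Optional[int], float] = (None, base)
    for i in range(len(returns)):
        for delta in (budget, -budget):
            poisoned = list(returns)
            poisoned[i] += delta
            est = estimator(weights, poisoned)
            if abs(est - base) > abs(best[1] - base):
                best = (i, est)
    return best


weights = [0.5, 4.0, 1.0]   # hypothetical importance weights of three logged trajectories
returns = [1.0, 2.0, 3.0]
print(worst_single_perturbation(weights, returns, 1.0, is_estimate))   # index 1
print(worst_single_perturbation(weights, returns, 1.0, wis_estimate))  # index 1
```

Here a unit change to the highest-weight return moves the ordinary IS estimate by 4/3 and the weighted IS estimate by 4/5.5, a small data change relative to the size of the resulting error.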

dope attack, ope method, value function estimate, (15 more...)

arXiv.org Artificial Intelligence

2404.04714

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > New Hampshire (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.94)
Health & Medicine > Therapeutic Area (0.79)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Improving Monte Carlo Evaluation with Offline Data

Liu, Shuze, Zhang, Shangtong

arXiv.org Artificial Intelligence, Mar-23-2023

Monte Carlo (MC) methods are the most widely used methods to estimate the performance of a policy. Given a policy of interest, MC methods give estimates by repeatedly running this policy to collect samples and averaging the outcomes. Samples collected during this process are called online samples. To get an accurate estimate, MC methods consume a large number of online samples. When online samples are expensive, e.g., in online recommendation and inventory management, we want to reduce the number of online samples while achieving the same estimation accuracy. To this end, we use off-policy MC methods that evaluate the policy of interest by running a different policy, called the behavior policy. We design a tailored behavior policy such that the variance of the off-policy MC estimator is provably smaller than that of the ordinary MC estimator. Importantly, this tailored behavior policy can be efficiently learned from existing offline data, i.e., previously logged data, which are much cheaper than online samples. With reduced variance, our off-policy MC method requires fewer online samples to evaluate the performance of a policy compared with the ordinary MC method. Moreover, our off-policy MC estimator is always unbiased.
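The two phases described above (learn a behavior policy from cheap logged data, then spend fewer online samples) can be sketched in a one-step setting. The sketch assumes actions with fixed reward magnitudes, plugs per-action mean |reward| from the logs into the classical variance-minimizing proposal mu(a) proportional to pi(a)|r(a)|, and mixes in pi with a weight eps so every action pi can take keeps mu(a) > 0, which preserves unbiasedness. The helper names, the mixing step, and the toy data are hypothetical; this is not the paper's learning procedure for multi-step problems.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]  # action -> probability


def fit_behavior(pi: Dist, logged: List[Tuple[str, float]], eps: float = 0.1) -> Dist:
    """Hypothetical helper: mu proportional to pi(a) * mean|reward(a)| from logged
    (action, reward) pairs, mixed with pi so mu(a) > 0 wherever pi(a) > 0 (if eps > 0)."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for a, r in logged:
        total[a] += abs(r)
        count[a] += 1
    mag = {a: total[a] / count[a] if count[a] else 0.0 for a in pi}
    z = sum(pi[a] * mag[a] for a in pi)
    if z == 0:
        return dict(pi)  # nothing useful in the logs: fall back to on-policy sampling
    return {a: (1 - eps) * pi[a] * mag[a] / z + eps * pi[a] for a in pi}


def off_policy_mc(pi: Dist, mu: Dist, reward_fn: Callable[[str], float],
                  n: int, seed: int = 0) -> float:
    """Draw n online actions from mu and average rho(A) * R(A)."""
    rng = random.Random(seed)
    actions = list(mu)
    samples = rng.choices(actions, weights=[mu[a] for a in actions], k=n)
    return sum(pi[a] / mu[a] * reward_fn(a) for a in samples) / n


def reward(a: str) -> float:
    """Toy deterministic environment; the true value of pi below is 2.0."""
    return {"a": 1.0, "b": 3.0}[a]


pi = {"a": 0.5, "b": 0.5}
logged = [("a", 1.0), ("b", 3.0), ("a", 1.0)]  # hypothetical offline log
mu = fit_behavior(pi, logged, eps=0.1)
print(off_policy_mc(pi, mu, reward, n=1000))
```

With eps=0.0 and exact reward magnitudes, every single online sample in this toy case already equals the true value 2.0, which shows the extreme end of the variance reduction; a positive eps gives up some of that reduction in exchange for coverage of actions the logs may have missed.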

machine learning, pdis, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2301.13734

Country: