Liu, Vincent
Carbon Connect: An Ecosystem for Sustainable Computing
Lee, Benjamin C., Brooks, David, van Benthem, Arthur, Gupta, Udit, Hills, Gage, Liu, Vincent, Pierce, Benjamin, Stewart, Christopher, Strubell, Emma, Wei, Gu-Yeon, Wierman, Adam, Yao, Yuan, Yu, Minlan
Computing is at a moment of profound opportunity. Emerging applications -- such as capable artificial intelligence, immersive virtual realities, and pervasive sensor systems -- drive unprecedented demand for computing. Despite recent advances toward net-zero carbon emissions, the computing industry's gross energy usage continues to rise at an alarming rate, outpacing the growth of new energy installations and renewable energy deployments. A shift towards sustainability is needed to spark a transformation in how computer systems are manufactured, allocated, and consumed. Carbon Connect envisions coordinated research thrusts that produce design and management strategies for sustainable, next-generation computer systems. These strategies must flatten and then reverse growth trajectories for computing power and carbon for society's most rapidly growing applications, such as artificial intelligence and virtual spaces. We will require accurate models for carbon accounting in computing technology. For embodied carbon, we must re-think conventional design strategies -- over-provisioned monolithic servers, frequent hardware refresh cycles, custom silicon -- and adopt life-cycle design strategies that more effectively reduce, reuse, and recycle hardware at scale. For operational carbon, we must not only embrace renewable energy but also design systems to use that energy more efficiently. Finally, new hardware design and management strategies must be cognizant of the economic policy and regulatory landscape, aligning private initiatives with societal goals. Many of these broader goals will require computer scientists to develop deep, enduring collaborations with researchers in economics, law, and industrial ecology to spark change in broader practice.
Switching the Loss Reduces the Cost in Batch Reinforcement Learning
Ayoub, Alex, Wang, Kaiwen, Liu, Vincent, Robertson, Samuel, McInerney, James, Liang, Dawen, Kallus, Nathan, Szepesvári, Csaba
In offline reinforcement learning (RL), also known as batch RL, we often want agents that learn how to achieve a goal from a fixed dataset using as few samples as possible. A standard approach in this setting is fitted Q-iteration (FQI) [Ernst et al., 2005], which iteratively minimizes the regression error on the batch dataset. In this work we propose a simple and principled improvement to FQI, using log-loss (FQI-log), and prove that it can achieve a much faster convergence rate. In particular, the number of samples it requires to learn a near-optimal policy scales with the cost of the optimal policy, leading to a so-called small-cost bound, the RL analogue of a small-loss bound in supervised learning. We highlight that FQI-log is the first computationally efficient batch RL algorithm to achieve a small-cost bound.
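To make the loss switch concrete, the sketch below shows one fitted Q-iteration step in which the usual squared error against the bootstrapped target is replaced by a log-loss (binary cross-entropy), assuming costs are scaled so that Q-values lie in [0, 1]. This is a minimal illustration under those assumptions, not the authors' implementation; the network interface, batch layout, and hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def fqi_log_step(q_net, target_net, batch, optimizer, gamma=0.99):
    """One fitted Q-iteration update using log-loss (cost-minimization setting).

    Assumes per-step costs are scaled so that discounted returns lie in [0, 1],
    which lets the bootstrapped target be treated as a soft label.
    `batch` is a tuple of tensors (states, actions, costs, next_states, dones).
    """
    states, actions, costs, next_states, dones = batch

    with torch.no_grad():
        # Networks output logits; a sigmoid maps them into [0, 1].
        next_q = torch.sigmoid(target_net(next_states)).min(dim=1).values
        target = (costs + gamma * (1.0 - dones) * next_q).clamp(0.0, 1.0)

    logits = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    pred = torch.sigmoid(logits)

    # Log-loss against the soft target, replacing standard FQI's squared error.
    loss = F.binary_cross_entropy(pred, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```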
Under the Surface: Tracking the Artifactuality of LLM-Generated Data
Das, Debarati, De Langis, Karin, Martin-Boyle, Anna, Kim, Jaehyung, Lee, Minhwa, Kim, Zae Myung, Hayati, Shirley Anugrah, Owan, Risako, Hu, Bin, Parkar, Ritik, Koo, Ryan, Park, Jonginn, Tyagi, Aahan, Ferland, Libby, Roy, Sanjali, Liu, Vincent, Kang, Dongyeop
This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant concerns about the quality and diversity of the artificial data incorporated into training cycles, leading to an artificial data ecosystem. To the best of our knowledge, this is the first study to aggregate various types of LLM-generated text data, from more tightly constrained data like "task labels" to more lightly constrained "free-form text". We then stress test the quality and implications of LLM-generated artificial data, comparing it with human data across various existing benchmarks. Despite artificial data's capability to match human performance, this paper reveals significant hidden disparities, especially in complex tasks where LLMs often miss the nuanced understanding of intrinsic human-generated content. This study critically examines diverse LLM-generated data and emphasizes the need for ethical practices both in creating data and in using LLMs. It highlights the LLMs' shortcomings in replicating human traits and behaviors, underscoring the importance of addressing the biases and artifacts produced in LLM-generated content in future research and development. All data and code are available on our project page.
When is Offline Policy Selection Sample Efficient for Reinforcement Learning?
Liu, Vincent, Nagarajan, Prabhat, Patterson, Andrew, White, Martha
Offline reinforcement learning algorithms often require careful hyperparameter tuning. Consequently, before deployment, we need to select from among a set of candidate policies. As yet, however, there is little understanding of the fundamental limits of this offline policy selection (OPS) problem. In this work we aim to provide clarity on when sample-efficient OPS is possible, primarily by connecting OPS to off-policy policy evaluation (OPE) and Bellman error (BE) estimation. We first show a hardness result: in the worst case, OPS is just as hard as OPE, which we establish by proving a reduction from OPE to OPS. As a result, no OPS method can be more sample efficient than OPE in the worst case. We then propose a BE method for OPS, called Identifiable BE Selection (IBES), that has a straightforward procedure for selecting its own hyperparameters. We highlight that using IBES for OPS generally has more requirements than OPE methods, but, if these are satisfied, it can be more sample efficient. We conclude with an empirical study comparing OPE and IBES, and demonstrate the difficulty of OPS on an offline Atari benchmark dataset.
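The abstract does not spell out IBES, but a common way to make Bellman error estimable from data despite the double-sampling problem is to fit an auxiliary regression to the bootstrapped targets and subtract its residual variance. The sketch below illustrates that generic correction for ranking candidate Q-functions; it should not be read as the exact IBES procedure, and the regressor, feature map, and data layout are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def bellman_error_score(q_sa, next_q, rewards, dones, features, gamma=0.99):
    """Rough Bellman-error estimate for ranking candidate Q-functions.

    q_sa:     Q(s, a) under the candidate, for each logged transition.
    next_q:   max_a' Q(s', a') under the candidate.
    features: a feature representation of (s, a) for the auxiliary regression
              (any supervised regressor could be substituted).
    The subtracted term estimates the target's conditional variance, which
    otherwise biases the naive squared TD error upward.
    """
    targets = rewards + gamma * (1.0 - dones) * next_q
    naive = np.mean((q_sa - targets) ** 2)

    # Auxiliary regression g(s, a) ~ E[target | s, a]; ideally fit on a
    # held-out split to avoid optimistic variance estimates.
    aux = RandomForestRegressor(n_estimators=100).fit(features, targets)
    variance_term = np.mean((aux.predict(features) - targets) ** 2)

    return naive - variance_term  # smaller is better for policy selection
```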
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Li, Zhuohan, Zheng, Lianmin, Zhong, Yinmin, Liu, Vincent, Sheng, Ying, Jin, Xin, Huang, Yanping, Chen, Zhifeng, Zhang, Hao, Gonzalez, Joseph E., Stoica, Ion
Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device. In this paper, we demonstrate that model parallelism can be additionally used for the statistical multiplexing of multiple devices when serving multiple models, even when a single model can fit into a single device. Our work reveals a fundamental trade-off between the overhead introduced by model parallelism and the opportunity to exploit statistical multiplexing to reduce serving latency in the presence of bursty workloads. We explore the new trade-off space and present a novel serving system, AlpaServe, that determines an efficient strategy for placing and parallelizing collections of large deep learning models across a distributed cluster. Evaluation results on production workloads show that AlpaServe can process requests at up to 10x higher rates or 6x more burstiness while staying within latency constraints for more than 99% of requests.
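A toy queueing simulation can illustrate the trade-off: when each GPU is dedicated to one model, a burst for that model queues up while other GPUs sit idle, whereas pooling capacity across models (which model parallelism enables, at a per-request overhead) absorbs the burst. The sketch below is only an illustration of statistical multiplexing under made-up arrival rates and overheads, not the AlpaServe placement algorithm.

```python
import random

def simulate(num_models=4, num_gpus=4, horizon=2000.0,
             service=1.0, pooling_overhead=1.1, shared=True, seed=0):
    """Mean request latency under bursty arrivals (toy illustration).

    shared=False: each model owns one dedicated GPU.
    shared=True : any GPU can serve any request (capacity is pooled), at a
                  per-request `pooling_overhead`; this abstracts away how
                  model parallelism makes such pooling possible.
    """
    rng = random.Random(seed)

    # Bursty arrivals: each model alternates between occasional hot periods
    # (rate 3.0) and longer cold periods (rate 0.2), each 50 time units long.
    arrivals = []
    for m in range(num_models):
        t = 0.0
        while t < horizon:
            rate = 3.0 if rng.random() < 0.2 else 0.2
            end = t + 50.0
            while t < end and t < horizon:
                t += rng.expovariate(rate)
                arrivals.append((t, m))
    arrivals.sort()

    servers = [0.0] * (num_gpus if shared else num_models)
    latencies = []
    for t, m in arrivals:
        idx = servers.index(min(servers)) if shared else m
        start = max(t, servers[idx])
        servers[idx] = start + service * (pooling_overhead if shared else 1.0)
        latencies.append(servers[idx] - t)
    return sum(latencies) / len(latencies)

print("dedicated GPUs :", round(simulate(shared=False), 2))
print("pooled capacity:", round(simulate(shared=True), 2))
```

With these toy numbers, the pooled configuration tends to see far lower queueing delay despite its 10% per-request overhead, which is the multiplexing effect the paper exploits.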
Measuring and Mitigating Interference in Reinforcement Learning
Liu, Vincent, Wang, Han, Tao, Ruo Yu, Javed, Khurram, White, Adam, White, Martha
Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before we can overcome interference, however, we must understand it better. In this work, we provide a definition and novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference, showing that it correlates with instability in control performance across a variety of network architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and to study learning algorithms which mitigate interference. Lastly, we outline a class of algorithms, which we call online-aware, that are designed to mitigate interference, and show that they reduce interference according to our measure and improve stability and performance in several classic control environments.
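The paper's precise definition is not given in the abstract; as a rough proxy in the same spirit, one can measure how much a single update on one batch of transitions increases the TD error on a separate, held-out batch. The sketch below computes such a proxy; the update function and batch layout are placeholders, and it is not the paper's exact measure.

```python
import copy
import torch

def interference_proxy(q_net, update_fn, update_batch, held_out_batch, gamma=0.99):
    """Change in squared TD error on held-out transitions caused by one update.

    `update_fn(net, batch)` applies one gradient step to `net` in place.
    Positive values mean the update hurt the held-out transitions
    (interference); negative values mean it helped (generalization).
    """
    def td_errors(net, batch):
        s, a, r, s2, done = batch
        with torch.no_grad():
            q = net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            target = r + gamma * (1.0 - done) * net(s2).max(dim=1).values
        return (q - target) ** 2

    before = td_errors(q_net, held_out_batch)
    net_after = copy.deepcopy(q_net)
    update_fn(net_after, update_batch)   # one update on a *different* batch
    after = td_errors(net_after, held_out_batch)

    return (after - before).mean().item()
```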
Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning
Liu, Vincent, Wright, James R., White, Martha
Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) and have limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains, including financial markets. We discuss algorithms that exploit the AIR property, and provide a theoretical analysis for an algorithm based on Fitted Q-Iteration. Finally, we demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real-world environments where the regularity holds.
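The AIR structure suggests a simple data-augmentation view: because actions (approximately) do not affect the exogenous component, a logged exogenous transition can be replayed for any counterfactual action, with only the endogenous part re-simulated. The sketch below shows that idea; the state decomposition and `endo_model` are hypothetical placeholders, and this is not the paper's exact Fitted Q-Iteration-based algorithm.

```python
def build_counterfactual_dataset(transitions, endo_model, action_space):
    """Augment a logged dataset using the AIR structure (illustrative sketch).

    Each logged transition is split as state = (exo, endo). Because actions
    (approximately) do not affect the exogenous component, the logged
    exogenous transition exo -> exo_next can be reused for *any* action,
    with only the endogenous part re-simulated by
    `endo_model(endo, exo, action) -> (endo_next, reward)`.
    """
    augmented = []
    for (exo, endo, action, reward, exo_next, endo_next) in transitions:
        for a in action_space:  # counterfactual actions
            endo_cf, r_cf = endo_model(endo, exo, a)
            augmented.append(((exo, endo), a, r_cf, (exo_next, endo_cf)))
    return augmented

# Standard fitted Q-iteration can then be run on `augmented` as if it were a
# regular offline dataset, now with much broader action coverage.
```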
Investigating the Properties of Neural Network Representations in Reinforcement Learning
Wang, Han, Miahi, Erfan, White, Martha, Machado, Marlos C., Abbas, Zaheer, Kumaraswamy, Raksha, Liu, Vincent, White, Adam
In this paper we investigate the properties of representations learned by deep reinforcement learning systems. Much of the early work on representations for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties, but rather that the data stream should determine the properties of the representation -- good representations emerge under appropriate training schemes. In this paper we bring these two perspectives together, empirically investigating the properties of representations that support transfer in reinforcement learning. We introduce and measure six representational properties over more than 25 thousand agent-task settings. We consider Deep Q-learning agents with different auxiliary losses in a pixel-based navigation environment, with source and transfer tasks corresponding to different goal locations. We develop a method to better understand why some representations work better for transfer, using a systematic approach that varies task similarity and measures and correlates representation properties with transfer performance. We demonstrate the generality of the methodology by investigating representations learned by a Rainbow agent that successfully transfer across game modes in Atari 2600.
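The six properties are not listed in the abstract, but two that are commonly examined in this line of work, sparsity and (approximate) orthogonality of the learned features, are easy to compute from a matrix of activations. The sketch below measures those two as an indication of the kind of analysis involved; it is not the paper's exact protocol.

```python
import numpy as np

def representation_properties(features, eps=1e-2):
    """Compute two simple representation properties on a feature matrix.

    `features` has shape (num_states, feature_dim), e.g. penultimate-layer
    activations of a DQN collected over a batch of observations.
    """
    # Sparsity: fraction of (near-)inactive units, averaged over states.
    sparsity = np.mean(np.abs(features) < eps)

    # Orthogonality: one minus the average absolute cosine similarity between
    # feature vectors of different states (1 = pairwise orthogonal).
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-8
    unit = features / norms
    cos = unit @ unit.T
    off_diag = cos[~np.eye(len(cos), dtype=bool)]
    orthogonality = 1.0 - np.mean(np.abs(off_diag))

    return {"sparsity": float(sparsity), "orthogonality": float(orthogonality)}
```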
Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments
Liu, Vincent, Chandak, Yash, Thomas, Philip, White, Martha
In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite-horizon reinforcement learning in the nonstationary setting. Reusing old data is critical for policy evaluation, but existing estimators that reuse old data introduce large bias, such that we cannot obtain a valid confidence interval. Inspired by the related field of survey sampling, we introduce a variant of the doubly robust (DR) estimator, called the regression-assisted DR estimator, that can incorporate past data without introducing a large bias. The estimator unifies several existing off-policy policy evaluation methods and improves on them with the use of auxiliary information and a regression approach. We prove that the new estimator is asymptotically unbiased, and provide a consistent variance estimator to construct a large-sample confidence interval. Finally, we empirically show that the new estimator improves estimation of the current and future policy values, and provides a tight and valid interval estimate in several nonstationary recommendation environments.
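For reference, the contextual-bandit form of the standard doubly robust (DR) estimator that the regression-assisted variant builds on combines a fitted reward model with an importance-weighted correction. The sketch below shows only that standard building block, with placeholder policy and reward-model functions; the survey-sampling-style regression adjustment described in the paper is not reproduced here.

```python
import numpy as np

def doubly_robust_value(contexts, actions, rewards,
                        behavior_probs, target_policy, reward_model):
    """Standard doubly robust (DR) off-policy value estimate for bandits.

    target_policy(x) -> dict mapping action -> probability under the policy
                        being evaluated.
    reward_model(x, a) -> estimated expected reward (e.g., fit by regression
                          on past data; this is where a regression-assisted
                          variant would fold in auxiliary information).
    """
    values = []
    for x, a, r, b in zip(contexts, actions, rewards, behavior_probs):
        pi = target_policy(x)
        # Model-based term: expected reward of the target policy under the model.
        direct = sum(p * reward_model(x, act) for act, p in pi.items())
        # Importance-weighted correction using the logged action and reward.
        correction = (pi.get(a, 0.0) / b) * (r - reward_model(x, a))
        values.append(direct + correction)
    return float(np.mean(values))
```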