AITopics | Chang, William

Plotting

Chang, William

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multiplayer Information Asymmetric Contextual Bandits

Chang, William, Lu, Yuanhao

arXiv.org Artificial IntelligenceMar-11-2025

Single-player contextual bandits are a well-studied problem in reinforcement learning that has seen applications in various fields such as advertising, healthcare, and finance. In light of the recent work on \emph{information asymmetric} bandits \cite{chang2022online, chang2023online}, we propose a novel multiplayer information asymmetric contextual bandit framework where there are multiple players each with their own set of actions. At every round, they observe the same context vectors and simultaneously take an action from their own set of actions, giving rise to a joint action. However, upon taking this action the players are subjected to information asymmetry in (1) actions and/or (2) rewards. We designed an algorithm \texttt{LinUCB} by modifying the classical single-player algorithm \texttt{LinUCB} in \cite{chu2011contextual} to achieve the optimal regret $O(\sqrt{T})$ when only one kind of asymmetry is present. We then propose a novel algorithm \texttt{ETC} that is built on explore-then-commit principles to achieve the same optimal regret when both types of asymmetry are present.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2503.08961

Country:

North America > United States (0.28)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Add feedback

Multiplayer Information Asymmetric Bandits in Metric Spaces

Chang, William, Karthik, Aditi

arXiv.org Machine LearningMar-10-2025

In recent years the information asymmetric Lipschitz bandits In this paper we studied the Lipschitz bandit problem applied to the multiplayer information asymmetric problem studied in \cite{chang2022online, chang2023optimal}. More specifically we consider information asymmetry in rewards, actions, or both. We adopt the CAB algorithm given in \cite{kleinberg2004nearly} which uses a fixed discretization to give regret bounds of the same order (in the dimension of the action) space in all 3 problem settings. We also adopt their zooming algorithm \cite{ kleinberg2008multi}which uses an adaptive discretization and apply it to information asymmetry in rewards and information asymmetry in actions.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Machine Learning

2503.08004

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.42)

Add feedback

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

Jin, Tianyuan, Hsu, Hao-Lun, Chang, William, Xu, Pan

arXiv.org Machine LearningDec-24-2023

We study the multi-agent multi-armed bandit (MAMAB) problem, where $m$ agents are factored into $\rho$ overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm \citep{verstraeten2020multiagent} and derived a Bayesian regret bound. However, it remains an open problem how to derive a frequentist regret bound for Thompson sampling in this multi-agent setting. To address these issues, we propose an efficient variant of MATS, the $\epsilon$-exploring Multi-Agent Thompson Sampling ($\epsilon$-MATS) algorithm, which performs MATS exploration with probability $\epsilon$ while adopts a greedy policy otherwise. We prove that $\epsilon$-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies our frequentist regret upper bound is optimal up to constant and logarithm terms, when the hypergraph is sufficiently sparse. Thorough experiments on standard MAMAB problems demonstrate the superior performance and the improved computational efficiency of $\epsilon$-MATS compared with existing algorithms in the same setting.

artificial intelligence, joint arm, machine learning, (15 more...)

arXiv.org Machine Learning

2312.15549

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Optimal Cooperative Multiplayer Learning Bandits with Noisy Rewards and No Communication

Chang, William, Lu, Yuanhao

arXiv.org Machine LearningNov-10-2023

We consider a cooperative multiplayer bandit learning problem where the players are only allowed to agree on a strategy beforehand, but cannot communicate during the learning process. In this problem, each player simultaneously selects an action. Based on the actions selected by all players, the team of players receives a reward. The actions of all the players are commonly observed. However, each player receives a noisy version of the reward which cannot be shared with other players. Since players receive potentially different rewards, there is an asymmetry in the information used to select their actions. In this paper, we provide an algorithm based on upper and lower confidence bounds that the players can use to select their optimal actions despite the asymmetry in the reward information. We show that this algorithm can achieve logarithmic $O(\frac{\log T}{\Delta_{\bm{a}}})$ (gap-dependent) regret as well as $O(\sqrt{T\log T})$ (gap-independent) regret. This is asymptotically optimal in $T$. We also show that it performs empirically better than the current state of the art algorithm for this environment.

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv.org Machine Learning

2311.0621

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry:

Energy > Oil & Gas > Upstream (0.48)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

Jin, Tiancheng, Liu, Junyan, Rouyer, Chloé, Chang, William, Wei, Chen-Yu, Luo, Haipeng

arXiv.org Machine LearningOct-26-2023

Existing online learning algorithms for adversarial Markov Decision Processes achieve ${O}(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{{O}}(\sqrt{T} + C^{\textsf{P}})$ regret where $C^{\textsf{P}}$ measures how adversarial the transition functions are and can be at most ${O}(T)$. While this algorithm itself requires knowledge of $C^{\textsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintains the same regret bound, but also simultaneously adapts to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieves $\widetilde{{O}}(U + \sqrt{UC^{\textsf{L}}} + C^{\textsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\textsf{L}}$ is the amount of corruption on losses.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2305.1738

Country: North America > United States > California (0.45)

Genre:

Research Report (0.64)
Instructional Material > Online (0.40)

Industry: Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Approximation Capabilities of Neural Networks using Morphological Perceptrons and Generalizations

Chang, William, Hamad, Hassan, Chugg, Keith M.

arXiv.org Artificial IntelligenceJul-16-2022

Standard artificial neural networks (ANNs) use sum-product or multiply-accumulate node operations with a memoryless nonlinear activation. These neural networks are known to have universal function approximation capabilities. Previously proposed morphological perceptrons use max-sum, in place of sum-product, node processing and have promising properties for circuit implementations. In this paper we show that these max-sum ANNs do not have universal approximation capabilities. Furthermore, we consider proposed signed-max-sum and max-star-sum generalizations of morphological ANNs and show that these variants also do not have universal approximation capabilities. We contrast these variations to log-number system (LNS) implementations which also avoid multiplications, but do exhibit universal approximation capabilities.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2207.07832

Country:

North America > United States > California (0.46)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.62)

Add feedback