suboptimality gap
Achieving Constant Regret in Linear Markov Decision Processes
We study constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that, with high probability, incurs only finite regret over infinitely many episodes. We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs), where both the transition kernel and the reward function can be approximated by a linear function up to misspecification level ζ. At the core of Cert-LSVI-UCB is an innovative certified estimator, which facilitates a fine-grained concentration analysis for multi-phase value-targeted regression, enabling us to establish an instance-dependent regret bound that is constant w.r.t. the number of episodes.
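For reference, the setting named in the abstract is commonly formalized as below. This is a sketch of the standard episodic formulation, not taken from the paper itself: the feature map $\phi$, the parameters $\mu_h$ and $\theta_h$, the stage index $h$, the episode count $K$, and the policy $\pi_k$ played in episode $k$ are all assumed notation.

$$
\bigl\| P_h(\cdot \mid s,a) - \langle \phi(s,a), \mu_h(\cdot) \rangle \bigr\|_{\mathrm{TV}} \le \zeta,
\qquad
\bigl| r_h(s,a) - \langle \phi(s,a), \theta_h \rangle \bigr| \le \zeta,
$$

$$
\mathrm{Regret}(K) = \sum_{k=1}^{K} \Bigl( V_1^{*}(s_1^{k}) - V_1^{\pi_k}(s_1^{k}) \Bigr).
$$

Under this reading, a constant regret guarantee means that $\mathrm{Regret}(K)$ stays below a bound that does not grow with $K$, with high probability.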
Sync or Sink: Bounds on Algorithmic Collective Action with Noise and Multiple Groups
Karan, Aditya; Kalle, Prabhat; Vincent, Nicholas; Sundaram, Hari
Collective action against algorithmic systems provides an opportunity for a small group of individuals to strategically manipulate their data to get specific outcomes, from classification to recommendation models. This effectiveness will invite further growth in this type of coordinated action, both in the size and in the number of distinct collectives. With a small group, however, coordination is key. Currently, there is no formal analysis of how coordination challenges within a collective can impact downstream outcomes, or how multiple collectives may affect each other's success. In this work, we aim to provide guarantees on the success of collective action in the presence of both coordination noise and multiple groups. Our insight is that data generated by either multiple collectives or by coordination noise can be viewed as originating from multiple data distributions. Using this framing, we derive bounds on the success of collective action. We conduct experiments to study the effects of noise on collective action. We find that sufficiently high levels of noise can reduce the success of collective action. In certain scenarios, large noise can sink a collective's success rate from $100\%$ to just under $60\%$. We identify potential trade-offs between collective size and coordination noise; for example, a collective that is twice as big but has four times more noise experiences worse outcomes than the smaller, more coordinated one. This work highlights the importance of understanding the nuanced dynamics of strategic behavior in algorithmic systems.