Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

arXiv.org Machine Learning

Expert demonstrations, such as those from car drivers, help navigate environments with unknown rewards, but are often collected in controlled settings, such as closed-course test tracks, while learned control policies must be deployed in new environments, such as city streets. We can imitate experts to perform well in the same source environment where demonstrations are observed, and we may even use inverse reinforcement learning (IRL) to improve on simple behavior cloning (Ng and Russell, 2000; Abbeel and Ng, 2004; Ziebart et al., 2008; Fu et al., 2018; Geng et al., 2020). But the target environment may have a different transition law, discount factor, or soft-control regularization. For this, IRL is crucial: we can learn a reward from demonstrations in the source environment and transfer it to the target environment, learning a policy that optimizes the same reward function in a new setting (Fu et al., 2018; Schlaginhaufen and Kamgarpour, 2024). In this paper, we characterize how well this transfer can be done and which approaches are preferable. In particular, we show the value in a coupled approach that takes the target environment into account even when learning from the source. In ordinary offline control, the Bellman equation uses a known reward, so the main statistical error comes from target transitions.


What's happening in Myanmar's civil war as military holds elections?

Al Jazeera

Voters in parts of Myanmar are heading to the polls on Sunday for an election that critics view as a bid by the country's generals to legitimise military rule, nearly five years after they overthrew the government of Nobel Laureate Aung San Suu Kyi. The multi-phased election is unfolding amid a raging civil war, with ethnic armed groups and opposition militias fighting the military for control of vast stretches of territory, stretching from the borderlands with Bangladesh and India in the west, across the central plains, to the frontiers with China and Thailand in the north and east. Another third will be covered during a second and third phase in January, while voting has been cancelled altogether in the remainder. Fighting, including air raids and arson, has intensified in several areas.


Practical, Utilitarian Algorithm Configuration

arXiv.org Artificial Intelligence

Utilitarian algorithm configuration identifies a parameter setting for a given algorithm that maximizes a user's utility. Utility functions offer a theoretically well-grounded approach to optimizing decision-making under uncertainty and are flexible enough to capture a user's preferences over algorithm runtimes (e.g., they can describe a sharp cutoff after which a solution is no longer required, a per-hour cost for compute, or diminishing returns from algorithms that take longer to run). COUP is a recently introduced utilitarian algorithm configuration procedure that was designed mainly to offer strong theoretical guarantees about the quality of the configuration it returns, with less attention paid to its practical performance. This paper closes that gap, bringing theoretically grounded, utilitarian algorithm configuration to the point where it is competitive with widely used, heuristic configuration procedures that offer no performance guarantees. We present a series of improvements to COUP that improve its empirical performance without degrading its theoretical guarantees and demonstrate their benefit experimentally. Using a case study, we also illustrate ways of exploring the robustness of a given solution to the algorithm selection problem to variations in the utility function.
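As a rough illustration of the three kinds of runtime preference the abstract mentions, the sketch below writes each one as a small Python function and compares two made-up configurations. It is not taken from the paper and is not COUP itself: the function names, the parameters (`deadline_s`, `cost_per_hour`, `half_life_s`), the specific functional forms, and the runtime lists are all assumptions chosen for the example.

```python
def cutoff_utility(runtime_s, deadline_s):
    """Sharp cutoff: full utility up to the deadline, none afterwards."""
    return 1.0 if runtime_s <= deadline_s else 0.0


def hourly_cost_utility(runtime_s, value, cost_per_hour):
    """Fixed value of a solution minus a per-hour compute cost."""
    return value - cost_per_hour * runtime_s / 3600.0


def diminishing_utility(runtime_s, half_life_s):
    """Diminishing returns: utility halves every half_life_s seconds."""
    return 0.5 ** (runtime_s / half_life_s)


def mean_utility(runtimes_s, utility):
    """Average utility of a configuration over its observed runtimes."""
    return sum(utility(t) for t in runtimes_s) / len(runtimes_s)


# Hypothetical runtimes (seconds) of two configurations on four inputs.
steady = [30, 30, 30, 30]   # never fast, never slow
spiky = [5, 5, 5, 200]      # usually fast, occasionally very slow

by_cutoff = lambda t: cutoff_utility(t, deadline_s=60)
by_decay = lambda t: diminishing_utility(t, half_life_s=10)

for name, runs in (("steady", steady), ("spiky", spiky)):
    print(name, mean_utility(runs, by_cutoff), mean_utility(runs, by_decay))
```

With these invented numbers the steady configuration scores higher under the 60-second cutoff, while the spiky one scores higher under the decaying utility, which is the sense in which the preferred configuration depends on the utility function.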


Strategy Masking: A Method for Guardrails in Value-based Reinforcement Learning Agents

arXiv.org Artificial Intelligence

The use of reward functions to structure AI learning and decision making is core to the current reinforcement learning paradigm; however, without careful design of reward functions, agents can learn to solve problems in ways that may be considered "undesirable" or "unethical." Without thorough understanding of the incentives a reward function creates, it can be difficult to impose principled yet general control mechanisms over its behavior. In this paper, we study methods for constructing guardrails for AI agents that use reward functions to learn decision making. We introduce a novel approach, which we call strategy masking, to explicitly learn and then suppress undesirable AI agent behavior. We apply our method to study lying in AI agents and show that strategy masking can effectively modify agent behavior by suppressing, or actively penalizing, the reward dimension for lying such that agents act more honestly while not compromising their ability to perform effectively.
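One way to picture suppressing or penalizing a reward dimension is sketched below. This is a toy under stated assumptions, not the paper's implementation: it assumes action values are stored per reward dimension and that a per-dimension weight (the "mask") is applied when actions are compared. The names `masked_value` and `greedy_action`, the dimension labels, and all numbers are hypothetical.

```python
def masked_value(q_by_dimension, mask):
    """Combine per-dimension action values; dimensions not in the mask keep weight 1."""
    return sum(mask.get(dim, 1.0) * q for dim, q in q_by_dimension.items())


def greedy_action(q_table, mask):
    """Pick the action with the highest masked value."""
    return max(q_table, key=lambda action: masked_value(q_table[action], mask))


# Hypothetical learned values, split into a task dimension and a lying dimension.
q_table = {
    "bluff": {"task": 1.0, "lying": 0.5},
    "truth": {"task": 1.2, "lying": 0.0},
}

unmasked = greedy_action(q_table, mask={})               # lying counts in full
suppressed = greedy_action(q_table, mask={"lying": 0.0})   # lying ignored
penalized = greedy_action(q_table, mask={"lying": -1.0})   # lying counted against
print(unmasked, suppressed, penalized)
```

In this toy table the unmasked agent prefers the bluffing action because the lying dimension adds to its value; zeroing or negating that dimension's weight switches the choice to the truthful action while the task dimension is left untouched.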


Utilitarian Algorithm Configuration for Infinite Parameter Spaces

arXiv.org Artificial Intelligence

Utilitarian algorithm configuration is a general-purpose technique for automatically searching the parameter space of a given algorithm to optimize its performance, as measured by a given utility function, on a given set of inputs. Recently introduced utilitarian configuration procedures offer optimality guarantees about the returned parameterization while provably adapting to the hardness of the underlying problem. However, the applicability of these approaches is severely limited by the fact that they only search a finite, relatively small set of parameters. They cannot effectively search the configuration space of algorithms with continuous or uncountable parameters. In this paper we introduce a new procedure, which we dub COUP (Continuous, Optimistic Utilitarian Procrastination). COUP is designed to search infinite parameter spaces efficiently to find good configurations quickly. Furthermore, COUP maintains the theoretical benefits of previous utilitarian configuration procedures when applied to finite parameter spaces but is significantly faster, both provably and experimentally.


OpenAI's Chief Scientist Made a Tragic Miscalculation

The Atlantic - Technology

Ilya Sutskever, bless his heart. Until recently, to the extent that Sutskever was known at all, it was as a brilliant artificial-intelligence researcher. He was the star student who helped Geoffrey Hinton, one of the "godfathers of AI," kick off the so-called deep-learning revolution. In 2015, after a short stint at Google, Sutskever co-founded OpenAI and eventually became its chief scientist; so important was he to the company's success that Elon Musk has taken credit for recruiting him. Still, apart from niche podcast appearances and the obligatory hour-plus back-and-forth with Lex Fridman, Sutskever didn't have much of a public profile before this past weekend.


Microsoft 'pulled off a coup' of its own hiring Sam Altman, analysts say

Washington Post - Technology News

"If many OpenAI employees choose to migrate to Microsoft to join Mr. Altman and Mr. Brockman, then not only would Microsoft hold a license to OpenAI's (intellectual property) up to (artificial general intelligence, an AI-system that's generally smarter than humans), but Microsoft would also be effectively acquiring OpenAI's core differentiation -- its ambitious and experienced technical talent," Havemeyer added.


Sam Altman was 'shocked and saddened' after he was fired as CEO of OpenAI

Engadget

Sam Altman and Greg Brockman were "shocked and saddened by what the board did" and are still trying to figure out what exactly happened. The former CEO and the former President of OpenAI have published a post on X, sharing the details of what they do know and how they found out the former was being fired. Apparently, company co-founder Ilya Sutskever invited Altman for a meeting at noon on Friday, which was then attended by the whole board except for Brockman. It was at that meeting that Altman found out he was being fired and that OpenAI was going to announce it "very soon." Shortly after that, Sutskever reportedly invited Brockman to a separate Google Meet conference, where he was told that Altman had gotten fired and that he was being removed from the board.


Driven from city life to jungle insurgency

The Japan Times

On jungle crests about 1 mile from the front lines in eastern Myanmar, a former hotel banquet coordinator slipped his index finger onto the trigger of an assault rifle. A dentist recalled picking larvae from a young fighter's infected bullet wound. A marketing manager described the adapted commercial drones she is directing to foil the enemy. More than a year after Myanmar's military seized full control in a coup -- imprisoning the nation's elected leaders, killing more than 1,700 civilians and arresting at least 13,000 more -- the country is at war, with some unlikely combatants in the fray. On one side is a military junta that, apart from a brief interlude of semidemocratic governance, has ruled with brutal force for a half-century.


Oh, This Game Set in Latin America Has a Coup? How Original

WIRED

For quite some time, I've felt a deep unease playing shooting games set in the modern world. While I'm always delighted to have 11-year-olds pulverize me in Fortnite, or to drop into a zombie-infested city for make-believe fun, when it comes to more realistic shooters I get hung up on the details. For games in the Call of Duty or Tom Clancy franchises, these details usually entail an express ride through a soul-crushing wheel of stereotypes and a kaleidoscope of ahistorical musings extracted from a fictional mashup of the Cold War and the war on drugs. Likewise, as a historian of Latin America and someone who grew up in a Mexican-American community on the US–Mexico border, the genre's ongoing obsession with depicting everything south of my hometown as simultaneously exotic, corrupt, and tyrannical is tedious at best and enraging at worst. So when the reviews for Far Cry 6 started trickling into cyberspace, I wasn't surprised to read that it rehashed all of the worst stereotypes we've come to expect from video games set in Latin America.