 esper



You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments

Neural Information Processing Systems

Recently, methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline RL tasks. However, simply conditioning a probabilistic model on a desired return and taking the predicted action can fail dramatically in stochastic environments since trajectories that result in a return may have only achieved that return due to luck. In this work, we describe the limitations of RvS approaches in stochastic environments and propose a solution. Rather than simply conditioning on returns, as is standard practice, our proposed method, ESPER, conditions on learned average returns which are independent from environment stochasticity. Doing so allows ESPER to achieve strong alignment between target return and expected performance in real environments. We demonstrate this in several challenging stochastic offline-RL tasks including the challenging puzzle game 2048, and Connect Four playing against a stochastic opponent. In all tested domains, ESPER achieves significantly better alignment between the target return and achieved return than simply conditioning on returns. ESPER also achieves higher maximum performance than even the value-based baselines.



You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments

Paster, Keiran, McIlraith, Sheila, Ba, Jimmy

arXiv.org Artificial Intelligence

Recently, methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline RL tasks. However, simply conditioning a probabilistic model on a desired return and taking the predicted action can fail dramatically in stochastic environments since trajectories that result in a return may have only achieved that return due to luck. In this work, we describe the limitations of RvS approaches in stochastic environments and propose a solution. Rather than simply conditioning on the return of a single trajectory as is standard practice, our proposed method, ESPER, learns to cluster trajectories and conditions on average cluster returns, which are independent from environment stochasticity. Doing so allows ESPER to achieve strong alignment between target return and expected performance in real environments. We demonstrate this in several challenging stochastic offline-RL tasks including the challenging puzzle game 2048, and Connect Four playing against a stochastic opponent. In all tested domains, ESPER achieves significantly better alignment between the target return and achieved return than simply conditioning on returns. ESPER also achieves higher maximum performance than even the value-based baselines.
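The failure mode described in the abstract can be sketched with a toy one-step problem. This is only an illustration, not the authors' implementation: the environment, the dataset, and the function names `rvs_policy` and `relabel_with_cluster_mean` are hypothetical, and the clusters are assumed to coincide with the chosen action, whereas ESPER learns its clustering.

```python
from collections import defaultdict

# Hypothetical one-step environment (an assumption for illustration):
# "safe" always pays 1; "gamble" pays +5 or -15 with equal probability.
dataset = [("safe", 1)] * 4 + [("gamble", 5)] * 2 + [("gamble", -15)] * 2


def rvs_policy(data, target):
    """Return the most frequent action among trajectories whose label equals target."""
    counts = defaultdict(int)
    for action, label in data:
        if label == target:
            counts[action] += 1
    return max(counts, key=counts.get) if counts else None


def relabel_with_cluster_mean(data):
    """Replace each return with the mean return of its cluster.

    Assumption: the cluster of a trajectory is simply its action.
    """
    totals, sizes = defaultdict(float), defaultdict(int)
    for action, ret in data:
        totals[action] += ret
        sizes[action] += 1
    return [(action, totals[action] / sizes[action]) for action, _ in data]


def expected_return(data, action):
    """Empirical mean return of an action in the dataset."""
    returns = [ret for a, ret in data if a == action]
    return sum(returns) / len(returns)


# Conditioning on the raw return 5 selects "gamble", because only lucky
# gambles ever reached 5, yet gambling loses on average.
naive_action = rvs_policy(dataset, target=5)
naive_value = expected_return(dataset, naive_action)

# Conditioning on cluster-average returns: the target 1.0 selects "safe",
# whose expected return matches the target.
relabeled = relabel_with_cluster_mean(dataset)
averaged_action = rvs_policy(relabeled, target=1.0)
averaged_value = expected_return(dataset, averaged_action)
```

Under these assumptions, the raw-return policy asks for 5 and gets an expected return of -5, while the averaged-return policy asks for 1 and gets 1, which is the alignment between target and expected return that the abstract refers to.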


Eric Schmidt: A Conflict of Interest

#artificialintelligence

Ethics and Eric Schmidt are rare bedfellows. The former Google/Alphabet CEO/Chairman exudes a sense of predatory self-interest, always making the point that what he wants aligns with what is supposedly good for the United States. He has splashed money on numerous projects, including such artificial intelligence outfits as Rebellion Defense, all the time maintaining uncomfortably close ties to the government advisory circuit. For years, he has been hectoring the Department of Defense to uncritically embrace AI, in other words, machine-learning technology. "You absolutely suck at machine learning," Schmidt boldly told General Raymond Thomas, head of US Special Operations Command, in July 2016.


NPS' Data Science, AI Certificate Programs Support DOD Workforce Development

#artificialintelligence

On Sept. 9, during the DOD's semi-annual Artificial Intelligence Symposium and Exposition, Secretary of Defense Mark Esper affirmed that the Joint Artificial Intelligence Center (JAIC) in partnership with the Naval Postgraduate School (NPS) and Defense Acquisition University will collaboratively develop an intensive six-week pilot course delivered to more than 80 defense acquisition professionals of all ranks and grades. "These trainees will learn how to apply AI and data science skills to our operations," Esper said in his remarks. "With the support of Congress, the Department plans to request additional funding for the services to grow this effort over time and deliver an AI-ready workforce to the American people." Just as the university's highly-regarded Harnessing Artificial Intelligence video course paved the way for its support of the pilot course, NPS is well positioned to support Esper's declaration for further workforce development through its existing Data Science Certificate, and an upcoming similar certificate program in Artificial Intelligence. In the ongoing effort to expand the Navy's knowledge and expertise in the fields of data science and artificial intelligence, NPS faculty have developed courses that enable students to quickly gain insights in these critical disciplines.


Eyeing China, Pentagon plans larger and 'more lethal' navy

The Japan Times

Washington – U.S. Secretary of Defense Mark Esper announced Wednesday an ambitious plan to expand the U.S. Navy with a range of unmanned and autonomous ships, submarines and aircraft to confront the growing maritime challenge from China. The Pentagon chief said a sweeping review of U.S. naval power dubbed "Future Forward" had laid out a "game-changer" plan that would expand the U.S. sea fleet to more than 355 ships, from the current 293. The plan, which requires adding tens of billions of dollars to the U.S. Navy's budget between now and 2045, is aimed at maintaining superiority over Chinese naval forces, seen as the primary threat to the United States. "The future fleet will be more balanced in its ability to deliver lethal effects from the air, from the sea, and from under the sea," Esper said in a speech at the Rand Corp. in California. The expansion will add "more and smaller" surface ships; more submarines; surface and subsurface vessels that are optionally manned, unmanned and autonomous; and a broad range of unmanned carrier-based aircraft.


Pentagon to pit AI against human pilots in live fighter trials

#artificialintelligence

U.S. Defense Secretary Mark Esper announced Wednesday that the Pentagon intends to conduct live trials pitting tactical aircraft controlled by artificial intelligence against human pilots in 2024. The announcement comes three weeks after an AI algorithm defeated a human pilot in a simulated dogfight between F-16s, something Esper described as an example of the "tectonic impact of machine learning" for the Defense Department's future. "The AI agent's resounding victory demonstrated the ability of advanced algorithms to outperform humans in virtual dogfights. These simulations will culminate in a real-world competition involving full-scale tactical aircraft in 2024," Esper said in prepared remarks delivered to the department's Artificial Intelligence Symposium. The Aug. 20 test was the finale of the Pentagon research agency's AI air combat competition.


AI to take on human pilots in real-world fighter aircraft trials

#artificialintelligence

AI will face off against human pilots in real-world fighter aircraft by 2024, Secretary of Defense Mark Esper revealed on Wednesday. The Pentagon announced the plan a month after an AI system demolished an Air Force pilot in a virtual dogfight. An algorithm developed by defense contractor Heron Systems swept a best-of-five aerial duel versus an F-16 pilot wearing a VR helmet. The new trials will test how the AI's capabilities transfer to the real world, Esper explained on Wednesday at the Pentagon's first AI Symposium: "The AI agent's resounding victory demonstrated the ability of advanced algorithms to out-perform humans in virtual dogfights." "To be clear, AI's role in our lethality is to support human decision-makers, not replace them."