poker player
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
Gao, Heyang; Sun, Zexu; Min, Erxue; Cai, Hengyi; Wang, Shuaiqiang; Yin, Dawei; Chen, Xu
Large Language Models (LLMs) as autonomous agents are increasingly tasked with solving complex, long-horizon problems. Aligning these agents via preference-based offline methods like Direct Preference Optimization (DPO) is a promising direction, yet it faces a critical granularity mismatch. Trajectory-level DPO provides a signal that is too coarse for precise credit assignment, while step-level DPO is often too myopic to capture the value of multi-step behaviors. To resolve this challenge, we introduce Hierarchical Preference Learning (HPL), a hierarchical framework that optimizes LLM agents by leveraging preference signals at multiple, synergistic granularities. While HPL incorporates trajectory- and step-level DPO for global and local policy stability, its core innovation lies in group-level preference optimization guided by a dual-layer curriculum. Our approach first decomposes expert trajectories into semantically coherent action groups and then generates contrasting suboptimal groups to enable preference learning at a fine-grained, sub-task level. Then, instead of treating all preference pairs equally, HPL introduces a curriculum scheduler that organizes the learning process from simple to complex. This curriculum is structured along two axes: the group length, representing sub-task complexity, and the sample difficulty, defined by the reward gap between preferred and dispreferred action groups. Experiments on three challenging agent benchmarks show that HPL outperforms existing state-of-the-art methods. Our analyses demonstrate that the hierarchical DPO loss effectively integrates preference signals across multiple granularities, while the dual-layer curriculum is crucial for enabling the agent to solve a wide range of tasks, from simple behaviors to complex multi-step sequences.
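The dual-layer curriculum described in the abstract can be pictured with a minimal sketch. The abstract does not give the actual schedule, so everything here is an assumption made for illustration: the `GroupPair` fields, the `curriculum_order` helper, the example actions, and the rule that shorter groups come first and, within one length, a larger reward gap counts as easier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupPair:
    """One preference pair over action groups (hypothetical structure)."""
    preferred: tuple[str, ...]     # actions in the preferred group
    dispreferred: tuple[str, ...]  # actions in the contrasting suboptimal group
    reward_gap: float              # reward(preferred) - reward(dispreferred)

    @property
    def length(self) -> int:
        # Group length stands in for sub-task complexity.
        return len(self.preferred)


def curriculum_order(pairs):
    """Order pairs from simple to complex.

    Assumption: sort by group length first, then by descending reward gap,
    treating a larger gap as an easier sample.
    """
    return sorted(pairs, key=lambda p: (p.length, -p.reward_gap))


# Made-up example pairs, not data from the paper.
pairs = [
    GroupPair(("open drawer", "take key", "unlock door"),
              ("open drawer", "close drawer", "look"), 0.2),
    GroupPair(("go to desk",), ("go to bed",), 0.9),
    GroupPair(("go to desk",), ("look",), 0.3),
    GroupPair(("take mug", "fill mug"), ("take mug", "drop mug"), 0.6),
]

ordered = curriculum_order(pairs)
for p in ordered:
    print(p.length, p.reward_gap, p.preferred)
```

A training loop could then feed `ordered` to the group-level preference loss in stages; how HPL actually stages the two axes is not specified in the abstract.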
Nate Silver's New Book, "On the Edge," Reviewed
Keeping a poker face had never struck me as much of a feat--until I had to keep one. My pulse quickened, my cheeks felt flushed, and my eyes were desperate to dart and size up the pot. What had been a mediocre hand was transformed, after the flop came down, into something spectacular: every card from seven to jack--a straight. All that remained was to play it cool and build up my cash prize. The bets started small, and then grew. The next two cards looked innocuous enough.
- North America > United States > California (0.05)
- North America > United States > Texas (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- Summary/Review (1.00)
- Instructional Material > Course Syllabus & Notes (0.47)
- Leisure & Entertainment > Games (1.00)
- Government (0.94)
On the Edge by Nate Silver review – the art of risk-taking
Nothing is more interesting to poker players and less interesting to everyone else than a breathless recounting of who bet how much with a jack and six of clubs in some game years ago. There's an awful lot of that kind of thing in this book, which celebrates poker players as paradigmatic citizens of a global intellectual community it calls "the River", which also counts among its inhabitants venture capitalists, crypto traders, fashionable philosophers and mild-mannered statisticians. One such statistician, Nate Silver himself, came to public prominence as a data-driven analyst of political polls at his website FiveThirtyEight, which predicted the results of US elections in 2008 and 2012 with seemingly uncanny accuracy. But before that he was a poker player, making money especially in the nascent internet-casino business, until Congress banned online poker in 2006. That, he has said, was his political awakening.
- Banking & Finance (0.90)
- Leisure & Entertainment > Games (0.77)
- Government > Regional Government > North America Government > United States Government (0.35)
IBM finally finds someone willing to buy Watson
In brief: IBM has offloaded healthcare data and analytics assets from its Watson Health business, with private equity firm Francisco Partners handing over around $1bn for the privilege. The takeover "is a clear next step as IBM becomes even more focused on our platform-based hybrid cloud and AI strategy," Tom Rosamilia, senior vice president, IBM Software, told newswire Bloomberg. "IBM remains committed to Watson, our broader AI business, and to the clients and partners we support in healthcare IT." Launched in 2015, IBM Watson Health hasn't been able to turn a profit despite the company spending $4bn in acquisitions to grow the business and its capabilities. IBM has tried to whittle down its Watson Health division for a while, after struggling to sign hospitals as clients. Professional poker players are increasingly consulting specialized poker software programs to boost their chances of winning, but some believe the software has made the game less fun and encourages cheating online.
- North America > United States > Texas (0.06)
- North America > United States > Illinois > Cook County > Chicago (0.06)
- Information Technology (1.00)
- Leisure & Entertainment > Games (0.42)
The AI Threat: Winner-Takes-All
They like to play until the winner takes all of the winnings, which ultimately includes all my money. Imagine running a business where your main competitor has the dominant market share, and you are in second place. You have been struggling for years to overtake your primary competitor, but they have advantages in product, in costs, and in marketing that you can't match. You are improving, but your competitor is improving at the same rate. You are stuck in a perpetual second place.
The Deck Is Not Rigged: Poker and the Limits of AI
WHAT I LEFT OUT is a recurring feature in which book authors are invited to share anecdotes and narratives that, for whatever reason, did not make it into their final manuscripts. In this installment, Maria Konnikova shares a story that was left out of "The Biggest Bluff: How I Learned to Pay Attention, Master Myself, and Win" (Penguin Press). Tuomas Sandholm, a computer scientist at Carnegie Mellon University, is not a poker player--or much of a poker fan, in fact--but he is fascinated by the game for much the same reason as the great game theorist John von Neumann before him. Von Neumann, who died in 1957, viewed poker as the perfect model for human decision making, for finding the balance between skill and chance that accompanies our every choice. He saw poker as the ultimate strategic challenge, combining as it does not just the mathematical elements of a game like chess but the uniquely human, psychological angles that are more difficult to model precisely--a view shared years later by Sandholm in his research with artificial intelligence. "Poker is the main benchmark and challenge program for games of imperfect information," Sandholm told me on a warm spring afternoon in 2018, when we met in his offices in Pittsburgh. The game, it turns out, has become the gold standard for developing artificial intelligence.
Facebook AI Research Is A Game-Changer
For decades, computer programmers have been trying to beat multiplayer games by finding reliable patterns in data. Researchers at Facebook and Carnegie Mellon University published a paper in the journal Science in July that flips this switch. Their software embraces randomness, and it is reliably beating humans at games.
- North America > United States > Texas (0.05)
- North America > United States > New York (0.05)
- North America > United States > California > Orange County > Irvine (0.05)
- Leisure & Entertainment > Games (1.00)
- Information Technology (1.00)
Artificial Intelligence Masters The Game of Poker – What Does That Mean For Humans?
While AI had some success at beating humans at other games such as chess and Go (games that follow predefined rules and aren't random), winning at poker proved to be more challenging because it requires strategy, intuition, and reasoning based on hidden information. Despite the challenges, artificial intelligence can now play--and win--poker. Artificial intelligence systems including DeepStack and Libratus paved the way for Pluribus, the AI that beat five other players in six-player Texas Hold'em, the most popular version of poker. This feat goes beyond games. This achievement means that artificial intelligence can now expand to help solve some of the world's most challenging issues.
- North America > United States > Texas (0.32)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.05)