
Collaborating Authors

 cooper


On Computing Probabilistic Explanations for Decision Trees

Neural Information Processing Systems

For this reason, the community has started to study their probabilistic counterpart, in which one requires that the probability of T(z) = T(x) be at least some value δ ∈ (0,1], where z is a random instance that is compatible with y.
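The condition above can be illustrated with a small sketch. It is not the paper's algorithm: it assumes binary features, a uniform distribution over the completions z, and that y is a partial instance fixing a subset of the features of x. The toy tree `TREE`, its tuple encoding, and the function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy tree encoding (an assumption, not from the paper):
# an internal node is (feature_index, subtree_if_0, subtree_if_1); a leaf is a bool.
# This tree computes x[0] OR x[1]; feature 2 is irrelevant.
TREE = (0, (1, False, True), True)


def classify(tree, x):
    """Walk from the root to a leaf and return the leaf's label T(x)."""
    while not isinstance(tree, bool):
        feature, low, high = tree
        tree = high if x[feature] else low
    return tree


def explanation_probability(tree, x, fixed):
    """Probability that T(z) = T(x) when z agrees with x on the features in
    `fixed` and the remaining features are drawn uniformly at random."""
    free = [i for i in range(len(x)) if i not in fixed]
    target = classify(tree, x)
    hits = 0
    for bits in product([0, 1], repeat=len(free)):
        z = list(x)
        for i, b in zip(free, bits):
            z[i] = b
        hits += classify(tree, z) == target
    return hits / 2 ** len(free)


def is_delta_explanation(tree, x, fixed, delta):
    """Check the threshold condition: probability at least delta, delta in (0, 1]."""
    return explanation_probability(tree, x, fixed) >= delta


x = (1, 0, 1)
print(explanation_probability(TREE, x, {0}))  # fixing feature 0 alone
print(explanation_probability(TREE, x, {1}))  # fixing feature 1 alone
```

The enumeration is exponential in the number of free features, so it only serves to make the definition concrete on small inputs.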


Listening to "The Joe Rogan Experience"

The New Yorker

How a gift for shooting the shit turned into an online empire--and a political force. Trust in American mass media has plummeted; more than three thousand newspapers have disappeared in the past two decades, and many people get their news from social platforms. In this chaotic media multiverse, Rogan has emerged as a figure of singular influence. For a long time, I stayed up through the night listening to tall-tale tellers, U.F.O. I could not get enough of it. I was a fairly ordinary kid, Jersey-born, but the house I lived in was shadowed by illness. My mother had been diagnosed with a debilitating neurological disease when she was in her early thirties. Every year, she got worse. During the day, I wanted nothing more than to please my mother, do well in school, lighten her load. At night, I wanted only to climb into the shelter of my bed and turn on the radio. I was hungry for elsewhere, for other lives--for what was being said down the street, over the bridge, beyond the horizon. On clear nights, the signal was strong. You could hear the country expressing itself incessantly: everyone was phoning in, suggesting three-way trades, bitching about the mayor, speaking in tongues, raging, joking, climbing out on a ledge and threatening to jump. When I wanted a few hours of sleep before school, I tuned in to a ballgame on the West Coast. The staticky murmur of the crowd in Anaheim or Chavez Ravine was a sure slide to oblivion. Mostly, though, I wanted nothing to do with sleep. Mostly, I was tuned in, midnight to five-thirty, to "The Long John Nebel Show."


Is Cognitive Dissonance Actually a Thing?

The New Yorker

In 1934, an 8.0-magnitude earthquake hit eastern India, killing thousands and devastating several cities. Curiously, in areas that were spared the worst destruction, stories soon spread that an even bigger disaster was on its way. Leon Festinger, a young American psychologist at the University of Minnesota, read about these rumors in the early nineteen-fifties and was puzzled. Festinger didn't think people would voluntarily adopt anxiety-inducing ideas. Instead, he reasoned, the rumors could better be described as "anxiety justifying." Some had felt the earth shake and were overwhelmed with fear. When the outcome--they were spared--didn't match their emotions, they embraced predictions that affirmed their fright.


Can one big meal really make you gain weight?

Popular Science

The post-holiday scale spike is temporary--unless the leftovers get involved. It's hard not to indulge during the holidays, but can the occasional big meal really harm our long-term health? Those of us brave enough to step onto the scale the day after Thanksgiving or Christmas can sometimes see an increase of as much as five to 10 pounds.


Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable performance in reasoning tasks, where reinforcement learning (RL) serves as a key algorithm for enhancing their reasoning capabilities. Currently, there are two mainstream reward paradigms: model-based rewards and rule-based rewards. However, both approaches suffer from limitations: rule-based rewards lack robustness, while model-based rewards are vulnerable to reward hacking. To address these issues, we propose Cooper (Co-optimizing Policy Model and Reward Model), an RL framework that jointly optimizes both the policy model and the reward model. Cooper leverages the high precision of rule-based rewards when identifying correct responses, and dynamically constructs and selects positive-negative sample pairs for continued training of the reward model. This design enhances robustness and mitigates the risk of reward hacking. To further support Cooper, we introduce a hybrid annotation strategy that efficiently and accurately generates training data for the reward model. We also propose a reference-based reward modeling paradigm, where the reward model takes a reference answer as input. Based on this design, we train a reward model named VerifyRM, which achieves higher accuracy on VerifyBench compared to other models of the same size. We conduct reinforcement learning using both VerifyRM and Cooper. Our experiments show that Cooper not only alleviates reward hacking but also improves end-to-end RL performance, for instance, achieving a 0.54% gain in average accuracy on Qwen2.5-1.5B-Instruct. Our findings demonstrate that dynamically updating the reward model is an effective way to combat reward hacking, providing a reference for better integrating reward models into RL.


Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning

arXiv.org Artificial Intelligence

The advancement of Large Language Models (LLMs) has spurred significant interest in Role-Playing Agents (RPAs) for applications such as emotional companionship and virtual interaction. However, recent RPAs are often built on explicit dialogue data, lacking deep, human-like internal thought processes, resulting in superficial knowledge and style expression. While Large Reasoning Models (LRMs) can be employed to simulate character thought, their direct application is hindered by attention diversion (i.e., RPAs forget their role) and style drift (i.e., overly formal and rigid reasoning rather than character-consistent reasoning). To address these challenges, this paper introduces a novel Role-Aware Reasoning (RAR) method, which consists of two important stages: Role Identity Activation (RIA) and Reasoning Style Optimization (RSO). RIA explicitly guides the model with character profiles during reasoning to counteract attention diversion, and then RSO aligns reasoning style with the character and scene via LRM distillation to mitigate style drift. Extensive experiments demonstrate that the proposed RAR significantly enhances the performance of RPAs by effectively addressing attention diversion and style drift.


Decentralized Signaling Mechanisms

arXiv.org Artificial Intelligence

We study a system composed of multiple distinct service locations that aims to convince customers to join the system by providing information to customers. We cast the system's information design problem in the framework of Bayesian persuasion and describe centralized and decentralized signaling. We provide efficient methods for computing the system's optimal centralized and decentralized signaling mechanisms and derive a performance guarantee for decentralized signaling when the locations' states are independent. The guarantee states that the probability that a customer joins under optimal decentralized signaling is bounded below by the product of a strictly positive constant and the probability that a customer joins under optimal centralized signaling. The constant depends only on the number of service locations. We provide an example that shows that the constant cannot be improved. We consider an extension to more-general objectives for the system and establish that the same guarantee continues to hold. We also extend our analysis to systems where the locations' states are correlated, and again derive a performance guarantee for decentralized signaling in that setting. For the correlated setting, we prove that the guarantee's asymptotic dependence upon the number of locations cannot be substantially improved. A comparison of our guarantees for independent locations and for correlated locations reveals the influence of dependence on the performance of decentralized signaling.


Cooper: A Library for Constrained Optimization in Deep Learning

arXiv.org Artificial Intelligence

Cooper is an open-source package for solving constrained optimization problems involving deep learning models. Cooper implements several Lagrangian-based first-order update schemes, making it easy to combine constrained optimization algorithms with high-level features of PyTorch such as automatic differentiation, and specialized deep learning architectures and optimizers. Although Cooper is specifically designed for deep learning applications where gradients are estimated based on mini-batches, it is suitable for general non-convex continuous constrained optimization. Cooper's source code is available at https://github.com/cooper-org/cooper.
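To make "Lagrangian-based first-order update schemes" concrete, here is a minimal sketch of simultaneous gradient descent-ascent on a toy problem. It deliberately does not use Cooper's API or PyTorch; the problem (minimize x^2 subject to 1 - x <= 0), the step size, and the function name are assumptions chosen for illustration.

```python
def gradient_descent_ascent(steps=500, lr=0.1):
    """Simultaneous gradient descent-ascent on the Lagrangian
    L(x, lam) = x**2 + lam * (1 - x) for the toy problem
    minimize x**2 subject to 1 - x <= 0 (that is, x >= 1)."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * x - lam   # dL/dx
        grad_lam = 1.0 - x       # dL/dlam, equal to the constraint violation
        # Primal variable descends, multiplier ascends; both use the old iterates.
        x, lam = x - lr * grad_x, max(0.0, lam + lr * grad_lam)
    return x, lam


x_opt, lam_opt = gradient_descent_ascent()
print(x_opt, lam_opt)  # approaches the constrained optimum x = 1 with multiplier 2
```

The multiplier is projected onto the non-negative reals after each ascent step, which is the usual treatment for inequality constraints. In a deep learning setting the exact gradients here would be replaced by mini-batch estimates.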


Nikki Glaser tells Gwyneth Paltrow she tried to hook up with actress' ex Ben Affleck

FOX News

Celebrity matchmaker Alessandra Conti told Fox News Digital that Garner and Affleck are incredible co-parents. Gwyneth Paltrow and Nikki Glaser are spilling the tea when it comes to their connections to Ben Affleck. During a recent episode of Paltrow's "Goop Podcast," the duo openly discussed Glaser's past history of using Raya, an exclusive dating app. While discussing her 2025 Golden Globe Awards opening monologue in which she joked about Affleck yelling the titles of movies "after he orgasms," Glaser said, "When I used to be on Raya and [Ben] would come across, [I would give him a] very concentrated check mark 'yes' and, like, never [got] it back." Nikki Glaser told Gwyneth Paltrow she once tried to hook up with the actress' ex, Ben Affleck.


Taylor Sheridan's Newest Hit Is the Perfect Show for Our Times

Slate

Taylor Sheridan, the most overextended man in television, has done it again. Landman, according to the internal metrics at Paramount, is the most watched original show the streamer has ever had. (Remember, Yellowstone proper is on Peacock.) The West Texas–set story, which stars Billy Bob Thornton as Tommy Norris, an all-purpose problem solver for a fictional oil company owned by Monty Miller (Jon Hamm), has also developed a bit more of a critical halo than Sheridan's other TV ventures, popping up on best-of-2024 lists, edging into mainstream discourse via podcasts that typically cover more-prestige fare, and retaining a score of 80 percent on Rotten Tomatoes. And the week before Landman wrapped up, this past Sunday night, its lead actor, Billy Bob Thornton, attended the Golden Globes as a nominee for his role in the series.