cooper
Can one big meal really make you gain weight?
The post-holiday scale spike is temporary, unless the leftovers get involved. It's hard not to indulge during the holidays, but can the occasional big meal really harm our long-term health? For those of us brave enough to step onto the scale the day after Thanksgiving or Christmas, the number can sometimes show an increase of as much as five to 10 pounds.
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
- Health & Medicine > Therapeutic Area > Endocrinology (0.34)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.31)
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
Hong, Haitao, Yan, Yuchen, Wu, Xingyu, Hou, Guiyang, Zhang, Wenqi, Lu, Weiming, Shen, Yongliang, Xiao, Jun
Large language models (LLMs) have demonstrated remarkable performance in reasoning tasks, where reinforcement learning (RL) serves as a key algorithm for enhancing their reasoning capabilities. Currently, there are two mainstream reward paradigms: model-based rewards and rule-based rewards. However, both approaches suffer from limitations: rule-based rewards lack robustness, while model-based rewards are vulnerable to reward hacking. To address these issues, we propose Cooper (Co-optimizing Policy Model and Reward Model), an RL framework that jointly optimizes both the policy model and the reward model. Cooper leverages the high precision of rule-based rewards when identifying correct responses, and dynamically constructs and selects positive-negative sample pairs for continued training of the reward model. This design enhances robustness and mitigates the risk of reward hacking. To further support Cooper, we introduce a hybrid annotation strategy that efficiently and accurately generates training data for the reward model. We also propose a reference-based reward modeling paradigm, where the reward model takes a reference answer as input. Based on this design, we train a reward model named VerifyRM, which achieves higher accuracy on VerifyBench than other models of the same size. We conduct reinforcement learning using both VerifyRM and Cooper. Our experiments show that Cooper not only alleviates reward hacking but also improves end-to-end RL performance, for instance, achieving a 0.54% gain in average accuracy on Qwen2.5-1.5B-Instruct. Our findings demonstrate that dynamically updating the reward model is an effective way to combat reward hacking, providing a reference for better integrating reward models into RL.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
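The pair-construction step described in the Cooper abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Three pieces are hypothetical choices:

- `rule_verifier` uses a normalized exact match.
- The `make_negative` callback stands in for however the paper actually derives an incorrect response.
- `pairwise_loss` uses the Bradley-Terry form.

The sketch only encodes the idea the abstract states. Trust the rule-based reward when it says a response is correct, and use that response as the positive side of a pair for continued reward-model training.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Pair:
    question: str
    reference: str  # reference answer, also fed to a reference-based reward model
    positive: str   # response accepted by the (assumed) rule-based verifier
    negative: str   # response constructed to be incorrect


def rule_verifier(response: str, reference: str) -> bool:
    """Hypothetical high-precision rule: normalized exact match with the reference."""
    return response.strip().lower() == reference.strip().lower()


def build_pair(question: str, reference: str, responses: List[str],
               make_negative: Callable[[str], str],
               rng: random.Random) -> Optional[Pair]:
    """Pick a positive via the rule verifier and derive a negative from it.

    Only rule-accepted responses are trusted: the rule is assumed to be precise
    when it says "correct", but it may reject correct answers it cannot parse,
    so rule-rejected responses are not used as negatives here.
    """
    positives = [r for r in responses if rule_verifier(r, reference)]
    if not positives:
        return None  # no trustworthy positive in this batch; skip
    pos = rng.choice(positives)
    return Pair(question, reference, pos, make_negative(pos))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_loss(score_pos: float, score_neg: float) -> float:
    """Bradley-Terry style loss: -log sigmoid(score_pos - score_neg)."""
    return softplus(-(score_pos - score_neg))
```

For example, `build_pair("2+3?", "5", ["5", "six"], lambda r: r + "1", random.Random(0))` returns a pair whose positive is `"5"` and whose negative is `"51"`. Minimizing `pairwise_loss` on such pairs pushes the reward model to score the verified response above the constructed wrong one.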
Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
Tang, Yihong, Chen, Kehai, Yang, Muyun, Niu, Zhengyu, Li, Jing, Zhao, Tiejun, Zhang, Min
The advancement of Large Language Models (LLMs) has spurred significant interest in Role-Playing Agents (RPAs) for applications such as emotional companionship and virtual interaction. However, recent RPAs are often built on explicit dialogue data and lack deep, human-like internal thought processes, which results in superficial knowledge and style expression. While Large Reasoning Models (LRMs) can be employed to simulate character thought, their direct application is hindered by attention diversion (i.e., RPAs forget their role) and style drift (i.e., overly formal and rigid reasoning rather than character-consistent reasoning). To address these challenges, this paper introduces a novel Role-Aware Reasoning (RAR) method, which consists of two stages: Role Identity Activation (RIA) and Reasoning Style Optimization (RSO). RIA explicitly guides the model with character profiles during reasoning to counteract attention diversion, and RSO then aligns the reasoning style with the character and scene via LRM distillation to mitigate style drift. Extensive experiments demonstrate that the proposed RAR significantly enhances the performance of RPAs by effectively addressing attention diversion and style drift.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
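Here is a hypothetical prompt builder that illustrates the Role Identity Activation idea. It restates the character profile at the start of the model's reasoning segment, so the "thinking" begins in character. The `CharacterProfile` fields, the `<think>` delimiter, and the template wording are all assumptions for illustration. This is not the paper's actual RIA guidance, and the RSO distillation stage is not modeled at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharacterProfile:
    name: str
    traits: List[str]
    speaking_style: str


def build_role_aware_prompt(profile: CharacterProfile, scene: str, user_turn: str) -> str:
    """Hypothetical RIA-style prompt.

    The profile appears in the system text and again at the opening of the
    reasoning segment, so reasoning starts in the character's voice rather
    than drifting to a generic assistant voice.
    """
    traits = ", ".join(profile.traits)
    return (
        f"System: You are {profile.name}. Traits: {traits}. "
        f"Speaking style: {profile.speaking_style}.\n"
        f"Scene: {scene}\n"
        f"User: {user_turn}\n"
        "<think>\n"  # assumed delimiter for the reasoning segment
        f"I am {profile.name}, and I am {traits}. "
        f"I will reason about this the way {profile.name} would, "
        f"in a {profile.speaking_style} voice."
    )
```

The design choice is simply to repeat the identity inside the reasoning segment itself. A profile that only appears in the system text is what the abstract suggests the model tends to forget ("attention diversion").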
Decentralized Signaling Mechanisms
Boroujeni, Niloufar Mirzavand, Iyer, Krishnamurthy, Cooper, William L.
We study a system composed of multiple distinct service locations that aims to convince customers to join the system by providing information to customers. We cast the system's information design problem in the framework of Bayesian persuasion and describe centralized and decentralized signaling. We provide efficient methods for computing the system's optimal centralized and decentralized signaling mechanisms and derive a performance guarantee for decentralized signaling when the locations' states are independent. The guarantee states that the probability that a customer joins under optimal decentralized signaling is bounded below by the product of a strictly positive constant and the probability that a customer joins under optimal centralized signaling. The constant depends only on the number of service locations. We provide an example that shows that the constant cannot be improved. We consider an extension to more-general objectives for the system and establish that the same guarantee continues to hold. We also extend our analysis to systems where the locations' states are correlated, and again derive a performance guarantee for decentralized signaling in that setting. For the correlated setting, we prove that the guarantee's asymptotic dependence upon the number of locations cannot be substantially improved. A comparison of our guarantees for independent locations and for correlated locations reveals the influence of dependence on the performance of decentralized signaling.
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- North America > United States > Minnesota (0.04)
- North America > United States > New York > New York County > New York City (0.04)
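The guarantee above compares the probability that a customer joins. For intuition about that quantity, consider the textbook single-location, binary-state case of Bayesian persuasion. This is a simplifying assumption, not the paper's multi-location model. The setup is:

- The location is good with probability `prior`.
- A customer joins only if the posterior probability of "good" is at least `threshold`.
- The system commits to a signal in advance.

The sender-optimal signal always recommends joining in the good state. In the bad state it recommends joining just often enough that the posterior after a recommendation equals the threshold. This gives a join probability of min(1, prior / threshold).

```python
from typing import Tuple


def optimal_signal(prior: float, threshold: float) -> Tuple[float, float]:
    """Return (P(recommend join | good), P(recommend join | bad)) for the
    sender-optimal signal in an assumed binary-state persuasion problem."""
    if not (0.0 <= prior <= 1.0 and 0.0 < threshold <= 1.0):
        raise ValueError("need 0 <= prior <= 1 and 0 < threshold <= 1")
    if prior >= threshold:
        return 1.0, 1.0  # the prior alone persuades; always recommend joining
    # Pick q so that prior / (prior + (1 - prior) * q) == threshold.
    q_bad = prior * (1.0 - threshold) / (threshold * (1.0 - prior))
    return 1.0, q_bad


def join_probability(prior: float, threshold: float) -> float:
    """Probability that a 'join' recommendation is sent (and hence followed)."""
    p_good, p_bad = optimal_signal(prior, threshold)
    return prior * p_good + (1.0 - prior) * p_bad


def posterior_after_join_signal(prior: float, threshold: float) -> float:
    """P(good | 'join' recommendation); equals threshold when prior < threshold."""
    p_good, _ = optimal_signal(prior, threshold)
    return prior * p_good / join_probability(prior, threshold)
```

With `prior = 0.2` and `threshold = 0.5`, the join probability is 0.4. That is double the 0.2 achieved by a fully informative signal, which recommends joining only in the good state.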
Cooper: A Library for Constrained Optimization in Deep Learning
Gallego-Posada, Jose, Ramirez, Juan, Hashemizadeh, Meraj, Lacoste-Julien, Simon
Cooper is an open-source package for solving constrained optimization problems involving deep learning models. Cooper implements several Lagrangian-based first-order update schemes, making it easy to combine constrained optimization algorithms with high-level features of PyTorch such as automatic differentiation, and specialized deep learning architectures and optimizers. Although Cooper is specifically designed for deep learning applications where gradients are estimated based on mini-batches, it is suitable for general non-convex continuous constrained optimization. Cooper's source code is available at https://github.com/cooper-org/cooper.
- North America > Canada > Quebec (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
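The Lagrangian-based first-order schemes mentioned in the Cooper library description can be illustrated without the library itself. The sketch below is plain Python, not Cooper's API. It runs simultaneous projected gradient descent-ascent on the Lagrangian L(x, λ) = f(x) + λ g(x) for a toy problem: minimize (x - 3)^2 subject to x ≤ 1. It descends in x and ascends in λ while keeping λ ≥ 0. The toy problem, step size, and iteration count are arbitrary choices for this example.

```python
from typing import Tuple


def f_grad(x: float) -> float:
    """Gradient of the objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def g(x: float) -> float:
    """Constraint function; x is feasible iff g(x) <= 0, i.e. x <= 1."""
    return x - 1.0


def g_grad(x: float) -> float:
    """Gradient of g(x) = x - 1."""
    return 1.0


def projected_gda(x0: float = 0.0, lam0: float = 0.0,
                  lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Simultaneous gradient descent on x and projected gradient ascent on lam
    for L(x, lam) = f(x) + lam * g(x)."""
    x, lam = x0, lam0
    for _ in range(steps):
        grad_x = f_grad(x) + lam * g_grad(x)  # dL/dx
        grad_lam = g(x)                       # dL/dlam = constraint value
        # Both updates use the old iterates (simultaneous); lam is clipped at 0.
        x, lam = x - lr * grad_x, max(0.0, lam + lr * grad_lam)
    return x, lam


x_star, lam_star = projected_gda()
```

The iterates approach x = 1 and λ = 4. These satisfy the KKT stationarity condition 2(x - 3) + λ = 0 with the constraint active. In a deep learning setting, the scalar gradients would come from automatic differentiation on mini-batches instead of closed-form expressions.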
Nikki Glaser tells Gwyneth Paltrow she tried to hook up with actress' ex Ben Affleck
Gwyneth Paltrow and Nikki Glaser are spilling the tea when it comes to their connections to Ben Affleck. During a recent episode of Paltrow's "Goop Podcast," the duo openly discussed Glaser's history of using Raya, an exclusive dating app. While discussing her 2025 Golden Globe Awards opening monologue, in which she joked about Affleck yelling the titles of movies "after he orgasms," Glaser said, "When I used to be on Raya and [Ben] would come across, [I would give him a] very concentrated check mark 'yes' and, like, never [got] it back."
- Media > Television (0.57)
- Media > Film (0.57)
Taylor Sheridan's Newest Hit Is the Perfect Show for Our Times
Taylor Sheridan, the most overextended man in television, has done it again. Landman, according to internal metrics at Paramount, is the most-watched original show the streamer has ever had. (Remember, Yellowstone proper is on Peacock.) The West Texas–set story, which stars Billy Bob Thornton as Tommy Norris, an all-purpose problem solver for a fictional oil company owned by Monty Miller (Jon Hamm), has also developed a bit more of a critical halo than Sheridan's other TV ventures, popping up on best-of-2024 lists, edging into mainstream discourse via podcasts that typically cover more-prestige fare, and retaining a score of 80 percent on Rotten Tomatoes. And the week before Landman wrapped up this past Sunday night, its lead actor, Billy Bob Thornton, attended the Golden Globes as a nominee for his role in the series.
- Leisure & Entertainment (1.00)
- Media > Television (0.65)
- Energy > Oil & Gas > Upstream (0.30)
Christians more likely to be skeptical of AI, worry about technology in churches
American Christians are more likely to be skeptical about artificial intelligence and are particularly apprehensive about using generative AI in church services, according to a recent survey. Just over a quarter of Christians (28%) surveyed by Barna this fall said they were hopeful about AI development, while 39% of self-identified non-Christians said the same. Only a fraction of Christians surveyed agreed that "AI is good for the Christian Church," according to the Barna survey, conducted through a consumer research panel. Just 22% said they agreed AI would be positive for the church, while 30% strongly disagreed and 21% said they somewhat disagreed.
- North America > United States > Texas > Travis County > Austin (0.05)
- Europe > Germany > Bavaria (0.05)
Crystal-hunting DeepMind AI could help discover new wonder materials
A crystal structure predicted by the GNoME AI. It contains barium (blue), niobium (white) and oxygen (green). An artificial intelligence created by Google DeepMind may help revolutionise materials science, providing new ways to make better batteries, solar panels, computer chips and many more vital technologies. "Anytime somebody wants to improve their technology, it inevitably includes improving the materials," says Ekin Dogus Cubuk at DeepMind. "We just wanted them to have more options."
- North America > United States > California > Alameda County > Berkeley (0.07)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.05)
- Europe > United Kingdom > England > Hampshire > Southampton (0.05)
Johnny Cash's 'Blank Space' Is Why AI Can't Have Nice Things
When Texas-based copywriter Dustin Ballard released a cover of Aqua's 1997 Europop hit "Barbie Girl" this summer using an AI-generated version of Johnny Cash's voice, he was surprised by its reception. "I actually expected more of a backlash," he says. Earlier this fall, when he followed up with AI Johnny Cash singing Taylor Swift's "Blank Space," the feedback was unexpectedly positive once again. "This is hauntingly beautiful," the top comment reads. "It absolutely slaps," Futurism wrote.
- Media > Music (1.00)
- Leisure & Entertainment (1.00)