jenning
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving.
Scandal rocks international stone skipping contest
'Nefarious deeds' couldn't keep Jon Jennings from winning the World Stone Skimming Competition. Stones are measured at the World Stone Skimming Championships, held on Easdale Island on September 25, 2016 in Easdale, Seil, Scotland. The championships, marking their 20th year, are held on the last Sunday in September each year on Easdale, the smallest inhabited island of the Inner Hebrides. Yet another scandal has been reported in the international sports world.
- Europe > United Kingdom > Scotland (0.26)
- North America > United States > Kentucky (0.05)
- North America > United States > Alaska (0.05)
Jeopardy!'s Most Infamous Moment Haunted the Show's Fans, Its Stars, and Even Alex Trebek. It's Clear Why Now.
Jeopardy!'s most controversial moment was years in the making. It took many more for the fallout to come into full view. One morning in 2010, Alex Trebek walked onto the IBM campus not far outside New York City and prepared to inspect what would become the most unusual player in Jeopardy!'s history. The trip, clear across the country from the show's Culver City set, had been carefully planned. David Ferrucci, a computer scientist at IBM, had spent years leading a team to develop what would become the first and, so far, last nonhuman ever to compete on Jeopardy! Longtime host Trebek would watch three practice games played with "Watson," as the system was named, and two human contestants. Then the team would be taken to lunch nearby, and Trebek would ultimately take the stage and host two more Watson practice games himself. By then the preparations for a future televised contest with IBM's creation were well underway, but this was the first time Trebek would encounter the technology in person, and his approval was crucial. Ferrucci was eager to show off one element in particular: the display, which had been rigged to show Watson's top three guesses whenever it answered, along with the numerical confidence rate it had in each one. For Ferrucci, this feature was central to demonstrating the computer's language-processing capabilities, because it showed that Watson wasn't just spitting out answers--it was reasoning. If Watson were ever going to be deployed to industries like health care, its human users wouldn't just want to know its best guess. It would be infinitely more valuable to know if Watson was 95 percent confident or just 30 percent, and whether those confidence levels were in line with its actual accuracy rate. It also made for better viewing. Ferrucci had brought his young daughter to the lab earlier in the process and showed her Watson as it played against human opponents. When Watson declined to ring in, Ferrucci's daughter turned to him and asked if the computer had crashed. He struggled to explain that it hadn't--it just wasn't confident enough to hazard a guess.
- North America > United States > California > Los Angeles County > Culver City (0.24)
- North America > United States > New York > Westchester County (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Leisure & Entertainment > Games > Chess (0.68)
- Leisure & Entertainment > Games > Jeopardy! (0.64)
'Jeopardy' host Ken Jennings 'deeply skeptical' of AI, years after losing to supercomputer
"Jeopardy!" host Ken Jennings tells Fox News Digital he wants to know a human is behind any creative projects, not AI. "I'm deeply skeptical of AI," Jennings told Fox News Digital at the TCM Classic Film Festival. "Obviously, these current iterations of LLMs [Large Language Models] would clean Watson's clock at 'Jeopardy!' The technology has moved on. I've played with chatbots and 'Jeopardy!' clues, and they're very hard to stump," he said.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > Canada > Ontario > Toronto (0.05)
Navigating Ethical Challenges in Generative AI-Enhanced Research: The ETHICAL Framework for Responsible Generative AI Use
Eacersall, Douglas, Pretorius, Lynette, Smirnov, Ivan, Spray, Erika, Illingworth, Sam, Chugh, Ritesh, Strydom, Sonja, Stratton-Maher, Dianne, Simmons, Jonathan, Jennings, Isaac, Roux, Rian, Kamrowski, Ruth, Downie, Abigail, Thong, Chee Ling, Howell, Katharine A.
The rapid adoption of generative artificial intelligence (GenAI) in research presents both opportunities and ethical challenges that should be carefully navigated. Although GenAI tools can enhance research efficiency through automation of tasks such as literature review and data analysis, their use raises concerns about aspects such as data accuracy, privacy, bias, and research integrity. This paper develops the ETHICAL framework, which is a practical guide for responsible GenAI use in research. Employing a constructivist case study examining multiple GenAI tools in real research contexts, the framework consists of seven key principles: 'Examine policies and guidelines', 'Think about social impacts', 'Harness understanding of the technology', 'Indicate use', 'Critically engage with outputs', 'Access secure versions', and 'Look at user agreements'. Applying these principles will enable researchers to uphold research integrity while leveraging GenAI's benefits. The framework addresses a critical gap between awareness of ethical issues and practical action steps, providing researchers with concrete guidance for ethical GenAI integration. This work has implications for research practice, institutional policy development, and the broader academic community while adapting to an AI-enhanced research landscape. The ETHICAL framework can serve as a foundation for developing AI literacy in academic settings and promoting responsible innovation in research methodologies.
- Oceania > New Zealand (0.14)
- North America > Canada > Alberta (0.14)
- Oceania > Australia > Queensland (0.04)
- Research Report (1.00)
- Overview (1.00)
- Social Sector (1.00)
- Law > Statutes (1.00)
- Information Technology > Security & Privacy (1.00)
Faster Optimal Coalition Structure Generation via Offline Coalition Selection and Graph-Based Search
Taguelmimt, Redha, Aknine, Samir, Boukredera, Djamila, Changder, Narayan, Sandholm, Tuomas
Coalition formation is a key capability in multi-agent systems. An important problem in coalition formation is coalition structure generation: partitioning agents into coalitions to optimize the social welfare. This is a challenging problem that has been the subject of active research for the past three decades. In this paper, we present a novel algorithm, SMART, for the problem based on a hybridization of three innovative techniques. Two of these techniques are based on dynamic programming, where we show a powerful connection between the coalitions selected for evaluation and the performance of the algorithms. These algorithms use offline phases to optimize the choice of coalitions to evaluate. The third one uses branch-and-bound and integer partition graph search to explore the solution space. Our techniques bring a new way of approaching the problem and a new level of precision to the field. In experiments over several common value distributions, we show that the hybridization of these techniques in SMART is faster than the fastest prior algorithms (ODP-IP, BOSS) in generating optimal solutions across all the value distributions.
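For readers unfamiliar with coalition structure generation, the problem in the abstract above can be illustrated with the classic exhaustive dynamic program over agent subsets. This is a minimal sketch and not the SMART algorithm; the function name, the bitmask encoding of coalitions, and the toy value table are all assumptions made for illustration.

```python
def optimal_coalition_structure(n, value):
    """Classic O(3^n) dynamic program (not SMART).

    n     -- number of agents, indexed 0..n-1
    value -- dict mapping a coalition bitmask to its value (hypothetical input format)
    Returns (best total value, list of coalitions as sorted agent lists).
    """
    full = (1 << n) - 1
    best = [0.0] * (1 << n)    # best[mask]: optimal welfare for the agents in mask
    choice = [0] * (1 << n)    # choice[mask]: one coalition of an optimal split of mask
    for mask in range(1, full + 1):
        low = mask & -mask     # the lowest-indexed agent must belong to some coalition
        best_val = float("-inf")
        sub = mask
        while sub:             # enumerate every non-empty subset of mask
            if sub & low:      # only subsets containing that agent, to skip mirrored splits
                cand = value[sub] + best[mask ^ sub]
                if cand > best_val:
                    best_val, choice[mask] = cand, sub
            sub = (sub - 1) & mask
        best[mask] = best_val
    # Walk the recorded choices to recover the coalitions themselves.
    structure, mask = [], full
    while mask:
        c = choice[mask]
        structure.append([i for i in range(n) if c >> i & 1])
        mask ^= c
    return best[full], structure


# Toy example (made-up values): agents 0 and 1 work well together.
toy_values = {0b001: 1, 0b010: 1, 0b100: 1,
              0b011: 5, 0b101: 1, 0b110: 1,
              0b111: 5}
total, coalitions = optimal_coalition_structure(3, toy_values)
```

On the toy table, pairing agents 0 and 1 and leaving agent 2 alone gives a welfare of 6, which beats the grand coalition's 5. The cubic-exponential cost of this baseline is what motivates faster exact methods such as those the abstract compares against.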
- Africa > Middle East > Algeria > Béjaïa Province > Béjaïa (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)
America Forgot About IBM Watson. Is ChatGPT Next?
In early 2011, Ken Jennings looked like humanity's last hope. Watson, an artificial intelligence created by the tech giant IBM, had picked off lesser Jeopardy players before the show's all-time champ entered a three-day exhibition match. At the end of the first game, Watson--a machine the size of 10 refrigerators--had Jennings on the ropes, leading $35,734 to $4,800. On day three, Watson finished the job. "I for one welcome our new computer overlords," Jennings wrote on his video screen during Final Jeopardy. Watson was better than any previous AI at addressing a problem that had long stumped researchers: How do you get a computer to precisely understand a clue posed in idiomatic English and then spit out the correct answer (or, as in Jeopardy, the right question)?
- North America > United States > New York (0.05)
- North America > United States > California (0.05)
- Information Technology (1.00)
- Leisure & Entertainment > Sports (0.69)
- Leisure & Entertainment > Games (0.55)
'Jeopardy!' contestant torn apart by fans after huge mistake: 'Such a buffoon'
A "Jeopardy!" contestant is going viral this week after making what many fans are considering one of the biggest blunders in the show's history. On Wednesday's episode, a woman named Karen had a huge lead over the other two contestants as they neared the end of the second round – she had earned $21,800, while her competitors had earned $7,100 and $6,400. When there were only a few clues left on the Double Jeopardy board, Karen found a Daily Double in the "Hans, Solo" category. If she had made a modest bet, she would have been sure to win the entire game after Final Jeopardy, as the other players couldn't possibly catch up to her lead.
- Media (0.78)
- Leisure & Entertainment > Games > Jeopardy! (0.75)
BookSum: A Collection of Datasets for Long-form Narrative Summarization
Kryściński, Wojciech, Rajani, Nazneen, Agarwal, Divyansh, Xiong, Caiming, Radev, Dragomir
The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases. While relevant, such datasets will offer limited challenges for future generations of text summarization systems. We address these issues by introducing BookSum, a collection of datasets for long-form narrative summarization. Our dataset covers source documents from the literature domain, such as novels, plays and stories, and includes highly abstractive, human written summaries on three levels of granularity of increasing difficulty: paragraph-, chapter-, and book-level. The domain and structure of our dataset poses a unique set of challenges for summarization systems, which include: processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures. To facilitate future work, we trained and evaluated multiple extractive and abstractive summarization models as baselines for our dataset.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Law (0.67)
- Health & Medicine (0.46)