people
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support making causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the violation of norms and whether the harm is avoidable or inevitable.
CLaw: Benchmarking Chinese Legal Knowledge in Large Language Models - A Fine-grained Corpus and Reasoning Analysis
Xu, Xinzhe, Zhao, Liang, Xu, Hongshen, Chen, Chen
Large Language Models (LLMs) are increasingly tasked with analyzing legal texts and citing relevant statutes, yet their reliability is often compromised by general pre-training that ingests legal texts without specialized focus, obscuring the true depth of their legal knowledge. This paper introduces CLaw, a novel benchmark specifically engineered to meticulously evaluate LLMs on Chinese legal knowledge and its application in reasoning. CLaw comprises two key components: (1) a comprehensive, fine-grained corpus of all 306 Chinese national statutes, segmented to the subparagraph level and incorporating precise historical revision timesteps for rigorous recall evaluation (64,849 entries), and (2) a challenging set of 254 case-based reasoning instances derived from China Supreme Court curated materials to assess the practical application of legal knowledge. Our empirical evaluation reveals that most contemporary LLMs significantly struggle to faithfully reproduce legal provisions. As accurate retrieval and citation of legal provisions form the basis of legal reasoning, this deficiency critically undermines the reliability of their responses. We contend that achieving trustworthy legal reasoning in LLMs requires a robust synergy of accurate knowledge retrieval--potentially enhanced through supervised fine-tuning (SFT) or retrieval-augmented generation (RAG)--and strong general reasoning capabilities. This work provides an essential benchmark and critical insights for advancing domain-specific LLM reasoning, particularly within the complex legal sphere.
- Asia > China > Tibet Autonomous Region (0.04)
- Asia > Singapore (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- (7 more...)
- Law (1.00)
- Transportation > Marine (0.93)
- Transportation > Freight & Logistics Services > Shipping (0.68)
Ryder Cup and the People's Course
The Ryder Cup returns, and it's on a golf course like no other: Bethpage Black, also known as the People's Course. It's one of the toughest tests in golf, built by working-class New Yorkers. Samantha Johnson takes a look at the history of Bethpage Black, a course that sees history, class and identity colliding on the fairways.
- North America > United States > New York (0.27)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.11)
- Africa (0.11)
- (8 more...)
- Leisure & Entertainment > Sports > Golf (1.00)
- Leisure & Entertainment > Games (0.89)
- Information Technology > Game Theory (0.43)
- Information Technology > Artificial Intelligence > Games (0.40)
d35b05a832e2bb91f110d54e34e2da79-AuthorFeedback.pdf
We thank all the reviewers for their feedback! Our paper formalizes a data acquisition problem when one cannot verify the true labels of the collected data. The writing of our submission focused more on the basics to ensure clarity for the general, diverse NeurIPS readers. Most of the technical results were either deferred to the appendix or compressed to fit in the page limit. One of our major technical contributions is the explicit sensitivity guarantee of the peer-prediction style mechanisms.
Exploring the Technical Knowledge Interaction of Global Digital Humanities: Three-decade Evidence from Bibliometric-based perspectives
Li, Jiayi, Yan, Chengxi, Zeng, Yurong, Fang, Zhichao, Wang, Huiru
Digital Humanities (DH) is an interdisciplinary field that integrates computational methods with humanities scholarship to investigate innovative topics. Each academic discipline follows a unique developmental path shaped by the topics researchers investigate and the methods they employ. With the help of bibliometric analysis, most previous studies have examined DH across multiple dimensions such as research hotspots, co-author networks, and institutional rankings. However, these studies have often been limited in their ability to provide deep insights into the current state of technological advancements and topic development in DH. As a result, their conclusions tend to remain superficial or lack interpretability in understanding how methods and topics interrelate in the field. To address this gap, this study introduces a new concept, Topic-Method Composition (TMC), which refers to a hybrid knowledge structure generated by the co-occurrence of specific research topics and their corresponding methods. In particular, analyzing the interactions between TMCs shows more clearly the intersection and integration of digital technology and humanistic subjects in DH. Moreover, this study develops a TMC-based workflow combining bibliometric analysis, topic modeling, and network analysis to analyze the development characteristics and patterns of research disciplines. Applied to large-scale bibliometric data, this workflow enables a detailed view of the knowledge structures and provides a tool adaptable to other fields.
- Research Report (0.84)
- Workflow (0.56)
It shocked the market but has China's DeepSeek changed AI?
DeepSeek's arrival also marked a turning point in the US-China AI rivalry, some experts say. "China was seen as playing catch-up in large language models until this point, with competitive models but always trailing the best western ones," policy analyst Wendy Chang of the Mercator Institute for China Studies told the BBC. A large language model (LLM) is a reasoning system trained to predict the next word in a given sentence or phrase. DeepSeek changed perceptions when it claimed to have achieved a leading model for a fraction of the computational resources and costs common among its American counterparts. OpenAI had spent $5bn (£3.7bn) in 2024 alone.
- North America > United States (1.00)
- Asia > China > Beijing > Beijing (0.07)
Secret koala population discovered near Australian city
When you think of koalas (Phascolarctos cinereus), chances are that words like cute or fluffy come to mind, not cryptic or stealthy. And yet, researchers in southeastern Australia have just discovered hundreds of previously undocumented koalas living surprisingly close to the city of Newcastle. The team conducted what they claim to be the largest and most accurate peer-reviewed koala survey to date. As detailed in a study published this month in the journal Biological Conservation, the survey estimates that a population of 4,357 koalas across 166,302 acres of land is living in the state of New South Wales.
AppealCase: A Dataset and Benchmark for Civil Case Appeal Scenarios
Huang, Yuting, Guo, Meitong, Wu, Yiquan, Li, Ang, Liu, Xiaozhong, Yin, Keting, Sun, Changlong, Wu, Fei, Kuang, Kun
Recent advances in LegalAI have primarily focused on individual case judgment analysis, often overlooking the critical appellate process within the judicial system. Appeals serve as a core mechanism for error correction and ensuring fair trials, making them highly significant both in practice and in research. To address this gap, we present the AppealCase dataset, consisting of 10,000 pairs of real-world, matched first-instance and second-instance documents across 91 categories of civil cases. The dataset also includes detailed annotations along five dimensions central to appellate review: judgment reversals, reversal reasons, cited legal provisions, claim-level decisions, and whether there is new information in the second instance. Based on these annotations, we propose five novel LegalAI tasks and conduct a comprehensive evaluation across 20 mainstream models. Experimental results reveal that all current models achieve less than 50% F1 scores on the judgment reversal prediction task, highlighting the complexity and challenge of the appeal scenario. We hope that the AppealCase dataset will spur further research in LegalAI for appellate case analysis and contribute to improving consistency in judicial decision-making.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Liaoning Province (0.04)
- (11 more...)
- Overview (1.00)
- Research Report (0.82)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)