jurisdiction
- Europe > United Kingdom > Wales (0.06)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Europe > United Kingdom > Scotland (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Senators Urge Top Regulator to Stay Out of Prediction Market Lawsuits
As prediction market platforms like Polymarket and Kalshi battle regulators in court, Senate Democrats are urging the CFTC to avoid weighing in, escalating a broader fight over the burgeoning industry. Senator Adam Schiff, a Democrat from California, is leading the group of lawmakers urging the CFTC to stay out of state prediction market lawsuits. A group of 23 Democratic US senators sent a letter Friday to the top federal regulator overseeing prediction markets, urging the agency to avoid weighing in on pending court cases over the legality of offerings on the platforms tied to "sports, war, and other prohibited events." Prediction markets, which sell contracts tied to the outcome of real-world developments, have exploded in popularity over the past year, attracting an increasingly mainstream fanbase eager to wager on everything from geopolitical conflicts to fashion choices to the Super Bowl. As they have expanded, the platforms have become a magnet for ethical and legal controversies.
- North America > United States > California (0.37)
- North America > United States > New York (0.05)
- North America > United States > Minnesota (0.05)
- Law > Litigation (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Trading (1.00)
PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
Akyürek, Afra Feyza, Gosai, Advait, Zhang, Chen Bo Calvin, Gupta, Vipul, Jeong, Jaehwan, Gunjal, Anisha, Rabbani, Tahseen, Mazzone, Maria, Randolph, David, Meymand, Mohammad Mahmoudi, Chattha, Gurshaan, Rodriguez, Paula, Mares, Diego, Singh, Pavit, Liu, Michael, Chawla, Subodh, Cline, Pete, Ogaz, Lucy, Hernandez, Ernesto, Wang, Zihao, Bhatter, Pavi, Ayestaran, Marcos, Liu, Bing, He, Yunzhong
Frontier model progress is often measured by academic benchmarks, which offer a limited view of performance in real-world professional contexts. Existing evaluations often fail to assess open-ended, economically consequential tasks in high-stakes domains like Legal and Finance, where practical returns are paramount. To address this, we introduce Professional Reasoning Bench (PRBench), a realistic, open-ended, and difficult benchmark of real-world problems in Finance and Law. We open-source its 1,100 expert-authored tasks and 19,356 expert-curated criteria, making it, to our knowledge, the largest public, rubric-based benchmark for both legal and finance domains. We recruit 182 qualified professionals, holding JDs, CFAs, or 6+ years of experience, who contributed tasks inspired by their actual workflows. This process yields significant diversity, with tasks spanning 114 countries and 47 US jurisdictions. Our expert-curated rubrics are validated through a rigorous quality pipeline, including independent expert validation. Subsequent evaluation of 20 leading models reveals substantial room for improvement, with top scores of only 0.39 (Finance) and 0.37 (Legal) on our Hard subsets. We further catalog associated economic impacts of the prompts and analyze performance using human-annotated rubric categories. Our analysis shows that models with similar overall scores can diverge significantly on specific capabilities. Common failure modes include inaccurate judgments, a lack of process transparency and incomplete reasoning, highlighting critical gaps in their reliability for professional adoption.
- North America > United States > California (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Westlock County (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Sturgeon County (0.04)
- Banking & Finance (1.00)
- Health & Medicine > Government Relations & Public Policy (0.67)
- Law > Litigation (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Europe > Germany (0.28)
- North America > Canada > British Columbia (0.04)
- North America > United States > Virginia (0.04)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.67)
- Law > Statutes (1.00)
- Law > Litigation (1.00)
- Law > International Law (1.00)
Policy Cards: Machine-Readable Runtime Governance for Autonomous AI Agents
Policy Cards are introduced as a machine-readable, deployment-layer standard for expressing operational, regulatory, and ethical constraints for AI agents. The Policy Card sits with the agent and enables it to follow required constraints at runtime. It tells the agent what it must and must not do. As such, it becomes an integral part of the deployed agent. Policy Cards extend existing transparency artifacts such as Model, Data, and System Cards by defining a normative layer that encodes allow/deny rules, obligations, evidentiary requirements, and crosswalk mappings to assurance frameworks including NIST AI RMF, ISO/IEC 42001, and the EU AI Act. Each Policy Card can be validated automatically, version-controlled, and linked to runtime enforcement or continuous-audit pipelines. The framework enables verifiable compliance for autonomous agents, forming a foundation for distributed assurance in multi-agent ecosystems. Policy Cards provide a practical mechanism for integrating high-level governance with hands-on engineering practice and enabling accountable autonomy at scale.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (1.00)
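The allow/deny rules and obligations mentioned in the Policy Cards abstract above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `PolicyCard` fields, the `check_action` helper, and the action names are hypothetical and are not the schema defined by the paper. The sketch also assumes that deny rules take precedence and that unlisted actions are refused.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PolicyCard:
    """Hypothetical, simplified card; not the schema from the paper."""
    version: str
    allow: Set[str]
    deny: Set[str]
    # Maps an allowed action to the evidence the agent must record for it.
    obligations: Dict[str, List[str]] = field(default_factory=dict)


def check_action(card: PolicyCard, action: str) -> Tuple[bool, List[str]]:
    """Return (permitted, obligations) for a requested action.

    Assumed semantics: deny wins over allow, and anything not
    explicitly allowed is refused.
    """
    if action in card.deny:
        return False, []
    if action in card.allow:
        return True, list(card.obligations.get(action, []))
    return False, []


# Illustrative card with made-up action names.
card = PolicyCard(
    version="0.1",
    allow={"read_record", "send_summary"},
    deny={"delete_record"},
    obligations={"send_summary": ["log_recipient"]},
)
```

A runtime would call `check_action(card, "send_summary")` before acting and record the returned obligations as audit evidence. Default-deny is one possible design choice, not something the abstract specifies.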
Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
Kennedy, Wm. Matthew, Patlak, Cigdem, Dave, Jayraj, Chambers, Blake, Dhanotiya, Aayush, Ramiah, Darshini, Schwartz, Reva, Hagen, Jack, Kundu, Akash, Pendharkar, Mouni, Baisley, Liam, Skeadas, Theodora, Chowdhury, Rumman
AI systems have the potential to produce both benefits and harms, but without rigorous and ongoing adversarial evaluation, AI actors will struggle to assess the breadth and magnitude of the AI risk surface. Researchers from the field of systems design have developed several effective sociotechnical AI evaluation and red teaming techniques targeting bias, hate speech, mis/disinformation, and other documented harm classes. However, as increasingly sophisticated AI systems are released into high-stakes sectors (such as education, healthcare, and intelligence-gathering), our current evaluation and monitoring methods are proving less and less capable of delivering effective oversight. To deliver responsible AI and to ensure that AI's harms are fully understood and its security vulnerabilities mitigated, pioneering new approaches to close this "responsibility gap" are now more urgent than ever. In this paper, we propose one such approach, the cooperative public AI red-teaming exercise, and discuss early results of its prior pilot implementations. This approach is intertwined with CAMLIS itself: the first in-person public demonstrator exercise was held in conjunction with CAMLIS 2024. We review the operational design and results of this exercise, the prior pilot exercise of the National Institute of Standards and Technology's (NIST) Assessing the Risks and Impacts of AI (ARIA) program, and a similar exercise conducted with the Singapore Infocomm Media Development Authority (IMDA). Ultimately, we argue that this approach is both capable of delivering meaningful results and scalable to many AI-developing jurisdictions.
- Asia > Singapore (0.36)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Wisconsin > Eau Claire County > Eau Claire (0.04)
L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
We present L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search), a system that reduces hallucination and uncertainty in legal question answering through coordinated multi-agent reasoning and retrieval. Unlike single-pass retrieval-augmented generation (RAG), L-MARS decomposes queries into subproblems, issues targeted searches across heterogeneous sources (Serper web, local RAG, CourtListener case law), and employs a Judge Agent to verify sufficiency, jurisdiction, and temporal validity before answer synthesis. This iterative reasoning-search-verification loop maintains coherence, filters noisy evidence, and grounds answers in authoritative law. We evaluated L-MARS on LegalSearchQA, a new benchmark of 200 up-to-date multiple-choice legal questions in 2025. Results show that L-MARS substantially improves factual accuracy, reduces uncertainty, and achieves higher preference scores from both human experts and LLM-based judges. Our work demonstrates that multi-agent reasoning with agentic search offers a scalable and reproducible blueprint for deploying LLMs in high-stakes domains requiring precise legal retrieval and deliberation.
- Asia > China (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Workflow (0.86)
- Research Report > New Finding (0.34)
- Government > Regional Government > North America Government > United States Government (0.95)
- Law > Statutes (0.68)
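The reasoning-search-verification loop described in the L-MARS abstract above can be sketched roughly as follows. This is not the authors' implementation: the function name `answer_with_verification`, its parameters, and the round limit are hypothetical, and the decomposition, search, judge, and synthesis steps are passed in as plain callables so that the control flow is the only thing shown.

```python
from typing import Callable, List


def answer_with_verification(
    question: str,
    decompose: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    judge: Callable[[str, List[str]], bool],
    synthesize: Callable[[str, List[str]], str],
    max_rounds: int = 3,  # assumed cap; the abstract gives no limit
) -> str:
    """Gather evidence until the judge accepts it, then synthesize an answer."""
    evidence: List[str] = []
    for _ in range(max_rounds):
        # Split the question into subproblems and search for each one.
        for subquestion in decompose(question):
            evidence.extend(search(subquestion))
        # The judge decides whether the evidence gathered so far is sufficient.
        if judge(question, evidence):
            break
    # If the round limit is reached, the answer uses whatever was gathered.
    return synthesize(question, evidence)
```

In a real system, the `judge` callable would stand in for the checks the abstract attributes to the Judge Agent (sufficiency, jurisdiction, temporal validity). Here it is reduced to a single boolean for brevity.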
OLG++: A Semantic Extension of Obligation Logic Graph
Dasgupta, Subhasis, Stephens, Jon, Gupta, Amarnath
We present OLG++, a semantic extension of the Obligation Logic Graph (OLG) for modeling regulatory and legal rules in municipal and interjurisdictional contexts. OLG++ introduces richer node and edge types, including spatial, temporal, party group, defeasibility, and logical grouping constructs, enabling nuanced representations of legal obligations, exceptions, and hierarchies. The model supports structured reasoning over rules with contextual conditions, precedence, and complex triggers. We demonstrate its expressiveness through examples from food business regulations, showing how OLG++ supports legal question answering using property graph queries. OLG++ also improves over LegalRuleML by providing native support for subClassOf, spatial constraints, and reified exception structures. Our examples show that OLG++ is more expressive than prior graph-based models for legal knowledge representation.
- North America > United States > California > San Diego County > San Diego (0.06)
- North America > United States > California > San Diego County > La Jolla (0.05)
- North America > United States > New York > New York County > New York City (0.04)
IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders
Deshmukh, Sneha, Kamble, Prathmesh
Legal NLP remains underdeveloped in regions like India due to the scarcity of structured datasets. We introduce IndianBailJudgments-1200, a new benchmark dataset comprising 1200 Indian court judgments on bail decisions, annotated across 20+ attributes including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated using a prompt-engineered GPT-4o pipeline and verified for consistency. This resource supports a wide range of legal NLP tasks such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.
- North America > United States (0.28)
- Europe (0.04)
- Asia > India > Uttar Pradesh (0.04)
- Asia > India > Maharashtra (0.04)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)