AITopics

Large language models (LLMs) are increasingly trained from AI constitutions and model specifications that establish behavioral guidelines and ethical principles. However, these specifications face critical challenges, including internal conflicts between principles and insufficient coverage of nuanced scenarios. We present a systematic methodology for stress-testing model character specifications, automatically identifying numerous cases of principle contradictions and interpretive ambiguities in current model specs. We stress-test current model specs by generating scenarios that force explicit tradeoffs between competing value-based principles. Using a comprehensive taxonomy, we generate diverse value tradeoff scenarios where models must choose between pairs of legitimate principles that cannot be simultaneously satisfied. We evaluate responses from twelve frontier LLMs across major providers (Anthropic, OpenAI, Google, xAI) and measure behavioral disagreement through value classification scores. Among these scenarios, we identify over 70,000 cases exhibiting significant behavioral divergence. Empirically, we show that this high divergence in model behavior strongly predicts underlying problems in model specifications. Through qualitative analysis, we provide numerous examples of issues in current model specs, such as direct contradictions and interpretive ambiguities among principles. Our generated dataset also reveals both clear misalignment cases and false-positive refusals across all of the frontier models we study. Lastly, we report value prioritization patterns and how they differ across these models.

large language model, machine learning, natural language, (20 more...)

2510.07686

Genre: Research Report > New Finding (0.67)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Banking & Finance (1.00)
Health & Medicine > Therapeutic Area (0.68)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Shall We Play a Game? Language Models for Open-ended Wargames

Matlin, Glenn, Mahajan, Parv, Song, Isaac, Hao, Yixiong, Bard, Ryan, Topp, Stu, Montoya, Evan, Parwani, M. Rehan, Shetty, Soham, Riedl, Mark

Wargames are simulations of conflicts in which participants' decisions influence future events. While casual wargaming can be used for entertainment or socialization, serious wargaming is used by experts to explore strategic implications of decision-making and experiential learning. In this paper, we take the position that Artificial Intelligence (AI) systems, such as Language Models (LMs), are rapidly approaching human-expert capability for strategic planning -- and will one day surpass it. Military organizations have begun using LMs to provide insights into the consequences of real-world decisions during _open-ended wargames_ which use natural language to convey actions and outcomes. We argue the ability for AI systems to influence large-scale decisions motivates additional research into the safety, interpretability, and explainability of AI in open-ended wargames. To demonstrate, we conduct a scoping literature review with a curated selection of 100 unclassified studies on AI in wargames, and construct a novel ontology of open-endedness using the creativity afforded to players, adjudicators, and the novelty provided to observers. Drawing from this body of work, we distill a set of practical recommendations and critical safety considerations for deploying AI in open-ended wargames across common domains. We conclude by presenting the community with a set of high-impact open research challenges for future work.

artificial intelligence, machine learning, natural language, (17 more...)

2509.17192

Country:

North America > Canada (1.00)
Europe (1.00)
Asia (1.00)
North America > United States > California (0.67)

Genre:

Overview (1.00)
Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Tan, Zhiyin, D'Souza, Jennifer

Toward Purpose-oriented Topic Model Evaluation enabled by Large Language Models

This study presents a framework for automated evaluation of dynamically evolving topic models using Large Language Models (LLMs). Topic modeling is essential for organizing and retrieving scholarly content in digital library systems, helping users navigate complex and evolving knowledge domains. However, widely used automated metrics, such as coherence and diversity, often capture only narrow statistical patterns and fail to explain semantic failures in practice. We introduce a purpose-oriented evaluation framework that employs nine LLM-based metrics spanning four key dimensions of topic quality: lexical validity, intra-topic semantic soundness, inter-topic structural soundness, and document-topic alignment soundness. The framework is validated through adversarial and sampling-based protocols, and is applied across datasets spanning news articles, scholarly publications, and social media posts, as well as multiple topic modeling methods and open-source LLMs. Our analysis shows that LLM-based metrics provide interpretable, robust, and task-relevant assessments, uncovering critical weaknesses in topic models such as redundancy and semantic drift, which are often missed by traditional metrics. These results support the development of scalable, fine-grained evaluation tools for maintaining topic relevance in dynamic datasets. All code and data supporting this work are accessible at https://github.com/zhiyintan/topic-model-LLMjudgment.

large language model, machine learning, natural language, (19 more...)

doi: 10.1007/s00799-025-00429-5

2509.07142

Country:

Europe > Germany (0.67)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Law (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

LiDAR, GNSS and IMU Sensor Alignment through Dynamic Time Warping to Construct 3D City Maps

Wang, Haitian, Albaqami, Hezam, Wang, Xinyu, Ibrahim, Muhammad, Malakan, Zainy M., Algamdi, Abdullah M., Alghamdi, Mohammed H., Mian, Ajmal

LiDAR-based 3D mapping suffers from cumulative drift causing global misalignment, particularly in GNSS-constrained environments. To address this, we propose a unified framework that fuses LiDAR, GNSS, and IMU data for high-resolution city-scale mapping. The method performs velocity-based temporal alignment using Dynamic Time Warping and refines GNSS and IMU signals via extended Kalman filtering. Local maps are built using Normal Distributions Transform-based registration and pose graph optimization with loop closure detection, while global consistency is enforced using GNSS-constrained anchors followed by fine registration of overlapping segments. We also introduce a large-scale multimodal dataset captured in Perth, Western Australia to facilitate future research in this direction. Our dataset comprises 144,000 frames acquired with a 128-channel Ouster LiDAR, synchronized RTK-GNSS trajectories, and MEMS-IMU measurements across 21 urban loops. To assess geometric consistency, we evaluated our method using alignment metrics based on road centerlines and intersections to capture both global and local accuracy. The proposed framework reduces the average global alignment error from 3.32 m to 1.24 m, achieving a 61.4% improvement, and significantly decreases the intersection centroid offset from 13.22 m to 2.01 m, corresponding to an 84.8% enhancement. The constructed high-fidelity map and raw dataset are publicly available through IEEE Dataport and its visualization can be viewed in the provided Demo. This dataset and method together establish a new benchmark for evaluating 3D city mapping in GNSS-constrained environments, with source code available at GitHub Repository. Urbanization is rapidly transforming cities into dense and complex environments, increasing the demand for scalable infrastructure planning and maintenance [1], [2]. In this context, updated high-resolution spatial data is essential [3], [4], [5].
This work was funded by the University of Jeddah, Jeddah, Saudi Arabia, under grant No. (UJ-24-SUTU-1290).
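
The temporal alignment step named in the abstract, Dynamic Time Warping, can be illustrated with a minimal sketch. This is the textbook dynamic-programming form of DTW applied to two scalar sequences (for example, speed magnitudes from two sensors); it is not the authors' implementation, and the function name `dtw_path` and the sample values are assumptions made only for illustration.

```python
def dtw_path(a, b):
    """Classic DTW between two scalar sequences.

    Returns (total_cost, path), where path is a list of index pairs
    (i, j) meaning a[i] is matched with b[j]. Hypothetical helper,
    not taken from the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match
                                 cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1])      # repeat a[i-1]
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Illustrative data: the second series holds its first sample one step longer.
speed_a = [0, 1, 2, 3]
speed_b = [0, 0, 1, 2, 3]
total, path = dtw_path(speed_a, speed_b)
```

The recovered path pairs each sample of one series with its counterpart in the other, which is the kind of index mapping a time-offset correction between sensors would be built on.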

alignment, artificial intelligence, machine learning, (18 more...)

2507.0842

Country:

Asia > Middle East > Saudi Arabia > Mecca Province > Jeddah (0.45)
Oceania > Australia > Western Australia > Perth (0.34)

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Information Technology (0.93)
Transportation > Infrastructure & Services (0.46)
Transportation > Ground > Road (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Dhulipala, Somayajulu L. N., Ray, Deep, Forman, Nicholas

Compositional Generation for Long-Horizon Coupled PDEs

arXiv.org Machine Learning Oct-24-2025

Simulating coupled PDE systems is computationally intensive, and prior efforts have largely focused on training surrogates on the joint (coupled) data, which requires a large amount of data. In the paper, we study compositional diffusion approaches where diffusion models are only trained on the decoupled PDE data and are composed at inference time to recover the coupled field. Specifically, we investigate whether the compositional strategy can be feasible under long time horizons involving a large number of time steps. In addition, we compare a baseline diffusion model with that trained using the v-parameterization strategy. We also introduce a symmetric compositional scheme for the coupled fields based on the Euler scheme. We evaluate on Reaction-Diffusion and modified Burgers with longer time grids, and benchmark against a Fourier Neural Operator trained on coupled data. Despite seeing only decoupled training data, the compositional diffusion models recover coupled trajectories with low error. v-parameterization can improve accuracy over a baseline diffusion model, while the neural operator surrogate remains strongest given that it is trained on the coupled data. These results show that compositional diffusion is a viable strategy towards efficient, long-horizon modeling of coupled PDEs.
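
For readers unfamiliar with the v-parameterization mentioned in the abstract, the sketch below shows the commonly used definition, in which the network predicts a velocity-like target instead of the noise. It assumes a variance-preserving schedule with alpha^2 + sigma^2 = 1; the abstract does not state which variant the authors use, so the function names and the scalar setting here are illustrative assumptions only.

```python
import math


def v_target(x0, eps, alpha, sigma):
    """Training target under the common v-parameterization:
    v = alpha * eps - sigma * x0 (assumes alpha**2 + sigma**2 == 1)."""
    return alpha * eps - sigma * x0


def recover_from_v(x_t, v, alpha, sigma):
    """Invert the parameterization: recover the clean sample and the noise
    from the noisy sample x_t = alpha * x0 + sigma * eps and a v value."""
    x0 = alpha * x_t - sigma * v
    eps = sigma * x_t + alpha * v
    return x0, eps


# Illustrative scalar example (values are arbitrary).
angle = 0.3
alpha, sigma = math.cos(angle), math.sin(angle)
x0_true, eps_true = 1.5, -0.4
x_t = alpha * x0_true + sigma * eps_true
v = v_target(x0_true, eps_true, alpha, sigma)
x0_rec, eps_rec = recover_from_v(x_t, v, alpha, sigma)
```

Because the clean sample and the noise can both be recovered exactly from `x_t` and `v`, a model trained on this target carries the same information as a noise-predicting one.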

artificial intelligence, diffusion model, machine learning, (18 more...)

arXiv.org Machine Learning

2510.20141

Country: North America > United States > Maryland > Prince George's County > College Park (0.14)

Genre: Research Report > New Finding (0.49)

Industry:

Energy (0.95)
Government > Regional Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

WIRED Oct-23-2025, 23:51:38 GMT

How Hacked Card Shufflers Allegedly Enabled a Mob-Fueled Poker Scam That Rocked the NBA

WIRED recently demonstrated how to cheat at poker by hacking the Deckmate 2 card shufflers used in casinos. The mob was allegedly using the same trick to fleece victims for millions. Security researcher Joseph Tartaro demonstrates how he can insert a hacking device into a USB on the back of the shuffler that alters its code, then transmits the deck's order via Bluetooth to a phone app. The Deckmate 2 automatic card shufflers used in casinos, cardhouses, and high-end private poker games around the world are designed to shuffle a deck in seconds with perfect, computer-generated randomness, vastly speeding up play. They're also, amazingly, sold with a camera inside that can observe every card in the deck before it's dealt--a fact that's become very convenient for poker-cheating hackers and, allegedly, members of the Cosa Nostra mafia.

deckmate 2, shuffler, wired, (16 more...)

WIRED

Country:

North America > United States > Texas (0.05)
North America > United States > New York (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Industry:

Law Enforcement & Public Safety (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)
Information Technology > Communications > Mobile (0.35)

WIRED Oct-23-2025, 22:48:36 GMT

Trump's Investment in Intel Is Paying Off

The chipmaker reported higher than expected revenue on Thursday, and its stock price has risen over 90 percent since August. The Trump administration's investment in Intel appears to be paying off so far, but the once-mighty chipmaker still has a long way to climb back to industry dominance. In August, the US government announced it was converting about $9 billion in federal grants that Intel had been issued during the Biden administration into a roughly 10 percent equity stake in the company. During its third-quarter earnings on Thursday--its first financial update since Trump's surprise investment--Intel reported that it earned $13.7 billion in revenue over the past three months, a three percent increase year-over-year. It's the fourth consecutive quarter that Intel has beat revenue guidance.

customer, intel, trump, (16 more...)

WIRED

Country:

North America > United States > California > Santa Clara County > San Jose (0.05)
North America > United States > Arizona > Maricopa County > Chandler (0.05)
Europe > Slovakia (0.05)
(3 more...)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)

What Americans fear most in 2025

For over a decade, Americans' top fear has remained the same: corrupt government officials. Team Fear is at it again. For the past 11 years, this dedicated group of researchers with a very cool nickname has conducted the annual Chapman University Survey of American Fears. This year, they surveyed 1,015 adult Americans on what they fear most, from sharks to heights to identity theft.

financial collapse, laura baisa, service and privacy policy, (12 more...)

Popular Science

Country:

North America > United States > California (0.05)
Europe > Russia (0.05)
Asia > Russia (0.05)

Genre:

Research Report (0.37)
Questionnaire & Opinion Survey (0.36)

Industry:

Law Enforcement & Public Safety (1.00)
Health & Medicine > Therapeutic Area (1.00)
Government (1.00)
Information Technology (0.92)

Technology: Information Technology > Artificial Intelligence (0.52)

Al Jazeera Oct-23-2025, 18:29:49 GMT

Russia launches barrage of drone strikes across Ukraine

Russia launched dozens of drones and decoy drones across Ukrainian territory, including one that hit a school building in Kyiv.

drone strike, russia launch barrage, video duration 01, (5 more...)

Al Jazeera

Country:

Asia > Russia (1.00)
Europe > Russia (0.93)
Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.75)
(8 more...)

Industry:

Government > Military (0.74)
Government > Regional Government > North America Government > United States Government (0.55)
Information Technology > Robotics & Automation (0.45)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.45)

FOX News Oct-23-2025, 18:14:05 GMT

Russia violates NATO airspace in Lithuania amid Putin warning on long-range missiles

President Gitanas Nausėda called Russian aircraft incursion into Lithuanian territory a blatant breach of international law as NATO jets responded quickly.

airspace, lithuania, long-range missile, (8 more...)

FOX News

Country:

Asia > Russia (1.00)
Europe > Russia (0.46)
Europe > Ukraine (0.17)
(6 more...)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Law (1.00)
(4 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.47)