Temporal Reasoning
Deterministic Legal Agents: A Canonical Primitive API for Auditable Reasoning over Temporal Knowledge Graphs
For autonomous legal agents to operate safely in high-stakes domains, they require a foundation of absolute determinism and auditability, guarantees that standard Retrieval-Augmented Generation (RAG) frameworks cannot provide. When interacting with temporal knowledge graphs that model the complex evolution of legal norms, agents must navigate versioning, causality, and hierarchical structures with precision, a task for which black-box vector search is ill-suited. This paper introduces a new architectural pattern to address this: a formal Primitive API designed as a secure execution layer for reasoning over such graphs. Instead of a monolithic query engine, our framework provides a library of canonical primitives: atomic, composable, and auditable operations. This design enables planner-guided agents to decompose complex legal questions into transparent execution plans, supporting critical tasks with full verifiability, including (i) precise point-in-time version retrieval, (ii) robust causal lineage tracing, and (iii) context-aware hybrid search. Ultimately, this architecture turns opaque retrieval into auditable reasoning: the agent's internal process becomes a verifiable log of deterministic primitive calls rather than a black box, providing a blueprint for building the next generation of trustworthy legal AI.
- Information Technology > Security & Privacy (0.68)
- Law > Statutes (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
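The abstract describes primitives that are atomic, deterministic, and logged. As an illustration only, the sketch below shows what two of them, point-in-time version retrieval and lineage tracing, might look like over a tiny in-memory set of norm versions. Every name here (`NormVersion`, `TemporalNormGraph`, `version_at`, `lineage`, `amended_by`, `audit_log`) is hypothetical and not the paper's API. The sketch also assumes that validity intervals are half-open `[valid_from, valid_to)`.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NormVersion:
    """One version of a legal norm, in force on the half-open interval [valid_from, valid_to)."""
    norm_id: str
    version_id: str
    valid_from: date
    valid_to: Optional[date] = None      # None means the version is still in force
    amended_by: Optional[str] = None     # hypothetical: the act that produced this version


class TemporalNormGraph:
    """Hypothetical primitive layer: each method is atomic, deterministic, and logged."""

    def __init__(self, versions: List[NormVersion]):
        self._by_norm = {}
        for v in versions:
            self._by_norm.setdefault(v.norm_id, []).append(v)
        for vs in self._by_norm.values():
            vs.sort(key=lambda v: v.valid_from)  # deterministic order: oldest version first
        self.audit_log: List[Tuple[str, tuple]] = []

    def version_at(self, norm_id: str, when: date) -> Optional[NormVersion]:
        """Point-in-time retrieval: the version in force on `when`, or None."""
        self.audit_log.append(("version_at", (norm_id, when.isoformat())))
        for v in self._by_norm.get(norm_id, []):
            if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
                return v
        return None

    def lineage(self, norm_id: str) -> List[Tuple[str, Optional[str]]]:
        """Lineage tracing: (version_id, amending act) pairs from oldest to newest."""
        self.audit_log.append(("lineage", (norm_id,)))
        return [(v.version_id, v.amended_by) for v in self._by_norm.get(norm_id, [])]


# Usage: a planner composes primitives into an explicit plan; the log records every step.
graph = TemporalNormGraph([
    NormVersion("art5", "v1", date(2010, 1, 1), date(2015, 6, 1)),
    NormVersion("art5", "v2", date(2015, 6, 1), amended_by="Act 12/2015"),
])
print(graph.version_at("art5", date(2014, 3, 1)).version_id)  # v1
print(graph.lineage("art5"))  # [('v1', None), ('v2', 'Act 12/2015')]
print(graph.audit_log)
```

Because each call appends a plain record to `audit_log`, replaying the same plan on the same data yields the same log. That property is what makes a trace like this auditable.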
Temporal Knowledge Graph Hyperedge Forecasting: Exploring Entity-to-Category Link Prediction
Markai, Edward, Molavipour, Sina
Temporal Knowledge Graphs have emerged as a powerful way of modeling not only static relationships between entities but also how those relations evolve over time. Because these structures can store information from real-world settings such as a news flow, predicting future graph components is, to a certain extent, equivalent to predicting real-world events. Most research in this field focuses on embedding-based methods, often built on convolutional neural network architectures. These solutions act as black boxes, limiting insight. In this paper, we explore an extension to an established rule-based framework, TLogic, that achieves high accuracy together with explainable predictions. This offers transparency and allows the end user to critically evaluate the rules applied at the end of the prediction stage. The new rule format incorporates entity category as a key component, so that rules are applied only to relevant entities. When categories are not available for building the graph, we propose a data-driven, LLM-based method to generate them. Additionally, we investigate the choice of aggregation method for the scores of retrieved entities when performing category prediction.
- Europe > Sweden > Stockholm > Stockholm (0.40)
- North America > United States (0.28)
- Asia > India (0.04)
- Law (1.00)
- Banking & Finance (1.00)
- Health & Medicine (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.87)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.71)
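The abstract mentions comparing aggregation methods for turning the scores of retrieved entities into a category prediction, but it does not list which methods are compared. The sketch below assumes four common choices: max, sum, mean, and noisy-OR. The helper names `category_scores` and `predict_category` are hypothetical. The toy example is meant to show why the choice matters, since the predicted category can flip depending on the aggregator.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed aggregation functions over the scores of retrieved entities in one category.
AGGREGATORS = {
    "max": max,
    "sum": sum,
    "mean": lambda s: sum(s) / len(s),
    "noisy_or": lambda s: 1 - math.prod(1 - x for x in s),
}


def category_scores(candidates: List[Tuple[str, float]],
                    category_of: Dict[str, str],
                    method: str = "max") -> Dict[str, float]:
    """Group entity scores by category and aggregate them; entities without a category are skipped."""
    grouped = defaultdict(list)
    for entity, score in candidates:
        category = category_of.get(entity)
        if category is not None:
            grouped[category].append(score)
    aggregate = AGGREGATORS[method]
    return {cat: aggregate(scores) for cat, scores in grouped.items()}


def predict_category(candidates, category_of, method="max"):
    """Return the highest-scoring category under the chosen aggregation method."""
    scores = category_scores(candidates, category_of, method)
    return max(scores, key=scores.get)


candidates = [("Germany", 0.6), ("France", 0.5), ("UN", 0.7)]
category_of = {"Germany": "country", "France": "country", "UN": "organization"}
for m in AGGREGATORS:
    print(m, predict_category(candidates, category_of, m))
```

Max and mean reward a single strong candidate, so they pick "organization" here. Sum and noisy-OR reward several moderately scored candidates in the same category, so they pick "country".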
- Asia > China > Liaoning Province > Shenyang (0.40)
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > New Jersey (0.04)
- Law (1.00)
- Government (1.00)
- Information Technology > Security & Privacy (0.93)
- Leisure & Entertainment (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.65)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States (0.67)
- Europe > France (0.28)
- Asia > Middle East > Republic of Türkiye (0.14)
- Law (0.93)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
- Government > Military (0.67)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.51)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.42)
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios
Wei, Shaohang, Li, Wei, Song, Feifan, Luo, Wen, Zhuang, Tianyi, Tan, Haochen, Guo, Zhijiang, Wang, Houfeng
Temporal reasoning is pivotal for Large Language Models (LLMs) to comprehend the real world. However, existing work neglects the real-world challenges of temporal reasoning: (1) intensive temporal information, (2) fast-changing event dynamics, and (3) complex temporal dependencies in social interactions. To bridge this gap, we propose TIME, a multi-level benchmark designed for temporal reasoning in real-world scenarios. TIME consists of 38,522 QA pairs, covering 3 levels with 11 fine-grained sub-tasks. The benchmark comprises 3 sub-datasets reflecting different real-world challenges: TIME-Wiki, TIME-News, and TIME-Dial. We conduct extensive experiments on both reasoning and non-reasoning models, analyze temporal reasoning performance in depth across diverse real-world scenarios and tasks, and summarize the impact of test-time scaling on temporal reasoning capabilities. Additionally, we release TIME-Lite, a human-annotated subset, to foster future research and standardized evaluation in temporal reasoning. The code is available at https://github.com/sylvain-wei/TIME , the dataset at https://huggingface.co/datasets/SylvainWei/TIME , and the project page at https://sylvain-wei.github.io/TIME/ .
- Asia > Middle East > Israel (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > India (0.05)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Fast Multivariate Spatio-temporal Analysis via Low Rank Tensor Learning
Bahadori, Mohammad Taha
Accurate and efficient analysis of multivariate spatio-temporal data is critical in climatology, geology, and sociology applications. Existing models usually assume simple inter-dependence among variables, space, and time, and are computationally expensive. We propose a unified low-rank tensor learning framework for multivariate spatio-temporal analysis, which can conveniently incorporate different properties of spatio-temporal data, such as spatial clustering and shared structure among variables. We demonstrate how the general framework can be applied to cokriging and forecasting tasks, and we develop an efficient greedy algorithm with a convergence guarantee to solve the resulting optimization problem. Experiments on both synthetic and real application datasets show that our method is not only significantly faster than existing methods but also achieves lower estimation error.
- North America > United States > California > Los Angeles County > Los Angeles (0.29)
- Pacific Ocean (0.04)
- North America > United States > Rocky Mountains (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.62)
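The framework relies on low-rank structure in a multi-way (location × time × variable) array. A standard way to inspect that structure is through the ranks of the array's mode-n unfoldings. The pure-Python sketch below shows only that building block. It is not the paper's greedy algorithm. The lexicographic column ordering is our own convention and is a column permutation of the Kolda-Bader layout, so it leaves every rank unchanged.

```python
from fractions import Fraction
from itertools import product


def unfold(tensor, mode):
    """Mode-`mode` unfolding of a 3-way tensor stored as nested lists tensor[i][j][k].

    Rows are indexed by the chosen mode; columns enumerate the two remaining
    modes in lexicographic order (last remaining mode varies fastest).
    """
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    others = [m for m in range(3) if m != mode]
    matrix = []
    for r in range(dims[mode]):
        row = []
        for a, b in product(range(dims[others[0]]), range(dims[others[1]])):
            idx = [0, 0, 0]
            idx[mode], idx[others[0]], idx[others[1]] = r, a, b
            row.append(tensor[idx[0]][idx[1]][idx[2]])
        matrix.append(row)
    return matrix


def matrix_rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][c] != 0:
                factor = m[r][c] / m[rank][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


# A rank-1 (location x time x variable) tensor: T[i][j][k] = u[i] * v[j] * w[k].
u, v, w = [1, 2], [1, 0, 3], [2, 1]
T = [[[u[i] * v[j] * w[k] for k in range(2)] for j in range(3)] for i in range(2)]
print([matrix_rank(unfold(T, mode)) for mode in range(3)])  # [1, 1, 1]
```

Perturbing a single entry of `T` raises the rank of the mode-0 unfolding to 2. This gives a concrete sense of what a low-rank penalty or constraint discourages.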
NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving
Gao, Yuan, Piccinini, Mattia, Brusnicki, Roberto, Zhang, Yuchen, Betz, Johannes
Understanding risk in autonomous driving requires not only perception and prediction, but also high-level reasoning about agent behavior and context. Current methods based on Vision Language Models (VLMs) primarily ground agents in static images and provide qualitative judgments, lacking the spatio-temporal reasoning needed to capture how risks evolve over time. To address this gap, we propose NuRisk, a comprehensive Visual Question Answering (VQA) dataset comprising 2,900 scenarios and 1.1 million agent-level samples, built on real-world data from nuScenes and Waymo and supplemented with safety-critical scenarios from the CommonRoad simulator. The dataset provides sequential Bird's-Eye-View (BEV) images with quantitative, agent-level risk annotations, enabling spatio-temporal reasoning. We benchmark well-known VLMs across different prompting techniques and find that they fail to perform explicit spatio-temporal reasoning, reaching a peak accuracy of only 33% at high latency. To address these shortcomings, our fine-tuned 7B VLM agent improves accuracy to 41% and reduces latency by 75%, demonstrating explicit spatio-temporal reasoning capabilities that proprietary models lack. While this represents a significant step forward, the modest accuracy underscores the profound challenge of the task, establishing NuRisk as a critical benchmark for advancing spatio-temporal reasoning in autonomous driving.
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
Contrastive Representations for Temporal Reasoning
Ziarko, Alicja, Bortkiewicz, Michal, Zawalski, Michal, Eysenbach, Benjamin, Milos, Piotr
In classical AI, perception relies on learning state-based representations, while planning, which can be thought of as temporal reasoning over action sequences, is typically achieved through search. We study whether such reasoning can instead emerge from representations that capture both perceptual and temporal structure. We show that standard temporal contrastive learning, despite its popularity, often fails to capture temporal structure due to its reliance on spurious features. To address this, we introduce Combinatorial Representations for Temporal Reasoning (CRTR), a method that uses a negative sampling scheme to provably remove these spurious features and facilitate temporal reasoning. CRTR achieves strong results on domains with complex temporal structure, such as Sokoban and Rubik's Cube. In particular, for the Rubik's Cube, CRTR learns representations that generalize across all initial states and allow it to solve the puzzle using fewer search steps than BestFS, though with longer solutions. To our knowledge, this is the first method that efficiently solves arbitrary Cube states using only learned representations, without relying on an external search algorithm.
- Europe > Poland > Masovia Province > Warsaw (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
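The abstract attributes the failure of standard temporal contrastive learning to spurious features. The toy example below illustrates that mechanism. It assumes an InfoNCE objective with dot-product similarity. When negatives come only from other trajectories, a per-trajectory "context" feature is enough to drive the loss down. Negatives drawn from the same trajectory remove that shortcut. The in-trajectory negatives here are an illustrative assumption, because the abstract does not specify CRTR's actual sampling scheme.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives: List[Sequence[float]], temperature: float = 1.0) -> float:
    """InfoNCE loss with dot-product similarity: -log softmax score of the positive among all candidates."""
    logits = [dot(anchor, positive) / temperature] + [dot(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_partition = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_partition - logits[0]


# Toy embeddings: coordinate 0 is a per-trajectory "context" feature (e.g. a fixed
# background); coordinate 1 would carry temporal progress but is zero here.
anchor = [1.0, 0.0]            # a state from trajectory A
positive = [1.0, 0.0]          # a later state from trajectory A
cross_negative = [-1.0, 0.0]   # a state from trajectory B (different context)
within_negative = [1.0, 0.0]   # another state from trajectory A (same context)

print(info_nce(anchor, positive, [cross_negative]))   # small: context alone separates them
print(info_nce(anchor, positive, [within_negative]))  # log(2): context is no longer informative
```

With only cross-trajectory negatives, the encoder gets no gradient signal to use the temporal coordinate. With in-trajectory negatives, context becomes useless for the task, so any remaining reduction in loss has to come from temporal structure.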
Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning
Bhatia, Gagan, Peyrard, Maxime, Zhao, Wei
Modern BPE tokenizers often split calendar dates into meaningless fragments, e.g., 20250312 → 202, 503, 12, inflating token counts and obscuring the inherent structure needed for robust temporal reasoning. In this work, we (1) introduce a simple yet interpretable metric, termed date fragmentation ratio, that measures how faithfully a tokenizer preserves multi-digit date components; (2) release DateAugBench, a suite of 6500 examples spanning three temporal reasoning tasks: context-based date resolution, format-invariance puzzles, and date arithmetic across historical, contemporary, and future time periods; and (3) through layer-wise probing and causal attention-hop analyses, uncover an emergent date-abstraction mechanism whereby large language models stitch together the fragments of month, day, and year components for temporal reasoning. Our experiments show that excessive fragmentation correlates with accuracy drops of up to 10 points on uncommon dates such as historical and futuristic ones. Further, we find that the larger the model, the faster it accomplishes the emergent date abstraction that heals date fragments. Lastly, we observe a reasoning path that LLMs follow to assemble date fragments, typically differing from human interpretation (year → month → day). Our datasets and code are publicly available at https://github.com/gagan3012/date-fragments.
- North America > United States > Maryland (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
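The abstract names the date fragmentation ratio but does not give its formula. The sketch below assumes a simple version: the fraction of date components (YYYY, MM, DD by default) that are not emitted as exactly one token. The tokens are supplied by hand rather than produced by a real tokenizer. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def token_spans(tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """Character offsets [start, end) of each token within the concatenated string."""
    spans, start = [], 0
    for tok in tokens:
        spans.append((start, start + len(tok)))
        start += len(tok)
    return spans


def date_fragmentation_ratio(date_str: str, tokens: Sequence[str],
                             component_lengths: Sequence[int] = (4, 2, 2)) -> float:
    """Fraction of date components (default YYYY, MM, DD) not covered by exactly one token."""
    if "".join(tokens) != date_str:
        raise ValueError("tokens must concatenate back to the date string")
    spans = set(token_spans(tokens))
    fragmented, start = 0, 0
    for length in component_lengths:
        if (start, start + length) not in spans:
            fragmented += 1
        start += length
    return fragmented / len(component_lengths)


print(date_fragmentation_ratio("20250312", ["202", "503", "12"]))  # 2/3: only "12" survives intact
print(date_fragmentation_ratio("20250312", ["2025", "03", "12"]))  # 0.0
```

Requiring an exact span match treats "503" as damaging both the year and the month. That matches the abstract's description of 202, 503, 12 as meaningless fragments. A looser definition, such as checking only for token boundaries inside a component, would score that split more leniently.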
Full-History Graphs with Edge-Type Decoupled Networks for Temporal Reasoning
Mohammed, Osama, Pan, Jiaxin, Nayyeri, Mojtaba, Hernández, Daniel, Staab, Steffen
Modeling evolving interactions among entities is critical in many real-world tasks. For example, predicting driver maneuvers in traffic requires tracking how neighboring vehicles accelerate, brake, and change lanes relative to one another over consecutive frames. Likewise, detecting financial fraud hinges on following the flow of funds through successive transactions as they propagate through the network. Unlike classic time-series forecasting, these settings demand reasoning over who interacts with whom and when, calling for a temporal-graph representation that makes both the relations and their evolution explicit. Existing temporal-graph methods typically use snapshot graphs to encode temporal evolution. We introduce a full-history graph that instantiates one node for every entity at every time step and separates two edge sets: (i) intra-time-step edges that capture relations within a single frame and (ii) inter-time-step edges that connect an entity to itself at consecutive steps. To learn on this graph we design an Edge-Type Decoupled Network (ETDNet) with parallel modules: a graph-attention module aggregates information along intra-time-step edges, a multi-head temporal-attention module attends over an entity's inter-time-step history, and a fusion module combines the two messages after every layer. Evaluated on driver-intention prediction (Waymo) and Bitcoin fraud detection (Elliptic++), ETDNet consistently surpasses strong baselines, lifting Waymo joint accuracy to 75.6% (vs. 74.1%) and raising Elliptic++ illicit-class F1 to 88.1% (vs. 60.4%). These gains demonstrate the benefit of representing structural and temporal relations as distinct edges in a single graph.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Hampshire > Southampton (0.04)
- Asia (0.04)
- Law Enforcement & Public Safety > Fraud (0.35)
- Banking & Finance > Trading (0.34)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.50)
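The full-history graph construction described above is simple enough to sketch directly. The version below assumes a fixed entity set across all time steps and directed relation pairs. It builds only the data structure that ETDNet would consume, not the network itself. The two edge types are kept in separate lists so that a graph-attention module and a temporal-attention module could each read their own set. The function name and the returned dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Node = Tuple[str, int]          # (entity, time step)
Edge = Tuple[Node, Node]


def full_history_graph(entities: Sequence[str],
                       relations_per_step: Sequence[Sequence[Tuple[str, str]]]) -> Dict[str, list]:
    """Build one node per (entity, step), plus intra-step and inter-step edge sets kept separate."""
    known = set(entities)
    num_steps = len(relations_per_step)
    nodes: List[Node] = [(e, t) for t in range(num_steps) for e in entities]
    intra: List[Edge] = []
    for t, relations in enumerate(relations_per_step):
        for a, b in relations:
            if a not in known or b not in known:
                raise ValueError(f"unknown entity in relation {(a, b)} at step {t}")
            intra.append(((a, t), (b, t)))       # relation observed within frame t
    inter: List[Edge] = [((e, t), (e, t + 1))    # entity linked to itself at the next step
                         for t in range(num_steps - 1) for e in entities]
    return {"nodes": nodes, "intra": intra, "inter": inter}


g = full_history_graph(
    ["car_A", "car_B", "car_C"],
    [[("car_A", "car_B")], [("car_B", "car_C"), ("car_A", "car_B")], []],
)
print(len(g["nodes"]), len(g["intra"]), len(g["inter"]))  # 9 3 6
```

A snapshot-based representation would keep three separate per-frame graphs. Here all nine (entity, step) nodes live in one graph, and the temporal links are explicit edges of their own type.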