 Bremerhaven




Repurposing Synthetic Data for Fine-grained Search Agent Supervision

Zhao, Yida, Li, Kuan, Wu, Xixi, Zhang, Liwen, Zhang, Dingchu, Li, Baixuan, Song, Maojia, Chen, Zhuo, Wang, Chenxi, Wang, Xinyu, Tu, Kewei, Xie, Pengjun, Zhou, Jingren, Jiang, Yong

arXiv.org Artificial Intelligence

LLM-based search agents are increasingly trained on entity-centric synthetic data to solve complex, knowledge-intensive tasks. However, prevailing training methods like Group Relative Policy Optimization (GRPO) discard this rich entity information, relying instead on sparse, outcome-based rewards. This critical limitation renders them unable to distinguish informative "near-miss" samples (those with substantially correct reasoning but a flawed final answer) from complete failures, thus discarding valuable learning signals. We address this by leveraging the very entities discarded during training. Our empirical analysis reveals a strong positive correlation between the number of ground-truth entities identified during an agent's reasoning process and final answer accuracy. Building on this insight, we introduce Entity-aware Group Relative Policy Optimization (E-GRPO), a novel framework that formulates a dense entity-aware reward function. E-GRPO assigns partial rewards to incorrect samples proportional to their entity match rate, enabling the model to effectively learn from these "near-misses". Experiments on diverse question-answering (QA) and deep research benchmarks show that E-GRPO consistently and significantly outperforms the GRPO baseline. Furthermore, our analysis reveals that E-GRPO not only achieves superior accuracy but also induces more efficient reasoning policies that require fewer tool calls, demonstrating a more effective and sample-efficient approach to aligning search agents.
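The reward idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the substring-based entity matching, the scaling factor `alpha`, and all function names are assumptions made for illustration. Correct samples receive a full reward, incorrect samples receive a partial reward proportional to their entity match rate, and advantages are then normalized within the sampled group, as in GRPO.

```python
from statistics import mean, pstdev


def entity_match_rate(trajectory_text, entities):
    """Fraction of ground-truth entities mentioned in the trajectory.

    Assumption: a case-insensitive substring match counts as a hit.
    """
    if not entities:
        return 0.0
    text = trajectory_text.lower()
    matched = sum(1 for entity in entities if entity.lower() in text)
    return matched / len(entities)


def entity_aware_rewards(correct, match_rates, alpha=0.3):
    """Full reward for correct samples, partial reward for incorrect ones.

    `alpha` (hypothetical value) caps the partial reward so that a
    near-miss never scores as high as a correct answer.
    """
    return [1.0 if is_correct else alpha * rate
            for is_correct, rate in zip(correct, match_rates)]


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(reward - mu) / (sigma + eps) for reward in rewards]


# One group of three rollouts for the same question:
# a correct answer, a near-miss, and a complete failure.
rewards = entity_aware_rewards([True, False, False], [1.0, 0.5, 0.0])
advantages = group_relative_advantages(rewards)
```

With an outcome-only reward, the second and third rollouts would both score 0 and receive the same advantage. Under this sketch the near-miss receives a higher advantage than the complete failure, which is the distinction the abstract describes.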


Climate Knowledge in Large Language Models

Kuznetsov, Ivan, Grassi, Jacopo, Pantiukhin, Dmitrii, Shapkin, Boris, Jung, Thomas, Koldunov, Nikolay

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly deployed for climate-related applications, where understanding internal climatological knowledge is crucial for reliability and misinformation risk assessment. Despite growing adoption, the capacity of LLMs to recall climate normals from parametric knowledge remains largely uncharacterized. We investigate the capacity of contemporary LLMs to recall climate normals without external retrieval, focusing on a prototypical query: mean July 2-m air temperature 1991-2020 at specified locations. We construct a global grid of queries at 1° resolution land points, providing coordinates and location descriptors, and validate responses against ERA5 reanalysis. Results show that LLMs encode non-trivial climate structure, capturing latitudinal and topographic patterns, with root-mean-square errors of 3-6 °C and biases of $\pm$1 °C. However, spatially coherent errors remain, particularly in mountains and high latitudes. Performance degrades sharply above 1500 m, where RMSE reaches 5-13 °C compared to 2-4 °C at lower elevations. We find that including geographic context (country, city, region) reduces errors by 27% on average, with larger models being most sensitive to location descriptors. While models capture the global mean magnitude of observed warming between 1950-1974 and 2000-2024, they fail to reproduce spatial patterns of temperature change, which directly relate to assessing climate change. This limitation highlights that while LLMs may capture present-day climate distributions, they struggle to represent the regional and local expression of long-term shifts in temperature essential for understanding climate dynamics. Our evaluation framework provides a reproducible benchmark for quantifying parametric climate knowledge in LLMs and complements existing climate communication assessments.
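The two error metrics quoted in this abstract, root-mean-square error and bias, can be sketched as follows. This is an illustrative, unweighted computation over paired values; whether the study applies area weighting over the grid is not stated here, and the function name and sample values are hypothetical.

```python
import math


def rmse_and_bias(predicted, reference):
    """Return (RMSE, mean bias) of predictions against reference values.

    Both inputs are equal-length sequences of temperatures in °C,
    e.g. model answers and reanalysis values at the same grid points.
    Assumption: every point is weighted equally.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - r for p, r in zip(predicted, reference)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, bias


# Hypothetical July temperatures (°C) at three grid points.
rmse, bias = rmse_and_bias([10.0, 20.0, 30.0], [12.0, 20.0, 28.0])
```

In this example the errors cancel, so the bias is zero while the RMSE is not. That is why a small bias alongside an RMSE of several degrees, as reported in the abstract, is not contradictory.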



Appendix: Supplementary material for the paper "Causal analysis of Covid-19 spread in Germany"

Neural Information Processing Systems

For every $W \in V$, $W$ is independent of $V \setminus (\mathrm{Descendants}(W) \cup \mathrm{Parents}(W))$ given $\mathrm{Parents}(W)$. As expected, we see that the number of causes detected by Granger is multiple times larger than the number detected by SyPI; in most cases Granger detects all the candidate states as causes. On the other hand, SyPI does not suffer from such problems even when there are latent confounders. Finally, in the third column, we report the detected distant causes. Strict thresholds (the default of the SyPI method) are used for the analysis.



LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving

Kong, Lingdong, Xu, Xiang, Liu, Youquan, Cen, Jun, Chen, Runnan, Zhang, Wenwei, Pan, Liang, Chen, Kai, Liu, Ziwei

arXiv.org Artificial Intelligence

Recent advancements in vision foundation models (VFMs) have revolutionized visual perception in 2D, yet their potential for 3D scene understanding, particularly in autonomous driving applications, remains underexplored. In this paper, we introduce LargeAD, a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. This alignment facilitates cross-modal representation learning, enhancing the semantic consistency between 2D and 3D data. We introduce several key innovations: i) VFM-driven superpixel generation for detailed semantic representation, ii) a VFM-assisted contrastive learning strategy to align multimodal features, iii) superpoint temporal consistency to maintain stable representations across time, and iv) multi-source data pretraining to generalize across various LiDAR configurations. Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning for LiDAR-based segmentation and object detection. Extensive experiments on eleven large-scale multi-modal datasets highlight our superior performance, demonstrating adaptability, efficiency, and robustness in real-world autonomous driving scenarios.


Agent-Based Modelling of Older Adult Needs for Autonomous Mobility-on-Demand: A Case Study in Winnipeg, Canada

Prédhumeau, Manon, Manley, Ed

arXiv.org Artificial Intelligence

As populations continue to age across many nations, ensuring accessible and efficient transportation options for older adults has become an increasingly important concern. Autonomous Mobility-on-Demand (AMoD) systems have emerged as a potential solution to address the needs faced by older adults in their daily mobility. However, estimating older adults' mobility needs, and how they vary over space and time, is crucial for effective planning and implementation of such a service, and conventional four-step approaches lack the granularity to fully account for these needs. To address this challenge, we propose an agent-based model of older adults' mobility demand in Winnipeg, Canada. The model is built for 2022 using primarily open data, and is implemented in the Multi-Agent Transport Simulation (MATSim) toolkit. After calibration to accurately reproduce observed travel behaviors, a new AMoD service is tested in simulation and its potential adoption among Winnipeg older adults is explored. The model can help policy makers estimate the needs of elderly populations for door-to-door transportation and can guide the design of AMoD transport systems.


Real-time Ship Recognition and Georeferencing for the Improvement of Maritime Situational Awareness

Perez, Borja Carrillo

arXiv.org Artificial Intelligence

In an era where maritime infrastructures are crucial, advanced situational awareness solutions are increasingly important. The use of optical camera systems can allow real-time usage of maritime footage. This thesis presents an investigation into leveraging deep learning and computer vision to advance real-time ship recognition and georeferencing for the improvement of maritime situational awareness. A novel dataset, ShipSG, is introduced, containing 3,505 images and 11,625 ship masks with corresponding class and geographic position. After an exploration of the state of the art, a custom real-time segmentation architecture, ScatYOLOv8+CBAM, is designed for the NVIDIA Jetson AGX Xavier embedded system. This architecture adds the 2D scattering transform and attention mechanisms to YOLOv8, achieving an mAP of 75.46% at 25.3 ms per frame, outperforming state-of-the-art methods by over 5%. To improve small and distant ship recognition in high-resolution images on embedded systems, an enhanced slicing mechanism is introduced, improving mAP by 8% to 11%. Additionally, a georeferencing method is proposed, achieving positioning errors of 18 m for ships up to 400 m away and 44 m for ships between 400 m and 1200 m. The findings are also applied in real-world scenarios, such as the detection of abnormal ship behaviour, camera integrity assessment and 3D reconstruction. The approach of this thesis outperforms existing methods and provides a framework for integrating recognized and georeferenced ships into real-time systems, enhancing operational effectiveness and decision-making for maritime stakeholders. This thesis contributes to the maritime computer vision field by establishing a benchmark for ship segmentation and georeferencing research, demonstrating the viability of deep-learning-based recognition and georeferencing methods for real-time maritime monitoring.