PaSa: An LLM Agent for Comprehensive Academic Paper Search
He, Yichen, Huang, Guanhua, Feng, Peiyuan, Lin, Yuan, Zhang, Yuchen, Li, Hang, E, Weinan
We introduce PaSa, an advanced Paper Search agent powered by large language models. PaSa can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark collecting real-world academic queries to assess PaSa's performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4 for paraphrased queries, ChatGPT (search-enabled GPT-4o), GPT-o1, and PaSa-GPT-4o (PaSa implemented by prompting GPT-4o). Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50. It also exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. Model, datasets, and code are available at https://github.com/bytedance/pasa.
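The recall@20, recall@50, and precision figures quoted in the abstract are standard retrieval metrics. The following is a minimal sketch of how such metrics are commonly computed, assuming a ranked list of returned paper IDs and a set of ground-truth relevant IDs; the function names and the toy IDs are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant papers that appear among the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


def precision(returned_ids, relevant_ids):
    """Fraction of the returned papers that are actually relevant."""
    returned = set(returned_ids)
    if not returned:
        return 0.0
    return len(returned & set(relevant_ids)) / len(returned)


# Toy example with made-up paper IDs (not data from the paper).
ranked = ["p3", "p7", "p1", "p9", "p4"]
relevant = ["p1", "p4", "p8"]

print(recall_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 5))
print(precision(ranked, relevant))
```

Recall rewards finding every relevant paper, while precision penalizes returning irrelevant ones, which is why the abstract reports both.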
- North America > Mexico > Mexico City > Mexico City (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China (0.04)
Temporal Graph MLP Mixer for Spatio-Temporal Forecasting
Bilal, Muhammad, Lopez, Luis Carretero
Spatiotemporal forecasting is critical in applications such as traffic prediction, climate modeling, and environmental monitoring. However, the prevalence of missing data in real-world sensor networks significantly complicates this task. In this paper, we introduce the Temporal Graph MLP-Mixer (T-GMM), a novel architecture designed to address these challenges. The model combines node-level processing with patch-level subgraph encoding to capture localized spatial dependencies while leveraging a three-dimensional MLP-Mixer to handle temporal, spatial, and feature-based dependencies. Experiments on the AQI, ENGRAD, PV-US and METR-LA datasets demonstrate the model's ability to effectively forecast even in the presence of significant missing data. While not surpassing state-of-the-art models in all scenarios, the T-GMM exhibits strong learning capabilities, particularly in capturing long-range dependencies. These results highlight its potential for robust, scalable spatiotemporal forecasting.
Infrastructure for AI Agents
Chan, Alan, Wei, Kevin, Huang, Sihao, Rajkumar, Nitarshan, Perrier, Elija, Lazar, Seth, Hadfield, Gillian K., Anderljung, Markus
Increasingly many AI systems can plan and execute interactions in open-ended environments, such as making phone calls or buying online goods. As developers grow the space of tasks that such AI agents can accomplish, we will need tools both to unlock their benefits and manage their risks. Current tools are largely insufficient because they are not designed to shape how agents interact with existing institutions (e.g., legal and economic systems) or actors (e.g., digital service providers, humans, other AI agents). For example, alignment techniques by nature do not assure counterparties that some human will be held accountable when a user instructs an agent to perform an illegal action. To fill this gap, we propose the concept of agent infrastructure: technical systems and shared protocols external to agents that are designed to mediate and influence their interactions with and impacts on their environments. Agent infrastructure comprises both new tools and reconfigurations or extensions of existing tools. For example, to facilitate accountability, protocols that tie users to agents could build upon existing systems for user authentication, such as OpenID. Just as the Internet relies on infrastructure like HTTPS, we argue that agent infrastructure will be similarly indispensable to ecosystems of agents. We identify three functions for agent infrastructure: 1) attributing actions, properties, and other information to specific agents, their users, or other actors; 2) shaping agents' interactions; and 3) detecting and remedying harmful actions from agents. We propose infrastructure that could help achieve each function, explaining use cases, adoption, limitations, and open questions. Making progress on agent infrastructure can prepare society for the adoption of more advanced agents.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > North Carolina (0.04)
- Law Enforcement & Public Safety (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
Agent-as-Judge for Factual Summarization of Long Narratives
Jeong, Yeonseok, Kim, Minsoo, Hwang, Seung-won, Kim, Byung-Hak
Large Language Models (LLMs) have demonstrated near-human performance in summarization tasks based on traditional metrics such as ROUGE and BERTScore. However, these metrics do not adequately capture critical aspects of summarization quality, such as factual accuracy, particularly for long narratives (>100K tokens). Recent advances, such as LLM-as-a-Judge, address the limitations of metrics based on lexical similarity but still exhibit factual inconsistencies, especially in understanding character relationships and states. In this work, we introduce NarrativeFactScore, a novel "Agent-as-a-Judge" framework for evaluating and refining summaries. By leveraging a Character Knowledge Graph (CKG) extracted from input and generated summaries, NarrativeFactScore assesses the factual consistency and provides actionable guidance for refinement, such as identifying missing or erroneous facts. We demonstrate the effectiveness of NarrativeFactScore through a detailed workflow illustration and extensive validation on widely adopted benchmarks, achieving superior performance compared to competitive methods. Our results highlight the potential of agent-driven evaluation systems to improve the factual reliability of LLM-generated summaries.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Multi-Modal Attention Networks for Enhanced Segmentation and Depth Estimation of Subsurface Defects in Pulse Thermography
Salah, Mohammed, Werghi, Naoufel, Svetinovic, Davor, Abdulrahman, Yusra
AI-driven pulse thermography (PT) has become a crucial tool in non-destructive testing (NDT), enabling automatic detection of hidden anomalies in various industrial components. Current state-of-the-art techniques feed segmentation and depth estimation networks compressed PT sequences using either Principal Component Analysis (PCA) or Thermographic Signal Reconstruction (TSR). However, treating these two modalities independently constrains the performance of PT inspection models as these representations possess complementary semantic features. To address this limitation, this work proposes PT-Fusion, a multi-modal attention-based fusion network that fuses both PCA and TSR modalities for defect segmentation and depth estimation of subsurface defects in PT setups. PT-Fusion introduces novel feature fusion modules, Encoder Attention Fusion Gate (EAFG) and Attention Enhanced Decoding Block (AEDB), to fuse PCA and TSR features for enhanced segmentation and depth estimation of subsurface defects. In addition, a novel data augmentation technique is proposed based on random data sampling from thermographic sequences to alleviate the scarcity of PT datasets. The proposed method is benchmarked against state-of-the-art PT inspection models, including U-Net, attention U-Net, and 3D-CNN on the Université Laval IRT-PVC dataset. The results demonstrate that PT-Fusion outperforms the aforementioned models in defect segmentation and depth estimation accuracies with a margin of 10%.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)
- North America > United States > Massachusetts (0.04)
- North America > United States > Arizona (0.04)
- Materials (0.67)
- Health & Medicine (0.46)
- Energy > Renewable (0.46)
Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark
Wang, Shenhao, Mo, Baichuan, Zheng, Yunhan, Hess, Stephane, Zhao, Jinhua
Numerous studies have compared machine learning (ML) and discrete choice models (DCMs) in predicting travel demand. However, these studies often lack generalizability as they compare models deterministically without considering contextual variations. To address this limitation, our study develops an empirical benchmark by designing a tournament model, thus efficiently summarizing a large number of experiments, quantifying the randomness in model comparisons, and using formal statistical tests to differentiate between the model and contextual effects. This benchmark study compares two large-scale data sources: a database compiled from literature review summarizing 136 experiments from 35 studies, and our own experiment data, encompassing a total of 6,970 experiments from 105 models and 12 model families. This benchmark study yields two key findings. Firstly, many ML models, particularly the ensemble methods and deep learning, statistically outperform the DCM family (i.e., multinomial, nested, and mixed logit models). However, this study also highlights the crucial role of the contextual factors (i.e., data sources, inputs and choice categories), which can explain models' predictive performance more effectively than the differences in model types alone. Model performance varies significantly with data sources, improving with larger sample sizes and lower dimensional alternative sets. After controlling for all the model and contextual factors, significant randomness still remains, implying inherent uncertainty in such model comparisons. Overall, we suggest that future researchers shift more focus from context-specific model comparisons towards examining model transferability across contexts and characterizing the inherent uncertainty in ML, thus creating more robust and generalizable next-generation travel demand models.
- Europe (0.04)
- Asia > Singapore (0.04)
- North America > United States > Massachusetts (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.90)
A recursive Bayesian neural network for constitutive modeling of sands under monotonic loading
Noor, Toiba, Lone, Soban Nasir, Ramana, G. V., Nayek, Rajdip
In geotechnical engineering, constitutive models play a crucial role in describing soil behavior under varying loading conditions. Data-driven deep learning (DL) models offer a promising alternative for developing predictive constitutive models. When prediction is the primary focus, quantifying the predictive uncertainty of a trained DL model and communicating this uncertainty to end users is crucial for informed decision-making. This study proposes a recursive Bayesian neural network (rBNN) framework, which builds upon recursive feedforward neural networks (rFFNNs) by introducing generalized Bayesian inference for uncertainty quantification. A significant contribution of this work is the incorporation of a sliding window approach in rFFNNs, allowing the models to effectively capture temporal dependencies across load steps. The rBNN extends this framework by treating model parameters as random variables, with their posterior distributions inferred using generalized variational inference. The proposed framework is validated on two datasets: (i) a numerically simulated consolidated drained (CD) triaxial dataset employing a hardening soil model and (ii) an experimental dataset comprising 28 CD triaxial tests on Baskarp sand. Comparative analyses with LSTM, Bi-LSTM, and GRU models demonstrate that the deterministic rFFNN achieves superior predictive accuracy, attributed to its transparent structure and sliding window design. While the rBNN marginally trails in accuracy for the experimental case, it provides robust confidence intervals, addressing data sparsity and measurement noise in experimental conditions. The study underscores the trade-offs between deterministic and probabilistic approaches and the potential of rBNNs for uncertainty-aware constitutive modeling.
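The sliding window idea mentioned in the abstract can be illustrated with a small sketch. It assumes a scalar response per load step and a generic one-step predictor; the function names, the window length, and the toy predictor are hypothetical and do not reproduce the authors' rFFNN or its actual input features.

```python
def sliding_windows(sequence, window):
    """Pair each run of `window` consecutive load steps with the step that follows it."""
    if window < 1 or window >= len(sequence):
        raise ValueError("window must be at least 1 and shorter than the sequence")
    inputs, targets = [], []
    for end in range(window, len(sequence)):
        inputs.append(sequence[end - window:end])
        targets.append(sequence[end])
    return inputs, targets


def recursive_rollout(predict, seed, n_steps):
    """Predict n_steps ahead, feeding each prediction back in as the newest window entry."""
    history = list(seed)
    window = len(seed)
    for _ in range(n_steps):
        history.append(predict(history[-window:]))
    return history[window:]


# Toy stand-in for a trained network: linear extrapolation from the last two steps.
def toy_predict(window_values):
    return window_values[-1] + (window_values[-1] - window_values[-2])


X, y = sliding_windows([0.0, 1.0, 2.0, 3.0, 4.0], window=2)
print(X, y)
print(recursive_rollout(toy_predict, seed=[1.0, 2.0], n_steps=3))
```

The first helper builds training pairs; the second shows the recursive use at prediction time, where earlier outputs become later inputs.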
- Europe > Denmark > North Jutland > Aalborg (0.04)
- North America > United States > Nevada (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
Counterfactual Explanations for k-means and Gaussian Clustering
Vardakas, Georgios, Karra, Antonia, Pitoura, Evaggelia, Likas, Aristidis
Counterfactuals have been recognized as an effective approach to explain classifier decisions. Nevertheless, they have not yet been considered in the context of clustering. In this work, we propose the use of counterfactuals to explain clustering solutions. First, we present a general definition for counterfactuals for model-based clustering that includes plausibility and feasibility constraints. Then we consider the counterfactual generation problem for k-means and Gaussian clustering assuming Euclidean distance. Our approach takes as input the factual, the target cluster, a binary mask indicating actionable or immutable features and a plausibility factor specifying how far from the cluster boundary the counterfactual should be placed. In the k-means clustering case, analytical mathematical formulas are presented for computing the optimal solution, while in the Gaussian clustering case (assuming full, diagonal, or spherical covariances) our method requires the numerical solution of a nonlinear equation with a single parameter only. We demonstrate the advantages of our approach through illustrative examples and quantitative experimental comparisons.
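To make the k-means case concrete, here is a minimal sketch for a simplified setting: only two clusters, Euclidean distance, and no feature mask. Under those assumptions the nearest point in the target cluster lies just across the hyperplane that bisects the two centroids, so the counterfactual is an orthogonal projection plus a small push. The `margin` parameter is a hypothetical stand-in for the plausibility factor, and this is not claimed to be the paper's formula.

```python
def kmeans_counterfactual(x, c_source, c_target, margin=1e-6):
    """Move x the shortest Euclidean distance so it lands on the target side of the boundary.

    The boundary between two k-means clusters is the hyperplane w . z = b with
    w = c_target - c_source and b = (|c_target|^2 - |c_source|^2) / 2.
    """
    w = [t - s for s, t in zip(c_source, c_target)]
    w_norm_sq = sum(wi * wi for wi in w)
    if w_norm_sq == 0:
        raise ValueError("centroids must differ")
    b = (sum(t * t for t in c_target) - sum(s * s for s in c_source)) / 2
    gap = b - sum(wi * xi for wi, xi in zip(w, x))
    if gap < 0:
        return list(x)  # x is already assigned to the target cluster
    # Project onto the boundary, then step `margin` further along w (assumed knob).
    step = gap / w_norm_sq + margin
    return [xi + step * wi for xi, wi in zip(x, w)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


factual = [1.0, 3.0]
source, target = [0.0, 0.0], [4.0, 0.0]
cf = kmeans_counterfactual(factual, source, target, margin=0.1)
print(cf, sq_dist(cf, target) < sq_dist(cf, source))
```

Only the coordinate along the line joining the centroids changes, which is what makes the move minimal in this two-cluster setting.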
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.68)
Fast energy-aware OLSR routing in VANETs by means of a parallel evolutionary algorithm
Toutouh, Jamal, Nesmachnow, Sergio, Alba, Enrique
This work tackles the problem of reducing the power consumption of the OLSR routing protocol in vehicular networks. Nowadays, energy-aware and green communication protocols are important research topics, especially when deploying wireless mobile networks. This article introduces a fast automatic methodology to search for energy-efficient OLSR configurations by using a parallel evolutionary algorithm. The experimental analysis demonstrates that significant improvements over the standard configuration can be attained in terms of power consumption, with no noteworthy loss in the QoS.
- Europe > Spain (0.14)
- South America > Uruguay (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- Energy (1.00)
- Telecommunications > Networks (0.67)
Deployment of an Aerial Multi-agent System for Automated Task Execution in Large-scale Underground Mining Environments
Dahlquist, Niklas, Nordström, Samuel, Stathoulopoulos, Nikolaos, Lindqvist, Björn, Saradagi, Akshit, Nikolakopoulos, George
In this article, we present a framework for deploying an aerial multi-agent system in large-scale subterranean environments with minimal infrastructure for supporting multi-agent operations. The multi-agent objective is to optimally and reactively allocate and execute inspection tasks in a mine, which are entered by a mine operator on-the-fly. The assignment of currently available tasks to the team of agents is accomplished through an auction-based system, where the agents bid for available tasks and a central auctioneer uses these bids to optimally assign tasks to agents. A mobile Wi-Fi mesh supports inter-agent communication and bi-directional communication between the agents and the task allocator, while the task execution is performed completely infrastructure-free. Given a task to be accomplished, a reliable and modular agent behavior is synthesized by generating behavior trees from a pool of agent capabilities, using a back-chaining approach. The auction system in the proposed framework is reactive and supports addition of new operator-specified tasks on-the-go, at any point, through a user-friendly operator interface. The framework has been validated in a real underground mining environment using three aerial agents, with several inspection locations spread in an environment of almost 200 meters. The proposed framework can be utilized for missions involving rapid inspection, gas detection, distributed sensing, mapping, etc. in a subterranean environment. The proposed framework and its field deployment contribute towards furthering reliable automation in large-scale subterranean environments to offload both routine and dangerous tasks from human operators to autonomous aerial robots.
The use of autonomous robotic platforms in industrial production facilities is on the rise, both to increase profitability and to increase safety for human operators [1]. Specifically, in deep underground mining, where the fundamental risk of accidents is high, the industry is focusing on creating a safer environment for humans by deploying robotic systems to either execute dangerous tasks or verify the safety before authorizing human entry. Through efforts in the mining industry, human workers have already been moved to safer locations in several critical operations via, for instance, teleoperation of heavy machinery.
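The auction-based allocation described in the abstract, in which agents submit bids and a central auctioneer assigns tasks, can be sketched as a small assignment problem. The sketch below assumes that a bid is a cost (for example, estimated travel effort), that each agent takes at most one task per round, and that the auctioneer minimizes total cost by brute force. The agent names, task names, and bid values are hypothetical, and the source does not state which solver the authors use.

```python
from itertools import permutations


def assign_tasks(bids, tasks):
    """Return (task -> agent mapping, total cost) with the lowest total bid cost.

    bids[agent][task] is the cost that agent bid for that task (assumed convention).
    Brute force is fine for a handful of agents; larger teams would need a
    dedicated assignment solver.
    """
    agents = sorted(bids)
    if len(tasks) > len(agents):
        raise ValueError("more tasks than agents in this auction round")
    best_cost, best_assignment = float("inf"), None
    for chosen in permutations(agents, len(tasks)):
        cost = sum(bids[agent][task] for agent, task in zip(chosen, tasks))
        if cost < best_cost:
            best_cost, best_assignment = cost, dict(zip(tasks, chosen))
    return best_assignment, best_cost


# Hypothetical bids from three aerial agents for two inspection tasks.
bids = {
    "uav1": {"A": 10, "B": 40},
    "uav2": {"A": 15, "B": 20},
    "uav3": {"A": 30, "B": 25},
}
assignment, total = assign_tasks(bids, ["A", "B"])
print(assignment, total)
```

Re-running the function whenever the operator adds a task gives a simple form of the reactive behavior the abstract describes, though the deployed system may handle re-allocation differently.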
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Sweden > Norrbotten County > Luleå (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Italy > Sicily > Palermo (0.04)