Government
An Electrocardiogram Multi-task Benchmark with Comprehensive Evaluations and Insightful Findings
Xu, Yuhao, Lu, Jiaying, Ding, Sirui, Cao, Defu, Hu, Xiao, Yang, Carl
In patient diagnosis, non-invasive measurements are widely used because they carry low risk and return results quickly. The electrocardiogram (ECG), a non-invasive method of recording the heart's electrical activity, is used to diagnose cardiac conditions. Analyzing an ECG typically requires domain expertise, which is a roadblock to applying artificial intelligence (AI) in healthcare. Through advances in self-supervised learning and foundation models, AI systems can now acquire and leverage domain knowledge without relying solely on human expertise. However, comprehensive analyses of foundation models' performance on ECG are lacking. This study aims to answer the research question: "Are Foundation Models Useful for ECG Analysis?" To address it, we evaluate language/general time-series/ECG foundation models in comparison with time-series deep learning models. The experimental results show that general time-series/ECG foundation models achieve a top performance rate of 80%, indicating their effectiveness in ECG analysis. In-depth analyses and insights are provided along with comprehensive experimental results. This study highlights the limitations and potential of foundation models in advancing physiological waveform analysis. The data and code for this benchmark are publicly available at https://github.com/yuhaoxu99/ECGMultitasks-Benchmark.
Large Language Models as Search Engines: Societal Challenges
Sadeddine, Zacchary, Maxwell, Winston, Varoquaux, Gaël, Suchanek, Fabian M.
Large Language Models (LLMs) may one day replace search engines as the primary portal to information on the Web. In this article, we investigate the societal challenges that such a change could bring. We focus on the roles of LLM Providers, Content Creators, and End Users, and identify 15 types of challenges. With each, we show current mitigation strategies -- both from the technical perspective and the legal perspective. We also discuss the impact of each challenge and point out future research opportunities.
Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning
Qiu, Junnan, Zhao, Yuanjie, Li, Jie
Offline Reinforcement Learning (RL) enables policy optimization from static datasets but is inherently vulnerable to data poisoning attacks. Existing attack strategies typically rely on locally uniform perturbations, which treat all samples indiscriminately. This approach is inefficient, as it wastes the perturbation budget on low-impact samples, and lacks stealthiness due to significant statistical deviations. In this paper, we propose a novel Global Budget Allocation attack strategy. Leveraging the theoretical insight that a sample's influence on value function convergence is proportional to its Temporal Difference (TD) error, we formulate the attack as a global resource allocation problem. We derive a closed-form solution where perturbation magnitudes are assigned proportional to the TD-error sensitivity under a global L2 constraint. Empirical results on D4RL benchmarks demonstrate that our method significantly outperforms baseline strategies, achieving up to 80% performance degradation with minimal perturbations that evade detection by state-of-the-art statistical and spectral defenses.
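The allocation idea described above can be illustrated with a minimal sketch. It assumes a simplified surrogate objective, the sum of |TD error| times perturbation magnitude, maximized under a global L2 budget; by the Cauchy-Schwarz inequality the maximizer is then proportional to the sensitivities. The function names, the surrogate objective, and the uniform baseline are hypothetical illustrations, not the paper's actual formulation.

```python
import math


def allocate_budget(td_errors, budget):
    """Split a global L2 budget across samples in proportion to |TD error|.

    Assumption: per-sample sensitivity is taken to be |TD error|, and the
    attack objective is linear in the perturbation magnitudes.
    """
    sens = [abs(e) for e in td_errors]
    norm = math.sqrt(sum(s * s for s in sens))
    if norm == 0.0:
        # No sample is sensitive, so there is nothing to prioritize.
        return [0.0] * len(sens)
    return [budget * s / norm for s in sens]


def uniform_budget(n, budget):
    """Baseline: equal magnitude per sample with the same global L2 norm."""
    return [budget / math.sqrt(n)] * n


def linear_impact(td_errors, magnitudes):
    """Surrogate objective: sum of |TD error| times perturbation magnitude."""
    return sum(abs(e) * m for e, m in zip(td_errors, magnitudes))


if __name__ == "__main__":
    td = [3.0, 4.0, 0.0]  # hypothetical TD errors for three samples
    proportional = allocate_budget(td, budget=1.0)
    uniform = uniform_budget(len(td), budget=1.0)
    print("proportional:", proportional, linear_impact(td, proportional))
    print("uniform:     ", uniform, linear_impact(td, uniform))
```

Under this surrogate, the proportional allocation spends nothing on the zero-error sample and scores at least as high as the uniform baseline at the same L2 norm, which mirrors the efficiency argument made in the abstract.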
The Adoption and Usage of AI Agents: Early Evidence from Perplexity
Yang, Jeremy, Yonack, Noah, Zyskowski, Kate, Yarats, Denis, Ho, Johnny, Ma, Jerry
This paper presents the first large-scale field study of the adoption, usage intensity, and use cases of general-purpose AI agents operating in open-world web environments. Our analysis centers on Comet, an AI-powered browser developed by Perplexity, and its integrated agent, Comet Assistant. Drawing on hundreds of millions of anonymized user interactions, we address three fundamental questions: Who is using AI agents? How intensively are they using them? And what are they using them for? Our findings reveal substantial heterogeneity in adoption and usage across user segments. Earlier adopters, users in countries with higher GDP per capita and educational attainment, and individuals working in digital or knowledge-intensive sectors -- such as digital technology, academia, finance, marketing, and entrepreneurship -- are more likely to adopt or actively use the agent. To systematically characterize the substance of agent usage, we introduce a hierarchical agentic taxonomy that organizes use cases across three levels: topic, subtopic, and task. The two largest topics, Productivity & Workflow and Learning & Research, account for 57% of all agentic queries, while the two largest subtopics, Courses and Shopping for Goods, make up 22%. The top 10 out of 90 tasks represent 55% of queries. Personal use constitutes 55% of queries, while professional and educational contexts comprise 30% and 16%, respectively. In the short term, use cases exhibit strong stickiness, but over time users tend to shift toward more cognitively oriented topics. The diffusion of increasingly capable AI agents carries important implications for researchers, businesses, policymakers, and educators, inviting new lines of inquiry into this rapidly emerging class of AI capabilities.
Automatic Fact-checking in English and Telugu
Chikkala, Ravi Kiran, Anikina, Tatiana, Skachkova, Natalia, Vykopal, Ivan, Agerri, Rodrigo, van Genabith, Josef
False information poses a significant global challenge, and manually verifying claims is a time-consuming and resource-intensive process. In this research paper, we experiment with different approaches to investigate the effectiveness of large language models (LLMs) in classifying factual claims by their veracity and generating justifications in English and Telugu. The key contributions of this work include the creation of a bilingual English-Telugu dataset and the benchmarking of different veracity classification approaches based on LLMs.
SAFT: Structure-Aware Fine-Tuning of LLMs for AMR-to-Text Generation
Kamel, Rafiq, Guerranti, Filippo, Geisler, Simon, Günnemann, Stephan
Large Language Models (LLMs) are increasingly applied to tasks involving structured inputs such as graphs. Abstract Meaning Representations (AMRs), which encode rich semantics as directed graphs, offer a rigorous testbed for evaluating LLMs on text generation from such structures. Yet, current methods often arbitrarily linearize AMRs, discarding key structural cues, or rely on architectures incompatible with standard LLMs. We introduce SAFT, a structure-aware fine-tuning approach that injects graph topology into pretrained LLMs without architectural changes. We compute direction-sensitive positional encodings from the magnetic Laplacian of transformed AMRs and project them into the embedding space of the LLM. While possibly applicable to any graph-structured inputs, we focus on AMR-to-text generation as a representative and challenging benchmark. SAFT sets a new state-of-the-art on AMR 3.0 with a 3.5 BLEU improvement over baselines. Gains scale with graph complexity, highlighting the value of structure-aware representations in enhancing LLM performance. SAFT offers a general and effective pathway for bridging structured data and language models.
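For reference, one common definition of the magnetic Laplacian of a directed graph with adjacency matrix A is sketched below. The unnormalized form and the charge parameter q are assumptions here; the abstract does not state which variant SAFT uses or how the AMR graphs are transformed beforehand.

```latex
\begin{aligned}
A_s &= \tfrac{1}{2}\,\bigl(A + A^{\top}\bigr), \qquad D_s = \operatorname{diag}\bigl(A_s \mathbf{1}\bigr), \\
\Theta^{(q)} &= 2\pi q\,\bigl(A - A^{\top}\bigr), \\
L^{(q)} &= D_s - A_s \odot \exp\!\bigl(i\,\Theta^{(q)}\bigr).
\end{aligned}
```

Because A_s is symmetric and \Theta^{(q)} is antisymmetric, L^{(q)} is Hermitian, so its eigenvalues are real and its complex eigenvectors carry edge direction in their phases. Setting q = 0 recovers the Laplacian of the symmetrized graph, which discards direction.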
Revealing economic facts: LLMs know more than they say
Buckmann, Marcus, Nguyen, Quynh Anh, Hill, Edward
During training, generative large language models (LLMs) are exposed to vast amounts of information, including data relevant to economic modelling, such as geospatial statistics and firm-level financial metrics. If LLMs can effectively retrieve and utilise this knowledge, they could reduce dependence on external data sources that are time-consuming to access, clean, and merge, or that incur financial costs. Moreover, if LLMs accurately represent data, they could support downstream tasks like data imputation and outlier detection. In this study, we evaluate whether and how LLMs can be used for typical economic data processes. Not all knowledge within an LLM may be explicit and retrievable in natural language by prompting the model.
Divided Fed lowers rates, signals pause and one 2026 cut as growth rebounds
U.S. Federal Reserve Chair Jerome Powell speaks during a conference following a two-day meeting of the Federal Open Market Committee, at the U.S. Federal Reserve in Washington on Wednesday. Washington - The Federal Reserve cut interest rates on Wednesday in another divided vote, but signaled it will likely pause further reductions in borrowing costs as the U.S. central bank looks for clearer signals about the direction of the job market and inflation that remains somewhat elevated. New projections issued after the Fed's two-day meeting showed the median policymaker sees just one quarter-percentage-point cut in 2026, the same outlook as in September, with inflation expected to slow to around 2.4% by the end of next year even as economic growth accelerates to an above-trend 2.3% and the unemployment rate remains at a moderate 4.4%. "In considering the extent and timing of additional adjustments to the target range for the federal funds rate, the Committee will carefully assess incoming data," the rate-setting Federal Open Market Committee said in language that in the past has been used to signal a pause in policy actions -- an outlook at odds with market expectations, which remained locked into two rate cuts next year even after the Fed issued its statement.
U.S. military funds AI tools to speed modeling of viral outbreaks
As SARS-CoV-2 radiated across the planet in 2020, epidemiologists scrambled to predict its spread--and its deadly consequences. Often, they turned to models that not only simulate viral transmission and hospitalization rates, but can also predict the effect of interventions: masks, vaccines, or travel bans. But in addition to being computationally intensive, models in epidemiology and other disciplines can be black boxes: millions of lines of legacy code subject to finicky tunings by operators at research organizations scattered around the world. They don't always provide clear guidance. "The models that are used are often kind of brittle and nonexplainable," says Erica Briscoe, who was a program manager for the Automating Scientific Knowledge Extraction and Modeling (ASKEM) project at the Defense Advanced Research Projects Agency (DARPA).