10 Breakthrough Technologies 2026
Our reporters and editors constantly debate which emerging technologies will define the future. Once a year, we take stock and share some educated guesses with our readers. Here are the advances that we think will drive progress or incite the most change, for better or worse, in the years ahead. Rubrik is the exclusive sponsor of the 10 Breakthrough Technologies 2026 and had no editorial influence on this list. Rubrik is a security and AI operations company that aims to secure and accelerate the world's AI transformation.
- Energy (0.99)
- Health & Medicine > Therapeutic Area (0.49)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.33)
Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification
Kolli, Shaghayegh, Rosenbaum, Richard, Cavelius, Timo, Strothe, Lasse, Lata, Andrii, Diesner, Jana
Large language models (LLMs) excel at generating fluent utterances but can lack reliable grounding in verified information. At the same time, knowledge-graph-based fact-checkers deliver precise and interpretable evidence, yet suffer from limited coverage or latency. By integrating LLMs with knowledge graphs and real-time search agents, we introduce a hybrid fact-checking approach that leverages the individual strengths of each component. Our system comprises three autonomous steps: 1) Knowledge Graph (KG) retrieval for rapid one-hop lookups in DBpedia, 2) an LLM-based classification guided by a task-specific labeling prompt, producing outputs with internal rule-based logic, and 3) a Web Search Agent invoked only when KG coverage is insufficient. Our pipeline achieves an F1 score of 0.93 on the Supported/Refuted split of the FEVER benchmark without task-specific fine-tuning. To address Not Enough Information (NEI) cases, we conduct a targeted reannotation study showing that our approach frequently uncovers valid evidence for claims originally labeled NEI, as confirmed by both expert annotators and LLM reviewers. With this paper, we present a modular, open-source fact-checking pipeline with fallback strategies and generalization across datasets.
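The three-step pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the toy knowledge graph stands in for DBpedia, and the classifier and search agent are trivial stubs.

```python
# Minimal sketch of the hybrid pipeline: KG lookup first, web search as
# fallback, then a label decision. All components are illustrative stand-ins.

def kg_lookup(claim, kg):
    """Step 1: one-hop lookup in a toy triple store (stand-in for DBpedia)."""
    return [t for t in kg if t[0] in claim and t[2] in claim]

def web_search(claim):
    """Step 3: fallback web-search agent, stubbed out for this sketch."""
    return []  # returns no extra evidence in this toy example

def classify(claim, evidence):
    """Step 2: stand-in for the LLM labeling prompt; here a trivial rule."""
    return "SUPPORTED" if evidence else "NOT ENOUGH INFO"

def verify(claim, kg):
    evidence = kg_lookup(claim, kg)
    if not evidence:                  # KG coverage insufficient
        evidence = web_search(claim)  # invoke the fallback agent
    return classify(claim, evidence)

kg = [("Paris", "capitalOf", "France")]
print(verify("Paris is the capital of France", kg))  # SUPPORTED
print(verify("Berlin is the capital of Spain", kg))  # NOT ENOUGH INFO
```

The key design point is the ordering: the cheap, interpretable KG lookup runs first, and the slower search agent is invoked only on a coverage miss.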
- North America > Canada (0.05)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Asia > Singapore (0.04)
- (5 more...)
Federated Data Analytics for Cancer Immunotherapy: A Privacy-Preserving Collaborative Platform for Patient Management
Raheem, Mira, Papazoglou, Michael, Krämer, Bernd, El-Tazi, Neamat, Elgammal, Amal
Connected health is a multidisciplinary approach focused on health management, prioritizing patient needs in the creation of tools, services, and treatments. This paradigm ensures proactive and efficient care by facilitating the timely exchange of accurate patient information among all stakeholders in the care continuum. The rise of digital technologies and process innovations promises to enhance connected health by integrating various healthcare data sources. This integration aims to personalize care, predict health outcomes, and streamline patient management, though challenges remain, particularly in data architecture, application interoperability, and security. Data analytics can provide critical insights for informed decision-making and health co-creation, but solutions must prioritize end-users, including patients and healthcare professionals. This perspective was explored through an agile System Development Lifecycle in an EU-funded project aimed at developing an integrated AI-enabled solution for managing cancer patients undergoing immunotherapy. This paper contributes a collaborative digital framework integrating stakeholders across the care continuum, leveraging federated big data analytics and artificial intelligence for improved decision-making while ensuring privacy. Analytical capabilities, such as treatment recommendations and adverse event predictions, were validated using real-life data, achieving 70%-90% accuracy in a pilot study with the medical partners, demonstrating the framework's effectiveness.
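The federated idea at the core of the framework can be illustrated with a toy aggregate: each site computes a summary over its private records, and only those summaries, never the raw records, leave the site. This is a generic sketch of federated aggregation, not the project's implementation.

```python
# Illustrative federated aggregation: hospitals share only (mean, count)
# pairs; the coordinator reconstructs the global mean without raw data.

def local_mean(records):
    """Each site computes a summary statistic over its private records."""
    return sum(records) / len(records), len(records)

def federated_mean(site_summaries):
    """The coordinator combines per-site (mean, count) pairs only."""
    total = sum(mean * n for mean, n in site_summaries)
    count = sum(n for _, n in site_summaries)
    return total / count

site_a = local_mean([1.0, 2.0, 3.0])   # private to hospital A
site_b = local_mean([5.0, 7.0])        # private to hospital B
print(federated_mean([site_a, site_b]))  # 3.6, the mean of all five values
```

Real federated analytics adds secure aggregation and differential-privacy noise on top of this exchange pattern; the data-minimization principle is the same.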
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.05)
- Europe > Spain (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- (8 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Consumer Health (1.00)
- Information Technology > Data Science > Data Mining > Big Data (1.00)
- Information Technology > Communications > Web > Semantic Web (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
If We May De-Presuppose: Robustly Verifying Claims through Presupposition-Free Question Decomposition
Dipta, Shubhashis Roy, Ferraro, Francis
Prior work has shown that presupposition in generated questions can introduce unverified assumptions, leading to inconsistencies in claim verification. Additionally, prompt sensitivity remains a significant challenge for large language models (LLMs), resulting in performance variance as high as 3-6%. While recent advancements have reduced this gap, our study demonstrates that prompt sensitivity remains a persistent issue. To address this, we propose a structured and robust claim verification framework that reasons through presupposition-free, decomposed questions. Extensive experiments across multiple prompts, datasets, and LLMs reveal that even state-of-the-art models remain susceptible to prompt variance and presupposition. Our method consistently mitigates these issues, achieving a 2-5% improvement.
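The contrast the abstract draws can be made concrete: a question like "Why did X win the award?" presupposes that X won, whereas a presupposition-free decomposition asks only neutral yes/no questions about atomic facts. The sketch below illustrates that idea with made-up templates and a toy oracle; it is not the paper's framework.

```python
# Toy presupposition-free decomposition: each sub-question verifies exactly
# one atomic fact and assumes nothing beyond it.

def decompose(claim_facts):
    """Turn atomic (subject, relation, object) facts into neutral yes/no questions."""
    return [f"Is it true that {s} {r} {o}?" for s, r, o in claim_facts]

def verify_claim(claim_facts, answer_fn):
    """A claim is supported only if every decomposed question checks out."""
    return all(answer_fn(q) for q in decompose(claim_facts))

facts = [("Marie Curie", "won", "the 1903 Nobel Prize in Physics"),
         ("Marie Curie", "was born in", "Warsaw")]
attested = set(decompose(facts))  # toy oracle: every sub-question is attested
print(verify_claim(facts, lambda q: q in attested))  # True
```

Because no sub-question embeds an unverified premise, a wrong fact fails its own question rather than silently propagating through later ones.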
- North America > United States > Maryland > Baltimore County (0.14)
- North America > United States > Maryland > Baltimore (0.14)
- Europe > Germany (0.04)
- (12 more...)
- Media (0.68)
- Leisure & Entertainment (0.68)
- Government (0.68)
VeriTrail: Closed-Domain Hallucination Detection with Traceability
Metropolitansky, Dasha, Larson, Jonathan
Even when instructed to adhere to source material, Language Models often generate unsubstantiated content - a phenomenon known as "closed-domain hallucination." This risk is amplified in processes with multiple generative steps (MGS), compared to processes with a single generative step (SGS). However, due to the greater complexity of MGS processes, we argue that detecting hallucinations in their final outputs is necessary but not sufficient: it is equally important to trace where hallucinated content was likely introduced and how faithful content may have been derived from the source through intermediate outputs. To address this need, we present VeriTrail, the first closed-domain hallucination detection method designed to provide traceability for both MGS and SGS processes. We also introduce the first datasets to include all intermediate outputs as well as human annotations of final outputs' faithfulness for their respective MGS processes. We demonstrate that VeriTrail outperforms baseline methods on both datasets.
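The traceability idea can be sketched in a few lines: for each claim in the final output, walk back through the intermediate outputs of a multi-generative-step process and record where it first appears; a claim that never traces back to the source is flagged. This is a hedged illustration of the concept using naive substring matching, not VeriTrail's method.

```python
# Toy provenance walk over an MGS process: stages run from source to final.

def trace(claim, stages):
    """Return the name of the earliest stage containing the claim, else None."""
    for name, text in stages:
        if claim.lower() in text.lower():
            return name
    return None  # unsubstantiated: a likely closed-domain hallucination

stages = [
    ("source", "The bridge opened in 1937."),
    ("summary", "The bridge opened in 1937 and is famous."),
]
print(trace("opened in 1937", stages))  # source
print(trace("painted gold", stages))    # None -> flagged
```

A real system would match claims semantically rather than by substring, but the output shape is the point: each claim carries a stage label showing where it entered the pipeline.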
- Law (0.67)
- Energy > Renewable (0.46)
- Government > Regional Government (0.46)
- Information Technology > Services (0.45)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
All You Need is Sally-Anne: ToM in AI Strongly Supported After Surpassing Tests for 3-Year-Olds
Alon, Nitay, Barnby, Joseph, Mirsky, Reuth, Sarkadi, Stefan
Theory of Mind (ToM) is a hallmark of human cognition, allowing individuals to reason about others' beliefs and intentions. Engineers behind recent advances in Artificial Intelligence (AI) have claimed to demonstrate comparable capabilities. This paper presents a model that surpasses traditional ToM tests designed for 3-year-old children, providing strong support for the presence of ToM in AI systems.
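The logic of the Sally-Anne test the title alludes to is simple to state in code: the subject must predict where Sally will look, which is where she believes the marble is, not where it actually is. The event names below are illustrative.

```python
# The classic Sally-Anne false-belief scenario as a state-tracking exercise.

def sally_anne(events):
    """Track the true marble location and Sally's belief through events."""
    location = belief = None
    sally_present = True
    for event in events:
        if event == "sally_leaves":
            sally_present = False
        elif event.startswith("marble_to_"):
            location = event[len("marble_to_"):]
            if sally_present:
                belief = location  # Sally only updates beliefs she witnesses
    return {"actual": location, "sally_looks_in": belief}

result = sally_anne(["marble_to_basket", "sally_leaves", "marble_to_box"])
print(result)  # {'actual': 'box', 'sally_looks_in': 'basket'}
```

Passing the test means answering "basket" (Sally's false belief) rather than "box" (the true state), which is why the test probes belief attribution rather than world knowledge.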
- Oceania > Australia > Western Australia (0.05)
- Europe > United Kingdom > England > Greater London > London (0.05)
- North America > United States (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)
MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation
Blandón, María Andrea Cruz, Talur, Jayasimha, Charron, Bruno, Liu, Dong, Mansour, Saab, Federico, Marcello
Automatic evaluation of retrieval augmented generation (RAG) systems relies on fine-grained dimensions like faithfulness and relevance, as judged by expert human annotators. Meta-evaluation benchmarks support the development of automatic evaluators that correlate well with human judgement. However, existing benchmarks predominantly focus on English or use translated data, which fails to capture cultural nuances. A native approach provides a better representation of the end user experience. In this work, we develop a Multilingual End-to-end Meta-Evaluation RAG benchmark (MEMERAG). Our benchmark builds on the popular MIRACL dataset, using native-language questions and generating responses with diverse large language models (LLMs), which are then assessed by expert annotators for faithfulness and relevance. We describe our annotation process and show that it achieves high inter-annotator agreement. We then analyse the performance of the answer-generating LLMs across languages according to the human evaluators. Finally, we apply the dataset to our main use case: benchmarking multilingual automatic evaluators (LLM-as-a-judge). We show that our benchmark can reliably identify improvements offered by advanced prompting techniques and LLMs. We will release our benchmark to support the community developing accurate evaluation methods for multilingual RAG systems.
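The meta-evaluation step reduces to scoring an automatic evaluator against expert labels. The sketch below shows the simplest such agreement metric with made-up labels; benchmarks like this typically also report chance-corrected statistics such as Cohen's kappa.

```python
# Scoring an automatic judge against human faithfulness labels.

def agreement(judge_labels, human_labels):
    """Fraction of items where the automatic judge matches the human label."""
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(human_labels)

human = ["faithful", "unfaithful", "faithful", "faithful"]
judge = ["faithful", "unfaithful", "unfaithful", "faithful"]  # stub LLM judge
print(agreement(judge, human))  # 0.75
```

An automatic evaluator is useful for benchmarking only insofar as this agreement (or a correlation analogue) stays high across all of the benchmark's languages, which is precisely what a multilingual meta-evaluation set lets one measure.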
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Spain (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (7 more...)
STRIVE: Structured Reasoning for Self-Improvement in Claim Verification
Gong, Haisong, Li, Jing, Wu, Junfei, Liu, Qiang, Wu, Shu, Wang, Liang
Claim verification is the task of determining whether a claim is supported or refuted by evidence. Self-improvement methods, where reasoning chains are generated and those leading to correct results are selected for training, have succeeded in tasks like mathematical problem solving. However, in claim verification, this approach struggles. Low-quality reasoning chains may falsely match binary truth labels, introducing faulty reasoning into the self-improvement process and ultimately degrading performance. To address this, we propose STRIVE: Structured Reasoning for Self-Improved Verification. Our method introduces a structured reasoning design with Claim Decomposition, Entity Analysis, and Evidence Grounding Verification. These components improve reasoning quality, reduce errors, and provide additional supervision signals for self-improvement. STRIVE begins with a warm-up phase, where the base model is fine-tuned on a small number of annotated examples to learn the structured reasoning design. It is then applied to generate reasoning chains for all training examples, selecting only those that are correct and structurally sound for subsequent self-improvement training. We demonstrate that STRIVE achieves significant improvements over baseline models, with a 31.4% performance gain over the base model and 20.7% over Chain of Thought on the HOVER datasets, highlighting its effectiveness.
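The selection step that distinguishes STRIVE from vanilla self-improvement can be sketched as a filter: a generated chain is kept for training only if its final label is correct and the required structured components are all present, which screens out lucky guesses with faulty reasoning. Field names below are illustrative, not the paper's schema.

```python
# Filtering reasoning chains for self-improvement training: keep only chains
# that are both correct and structurally sound.

REQUIRED = {"claim_decomposition", "entity_analysis", "evidence_grounding"}

def select_for_training(chains, gold_label):
    """Drop chains that merely guess the label without the required structure."""
    kept = []
    for chain in chains:
        correct = chain["label"] == gold_label
        structured = REQUIRED <= set(chain["steps"])
        if correct and structured:
            kept.append(chain)
    return kept

chains = [
    {"label": "SUPPORTED", "steps": ["claim_decomposition", "entity_analysis",
                                     "evidence_grounding"]},
    {"label": "SUPPORTED", "steps": ["entity_analysis"]},     # lucky guess
    {"label": "REFUTED",   "steps": ["claim_decomposition", "entity_analysis",
                                     "evidence_grounding"]},  # wrong label
]
print(len(select_for_training(chains, "SUPPORTED")))  # 1
```

The structural check is what supplies the extra supervision signal the abstract mentions: with only binary labels, the middle chain above would have been kept and its faulty reasoning amplified.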
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hubei Province > Wuhan (0.05)
- North America > United States > Pennsylvania (0.04)
- (10 more...)
- Leisure & Entertainment > Sports (0.95)
- Media (0.95)
VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records
Chung, Philip, Swaminathan, Akshay, Goodell, Alex J., Kim, Yeasul, Reincke, S. Momsen, Han, Lichy, Deverett, Ben, Sadeghi, Mohammad Amin, Ariss, Abdel-Badih, Ghanem, Marc, Seong, David, Lee, Andrew A., Coombes, Caitlin E., Bradshaw, Brad, Sufian, Mahir A., Hong, Hyo Jung, Nguyen, Teresa P., Rasouli, Mohammad R., Kamra, Komal, Burbridge, Mark A., McAvoy, James C., Saffary, Roya, Ma, Stephen P., Dash, Dev, Xie, James, Wang, Ellen Y., Schmiesing, Clifford A., Shah, Nigam, Aghaeepour, Nima
Methods to ensure factual accuracy of text generated by large language models (LLMs) in clinical medicine are lacking. VeriFact is an artificial intelligence system that combines retrieval-augmented generation and LLM-as-a-Judge to verify whether LLM-generated text is factually supported by a patient's medical history based on their electronic health record (EHR). To evaluate this system, we introduce VeriFact-BHC, a new dataset that decomposes Brief Hospital Course narratives from discharge summaries into a set of simple statements with clinician annotations for whether each statement is supported by the patient's EHR clinical notes. Whereas the highest agreement between clinicians was 88.5%, VeriFact achieves up to 92.7% agreement when compared to a denoised and adjudicated average human clinician ground truth, suggesting that VeriFact exceeds the average clinician's ability to fact-check text against a patient's medical record. VeriFact may accelerate the development of LLM-based EHR applications by removing current evaluation bottlenecks.
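The decompose-retrieve-judge flow described in the abstract can be sketched generically: split generated text into simple statements, retrieve related notes, and have a judge mark each statement as supported. In this hedged sketch the retriever and judge are naive word-overlap stand-ins for the system's embedding retrieval and LLM-as-a-Judge.

```python
# Toy decompose-retrieve-judge pipeline over synthetic clinical notes.

def split_statements(text):
    """Decompose generated text into simple period-delimited statements."""
    return [s.strip() for s in text.split(".") if s.strip()]

def retrieve(statement, notes, k=1):
    """Rank notes by naive word overlap with the statement (stand-in retriever)."""
    words = set(statement.lower().split())
    return sorted(notes, key=lambda n: -len(words & set(n.lower().split())))[:k]

def judge(statement, note):
    """Stand-in judge: supported if most statement words appear in the note."""
    words = set(statement.lower().split())
    return len(words & set(note.lower().split())) / len(words) >= 0.5

def verify_text(text, notes):
    """Map each statement to a supported/unsupported verdict."""
    return {s: any(judge(s, n) for n in retrieve(s, notes))
            for s in split_statements(text)}

notes = ["Patient admitted with pneumonia", "Treated with antibiotics"]
print(verify_text("Patient admitted with pneumonia. Patient had surgery.", notes))
```

The per-statement output format mirrors how the dataset is annotated: clinicians label each simple statement, so system and human verdicts can be compared statement by statement.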
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations
One of the most promising applications of machine learning (ML) in computational physics is to accelerate the solution of partial differential equations (PDEs). The key objective of ML-based PDE solvers is to output a sufficiently accurate solution faster than standard numerical methods, which are used as a baseline comparison. We first perform a systematic review of the ML-for-PDE solving literature. Of articles that use ML to solve a fluid-related PDE and claim to outperform a standard numerical method, we determine that 79% (60/76) compare to a weak baseline. Second, we find evidence that reporting biases, especially outcome reporting bias and publication bias, are widespread. We conclude that ML-for-PDE solving research is overoptimistic: weak baselines lead to overly positive results, while reporting biases lead to underreporting of negative results. To a large extent, these issues appear to be caused by factors similar to those of past reproducibility crises: researcher degrees of freedom and a bias towards positive results. We call for bottom-up cultural changes to minimize biased reporting as well as top-down structural reforms intended to reduce perverse incentives for doing so.
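The paper's central point can be illustrated with a two-line calculation: a reported "speedup" depends entirely on the baseline, so a win over a weak (untuned, over-resolved) solver can vanish against a strong one at the same accuracy target. The runtimes below are invented for illustration.

```python
# How baseline choice flips a headline result. Numbers are made up.

def speedup(baseline_seconds, ml_seconds):
    """Ratio > 1 means the ML surrogate is faster than the baseline."""
    return baseline_seconds / ml_seconds

ml_solver = 2.0        # runtime of a hypothetical ML surrogate
weak_baseline = 20.0   # e.g. an unoptimized solver run at excess resolution
strong_baseline = 1.5  # a tuned numerical method at the same accuracy target

print(speedup(weak_baseline, ml_solver))    # 10.0 -> looks like a win
print(speedup(strong_baseline, ml_solver))  # 0.75 -> actually slower
```

This is why the review insists on comparisons against efficient, well-tuned numerical methods at matched accuracy: both rows describe the same ML solver, and only the baseline changed.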
- Health & Medicine (1.00)
- Energy > Oil & Gas > Upstream (1.00)
- Government > Regional Government > North America Government > United States Government (0.45)