Eisenstadt
Re-FRAME the Meeting Summarization SCOPE: Fact-Based Summarization and Personalization via Questions
Kirstein, Frederic, Kumar, Sonu, Ruas, Terry, Gipp, Bela
Meeting summarization with large language models (LLMs) remains error-prone, often producing outputs with hallucinations, omissions, and irrelevancies. We present FRAME, a modular pipeline that reframes summarization as a semantic enrichment task. FRAME extracts and scores salient facts, organizes them thematically, and uses these to enrich an outline into an abstractive summary. To personalize summaries, we introduce SCOPE, a reason-out-loud protocol that has the model build a reasoning trace by answering nine questions before content selection. For evaluation, we propose P-MESA, a multi-dimensional, reference-free evaluation framework to assess if a summary fits a target reader. P-MESA reliably identifies error instances, achieving >= 89% balanced accuracy against human annotations and strongly aligns with human severity ratings (r >= 0.70). On QMSum and FAME, FRAME reduces hallucination and omission by 2 out of 5 points (measured with MESA), while SCOPE improves knowledge fit and goal alignment over prompt-only baselines. Our findings advocate for rethinking summarization to improve control, faithfulness, and personalization.
- Europe > Germany > Lower Saxony > Gottingen (0.50)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Singapore (0.04)
- (23 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding
Sandler, Jameson, Christopher, Jacob K., Hartvigsen, Thomas, Fioretto, Ferdinando
Speculative decoding has become the standard approach for accelerating Large Language Model (LLM) inference. It exploits a lossless draft-then-verify procedure to circumvent the latency of autoregressive decoding, achieving impressive speed-ups. Yet, current speculative decoding approaches remain limited by two fundamental bottlenecks: (1) the autoregressive dependency during drafting which limits parallelism, and (2) frequent rejections of draft tokens caused by misalignment between the draft and verify models. This paper proposes SpecDiff-2, a novel framework to jointly address these two bottlenecks. It leverages discrete diffusion as a non-autoregressive drafter to address bottleneck (1) and develops novel techniques to calibrate discrete diffusion drafters with autoregressive verifiers, addressing bottleneck (2). Experimental results across a comprehensive benchmark suite show that SpecDiff-2 achieves a new state-of-the-art across reasoning, coding, and mathematical benchmarks, improving tokens-per-second by up to an average of +55% over previous baselines and obtaining up to 5.5x average speed-up over standard decoding, without any loss of accuracy.
- North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > Austria > Burgenland > Eisenstadt (0.04)
The QCET Taxonomy of Standard Quality Criterion Names and Definitions for the Evaluation of NLP Systems
Belz, Anya, Mille, Simon, Thomson, Craig
Prior work has shown that two NLP evaluation experiments that report results for the same quality criterion name (e.g. Fluency) do not necessarily evaluate the same aspect of quality, and the comparability implied by the name can be misleading. Not knowing when two evaluations are comparable in this sense means we currently lack the ability to draw reliable conclusions about system quality on the basis of multiple, independently conducted evaluations. This in turn hampers the ability of the field to progress scientifically as a whole, a pervasive issue in NLP since its beginning (Sparck Jones, 1981). It is hard to see how the issue of unclear comparability can be fully addressed other than by the creation of a standard set of quality criterion names and definitions that the several hundred quality criterion names actually in use in the field can be mapped to, and grounded in. Taking a strictly descriptive approach, the QCET Quality Criteria for Evaluation Taxonomy derives a standard set of quality criterion names and definitions from three surveys of evaluations reported in NLP, and structures them into a hierarchy where each parent node captures common aspects of its child nodes. We present QCET and the resources it consists of, and discuss its three main uses in (i) establishing comparability of existing evaluations, (ii) guiding the design of new evaluations, and (iii) assessing regulatory compliance.
- Research Report (1.00)
- Overview (0.67)
- Law (1.00)
- Government (1.00)
- Leisure & Entertainment (0.92)
- Health & Medicine > Therapeutic Area (0.46)
Adaptive Overclocking: Dynamic Control of Thinking Path Length via Real-Time Reasoning Signals
Jiang, Shuhao, Wang, Songbo, Qiao, Yang, Xu, Chun, Zheng, Chaoyang, Zhou, Shengyi, Wang, Huanjun, Li, Fangming, Zhang, Cong, Wang, Jiyu
Large Reasoning Models (LRMs) often suffer from computational inefficiency due to overthinking, where a fixed reasoning budget fails to match the varying complexity of tasks. To address this issue, we propose Adaptive Overclocking, a method that makes the overclocking hyperparameter $α$ dynamic and context-aware. Our method adjusts reasoning speed in real time through two complementary signals: (1) token-level model uncertainty for fine-grained step-wise control, and (2) input complexity estimation for informed initialization. We implement this approach with three strategies: Uncertainty-Aware Alpha Scheduling (UA-$α$S), Complexity-Guided Alpha Initialization (CG-$α$I), and a Hybrid Adaptive Control (HAC) that combines both. Experiments on GSM8K, MATH, and SVAMP show that HAC achieves superior accuracy-latency trade-offs, reducing unnecessary computation on simple problems while allocating more resources to challenging ones. By mitigating overthinking, Adaptive Overclocking enhances both efficiency and overall reasoning performance.
- Europe > Austria > Burgenland > Eisenstadt (0.05)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
Yue, Linan, Du, Yichao, Wang, Yizhi, Gao, Weibo, Yao, Fangzhou, Wang, Li, Liu, Ye, Xu, Ziyu, Liu, Qi, Di, Shimin, Zhang, Min-Ling
Recently, Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks. Among them, DeepSeek R1 has garnered significant attention for its exceptional performance and open-source nature, driving advancements in the research of R1-style LRMs. Unlike traditional Large Language Models (LLMs), these models enhance logical deduction and decision-making capabilities during reasoning by incorporating mechanisms such as long chain-of-thought and self-reflection through reinforcement learning. However, with the widespread application of these models, the problem of overthinking has gradually emerged. Specifically, when generating answers, these models often construct excessively long reasoning chains with redundant or repetitive steps, which leads to reduced reasoning efficiency and may affect the accuracy of the final answer. To this end, various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability. By reviewing the current research advancements in the field of efficient reasoning methods systematically, we categorize existing works into two main directions based on the lens of single-model optimization versus model collaboration: (1) Efficient Reasoning with Single Model, which focuses on improving the reasoning efficiency of individual models; and (2) Efficient Reasoning with Model Collaboration, which explores optimizing reasoning paths through collaboration among multiple models. Besides, we maintain a public GitHub repository that tracks the latest progress in efficient reasoning methods.
- Europe > Austria > Vienna (0.14)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Europe > Austria > Burgenland > Eisenstadt (0.04)
- (2 more...)
Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey
Large reasoning models (LRMs) like OpenAI o1 and DeepSeek R1 have demonstrated impressive performance on complex reasoning tasks like mathematics and programming with long Chain-of-Thought (CoT) reasoning sequences (slow-thinking), compared with traditional large language models (fast-thinking). However, these reasoning models also face a huge challenge that generating unnecessarily lengthy and redundant reasoning chains even for trivial questions. This phenomenon leads to a significant waste of inference resources, increases the response time for simple queries, and hinders the practical application of LRMs in real-world products. To this end, it is crucial to shorten lengthy reasoning chains and learn adaptive reasoning between fast and slow thinking based on input difficulty. In this survey, we provide a comprehensive overview of recent progress in concise and adaptive thinking for efficient reasoning of LRMs, including methodologies, benchmarks, and challenges for future exploration. We hope this survey can help researchers quickly understand the landscape of this field and inspire novel adaptive thinking ideas to facilitate better usage of LRMs.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Europe > Austria > Burgenland > Eisenstadt (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Temporal Analysis of Climate Policy Discourse: Insights from Dynamic Embedded Topic Modeling
Badekale, Rafiu Adekoya, Akinfaderin, Adewale
Understanding how policy language evolves over time is critical for assessing global responses to complex challenges such as climate change. Temporal analysis helps stakeholders, including policymakers and researchers, to evaluate past priorities, identify emerging themes, design governance strategies, and develop mitigation measures. Traditional approaches, such as manual thematic coding, are time-consuming and limited in capturing the complex, interconnected nature of global policy discourse. With the increasing relevance of unsupervised machine learning, these limitations can be addressed, particularly under high-volume, complex, and high-dimensional data conditions. In this work, we explore a novel approach that applies the dynamic embedded topic model (DETM) to analyze the evolution of global climate policy discourse. A probabilistic model designed to capture the temporal dynamics of topics over time. We collected a corpus of United Nations Framework Convention on Climate Change (UNFCCC) policy decisions from 1995 to 2023, excluding 2020 due to the postponement of COP26 as a result of the COVID-19 pandemic. The model reveals shifts from early emphases on greenhouse gases and international conventions to recent focuses on implementation, technical collaboration, capacity building, finance, and global agreements. Section 3 presents the modeling pipeline, including preprocessing, model training, and visualization of temporal word distributions. Our results show that DETM is a scalable and effective tool for analyzing the evolution of global policy discourse. Section 4 discusses the implications of these findings and we concluded with future directions and refinements to extend this approach to other policy domains.
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (12 more...)
- Law > Environmental Law (1.00)
- Government (1.00)
- Energy > Energy Policy (1.00)
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Eisenstadt, Roy, Zimerman, Itamar, Wolf, Lior
Recently, techniques such as explicit structured reasoning have demonstrated strong test-time scaling behavior by enforcing a separation between the model's internal "thinking" process and the final response. A key factor influencing answer quality in this setting is the length of the thinking stage. When the reasoning is too short, the model may fail to capture the complexity of the task. Conversely, when it is too long, the model may overthink, leading to unnecessary computation and degraded performance. This paper explores and exploits the underlying mechanisms by which LLMs understand and regulate the length of their reasoning during explicit thought processes. First, we show that LLMs encode their progress through the reasoning process and introduce an interactive progress bar visualization, which is then used to reveal insights on the model's planning dynamics. Second, we manipulate the internal progress encoding during inference to reduce unnecessary steps and generate a more concise and decisive chain of thoughts. Our empirical results demonstrate that this "overclocking" method mitigates overthinking, improves answer accuracy, and reduces inference latency. Our code is publicly available.
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
- Europe > Austria > Burgenland > Eisenstadt (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
SPEAR: Security Posture Evaluation using AI Planner-Reasoning on Attack-Connectivity Hypergraphs
Podder, Rakesh, Caglar, Turgay, Bashir, Shadaab Kawnain, Sreedharan, Sarath, Ray, Indrajit, Ray, Indrakshi
Graph-based frameworks are often used in network hardening to help a cyber defender understand how a network can be attacked and how the best defenses can be deployed. However, incorporating network connectivity parameters in the attack graph, reasoning about the attack graph when we do not have access to complete information, providing system administrator suggestions in an understandable format, and allowing them to do what-if analysis on various scenarios and attacker motives is still missing. We fill this gap by presenting SPEAR, a formal framework with tool support for security posture evaluation and analysis that keeps human-in-the-loop. SPEAR uses the causal formalism of AI planning to model vulnerabilities and configurations in a networked system. It automatically converts network configurations and vulnerability descriptions into planning models expressed in the Planning Domain Definition Language (PDDL). SPEAR identifies a set of diverse security hardening strategies that can be presented in a manner understandable to the domain expert. These allow the administrator to explore the network hardening solution space in a systematic fashion and help evaluate the impact and compare the different solutions.
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > Suffolk County > Stony Brook (0.05)
- North America > United States > Colorado > Larimer County > Fort Collins (0.04)
- (12 more...)
AI-Driven Climate Policy Scenario Generation for Sub-Saharan Africa
Badekale, Rafiu Adekoya, Akinfaderin, Adewale
Climate policy scenario generation and evaluation have traditionally relied on integrated assessment models (IAMs) and expert-driven qualitative analysis. These methods enable stakeholders, such as policymakers and researchers, to anticipate impacts, plan governance strategies, and develop mitigation measures. However, traditional methods are often time-intensive, reliant on simple extrapolations of past trends, and limited in capturing the complex and interconnected nature of energy and climate issues. With the advent of artificial intelligence (AI), particularly generative AI models trained on vast datasets, these limitations can be addressed, ensuring robustness even under limited data conditions. In this work, we explore the novel method that employs generative AI, specifically large language models (LLMs), to simulate climate policy scenarios for Sub-Saharan Africa. These scenarios focus on energy transition themes derived from the historical United Nations Climate Change Conference (COP) documents. By leveraging generative models, the project aims to create plausible and diverse policy scenarios that align with regional climate goals and energy challenges. Given limited access to human evaluators, automated techniques were employed for scenario evaluation. We generated policy scenarios using the llama3.2-3B model. Of the 34 generated responses, 30 (88%) passed expert validation, accurately reflecting the intended impacts provided in the corresponding prompts. We compared these validated responses against assessments from a human climate expert and two additional LLMs (gemma2-2B and mistral-7B). Our structured, embedding-based evaluation framework shows that generative AI effectively generate scenarios that are coherent, relevant, plausible, and diverse. This approach offers a transformative tool for climate policy planning in data-constrained regions.
- Africa > Sub-Saharan Africa (0.66)
- Europe > Austria > Vienna (0.14)
- North America > Mexico > Mexico City > Mexico City (0.04)
- (13 more...)
- Law > Environmental Law (1.00)
- Government (1.00)
- Energy > Renewable (1.00)
- (2 more...)