AITopics

Speech summarization has become an essential tool for efficiently managing and accessing the growing volume of spoken and audiovisual content. However, despite its increasing importance, speech summarization remains loosely defined. The field intersects with several research areas, including speech recognition, text summarization, and specific applications like meeting summarization. This survey not only examines existing datasets and evaluation protocols, which are crucial for assessing the quality of summarization approaches, but also synthesizes recent developments in the field, highlighting the shift from traditional systems to advanced models like fine-tuned cascaded architectures and end-to-end solutions. In doing so, we surface the ongoing challenges, such as the need for realistic evaluation benchmarks, multilingual datasets, and long-context handling.

computational linguistic, large language model, machine learning, (20 more...)

2504.08024

Country:

Europe (1.00)
Asia > Middle East > UAE (0.46)
North America > United States > Maryland (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Media (1.00)
Health & Medicine (1.00)
Education (1.00)
Leisure & Entertainment (0.92)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Shakhadri, Syed Abdul Gaffar, KR, Kruthika, Angadi, Kartik Basavaraj

Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI

We introduce Shakti VLM, a family of vision-language models in the capacity of 1B and 4B parameters designed to address data efficiency challenges in multimodal learning. While recent VLMs achieve strong performance through extensive training data, Shakti models leverage architectural innovations to attain competitive results with fewer tokens. Key advancements include QK-Normalization for attention stability, hybrid normalization techniques, and enhanced positional encoding. A three-stage training strategy further optimizes learning efficiency. Evaluations show that Shakti-Shakti-VLM-1B and Shakti-VLM-4B excel in document understanding, Visual Reasoning, OCR extraction, and general multimodal reasoning. Our results highlight that high performance can be achieved through model design and training strategy rather than sheer data volume, making Shakti an efficient solution for enterprise-scale multimodal tasks.

artificial intelligence, machine learning, natural language, (17 more...)

2502.17092

Genre: Research Report > New Finding (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction

Yu, Simon, Li, Gang, Shi, Weiyan, Qi, Peng

Large language models (LLMs) are moving beyond static uses and are now powering agents that learn continually during their interaction with external environments. For example, agents can learn reusable skills while navigating web pages or toggling new tools. However, existing methods for skill learning often create skills that are over-specialized to a single website and fail to generalize. We introduce PolySkill, a new framework that enables agents to learn generalizable and compositional skills. The core idea, inspired by polymorphism in software engineering, is to decouple a skill's abstract goal (what it accomplishes) and its concrete implementation (how it is executed). Experiments show that our method (1) improves skill reuse by 1.7x on seen websites and (2) boosts success rates by up to 9.4% on Mind2Web and 13.9% on unseen websites, while reducing steps by over 20%. (3) In self-exploration settings without specified tasks, our framework improves the quality of proposed tasks and enables agents to learn generalizable skills that work across different sites. By enabling the agent to identify and refine its own goals, the PolySkill enhances the agent's ability to learn a better curriculum, leading to the acquisition of more generalizable skills compared to baseline methods. This work provides a practical path toward building agents capable of continual learning in adaptive environments. Our findings show that separating a skill's goal from its execution is a crucial step toward developing autonomous agents that can learn and generalize across the open web continuously.

large language model, machine learning, natural language, (21 more...)

2510.15863

Genre: Research Report > New Finding (1.00)

Industry:

Retail (1.00)
Education (1.00)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation

Yang, Gao, Liu, Yuhang, Miao, Siyu, Liang, Xinyue, Liu, Zhengyang, Huang, Heyan

Ideal or real - that is the question.In this work, we explore whether principles from game theory can be effectively applied to the evaluation of large language models (LLMs). This inquiry is motivated by the growing inadequacy of conventional evaluation practices, which often rely on fixed-format tasks with reference answers and struggle to capture the nuanced, subjective, and open-ended nature of modern LLM behavior. To address these challenges, we propose a novel alternative: automatic mutual evaluation, where LLMs assess each other's output through self-play and peer review. These peer assessments are then systematically compared with human voting behavior to evaluate their alignment with human judgment. Our framework incorporates game-theoretic voting algorithms to aggregate peer reviews, enabling a principled investigation into whether model-generated rankings reflect human preferences. Empirical results reveal both convergences and divergences between theoretical predictions and human evaluations, offering valuable insights into the promises and limitations of mutual evaluation. To the best of our knowledge, this is the first work to jointly integrate mutual evaluation, game-theoretic aggregation, and human-grounded validation for evaluating the capabilities of LLMs.

artificial intelligence, large language model, natural language, (19 more...)

2510.15746

Country:

North America > United States (0.68)
Asia (0.67)
Europe (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Government (0.48)
Education (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Alinejad, Mahyar, Velasquez, Alvaro, Wang, Yue, Atia, George

RLAF: Reinforcement Learning from Automaton Feedback

Reinforcement Learning (RL) in environments with complex, history-dependent reward structures poses significant challenges for traditional methods. In this work, we introduce a novel approach that leverages automaton-based feedback to guide the learning process, replacing explicit reward functions with preferences derived from a deterministic finite automaton (DFA). Unlike conventional approaches that use automata for direct reward specification, our method employs the structure of the DFA to generate preferences over trajectories that are used to learn a reward function, eliminating the need for manual reward engineering. Our framework introduces a static approach that uses the learned reward function directly for policy optimization and a dynamic approach that involves continuous refining of the reward function and policy through iterative updates until convergence. Our experiments in both discrete and continuous environments demonstrate that our approach enables the RL agent to learn effective policies for tasks with temporal dependencies, outperforming traditional reward engineering and automaton-based baselines such as reward machines and LTL-guided methods. Our results highlight the advantages of automaton-based preferences in handling non-Markovian rewards, offering a scalable, efficient, and human-independent alternative to traditional reward modeling. We also provide a convergence guarantee showing that under standard assumptions our automaton-guided preference-based framework learns a policy that is near-optimal with respect to the true non-Markovian objective.

machine learning, reinforcement learning, trajectory, (15 more...)

2510.15728

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.66)

Industry:

Leisure & Entertainment (0.46)
Transportation (0.46)
Education (0.46)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ortega, Tomas, Jafarkhani, Hamid

Decentralized Parameter-Free Online Learning

We propose the first parameter-free decentralized online learning algorithms with network regret guarantees, which achieve sublinear regret without requiring hyperparameter tuning. This family of algorithms connects multi-agent coin-betting and decentralized online learning via gossip steps. To enable our decentralized analysis, we introduce a novel "betting function" formulation for coin-betting that simplifies the multi-agent regret analysis. Our analysis shows sublinear network regret bounds and is validated through experiments on synthetic and real datasets. This family of algorithms is applicable to distributed sensing, decentralized optimization, and collaborative ML applications.

algorithm, artificial intelligence, machine learning, (16 more...)

2510.15644

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.91)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.91)

Lepora, Jared K., Li, Haoran, Psomopoulou, Efi, Lepora, Nathan F.

Educational SoftHand-A: Building an Anthropomorphic Hand with Soft Synergies using LEGO MINDSTORMS

Abstract-- This paper introduces an anthropomorphic robot hand built entirely using LEGO MINDSTORMS: the Educational SoftHand-A, a tendon-driven, highly-underactuated robot hand based on the Pisa/IIT SoftHand and related hands. T o be suitable for an educational context, the design is constrained to use only standard LEGO pieces with tests using common equipment available at home. The hand features dual motors driving an agonist/antagonist opposing pair of tendons on each finger, which are shown to result in reactive fine control. The finger motions are synchonized through soft synergies, implemented with a differential mechanism using clutch gears. Altogether, this design results in an anthropomorphic hand that can adaptively grasp a broad range of objects using a simple actuation and control mechanism. Since the hand can be constructed from LEGO pieces and uses state-of-the-art design concepts for robotic hands, it has the potential to educate and inspire children to learn about the frontiers of modern robotics.

artificial intelligence, educational softhand-a, tendon, (14 more...)

2510.15638

Genre: Research Report (0.50)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.93)

KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models

Kim, Dongjun, Park, Chanhee, Park, Chanjun, Lim, Heuiseok

The instruction-following capabilities of large language models (LLMs) are pivotal for numerous applications, from conversational agents to complex reasoning systems. However, current evaluations predominantly focus on English models, neglecting the linguistic and cultural nuances of other languages. Specifically, Korean, with its distinct syntax, rich morphological features, honorific system, and dual numbering systems, lacks a dedicated benchmark for assessing open-ended instruction-following capabilities. To address this gap, we introduce the Korean Instruction-following Task Evaluation (KITE), a comprehensive benchmark designed to evaluate both general and Korean-specific instructions. Unlike existing Korean benchmarks that focus mainly on factual knowledge or multiple-choice testing, KITE directly targets diverse, open-ended instruction-following tasks. Our evaluation pipeline combines automated metrics with human assessments, revealing performance disparities across models and providing deeper insights into their strengths and weaknesses. By publicly releasing the KITE dataset and code, we aim to foster further research on culturally and linguistically inclusive LLM development and inspire similar endeavors for other underrepresented languages.

artificial intelligence, large language model, natural language, (17 more...)

2510.15558

Genre: Research Report > New Finding (0.68)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Revisiting Knowledge Distillation: The Hidden Role of Dataset Size

Lanzillotta, Giulia, Sarnthein, Felix, Kur, Gil, Hofmann, Thomas, He, Bobby

The concept of knowledge distillation (KD) describes the training of a student model from a teacher model and is a widely adopted technique in deep learning. However, it is still not clear how and why distillation works. Previous studies focus on two central aspects of distillation: model size, and generalisation. In this work we study distillation in a third dimension: dataset size. We present a suite of experiments across a wide range of datasets, tasks and neural architectures, demonstrating that the effect of distillation is not only preserved but amplified in low-data regimes. We call this newly discovered property the data efficiency of distillation. Equipped with this new perspective, we test the predictive power of existing theories of KD as we vary the dataset size. Our results disprove the hypothesis that distillation can be understood as label smoothing, and provide further evidence in support of the dark knowledge hypothesis. Finally, we analyse the impact of modelling factors such as the objective, scale and relative number of samples on the observed phenomenon. Ultimately, this work reveals that the dataset size may be a fundamental but overlooked variable in the mechanisms underpinning distillation.

distillation, machine learning, natural language, (19 more...)

2510.15516

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Kang, Shijia, Zhang, Muhan

The Road Less Traveled: Enhancing Exploration in LLMs via Sequential Sampling

Reinforcement learning (RL) has been pivotal in enhancing the reasoning capabilities of large language models (LLMs), but it often suffers from limited exploration and entropy collapse, where models exploit a narrow set of solutions, leading to a loss of sampling diversity and subsequently preventing RL from further improving performance. This issue is exacerbated in parallel sampling methods, where multiple outputs are drawn from the same distribution, potentially causing the model to converge to similar solutions. We propose SESA, a novel SEquential SAmpling framework that mitigates this challenge by generating diverse solution sketches sequentially before expanding them into full reasoning paths. This approach ensures broader exploration by conditioning each new output on previous ones, promoting diversity throughout the process and preventing policy collapse. Our experiments on a synthetic task show that sequential sampling consistently outperforms traditional RL methods in terms of path diversity and recovery from collapse. Further evaluations on real-world tasks demonstrate that SESA improves both the exploration of valid strategies and the overall performance of LLMs. On three agent benchmarks, SESA lifts success rates by $+0.25$, $+0.42$, and $+0.07$ absolute over the base model (up to an additional $211\%$ relative improvement over baseline RL), underscoring its exploration advantage. This work introduces a structured approach to exploration, paving the way for more effective and diverse reasoning in RL-trained LLMs. Our code is released at https://github.com/MuLabPKU/sesa.

large language model, machine learning, natural language, (19 more...)

2510.15502

Country:

Asia > China > Guangdong Province (0.29)
Asia > China > Jiangsu Province (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)