AITopics | generation policy

Collaborating Authors

generation policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

StreamUni: Achieving Streaming Speech Translation with a Unified Large Speech-Language Model

Guo, Shoutao, Li, Xiang, Liu, Mengge, Chen, Wei, Feng, Yang

arXiv.org Artificial IntelligenceJul-15-2025

Streaming speech translation (StreamST) requires determining appropriate timing, known as policy, to generate translations while continuously receiving source speech inputs, balancing low latency with high translation quality. However, existing StreamST methods typically operate on sentence-level speech segments, referred to as simultaneous speech translation (SimulST). In practice, they require collaboration with segmentation models to accomplish StreamST, where the truncated speech segments constrain SimulST models to make policy decisions and generate translations based on limited contextual information. Moreover, SimulST models struggle to learn effective policies due to the complexity of speech inputs and cross-lingual generation. To address these challenges, we propose StreamUni, which achieves StreamST through a unified Large Speech-Language Model (LSLM). Specifically, StreamUni incorporates speech Chain-of-Thought (CoT) in guiding the LSLM to generate multi-stage outputs. Leveraging these multi-stage outputs, StreamUni simultaneously accomplishes speech segmentation, policy decision, and translation generation, completing StreamST without requiring massive policy-specific training. Additionally, we propose a streaming CoT training method that enhances low-latency policy decisions and generation capabilities using limited CoT data. Experiments demonstrate that our approach achieves state-of-the-art performance on StreamST tasks.

artificial intelligence, natural language, translation, (19 more...)

arXiv.org Artificial Intelligence

2507.07803

Country:

Asia (1.00)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models

Singh, Joykirat, Chakraborty, Tanmoy, Nambi, Akshay

arXiv.org Artificial IntelligenceMar-4-2025

Large language models (LLMs) have significantly improved their reasoning capabilities; however, they still struggle with complex multi-step mathematical problem-solving due to error propagation, lack of self-correction, and limited adaptability to diverse reasoning styles. Existing methods rely on static fine-tuning or prompt engineering, which fail to generalize across problem complexities, while the scarcity of high-quality preference data further hinders reliable reasoning. We introduce SPHERE, a self-evolving data generation pipeline that enhances reasoning in small language models (SLMs) by iteratively generating, correcting, and diversifying reasoning chains. SPHERE operates in three stages: (i) Self-Generation, where the model autonomously constructs problem-solving steps; (ii) Self-Correction, enabling it to identify and rectify errors; and (iii) Diversity Induction, improving robustness through multiple valid reasoning trajectories. This self-evolution mechanism strengthens mathematical reasoning and enhances model reliability. Evaluations on MATH 500, GSM8K, AIME, AMC, and Olympiad show that SPHERE-trained models achieve significant gains over their base versions and match/surpass GPT-4o on certain benchmarks. Our findings demonstrate that self-evolving models can close the reasoning gap between SLMs and state-of-the-art LLMs, making mathematical AI more reliable, scalable, and efficient.

qwen2, reasoning, zhang, (16 more...)

arXiv.org Artificial Intelligence

2503.04813

Country:

Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
Europe > Monaco (0.04)
Asia > India > NCT > Delhi (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

Add feedback

Edge Generation Scheduling for DAG Tasks Using Deep Reinforcement Learning

Sun, Binqi, Theile, Mirco, Qin, Ziyuan, Bernardini, Daniele, Roy, Debayan, Bastoni, Andrea, Caccamo, Marco

arXiv.org Artificial IntelligenceJan-10-2024

Directed acyclic graph (DAG) tasks are currently adopted in the real-time domain to model complex applications from the automotive, avionics, and industrial domains that implement their functionalities through chains of intercommunicating tasks. This paper studies the problem of scheduling real-time DAG tasks by presenting a novel schedulability test based on the concept of trivial schedulability. Using this schedulability test, we propose a new DAG scheduling framework (edge generation scheduling -- EGS) that attempts to minimize the DAG width by iteratively generating edges while guaranteeing the deadline constraint. We study how to efficiently solve the problem of generating edges by developing a deep reinforcement learning algorithm combined with a graph representation neural network to learn an efficient edge generation policy for EGS. We evaluate the effectiveness of the proposed algorithm by comparing it with state-of-the-art DAG scheduling heuristics and an optimal mixed-integer linear programming baseline. Experimental results show that the proposed algorithm outperforms the state-of-the-art by requiring fewer processors to schedule the same DAG tasks. The code is available at https://github.com/binqi-sun/egs.

dag task, node, processor, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TC.2024.3350243

2308.14647

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Italy (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Transportation > Air (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

NeSIG: A Neuro-Symbolic Method for Learning to Generate Planning Problems

Núñez-Molina, Carlos, Mesejo, Pablo, Fernández-Olivares, Juan

arXiv.org Artificial IntelligenceJan-24-2023

In the field of Automated Planning there is often the need for a set of planning problems from a particular domain, e.g., to be used as training data for Machine Learning or as benchmarks in planning competitions. In most cases, these problems are created either by hand or by a domain-specific generator, putting a burden on the human designers. In this paper we propose NeSIG, to the best of our knowledge the first domain-independent method for automatically generating planning problems that are valid, diverse and difficult to solve. We formulate problem generation as a Markov Decision Process and train two generative policies with Deep Reinforcement Learning to generate problems with the desired properties. We conduct experiments on several classical domains, comparing our method with handcrafted domain-specific generators that generate valid and diverse problems but do not optimize difficulty. The results show NeSIG is able to automatically generate valid problems of greater difficulty than the competitor approaches, while maintaining good diversity.

artificial intelligence, atom, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2301.1028

Country: Europe > Spain > Andalusia > Granada Province > Granada (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Hybrid Adversarial Inverse Reinforcement Learning

Yuan, Mingqi, Pun, Man-On, Chen, Yi, Cao, Qi

arXiv.org Artificial IntelligenceFeb-4-2021

In this paper, we investigate the problem of the inverse reinforcement learning (IRL), especially the beyond-demonstrator (BD) IRL. The BD-IRL aims to not only imitate the expert policy but also extrapolate BD policy based on finite demonstrations of the expert. Currently, most of the BD-IRL algorithms are two-stage, which first infer a reward function then learn the policy via reinforcement learning (RL). Because of the two separate procedures, the two-stage algorithms have high computation complexity and lack robustness. To overcome these flaw, we propose a BD-IRL framework entitled hybrid adversarial inverse reinforcement learning (HAIRL), which successfully integrates the imitation and exploration into one procedure. The simulation results show that the HAIRL is more efficient and robust when compared with other similar state-of-the-art (SOTA) algorithms.

algorithm, generation policy, reward function, (15 more...)

arXiv.org Artificial Intelligence

2102.02454

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback