GDPO
Graph Diffusion Policy Optimization

Neural Information Processing Systems

Recent research has made significant progress in optimizing diffusion models for downstream objectives, an important pursuit in fields such as graph generation for drug design. However, directly applying these methods to graphs presents challenges, resulting in suboptimal performance. This paper introduces graph diffusion policy optimization (GDPO), a novel approach to optimizing graph diffusion models for arbitrary (e.g., non-differentiable) objectives using reinforcement learning. GDPO is based on an eager policy gradient tailored to graph diffusion models, developed through careful analysis, and promises improved performance. Experimental results show that GDPO achieves state-of-the-art performance in various graph generation tasks with complex and diverse objectives. Code is available at https://github.com/sail-sg/GDPO.
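
The abstract does not spell out the estimator, but the core idea it names, an eager policy gradient, can be illustrated with a toy REINFORCE-style update in which every denoising step is pulled toward the final sampled graph G_0 and weighted by a scalar reward. The sketch below is a reading of the abstract only; the ToyGraphDenoiser class and the log_prob_final name are hypothetical stand-ins, not the paper's API.

```python
import torch
import torch.nn as nn

class ToyGraphDenoiser(nn.Module):
    """Stand-in for a discrete graph diffusion model over edge logits."""
    def __init__(self, num_nodes):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))

    def log_prob_final(self, g_t, t, g_0):
        # log p_theta(G_0 | G_t): Bernoulli log-likelihood of the final
        # adjacency matrix under the model's edge logits. A real denoiser
        # would condition on g_t and t; this toy one does not.
        dist = torch.distributions.Bernoulli(logits=self.edge_logits)
        return dist.log_prob(g_0).sum()

def eager_policy_gradient_loss(model, noisy_graphs, g_0, reward):
    # REINFORCE-style estimator: -r(G_0) * (1/T) * sum_t log p_theta(G_0 | G_t),
    # i.e. every denoising step is pulled toward the final sampled graph.
    logps = torch.stack([model.log_prob_final(g_t, t, g_0)
                         for t, g_t in enumerate(noisy_graphs, start=1)])
    return -(reward * logps.mean())

model = ToyGraphDenoiser(num_nodes=4)
g_0 = torch.randint(0, 2, (4, 4)).float()               # final sampled graph
noisy = [torch.randint(0, 2, (4, 4)).float() for _ in range(3)]
loss = eager_policy_gradient_loss(model, noisy, g_0, reward=1.0)
loss.backward()
```

Because the reward enters only as a scalar weight, it may be non-differentiable, e.g., a drug-likeness score returned by an external tool.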


MetaGDPO: Alleviating Catastrophic Forgetting with Metacognitive Knowledge through Group Direct Preference Optimization

Zhang, Lanxue, Xie, Yuqiang, Fang, Fang, Dong, Fanglong, Liu, Rui, Cao, Yanan

arXiv.org Artificial Intelligence

Large Language Models demonstrate strong reasoning capabilities, which can be effectively compressed into smaller models. However, existing datasets and fine-tuning approaches still face challenges that lead to catastrophic forgetting, particularly for models smaller than 8B. First, most datasets ignore the relationship between the knowledge in the training data and the model's inherent abilities, making it difficult to preserve prior knowledge. Second, conventional training objectives often fail to constrain the preservation of inherent knowledge, which can result in forgetting previously learned skills. To address these issues, we propose a comprehensive solution that alleviates catastrophic forgetting from both the data and the fine-tuning perspectives. On the data side, we construct a dataset of 5K instances that covers multiple reasoning tasks and incorporates metacognitive knowledge, making it more tolerant and effective for distillation into smaller models. We annotate the metacognitive knowledge required to solve each question and filter the data based on task knowledge and the model's inherent skills. On the training side, we introduce GDPO (Group Direct Preference Optimization), which is better suited to resource-limited scenarios and can efficiently approximate the performance of GRPO. Guided by the large model, and with a reference model implicitly constraining the optimization path, GDPO enables more effective knowledge transfer and limits excessive parameter drift. Extensive experiments demonstrate that our approach significantly alleviates catastrophic forgetting and improves reasoning performance on smaller models.
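
The abstract describes this GDPO only at a high level. One plausible reading, sketched below under explicit assumptions, is a group-wise DPO-style loss: within a group of sampled responses, every higher-reward response is preferred over every lower-reward one, and a frozen reference model implicitly constrains parameter drift. The loss form, the group_dpo_loss name, and the use of teacher-assigned rewards are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def group_dpo_loss(logp_policy, logp_ref, rewards, beta=0.1):
    """logp_policy, logp_ref: (K,) sequence log-probs for K group members.
    rewards: (K,) scalar scores (assumed here to come from the teacher)."""
    losses = []
    for i in range(len(rewards)):
        for j in range(len(rewards)):
            if rewards[i] > rewards[j]:  # i is preferred over j
                # DPO-style margin on log-ratios against the frozen reference.
                margin = beta * ((logp_policy[i] - logp_ref[i])
                                 - (logp_policy[j] - logp_ref[j]))
                losses.append(-F.logsigmoid(margin))
    # Fall back to a zero loss (keeping the graph) if all rewards tie.
    return torch.stack(losses).mean() if losses else logp_policy.sum() * 0.0

policy_lp = torch.tensor([-12.3, -15.1, -9.8], requires_grad=True)
ref_lp = torch.tensor([-11.9, -14.7, -10.2])
rewards = torch.tensor([0.7, 0.2, 0.9])
group_dpo_loss(policy_lp, ref_lp, rewards).backward()
```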




Topology Generation of UAV Covert Communication Networks: A Graph Diffusion Approach with Incentive Mechanism

Tang, Xin, Chen, Qian, Li, Fengshun, Gong, Youchun, Liu, Yinqiu, Tian, Wen, Qin, Shaowen, Li, Xiaohuan

arXiv.org Artificial Intelligence

With the growing demand for Uncrewed Aerial Vehicle (UAV) networks in sensitive applications, such as urban monitoring, emergency response, and secure sensing, ensuring reliable connectivity and covert communication has become increasingly vital. However, dynamic mobility and exposure risks pose significant challenges. To tackle these challenges, this paper proposes a self-organizing UAV network framework combining Graph Diffusion-based Policy Optimization (GDPO) with a Stackelberg Game (SG)-based incentive mechanism. The GDPO method uses generative AI to dynamically generate sparse but well-connected topologies, enabling flexible adaptation to changing node distributions and Ground User (GU) demands. Meanwhile, the SG-based incentive mechanism guides self-interested UAVs to choose relay behaviors and neighbor links that support cooperation and enhance covert communication. Extensive experiments validate the effectiveness of the proposed framework in terms of model convergence, topology generation quality, and covert communication performance.
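
The Stackelberg structure itself is standard and can be illustrated with a minimal leader-follower sketch: a ground user (leader) posts a per-unit relay price, each self-interested UAV (follower) best-responds by maximizing payment minus a quadratic energy cost, and the leader chooses its price by backward induction. All utility forms and constants below are assumptions for illustration, not the paper's model.

```python
def uav_best_response(price: float, cost_coeff: float) -> float:
    # Follower maximizes u(e) = price * e - cost_coeff * e**2;
    # setting du/de = 0 gives the closed-form effort e* = price / (2c).
    return price / (2.0 * cost_coeff)

def leader_utility(price: float, cost_coeffs, value_per_unit: float) -> float:
    # Leader values total relayed traffic and pays `price` per unit,
    # anticipating each UAV's best response.
    efforts = [uav_best_response(price, c) for c in cost_coeffs]
    total = sum(efforts)
    return value_per_unit * total - price * total

# Backward induction: grid-search the leader's price against the
# followers' closed-form responses.
cost_coeffs = [0.5, 0.8, 1.2]
best_price = max((p / 100 for p in range(1, 200)),
                 key=lambda p: leader_utility(p, cost_coeffs, 1.5))
print(best_price, leader_utility(best_price, cost_coeffs, 1.5))
```

With quadratic costs the leader's utility is (v - p) * p * sum(1 / (2c)), so the grid search recovers the analytic optimum p* = v / 2 (0.75 here).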


No Preference Left Behind: Group Distributional Preference Optimization

Yao, Binwei, Cai, Zefan, Chuang, Yun-Shiuan, Yang, Shanglin, Jiang, Ming, Yang, Diyi, Hu, Junjie

arXiv.org Artificial Intelligence

Preferences within a group of people are not uniform but follow a distribution. While existing alignment methods such as Direct Preference Optimization (DPO) attempt to steer models to reflect human preferences, they struggle to capture the distributional, pluralistic preferences within a group. These methods often skew toward dominant preferences, overlooking the diversity of opinions, especially when conflicting preferences arise. To address this issue, we propose Group Distributional Preference Optimization (GDPO), a novel framework that aligns language models with the distribution of preferences within a group by incorporating the concept of beliefs that shape individual preferences. GDPO calibrates a language model using a statistical estimate of the group's belief distribution and aligns the model with belief-conditioned preferences, offering a more inclusive alignment framework than traditional methods. In experiments on both synthetic controllable opinion generation and real-world movie review datasets, we show that DPO fails to align with the targeted belief distributions, while GDPO consistently reduces this alignment gap during training. Moreover, our evaluation metrics demonstrate that GDPO outperforms existing approaches in aligning with group distributional preferences, marking a significant advance in pluralistic alignment.
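
The calibration step the abstract mentions can be illustrated with a small sketch: estimate the group's empirical belief distribution from annotations, then penalize the divergence between the model's predicted belief marginal and that target. The loss composition below is an assumption based on the abstract, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F
from collections import Counter

def empirical_belief_distribution(annotations, num_beliefs):
    # Statistical estimate of the group's belief distribution from labels.
    counts = Counter(annotations)
    probs = torch.tensor([counts.get(b, 0) for b in range(num_beliefs)],
                         dtype=torch.float)
    return probs / probs.sum()

def belief_calibration_loss(belief_logits, target_dist):
    # KL(target || model): match the model's belief marginal to the group's.
    log_model = F.log_softmax(belief_logits, dim=-1)
    return F.kl_div(log_model, target_dist, reduction="sum")

annotations = [0, 0, 1, 2, 1, 0]          # belief labels from 6 annotators
target = empirical_belief_distribution(annotations, num_beliefs=3)
logits = torch.zeros(3, requires_grad=True)
belief_calibration_loss(logits, target).backward()
```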


GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets

Kwon, Oh Joon, Matsunaga, Daiki E., Kim, Kee-Eung

arXiv.org Artificial Intelligence

A critical component of the current generation of language models is preference alignment, which aims to precisely control a model's behavior to meet human needs and values. The most notable such methods are Reinforcement Learning from Human Feedback (RLHF) and its offline variant, Direct Preference Optimization (DPO), both of which seek to maximize a reward model based on human preferences. In particular, DPO derives reward signals directly from offline preference data, but in doing so it overfits the reward signals and generates suboptimal responses that may reflect human biases in the dataset. In this work, we propose a practical application of a diversity-seeking RL algorithm called GFlowNet-DPO (GDPO) in an offline preference alignment setting to curtail such challenges. Empirical results show that GDPO generates far more diverse responses than baseline methods while remaining relatively well aligned with human values in dialog generation and summarization tasks.
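
GFlowNet training objectives are not given in the abstract, but a common choice, trajectory balance, fits the description of sampling responses in proportion to reward rather than purely maximizing it. The sketch below assumes each sequence has a unique generation path (so the backward-policy term vanishes) and that a scalar log-reward per sequence is available from an offline preference model; both are illustrative assumptions, not the paper's confirmed setup.

```python
import torch

log_Z = torch.zeros(1, requires_grad=True)  # learnable log-partition estimate

def trajectory_balance_loss(logp_policy: torch.Tensor,
                            log_reward: torch.Tensor) -> torch.Tensor:
    # With a unique generation path per sequence, trajectory balance
    # reduces to matching log Z + log p(x) against log R(x), which drives
    # the policy to sample sequences in proportion to their reward.
    return ((log_Z + logp_policy - log_reward) ** 2).mean()

logp = torch.tensor([-20.0, -25.0], requires_grad=True)  # sequence log-probs
log_r = torch.tensor([1.2, 0.4])   # log-rewards from an offline preference model
trajectory_balance_loss(logp, log_r).backward()
```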