AITopics | policy model

Collaborating Authors

policy model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models

Neural Information Processing SystemsMar-19-2026, 17:02:17 GMT

Recent advancements in large language models (LLMs) focus on aligning to heterogeneous human expectations and values via multi-objective preference alignment. However, existing methods are dependent on the policy model parameters, which require high-cost repetition of their alignment algorithms for each new policy model, and they cannot expand to unseen objectives due to their static alignment objectives. In this work, we propose Meta-Objective Aligner (MetaAligner), the first policy-agnostic and generalizable method for multi-objective preference alignment.MetaAligner models multi-objective alignment into three stages: (1) dynamic objectives reformulation algorithm reorganizes traditional alignment datasets to supervise the model on performing flexible alignment across different objectives; (2) conditional weak-to-strong correction paradigm aligns the weak outputs of fixed policy models to approach strong outputs with higher preferences in the corresponding alignment objectives, enabling plug-and-play inferences on any policy models, which significantly reduces training costs and facilitates alignment on close-source policy models; (3) generalizable inference method flexibly adjusts target objectives by updating their text descriptions in the prompts, facilitating generalizable alignment to unseen objectives.Experimental results show that MetaAligner achieves significant and balanced improvements in multi-objective alignments on 10 state-of-the-art policy models, and saves up to 93.63% of GPU training hours compared to previous alignment methods.

artificial intelligence, large language model, natural language, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)

Add feedback

fad8962279154544ed69bb63eb14d677-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 19:08:28 GMT

arxiv preprint arxiv, diffusion model, experiment, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Energy (0.69)
Media (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Guiding Large Language Models via Directional Stimulus Prompting

Neural Information Processing SystemsFeb-17-2026, 00:15:19 GMT

T5) to generate an auxiliary directional stimulus prompt for each input instance.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Barbara County > Santa Barbara (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

404df2480b6eef0486a1679e371894b0-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 20:03:05 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

3d03800841fa1bb2f43ef1750aafcce4-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 13:45:25 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Asia > Japan (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > Vietnam (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

30dfe47a3ccbee68cffa0c19ccb1bc00-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 17:46:01 GMT

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)
Workflow (0.93)

Industry:

Leisure & Entertainment > Games (0.93)
Information Technology (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

Add feedback

daed210307f1dbc6f1dd9551408d999f-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 17:29:52 GMT

international conference, mri, reconstruction, (16 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

27b51baca8377a0cf109f6ecc15a0f70-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 00:02:25 GMT

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Add feedback

Feint Behaviors and Strategies: Formalization, Implementation and Evaluation

Neural Information Processing SystemsFeb-7-2026, 10:43:46 GMT

Feint behaviors refer to a set of deceptive behaviors in a nuanced manner, which enable players to obtain temporal and spatial advantages over opponents in competitive games.

machine learning, natural language, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(9 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

Guiding Large Language Models via Directional Stimulus Prompting

Neural Information Processing SystemsDec-26-2025, 17:45:36 GMT

We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) towards specific desired outputs. Instead of directly adjusting LLMs, our method employs a small tunable policy model (e.g., T5) to generate an auxiliary directional stimulus prompt for each input instance. These directional stimulus prompts act as nuanced, instance-specific hints and clues to guide LLMs in generating desired outcomes, such as including specific keywords in the generated summary. Our approach sidesteps the challenges of direct LLM tuning by optimizing the policy model to explore directional stimulus prompts that align LLMs with desired behaviors. The policy model can be optimized through 1) supervised fine-tuning using labeled data and 2) reinforcement learning from offline or online rewards based on the LLM's output. We evaluate our method across various tasks, including summarization, dialogue response generation, and chain-of-thought reasoning. Our experiments indicate a consistent improvement in the performance of LLMs such as ChatGPT, Codex, and InstructGPT on these supervised tasks with minimal labeled data. Remarkably, by utilizing merely 80 dialogues from the MultiWOZ dataset, our approach boosts ChatGPT's performance by a relative 41.4%, achieving or exceeding the performance of some fully supervised state-of-the-art models. Moreover, the instance-specific chain-of-thought prompt generated through our method enhances InstructGPT's reasoning accuracy, outperforming both generalized human-crafted prompts and those generated through automatic prompt engineering.

directional stimulus prompting, language model, name change, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback