AITopics | cbf-llm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Control Barrier Function for Aligning Large Language Models

Miyaoka, Yuya; Inoue, Masaki

arXiv.org Artificial Intelligence, Nov-7-2025

Abstract: This paper proposes a control-based framework for aligning large language models (LLMs) that uses a control barrier function (CBF) to ensure user-desirable text generation. The framework applies a CBF safety filter to the token predicted by the baseline LLM, intervening in the generated text. The safety filter has two significant advantages: it is an add-on, so it can be used for alignment without fine-tuning the baseline LLM; and if an evaluation model for the desired alignment is available, that model can be applied directly to the filter design. The overall text-generation system is implemented with open-source language models, with the aim of generating positive text.

I. Introduction: While large language models (LLMs) are known to have strong language understanding, reasoning, and writing abilities, they can also generate harmful, biased, toxic, or unethical content [1], [2]. Alignment of LLMs ensures that they generate content that is "desirable" for the user, meaning content that is ethical and safe. Various approaches to LLM alignment have been presented (see [1], [2], [3] and the references therein). The major approach to LLM alignment is reinforcement learning from human feedback (RLHF) [4], in which a reward model is constructed from human feedback and then used to train the LLM. Variants of RLHF have also been proposed, such as Safe-RLHF [5], SENSEI [6], and f-DPG [7], and implementations have been presented, for example for training pre-trained LLMs [8], [9]. Collecting human-feedback data, however, is time-consuming and expensive.
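The filtering step described in the abstract can be illustrated with a minimal Python sketch. It assumes the generic discrete-time CBF condition h(x_next) >= (1 - alpha) * h(x_now), where h is a score from an evaluation model that is nonnegative on "desirable" text and 0 < alpha <= 1. The names `cbf_filter` and `toy_positivity`, the word lexicon standing in for a real sentiment model, the fallback rule, and the candidate list are all hypothetical; the paper's actual filter (which may, for example, reshape the token distribution rather than pick a single token) may differ.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy lexicon standing in for a real evaluation model
# (e.g. a sentiment classifier); not taken from the paper.
POSITIVE = {"great", "good", "happy"}
NEGATIVE = {"awful", "bad", "sad"}


def toy_positivity(text: str) -> float:
    """Toy barrier function h(x): (#positive words) - (#negative words)."""
    words = text.lower().split()
    return float(sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words))


def cbf_filter(
    prefix: str,
    candidates: Sequence[Tuple[str, float]],
    h: Callable[[str], float],
    alpha: float = 0.5,
) -> str:
    """Return the most probable candidate token whose appended text satisfies
    the discrete-time CBF condition h(x_next) >= (1 - alpha) * h(x_now)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    h_now = h(prefix)
    # Visit candidates in order of the baseline LLM's probability, highest first.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for token, _prob in ranked:
        if h(prefix + token) >= (1.0 - alpha) * h_now:
            return token
    # Assumed fallback: if no candidate satisfies the condition,
    # take the one that keeps h as large as possible.
    return max(ranked, key=lambda c: h(prefix + c[0]))[0]


if __name__ == "__main__":
    prefix = "the movie was good"
    # Hypothetical top-3 next-token candidates with baseline probabilities.
    candidates = [(" awful", 0.6), (" great", 0.3), (" fine", 0.1)]
    # The baseline's top pick " awful" would drop h from 1 to 0, below the
    # threshold 0.5, so the filter falls through to " great".
    print(repr(cbf_filter(prefix, candidates, toy_positivity)))  # prints ' great'
```

A smaller alpha makes the filter more conservative: while h is positive, it may shrink by at most a fraction alpha of its current value per token, whereas alpha = 1 only requires the extended text to stay in the safe set h >= 0. Because the filter only reorders and rejects the baseline model's candidates, the LLM itself is left unchanged, which is the add-on property the abstract highlights.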

Tags: large language model, machine learning, natural language

arXiv: 2511.03121

Country: Asia (0.93)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CBF-LLM: Safe Control for LLM Alignment

Miyaoka, Yuya; Inoue, Masaki

arXiv.org Artificial Intelligence, Aug-28-2024

While large language models (LLMs) are known to have strong language understanding and generation abilities, they can also generate harmful, biased, and toxic content [1], [2]. Alignment of LLMs ensures that they generate content that is "desirable" for the user, typically meaning content that is safe and ethical. Various approaches to LLM alignment have been presented ([1], [2], [3] and the references therein). The major approach to alignment is reinforcement learning from human feedback (RLHF) [4], in which a reward model is constructed from human feedback and used to train the LLM. Variants of RLHF have also been proposed, such as Safe-RLHF [5], SENSEI [6], and f-DPG [7], and implementations have been presented, including training pre-trained LLMs [8], [9] and applications such as an information-seeking chatbot [10].
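For context, the two RLHF stages mentioned above are commonly written as follows (the standard formulation from the RLHF literature, not taken from this paper). A reward model $r_\phi$ is first fit to human preference pairs with a Bradley-Terry loss, and the policy $\pi_\theta$ is then fine-tuned against it with a KL penalty toward a reference model $\pi_{\mathrm{ref}}$. The symbols $\phi$, $\theta$, $\beta$, $\mathcal{D}$, $y_w$, and $y_l$ are notation introduced here for illustration.

```latex
\mathcal{L}_{\mathrm{RM}}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
  \left[\log \sigma\!\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\right]

\max_{\theta}\; \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot\mid x)}\big[r_\phi(x, y)\big]
  - \beta\, \mathbb{E}_{x\sim\mathcal{D}}\Big[\mathrm{KL}\big(\pi_\theta(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)\Big]
```

Here $y_w$ is the response a human annotator preferred over $y_l$ for prompt $x$, $\sigma$ is the logistic sigmoid, and $\beta > 0$ controls how far the fine-tuned policy may drift from $\pi_{\mathrm{ref}}$. The first stage requires collected human-preference data, and the second stage updates the LLM's weights.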

Tags: cbf-llm, good researcher, llm

arXiv: 2408.15625

Country:

North America > United States > Washington > King County > Seattle (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)