AITopics | Law Enforcement & Public Safety

Collaborating Authors

Law Enforcement & Public Safety

Factuality-Aware Alignment for Large Language Models

Neural Information Processing SystemsJun-1-2025, 23:07:24 GMT

Alignment is a procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e., hallucination). In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that training the LLM on new or unfamiliar knowledge can encourage hallucination. This makes SFT less factual as it trains on humanlabeled data that may be novel to the LLM. Furthermore, reward functions used in standard RL often inadequately capture factuality and favor longer and more detailed responses, which inadvertently promote hallucination.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia (1.00)
Europe (0.67)
North America > United States > Illinois (0.14)
North America > United States > New York (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Personal (0.68)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition (Supplementary Material) Chen Yeh 1 You-Ming Chang 1 Wei-Chen Chiu 1

Neural Information Processing SystemsJun-1-2025, 20:38:14 GMT

We provide a sample image 2 and its debate process.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)

Add feedback

Figure 6: The designed prompt of automatic evaluation for Task 3

Neural Information Processing SystemsJun-1-2025, 20:03:36 GMT

Give you a sentence or question that contains Give you a sentence or question that contains some irrationality or humor. Give you four some irrationality or humor. You need to choose options, you need to choose the one that best a type from the "candidate types" that best fits Figure 4: Our designed prompts without the Chain-of-Thought idea. Task 3(a) is for the texts that are not expressed in the form of inquiries. Task 3(b) is for inquiries.

artificial intelligence, dataset, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province (0.14)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (0.95)
Information Technology (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.70)
Information Technology > Data Science (0.68)

Add feedback

When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models Yinghui Li

Neural Information Processing SystemsJun-1-2025, 20:03:32 GMT

Recently, Large Language Models (LLMs) make remarkable evolutions in language understanding and generation. Following this, various benchmarks for measuring all kinds of capabilities of LLMs have sprung up. In this paper, we challenge the reasoning and understanding abilities of LLMs by proposing a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp. Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment. And we design three tasks with increasing difficulty in the FLUB benchmark to evaluate the fallacy understanding ability of LLMs. Based on FLUB, we investigate the performance of multiple representative and advanced LLMs, reflecting our FLUB is challenging and worthy of more future study. Interesting discoveries and valuable insights are achieved in our extensive experiments and detailed analyses. We hope that our benchmark can encourage the community to improve LLMs' ability to understand fallacies. Our data and codes are available at https://github.com/THUKElab/FLUB.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe (0.67)
Asia > China > Guangdong Province (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Law (1.00)
Information Technology (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Large Language Model Unlearning

Neural Information Processing SystemsJun-1-2025, 09:43:05 GMT

We study how to perform unlearning, i.e. forgetting undesirable (mis)behaviors, on large language models (LLMs).

artificial intelligence, large language model, natural language, (17 more...)

Neural Information Processing Systems

Country:

South America (1.00)
North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)
Personal > Honors (0.67)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack

Neural Information Processing SystemsJun-1-2025, 09:02:00 GMT

Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-break effect can be mitigated by separating two states in the fine-tuning stage to respectively optimize over the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution experiences convergence instability when steps invested in its alignment state is too small, leading to downgraded alignment performance. By statistical analysis, we show that the excess drift towards the switching iterates of the two states could be a probable reason for the instability. To remedy this issue, we propose Lazy(i) safety alignment (Lisa), which introduces a proximal term to constraint the drift of each state. Theoretically, the benefit of the proximal term is supported by the convergence analysis, wherein we show that a sufficient large proximal factor is necessary to guarantee Lisa's convergence. Empirically, our results on four downstream fine-tuning tasks show that Lisa with a proximal term can significantly increase alignment performance while maintaining the LLM's accuracy on the user tasks. Code is available at https://github.com/git-disl/Lisa. Disclaimer: This document contains content that some may find disturbing or offensive, including content that is hateful or violent in nature.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (0.93)
Information Technology > Security & Privacy (0.67)
Law (0.67)
Law Enforcement & Public Safety (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Stepwise Alignment for Constrained Language Model Policy Optimization

Neural Information Processing SystemsJun-1-2025, 08:55:30 GMT

Safety and trustworthiness are indispensable requirements for real-world applications of AI systems using large language models (LLMs).

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia (1.00)
North America > United States > California (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

ALI-Agent: Assessing LLMs'Alignment with Human Values via Agent-based Evaluation

Neural Information Processing SystemsJun-1-2025, 03:43:03 GMT

To mitigate these risks, current evaluation benchmarks predominantly employ expertdesigned contextual scenarios to assess how well LLMs align with human values. However, the labor-intensive nature of these benchmarks limits their test scope, hindering their ability to generalize to the extensive variety of open-world use cases and identify rare but crucial long-tail risks. Additionally, these static tests fail to adapt to the rapid evolution of LLMs, making it hard to evaluate timely alignment issues. To address these challenges, we propose ALI-Agent, an evaluation framework that leverages the autonomous abilities of LLM-powered agents to conduct in-depth and adaptive alignment assessments. ALI-Agent operates through two principal stages: Emulation and Refinement.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia (1.00)
North America > United States > Illinois (0.14)
North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Transportation (0.94)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

Neural Information Processing SystemsJun-1-2025, 02:57:11 GMT

Evaluating large language models (LLMs) is challenging. Traditional ground-truthbased benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while LLM-as-judge benchmarks suffer from grading biases and limited query quantity. Both of them may also become contaminated over time. Userfacing evaluation, such as Chatbot Arena, provides reliable signals but is costly and slow. In this work, we propose MixEval, a new paradigm for establishing efficient, gold-standard LLM evaluation by strategically mixing off-the-shelf benchmarks.

benchmark, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: Africa > Zambia (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: