AITopics | vicuna-7b-v1

Neural Information Processing Systems Feb-17-2026, 10:08:47 GMT

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

However, recent studies show that the alignment can be easily compromised through finetuning with only a few adversarially designed training examples.

large language model, machine learning, natural language, (16 more...)

Country: North America > United States (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems Feb-7-2026, 17:34:15 GMT

LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings

To mitigate the dependency on labeled data and enhance the resilience of graph models, self-supervised learning has been widely adopted in GNN training.

large language model, machine learning, natural language, (18 more...)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems Oct-10-2025, 19:39:53 GMT

Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes

Large Language Models (LLMs) are becoming a prominent generative AI tool, where the user enters a query and the LLM generates an answer.

gradient cuff, query, vicuna-7b-v1, (16 more...)

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hong Kong > Sha Tin (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Neural Information Processing Systems Oct-10-2025, 13:05:21 GMT

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

However, recent studies show that the alignment can be easily compromised through finetuning with only a few adversarially designed training examples.

arxiv preprint arxiv, safety landscape, system prompt, (11 more...)

Country: North America > United States (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Jun-10-2025

SafeLawBench: Towards Safe Alignment of Large Language Models

Cao, Chuxue, Zhu, Han, Ji, Jiaming, Sun, Qichao, Zhu, Zhenghao, Wu, Yinyu, Dai, Juntao, Yang, Yaodong, Han, Sirui, Guo, Yike

With the growing prevalence of large language models (LLMs), the safety of LLMs has raised significant concerns. However, there is still a lack of definitive standards for evaluating their safety due to the subjective nature of current safety benchmarks. To address this gap, we conducted the first exploration of LLMs' safety evaluation from a legal perspective by proposing the SafeLawBench benchmark. SafeLawBench categorizes safety risks into three levels based on legal standards, providing a systematic and comprehensive framework for evaluation. It comprises 24,860 multi-choice questions and 1,106 open-domain question-answering (QA) tasks. Our evaluation included 2 closed-source LLMs and 18 open-source LLMs using zero-shot and few-shot prompting, highlighting the safety features of each model. We also evaluated the LLMs' safety-related reasoning stability and refusal behavior. Additionally, we found that a majority voting mechanism can enhance model performance. Notably, even leading SOTA models like Claude-3.5-Sonnet and GPT-4o have not exceeded 80.5% accuracy in multi-choice tasks on SafeLawBench, while the average accuracy of 20 LLMs remains at 68.8%. We urge the community to prioritize research on the safety of LLMs.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2506.06636

Country:

Asia > China > Hong Kong (0.05)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Law > Statutes (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Mar-6-2025

Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling

Wang, Zhenghua, Ding, Yiran, Lv, Changze, Xu, Zhibo, Li, Tianlong, Shi, Tianyuan, Zheng, Xiaoqing, Huang, Xuanjing

Although large language models (LLMs) have achieved significant progress in handling long-context inputs, they still suffer from the "lost-in-the-middle" problem, where crucial information in the middle of the context is often underrepresented or lost. Our extensive experiments reveal that this issue may arise from the rapid long-term decay in Rotary Position Embedding (RoPE). To address this problem, we propose a layer-specific positional encoding scaling method that assigns distinct scaling factors to each layer, slowing down the decay rate caused by RoPE to make the model pay more attention to the middle context. A specially designed genetic algorithm is employed to efficiently select the optimal scaling factors for each layer by incorporating Bezier curves to reduce the search space. Through comprehensive experimentation, we demonstrate that our method significantly alleviates the "lost-in-the-middle" problem. Our approach results in an average accuracy improvement of up to 20% on the Key-Value Retrieval dataset. Furthermore, we show that layer-specific interpolation, as opposed to uniform interpolation across all layers, enhances the model's extrapolation capabilities when combined with PI and Dynamic-NTK positional encoding schemes.

arxiv, dataset, information, (16 more...)

2503.04355

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

arXiv.org Artificial Intelligence Dec-22-2024

Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models

Gao, Lang, Zhang, Xiangliang, Nakov, Preslav, Chen, Xiuying

Jailbreaking in Large Language Models (LLMs) is a major security concern as it can deceive LLMs to generate harmful text. Yet, there is still insufficient understanding of how jailbreaking works, which makes it hard to develop effective defense strategies. We aim to shed more light into this issue: we conduct a detailed large-scale analysis of seven different jailbreak methods and find that these disagreements stem from insufficient observation samples. In particular, we introduce safety boundary, and we find that jailbreaks shift harmful activations outside that safety boundary, where LLMs are less sensitive to harmful information. We also find that the low and the middle layers are critical in such shifts, while deeper layers have less impact. Leveraging on these insights, we propose a novel defense called Activation Boundary Defense (ABD), which adaptively constrains the activations within the safety boundary. We further use Bayesian optimization to selectively apply the defense method to the low and the middle layers. Our experiments on several benchmarks show that ABD achieves an average DSR of over 98% against various forms of jailbreak attacks, with less than 2% impact on the model's general capabilities.

large language model, machine learning, natural language, (19 more...)

2412.17034

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
North America > Mexico (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial Intelligence Dec-19-2024

BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment

Zhang, Shaolei, Zhang, Kehao, Fang, Qingkai, Guo, Shoutao, Zhou, Yan, Liu, Xiaodong, Feng, Yang

Large language models (LLMs), with their powerful generative capabilities and vast knowledge, empower various tasks in everyday life. However, these abilities are primarily concentrated in high-resource languages, leaving low-resource languages with weaker generative capabilities and relatively limited knowledge. Enhancing the multilingual capabilities of LLMs is therefore crucial for serving over 100 linguistic communities worldwide. An intuitive approach to enhance the multilingual capabilities would be to construct instruction data for various languages, but constructing instruction data for over 100 languages is prohibitively costly. In this paper, we introduce BayLing 2, which efficiently transfers generative capabilities and knowledge from high-resource languages to low-resource languages through language alignment. To achieve this, we constructed a dataset of 3.2 million instructions, comprising high-resource language instructions (Chinese and English) and cross-lingual instructions for 100+ languages and performed instruction tuning based on the dataset to facilitate the capability transfer between languages. Using Llama as the foundation model, we developed BayLing-2-7B, BayLing-2-13B, and BayLing-2-8B, and conducted a comprehensive evaluation of BayLing. For multilingual translation across 100+ languages, BayLing shows superior performance compared to open-source models of similar scale. For multilingual knowledge and understanding benchmarks, BayLing achieves significant improvements across over 20 low-resource languages, demonstrating its capability of effective knowledge transfer from high-resource to low-resource languages. Furthermore, results on English benchmarks indicate that BayLing maintains high performance in high-resource languages while enhancing the performance in low-resource languages. Demo, homepage, code and models of BayLing are available.

large language model, machine learning, natural language, (19 more...)

2411.163

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Oct-22-2024

Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning

Zhang, Zongmeng, Shi, Yufeng, Zhu, Jinhua, Zhou, Wengang, Qi, Xiang, Zhang, Peng, Li, Houqiang

Trustworthiness is an essential prerequisite for the real-world application of large language models. In this paper, we focus on the trustworthiness of language models with respect to retrieval augmentation. Despite being supported with external evidence, retrieval-augmented generation still suffers from hallucinations, one primary cause of which is the conflict between contextual and parametric knowledge. We deem that retrieval-augmented language models have the inherent capabilities of supplying response according to both contextual and parametric knowledge. Inspired by aligning language models with human preference, we take the first step towards aligning retrieval-augmented language models to a status where it responds relying merely on the external evidence and disregards the interference of parametric knowledge. Specifically, we propose a reinforcement learning based algorithm Trustworthy-Alignment, theoretically and experimentally demonstrating large language models' capability of reaching a trustworthy status without explicit supervision on how to respond. Our work highlights the potential of large language models on exploring its intrinsic abilities by its own and expands the application scenarios of alignment from fulfilling human preference to creating trustworthy agents.

large language model, alignment, machine learning, (17 more...)