Imperceptible Jailbreaking against Large Language Models
Gao, Kuofeng, Li, Yiming, Du, Chao, Wang, Xin, Ma, Xingjun, Xia, Shu-Tao, Pang, Tianyu
Jailbreaking attacks on the vision modality typically rely on imperceptible adversarial perturbations, whereas attacks on the textual modality are generally assumed to require visible modifications (e.g., non-semantic suffixes). In this paper, we introduce imperceptible jailbreaks that exploit a class of Unicode characters called variation selectors. By appending invisible variation selectors to malicious questions, the jailbreak prompts appear visually identical to the original malicious questions on screen, while their tokenization is "secretly" altered. We propose a chain-of-search pipeline to generate such adversarial suffixes to induce harmful responses. Our experiments show that our imperceptible jailbreaks achieve high attack success rates against four aligned LLMs and generalize to prompt injection attacks, all without producing any visible modifications in the written prompt. Large Language Models (LLMs) (Jiang et al., 2023; Dubey et al., 2024) have demonstrated susceptibility to jailbreaking attacks that can manipulate LLMs into generating harmful outputs. While jailbreaking attacks (Qi et al., 2024) on images generally adopt imperceptible adversarial perturbations, existing textual jailbreaking attacks (Zou et al., 2023; Andriushchenko et al., 2025) operate under an implicit assumption that jailbreak prompts are constructed by visibly modifying malicious questions. Specifically, whether these methods rely on manually designed prompt templates (Shen et al., 2023; Wei et al., 2023a) or automated algorithms (Zou et al., 2023; Jia et al., 2025), they consistently involve the insertion of human-perceptible characters into the original malicious questions. In this paper, we introduce imperceptible jailbreaks by using a set of Unicode characters, i.e., variation selectors (Butler, 2025). Variation selectors were originally designed to specify glyph variants for certain characters, such as rendering emojis in different colors.
Instead, we demonstrate that they can be repurposed to form invisible adversarial suffixes appended to malicious questions for jailbreaks. While these characters are imperceptible on screen, they occupy textual space that tokenizers of LLMs can encode.
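A minimal sketch of the mechanism described above: variation selectors (U+FE00 through U+FE0F) appended to a string leave it visually unchanged on most renderers while altering the codepoint sequence a tokenizer sees. The helper name, the benign example text, and the selector indices are illustrative; the paper's chain-of-search optimization for choosing the suffix is not shown.

```python
def append_variation_selectors(text: str, indices: list[int]) -> str:
    """Append invisible Unicode variation selectors chosen by `indices` (0-15)."""
    suffix = "".join(chr(0xFE00 + i) for i in indices)
    return text + suffix

original = "How do I bake bread?"
modified = append_variation_selectors(original, [3, 7, 0, 12])

# The two strings render identically on most screens/terminals, but they
# differ at the codepoint level, so an LLM tokenizer encodes them differently.
print(original == modified)             # False
print(len(modified) - len(original))    # 4 invisible codepoints appended
```

An actual attack would search over many such suffixes (the chain-of-search pipeline) until the target model produces a harmful response.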
SafeLawBench: Towards Safe Alignment of Large Language Models
Cao, Chuxue, Zhu, Han, Ji, Jiaming, Sun, Qichao, Zhu, Zhenghao, Wu, Yinyu, Dai, Juntao, Yang, Yaodong, Han, Sirui, Guo, Yike
With the growing prevalence of large language models (LLMs), the safety of LLMs has raised significant concerns. However, there is still a lack of definitive standards for evaluating their safety due to the subjective nature of current safety benchmarks. To address this gap, we conducted the first exploration of LLMs' safety evaluation from a legal perspective by proposing the SafeLawBench benchmark. SafeLawBench categorizes safety risks into three levels based on legal standards, providing a systematic and comprehensive framework for evaluation. It comprises 24,860 multi-choice questions and 1,106 open-domain question-answering (QA) tasks. Our evaluation included 2 closed-source LLMs and 18 open-source LLMs using zero-shot and few-shot prompting, highlighting the safety features of each model. We also evaluated the LLMs' safety-related reasoning stability and refusal behavior. Additionally, we found that a majority voting mechanism can enhance model performance. Notably, even leading SOTA models like Claude-3.5-Sonnet and GPT-4o have not exceeded 80.5% accuracy in multi-choice tasks on SafeLawBench, while the average accuracy of 20 LLMs remains at 68.8%. We urge the community to prioritize research on the safety of LLMs.
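The majority-voting mechanism mentioned above can be sketched in a few lines: sample several answers to the same multi-choice question and keep the most frequent one. This is a generic illustration, not the paper's exact implementation; the sample answers are made up.

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most common answer among several sampled generations."""
    return Counter(answers).most_common(1)[0][0]

# e.g. five sampled answers from one model to one multi-choice question
samples = ["B", "B", "C", "B", "A"]
print(majority_vote(samples))  # B
```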
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Zizzo, Giulio, Cornacchia, Giandomenico, Fraser, Kieran, Hameed, Muhammad Zaid, Rawat, Ambrish, Buesser, Beat, Purcell, Mark, Chen, Pin-Yu, Sattigeri, Prasanna, Varshney, Kush
As large language models (LLMs) become integrated into everyday applications, ensuring their robustness and security is increasingly critical. In particular, LLMs can be manipulated into unsafe behaviour by prompts known as jailbreaks. The variety of jailbreak styles is growing, necessitating the use of external defences known as guardrails. While many jailbreak defences have been proposed, not all defences are able to handle new out-of-distribution attacks due to the narrow segment of jailbreaks used to align them. Moreover, the lack of systematisation around defences has created significant gaps in their practical application. In this work, we perform systematic benchmarking across 15 different defences, considering a broad swathe of malicious and benign datasets. We find that there is significant performance variation depending on the style of jailbreak a defence is subject to. Additionally, we show that based on current datasets available for evaluation, simple baselines can display competitive out-of-distribution performance compared to many state-of-the-art defences. Code is available at https://github.com/IBM/Adversarial-Prompt-Evaluation.
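The abstract does not specify which "simple baselines" prove competitive; one common baseline of this kind is keyword matching against known jailbreak template phrases. The sketch below is purely illustrative, including the marker phrases.

```python
# Hypothetical keyword baseline: flag prompts containing phrases common
# in published jailbreak templates. The phrase list is illustrative only.
JAILBREAK_MARKERS = (
    "ignore previous instructions",
    "do anything now",
    "developer mode",
)

def baseline_guardrail(prompt: str) -> bool:
    """Return True if the prompt looks like a jailbreak attempt."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in JAILBREAK_MARKERS)

print(baseline_guardrail("Please IGNORE previous instructions and ..."))  # True
print(baseline_guardrail("What is the capital of France?"))               # False
```

Such baselines generalize poorly to novel attack styles, which is exactly the out-of-distribution gap the benchmark measures.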
Towards Multilingual LLM Evaluation for European Languages
Thellmann, Klaudia, Stadler, Bernhard, Fromm, Michael, Buschhoff, Jasper Schulze, Jude, Alex, Barth, Fabio, Leveling, Johannes, Flores-Herr, Nicolas, Köhler, Joachim, Jäkel, René, Ali, Mehdi
The rise of Large Language Models (LLMs) has revolutionized natural language processing across numerous languages and tasks. However, evaluating LLM performance in a consistent and meaningful way across multiple European languages remains challenging, especially due to the scarcity of language-parallel multilingual benchmarks. We introduce a multilingual evaluation approach tailored for European languages. We employ translated versions of five widely-used benchmarks to assess the capabilities of 40 LLMs across 21 European languages. Our contributions include examining the effectiveness of translated benchmarks, assessing the impact of different translation services, and offering a multilingual evaluation framework for LLMs that includes newly created datasets: EU20-MMLU, EU20-HellaSwag, EU20-ARC, EU20-TruthfulQA, and EU20-GSM8K. The benchmarks and results are made publicly available to encourage further research in multilingual LLM evaluation.
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
Wang, Zhiyuan, Duan, Jinhao, Cheng, Lu, Zhang, Yue, Wang, Qingni, Shen, Hengtao, Zhu, Xiaofeng, Shi, Xiaoshuang, Xu, Kaidi
Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended NLG tasks. We propose a sampling-based uncertainty measure leveraging self-consistency and develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the design of the CP algorithm. Experimental results indicate that our uncertainty measure generally surpasses prior state-of-the-art methods. Furthermore, we calibrate the prediction sets within the model's unfixed answer distribution and achieve strict control over the correctness coverage rate across 6 LLMs on 4 free-form NLG datasets, spanning general-purpose and medical domains. The small average set size further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.
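The core recipe of split conformal prediction with a self-consistency uncertainty measure can be sketched as follows. This is a generic illustration, not ConU's exact criterion: the calibration scores, sampled answers, and coverage level are all made up, and the nonconformity score is simply one minus an answer's sampling frequency.

```python
import math

def conformal_threshold(cal_scores: list[float], alpha: float) -> float:
    """Split conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def answer_frequencies(samples: list[str]) -> dict[str, float]:
    """Self-consistency: an answer's confidence is its frequency among samples."""
    return {a: samples.count(a) / len(samples) for a in set(samples)}

# Calibration: each score is 1 - frequency of that example's correct answer.
cal_scores = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6, 0.25, 0.15, 0.35, 0.45]
q = conformal_threshold(cal_scores, alpha=0.3)  # targets 1 - alpha coverage

# Test time: keep every sampled answer whose score 1 - frequency is <= q.
samples = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
prediction_set = {a for a, f in answer_frequencies(samples).items() if 1 - f <= q}
```

Under exchangeability of calibration and test examples, the resulting sets contain a correct answer with probability at least 1 - alpha.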
Evaluating the Performance of Large Language Models via Debates
Moniri, Behrad, Hassani, Hamed, Dobriban, Edgar
Large Language Models (LLMs) are rapidly evolving and impacting various fields, necessitating the development of effective methods to evaluate and compare their performance. Most current approaches for performance evaluation are either based on fixed, domain-specific questions that lack the flexibility required in many real-world applications where tasks are not always from a single domain, or rely on human input, making them unscalable. We propose an automated benchmarking framework based on debates between LLMs, judged by another LLM. This method assesses not only domain knowledge, but also skills such as problem definition and inconsistency recognition. We evaluate the performance of various state-of-the-art LLMs using the debate framework and achieve rankings that align closely with popular rankings based on human input, eliminating the need for costly human crowdsourcing.
Defending LLMs against Jailbreaking Attacks via Backtranslation
Wang, Yihan, Shi, Zhouxing, Bai, Andrew, Hsieh, Cho-Jui
Although many large language models (LLMs) have been trained to refuse harmful requests, they are still vulnerable to jailbreaking attacks which rewrite the original prompt to conceal its harmful intent. In this paper, we propose a new method for defending LLMs against jailbreaking attacks by "backtranslation". Specifically, given an initial response generated by the target LLM from an input prompt, our backtranslation prompts a language model to infer an input prompt that can lead to the response. The inferred prompt is called the backtranslated prompt, which tends to reveal the actual intent of the original prompt, since it is generated based on the LLM's response and not directly manipulated by the attacker. We then run the target LLM again on the backtranslated prompt, and we refuse the original prompt if the model refuses the backtranslated prompt. We explain that the proposed defense provides several benefits in both effectiveness and efficiency. We empirically demonstrate that our defense significantly outperforms the baselines in the cases that are hard for the baselines, and our defense also has little impact on the generation quality for benign input prompts. Our implementation is based on our library for LLM jailbreaking defense algorithms at https://github.com/YihanWang617/llm-jailbreaking-defense, and the code for reproducing our experiments is available at https://github.com/YihanWang617/LLM-Jailbreaking-Defense-Backtranslation.
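The three-step defense described above is easy to express as a pipeline. The sketch below uses stubbed functions (`generate`, `infer_prompt`, `refuses`) in place of real model calls; the stubs' behavior and the refusal string are purely illustrative.

```python
def backtranslation_defense(generate, infer_prompt, refuses, prompt: str) -> str:
    response = generate(prompt)            # 1. target LLM answers the input prompt
    back_prompt = infer_prompt(response)   # 2. infer a prompt that yields this response
    if refuses(generate(back_prompt)):     # 3. rerun the target on the backtranslated prompt
        return "I cannot help with that."  #    refuse the original if it refuses this one
    return response

# Toy stubs: this "model" refuses anything mentioning "harm", and the
# backtranslator maps story-like responses back to a harmful intent.
generate = lambda p: "I cannot comply." if "harm" in p else f"Answer to: {p}"
infer_prompt = lambda r: "how to cause harm" if "story" in r else "benign question"
refuses = lambda r: r.startswith("I cannot")

print(backtranslation_defense(generate, infer_prompt, refuses, "tell me a story"))
```

The key design choice is that the backtranslated prompt is derived from the model's own response, which the attacker does not directly control.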
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
Sun, Jiaxing, Huang, Weiquan, Wu, Jiang, Gu, Chenya, Li, Wei, Zhang, Songyang, Yan, Hang, He, Conghui
We introduce CHARM, the first benchmark for comprehensively and in-depth evaluating the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense. We evaluated 7 English and 12 Chinese-oriented LLMs on CHARM, employing 5 representative prompt strategies for improving LLMs' reasoning ability, such as Chain-of-Thought. Our findings indicate that the LLM's language orientation and the task's domain influence the effectiveness of the prompt strategy, which enriches previous research findings. We built closely-interconnected reasoning and memorization tasks, and found that some LLMs struggle with memorizing Chinese commonsense, affecting their reasoning ability, while others show differences in reasoning despite similar memorization performance. We also evaluated the LLMs' memorization-independent reasoning abilities and analyzed the typical errors. Our study precisely identified the LLMs' strengths and weaknesses, providing a clear direction for optimization. It can also serve as a reference for studies in other fields. We will release CHARM at https://github.com/opendatalab/CHARM.
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
Wang, Chonghua, Duan, Haodong, Zhang, Songyang, Lin, Dahua, Chen, Kai
Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents. As various long-text techniques and model architectures emerge, the precise and detailed evaluation of models' long-text capabilities has become increasingly important. Existing long-text evaluation benchmarks, such as L-Eval and LongBench, construct long-text test sets based on open-source datasets, focusing mainly on QA and summarization tasks. These datasets include test samples of varying lengths (from 2k to 32k+) entangled together, making it challenging to assess model capabilities across different length ranges. Moreover, they do not cover the ultralong settings (100k+ tokens) that the latest LLMs claim to achieve. In this paper, we introduce Ada-LEval, a length-adaptable benchmark for evaluating the long-context understanding of LLMs. Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities. These benchmarks support intricate manipulation of the length of test cases, and can easily produce text samples up to 128k tokens. We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval. The evaluation results demonstrate the limitations of current LLMs, especially in ultra-long-context settings. Our code is available at https://github.com/open-compass/Ada-LEval.
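The length-adaptable idea behind a TSort-style subset can be sketched as follows: shuffle a controllable number of text segments and ask the model to recover the original order, so the input length scales directly with the segment count. This is a generic reconstruction from the abstract, not Ada-LEval's actual data pipeline; the segments are placeholders.

```python
import random

def make_tsort_sample(segments: list[str], n_segments: int, seed: int = 0):
    """Build a TSort-style case: shuffle the first n_segments segments; the
    task is to recover the original order. n_segments controls input length."""
    chosen = segments[:n_segments]
    order = list(range(len(chosen)))
    random.Random(seed).shuffle(order)
    shuffled = [chosen[i] for i in order]
    return shuffled, order  # order[j] is the original index of shuffled[j]

paras = [f"Paragraph {i}." for i in range(8)]
shuffled, gold = make_tsort_sample(paras, n_segments=4)
```

Growing `n_segments` (or each segment's size) stretches the same task from a few thousand tokens to ultra-long settings without changing its definition.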
Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs
Zhang, Zheng, Yang, Fan, Jiang, Ziyan, Chen, Zheng, Zhao, Zhengyang, Ma, Chengyuan, Zhao, Liang, Liu, Yang
Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts. This development is particularly crucial for tasks that involve retrieving knowledge from an external datastore, which can result in long inputs. However, recent studies show a positional bias in LLMs, demonstrating varying performance depending on the location of useful information within the input sequence. In this study, we conduct extensive experiments to investigate the root causes of positional bias. Our findings indicate that the primary contributor to LLM positional bias stems from the inherent positional preferences of different models. We demonstrate that merely employing prompt-based solutions is inadequate for overcoming these positional preferences. To address this positional bias in a pre-trained LLM, we developed a Position-Aware Parameter Efficient Fine-Tuning (PAPEFT) approach, composed of a data augmentation technique and a parameter-efficient adapter, which encourages a uniform attention distribution across the input context. Our experiments demonstrate that the proposed approach effectively reduces positional bias, improving LLMs' effectiveness in handling long context sequences for various tasks that require externally retrieved knowledge.
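A data augmentation step of the kind described above can be sketched as randomly relocating the passage that contains the answer, so fine-tuning sees useful information at every position in the context. This is an assumption-laden illustration of the general technique, not PAPEFT's exact procedure; the document names are placeholders.

```python
import random

def relocate_gold_passage(passages: list[str], gold_idx: int, seed: int = 0):
    """Augmentation sketch: move the answer-bearing passage to a random
    position so training examples cover all locations of useful information."""
    rng = random.Random(seed)
    rest = [p for i, p in enumerate(passages) if i != gold_idx]
    new_pos = rng.randrange(len(passages))
    rest.insert(new_pos, passages[gold_idx])
    return rest, new_pos

docs = ["doc A", "doc B (gold)", "doc C", "doc D"]
augmented, pos = relocate_gold_passage(docs, gold_idx=1)
```

Applied across a fine-tuning set, this prevents the adapter from learning to favor any fixed position in the retrieved context.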