AITopics | trig

Collaborating Authors

trig

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models

Miao, Ziqi, Li, Lijun, Xiong, Yuan, Liu, Zhenhua, Zhu, Pengyu, Shao, Jing

arXiv.org Artificial IntelligenceNov-24-2025

Contextual priming, where earlier stimuli covertly bias later judgments, offers an unexplored attack surface for large language models (LLMs). We uncover a contextual priming vulnerability in which the previous response in the dialogue can steer its subsequent behavior toward policy-violating content. While existing jailbreak attacks largely rely on single-turn or multi-turn prompt manipulations, or inject static in-context examples, these methods suffer from limited effectiveness, inefficiency, or semantic drift. We introduce Response Attack (RA), a novel framework that strategically leverages intermediate, mildly harmful responses as contextual primers within a dialogue. By reformulating harmful queries and injecting these intermediate responses before issuing a targeted trigger prompt, RA exploits a previously overlooked vulnerability in LLMs. Extensive experiments across eight state-of-the-art LLMs show that RA consistently achieves significantly higher attack success rates than nine leading jailbreak baselines. Our results demonstrate that the success of RA is directly attributable to the strategic use of intermediate responses, which induce models to generate more explicit and relevant harmful content while maintaining stealth, efficiency, and fidelity to the original query. The code and data are available at https://github.com/Dtc7w3PQ/Response-Attack.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.05248

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs

Ji, Wence, Wu, Jiancan, Li, Aiying, Zhang, Shuyi, Wu, Junkang, Zhang, An, Wang, Xiang, He, Xiangnan

arXiv.org Artificial IntelligenceSep-25-2025

With the rapid advancement of large language models (LLMs), their robustness against adversarial manipulations, particularly jailbreak backdoor attacks, has become critically important. Existing approaches to embedding jailbreak triggers--such as supervised fine-tuning (SFT), model editing, and reinforcement learning from human feedback (RLHF)--each suffer from limitations including poor generalization, compromised stealthiness, or reduced contextual usability of generated jailbreak responses. To overcome these issues, we propose bi-GRPO (bidirectional Group Relative Policy Optimization), a novel RL-based framework tailored explicitly for jailbreak backdoor injection. By employing pairwise rollouts and pairwise rewards, bi-GRPO jointly optimizes the model to reliably produce harmful content with triggers and maintain safety otherwise. Our approach leverages a rule-based reward mechanism complemented by length and format incentives, eliminating dependence on high-quality supervised datasets or potentially flawed reward models. Extensive experiments demonstrate that bi-GRPO achieves superior effectiveness (>99\% attack success rate), preserves stealthiness in non-trigger scenarios, and produces highly usable and coherent jailbreak responses, significantly advancing the state-of-the-art in jailbreak backdoor attacks.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.19775

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

186a213d720568b31f9b59c085a23e5a-Paper-Conference.pdf

Neural Information Processing SystemsSep-24-2025, 22:40:45 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America (0.28)

Industry:

Consumer Products & Services > Travel (0.68)
Information Technology (0.48)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

TED-LaST: Towards Robust Backdoor Defense Against Adaptive Attacks

Mo, Xiaoxing, Cheng, Yuxuan, Sun, Nan, Zhang, Leo Yu, Luo, Wei, Gao, Shang

arXiv.org Artificial IntelligenceJun-13-2025

--Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, where attackers implant hidden triggers during training to maliciously control model behavior . T opological Evolution Dynamics (TED) has recently emerged as a powerful tool for detecting backdoor attacks in DNNs. However, TED can be vulnerable to backdoor attacks that adaptively distort topological representation distributions across network layers. T o address this limitation, we propose TED-LaST (T opological Evolution Dynamics against La undry, S low release, and T arget mapping attack strategies), a novel defense strategy that enhances TED's robustness against adaptive attacks. TED-LaST introduces two key innovations: label-supervised dynamics tracking and adaptive layer emphasis. These enhancements enable the identification of stealthy threats that evade traditional TED-based defenses, even in cases of inseparability in topological space and subtle topological perturbations. We review and classify data poisoning tricks in state-of-the-art adaptive attacks and propose enhanced adaptive attack with target mapping, which can dynamically shift malicious tasks and fully leverage the stealthiness that adaptive attacks possess. Our comprehensive experiments on multiple datasets (CIF AR-10, GTSRB, and ImageNet100) and model architectures (ResNet20, ResNet101) show that TED-LaST effectively counteracts sophisticated backdoors like Adap-Blend, Adapt-Patch, and the proposed enhanced adaptive attack. TED-LaST sets a new benchmark for robust backdoor detection, substantially enhancing DNN security against evolving threats. EEP Neural Networks (DNN) models have revolutionized fields such as computer vision [1], speech recognition [2], and autonomous driving [3] with their impressive capabilities. Despite these advances, their dependence on expansive datasets and complex training procedures introduces significant vulnerabilities, notably through backdoor attacks. Backdoor attacks implant hidden behaviors in DNN models, which can be activated by specific triggers.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.10722

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Injecting Universal Jailbreak Backdoors into LLMs in Minutes

Chen, Zhuowei, Zhang, Qiannan, Pei, Shichao

arXiv.org Artificial IntelligenceFeb-9-2025

Jailbreak backdoor attacks on LLMs have garnered attention for their effectiveness and stealth. However, existing methods rely on the crafting of poisoned datasets and the time-consuming process of fine-tuning. In this work, we propose JailbreakEdit, a novel jailbreak backdoor injection method that exploits model editing techniques to inject a universal jailbreak backdoor into safety-aligned LLMs with minimal intervention in minutes. JailbreakEdit integrates a multi-node target estimation to estimate the jailbreak space, thus creating shortcuts from the backdoor to this estimated jailbreak space that induce jailbreak actions. Our attack effectively shifts the models' attention by attaching strong semantics to the backdoor, enabling it to bypass internal safety mechanisms. Experimental results show that JailbreakEdit achieves a high jailbreak success rate on jailbreak prompts while preserving generation quality, and safe performance on normal queries. Our findings underscore the effectiveness, stealthiness, and explainability of JailbreakEdit, emphasizing the need for more advanced defense mechanisms in LLMs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.10438

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Dominican Republic (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Law > Criminal Law (0.94)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Transformers trained on proteins can learn to attend to Euclidean distance

Ellmen, Isaac, Schneider, Constantin, Raybould, Matthew I. J., Deane, Charlotte M.

arXiv.org Artificial IntelligenceFeb-3-2025

While conventional Transformers generally operate on sequence data, they can be used in conjunction with structure models, typically SE(3)-invariant or equivariant graph neural networks (GNNs), for 3D applications such as protein structure modelling. These hybrids typically involve either (1) preprocessing/tokenizing structural features as input for Transformers or (2) taking Transformer embeddings and processing them within a structural representation. However, there is evidence that Transformers can learn to process structural information on their own, such as the AlphaFold3 structural diffusion model. In this work we show that Transformers can function independently as structure models when passed linear embeddings of coordinates. We first provide a theoretical explanation for how Transformers can learn to filter attention as a 3D Gaussian with learned variance. We then validate this theory using both simulated 3D points and in the context of masked token prediction for proteins. Finally, we show that pre-training protein Transformer encoders with structure improves performance on a downstream task, yielding better performance than custom structural models. Together, this work provides a basis for using standard Transformers as hybrid structure-language models.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.01533

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.83)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Non-Stationary Bandits with Auto-Regressive Temporal Dependency

Chen, Qinyi, Golrezaei, Negin, Bouneffouf, Djallel

arXiv.org Artificial IntelligenceDec-12-2023

Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (AR) reward structure. We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information. Our algorithm achieves a regret upper bound that nearly matches the lower bound, with regret measured against a robust dynamic benchmark. Finally, via a real-world case study on tourism demand prediction, we demonstrate both the efficacy of our algorithm and the broader applicability of our techniques to more complex, rapidly evolving time series.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2210.16386

Country:

Oceania > Australia (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia (0.14)

Genre: Research Report (0.64)

Industry:

Consumer Products & Services > Travel (1.00)
Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

Cui, Qiwen, Zhang, Kaiqing, Du, Simon S.

arXiv.org Artificial IntelligenceJun-21-2023

We propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ in Daskalakis et al. 2022, where $\epsilon$ is the desired accuracy, and also significantly improves other problem parameters.

artificial intelligence, fuzzy logic, markov game, (14 more...)

arXiv.org Artificial Intelligence

2302.03673

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Unlocking Pixels for Reinforcement Learning via Implicit Attention

Choromanski, Krzysztof, Jain, Deepali, Parker-Holder, Jack, Song, Xingyou, Likhosherstov, Valerii, Santara, Anirban, Pacchiano, Aldo, Tang, Yunhao, Weller, Adrian

arXiv.org Artificial IntelligenceFeb-8-2021

There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and potential for observational overfitting through spurious correlations. A promising approach to solve both of these problems is a self-attention bottleneck, which provides a simple and effective framework for learning high performing policies, even in the presence of distractions. However, due to poor scalability of attention architectures, these methods do not scale beyond low resolution visual inputs, using large patches (thus small attention matrices). In this paper we make use of new efficient attention algorithms, recently shown to be highly effective for Transformers, and demonstrate that these new techniques can be applied in the RL setting. This allows our attention-based controllers to scale to larger visual inputs, and facilitate the use of smaller patches, even individual pixels, improving generalization. In addition, we propose a new efficient algorithm approximating softmax attention with what we call hybrid random features, leveraging the theory of angular kernels. We show theoretically and empirically that hybrid random features is a promising approach when using attention for vision-based RL.

exp, reinforcement learning, trig, (15 more...)

arXiv.org Artificial Intelligence

2102.04353

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback