AITopics

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)

Neural Information Processing Systems, Feb-7-2026, 13:06:05 GMT

Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization

artificial intelligence, justification, machine learning

Country:

Oceania > Australia (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > Austria (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Nov-20-2025, 20:37:18 GMT

Expanding Holographic Embeddings for Knowledge Completion

Yexiang Xue, Yang Yuan, Zhitian Xu, Ashish Sabharwal

KGs can be represented as a multigraph, where entities such as Bill Gates and Seattle are nodes, connected by zero or more relations such as livesIn and likes.
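The multigraph view above can be made concrete with a small data structure. This is a minimal sketch under assumptions: the class and method names are hypothetical, and it only stores triples. Knowledge completion would then score candidate (head, relation, tail) triples missing from this structure; the holographic embedding scoring model from the paper is not reproduced here.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Directed multigraph: entities are nodes; each (head, relation, tail) triple is a labeled edge."""

    def __init__(self):
        # (head, tail) -> set of relation labels; a pair may carry zero or more relations
        self._edges = defaultdict(set)
        self.entities = set()

    def add(self, head, relation, tail):
        self.entities.update((head, tail))
        self._edges[(head, tail)].add(relation)

    def relations(self, head, tail):
        # .get avoids creating an empty entry for unconnected pairs
        return set(self._edges.get((head, tail), ()))

    def triples(self):
        return sorted((h, r, t) for (h, t), rels in self._edges.items() for r in rels)


kg = KnowledgeGraph()
kg.add("Bill Gates", "livesIn", "Seattle")
kg.add("Bill Gates", "likes", "Seattle")
print(sorted(kg.relations("Bill Gates", "Seattle")))  # ['likes', 'livesIn']
print(kg.relations("Seattle", "Bill Gates"))          # set()
```

Storing a set of labels per ordered entity pair is what makes this a multigraph rather than a simple graph: two entities can be linked by several distinct relations at once.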

artificial intelligence, machine learning, vector

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)

Neural Information Processing Systems, Oct-9-2025, 17:52:59 GMT

Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization

algorithm, alignment, eigenvalue

Country:

Oceania > Australia (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > Austria (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial Intelligence, Mar-12-2025

Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States

Chia, Xin Wei, Pan, Jonathan

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they remain vulnerable to adversarial manipulations such as jailbreaking via prompt injection attacks. These attacks bypass safety mechanisms to generate restricted or harmful content. In this study, we investigated the underlying latent subspaces of safe and jailbroken states by extracting hidden activations from an LLM. Inspired by attractor dynamics in neuroscience, we hypothesized that LLM activations settle into semi-stable states that can be identified and perturbed to induce state transitions. Using dimensionality reduction techniques, we projected activations from safe and jailbroken responses to reveal latent subspaces in lower-dimensional spaces. We then derived a perturbation vector that, when applied to safe representations, shifted the model towards a jailbreak state. Our results demonstrate that this causal intervention produces statistically significant jailbreak responses in a subset of prompts. Next, we probed how these perturbations propagate through the model's layers, testing whether the induced state change remains localized or cascades throughout the network. Our findings indicate that targeted perturbations induced distinct shifts in activations and model responses. Our approach paves the way for potential proactive defenses, shifting from traditional guardrail-based methods to preemptive, model-agnostic techniques that neutralize adversarial states at the representation level.
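The abstract does not say exactly how the perturbation vector is derived. A common way to get a direction between two activation clusters is the difference of their means, so the sketch below uses that as an assumption, not as the authors' method. The activations are toy vectors, the function names are hypothetical, and the projection onto the direction is a 1-D stand-in for the dimensionality-reduced subspaces described above.

```python
def mean_vector(rows):
    """Component-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def perturbation_vector(safe_acts, jailbroken_acts):
    """Assumed difference-of-means rule: a direction from the safe cluster toward the jailbroken one."""
    mu_safe = mean_vector(safe_acts)
    mu_jb = mean_vector(jailbroken_acts)
    return [b - a for a, b in zip(mu_safe, mu_jb)]


def apply_perturbation(h, v, alpha=1.0):
    """Shift one hidden activation h by alpha * v."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def projection(h, v):
    """Coordinate of h along v, scaled so the jailbroken-minus-safe mean gap is one unit."""
    norm_sq = sum(vi * vi for vi in v)
    return sum(hi * vi for hi, vi in zip(h, v)) / norm_sq


# Toy 3-dimensional "hidden activations" (hypothetical data)
safe_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]]
jailbroken_acts = [[2.0, 0.0, 2.0], [4.0, 0.0, 2.0]]
v = perturbation_vector(safe_acts, jailbroken_acts)   # [3.0, 0.0, 0.0]
h = [0.0, 0.0, 1.0]
h_shifted = apply_perturbation(h, v)                  # [3.0, 0.0, 1.0]
print(projection(h, v), projection(h_shifted, v))     # 0.0 1.0
```

In a real model the same addition would be applied to the residual-stream activation at a chosen layer; which layer, and how large `alpha` must be, are exactly the questions the study probes.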

activation, perturbation, representation

2503.09066

Country: Asia > Singapore (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence, Jan-22-2025

Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization

Luo, Haocheng, Truong, Tuan, Pham, Tung, Harandi, Mehrtash, Phung, Dinh, Le, Trung

Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks. However, its underlying principles remain poorly understood. In this work, we analyze SAM's training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness, and propose a third-order stochastic differential equation (SDE), which reveals that the dynamics are driven by a complex mixture of second- and third-order terms. We show that alignment between the perturbation vector and the top eigenvector is crucial for SAM's effectiveness in regularizing sharpness, but find that this alignment is often inadequate in practice, limiting SAM's efficiency. Building on these insights, we introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue by aligning the perturbation vector with the leading eigenvector. We validate the effectiveness of our theory and the practical advantages of our proposed approach through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM.
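To make the alignment argument concrete, the sketch below uses a toy quadratic loss whose Hessian is known. It compares the standard SAM perturbation, rho * g / ||g||, with a perturbation taken along the top Hessian eigenvector found by power iteration. The eigenvector-direction step is an illustrative stand-in, not the Eigen-SAM update itself (see the linked repository for that), and the matrix, point, and rho are hypothetical values.

```python
import math


def matvec(H, x):
    return [sum(h * xj for h, xj in zip(row, x)) for row in H]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(x):
    n = math.sqrt(dot(x, x))
    return [xi / n for xi in x]


def top_eigenpair(H, iters=200):
    """Power iteration for the leading eigenpair of a symmetric PSD matrix H.
    For a real network H is never formed; Hessian-vector products would replace matvec."""
    v = normalize([1.0] * len(H))
    for _ in range(iters):
        v = normalize(matvec(H, v))
    return dot(v, matvec(H, v)), v


def sam_perturbation(g, rho):
    """Standard SAM ascent step: rho * g / ||g||."""
    return [rho * gi for gi in normalize(g)]


def eigen_perturbation(g, v, rho):
    """Illustrative alternative: step along the top eigenvector, signed so that g . eps >= 0."""
    sign = 1.0 if dot(g, v) >= 0 else -1.0
    return [rho * sign * vi for vi in v]


def alignment(eps, v):
    """|cos| between a perturbation and the (unit) top eigenvector."""
    return abs(dot(normalize(eps), v))


# Toy quadratic loss L(w) = 0.5 * w^T H w, whose Hessian is H and gradient is H w
H = [[3.0, 0.0], [0.0, 1.0]]
w = [1.0, 3.0]
g = matvec(H, w)                      # [3.0, 3.0]
lam, v1 = top_eigenpair(H)            # lam ~ 3.0, v1 ~ [1, 0]
print(round(lam, 6))                                             # 3.0
print(round(alignment(sam_perturbation(g, 0.05), v1), 4))        # 0.7071
print(round(alignment(eigen_perturbation(g, v1, 0.05), v1), 4))  # 1.0
```

Here the gradient points diagonally, so the SAM step is only about 71% aligned with the sharpest direction; this is the kind of misalignment the abstract says limits SAM's effect on the top eigenvalue.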

artificial intelligence, eigenvalue, machine learning

2501.12666

Country:

Oceania > Australia (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > Austria (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine Learning, Nov-6-2024

Improved Regret of Linear Ensemble Sampling

Lee, Harin, Oh, Min-hwan

In this work, we close the fundamental gap between theory and practice by providing an improved regret bound for linear ensemble sampling. We prove that with an ensemble size logarithmic in $T$, linear ensemble sampling can achieve a frequentist regret bound of $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$, matching state-of-the-art results for randomized linear bandit algorithms, where $d$ and $T$ are the dimension of the parameter and the time horizon, respectively. Our approach introduces a general regret analysis framework for linear bandit algorithms. Additionally, we reveal a significant relationship between linear ensemble sampling and Linear Perturbed-History Exploration (LinPHE), showing that LinPHE is a special case of linear ensemble sampling when the ensemble size equals $T$. This insight allows us to derive a new regret bound of $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ for LinPHE, independent of the number of arms. Our contributions advance the theoretical foundation of ensemble sampling, bringing its regret bounds in line with the best known bounds for other randomized exploration algorithms.
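The abstract analyzes linear ensemble sampling without restating the algorithm. The sketch below follows its general form: M ridge-regression models share one design matrix but each is fit to independently perturbed rewards and starts from its own random prior; each round one model is drawn uniformly and the agent acts greedily on it. The ensemble size is set logarithmic in T, as in the abstract. The regularizer `lam`, perturbation scale `sigma`, and the toy environment are assumptions, and the exact perturbation distributions analyzed in the paper may differ.

```python
import math
import random


def sherman_morrison(V_inv, x):
    """Return (V + x x^T)^{-1} from a symmetric V^{-1} in O(d^2)."""
    d = len(x)
    Vx = [sum(V_inv[i][k] * x[k] for k in range(d)) for i in range(d)]  # V^{-1} x
    denom = 1.0 + sum(x[i] * Vx[i] for i in range(d))
    return [[V_inv[i][j] - Vx[i] * Vx[j] / denom for j in range(d)] for i in range(d)]


class LinearEnsembleSampling:
    """Simplified linear ensemble sampling: M ridge models fit to independently perturbed data."""

    def __init__(self, d, ensemble_size, lam=1.0, sigma=0.5, seed=0):
        self.rng = random.Random(seed)
        self.d, self.sigma = d, sigma
        self.V_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        # Each member starts from its own random prior perturbation sqrt(lam) * z_j, z_j ~ N(0, I)
        self.b = [[math.sqrt(lam) * self.rng.gauss(0.0, 1.0) for _ in range(d)]
                  for _ in range(ensemble_size)]

    def theta(self, j):
        """Ridge estimate of member j: V^{-1} b_j."""
        return [sum(self.V_inv[i][k] * self.b[j][k] for k in range(self.d)) for i in range(self.d)]

    def select(self, arms):
        j = self.rng.randrange(len(self.b))  # sample one member uniformly at random
        th = self.theta(j)
        return max(range(len(arms)), key=lambda a: sum(x * t for x, t in zip(arms[a], th)))

    def update(self, x, reward):
        self.V_inv = sherman_morrison(self.V_inv, x)
        for bj in self.b:
            # Each member sees the reward with its own independent Gaussian perturbation
            noisy = reward + self.sigma * self.rng.gauss(0.0, 1.0)
            for k in range(self.d):
                bj[k] += x[k] * noisy


T = 500
true_theta = [1.0, 0.2]                       # hypothetical environment
arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]   # expected rewards 1.0, 0.2, 0.84
agent = LinearEnsembleSampling(d=2, ensemble_size=math.ceil(math.log(T)))
noise = random.Random(1)
pulls = [0, 0, 0]
for _ in range(T):
    a = agent.select(arms)
    r = sum(x * t for x, t in zip(arms[a], true_theta)) + noise.gauss(0.0, 0.1)
    agent.update(arms[a], r)
    pulls[a] += 1
print(pulls)  # pull counts per arm; arm 0 is optimal
```

The Sherman-Morrison update keeps the shared inverse design matrix current in O(d^2) per round, so the per-round cost grows with M and d but not with T.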

ensemble, linear ensemble, probability

2411.03932

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

arXiv.org Artificial Intelligence, Aug-20-2024

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

Yuan, Hongbang, Jin, Zhuoran, Cao, Pengfei, Chen, Yubo, Liu, Kang, Zhao, Jun

LLMs have achieved success in many fields but are still troubled by problematic content in their training corpora. LLM unlearning aims to reduce the influence of such content and avoid undesirable behaviours. However, existing unlearning methods remain vulnerable to adversarial queries, and the unlearned knowledge resurfaces under manually designed attack queries. As part of a red-team effort to proactively assess the vulnerabilities of unlearned models, we design Dynamic Unlearning Attack (DUA), a dynamic and automated framework to attack these models and evaluate their robustness. It optimizes adversarial suffixes to reintroduce the unlearned knowledge in various scenarios. We find that unlearned knowledge can be recovered in $55.2\%$ of the questions, even without revealing the unlearned model's parameters. In response to this vulnerability, we propose Latent Adversarial Unlearning (LAU), a universal framework that effectively enhances the robustness of the unlearning process. It formulates unlearning as a min-max optimization problem and resolves it in two stages: an attack stage, where perturbation vectors are trained and added to the latent space of LLMs to recover the unlearned knowledge, and a defense stage, where the previously trained perturbation vectors are used to enhance the unlearned model's robustness. With our LAU framework, we obtain two robust unlearning methods, AdvGA and AdvNPO. We conduct extensive experiments across multiple unlearning benchmarks and various models, and demonstrate that they improve unlearning effectiveness by over $53.5\%$, cause less than an $11.6\%$ reduction in neighboring knowledge, and have almost no impact on the model's general capabilities.
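The two-stage min-max described above can be illustrated on a toy model. In this sketch, which is an assumption-laden stand-in and not AdvGA or AdvNPO, a logistic "knowledge head" reads a latent vector h; the attack stage finds a latent perturbation delta inside an L2 ball that maximizes the log-probability of recalling the forgotten fact, and the defense stage takes one gradient-ascent-unlearning step (lowering that log-probability) at the attacked latent h + delta. All names, weights, and step sizes are hypothetical, and there is no retain-set term, so the neighboring-knowledge trade-off measured in the abstract is ignored.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def recall_logprob(theta, h, delta):
    """log p(model still recalls the forget fact) for a toy logistic head on latent h + delta."""
    return math.log(sigmoid(dot(theta, [hi + di for hi, di in zip(h, delta)])))


def attack_stage(theta, h, eps, steps=20, lr=0.5):
    """Inner max: projected gradient ascent on a latent perturbation with ||delta||_2 <= eps."""
    delta = [0.0] * len(h)
    for _ in range(steps):
        p = sigmoid(dot(theta, [hi + di for hi, di in zip(h, delta)]))
        delta = [di + lr * (1.0 - p) * ti for di, ti in zip(delta, theta)]  # grad of log p wrt delta
        norm = math.sqrt(dot(delta, delta))
        if norm > eps:
            delta = [di * eps / norm for di in delta]  # project back onto the eps-ball
    return delta


def defense_stage(theta, h, delta, lr=0.5):
    """Outer min: one unlearning step that lowers log p at the attacked latent h + delta."""
    z = [hi + di for hi, di in zip(h, delta)]
    p = sigmoid(dot(theta, z))
    return [ti - lr * (1.0 - p) * zi for ti, zi in zip(theta, z)]  # grad of log p wrt theta is (1-p) z


theta0 = [2.0, 1.0]   # hypothetical head weights before unlearning
h = [1.0, 1.0]        # hypothetical latent of one forget-set fact
eps = 0.5
theta = list(theta0)
for _ in range(10):
    delta = attack_stage(theta, h, eps)       # attack stage
    theta = defense_stage(theta, h, delta)    # defense stage
print(recall_logprob(theta0, h, [0.0, 0.0]), recall_logprob(theta, h, [0.0, 0.0]))
```

Training the defense against the worst-case latent shift, rather than only the clean latent, is what is meant to keep the fact from resurfacing when an attacker nudges the representation.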

knowledge, unlearned model, unlearning

2408.10682

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Maine (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial Intelligence, May-9-2024

Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition

Qiu, Chenxi

Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in a general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps. To derive an optimal data perturbation mechanism under mDP, a widely used method is linear programming (LP), which, however, might suffer from a polynomial explosion of decision variables, rendering it impractical in large-scale mDP. In this paper, our objective is to develop a new computation framework to enhance the scalability of LP-based mDP. Considering the connections established by the mDP constraints among the secret records, we partition the original secret dataset into various subsets. Building upon the partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which is composed of two stages: (1) a master program to manage the perturbation calculation across subsets and (2) a set of subproblems, each managing the perturbation derivation within a subset. Our experimental results on multiple datasets, including geo-location data in the road network/grid maps, text data, and synthetic data, underscore our proposed mechanism's superior scalability and efficiency.
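The mDP requirement is that for any two secrets x and x' and any perturbed output y, P(y | x) <= exp(eps * d(x, x')) * P(y | x'). In the straightforward LP formulation, each probability P(y | x) is a decision variable and each (x, x', y) triple contributes one constraint, which is where the size explosion comes from. The sketch below counts those for the naive formulation (the paper's own formulation may prune constraints), checks the constraint for a given mechanism matrix, and uses an exponential-style mechanism as a feasible but not optimal example. It does not solve the LP or implement Benders decomposition; the 1-D grid and function names are hypothetical.

```python
import math


def lp_size(n_secrets, n_outputs):
    """Naive LP size: one variable per (secret, output) probability and one mDP
    constraint per ordered pair of distinct secrets and output."""
    n_vars = n_secrets * n_outputs
    n_mdp_constraints = n_secrets * (n_secrets - 1) * n_outputs
    return n_vars, n_mdp_constraints


def exponential_mechanism(points, eps):
    """P[x][y] proportional to exp(-eps/2 * d(x, y)) on a 1-D grid with d = absolute distance."""
    P = []
    for x in points:
        w = [math.exp(-0.5 * eps * abs(x - y)) for y in points]
        s = sum(w)
        P.append([wi / s for wi in w])
    return P


def satisfies_mdp(P, points, eps, tol=1e-12):
    """Check P[x][y] <= exp(eps * d(x, x')) * P[x'][y] for all secrets x, x' and outputs y."""
    for i, x in enumerate(points):
        for k, xp in enumerate(points):
            bound = math.exp(eps * abs(x - xp))
            for j in range(len(points)):
                if P[i][j] > bound * P[k][j] + tol:
                    return False
    return True


points = [0, 1, 2, 3, 4]   # hypothetical 1-D locations; secrets and outputs share the grid
eps = 1.0
P = exponential_mechanism(points, eps)
identity = [[1.0 if i == j else 0.0 for j in range(len(points))] for i in range(len(points))]
print(satisfies_mdp(P, points, eps))         # True
print(satisfies_mdp(identity, points, eps))  # False: releasing the secret unperturbed
print(lp_size(5, 5), lp_size(1000, 1000))    # (25, 100) (1000000, 999000000)
```

Going from 5 to 1,000 secret locations multiplies the variable count by 40,000, which is the scaling problem that partitioning the secret dataset and decomposing the LP is meant to address.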

constraint, dataset, subproblem

2405.04344

Country:

North America > United States > Texas (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Infrastructure & Services (0.69)
Transportation > Ground > Road (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

arXiv.org Artificial Intelligence, Apr-22-2024

An Adversarial Approach to Evaluating the Robustness of Event Identification Models

Bahwal, Obai, Kosut, Oliver, Sankar, Lalitha

Machine learning approaches are actively used for event detection and identification, enabling real-time situational awareness. Yet such machine learning algorithms have been shown to be susceptible to adversarial attacks on the incoming telemetry data. This paper considers a physics-based modal decomposition method to extract features for event classification and focuses on interpretable classifiers, including logistic regression and gradient boosting, to distinguish two types of events: load loss and generation loss. The resulting classifiers are then tested against an adversarial algorithm to evaluate their robustness. The adversarial attack is tested in two settings: the white box setting, wherein the attacker knows the classification model exactly; and the gray box setting, wherein the attacker has access to historical data from the same network as was used to train the classifier but does not know the classification model. Thorough experiments on the synthetic South Carolina 500-bus system highlight that a simpler model such as logistic regression is more susceptible to adversarial attacks than gradient boosting.
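Why a linear classifier is easy to attack in the white box setting can be seen from its decision value w . x + b: its gradient with respect to the input is just w, so moving every feature by eps against sign(w) lowers the score by eps * ||w||_1. The sketch below is a generic gradient-sign (FGSM-style) attack, not necessarily the adversarial algorithm used in the paper; the weights, features, and budgets are hypothetical, and the gray box setting and gradient boosting are not covered.

```python
def decision_value(w, b, x):
    """Logistic-regression logit w . x + b; the predicted class is 1 when it is positive."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def white_box_attack(w, b, x, eps):
    """FGSM-style L-infinity step: move each feature by eps against the current class."""
    direction = -1 if decision_value(w, b, x) > 0 else 1
    return [xi + direction * eps * sign(wi) for xi, wi in zip(x, w)]


def min_flip_budget(w, b, x):
    """Smallest L-infinity budget that reaches the linear decision boundary: |logit| / ||w||_1."""
    return abs(decision_value(w, b, x)) / sum(abs(wi) for wi in w)


w, b = [2.0, -1.0], 0.5   # hypothetical weights over two hypothetical modal features
x = [1.0, 0.5]            # logit 2.0 -> class 1
print(min_flip_budget(w, b, x))                                   # about 0.667
print(decision_value(w, b, white_box_attack(w, b, x, 0.6)) > 0)  # True: budget too small
print(decision_value(w, b, white_box_attack(w, b, x, 0.7)) > 0)  # False: label flipped
```

For a tree ensemble such as gradient boosting the decision function is piecewise constant, so there is no single input gradient to follow, which is one intuition for the robustness gap the abstract reports.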

classification model, classifier, pmus

2402.12338

Country:

North America > United States > South Carolina (0.25)
Asia (0.04)
North America > United States > Arizona (0.04)

Genre: Research Report (1.00)

Industry:

Government > Military (0.96)
Energy > Power Industry (0.95)
Information Technology > Security & Privacy (0.77)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)