Xue, Anton
On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning
Zhang, Thomas T., Moniri, Behrad, Nagwekar, Ansh, Rahman, Faraz, Xue, Anton, Hassani, Hamed, Matni, Nikolai
Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms that introduce preconditioners per axis of each layer's weight tensors. These methods have seen a recent resurgence, demonstrating impressive performance relative to entry-wise ("diagonal") preconditioning methods such as Adam(W) on a wide range of neural network optimization tasks. Complementary to their practical performance, we demonstrate that layer-wise preconditioning methods are provably necessary from a statistical perspective. To showcase this, we consider two prototypical models, linear representation learning and single-index learning, which are widely used to study how typical algorithms efficiently learn useful features to enable generalization. In these problems, we show that SGD is a suboptimal feature learner when moving beyond the ideal isotropic inputs $\mathbf{x} \sim \mathsf{N}(\mathbf{0}, \mathbf{I})$ and well-conditioned settings typically assumed in prior work. We demonstrate theoretically and numerically that this suboptimality is fundamental, and that layer-wise preconditioning emerges naturally as the solution. We further show that standard tools like Adam preconditioning and batch-norm only mildly mitigate these issues, supporting the unique benefits of layer-wise preconditioning.
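A minimal sketch of the contrast, for a single weight matrix: an entry-wise (Adam-style) update rescales each gradient entry independently, while a layer-wise update applies one preconditioner per axis of the weight matrix. The particular Shampoo-style preconditioners below are an illustrative stand-in, not the exact updates analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 4, 6
W = rng.normal(size=(d_out, d_in))   # one layer's weight matrix
G = rng.normal(size=(d_out, d_in))   # its gradient at the current step
lr, eps = 1e-2, 1e-8

# Entry-wise ("diagonal") preconditioning, Adam-style (single step, no momentum):
# every entry of G is rescaled independently.
W_diag = W - lr * G / (np.sqrt(G ** 2) + eps)

def inv_sqrt(M, eps=1e-8):
    """Symmetric inverse square root via eigendecomposition."""
    evals, Q = np.linalg.eigh(M)
    return Q @ np.diag(1.0 / np.sqrt(np.maximum(evals, eps))) @ Q.T

# Layer-wise preconditioning (Shampoo-style sketch): one preconditioner per axis
# of the weight tensor, applied on the left and right of the full gradient, so
# the update mixes information across rows and columns instead of entry by entry.
L_pre = G @ G.T + eps * np.eye(d_out)   # row-axis statistics
R_pre = G.T @ G + eps * np.eye(d_in)    # column-axis statistics
W_layer = W - lr * inv_sqrt(L_pre) @ G @ inv_sqrt(R_pre)
```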
The FIX Benchmark: Extracting Features Interpretable to eXperts
Jin, Helen, Havaldar, Shreya, Kim, Chaehyeon, Xue, Anton, You, Weiqiu, Qu, Helen, Gatti, Marco, Hashimoto, Daniel A, Jain, Bhuvnesh, Madani, Amin, Sako, Masao, Ungar, Lyle, Wong, Eric
Feature-based methods are commonly used to explain model predictions, but these methods often implicitly assume that interpretable features are readily available. However, this is often not the case for high-dimensional data, and it can be hard even for domain experts to mathematically specify which features are important. Can we instead automatically extract collections or groups of features that are aligned with expert knowledge? To address this gap, we present FIX (Features Interpretable to eXperts), a benchmark for measuring how well a collection of features aligns with expert knowledge. In collaboration with domain experts, we propose FIXScore, a unified expert alignment measure applicable to diverse real-world settings across cosmology, psychology, and medicine domains in vision, language, and time series data modalities. With FIXScore, we find that popular feature-based explanation methods have poor alignment with expert-specified knowledge, highlighting the need for new methods that can better identify features interpretable to experts.
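As a rough illustration of what a group-level expert-alignment score can look like (hypothetical; this is not the actual FIXScore, which the paper defines per setting), one can rate a proposed feature group by its best Jaccard overlap with expert-specified groups:

```python
def group_alignment(proposed, expert_groups):
    """Hypothetical alignment score: best Jaccard overlap between a proposed
    feature group and any expert-specified group (1.0 means a perfect match)."""
    proposed = set(proposed)
    scores = [len(proposed & set(g)) / len(proposed | set(g)) for g in expert_groups]
    return max(scores) if scores else 0.0

# Toy example: feature indices grouped by an explanation method vs. by an expert.
expert_groups = [{0, 1, 2, 3}, {10, 11}]
print(group_alignment({0, 1, 2, 9}, expert_groups))  # 0.6
```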
AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties
Ji, Xiayan, Xue, Anton, Wong, Eric, Sokolsky, Oleg, Lee, Insup
Anomaly detection is widely used for identifying critical errors and suspicious behaviors, but current methods lack interpretability. We leverage common properties of existing methods and recent advances in generative models to introduce counterfactual explanations for anomaly detection. Given an input, we generate its counterfactual as a diffusion-based repair that shows what a non-anomalous version should have looked like. A key advantage of this approach is that it enables a domain-independent formal specification of explainability desiderata, offering a unified framework for generating and evaluating explanations. We demonstrate the effectiveness of our anomaly explainability framework, AR-Pro, on vision (MVTec, VisA) and time-series (SWaT, WADI, HAI) anomaly datasets. The code used for the experiments is accessible at: https://github.com/xjiae/arpro.
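A small sketch of the kind of formal desiderata such a framework makes checkable, with the detector and the repair treated as black boxes; the concrete criteria and thresholds below are illustrative rather than AR-Pro's exact specification.

```python
import numpy as np

def check_repair(x, x_repaired, anomaly_score, mask, tau, delta):
    """Illustrative counterfactual-repair desiderata:
      (1) the repaired input is no longer flagged as anomalous,
      (2) the repair stays close to the original input,
      (3) edits are confined to the detected anomalous region (mask == 1)."""
    non_anomalous = anomaly_score(x_repaired) <= tau
    close = np.linalg.norm(x_repaired - x) <= delta
    localized = np.allclose(x_repaired[mask == 0], x[mask == 0])
    return non_anomalous and close and localized

# Toy usage with a norm-based scorer on a 1D signal.
score = lambda z: float(np.abs(z).max())
x = np.array([0.1, 0.2, 5.0, 0.1])       # the spike at index 2 is the anomaly
mask = np.array([0, 0, 1, 0])
x_rep = np.array([0.1, 0.2, 0.15, 0.1])  # e.g., produced by a diffusion-based repair
print(check_repair(x, x_rep, score, mask, tau=1.0, delta=5.0))  # True
```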
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Xue, Anton, Khare, Avishree, Alur, Rajeev, Goel, Surbhi, Wong, Eric
We study how to subvert language models from following the rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form "if $P$ and $Q$, then $R$" for some propositions $P$, $Q$, and $R$. We prove that although transformers can faithfully abide by such rules, maliciously crafted prompts can nevertheless mislead even theoretically constructed models. Empirically, we find that attacks on our theoretical models mirror popular attacks on large language models. Our work suggests that studying smaller theoretical models can help understand the behavior of large language models in rule-based settings like logical reasoning and jailbreak attacks.
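The rule-following task itself is forward chaining over propositional Horn clauses. A minimal reference implementation of the intended inference (independent of any transformer model):

```python
def forward_chain(rules, facts, max_steps=100):
    """Propositional Horn inference: each rule is (premises, conclusion),
    read as 'if all premises hold, then the conclusion holds'."""
    known = set(facts)
    for _ in range(max_steps):
        new = {c for premises, c in rules if set(premises) <= known and c not in known}
        if not new:
            break
        known |= new
    return known

# "if P and Q, then R" together with "if R, then S", starting from facts {P, Q}.
rules = [({"P", "Q"}, "R"), ({"R"}, "S")]
print(forward_chain(rules, {"P", "Q"}))  # {'P', 'Q', 'R', 'S'}
```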
Chordal Sparsity for SDP-based Neural Network Verification
Xue, Anton, Lindemann, Lars, Alur, Rajeev
Neural networks are central to many emerging technologies, but verifying their correctness remains a major challenge. It is known that network outputs can be sensitive and fragile to even small input perturbations, thereby increasing the risk of unpredictable and undesirable behavior. Fast and accurate verification of neural networks is therefore critical to their widespread adoption, and in recent years, various methods have been developed as a response to this problem. In this paper, we focus on improving semidefinite programming (SDP) based techniques for neural network verification. Such techniques offer the power of expressing complex geometric constraints while retaining a convex problem formulation, but scalability remains a major issue in practice. Our starting point is the DeepSDP framework proposed by Fazlyab et al., which uses quadratic constraints to abstract the verification problem into a large-scale SDP. However, solving this SDP quickly becomes intractable when the network grows. Our key observation is that by leveraging chordal sparsity, we can decompose the primary computational bottleneck of DeepSDP -- a large linear matrix inequality (LMI) -- into an equivalent collection of smaller LMIs. We call our chordally sparse optimization program Chordal-DeepSDP and prove that its construction is exactly as expressive as that of DeepSDP. Moreover, we show that additional analysis of Chordal-DeepSDP allows us to further rewrite its collection of LMIs in a second level of decomposition that we call Chordal-DeepSDP-2 -- which results in another significant computational gain. Finally, we provide numerical experiments on real networks of learned cart-pole dynamics, showcasing the computational advantage of Chordal-DeepSDP and Chordal-DeepSDP-2 over DeepSDP.
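For intuition, the splitting underlying this construction is the classical chordal decomposition of a sparse LMI, stated here informally (the paper supplies the details specific to DeepSDP): if a symmetric matrix $Z$ has a chordal sparsity pattern with maximal cliques $\mathcal{C}_1, \dots, \mathcal{C}_p$, and $E_{\mathcal{C}_k}$ selects the rows and columns indexed by $\mathcal{C}_k$, then $Z \preceq 0$ holds if and only if $Z = \sum_{k=1}^{p} E_{\mathcal{C}_k}^\top Z_k E_{\mathcal{C}_k}$ for some matrices $Z_k \preceq 0$, so one large LMI is traded for $p$ smaller coupled ones.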
Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks
Xue, Anton, Lindemann, Lars, Robey, Alexander, Hassani, Hamed, Pappas, George J., Alur, Rajeev
Lipschitz constants of neural networks allow for guarantees of robustness in image classification, safety in controller design, and generalizability beyond the training data. As calculating Lipschitz constants is NP-hard, techniques for estimating Lipschitz constants must navigate the trade-off between scalability and accuracy. In this work, we significantly push the scalability frontier of a semidefinite programming technique known as LipSDP while achieving zero accuracy loss. We first show that LipSDP has chordal sparsity, which allows us to derive a chordally sparse formulation that we call Chordal-LipSDP. The key benefit is that the main computational bottleneck of LipSDP, a large semidefinite constraint, is now decomposed into an equivalent collection of smaller ones: allowing Chordal-LipSDP to outperform LipSDP particularly as the network depth grows. Moreover, our formulation uses a tunable sparsity parameter that enables one to gain tighter estimates without incurring a significant computational cost. We illustrate the scalability of our approach through extensive numerical experiments.
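For context, a crude bound that SDP-based estimators such as LipSDP improve upon is the product of layer spectral norms, valid whenever the activations are 1-Lipschitz (e.g., ReLU); a minimal NumPy sketch:

```python
import numpy as np

def naive_lipschitz_bound(weights):
    """Product of spectral norms: an upper bound on the Lipschitz constant of
    x -> W_L s(... s(W_1 x)) when s is 1-Lipschitz (e.g., ReLU). SDP-based
    methods certify tighter constants than this simple baseline."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, ord=2)  # largest singular value
    return bound

# Toy 3-layer network mapping R^64 -> R.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(32, 64)), rng.normal(size=(16, 32)), rng.normal(size=(1, 16))]
print(naive_lipschitz_bound(weights))
```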
Stability Guarantees for Feature Attributions with Multiplicative Smoothing
Xue, Anton, Alur, Rajeev, Wong, Eric
Explanation methods for machine learning models tend not to provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. We develop a smoothing method called Multiplicative Smoothing (MuS) to achieve such a model. We show that MuS overcomes the theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with various feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
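A minimal sketch of multiplicative-style smoothing, here with independent Bernoulli masking of individual features and a generic classifier treated as a black box; the feature grouping and the certification machinery that give MuS its formal guarantees are omitted.

```python
import numpy as np

def smoothed_predict(model, x, keep_prob=0.5, n_samples=256, seed=0):
    """Average a model's outputs over random multiplicative masks: each feature
    is independently kept (times 1) or dropped (times 0). Averaging over masks
    is what makes the resulting predictions, and hence attributions, stable."""
    rng = np.random.default_rng(seed)
    masks = rng.random((n_samples, x.shape[0])) < keep_prob
    outputs = np.stack([model(x * m) for m in masks])
    return outputs.mean(axis=0)

# Toy usage: a softmax over a linear "model" with 4 input features.
w = np.array([[1.0, -2.0, 0.5, 0.0], [-1.0, 2.0, -0.5, 0.0]])
model = lambda z: np.exp(w @ z) / np.exp(w @ z).sum()
x = np.array([1.0, 0.5, -1.0, 2.0])
print(smoothed_predict(model, x))  # smoothed class probabilities
```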