Goto

Collaborating Authors

 Wright, Stephen


Blended Conditional Gradients: the unconditioning of conditional gradients

arXiv.org Artificial Intelligence

We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank-Wolfe algorithm (also called conditional gradient) with gradient-based steps that differ from away steps and pairwise steps, while still achieving linear convergence for strongly convex functions along with good practical performance. Our approach retains all favorable properties of conditional gradient algorithms, notably avoidance of projections onto P and maintenance of iterates as sparse convex combinations of a limited number of extreme points of P. The algorithm is lazy, making use of inexpensive inexact solutions of the linear programming subproblem that characterizes the conditional gradient approach. It decreases measures of optimality (primal and dual gaps) rapidly, both in the number of iterations and in wall-clock time, outperforming even the lazy conditional gradient algorithms of [arXiv:1410.8816]. We also present a streamlined version of the algorithm for the probability simplex.
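To make the baseline concrete, here is a minimal sketch of a plain Frank-Wolfe (conditional gradient) iteration over the probability simplex, where the linear minimization oracle simply picks the vertex with the smallest gradient entry; it illustrates the projection-free, sparse-iterate structure the abstract refers to, not the blended algorithm itself.

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, n_iters=100):
    """Vanilla Frank-Wolfe over the probability simplex.

    The linear minimization oracle over the simplex picks the vertex
    (coordinate) with the smallest gradient entry, so iterates stay
    sparse convex combinations of the vertices visited so far.
    """
    x = x0.copy()
    for t in range(n_iters):
        g = grad_f(x)
        v = np.zeros_like(x)
        v[np.argmin(g)] = 1.0          # LMO: best vertex of the simplex
        gamma = 2.0 / (t + 2.0)        # standard open-loop step size
        x = (1 - gamma) * x + gamma * v
    return x

# Example: minimize ||x - b||^2 over the simplex.
b = np.array([0.2, 0.9, -0.3, 0.5])
x_star = frank_wolfe_simplex(lambda x: 2 * (x - b), np.ones(4) / 4)
```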


A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems

arXiv.org Artificial Intelligence

The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symbolic Energy-Based Models (NeSy-EBMs), a unifying mathematical framework for discriminative and generative modeling with probabilistic and non-probabilistic NeSy approaches. We utilize NeSy-EBMs to develop a taxonomy of modeling paradigms focusing on a system's neural-symbolic interface and reasoning capabilities. Additionally, we introduce a suite of learning techniques for NeSy-EBMs. Importantly, NeSy-EBMs allow the derivation of general expressions for gradients of prominent learning losses, and we provide four learning approaches that leverage methods from multiple domains, including bilevel and stochastic policy optimization. Finally, we present Neural Probabilistic Soft Logic (NeuPSL), an open-source NeSy-EBM library designed for scalability and expressivity, facilitating real-world application of NeSy systems. Through extensive empirical analysis across multiple datasets, we demonstrate the practical advantages of NeSy-EBMs in various tasks, including image classification, graph node labeling, autonomous vehicle situation awareness, and question answering.
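As a purely illustrative, hypothetical toy (not NeuPSL or any specific model from the paper), the sketch below shows the kind of neural-symbolic interface the framework formalizes: neural network outputs parameterize an energy function, a symbolic term penalizes rule violations, and prediction is energy minimization.

```python
import numpy as np
from itertools import product

def nesy_energy(y, neural_scores, rule_penalty, lam=1.0):
    """Toy neural-symbolic energy: a neural term plus a symbolic penalty.

    y             : candidate binary label assignment (array of 0/1)
    neural_scores : per-variable scores produced by some neural network
    rule_penalty  : callable measuring how badly y violates symbolic rules
    """
    neural_term = -np.dot(neural_scores, y)   # prefer labels the network favors
    symbolic_term = lam * rule_penalty(y)     # penalize symbolic-rule violations
    return neural_term + symbolic_term

def predict(neural_scores, rule_penalty):
    """Inference as energy minimization (brute force over a tiny label space)."""
    n = len(neural_scores)
    candidates = (np.array(y) for y in product([0, 1], repeat=n))
    return min(candidates, key=lambda y: nesy_energy(y, neural_scores, rule_penalty))

# Example rule: variables 0 and 1 must not both be "on"; the symbolic
# penalty overrides the neural preference for switching both on.
scores = np.array([2.0, 1.5, -0.5])
y_hat = predict(scores, lambda y: 3.0 * float(y[0] + y[1] > 1))  # -> [1, 0, 0]
```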


Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning

arXiv.org Artificial Intelligence

Further, we propose a novel inference algorithm and establish theoretical properties for a state-of-the-art NeSy system that are crucial for learning. Our proposed learning framework builds upon NeSy energy-based models (NeSy-EBMs) (Pryor et al., 2023), a general class of NeSy systems that encompasses a variety of existing NeSy methods, including DeepProblog (Manhaeve et al., 2018; 2021), SATNet (Wang et al., 2019), logic tensor networks (Badreddine et al., 2022), and NeuPSL (Pryor et al., 2023). NeSy-EBMs use neural network outputs to parameterize an energy function and formulate an inference problem that may be non-smooth and constrained. Thus, predictions are not guaranteed to have an explicit form as a function of the inputs and parameters, nor to be differentiable, so traditional deep learning techniques are not directly applicable. We therefore formulate NeSy-EBM learning equivalently as a bilevel problem and, to support smooth first-order gradient-based optimization, propose a smoothing strategy that is novel to NeSy learning. Specifically, we replace the constrained NeSy energy function with its Moreau envelope. The augmented Lagrangian method for equality-constrained minimization is then applied to the new formulation.
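As a generic illustration of the smoothing idea (not the paper's constrained NeSy energy), the sketch below computes the Moreau envelope of a possibly non-smooth function at a point; the envelope is smooth in x and its gradient is available in closed form from the proximal point, which is what makes first-order learning feasible.

```python
import numpy as np
from scipy.optimize import minimize

def moreau_envelope(f, x, rho=1.0):
    """Moreau envelope of f at x with parameter rho:
        e_f(x) = min_z  f(z) + (rho / 2) * ||z - x||^2
    Its gradient is rho * (x - prox_f(x)), so a non-smooth (or constrained,
    via a penalty) objective becomes amenable to gradient-based methods.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    res = minimize(lambda z: f(z) + 0.5 * rho * np.sum((z - x) ** 2),
                   x, method="Nelder-Mead")   # derivative-free: f may be non-smooth
    prox = res.x
    value = f(prox) + 0.5 * rho * np.sum((prox - x) ** 2)
    grad = rho * (x - prox)
    return value, grad, prox

# Example: for f(z) = |z| the envelope is the Huber function, and the
# proximal point is the soft-threshold of x, so prox(2.0) is roughly 1.0.
val, g, prox = moreau_envelope(lambda z: np.abs(z).sum(), x=2.0, rho=1.0)
```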


Optimally Teaching a Linear Behavior Cloning Agent

arXiv.org Artificial Intelligence

We study optimal teaching of Linear Behavior Cloning (LBC) learners. In this setup, the teacher can select which states to demonstrate to an LBC learner. The learner maintains a version space of infinitely many linear hypotheses consistent with the demonstration. The goal of the teacher is to teach a realizable target policy to the learner using a minimum number of state demonstrations. This number is known as the Teaching Dimension (TD). We present a teaching algorithm called "Teach using Iterative Elimination (TIE)" that achieves instance-optimal TD. However, we also show that finding an optimal teaching set is computationally NP-hard. We further provide an approximation algorithm that guarantees an approximation ratio of $\log(|A|-1)$ on the teaching dimension. Finally, we provide experimental results to validate the efficiency and effectiveness of our algorithm.
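The NP-hardness result and the $\log(|A|-1)$ guarantee suggest a set-cover-style greedy scheme; the sketch below is a hypothetical greedy selector in that spirit, where the `eliminated_by` mapping and the data layout are illustrative assumptions rather than the paper's TIE algorithm.

```python
def greedy_teaching_set(states, eliminated_by, hypotheses_to_eliminate):
    """Greedy selection of demonstration states.

    states                  : candidate states the teacher may demonstrate
    eliminated_by           : dict mapping a state to the set of non-target
                              hypotheses its demonstration rules out
    hypotheses_to_eliminate : set of hypotheses that must all be ruled out

    Greedily picking the state that rules out the most remaining hypotheses
    gives the usual logarithmic set-cover approximation factor.
    """
    remaining = set(hypotheses_to_eliminate)
    chosen = []
    while remaining:
        best = max(states, key=lambda s: len(eliminated_by[s] & remaining))
        gained = eliminated_by[best] & remaining
        if not gained:            # nothing left can be eliminated; stop
            break
        chosen.append(best)
        remaining -= gained
    return chosen

# Example: two demonstrations suffice to rule out hypotheses {h1, h2, h3}.
demo = {"s1": {"h1", "h2"}, "s2": {"h3"}, "s3": {"h2"}}
teaching_set = greedy_teaching_set(["s1", "s2", "s3"], demo, {"h1", "h2", "h3"})
```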


Cut your Losses with Squentropy

arXiv.org Artificial Intelligence

Nearly all practical neural models for classification are trained using the cross-entropy loss. Yet this ubiquitous choice is supported by little theoretical or empirical evidence. Recent work (Hui & Belkin, 2020) suggests that training using the (rescaled) square loss is often superior in terms of classification accuracy. In this paper we propose the "squentropy" loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes. We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy. We also demonstrate that it provides significantly better model calibration than either of these alternative losses and, furthermore, has less variance with respect to the random initialization. Additionally, in contrast to the square loss, the squentropy loss can typically be trained using exactly the same optimization parameters, including the learning rate, as the standard cross-entropy loss, making it a true "plug-and-play" replacement. Finally, unlike the rescaled square loss, multiclass squentropy contains no parameters that need to be adjusted.
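A minimal numpy sketch of the loss as described in the abstract, assuming the square term is taken on the raw logits of the incorrect classes (whose targets are zero); a framework-specific (e.g., PyTorch) version would follow the same two-term structure.

```python
import numpy as np

def squentropy(logits, labels):
    """Squentropy: cross-entropy plus the average squared logit over the
    incorrect classes. No rescaling parameters to tune.

    logits : (batch, num_classes) raw network outputs
    labels : (batch,) integer class labels
    """
    n, c = logits.shape
    # Cross-entropy term, with log-sum-exp for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(n), labels]
    # Average square loss over the c - 1 incorrect classes.
    mask = np.ones_like(logits, dtype=bool)
    mask[np.arange(n), labels] = False
    sq = (logits ** 2 * mask).sum(axis=1) / (c - 1)
    return float((ce + sq).mean())

# Example: a batch of two 3-class predictions.
loss = squentropy(np.array([[2.0, 0.1, -1.0], [0.3, 0.2, 1.5]]),
                  np.array([0, 2]))
```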


A Fully First-Order Method for Stochastic Bilevel Optimization

arXiv.org Artificial Intelligence

We consider stochastic unconstrained bilevel optimization problems when only first-order gradient oracles are available. While numerous optimization methods have been proposed for tackling bilevel problems, existing methods either require possibly expensive calculations involving Hessians of lower-level objectives or lack rigorous finite-time performance guarantees. In this work, we propose a Fully First-order Stochastic Approximation (F2SA) method and study its non-asymptotic convergence properties. Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $\epsilon^{-7/2}$, $\epsilon^{-5/2}$, and $\epsilon^{-3/2}$ iterations (each iteration using $O(1)$ samples) when stochastic noise is present in both level objectives, only in the upper-level objective, or not present (deterministic setting), respectively. We further show that if we employ momentum-assisted gradient estimators, the iteration complexities improve to $\epsilon^{-5/2}$, $\epsilon^{-4/2}$, and $\epsilon^{-3/2}$, respectively. We demonstrate the superior practical performance of the proposed method over existing second-order-based approaches in MNIST data-hypercleaning experiments.
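A rough, deterministic sketch of the fully first-order idea: two lower-level variables are tracked with plain gradient steps, and the upper-level update combines only first-order gradients of both objectives with a slowly increasing multiplier. Step sizes and the multiplier schedule here are illustrative placeholders, not the schedules analyzed in the paper, and the stochastic and momentum-assisted variants are omitted.

```python
import numpy as np

def f2sa_sketch(grad_f, grad_g, x, y, z, n_iters=1000,
                alpha=0.01, beta=0.05, lam=1.0, delta=0.001):
    """First-order bilevel sketch: min_x f(x, y*(x)) with y*(x) = argmin_y g(x, y).

    grad_f(x, y) -> (df/dx, df/dy)    upper-level objective gradients
    grad_g(x, y) -> (dg/dx, dg/dy)    lower-level objective gradients
    y tracks argmin_y g(x, y); z tracks argmin_y f(x, y) + lam * g(x, y).
    No Hessians or Hessian-vector products are ever formed.
    """
    for _ in range(n_iters):
        y = y - beta * grad_g(x, y)[1]
        z = z - beta * (grad_f(x, z)[1] + lam * grad_g(x, z)[1])
        # First-order surrogate for the hypergradient of the bilevel objective.
        hx = grad_f(x, z)[0] + lam * (grad_g(x, z)[0] - grad_g(x, y)[0])
        x = x - alpha * hx
        lam = lam + delta          # slowly grow the multiplier
    return x, y, z
```

The key point of the sketch is that the surrogate direction hx uses only gradients of f and g, evaluated at the two tracked lower-level iterates, rather than the implicit-function hypergradient that would require second-order information.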


On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime

arXiv.org Machine Learning

Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime. We study this phenomenon with mean-field analysis, by translating the training process of ResNet to a gradient-flow partial differential equation (PDE) and examining the convergence properties of this limiting process. The activation function is assumed to be $2$-homogeneous or partially $1$-homogeneous; the regularized ReLU satisfies the latter condition. We show that if the ResNet is sufficiently large, with depth and width depending algebraically on the accuracy and confidence levels, first-order optimization methods can find global minimizers that fit the training data.
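For intuition about the depth limit underlying the mean-field analysis, the toy sketch below treats a deep ResNet forward pass as an Euler discretization of an ODE, with the layer index playing the role of a continuous time variable. The tanh activation is used only to keep the toy simple; the paper's homogeneity assumptions on the activation differ, and this is an illustration of the limiting regime, not the paper's analysis.

```python
import numpy as np

def resnet_forward(x, thetas, activation=np.tanh):
    """Toy residual network viewed as an Euler discretization of an ODE:
        x_{l+1} = x_l + (1/L) * activation(theta_l @ x_l)
    As depth L grows, the layer index l/L behaves like continuous time,
    which is the regime studied in the mean-field limit.
    """
    L = len(thetas)
    for theta in thetas:
        x = x + (1.0 / L) * activation(theta @ x)
    return x

# Example: a depth-50 toy ResNet acting on a 4-dimensional input.
rng = np.random.default_rng(0)
out = resnet_forward(rng.normal(size=4),
                     [rng.normal(scale=0.5, size=(4, 4)) for _ in range(50)])
```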


Overparameterization of deep ResNet: zero loss and mean-field analysis

arXiv.org Machine Learning

Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many practical situations. We examine this phenomenon for the case of Residual Neural Networks (ResNet) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of neurons in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that the gradient descent for parameter training becomes a gradient flow for a probability distribution that is characterized by a partial differential equation (PDE) in the large-NN limit. Next, we show that the solution to the PDE converges in the training time to a zero-loss solution. Together, these results imply that the training of the ResNet gives a near-zero loss if the ResNet is large enough. We give estimates of the depth and width needed to reduce the loss below a given threshold, with high probability.


Stochastic Learning for Sparse Discrete Markov Random Fields with Controlled Gradient Approximation Error

arXiv.org Machine Learning

We study the $L_1$-regularized maximum likelihood estimation (MLE) problem for discrete Markov random fields (MRFs), where efficient and scalable learning requires both sparse regularization and approximate inference. To address these challenges, we consider a stochastic learning framework called stochastic proximal gradient (SPG; Honorio 2012a, Atchade et al. 2014, Miasojedow and Rejchel 2016). SPG is an inexact proximal gradient algorithm [Schmidt et al., 2011] whose inexactness stems from the stochastic oracle (Gibbs sampling) used for gradient approximation; exact gradient evaluation is infeasible in general due to the NP-hard inference problem for discrete MRFs [Koller and Friedman, 2009]. Theoretically, we provide novel verifiable bounds to inspect and control the quality of the gradient approximation. Empirically, we propose the tighten asymptotically (TAY) learning strategy based on the verifiable bounds to boost the performance of SPG.
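A minimal sketch of the generic stochastic proximal gradient loop for an L1-regularized objective: the proximal step is soft-thresholding, and the gradient oracle is stochastic. In the MRF setting of the abstract the estimate would come from Gibbs sampling (not shown here), and its approximation error is what the verifiable bounds are designed to inspect and control.

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def stochastic_proximal_gradient(grad_estimate, w0, lam, step=0.1, n_iters=500):
    """Generic SPG loop for min_w  loss(w) + lam * ||w||_1.

    grad_estimate(w) returns a stochastic estimate of the loss gradient;
    exact evaluation is assumed to be intractable, as in discrete MRFs.
    """
    w = w0.copy()
    for _ in range(n_iters):
        g = grad_estimate(w)
        w = soft_threshold(w - step * g, step * lam)
    return w

# Example: noisy gradients of a quadratic loss 0.5 * ||w - b||^2.
rng = np.random.default_rng(0)
b = np.array([1.0, 0.0, -0.2, 0.8])
w_hat = stochastic_proximal_gradient(
    lambda w: (w - b) + 0.05 * rng.normal(size=w.shape),
    np.zeros(4), lam=0.3)
```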


k-Support and Ordered Weighted Sparsity for Overlapping Groups: Hardness and Algorithms

Neural Information Processing Systems

The k-support and OWL norms generalize the l1 norm, providing better prediction accuracy and better handling of correlated variables. We study the norms obtained from extending the k-support norm and OWL norms to the setting in which there are overlapping groups. The resulting norms are in general NP-hard to compute, but they are tractable for certain collections of groups. To demonstrate this fact, we develop a dynamic program for the problem of projecting onto the set of vectors supported by a fixed number of groups. This program can be converted to an extended formulation which, for the associated group structure, models the k-group support norms and an overlapping group variant of the ordered weighted l1 norm.
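To make the projection subproblem concrete, the sketch below handles only the easy special case of disjoint groups, where keeping the k groups with the largest squared norm is optimal; the paper's dynamic program and extended formulation are what make certain overlapping group structures tractable.

```python
import numpy as np

def project_k_group_support(x, groups, k):
    """Euclidean projection of x onto { v : v supported on at most k groups },
    assuming the groups are disjoint. With overlapping groups the problem is
    NP-hard in general, which is where the paper's dynamic program comes in.
    """
    scores = [np.sum(x[g] ** 2) for g in groups]   # mass captured by each group
    keep = np.argsort(scores)[-k:]                 # k groups with the most mass
    v = np.zeros_like(x)
    for i in keep:
        v[groups[i]] = x[groups[i]]
    return v

# Example: three disjoint groups over a length-6 vector, keeping the best two.
x = np.array([3.0, 0.1, -2.0, 0.2, 0.0, 4.0])
v = project_k_group_support(x, [[0, 1], [2, 3], [4, 5]], k=2)   # zeroes group [2, 3]
```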