AITopics | Chen, Thomas

Collaborating Authors

Chen, Thomas

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Zero loss guarantees and explicit minimizers for generic overparametrized Deep Learning networks

Chen, Thomas, Moore, Andrew G.

arXiv.org Machine LearningFeb-19-2025

We determine sufficient conditions for overparametrized deep learning (DL) networks to guarantee the attainability of zero loss in the context of supervised learning, for the $\mathcal{L}^2$ cost and {\em generic} training data. We present an explicit construction of the zero loss minimizers without invoking gradient descent. On the other hand, we point out that increase of depth can deteriorate the efficiency of cost minimization using a gradient descent algorithm by analyzing the conditions for rank loss of the training Jacobian. Our results clarify key aspects on the dichotomy between zero loss reachability in underparametrized versus overparametrized DL.

artificial intelligence, full rank, machine learning, (17 more...)

arXiv.org Machine Learning

2502.14114

Country:

Europe (0.28)
North America > United States > Texas (0.14)

Genre: Research Report (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning Symbolic Task Decompositions for Multi-Agent Teams

Shah, Ameesh, Lauffer, Niklas, Chen, Thomas, Pitta, Nikhil, Seshia, Sanjit A.

arXiv.org Artificial IntelligenceFeb-18-2025

One approach for improving sample efficiency in cooperative multi-agent learning is to decompose overall tasks into sub-tasks that can be assigned to individual agents. We study this problem in the context of reward machines: symbolic tasks that can be formally decomposed into sub-tasks. In order to handle settings without a priori knowledge of the environment, we introduce a framework that can learn the optimal decomposition from model-free interactions with the environment. Our method uses a task-conditioned architecture to simultaneously learn an optimal decomposition and the corresponding agents' policies for each sub-task. In doing so, we remove the need for a human to manually design the optimal decomposition while maintaining the sample-efficiency benefits of improved credit assignment. We provide experimental results in several deep reinforcement learning settings, demonstrating the efficacy of our approach. Our results indicate that our approach succeeds even in environments with codependent agent dynamics, enabling synchronous multi-agent learning not achievable in previous works.

artificial intelligence, decomposition, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2502.13376

Country:

Europe (0.93)
North America > United States > Michigan (0.14)
North America > United States > Massachusetts (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

Derivation of effective gradient flow equations and dynamical truncation of training data in Deep Learning

Chen, Thomas

arXiv.org Machine LearningJan-13-2025

We derive explicit equations governing the cumulative biases and weights in Deep Learning with ReLU activation function, based on gradient descent for the Euclidean cost in the input layer, and under the assumption that the weights are, in a precise sense, adapted to the coordinate system distinguished by the activations. We show that gradient descent corresponds to a dynamical process in the input layer, whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points that have already been truncated. We provide a detailed discussion of several types of solutions to the gradient flow equations. A main motivation for this work is to shed light on the interpretability question in supervised learning.

artificial intelligence, gradient flow, machine learning, (13 more...)

arXiv.org Machine Learning

2501.074

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Add feedback

Interpretable global minima of deep ReLU neural networks on sequentially separable data

Chen, Thomas, Ewald, Patricia Muñoz

arXiv.org Machine LearningMay-11-2024

We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Machine Learning

2405.07098

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning

Chen, Thomas

arXiv.org Machine LearningDec-31-2023

We consider the gradient descent flow widely used for the minimization of the $\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two modified versions; one adapted for the overparametrized setting, and the other for the underparametrized setting. Both have a clear and natural invariant geometric meaning, taking into account the pullback vector bundle structure in the overparametrized, and the pushforward vector bundle structure in the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry.

artificial intelligence, gradient flow, machine learning, (15 more...)

arXiv.org Machine Learning

2311.15487

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Geometric structure of Deep Learning networks and construction of global ${\mathcal L}^2$ minimizers

Chen, Thomas, Ewald, Patricia Muñoz

arXiv.org Machine LearningDec-17-2023

In this paper, we provide a geometric interpretation of the structure of Deep Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt) cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension $Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training input size $N$ can be arbitrarily large - thus, we are considering the underparametrized regime. We apply our recent results on shallow neural networks to construct an explicit family of minimizers for the global minimum of the cost function in the case $L\geq Q$, which we show to be degenerate. In the context presented here, the hidden layers of the DL network "curate" the training inputs by recursive application of a truncation map that minimizes the noise to signal ratio of the training inputs. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function. Our constructions make no use of gradient descent algorithms at all.

artificial intelligence, cost function, machine learning, (17 more...)

arXiv.org Machine Learning

2309.10639

Country:

Europe (0.28)
North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning

Chen, Thomas, Ewald, Patricia Muñoz

arXiv.org Machine LearningNov-12-2023

We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks can generically not be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method.

artificial intelligence, machine learning, minimizer, (14 more...)

arXiv.org Machine Learning

2311.07065

Country: North America > United States > Texas > Travis County > Austin (0.15)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Chen, Thomas, Ewald, Patricia Muñoz

arXiv.org Machine LearningSep-19-2023

In this paper, we provide a geometric interpretation of the structure of shallow neural networks characterized by one hidden layer, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$. We prove an upper bound on the minimum of the cost function of order $O(\delta_P$ where $\delta_P$ measures the signal to noise ratio of training inputs. We obtain an approximate optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of training input vectors belonging to the same output vector $y_j$, $j=1,\dots,Q$. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function; the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes the $Q$-dimensional subspace in the input space ${\mathbb R}^M$ spanned by $\overline{x_{0,j}}$, $j=1,\dots,Q$. We comment on the characterization of the global minimum of the cost function in the given context.

artificial intelligence, cost function, machine learning, (17 more...)

arXiv.org Machine Learning

2309.1037

Country:

Europe (0.28)
North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Hierarchical Summarization for Longform Spoken Dialog

Li, Daniel, Chen, Thomas, Tung, Albert, Chilton, Lydia

arXiv.org Artificial IntelligenceAug-21-2021

Every day we are surrounded by spoken dialog. This medium delivers rich diverse streams of information auditorily; however, systematically understanding dialog can often be non-trivial. Despite the pervasiveness of spoken dialog, automated speech understanding and quality information extraction remains markedly poor, especially when compared to written prose. Furthermore, compared to understanding text, auditory communication poses many additional challenges such as speaker disfluencies, informal prose styles, and lack of structure. These concerns all demonstrate the need for a distinctly speech tailored interactive system to help users understand and navigate the spoken language domain. While individual automatic speech recognition (ASR) and text summarization methods already exist, they are imperfect technologies; neither consider user purpose and intent nor address spoken language induced complications. Consequently, we design a two stage ASR and text summarization pipeline and propose a set of semantic segmentation and merging algorithms to resolve these speech modeling challenges. Our system enables users to easily browse and navigate content as well as recover from errors in these underlying technologies. Finally, we present an evaluation of the system which highlights user preference for hierarchical summarization as a tool to quickly skim audio and identify content of interest to the user.

speech recognition, summarization, text processing, (20 more...)

arXiv.org Artificial Intelligence

2108.09597

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.14)

Genre:

Research Report (0.82)
Questionnaire & Opinion Survey (0.68)

Industry:

Banking & Finance (0.93)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback