Tai, Kai Sheng
Analyzing Populations of Neural Networks via Dynamical Model Embedding
Cotler, Jordan, Tai, Kai Sheng, Hernández, Felipe, Elias, Blake, Sussillo, David
A crucial feature of neural networks with a fixed network architecture is that they form a manifold by virtue of their continuously tunable weights, which underlies their ability to be trained by gradient descent. However, this conception of the space of neural networks is inadequate for understanding the computational processes the networks perform. For example, two neural networks trained to perform the same task may have vastly different weights, and yet implement the same high-level algorithms and computational processes (Maheswaranathan et al., 2019b). In this paper, we construct an algorithm that provides alternative parametrizations of the space of RNNs and CNNs, with the goal of endowing this space with a geometric structure that is more compatible with the high-level computational processes performed by neural networks. In particular, given a set of neural networks with the same or possibly different architectures (and possibly trained on different tasks), we find a parametrization of a low-dimensional submanifold of neural networks that approximately interpolates between these chosen "base models", as well as extrapolates beyond them. Such model embedding spaces can be used to cluster neural networks and even to compute model averages. A key feature is that two points in model embedding space are nearby if they correspond to neural networks that implement similar high-level computational processes, in a sense made precise in the paper. In this way, two neural networks may correspond to nearby points in model embedding space even if they have distinct weights or even distinct architectures.
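As a loose illustration of the embedding idea (not the paper's algorithm), one can place trained models in a low-dimensional space according to how similarly they behave on a shared set of probe inputs, and then cluster them there. The probe set, the behavioral distance, and the use of multidimensional scaling below are all assumptions made for this sketch.

```python
# Illustrative sketch only (not the paper's embedding algorithm): embed a
# population of trained models by the pairwise distances between their outputs
# on a shared probe set, then cluster the resulting points.
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

def behavioral_embedding(models, probe_inputs, n_components=2):
    """Embed models via pairwise distances between their outputs on probe inputs."""
    outputs = [np.asarray(m(probe_inputs)).ravel() for m in models]
    n = len(outputs)
    dists = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(outputs[i] - outputs[j])
            dists[i, j] = dists[j, i] = d
    # Multidimensional scaling on the behavioral distance matrix.
    return MDS(n_components=n_components,
               dissimilarity="precomputed").fit_transform(dists)

# Toy usage: four "models" (simple functions standing in for networks) that
# fall into two behavioral clusters despite having different parameters.
models = [lambda x, w=w: np.tanh(w * x) for w in (0.5, 0.6, 2.0, 2.1)]
probe = np.linspace(-1.0, 1.0, 32)
embedding = behavioral_embedding(models, probe)
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(embedding)
```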
Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training
Tai, Kai Sheng, Bailis, Peter, Valiant, Gregory
Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm. Additionally, we elucidate connections between our proposed algorithm and existing confidence thresholded self-training approaches in the context of homotopy methods in optimization. Our code is available at https://github.com/stanford-futuredata/sinkhorn-label-allocation.
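For intuition, the sketch below runs a generic entropy-regularized optimal transport (Sinkhorn) iteration to turn classifier probabilities and assumed class proportions into soft label assignments. It is a simplified stand-in, not the exact Sinkhorn Label Allocation procedure (in particular, it omits the annealing schedule and the upper-bound constraints); the hyperparameters `eps` and `n_iters` are illustrative.

```python
# Generic Sinkhorn iteration for example-to-class assignment (simplified
# relative to Sinkhorn Label Allocation; no annealing or upper-bound handling).
import numpy as np

def sinkhorn_assign(probs, class_props, eps=0.05, n_iters=50):
    """probs: (n, k) classifier probabilities; class_props: (k,) assumed proportions."""
    n, k = probs.shape
    cost = -np.log(probs + 1e-12)        # cheaper to assign high-probability classes
    K = np.exp(-cost / eps)              # Gibbs kernel for entropic regularization
    r = np.full(n, 1.0 / n)              # each example carries equal mass
    c = class_props / class_props.sum()  # column marginals = class proportions
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iters):             # alternating marginal scaling
        u = r / (K @ v + 1e-12)
        v = c / (K.T @ u + 1e-12)
    plan = u[:, None] * K * v[None, :]   # transport plan between examples and classes
    return plan / plan.sum(axis=1, keepdims=True)  # row-normalize into soft labels

# Usage: four unlabeled examples, two classes assumed to be balanced.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.55, 0.45]])
soft_labels = sinkhorn_assign(probs, class_props=np.array([0.5, 0.5]))
```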
Equivariant Transformer Networks
Tai, Kai Sheng, Bailis, Peter, Valiant, Gregory
How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network? We propose Equivariant Transformers (ETs), a family of differentiable image-to-image mappings that improve the robustness of models towards pre-defined continuous transformation groups. Through the use of specially-derived canonical coordinate systems, ETs incorporate functions that are equivariant by construction with respect to these transformations. We show empirically that ETs can be flexibly composed to improve model robustness towards more complicated transformation groups involving several parameters. On a real-world image classification task, ETs improve the sample efficiency of ResNet classifiers, achieving relative improvements in error rate of up to 15% in the limited data regime while increasing model parameter count by less than 1%.
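The canonical-coordinate idea can be illustrated without the full ET architecture: in log-polar coordinates, rotation and isotropic scaling of an image act approximately as translations, so translation-equivariant operations applied there become rotation- and scale-equivariant by construction. The sketch below, which uses scikit-image and estimates a rotation angle by aligning angular profiles, is an assumption-laden illustration rather than the paper's method.

```python
# Log-polar canonical coordinates: image rotation becomes a circular shift
# along the angular axis (illustration only, not the ET architecture).
import numpy as np
from skimage.data import camera
from skimage.transform import rotate, warp_polar

image = camera().astype(float)
rotated = rotate(image, angle=30)

radius = min(image.shape) // 2            # restrict to the inscribed circle
lp = warp_polar(image, radius=radius, scaling="log")
lp_rot = warp_polar(rotated, radius=radius, scaling="log")

# Estimate the rotation as the angular shift that best aligns the two warps,
# via circular cross-correlation of their mean angular profiles.
profile = lp.mean(axis=1)
profile_rot = lp_rot.mean(axis=1)
shift = int(np.argmax([np.dot(np.roll(profile, s), profile_rot)
                       for s in range(len(profile))]))
est_angle = 360.0 * shift / len(profile)  # roughly recovers the 30-degree
                                          # rotation, up to orientation convention
```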
Finding Heavily-Weighted Features in Data Streams
Tai, Kai Sheng, Sharan, Vatsal, Bailis, Peter, Valiant, Gregory
We introduce a new sub-linear space data structure---the Weight-Median Sketch---that captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. In contrast with related sketches that capture the most commonly occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis of this approach that establishes recovery guarantees in the online learning setting, and demonstrate substantial empirical improvements in accuracy-memory trade-offs over alternatives, including count-based sketches and feature hashing.
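The core mechanism can be sketched in a few lines: keep a Count-Sketch-style table, hash each feature to one bucket per row with a random sign, apply SGD updates to the hashed buckets rather than to a dense weight vector, and answer weight queries with a sign-corrected median across rows. The code below is a simplified, hedged version of that idea (the hashing scheme, logistic-loss update, and default sizes are illustrative choices, not the paper's exact construction).

```python
# Simplified sketch of a Count-Sketch over model weights: gradient updates are
# hashed into a small table, and per-feature weights are read back as medians.
import numpy as np

class SketchedLinearModel:
    def __init__(self, depth=5, width=2048, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width))
        self.bucket_seeds = rng.integers(1, 2**31 - 1, size=depth)  # bucket hashing
        self.sign_seeds = rng.integers(1, 2**31 - 1, size=depth)    # sign hashing
        self.width, self.lr = width, lr

    def _bucket(self, row, feat):
        return hash((int(self.bucket_seeds[row]), feat)) % self.width

    def _sign(self, row, feat):
        return 1.0 if hash((int(self.sign_seeds[row]), feat)) % 2 else -1.0

    def estimate(self, feat):
        """Estimate a feature's weight as a sign-corrected median over rows."""
        vals = [self._sign(r, feat) * self.table[r, self._bucket(r, feat)]
                for r in range(self.table.shape[0])]
        return float(np.median(vals))

    def update(self, features, label):
        """One SGD step of logistic regression on a sparse binary example."""
        margin = label * sum(self.estimate(f) for f in features)
        g = -label / (1.0 + np.exp(margin))       # logistic-loss gradient wrt score
        for f in features:
            for r in range(self.table.shape[0]):  # apply the sketched update
                self.table[r, self._bucket(r, f)] -= self.lr * g * self._sign(r, f)

# Usage: stream sparse examples with labels in {-1, +1}, then query weights.
model = SketchedLinearModel()
model.update(features=[3, 17, 42], label=+1)
weight_17 = model.estimate(17)
```

Taking the median across rows limits the damage from hash collisions, which is what makes the heavily weighted features recoverable from sub-linear space.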
There and Back Again: A General Approach to Learning Sparse Models
Sharan, Vatsal, Tai, Kai Sheng, Bailis, Peter, Valiant, Gregory
We propose a simple and efficient approach to learning sparse models. Our approach consists of (1) projecting the data into a lower dimensional space, (2) learning a dense model in the lower dimensional space, and then (3) recovering the sparse model in the original space via compressive sensing. We apply this approach to Non-negative Matrix Factorization (NMF), tensor decomposition, and linear classification---showing that it achieves $10\times$ compression with negligible loss in accuracy on real data, and yields speedups of up to $5\times$. Our main theoretical contribution is to show the following result for NMF: if the original factors are sparse, then their projections are the sparsest solutions to the projected NMF problem. This explains why our method works for NMF and shows an interesting new property of random projections: they can preserve the solutions of non-convex optimization problems such as NMF.
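A minimal end-to-end sketch of the three-step recipe for linear classification is given below, with illustrative choices throughout: a Gaussian random projection, logistic regression as the dense learner in the projected space, and orthogonal matching pursuit as the compressive-sensing recovery step (the paper's estimators and settings may differ).

```python
# Project the data, learn a dense model in the projected space, then recover a
# sparse model in the original space via compressive sensing (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression, OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, d, k, m = 2000, 1000, 10, 200        # samples, ambient dim, sparsity, projected dim

# Ground truth: a k-sparse linear classifier in d dimensions.
w_true = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
w_true[support] = rng.normal(size=k)
X = rng.normal(size=(n, d))
y = (X @ w_true > 0).astype(int)

# (1) Project the data into a lower-dimensional space.
R = rng.normal(size=(m, d)) / np.sqrt(d)
X_proj = X @ R.T

# (2) Learn a dense model in the projected space.
z = LogisticRegression(max_iter=1000).fit(X_proj, y).coef_.ravel()

# (3) Recover a sparse model in the original space, treating the dense weights
# as approximate linear measurements z ≈ R @ w of a sparse w.
w_hat = OrthogonalMatchingPursuit(n_nonzero_coefs=k).fit(R, z).coef_
recovered_support = np.flatnonzero(w_hat)   # compare against `support`
```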