AITopics | Hero, Alfred O. III

Collaborating Authors

Hero, Alfred O. III

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Universal Training of Neural Networks to Achieve Bayes Optimal Classification Accuracy

Naeini, Mohammadreza Tavasoli, Bereyhi, Ali, Noshad, Morteza, Liang, Ben, Hero, Alfred O. III

arXiv.org Artificial IntelligenceJan-13-2025

This work invokes the notion of $f$-divergence to introduce a novel upper bound on the Bayes error rate of a general classification task. We show that the proposed bound can be computed by sampling from the output of a parameterized model. Using this practical interpretation, we introduce the Bayes optimal learning threshold (BOLT) loss whose minimization enforces a classification model to achieve the Bayes error rate. We validate the proposed loss for image and text classification tasks, considering MNIST, Fashion-MNIST, CIFAR-10, and IMDb datasets. Numerical experiments demonstrate that models trained with BOLT achieve performance on par with or exceeding that of cross-entropy, particularly on challenging datasets. This highlights the potential of BOLT in improving generalization.

artificial intelligence, bayes error, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2501.07754

Country: North America > Canada > Ontario > Toronto (0.29)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Iterative Sketching for Secure Coded Regression

Charalambides, Neophytos, Mahdavifar, Hessam, Pilanci, Mert, Hero, Alfred O. III

arXiv.org Artificial IntelligenceAug-8-2023

In this work, we propose methods for speeding up linear regression distributively, while ensuring security. We leverage randomized sketching techniques, and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample \textit{blocks}, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the transformation corresponds to an encoded encryption in an \textit{approximate gradient coding scheme}, and the subsampling corresponds to the responses of the non-straggling workers; in a centralized coded computing network. This results in a distributive \textit{iterative sketching} approach for an $\ell_2$-subspace embedding, \textit{i.e.} a new sketch is considered at each iteration. We also focus on the special case of the \textit{Subsampled Randomized Hadamard Transform}, which we generalize to block sampling; and discuss how it can be modified in order to secure the data.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2308.04185

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

The Power of Graph Convolutional Networks to Distinguish Random Graph Models

Magner, Abram, Baranwal, Mayank, Hero, Alfred O. III

arXiv.org Machine LearningOct-28-2019

Graph convolutional networks (GCNs) are a widely used method for graph representation learning. We investigate the power of GCNs, as a function of their number of layers, to distinguish between different random graph models on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We exhibit an infinite class of graphons that are well-separated in terms of cut distance and are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph, and furthermore show that, for this application, ReLU activation functions and non-identity weight matrices with non-negative entries do not help in terms of distinguishing power. These results theoretically match empirical observations of several prior works. Finally, we show that for pairs of graphons satisfying a degree profile separation property, a very simple GCN architecture suffices for distinguishability. To prove our results, we exploit a connection to random walks on graphs.

deep learning, neural network, vertex, (20 more...)

arXiv.org Machine Learning

1910.12954

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

Jung, Alexander, Hero, Alfred O. III, Mara, Alexandru, Jahromi, Saeed, Heimowitz, Ayelet, Eldar, Yonina C.

arXiv.org Machine LearningJan-28-2019

We propose and analyze a method for semi-supervised learning from partially-labeled network-structured data. Our approach is based on a graph signal recovery interpretation under a clustering hypothesis that labels of data points belonging to the same well-connected subset (cluster) are similar valued. This lends naturally to learning the labels by total variation (TV) minimization, which we solve by applying a recently proposed primal-dual method for non-smooth convex optimization. The resulting algorithm allows for a highly scalable implementation using message passing over the underlying empirical graph, which renders the algorithm suitable for big data applications. By applying tools of compressed sensing, we derive a sufficient condition on the underlying network structure such that TV minimization recovers clusters in the empirical graph of the data. In particular, we show that the proposed primal-dual method amounts to maximizing network flows over the empirical graph of the dataset. Moreover, the learning accuracy of the proposed algorithm is linked to the set of network flows between data points having known labels. The effectiveness and scalability of our approach is verified by numerical experiments.

inductive learning, minimization, optimization problem, (19 more...)

arXiv.org Machine Learning

1901.09838

Country:

Europe (1.00)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > Massachusetts > Middlesex County (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.61)

Add feedback

Scalable Mutual Information Estimation using Dependence Graphs

Noshad, Morteza, Hero, Alfred O. III

arXiv.org Machine LearningJan-27-2018

We propose a unified method for empirical non-parametric estimation of general Mutual Information (MI) function between the random vectors in $\mathbb{R}^d$ based on $N$ i.i.d. samples. The proposed low complexity estimator is based on a bipartite graph, referred to as dependence graph. The data points are mapped to the vertices of this graph using randomized Locality Sensitive Hashing (LSH). The vertex and edge weights are defined in terms of marginal and joint hash collisions. For a given set of hash parameters $\epsilon(1), \ldots, \epsilon(k)$, a base estimator is defined as a weighted average of the transformed edge weights. The proposed estimator, called the ensemble dependency graph estimator (EDGE), is obtained as a weighted average of the base estimators, where the weights are computed offline as the solution of a linear programming problem. EDGE achieves optimal computational complexity $O(N)$, and can achieve the optimal parametric MSE rate of $O(1/N)$ if the density is $d$ times differentiable. To the best of our knowledge EDGE is the first non-parametric MI estimator that can achieve parametric MSE rates with linear time complexity.

artificial intelligence, estimator, health & medicine, (17 more...)

arXiv.org Machine Learning

1801.09125

Country: North America > United States > Michigan (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Add feedback

Direct Estimation of Information Divergence Using Nearest Neighbor Ratios

Noshad, Morteza, Moon, Kevin R., Sekeh, Salimeh Yasaei, Hero, Alfred O. III

arXiv.org Artificial IntelligenceNov-20-2017

We propose a direct estimation method for R\'{e}nyi and f-divergence measures based on a new graph theoretical interpretation. Suppose that we are given two sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where $\eta:=M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X,Y)$, we show that the average powered ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN points is proportional to R\'{e}nyi divergence of $X$ and $Y$ densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of $\gamma$-H\"{o}lder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives of up to the order $d$, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. Our estimators are more computationally tractable than other competing estimators, which makes them appealing in many practical applications.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ISIT.2017.8006659

1702.05222

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.61)

Add feedback

Semiblind subgraph reconstruction in Gaussian graphical models

Xie, Tianpei, Liu, Sijia, Hero, Alfred O. III

arXiv.org Machine LearningNov-14-2017

Consider a social network where only a few nodes (agents) have meaningful interactions in the sense that the conditional dependency graph over node attribute variables (behaviors) is sparse. A company that can only observe the interactions between its own customers will generally not be able to accurately estimate its customers' dependency subgraph: it is blinded to any external interactions of its customers and this blindness creates false edges in its subgraph. In this paper we address the semiblind scenario where the company has access to a noisy summary of the complementary subgraph connecting external agents, e.g., provided by a consolidator. The proposed framework applies to other applications as well, including field estimation from a network of awake and sleeping sensors and privacy-constrained information sharing over social subnetworks. We propose a penalized likelihood approach in the context of a graph signal obeying a Gaussian graphical models (GGM). We use a convex-concave iterative optimization algorithm to maximize the penalized likelihood.

dilat-ggm, optimization problem, survey article, (19 more...)

arXiv.org Machine Learning

1711.05391

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.67)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Rate-optimal Meta Learning of Classification Error

Iranzad, Morteza Noshad, Hero, Alfred O. III

arXiv.org Machine LearningOct-30-2017

Meta learning of optimal classifier error rates allows an experimenter to empirically estimate the intrinsic ability of any estimator to discriminate between two populations, circumventing the difficult problem of estimating the optimal Bayes classifier. To this end we propose a weighted nearest neighbor (WNN) graph estimator for a tight bound on the Bayes classification error; the Henze-Penrose (HP) divergence. Similar to recently proposed HP estimators [berisha2016], the proposed estimator is non-parametric and does not require density estimation. However, unlike previous approaches the proposed estimator is rate-optimal, i.e., its mean squared estimation error (MSEE) decays to zero at the fastest possible rate of $O(1/M+1/N)$ where $M,N$ are the sample sizes of the respective populations. We illustrate the proposed WNN meta estimator for several simulated and real data sets.

artificial intelligence, bayesian inference, estimator, (19 more...)

arXiv.org Machine Learning

1710.11315

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.49)

Add feedback

Semi-Supervised Learning via Sparse Label Propagation

Jung, Alexander, Hero, Alfred O. III, Mara, Alexandru, Jahromi, Saeed

arXiv.org Machine LearningMay-15-2017

This work proposes a novel method for semi-supervised learning from partially labeled massive network-structured datasets, i.e., big data over networks. We model the underlying hypothesis, which relates data points to labels, as a graph signal, defined over some graph (network) structure intrinsic to the dataset. Following the key principle of supervised learning, i.e., similar inputs yield similar outputs, we require the graph signals induced by labels to have small total variation. Accordingly, we formulate the problem of learning the labels of data points as a non-smooth convex optimization problem which amounts to balancing between the empirical loss, i.e., the discrepancy with some partially available label information, and the smoothness quantified by the total variation of the learned graph signal. We solve this optimization problem by appealing to a recently proposed preconditioned variant of the popular primal-dual method by Pock and Chambolle, which results in a sparse label propagation algorithm. This learning algorithm allows for a highly scalable implementation as message passing over the underlying data graph. By applying concepts of compressed sensing to the learning problem, we are also able to provide a transparent sufficient condition on the underlying network structure such that accurate learning of the labels is possible. We also present an implementation of the message passing formulation allows for a highly scalable implementation in big data frameworks.

graph signal, inductive learning, optimization problem, (14 more...)

arXiv.org Machine Learning

1612.01414

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Add feedback

Similarity Function Tracking using Pairwise Comparisons

Greenewald, Kristjan, Kelley, Stephen, Oselio, Brandon, Hero, Alfred O. III

arXiv.org Machine LearningJan-6-2017

Recent work in distance metric learning has focused on learning transformations of data that best align with specified pairwise similarity and dissimilarity constraints, often supplied by a human observer. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we address the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes in either the ground-truth clustering used to generate constraints or changes in the feature subspaces in which the class structure is apparent. We propose Online Convex Ensemble StrongLy Adaptive Dynamic Learning (OCELAD), a general adaptive, online approach for learning and tracking optimal metrics as they change over time that is highly robust to a variety of nonstationary behaviors in the changing metric. We apply the OCELAD framework to an ensemble of online learners. Specifically, we create a retro-initialized composite objective mirror descent (COMID) ensemble (RICE) consisting of a set of parallel COMID learners with different learning rates, and demonstrate parameter-free RICE-OCELAD metric learning on both synthetic data and a highly nonstationary Twitter dataset. We show significant performance improvements and increased robustness to nonstationary effects relative to previously proposed batch and online distance metric learning algorithms.

artificial intelligence, learner, us government, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2017.2739100

1701.02804

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.50)

Industry:

Government > Voting & Elections (0.94)
Government > Regional Government > North America Government > United States Government (0.93)
Education (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback