Decision Jungles: Compact and Rich Models for Classification

Neural Information Processing Systems

Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization.
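A minimal sketch of the training objective described above, the weighted sum of entropies at the leaves, which drives both node splitting and node merging; the function names and data layout are illustrative, not the paper's implementation.

```python
import numpy as np

def leaf_entropy(labels):
    """Shannon entropy (in nats) of the class labels reaching one leaf."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def weighted_leaf_entropy(labels, leaf_assignment):
    """Objective driving both splits and merges:
    sum over leaves of (fraction of samples at the leaf) * (entropy at the leaf)."""
    labels = np.asarray(labels)
    leaf_assignment = np.asarray(leaf_assignment)
    total = 0.0
    for leaf in np.unique(leaf_assignment):
        mask = leaf_assignment == leaf
        total += mask.mean() * leaf_entropy(labels[mask])
    return total

# Example: merging two pure leaves of the same class leaves the objective unchanged,
# which is why a DAG can share nodes without hurting the training criterion.
labels = [0, 0, 1, 1]
print(weighted_leaf_entropy(labels, [0, 0, 1, 2]))  # leaves 1 and 2 are pure
print(weighted_leaf_entropy(labels, [0, 0, 1, 1]))  # after merging leaves 1 and 2
```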


Real-Time Inference for a Gamma Process Model of Neural Spiking

Neural Information Processing Systems

With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-the-art. Via exploratory data analysis, using data with partial ground truth as well as two novel data sets, we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain.
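For reference, the classical two-step pipeline the abstract contrasts with (threshold, then cluster) can be sketched as follows; the threshold rule and the use of KMeans are stand-in choices, not the paper's Gamma process model or its online inference scheme.

```python
import numpy as np
from sklearn.cluster import KMeans

def detect_and_cluster(signal, window=32, n_units=3, thresh_sd=4.0):
    """Step (i): threshold the voltage trace to find putative spikes.
    Step (ii): cluster the extracted waveforms into single units."""
    sigma = np.median(np.abs(signal)) / 0.6745        # robust noise estimate
    crossings = np.where(signal < -thresh_sd * sigma)[0]
    # Keep one index per threshold crossing and cut out a waveform snippet.
    spikes = [t for i, t in enumerate(crossings)
              if i == 0 or t - crossings[i - 1] > window]
    waveforms = np.array([signal[t:t + window] for t in spikes
                          if t + window <= len(signal)])
    units = KMeans(n_clusters=n_units, n_init=10).fit_predict(waveforms)
    return spikes[:len(waveforms)], waveforms, units
```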


Multiclass Total Variation Clustering

Neural Information Processing Systems

Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches.
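A minimal sketch of the quantity these algorithms are built around: the weighted total variation of a relaxed cluster-indicator vector on a similarity graph. The toy graph and the vectorized formula below are illustrative assumptions, not the paper's multiclass algorithm.

```python
import numpy as np

def total_variation(W, f):
    """Graph total variation: 0.5 * sum_{i,j} W[i, j] * |f[i] - f[j]|.
    W is a symmetric nonnegative similarity matrix, f a real-valued
    relaxation of a cluster-indicator vector."""
    return 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :]))

# Tiny example: two tight groups of three vertices joined by one bridge edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
f_good = np.array([1, 1, 1, 0, 0, 0], dtype=float)   # cuts only the bridge edge
f_bad = np.array([1, 0, 1, 0, 1, 0], dtype=float)    # cuts many edges
print(total_variation(W, f_good), total_variation(W, f_bad))  # 1.0 vs 5.0
```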


Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

Neural Information Processing Systems

We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data. We obtain an efficient learning algorithm for a family of Bayesian networks that we call quartet-learnable, meaning that every latent variable has four children that do not have any other parents in common. We show that the existence of such a quartet allows us to uniquely identify each latent variable and to learn all parameters involving that latent variable. Underlying our algorithm are two new techniques for structure learning: a quartet test to determine whether a set of binary variables are singly coupled, and a conditional mutual information test that we use to learn parameters. We also show how to subtract already learned latent variables from the model to create new singly-coupled quartets, which substantially expands the class of structures that we can learn. Finally, we give a proof of the polynomial sample complexity of our learning algorithm, and experimentally compare it to variational EM.
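To make the model class concrete, a bipartite noisy-or network with a hidden top layer can be simulated as below; the parameter names and the sampling routine are illustrative only and are not the learning algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_noisy_or(prior, failure, base_off, n_samples):
    """prior[j]     : P(hidden cause U_j = 1)
    failure[j, i]   : P(cause j, when active, fails to activate child X_i)
    base_off[i]     : P(X_i stays off when no cause is active), i.e. 1 - leak
    Noisy-or CPD:   P(X_i = 0 | U) = base_off[i] * prod_j failure[j, i] ** U[j]
    """
    U = rng.random((n_samples, len(prior))) < prior              # hidden layer
    p_off = base_off * np.prod(np.where(U[:, :, None], failure, 1.0), axis=1)
    X = rng.random(p_off.shape) > p_off                          # observed layer
    return U.astype(int), X.astype(int)

prior = np.array([0.3, 0.5])
failure = np.array([[0.1, 0.1, 1.0, 1.0],    # cause 0 influences children 0 and 1
                    [1.0, 0.2, 0.2, 0.2]])   # cause 1 influences children 1, 2, 3
base_off = np.full(4, 0.95)                  # leak probability of 0.05 per child
U, X = sample_noisy_or(prior, failure, base_off, 10_000)
```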


(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings

Neural Information Processing Systems

We give differentially private algorithms for a large class of online learning problems, in both the full information and bandit settings. Our algorithms aim to minimize a convex loss function which is a sum of smaller convex loss terms, one for each data point. To design our algorithms, we modify the popular mirror descent approach, or rather a variant called follow the approximate leader. The technique leads to the first algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work (due to Dwork, Naor, Pitassi and Rothblum (2010) and Jain, Kothari and Thakurta (2012)). In many cases, our algorithms (in both settings) match the dependence on the input length, T, of the optimal nonprivate regret bounds up to logarithmic factors in T. Our algorithms require logarithmic space and update time.
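A rough schematic of the private online learning idea, adding noise to the accumulated linearized losses in a follow-the-leader-style update; this is a simplified illustration, not the paper's follow-the-approximate-leader construction, and the calibration of the noise to a privacy budget is omitted.

```python
import numpy as np

def noisy_follow_the_leader(grads, radius=1.0, noise_scale=1.0, seed=0):
    """grads: array of shape (T, d), the gradient of each round's loss.
    At each round, play the minimizer of the noisily accumulated linear losses
    over an L2 ball of the given radius. noise_scale would have to be set from
    the desired (epsilon, delta) privacy level; that calibration is not shown."""
    rng = np.random.default_rng(seed)
    d = grads.shape[1]
    running_sum = np.zeros(d)
    plays = []
    for g in grads:
        noisy_sum = running_sum + rng.normal(0.0, noise_scale, size=d)
        # Minimize <noisy_sum, w> over the ball ||w|| <= radius.
        norm = np.linalg.norm(noisy_sum)
        w = np.zeros(d) if norm == 0 else -radius * noisy_sum / norm
        plays.append(w)
        running_sum += g        # observe the true gradient after playing
    return np.array(plays)
```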


Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation

Neural Information Processing Systems

We provide a detailed study of the estimation of probability distributions---discrete and continuous---in a stringent setting in which data is kept private even from the statistician. We give sharp minimax rates of convergence for estimation in these locally private settings, exhibiting fundamental tradeoffs between privacy and convergence rate, as well as providing tools to allow movement along the privacy-statistical efficiency continuum. One of the consequences of our results is that Warner's classical work on randomized response is an optimal way to perform survey sampling while maintaining privacy of the respondents.
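Warner's randomized response, mentioned in the last sentence, is easy to state concretely: each respondent answers truthfully with probability p and flips the answer otherwise, and the analyst inverts the known noise to estimate the population proportion. The variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_response(truth, p):
    """Each respondent reports the truth with probability p (> 1/2)
    and the opposite answer with probability 1 - p."""
    keep = rng.random(truth.shape) < p
    return np.where(keep, truth, 1 - truth)

def debias(responses, p):
    """Unbiased estimate of the true proportion pi from noisy responses:
    E[response] = p * pi + (1 - p) * (1 - pi)  =>  pi = (mean - (1 - p)) / (2p - 1)."""
    return (responses.mean() - (1 - p)) / (2 * p - 1)

truth = (rng.random(100_000) < 0.3).astype(int)  # 30% of respondents hold the attribute
noisy = randomized_response(truth, p=0.75)       # p = 0.75 gives ln(3)-local differential privacy
print(debias(noisy, p=0.75))                     # close to 0.3
```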


Learning Multi-level Sparse Representations

Neural Information Processing Systems

Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neuroscience image sequence considered here, there are the semantic concepts of pixel $\rightarrow$ neuron $\rightarrow$ assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lower-level to higher-level concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that allows us to optimize the decomposition over all levels jointly, rather than in a greedy level-by-level fashion. The proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that allows a calcium imaging sequence to be interpreted simultaneously in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of real-world calcium imaging data.
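A minimal sketch of the decomposition structure described above: the observation matrix is approximated by a product of more than two sparse factors, with rank decreasing from pixels to neurons to assemblies. The sizes and random factors below are made up for illustration and the sketch does not implement the paper's SHMF optimization scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: 400 pixels, 200 time frames, 12 neurons, 3 assemblies.
n_pixels, n_frames, n_neurons, n_assemblies = 400, 200, 12, 3

# Three sparse factors with decreasing rank from lower to higher levels:
# pixels -> neurons (footprints), neurons -> assemblies (membership),
# assemblies -> time (activation time courses).
footprints = rng.random((n_pixels, n_neurons)) * (rng.random((n_pixels, n_neurons)) < 0.1)
membership = (rng.random((n_neurons, n_assemblies)) < 0.3).astype(float)
activity = rng.random((n_assemblies, n_frames)) * (rng.random((n_assemblies, n_frames)) < 0.2)

# The observed movie is (approximately) the product of all three sparse factors.
X = footprints @ membership @ activity + 0.01 * rng.random((n_pixels, n_frames))

reconstruction = footprints @ membership @ activity
print(np.linalg.norm(X - reconstruction) / np.linalg.norm(X))  # small relative error
```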


Learning Kernels Using Local Rademacher Complexity

Neural Information Processing Systems

We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also report the results of experiments with both algorithms in both binary and multi-class classification tasks.
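As background on what learning a kernel means operationally, the sketch below forms a convex combination of fixed base kernels and trains a kernel classifier on the result; the uniform weights and the use of scikit-learn's SVC are placeholders for the weights the paper's algorithms would actually learn.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

def combined_kernel(X, Y, mu):
    """Convex combination sum_m mu[m] * K_m(X, Y) of fixed base kernels."""
    bases = [linear_kernel(X, Y),
             polynomial_kernel(X, Y, degree=2),
             rbf_kernel(X, Y, gamma=0.5)]
    return sum(m * K for m, K in zip(mu, bases))

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
mu = np.array([1 / 3, 1 / 3, 1 / 3])        # a learning-kernel method would tune these weights

clf = SVC(kernel="precomputed").fit(combined_kernel(X, X, mu), y)
print(clf.score(combined_kernel(X, X, mu), y))   # training accuracy with fixed weights
```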


On the Complexity and Approximation of Binary Evidence in Lifted Inference

Neural Information Processing Systems

Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model's symmetries, which preempts standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this grim result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically.
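The key parameter above, the Boolean rank of the evidence, is the smallest r such that the binary evidence matrix equals the Boolean (OR-of-ANDs) product of an n x r and an r x m binary matrix. The toy factorization below illustrates the definition; the matrices are made up.

```python
import numpy as np

def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i, j] = OR_k (A[i, k] AND B[k, j])."""
    return (A.astype(bool)[:, :, None] & B.astype(bool)[None, :, :]).any(axis=1)

# A 4x4 binary evidence matrix with Boolean rank 2 (two overlapping blocks of ones).
A = np.array([[1, 0], [1, 0], [0, 1], [1, 1]])
B = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
E = boolean_product(A, B).astype(int)
print(E)
# The paper's result is that conditioning on evidence with bounded Boolean rank
# (here 2, regardless of the matrix size) remains efficient.
```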


Marginals-to-Models Reducibility

Neural Information Processing Systems

We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for "pure data" problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference.
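One concrete "pure data" problem of the kind considered above, deciding whether a set of pairwise marginals is consistent with some joint distribution, can be written directly as a linear program over the joint. The brute-force formulation below (three binary variables, scipy's linprog) only illustrates the LP viewpoint; it is exponential in the number of variables, unlike the ellipsoid-based reductions in the paper.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def marginals_consistent(pair_marginals, n_vars=3):
    """pair_marginals maps (i, j, xi, xj) -> required P(X_i = xi, X_j = xj).
    Feasibility LP over the full joint distribution (only sensible for tiny examples)."""
    states = list(itertools.product([0, 1], repeat=n_vars))
    A_eq, b_eq = [np.ones(len(states))], [1.0]          # probabilities sum to 1
    for (i, j, xi, xj), p in pair_marginals.items():
        A_eq.append([float(s[i] == xi and s[j] == xj) for s in states])
        b_eq.append(p)
    res = linprog(c=np.zeros(len(states)), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, 1)] * len(states), method="highs")
    return res.success

# Consistent: X0 and X1 behave like independent fair coins.
print(marginals_consistent({(0, 1, 1, 1): 0.25, (0, 1, 0, 0): 0.25}))
# Inconsistent: these two pairwise probabilities already sum to more than 1.
print(marginals_consistent({(0, 1, 1, 1): 0.9, (0, 1, 1, 0): 0.9}))
```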