AITopics | Tishby, Naftali

Collaborating Authors

Tishby, Naftali

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Information-Theoretic Framework for Non-linear Canonical Correlation Analysis

Painsky, Amichai, Feder, Meir, Tishby, Naftali

arXiv.org Machine LearningOct-31-2018

Canonical Correlation Analysis (CCA) is a linear representation learning method that seeks maximally correlated variables in multi-view data. Non-linear CCA extends this notion to a broader family of transformations, which are more powerful for many real-world applications. Given the joint probability, the Alternating Conditional Expectation (ACE) provides an optimal solution to the non-linear CCA problem. However, it suffers from limited performance and an increasing computational burden when only a finite number of observations is available. In this work we introduce an information-theoretic framework for the non-linear CCA problem (ITCCA), which extends the classical ACE approach. Our suggested framework seeks compressed representations of the data that allow a maximal level of correlation. This way we control the trade-off between the flexibility and the complexity of the representation. Our approach demonstrates favorable performance at a reduced computational burden, compared to non-linear alternatives, in a finite sample size regime. Further, ITCCA provides theoretical bounds and optimality conditions, as we establish fundamental connections to rate-distortion theory, the information bottleneck and remote source coding. In addition, it implies a "soft" dimensionality reduction, as the compression level is measured (and governed) by the mutual information between the original noisy data and the signals that we extract.

artificial intelligence, constraint, health & medicine, (16 more...)

arXiv.org Machine Learning

1810.13259

Country: Asia > Middle East > Israel (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Gaussian Lower Bound for the Information Bottleneck Limit

Painsky, Amichai, Tishby, Naftali

arXiv.org Machine LearningNov-7-2017

The Information Bottleneck (IB) is a conceptual method for extracting the most compact, yet informative, representation of a set of variables, with respect to the target. It generalizes the notion of minimal sufficient statistics from classical parametric statistics to a broader information-theoretic sense. The IB curve defines the optimal trade-off between representation complexity and its predictive power. Specifically, it is achieved by minimizing the level of mutual information (MI) between the representation and the original variables, subject to a minimal level of MI between the representation and the target. This problem is shown to be in general NP hard. One important exception is the multivariate Gaussian case, for which the Gaussian IB (GIB) is known to obtain an analytical closed form solution, similar to Canonical Correlation Analysis (CCA). In this work we introduce a Gaussian lower bound to the IB curve; we find an embedding of the data which maximizes its "Gaussian part", on which we apply the GIB. This embedding provides an efficient (and practical) representation of any arbitrary data-set (in the IB sense), which in addition holds the favorable properties of a Gaussian distribution. Importantly, we show that the optimal Gaussian embedding is bounded from above by non-linear CCA. This allows a fundamental limit for our ability to Gaussianize arbitrary data-sets and solve complex problems by linear methods.

artificial intelligence, gaussianization, optimization problem, (17 more...)

arXiv.org Machine Learning

1711.02421

Country:

Asia > Middle East > Israel (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Memory shapes time perception and intertemporal choices

Ortega, Pedro A., Tishby, Naftali

arXiv.org Machine LearningMay-29-2016

There is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems. Similarly, intertemporal choice research has shown that decision-makers undervalue future outcomes relative to immediate ones. Here we combine techniques from information theory and artificial intelligence to show how both temporal distortions and intertemporal choice preferences can be explained as a consequence of the coding efficiency of sensorimotor representation. In particular, the model implies that interactions that constrain future behavior are perceived as being both longer in duration and more valuable. Furthermore, using simulations of artificial agents, we investigate how memory constraints enforce a renormalization of the perceived timescales. Our results show that qualitatively different discount functions, such as exponential and hyperbolic discounting, arise as a consequence of an agent's probabilistic model of the world.

agent, health & medicine, neurology, (20 more...)

arXiv.org Machine Learning

1604.05129

Country:

North America > United States (0.93)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.49)

Add feedback

Information-Theoretic Bounded Rationality

Ortega, Pedro A., Braun, Daniel A., Dyer, Justin, Kim, Kee-Eung, Tishby, Naftali

arXiv.org Machine LearningDec-21-2015

Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.

agent, neural network, optimization problem, (21 more...)

arXiv.org Machine Learning

1512.06789

Country:

North America > United States > Massachusetts (0.28)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(4 more...)

Add feedback

Distribution-Dependent Sample Complexity of Large Margin Learning

Sabato, Sivan, Srebro, Nathan, Tishby, Naftali

arXiv.org Machine LearningSep-18-2013

We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the margin-adapted dimension, which is a simple function of the second order statistics of the data distribution, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the margin-adapted dimension of the data distribution. The upper bounds are universal, and the lower bounds hold for the rich family of sub-Gaussian distributions with independent features. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. To prove the lower bound, we develop several new tools of independent interest. These include new connections between shattering and hardness of learning, new properties of shattering with linear classifiers, and a new lower bound on the smallest eigenvalue of a random Gram matrix generated by sub-Gaussian variables. Our results can be used to quantitatively compare large margin learning to other learning rules, and to improve the effectiveness of methods that use sample complexity bounds, such as active learning.

artificial intelligence, machine learning, sample complexity, (17 more...)

arXiv.org Machine Learning

1204.1276

Country:

North America > United States (0.46)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

Multivariate Information Bottleneck

Friedman, Nir, Mosenzon, Ori, Slonim, Noam, Tishby, Naftali

arXiv.org Artificial IntelligenceJan-10-2013

The Information bottleneck method is an unsupervised non-parametric data organization technique. Given a joint distribution P(A,B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. The information bottleneck has already been applied to document classification, gene expression, neural code, and spectral analysis. In this paper, we introduce a general principled framework for multivariate extensions of the information bottleneck method. This allows us to consider multiple systems of data partitions that are inter-related. Our approach utilizes Bayesian networks for specifying the systems of clusters and what information each captures. We show that this construction provides insight about bottleneck variations and enables us to characterize solutions of these variations. We also present a general framework for iterative algorithms for constructing solutions, and apply it to several examples.

bayesian inference, health & medicine, information, (19 more...)

arXiv.org Artificial Intelligence

1301.227

Country:

Asia > Middle East > Israel (0.28)
North America > United States (0.28)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Sufficient Dimensionality Reduction with Irrelevant Statistics

Globerson, Amir, Chechik, Gal, Tishby, Naftali

arXiv.org Machine LearningOct-19-2012

The problem of finding a reduced dimensionality representation of categorical variables while preserving their most relevant characteristics is fundamental for the analysis of complex data. Specifically, given a co-occurrence matrix of two variables, one often seeks a compact representation of one variable which preserves information about the other variable. We have recently introduced ``Sufficient Dimensionality Reduction' [GT-2003], a method that extracts continuous reduced dimensional features whose measurements (i.e., expectation values) capture maximal mutual information among the variables. However, such measurements often capture information that is irrelevant for a given task. Widely known examples are illumination conditions, which are irrelevant as features for face recognition, writing style which is irrelevant as a feature for content classification, and intonation which is irrelevant as a feature for speech recognition. Such irrelevance cannot be deduced apriori, since it depends on the details of the task, and is thus inherently ill defined in the purely unsupervised case. Separating relevant from irrelevant features can be achieved using additional side data that contains such irrelevant structures. This approach was taken in [CT-2002], extending the information bottleneck method, which uses clustering to compress the data. Here we use this side-information framework to identify features whose measurements are maximally informative for the original data set, but carry as little information as possible on a side data set. In statistical terms this can be understood as extracting statistics which are maximally sufficient for the original dataset, while simultaneously maximally ancillary for the side dataset. We formulate this tradeoff as a constrained optimization problem and characterize its solutions. We then derive a gradient descent algorithm for this problem, which is based on the Generalized Iterative Scaling method for finding maximum entropy distributions. The method is demonstrated on synthetic data, as well as on real face recognition datasets, and is shown to outperform standard methods such as oriented PCA.

artificial intelligence, information, machine learning, (17 more...)

arXiv.org Machine Learning

1212.2483

Country: Asia > Middle East > Israel (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback

Multi-Instance Learning with Any Hypothesis Class

Sabato, Sivan, Tishby, Naftali

arXiv.org Machine LearningAug-13-2012

In the supervised learning setting termed Multiple-Instance Learning (MIL), the examples are bags of instances, and the bag label is a function of the labels of its instances. Typically, this function is the Boolean OR. The learner observes a sample of bags and the bag labels, but not the instance labels that determine the bag labels. The learner is then required to emit a classification rule for bags based on the sample. MIL has numerous applications, and many heuristic algorithms have been used successfully on this problem, each adapted to specific settings or applications. In this work we provide a unified theoretical analysis for MIL, which holds for any underlying hypothesis class, regardless of a specific application or problem domain. We show that the sample complexity of MIL is only poly-logarithmically dependent on the size of the bag, for any underlying hypothesis class. In addition, we introduce a new PAC-learning algorithm for MIL, which uses a regular supervised learning algorithm as an oracle. We prove that efficient PAC-learning for MIL can be generated from any efficient non-MIL supervised learning algorithm that handles one-sided error. The computational complexity of the resulting algorithm is only polynomially dependent on the bag size.

artificial intelligence, hypothesis class, inductive learning, (14 more...)

arXiv.org Machine Learning

1107.2021

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.87)

Add feedback

The Minimum Information Principle for Discriminative Learning

Globerson, Amir, Tishby, Naftali

arXiv.org Machine LearningJul-11-2012

Exponential models of distributions are widely used in machine learning for classiffication and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we argue that for classiffication tasks, mutual information is a more suitable information theoretic measure to be optimized. We show how the principle of minimum mutual information generalizes that of maximum entropy, and provides a comprehensive framework for building discriminative classiffiers. A game theoretic interpretation of our approach is then given, and several generalization bounds provided. We present iterative algorithms for solving the minimum information problem and its convex dual, and demonstrate their performance on various classiffication tasks. The results show that minimum information classiffiers outperform the corresponding maximum entropy models.

bayesian inference, constraint, health & medicine, (19 more...)

arXiv.org Machine Learning

1207.411

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)

Add feedback

Bounded Planning in Passive POMDPs

Fox, Roy, Tishby, Naftali

arXiv.org Artificial IntelligenceJun-27-2012

In Passive POMDPs actions do not affect the world state, but still incur costs. When the agent is bounded by information-processing constraints, it can only keep an approximation of the belief. We present a variational principle for the problem of maintaining the information which is most useful for minimizing the cost, and introduce an efficient and simple algorithm for finding an optimum.

artificial intelligence, information, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1206.6405

Country:

North America > United States (0.28)
Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback