Cesa-Bianchi, Nicolò
Mirror Descent Meets Fixed Share (and feels no regret)
Cesa-Bianchi, Nicolò, Gaillard, Pierre, Lugosi, Gabor, Stoltz, Gilles
Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or a weight-sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight-sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters.
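As a concrete illustration of the weight-sharing technique mentioned in this abstract, the following minimal Python sketch runs exponential weights over d experts and mixes a small uniform share back into the weights after every round. The parameter names eta and alpha and the assumption that losses lie in [0, 1] are illustrative choices for this sketch, not the tuning analyzed in the paper.

    import numpy as np

    def fixed_share(loss_matrix, eta=0.5, alpha=0.01):
        # Minimal weight-sharing (fixed-share) sketch: an exponential-weights
        # update followed by mixing a small uniform share back in each round.
        # loss_matrix[t, i] is the loss of expert i at round t, assumed in [0, 1].
        T, d = loss_matrix.shape
        w = np.full(d, 1.0 / d)                  # current distribution over experts
        forecaster_losses = []
        for t in range(T):
            forecaster_losses.append(float(w @ loss_matrix[t]))  # mixture loss at round t
            v = w * np.exp(-eta * loss_matrix[t])                # exponential-weights update
            v /= v.sum()
            w = alpha / d + (1.0 - alpha) * v                    # weight-sharing step
        return np.array(forecaster_losses)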
A Linear Time Active Learning Algorithm for Link Classification
Cesa-Bianchi, Nicolò, Gentile, Claudio, Vitale, Fabio, Zappella, Giovanni
We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph $G = (V,E)$ such that $|E|$ is at least of order $|V|^{3/2}$ by querying at most order of $|V|^{3/2}$ edge labels. More generally, we show an algorithm that achieves optimality to within a factor of order $k$ by querying at most order of $|V| + (|V|/k)^{3/2}$ edge labels. The running time of this algorithm is at most of order $|E| + |V|\log|V|$.
See the Tree Through the Lines: The Shazoo Algorithm
Vitale, Fabio, Cesa-Bianchi, Nicolò, Gentile, Claudio, Zappella, Giovanni
Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this gap and introduce an efficient node predictor, Shazoo, which is nearly optimal on any weighted tree. Moreover, we show that Shazoo can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that Shazoo performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods.
Efficient Online Learning via Randomized Rounding
Cesa-Bianchi, Nicolò, Shamir, Ohad
Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, which combines ``random playout'' and randomized rounding of loss subgradients. As an application of our approach, we provide the first computationally efficient online algorithm for collaborative filtering with trace-norm constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learning.
Linear Classification and Selective Sampling Under Low Noise Conditions
Cavallanti, Giovanni, Cesa-Bianchi, Nicolò, Gentile, Claudio
We provide a new analysis of an efficient margin-based algorithm for selective sampling in classification problems. Using the so-called Tsybakov low noise condition to parametrize the instance distribution, we show bounds on the convergence rate to the Bayes risk of both the fully supervised and the selective sampling versions of the basic algorithm. Our analysis reveals that, excluding logarithmic factors, the average risk of the selective sampler converges to the Bayes risk at rate $n^{-(1+\alpha)/(3+\alpha)}$, with labels being sampled at the same rate (here $n$ denotes the sample size, and $\alpha > 0$ is the exponent in the low noise condition). We compare this convergence rate to the rate $n^{-(1+\alpha)/(2+\alpha)}$ achieved by the fully supervised algorithm using all labels. Experiments on textual data reveal that simple variants of the proposed selective sampler perform much better than popular and similarly efficient competitors.
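A schematic version of such a margin-based selective sampler is sketched below: a regularized least-squares predictor is maintained on the queried examples, and a label is requested only when the current margin is small. The fixed query threshold and the regularization constant are placeholders for this illustration; the algorithm analyzed in the paper uses a data-dependent threshold.

    import numpy as np

    def selective_sampler_rls(X, y, reg=1.0, threshold=0.1):
        # Schematic margin-based selective sampler built on regularized least squares.
        # The true label y[t] is used by the learner only on queried rounds;
        # the mistake count below is computed for evaluation purposes only.
        n, d = X.shape
        A = reg * np.eye(d)          # regularized correlation matrix of queried examples
        b = np.zeros(d)
        mistakes = queries = 0
        for t in range(n):
            w = np.linalg.solve(A, b)            # current RLS weight vector
            margin = float(X[t] @ w)
            if (1 if margin >= 0 else -1) != y[t]:
                mistakes += 1                    # evaluation only, not seen by the learner
            if abs(margin) <= threshold:         # query the label when the margin is small
                queries += 1
                A += np.outer(X[t], X[t])
                b += y[t] * X[t]
        return mistakes, queries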
Improved risk tail bounds for on-line algorithms
Cesa-Bianchi, Nicolò, Gentile, Claudio
We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.
Incremental Algorithms for Hierarchical Classification
Cesa-Bianchi, Nicolò, Gentile, Claudio, Tironi, Andrea, Zaniboni, Luca
We study the problem of hierarchical classification when labels corresponding to partial and/or multiple paths in the underlying taxonomy are allowed. We introduce a new hierarchical loss function, the H-loss, implementing the simple intuition that additional mistakes in the subtree of a mistaken class should not be charged for. Based on a probabilistic data model introduced in earlier work, we derive the Bayes-optimal classifier for the H-loss. We then empirically compare two incremental approximations of the Bayes-optimal classifier with a flat SVM classifier and with classifiers obtained by using hierarchical versions of the Perceptron and SVM algorithms. The experiments show that our simplest incremental approximation of the Bayes-optimal classifier performs, after just one training epoch, nearly as well as the hierarchical SVM classifier (which performs best). For the same incremental algorithm we also derive an H-loss bound showing, when data are generated by our probabilistic data model, exponentially fast convergence to the H-loss of the hierarchical classifier based on the true model parameters.
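The H-loss described above can be computed in a few lines; the sketch below uses unit costs per node (the paper also allows node-dependent costs) and represents the taxonomy by a parent array, which is an implementation choice made purely for this illustration.

    def h_loss(y_true, y_pred, parent):
        # Unit-cost H-loss sketch: a node contributes a mistake only if it is
        # mispredicted and none of its ancestors in the taxonomy is mispredicted.
        # y_true and y_pred are 0/1 lists over the taxonomy nodes; parent[i] is
        # the parent of node i, or None if i is a root.
        def ancestor_mistaken(i):
            j = parent[i]
            while j is not None:
                if y_true[j] != y_pred[j]:
                    return True
                j = parent[j]
            return False
        return sum(1 for i in range(len(y_true))
                   if y_true[i] != y_pred[i] and not ancestor_mistaken(i))

For instance, on the chain 0 -> 1 -> 2 (parent = [None, 0, 1]) with y_true = [1, 1, 0] and y_pred = [0, 0, 1], only the topmost mistake is charged and the H-loss equals 1.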
Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
Cesa-Bianchi, Nicolò, Gentile, Claudio, Zaniboni, Luca
We provide a worst-case analysis of selective sampling algorithms for learning linear threshold functions. The algorithms considered in this paper are Perceptron-like algorithms, i.e., algorithms which can be efficiently run in any reproducing kernel Hilbert space. Our algorithms exploit a simple margin-based randomized rule to decide whether to query the current label. We obtain selective sampling algorithms achieving on average the same bounds as those proven for their deterministic counterparts, but using far fewer labels. We complement our theoretical findings with an empirical comparison on two text categorization tasks. The outcome of these experiments is largely predicted by our theoretical results: our selective sampling algorithms tend to perform as well as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels.
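The kind of margin-based randomized query rule referred to in the abstract can be sketched as follows for a first-order Perceptron: the label is requested with a probability that decreases with the magnitude of the margin, and the usual mistake-driven update is performed only on queried rounds. The coin-flip form b / (b + |margin|) and the constant b are written here for illustration; see the paper for the exact rule and its tuning.

    import numpy as np

    def selective_perceptron(X, y, b=1.0, seed=0):
        # Perceptron with a margin-based randomized query rule: the label is
        # requested with probability b / (b + |margin|), and the standard
        # mistake-driven update is applied only on queried rounds.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        queries = 0
        for t in range(n):
            margin = float(w @ X[t])
            prediction = 1 if margin >= 0 else -1
            if rng.random() <= b / (b + abs(margin)):   # flip the query coin
                queries += 1
                if prediction != y[t]:                  # update only on observed mistakes
                    w = w + y[t] * X[t]
        return w, queries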
Margin-Based Algorithms for Information Filtering
Cesa-Bianchi, Nicolò, Conconi, Alex, Gentile, Claudio
In this work, we study an information filtering model where the relevance labels associated with a sequence of feature vectors are realizations of an unknown probabilistic linear function. Building on the analysis of a restricted version of our model, we derive a general filtering rule based on the margin of a ridge regression estimator. While our rule may observe the label of a vector only by classifying the vector as relevant, experiments on a real-world document filtering problem show that the performance of our rule is close to that of the online classifier which is allowed to observe all labels. These empirical results are complemented by a theoretical analysis where we consider a randomized variant of our rule and prove that its expected number of mistakes is never much larger than that of the optimal filtering rule which knows the hidden linear model.
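A bare-bones version of such a margin-based filtering loop is sketched below: a ridge-regression margin decides whether an item is forwarded as relevant, and the true label is revealed only for forwarded items. The paper's rule additionally forwards some low-margin items to keep exploring; that refinement, as well as the choice of regularization constant, is omitted or fixed here purely for illustration.

    import numpy as np

    def margin_filter(X, y, reg=1.0):
        # Bare-bones information-filtering loop: an item is forwarded as relevant
        # when the ridge-regression margin is nonnegative, and only then is its
        # relevance label y[t] (+1/-1) observed and used to update the estimator.
        n, d = X.shape
        A = reg * np.eye(d)
        b = np.zeros(d)
        forwarded = 0
        for t in range(n):
            w = np.linalg.solve(A, b)
            if float(X[t] @ w) >= 0:          # forward the item as relevant
                forwarded += 1
                A += np.outer(X[t], X[t])     # ridge update on the observed item
                b += y[t] * X[t]
        return np.linalg.solve(A, b), forwarded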