Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates
Allen-Zhu, Zeyuan, Liao, Zhenyu, Orecchia, Lorenzo
In this paper, we provide a novel construction of the linear-sized spectral sparsifiers of Batson, Spielman and Srivastava [BSS14]. While previous constructions required $\Omega(n^4)$ running time [BSS14, Zou12], our sparsification routine can be implemented in almost-quadratic running time $O(n^{2+\varepsilon})$. The fundamental conceptual novelty of our work is that it leverages a strong connection between sparsification and a regret minimization problem over density matrices. This connection was already known to yield an interpretation of the randomized sparsifiers of Spielman and Srivastava [SS11] via matrix multiplicative weight updates (MWU) [CHS11, Vis14]. In this paper, we explain how matrix MWU naturally arises as an instance of the Follow-the-Regularized-Leader framework and generalize this approach to yield a larger class of updates. This new class allows us to accelerate the construction of linear-sized spectral sparsifiers and gives novel insight into the motivation behind the construction of Batson, Spielman and Srivastava [BSS14].
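For orientation, the connection invoked here can be made concrete with a standard identity from the regret-minimization literature (stated here for context, not taken verbatim from the paper): over density matrices, Follow-the-Regularized-Leader with the negative von Neumann entropy as regularizer yields exactly the matrix MWU iterate

$$X_t \;=\; \arg\min_{X \succeq 0,\ \operatorname{tr} X = 1} \Big\{ \eta \sum_{s < t} \langle F_s, X \rangle + \operatorname{tr}(X \log X) \Big\} \;=\; \frac{\exp\big(-\eta \sum_{s<t} F_s\big)}{\operatorname{tr} \exp\big(-\eta \sum_{s<t} F_s\big)},$$

where the $F_s$ are the observed loss matrices. Swapping in regularizers other than the entropy is what generates the larger class of updates mentioned above.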
Generalized Additive Model Selection
Chouldechova, Alexandra, Hastie, Trevor
We introduce GAMSEL (Generalized Additive Model Selection), a penalized likelihood approach for fitting sparse generalized additive models in high dimension. Our method interpolates between null, linear and additive models by allowing the effect of each variable to be estimated as being either zero, linear, or a low-complexity curve, as determined by the data. We present a blockwise coordinate descent procedure for efficiently optimizing the penalized likelihood objective over a dense grid of the tuning parameter, producing a regularization path of additive models. We demonstrate the performance of our method on both real and simulated data examples, and compare it with existing techniques for additive model selection.
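To fix ideas, here is a schematic of the kind of objective involved (generic notation, not the paper's exact penalty): each variable $j$ enters through a linear coefficient $\alpha_j$ and a low-complexity smooth component $f_j$ represented in a small basis, and a sparsity-inducing penalty decides, per variable, between a zero, linear, or nonlinear fit:

$$\min_{\alpha_0, \alpha, f}\; -\ell\Big(y;\ \alpha_0 + \sum_j \alpha_j x_j + f_j(x_j)\Big) \;+\; \lambda \sum_j \Big( \gamma\, |\alpha_j| + P(f_j) \Big),$$

where $\ell$ is the model log-likelihood and $P$ is a group-style penalty on the basis coefficients of $f_j$ that can shrink the whole component exactly to zero. Sweeping $\lambda$ over a dense grid traces out the regularization path described above.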
Semi-Stochastic Gradient Descent Methods
Konečný, Jakub, Richtárik, Peter
In this paper we study the problem of minimizing the average of a large number ($n$) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, the latter following a geometric law. The total work needed for the method to output an $\varepsilon$-accurate solution in expectation, measured in passes over the data, or equivalently, in units equivalent to the computation of a single gradient of the loss, is $O((\kappa/n)\log(1/\varepsilon))$, where $\kappa$ is the condition number. This is achieved by running the method for $O(\log(1/\varepsilon))$ epochs, with a single gradient evaluation and $O(\kappa)$ stochastic gradient evaluations in each. The SVRG method of Johnson and Zhang arises as a special case. If our method is limited to a single epoch, it needs to evaluate at most $O((\kappa/\varepsilon)\log(1/\varepsilon))$ stochastic gradients; in contrast, SVRG requires $O(\kappa/\varepsilon^2)$. As an illustration of the theoretical results: S2GD needs the workload equivalent to only about 2.1 full gradient evaluations to find a $10^{-6}$-accurate solution for a problem with $n=10^9$ and $\kappa=10^3$.
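A minimal sketch of the epoch structure described above, written for least squares; the geometric-law parameter $q$, the step size $h$, and the anchor-update rule are schematic choices for illustration, not the paper's tuned values.

import numpy as np

def s2gd(A, b, epochs=10, h=0.01, q=0.9, rng=np.random.default_rng(0)):
    """Semi-stochastic gradient descent sketch for min_x (1/2n)||Ax - b||^2.

    Each epoch: one full gradient at the anchor point, then a random
    (geometrically distributed) number of variance-reduced stochastic steps.
    """
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        mu = A.T @ (A @ x - b) / n           # one full gradient at the anchor
        y = x.copy()
        t = rng.geometric(1 - q)             # inner-loop length ~ geometric law
        for _ in range(t):
            i = rng.integers(n)              # sample one loss function
            gi_y = A[i] * (A[i] @ y - b[i])  # stochastic gradient at iterate
            gi_x = A[i] * (A[i] @ x - b[i])  # stochastic gradient at anchor
            y -= h * (gi_y - gi_x + mu)      # variance-reduced step
        x = y                                # new anchor for the next epoch
    return x

With $q$ close to 1 the expected inner-loop length is roughly $1/(1-q)$, matching the $O(\kappa)$ stochastic evaluations per epoch in the statement above.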
On the Generalization of the C-Bound to Structured Output Ensemble Methods
Laviolette, François, Morvant, Emilie, Ralaivola, Liva, Roy, Jean-Francis
This paper generalizes an important result from the PAC-Bayesian literature for binary classification to the case of ensemble methods for structured outputs. We prove a generic version of the C-bound, an upper bound on the risk of models expressed as a weighted majority vote that is based on the first and second statistical moments of the vote's margin. This bound can advantageously $(i)$ be applied to more complex outputs, such as multiclass and multilabel ones, and $(ii)$ accommodate margin relaxations. These results open the way to developing new ensemble methods for structured output prediction with PAC-Bayesian guarantees.
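For reference, the binary version being generalized (the C-bound of the PAC-Bayesian literature, in common notation): if $M_Q$ denotes the margin of the $Q$-weighted majority vote $B_Q$ on a random example and $\mathbb{E}[M_Q] > 0$, then

$$R(B_Q) \;\le\; 1 - \frac{\big(\mathbb{E}[M_Q]\big)^2}{\mathbb{E}\big[M_Q^2\big]},$$

so the risk of the vote is controlled by exactly the first two moments of its margin, which is the structure the structured-output generalization preserves.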
Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game
Niculae, Vlad, Kumar, Srijan, Boyd-Graber, Jordan, Danescu-Niculescu-Mizil, Cristian
Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal. We reveal that subtle signs of imminent betrayal are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship's fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through language. In contrast, sudden changes in the balance of certain conversational attributes---such as positive sentiment, politeness, or focus on future planning---signal impending betrayal.
Convex Risk Minimization and Conditional Probability Estimation
Telgarsky, Matus, Dudík, Miroslav, Schapire, Robert
This paper proves, in very general settings, that convex risk minimization is a procedure for selecting a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results general enough to include cases in which no minimum exists, as typically occurs, for instance, with standard boosting algorithms. Concretely, we first show that any sequence of predictors minimizing convex risk over the source distribution converges to this unique model when the class of predictors is linear (but potentially of infinite dimension). Second, we show the same result holds for \emph{empirical} risk minimization whenever the class of predictors is finite dimensional, where the essential technical contribution is a norm-free generalization bound.
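A standard special case illustrates the phenomenon (included here for concreteness; it is not drawn from the paper's general setting): for the logistic loss $\phi(z) = \ln(1 + e^{-z})$, the pointwise risk minimizer is the log-odds, so the conditional probability model selected by risk minimization is the familiar sigmoid link:

$$f^{\star}(x) = \ln \frac{\eta(x)}{1 - \eta(x)}, \qquad \hat{\eta}(x) = \frac{1}{1 + e^{-f^{\star}(x)}}, \qquad \eta(x) = \Pr(Y = 1 \mid X = x).$$

The case the paper addresses is when no finite minimizer $f^{\star}$ exists (as with separable data in boosting, where margins are driven to infinity), yet the induced probability model is still pinned down along any risk-minimizing sequence.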
Learning with incremental iterative regularization
Rosasco, Lorenzo, Villa, Silvia
Within a statistical learning setting, we propose and study an iterative regularization algorithm for least squares defined by an incremental gradient method. In particular, we show that, if all other parameters are fixed a priori, the number of passes over the data (epochs) acts as a regularization parameter, and prove strong universal consistency, i.e., almost sure convergence of the risk, as well as sharp finite-sample bounds for the iterates. Our results are a step towards understanding the effect of multiple epochs in stochastic gradient techniques in machine learning and rely on integrating statistical and optimization results.
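A minimal sketch of the procedure the abstract describes, for least squares; the step size and the cyclic ordering are placeholder choices, and the point is that the epoch count, not a penalty term, plays the regularizing role.

import numpy as np

def incremental_ls(A, b, epochs, step=0.01):
    """Incremental gradient for min_x (1/2n)||Ax - b||^2.

    `epochs` is the regularization parameter: few passes = strong
    regularization, many passes = convergence toward the (possibly
    overfitting) least-squares solution.
    """
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in range(n):                        # one incremental pass
            x -= step * A[i] * (A[i] @ x - b[i])  # gradient of the i-th loss
    return x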
Training Restricted Boltzmann Machines via the Thouless-Anderson-Palmer Free Energy
Gabrié, Marylou, Tramel, Eric W., Krzakala, Florent
Restricted Boltzmann machines are undirected neural networks which have been shown to be effective in many applications, including serving as initializations for training deep multi-layer neural networks. One of the main reasons for their success is the existence of efficient and practical stochastic algorithms, such as contrastive divergence, for unsupervised training. We propose an alternative deterministic iterative procedure based on an improved mean field method from statistical physics known as the Thouless-Anderson-Palmer approach. We demonstrate that our algorithm provides performance equal to, and sometimes superior to, persistent contrastive divergence, while also providing a clear and easy-to-evaluate objective function. We believe that this strategy can be easily generalized to other models, as well as to more accurate higher-order approximations, paving the way for systematic improvements in training Boltzmann machines with hidden units.
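A schematic of a second-order TAP-style fixed-point iteration for a binary RBM, in the generic mean-field-plus-Onsager-correction form; this is a sketch of the general technique, and the paper's exact equations, damping schedule, and weight-update rule differ in their details.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tap_magnetizations(W, bv, bh, mv, mh, iters=50, damp=0.5):
    """Fixed-point iteration for visible/hidden magnetizations mv, mh of a
    binary RBM with weights W (n_visible x n_hidden) and biases bv, bh:
    naive mean field plus a second-order (Onsager) correction term."""
    W2 = W ** 2
    for _ in range(iters):
        # hidden update: field from visibles minus the Onsager reaction term
        mh_new = sigmoid(bh + W.T @ mv - (mh - 0.5) * (W2.T @ (mv * (1 - mv))))
        mh = damp * mh + (1 - damp) * mh_new
        # visible update, symmetrically
        mv_new = sigmoid(bv + W @ mh - (mv - 0.5) * (W2 @ (mh * (1 - mh))))
        mv = damp * mv + (1 - damp) * mv_new
    return mv, mh

Evaluating the TAP free energy at such a fixed point is what supplies the deterministic, easy-to-evaluate objective mentioned above, replacing the sampling step of contrastive divergence.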
On the properties of variational approximations of Gibbs posteriors
Alquier, Pierre, Ridgway, James, Chopin, Nicolas
The PAC-Bayesian approach is a powerful set of techniques for deriving non-asymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately intractable. One may sample from it using Markov chain Monte Carlo, but this is often too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute, and undertake a general study of their properties. Our main finding is that such a variational approximation often has the same rate of convergence as the original PAC-Bayesian procedure it approximates. We specialize our results to several learning tasks (classification, ranking, matrix completion), discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximation on real datasets.
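In common PAC-Bayesian notation (stated for orientation; the paper's setting is more general), the Gibbs posterior tilts a prior $\pi$ by the empirical risk $r_n$ at temperature $\lambda$, and the variational approximation minimizes the same free-energy criterion over a tractable family $\mathcal{F}$:

$$\hat{\pi}_\lambda(d\theta) \propto e^{-\lambda r_n(\theta)}\, \pi(d\theta), \qquad \tilde{\rho}_\lambda = \arg\min_{\rho \in \mathcal{F}} \Big\{ \lambda \int r_n \, d\rho + \mathrm{KL}(\rho \,\|\, \pi) \Big\}.$$

Since $\hat{\pi}_\lambda$ is the unconstrained minimizer of this objective, the approximation error is governed by how well $\mathcal{F}$ covers it, which is what drives the rate-preservation results.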
Learning with Square Loss: Localization through Offset Rademacher Complexity
Liang, Tengyuan, Rakhlin, Alexander, Sridharan, Karthik
We consider regression with square loss and general classes of functions without the boundedness assumption. We introduce a notion of offset Rademacher complexity that provides a transparent way to study localization both in expectation and in high probability. For any (possibly non-convex) class, the excess loss of a two-step estimator is shown to be upper bounded by this offset complexity through a novel geometric inequality. In the convex case, the estimator reduces to an empirical risk minimizer. The method recovers the results of [RakSriTsy15] for the bounded case while also providing guarantees without the boundedness assumption.
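For reference, the offset complexity has the form (given in a common formulation; constants and conditioning conventions follow the paper)

$$\mathcal{R}^{\mathrm{off}}_n(\mathcal{F}; c) \;=\; \mathbb{E}_{\epsilon} \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \Big( \epsilon_i f(x_i) - c\, f(x_i)^2 \Big),$$

where the $\epsilon_i$ are Rademacher signs. The negative quadratic term supplies the localization: functions with large empirical norm are penalized automatically, with no explicit radius constraint needed.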