AITopics

The process of learning new behaviors over time is a problem of great interest in both neuroscience and artificial intelligence. However, most standard analyses of animal training data either treat behavior as fixed or track only coarse performance statistics (e.g., accuracy, bias), providing limited insight into the evolution of the policies governing behavior. To overcome these limitations, we propose a dynamic psychophysical model that efficiently tracks trial-to-trial changes in behavior over the course of training. Our model is a dynamic logistic regression model, parametrized by a set of time-varying weights that express dependence on sensory stimuli as well as task-irrelevant covariates, such as stimulus, choice, and answer history. Our implementation scales to large behavioral datasets, allowing us to infer 500K parameters (e.g., 10 weights over 50K trials) in minutes on a desktop computer. We optimize hyperparameters governing how rapidly each weight evolves over time using the decoupled Laplace approximation, an efficient method for maximizing marginal likelihood in non-conjugate models. To illustrate performance, we apply our method to psychophysical data from both rats and human subjects learning a delayed sensory discrimination task. The model successfully tracks the psychophysical weights of rats over the course of training, capturing day-to-day and trial-to-trial fluctuations that underlie changes in performance, choice bias, and dependencies on task history. Finally, we investigate why rats frequently make mistakes on easy trials, and suggest that apparent lapses can be explained by sub-optimal weighting of known task covariates.
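A minimal sketch of a dynamic logistic regression of this kind, assuming each weight follows a Gaussian random walk with a hand-picked per-weight standard deviation `sigma` (the abstract instead learns these hyperparameters with the decoupled Laplace approximation, which is not shown). It computes a MAP weight trajectory with Newton's method using a dense Hessian; an implementation that scales to 50K trials would exploit the Hessian's banded structure. All function names and the toy data are hypothetical.

```python
import numpy as np

def prior_precision(T, sigma, w0_var=10.0):
    """Precision matrix of a Gaussian random-walk prior on a (T, K) weight
    trajectory, flattened row-major so that W[t, k] sits at index t*K + k."""
    K = len(sigma)
    D = np.diff(np.eye(T), axis=0)  # (T-1, T) first-difference operator
    P = np.zeros((T * K, T * K))
    for k in range(K):
        Pk = D.T @ D / sigma[k] ** 2
        Pk[0, 0] += 1.0 / w0_var  # weak Gaussian prior on the initial weight
        idx = np.arange(T) * K + k
        P[np.ix_(idx, idx)] = Pk
    return P

def neg_log_posterior(W, X, y, P):
    z = np.sum(W * X, axis=1)  # per-trial logit w_t . x_t
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))  # Bernoulli-logistic log-likelihood
    w = W.ravel()
    return -log_lik + 0.5 * w @ P @ w

def fit_map(X, y, sigma, n_iter=25):
    """Newton's method on the convex negative log-posterior (dense Hessian)."""
    T, K = X.shape
    P = prior_precision(T, sigma)
    w = np.zeros(T * K)
    for _ in range(n_iter):
        W = w.reshape(T, K)
        p = 1.0 / (1.0 + np.exp(-np.sum(W * X, axis=1)))
        grad = -((y - p)[:, None] * X).ravel() + P @ w
        H = P.copy()
        for t in range(T):  # likelihood Hessian is block-diagonal across trials
            sl = slice(t * K, (t + 1) * K)
            H[sl, sl] += p[t] * (1.0 - p[t]) * np.outer(X[t], X[t])
        w = w - np.linalg.solve(H, grad)
    return w.reshape(T, K)

# Toy data: a drifting bias weight and a drifting stimulus weight.
rng = np.random.default_rng(0)
T, sigma = 300, np.array([0.05, 0.1])
true_W = np.cumsum(rng.normal(0.0, sigma, size=(T, 2)), axis=0)
X = np.column_stack([np.ones(T), rng.choice([-1.0, 1.0], size=T)])
y = (rng.random(T) < 1.0 / (1.0 + np.exp(-np.sum(true_W * X, axis=1)))).astype(float)
W_hat = fit_map(X, y, sigma)
```

Because the log-likelihood is concave and the random-walk prior is Gaussian, the posterior is log-concave, so Newton's method has a single optimum to find.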

artificial intelligence, machine learning, optimization problem

Country: North America (0.28)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.49)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)

Online Improper Learning with an Approximation Oracle

Hazan, Elad, Hu, Wei, Li, Yuanzhi, Li, Zhiyuan

We study the following question: given an efficient approximation algorithm for an optimization problem, can we learn efficiently in the same setting? We give a formal affirmative answer to this question in the form of a reduction from online learning to offline approximate optimization, using an efficient algorithm that guarantees near-optimal regret. The algorithm is efficient in terms of the number of calls to a given approximation oracle: it makes only logarithmically many such calls per iteration. This resolves an open question by Kalai and Vempala, and by Garber. Furthermore, our result applies to the more general improper learning problems.
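To make the quantities in this setting concrete, here is a small sketch of the α-regret commonly used when only an α-approximation oracle is available (for minimization with nonnegative losses: the learner's cumulative loss minus α times the best fixed decision in hindsight), together with the guarantee such an oracle provides. The finite decision set and the function names are illustrative assumptions; this is not the paper's reduction.

```python
from typing import Callable, Sequence

def alpha_regret(losses: Sequence[Callable], plays: Sequence, decisions: Sequence,
                 alpha: float) -> float:
    """alpha-regret for loss minimization over a finite decision set:
    sum_t f_t(x_t) - alpha * min_x sum_t f_t(x)."""
    learner_loss = sum(f(x) for f, x in zip(losses, plays))
    best_fixed = min(sum(f(d) for f in losses) for d in decisions)
    return learner_loss - alpha * best_fixed

def is_alpha_approximate(f: Callable, x, decisions: Sequence, alpha: float) -> bool:
    """The guarantee an alpha-approximation oracle gives for a nonnegative loss f:
    its output x satisfies f(x) <= alpha * min_d f(d)."""
    return f(x) <= alpha * min(f(d) for d in decisions)

decisions = [0, 1, 2]
losses = [lambda x: abs(x - 1), lambda x: abs(x - 2)]
# Learner loss 1 + 2 = 3; best fixed decision (1 or 2) has loss 1.
print(alpha_regret(losses, plays=[0, 0], decisions=decisions, alpha=2.0))  # 1.0
```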

algorithm, artificial intelligence, machine learning

Country: North America (0.46)

Genre:

Instructional Material > Online (0.40)
Research Report > New Finding (0.35)

Industry: Education (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames

Robin, Geneviève, Wai, Hoi-To, Josse, Julie, Klopp, Olga, Moulines, Eric

Many applications of machine learning involve the analysis of large data frames -- matrices collecting heterogeneous measurements (binary, numerical, counts, etc.) across samples -- with missing values. Low-rank models, as studied by Udell et al. (2016), are popular in this framework for tasks such as visualization, clustering and missing value imputation. Yet, available methods with statistical guarantees and efficient optimization do not allow explicit modeling of main additive effects such as row and column, or covariate effects. In this paper, we introduce a low-rank interaction and sparse additive effects (LORIS) model which combines matrix regression on a dictionary and low-rank design, to estimate main effects and interactions simultaneously. We provide statistical guarantees in the form of upper bounds on the estimation error of both components. Then, we introduce a mixed coordinate gradient descent (MCGD) method which provably converges sub-linearly to an optimal solution and is computationally efficient for large scale data sets. We show on simulated and survey data that the method has a clear advantage over current practices.
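The model structure can be illustrated with a sketch that fits Y ≈ Σ_k α_k X_k + L on the observed entries, using an ℓ1 penalty on the dictionary coefficients α and a nuclear-norm penalty on the interaction matrix L. This assumes a Gaussian (squared) loss and swaps in plain alternating proximal-gradient steps (soft-thresholding for α, singular value thresholding for L); it is not the paper's mixed coordinate gradient descent (MCGD) method, and the penalty weights and toy dictionary are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(Y, mask, D, alpha, L, lam_a, lam_L):
    R = mask * (Y - np.tensordot(alpha, D, axes=1) - L)
    return (0.5 * np.sum(R ** 2) + lam_a * np.abs(alpha).sum()
            + lam_L * np.linalg.svd(L, compute_uv=False).sum())

def fit(Y, mask, D, lam_a, lam_L, n_iter=200):
    """Alternating proximal-gradient steps on alpha (sparse) and L (low rank)."""
    K = D.shape[0]
    alpha, L = np.zeros(K), np.zeros_like(Y)
    A = (D * mask).reshape(K, -1)  # observed entries of each dictionary matrix
    step_a = 1.0 / max(np.linalg.norm(A, 2) ** 2, 1e-12)  # 1 / Lipschitz constant in alpha
    for _ in range(n_iter):
        R = mask * (Y - np.tensordot(alpha, D, axes=1) - L)
        alpha = soft_threshold(alpha + step_a * (A @ R.ravel()), step_a * lam_a)
        R = mask * (Y - np.tensordot(alpha, D, axes=1) - L)
        L = svt(L + R, lam_L)  # step size 1 is valid for a 0/1 observation mask
    return alpha, L

rng = np.random.default_rng(1)
m, n = 30, 20
D = np.stack([np.ones((m, n)), rng.normal(size=(m, n)), rng.normal(size=(m, n))])
L_true = np.outer(rng.normal(size=m), rng.normal(size=n))
Y = 2.0 * D[0] + L_true + 0.1 * rng.normal(size=(m, n))  # only the first effect is active
mask = (rng.random((m, n)) < 0.8).astype(float)  # about 80% of entries observed
alpha_hat, L_hat = fit(Y, mask, D, lam_a=1.0, lam_L=2.0)
```

Each block step uses a step size no larger than the inverse Lipschitz constant of that block's gradient, so the penalized objective does not increase from one step to the next.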

artificial intelligence, machine learning, optimization problem

Country:

North America (0.46)
Europe > Spain (0.28)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Nimishakavi, Madhav, Jawanpuria, Pratik Kumar, Mishra, Bamdev

A Dual Framework for Low-rank Tensor Completion

One of the popular approaches for low-rank tensor completion is to use latent trace norm regularization. However, most existing works in this direction learn a sparse combination of tensors. In this work, we address this limitation by proposing a variant of the latent trace norm that helps learn a non-sparse combination of tensors. We develop a dual framework for solving the low-rank tensor completion problem. We first show a novel characterization of the dual solution space with an interesting factorization of the optimal solution. Overall, the optimal solution is shown to lie on a Cartesian product of Riemannian manifolds. Furthermore, we exploit the versatile Riemannian optimization framework to propose a computationally efficient trust-region algorithm. The experiments illustrate the efficacy of the proposed algorithm on several real-world datasets across applications.
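For reference, the latent trace norm of a tensor W is the infimum, over all decompositions W = W_1 + … + W_K, of Σ_k ||unfold_k(W_k)||_*, so any particular decomposition yields an upper bound. The sketch below evaluates that sum for a few hand-built decompositions: putting all of W into one mode is a "sparse" combination, while an equal split is one simple non-sparse combination. It does not implement the paper's proposed variant or its dual trust-region solver; the helper names are illustrative.

```python
import numpy as np

def unfold(W, mode):
    """Mode-`mode` unfolding: a matrix with W.shape[mode] rows. Column order differs
    between conventions, but the nuclear norm does not depend on it."""
    return np.moveaxis(W, mode, 0).reshape(W.shape[mode], -1)

def nuclear_norm(M):
    return np.linalg.svd(M, compute_uv=False).sum()

def latent_objective(components):
    """sum_k ||unfold(W_k, k)||_* for one decomposition W = W_1 + ... + W_K.
    The latent trace norm of W is the infimum of this over all decompositions."""
    return sum(nuclear_norm(unfold(Wk, k)) for k, Wk in enumerate(components))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 5, 6))
zeros = np.zeros_like(W)
mode_norms = [nuclear_norm(unfold(W, k)) for k in range(3)]
# "Sparse" decompositions put all of W into a single mode ...
sparse_bounds = [latent_objective([W if j == k else zeros for j in range(3)]) for k in range(3)]
# ... while an equal split is one simple non-sparse decomposition.
equal_split = latent_objective([W / 3.0] * 3)
```

Here the equal split evaluates to the mean of the mode-wise nuclear norms, which can never beat the best single-mode bound; the interest of non-sparse combinations lies in decompositions that are not simple rescalings of W.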

algorithm, artificial intelligence, machine learning

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Metelli, Alberto Maria, Papini, Matteo, Faccio, Francesco, Restelli, Marcello

Policy Optimization via Importance Sampling

Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel, model-free, policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
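The idea of a variance-aware importance sampling surrogate can be sketched on a one-step problem with 1-D Gaussian policies: estimate the candidate policy's expected return by reweighting samples from the behavior policy, then subtract a penalty that grows with the empirical second moment of the importance weights (a sample estimate of the exponentiated 2-Rényi divergence). The penalty form `lam * sqrt(d2_hat / n)` with a fixed `lam` is a simplifying assumption rather than the paper's exact bound, and the grid search stands in for its offline optimization.

```python
import numpy as np

def gaussian_logpdf(x, mu, std):
    return -0.5 * ((x - mu) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

def penalized_is_objective(mu_new, mu_old, std, actions, returns, lam):
    """Importance-sampling estimate of the new policy's expected return, minus a
    penalty based on the empirical second moment of the importance weights."""
    w = np.exp(gaussian_logpdf(actions, mu_new, std) - gaussian_logpdf(actions, mu_old, std))
    n = len(returns)
    is_estimate = np.mean(w * returns)
    d2_hat = np.mean(w ** 2)  # sample estimate of E_old[w^2], the exponentiated 2-Renyi divergence
    return is_estimate - lam * np.sqrt(d2_hat / n)

# One-step toy problem: reward peaks at action 2; data come from N(0, 1).
rng = np.random.default_rng(0)
mu_old, std = 0.0, 1.0
actions = rng.normal(mu_old, std, size=200)
returns = -(actions - 2.0) ** 2
candidates = np.linspace(-1.0, 3.0, 41)
scores = [penalized_is_objective(c, mu_old, std, actions, returns, lam=5.0) for c in candidates]
mu_next = candidates[int(np.argmax(scores))]
```

Larger `lam` keeps the chosen policy closer to the behavior policy, where the importance weights stay well behaved; that trade-off is what decides how far to optimize offline before collecting new trajectories.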

machine learning, reinforcement learning, variance

Country: Europe > Italy > Lombardy (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Marques, Alexandre, Lam, Remi, Willcox, Karen

Contour location via entropy reduction leveraging multiple information sources

We introduce an algorithm to locate contours of functions that are expensive to evaluate. The problem of locating contours arises in many applications, including classification, constrained optimization, and performance analysis of mechanical and dynamical systems (reliability, probability of failure, stability, etc.). Our algorithm locates contours using information from multiple sources, which are available in the form of relatively inexpensive, biased, and possibly noisy approximations to the original function. Considering multiple information sources can lead to significant cost savings. We also introduce the concept of contour entropy, a formal measure of uncertainty about the location of the zero contour of a function approximated by a statistical surrogate model. Our algorithm locates contours efficiently by maximizing the reduction of contour entropy per unit cost.
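A sketch of a pointwise contour entropy, assuming the statistical surrogate is Gaussian at each point (g(x) ~ N(μ, σ²)) and that "near the zero contour" means |g(x)| ≤ ε for a user-chosen tolerance `eps`: the entropy of the three-way event {below −ε, within ±ε, above ε}. Averaging over a grid is a crude stand-in for integrating over the domain. The paper's acquisition step, which picks a location and information source maximizing the expected entropy reduction per unit cost, is not shown.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def contour_entropy_at(mu, sigma, eps):
    """Entropy (nats) of whether g(x) lies below -eps, within [-eps, eps], or above eps,
    when the surrogate predicts g(x) ~ N(mu, sigma^2)."""
    p_below = norm_cdf((-eps - mu) / sigma)
    p_above = 1.0 - norm_cdf((eps - mu) / sigma)
    p_near = max(1.0 - p_below - p_above, 0.0)
    return -sum(p * math.log(p) for p in (p_below, p_near, p_above) if p > 0.0)

def contour_entropy(mus, sigmas, eps):
    """Average pointwise entropy over a grid of surrogate predictions."""
    return sum(contour_entropy_at(m, s, eps) for m, s in zip(mus, sigmas)) / len(mus)

# A point predicted near zero is uncertain; one predicted far away contributes almost nothing.
print(contour_entropy_at(0.0, 1.0, 0.5))
print(contour_entropy_at(10.0, 1.0, 0.5))
```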

artificial intelligence, information management, machine learning

Country:

North America > United States > Texas (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization

Zhu, Zhihui, Li, Xiao, Liu, Kai, Li, Qiuwei

Symmetric nonnegative matrix factorization (NMF), a special but important class of general NMF, has been shown to be useful for data analysis and in particular for various clustering tasks. Unfortunately, designing fast algorithms for symmetric NMF is not as easy as for its nonsymmetric counterpart, which admits a splitting property that allows efficient alternating-type algorithms. To overcome this issue, we reformulate symmetric NMF as a nonsymmetric problem, which lets us adopt ideas from state-of-the-art algorithms for nonsymmetric NMF to design fast algorithms for symmetric NMF. We rigorously establish that solving the nonsymmetric reformulation returns a solution to symmetric NMF, and we then apply fast alternating-based algorithms to the reformulated problem. Furthermore, we show that these fast algorithms admit strong convergence guarantees: the generated sequence converges globally to a critical point of symmetric NMF, at least at a sublinear rate. We conduct experiments on both synthetic data and image clustering to support our results.
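A sketch of the reformulation idea, assuming the penalized nonsymmetric objective 0.5·||X − UVᵀ||²_F + 0.5·λ·||U − V||²_F with U, V ≥ 0, minimized one column at a time (a HALS-style alternating scheme). Each column update has a closed form: with R the residual excluding the i-th rank-one term, u_i = max(0, (R v_i + λ v_i) / (||v_i||² + λ)), and symmetrically for v_i. The value of `lam` and the function names are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def objective(X, U, V, lam):
    return 0.5 * np.sum((X - U @ V.T) ** 2) + 0.5 * lam * np.sum((U - V) ** 2)

def symnmf_hals(X, rank, lam=1.0, n_iter=100, seed=0):
    """Column-wise exact block minimization of
    0.5*||X - U V^T||_F^2 + 0.5*lam*||U - V||_F^2 subject to U, V >= 0."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], rank))
    V = U.copy()
    for _ in range(n_iter):
        for i in range(rank):
            # Residual without the i-th rank-one term; it depends on neither U[:, i] nor V[:, i].
            R = X - U @ V.T + np.outer(U[:, i], V[:, i])
            U[:, i] = np.maximum((R @ V[:, i] + lam * V[:, i]) / (V[:, i] @ V[:, i] + lam), 0.0)
            V[:, i] = np.maximum((R.T @ U[:, i] + lam * U[:, i]) / (U[:, i] @ U[:, i] + lam), 0.0)
    return U, V

rng = np.random.default_rng(0)
B = rng.random((30, 3))
X = B @ B.T  # symmetric, nonnegative, rank 3
U1, V1 = symnmf_hals(X, rank=3, n_iter=1)
U, V = symnmf_hals(X, rank=3, n_iter=100)
print(objective(X, U1, V1, 1.0), objective(X, U, V, 1.0))
```

Each column update is an exact minimization over that block (the quadratic has the same curvature in every coordinate, so clipping at zero is exact), which is why the objective never increases across sweeps.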

algorithm, artificial intelligence, machine learning

Country: North America > United States > Colorado > Jefferson County > Golden (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Yao, Quanming, Kwok, James

Scalable Robust Matrix Factorization with Nonconvex Loss

Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present. The state-of-the-art RMF solver is the RMF-MM algorithm, which, however, cannot utilize data sparsity. Moreover, sometimes even the (convex) $\ell_1$-loss is not robust enough. In this paper, we propose the use of nonconvex loss to enhance robustness. To address the resultant difficult optimization problem, we use majorization-minimization (MM) optimization and propose a new MM surrogate. To improve scalability, we exploit data sparsity and optimize the surrogate via its dual with the accelerated proximal gradient algorithm. The resultant algorithm has low time and space complexities and is guaranteed to converge to a critical point. Extensive experiments demonstrate its superiority over the state-of-the-art in terms of both accuracy and scalability.
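The majorization-minimization (MM) principle behind such solvers can be shown on a toy scalar analogue: robust location estimation with the nonconvex loss Σ_i log(1 + |x_i − m|/θ). Because this loss is concave in |r|, its tangent at the current residuals majorizes it, and the resulting surrogate is a weighted ℓ1 problem solved exactly by a weighted median. This is an assumption-laden illustration of MM with a nonconvex loss, not the paper's matrix surrogate or its accelerated proximal gradient dual solver.

```python
import numpy as np

def logsum_loss(x, m, theta):
    """Nonconvex robust loss sum_i log(1 + |x_i - m| / theta)."""
    return np.sum(np.log1p(np.abs(x - m) / theta))

def weighted_median(x, w):
    """A minimizer of sum_i w_i |x_i - m| over m."""
    order = np.argsort(x)
    cum = np.cumsum(w[order])
    return x[order][np.searchsorted(cum, 0.5 * cum[-1])]

def mm_location(x, theta=1.0, n_iter=30):
    m = np.median(x)  # l1 (convex) estimate as the starting point
    history = [logsum_loss(x, m, theta)]
    for _ in range(n_iter):
        # log(1 + a/theta) is concave in a = |r|, so its tangent at the current |r_i|
        # majorizes it; the surrogate is a weighted l1 loss with these weights.
        w = 1.0 / (theta + np.abs(x - m))
        m = weighted_median(x, w)
        history.append(logsum_loss(x, m, theta))
    return m, history

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 70), rng.uniform(20.0, 50.0, 30)])
m_hat, history = mm_location(x)
```

The MM argument gives loss(m_new) ≤ surrogate(m_new) ≤ surrogate(m_old) = loss(m_old), so `history` is non-increasing; large residuals receive small weights, which is how the nonconvex loss limits the influence of outliers.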

algorithm, artificial intelligence, machine learning

Country: Asia > China (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

Yao, Zhewei, Gholami, Amir, Lei, Qi, Keutzer, Kurt, Mahoney, Michael W.

Training neural networks with large batch sizes has been shown to incur a loss in accuracy under current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian-based study to analyze exactly how the landscape of the loss function changes when training with a large batch size. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Extensive experiments on multiple networks show that saddle points are not the cause of the generalization gap in large batch size training, and the results consistently show that large batch training converges to points with a noticeably larger Hessian spectrum. Furthermore, we show that robust training allows one to favor flat areas, as points with a large Hessian spectrum show poor robustness to adversarial perturbation. We further study this relationship and provide empirical evidence and theoretical proof that the inner loop for robust training is a saddle-free optimization problem almost everywhere. We present detailed experiments with five different network architectures, including a residual network, tested on the MNIST and CIFAR-10/100 datasets.
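The top of a Hessian spectrum can be estimated without forming the Hessian, by power iteration on Hessian-vector products. The sketch below obtains those products by central finite differences of the gradient, a swapped-in technique chosen to keep the example dependency-free; the abstract describes computing them exactly by back-propagating the second derivative, which an autodiff framework would provide. It is checked on a quadratic loss with a known spectrum.

```python
import numpy as np

def hvp_fd(grad, x, v, h=1e-5):
    """Hessian-vector product by central differences of the gradient."""
    return (grad(x + h * v) - grad(x - h * v)) / (2.0 * h)

def top_hessian_eigenvalue(grad, x, n_iter=200, seed=0):
    """Power iteration using only Hessian-vector products; returns the
    eigenvalue of largest magnitude (the top one when the Hessian is PSD)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=x.shape)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(n_iter):
        Hv = hvp_fd(grad, x, v)
        eig = float(v @ Hv)  # Rayleigh quotient with unit-norm v
        v = Hv / np.linalg.norm(Hv)
    return eig

# Quadratic loss 0.5 * x^T A x whose Hessian A has eigenvalues 1, ..., 10.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
A = Q @ np.diag(np.arange(1.0, 11.0)) @ Q.T

def grad(x):
    return A @ x

print(top_hessian_eigenvalue(grad, rng.normal(size=10)))
```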

artificial intelligence, machine learning, optimization problem

Country: North America (0.46)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Pillutla, Venkata Krishna, Roulet, Vincent, Kakade, Sham M., Harchaoui, Zaid

A Smoother Way to Train Structured Prediction Models

We present a framework to train a structured prediction model by performing smoothing on the inference algorithm it builds upon. Smoothing overcomes the non-smoothness inherent to the maximum margin structured prediction objective, and paves the way for the use of fast primal gradient-based optimization algorithms. We illustrate the proposed framework by developing a novel primal incremental optimization algorithm for the structural support vector machine. The proposed algorithm blends an extrapolation scheme for acceleration and an adaptive smoothing scheme and builds upon the stochastic variance-reduced gradient algorithm. We establish its worst-case global complexity bound and study several practical variants. We present experimental results on two real-world problems, namely named entity recognition and visual object localization. The experimental results show that the proposed framework allows us to build upon efficient inference algorithms to develop large-scale optimization algorithms for structured prediction which can achieve competitive performance on the two real-world problems.
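A minimal sketch of why smoothing helps: the structural hinge loss max_y [s(y) + Δ(y, y*)] − s(y*) is non-smooth, but replacing the max with μ·log Σ exp(·/μ) (entropy smoothing) gives a differentiable surrogate whose gradient with respect to the augmented scores is a softmax, and which lies within μ·log K of the exact max over K outputs. The sketch assumes a small, explicitly enumerated output set; the paper instead smooths the inference algorithm itself so that it applies to structured outputs that cannot be enumerated.

```python
import numpy as np

def smoothed_max(z, mu):
    """mu * log(sum_i exp(z_i / mu)), computed stably, and its gradient softmax(z / mu).
    It satisfies max(z) <= value <= max(z) + mu * log(len(z))."""
    z = np.asarray(z, dtype=float)
    top = z.max()
    e = np.exp((z - top) / mu)
    return top + mu * np.log(e.sum()), e / e.sum()

def smoothed_structural_hinge(scores, task_loss, true_idx, mu):
    """Entropy-smoothed version of max_y [s(y) + Delta(y, y_true)] - s(y_true)
    over an explicitly enumerated set of outputs y."""
    scores = np.asarray(scores, dtype=float)
    value, probs = smoothed_max(scores + np.asarray(task_loss, dtype=float), mu)
    return value - scores[true_idx], probs

scores = [1.0, 3.0, 0.5]     # model scores s(y) for three candidate outputs
task_loss = [0.0, 1.0, 1.0]  # Delta(y, y_true), zero for the true output
for mu in (1.0, 0.1, 0.01):
    print(mu, smoothed_structural_hinge(scores, task_loss, true_idx=0, mu=mu)[0])
```

As μ shrinks, the smoothed loss approaches the exact hinge value (here 4 − 1 = 3) at the cost of a less smooth objective, which is the trade-off an adaptive smoothing scheme manages.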

artificial intelligence, inductive learning, machine learning

Country:

North America > United States (1.00)
Europe (1.00)
North America > Canada > British Columbia (0.28)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)