 Mathematical & Statistical Methods


Low-rank tensor completion: a Riemannian manifold preconditioning approach

arXiv.org Machine Learning

We propose a novel Riemannian manifold preconditioning approach for the tensor completion problem with rank constraint. A novel Riemannian metric or inner product is proposed that exploits the least-squares structure of the cost function and takes into account the structured symmetry that exists in the Tucker decomposition. This metric allows us to use the versatile framework of Riemannian optimization on quotient manifolds to develop preconditioned nonlinear conjugate gradient and stochastic gradient descent algorithms for batch and online setups, respectively. Concrete matrix representations of various optimization-related ingredients are listed. Numerical comparisons suggest that our proposed algorithms robustly outperform state-of-the-art algorithms across different synthetic and real-world datasets.


Linear Algebra for Data Scientists

@machinelearnbot

It's important to know what goes on inside a machine learning algorithm. There is some pretty intense math happening, much of which is linear algebra. When I took Andrew Ng's course on machine learning, I found the hardest part was the linear algebra.


Calculus. No Linear Algebra? Someone please clear this for me. • /r/MachineLearning

@machinelearnbot

Are you sure it was me? Although when I was in high school I was working in Arby's and I saw a customer who looked just like me and he was wearing a CMU hoodie. I thought at the time that he might be from the future, although I'm not sure why I'd ever go to an Arby's again.


New book: Doing Data Science - Straight Talk from the Frontline

@machinelearnbot

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that's so clouded in hype? This insightful book, based on Columbia University's Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you're familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.


Learning theory estimates with observations from general stationary stochastic processes

arXiv.org Machine Learning

This paper investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here, by "general" we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified analysis of learning schemes with various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established. The obtained oracle inequality is then applied to derive convergence rates for several learning schemes such as empirical risk minimization (ERM), least squares support vector machines (LS-SVMs) using given generic kernels, and SVMs using Gaussian kernels for both least squares and quantile regression. It turns out that for i.i.d. processes, our learning rates for ERM recover the optimal rates. On the other hand, for non-i.i.d. processes including geometrically $\alpha$-mixing Markov processes, geometrically $\alpha$-mixing processes with restricted decay, $\phi$-mixing processes, and (time-reversed) geometrically $\mathcal{C}$-mixing processes, our learning rates for SVMs with Gaussian kernels match, up to some arbitrarily small extra term in the exponent, the optimal rates. For the remaining cases, our rates are at least close to the optimal rates. As a by-product, the assumed generalized Bernstein-type inequality also provides an interpretation of the so-called "effective number of observations" for various mixing processes.


Linear Algebra Formulas for Econometrics

@machinelearnbot

Econometrics is fundamental to many of the problems that data scientists care about, and it requires many skills. There's philosophical skill, for thinking about whether fixed effects or random effects models are more appropriate, for example, or what the direction of causality in a particular problem is. There's some coding, including knowing the right commands to interact with statistical programs like Stata or R, and how to interpret their output. There's the intuition to know which policy issues are worth researching, the political skill to obtain data or grant money, even the writing skill to communicate ideas. And "beneath" it all there is linear algebra: matrix formulas for the estimators that are reported, interpreted, and acted on.
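As one concrete instance of the matrix formulas mentioned above, the ordinary least squares estimator solves the normal equations (X'X) b = X'y. The sketch below is illustrative only and is not taken from the article: it assumes a single regressor plus an intercept, so X'X is 2x2 and can be inverted by hand, and the function name `ols_intercept_slope` is hypothetical.

```python
def ols_intercept_slope(x, y):
    """OLS for y = b0 + b1 * x via the normal equations (X'X) b = X'y.

    The design matrix X has a column of ones and a column holding x,
    so X'X = [[n, sum(x)], [sum(x), sum(x^2)]] and X'y = [sum(y), sum(x*y)].
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))

    # Determinant of X'X; zero means the regressor has no variation.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X'X is singular; x must take at least two values")

    # b = (X'X)^{-1} X'y, with the 2x2 inverse written out explicitly.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Data generated exactly by y = 1 + 2x, so the fit should recover (1, 2).
intercept, slope = ols_intercept_slope([0, 1, 2, 3], [1, 3, 5, 7])
```

With more regressors the same formula applies, but a numerical linear algebra routine is normally used to solve the system rather than an explicit inverse.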


Fast nonlinear embeddings via structured matrices

arXiv.org Machine Learning

We present a new paradigm for speeding up randomized computations of several frequently used functions in machine learning. In particular, our paradigm can be applied to improve computations of kernels based on random embeddings. Beyond that, the presented framework covers multivariate randomized functions. As a byproduct, we propose an algorithmic approach that also leads to a significant reduction of space complexity. Our method is based on careful recycling of Gaussian vectors into structured matrices that share properties of fully random matrices. The quality of the proposed structured approach follows from combinatorial properties of the graphs encoding correlations between rows of these structured matrices. Our framework covers as special cases already known structured approaches such as the Fast Johnson-Lindenstrauss Transform, but is much more general since it can also be applied to highly nonlinear embeddings. We provide strong concentration results showing the quality of the presented paradigm.
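To make the idea of recycling a Gaussian vector into a structured matrix more tangible, here is a minimal sketch using a circulant matrix, one well-known structured family. Whether it matches the constructions in the paper is an assumption; the function names and the sign nonlinearity are hypothetical choices made for illustration.

```python
import random


def circulant_from_vector(g):
    """Build the n x n circulant matrix with entries C[i][j] = g[(i - j) % n].

    Only the n entries of g are random; every row reuses them in shifted order.
    """
    n = len(g)
    return [[g[(i - j) % n] for j in range(n)] for i in range(n)]


def circulant_matvec(g, x):
    """Compute C @ x without materializing C (storage is O(n), not O(n^2))."""
    if len(g) != len(x):
        raise ValueError("g and x must have the same length")
    n = len(g)
    return [sum(g[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def sign_embedding(g, x):
    """A simple nonlinear embedding: the sign pattern of the projection C @ x."""
    return [1 if v >= 0 else -1 for v in circulant_matvec(g, x)]


# One Gaussian vector of length n stands in for an n x n Gaussian matrix.
rng = random.Random(0)
g = [rng.gauss(0.0, 1.0) for _ in range(8)]
x = [1.0, 0.5, -0.25, 0.0, 2.0, -1.0, 0.75, 0.1]
features = sign_embedding(g, x)
```

The product here is computed directly in O(n^2) time for clarity; because a circulant product is a circular convolution, it can be computed in O(n log n) with a fast Fourier transform.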


Learning Concept Graphs from Online Educational Data

Journal of Artificial Intelligence Research

This paper addresses an open challenge in educational data mining, i.e., the problem of automatically mapping online courses from different providers (universities, MOOCs, etc.) onto a universal space of concepts, and predicting latent prerequisite dependencies (directed links) among both concepts and courses. We propose a novel approach for inference within and across course-level and concept-level directed graphs. In the training phase, our system projects partially observed course-level prerequisite links onto directed concept-level links; in the testing phase, the induced concept-level links are used to infer the unknown course-level prerequisite links. Whereas courses may be specific to one institution, concepts are shared across different providers. The bi-directional mappings enable our system to perform interlingua-style transfer learning, e.g. treating the concept graph as the interlingua and transferring the prerequisite relations across universities via the interlingua. Experiments on our newly collected datasets of courses from MIT, Caltech, Princeton and CMU show promising results.


Newton's Laws of Marriage

The New Yorker

I wish we could derive the rest of the phænomena of nature by the same kind of reasoning from mechanical principles. LAW I: A body in motion will be kept in motion. A body at rest will be asked what its plans for the day are. The First Law deals primarily with inertia--which is often mistakenly identified as "relaxing"--and the different ways one body can affect another inert (and perfectly content) body. Conversely, it states that a body in motion will be kept in motion with a list of errands, written on the back of an envelope, before that body "becomes one with the couch for the rest of the day," which seems like an unnecessary characterization.


Constructive Preference Elicitation by Setwise Max-margin Learning

arXiv.org Machine Learning

In this paper we propose an approach to preference elicitation that is suitable for large configuration spaces beyond the reach of existing state-of-the-art approaches. Our setwise max-margin method can be viewed as a generalization of max-margin learning to sets, and can produce a set of "diverse" items that can be used to ask informative queries to the user. Moreover, the approach can encourage sparsity in the parameter space, in order to favor the assessment of utility towards combinations of weights that concentrate on just a few features. We present a mixed integer linear programming formulation and show how our approach compares favourably with Bayesian preference elicitation alternatives and easily scales to realistic datasets.