Inductive Learning
A Non-convex One-Pass Framework for Generalized Factorization Machine and Rank-One Matrix Sensing
We develop an efficient alternating framework for learning a generalized version of Factorization Machine (gFM) on streaming data with provable guarantees. When the instances are sampled from $d$-dimensional random Gaussian vectors and the target second-order coefficient matrix in gFM is of rank $k$, our algorithm converges linearly, achieves $O(\epsilon)$ recovery error after retrieving $O(k^{3}d\log(1/\epsilon))$ training instances, consumes $O(kd)$ memory in a single pass over the dataset, and requires only matrix-vector product operations in each iteration. The key ingredient of our framework is the construction of an estimation sequence endowed with a so-called Conditionally Independent RIP condition (CI-RIP). As special cases of gFM, our framework can be applied to symmetric or asymmetric rank-one matrix sensing problems, such as inductive matrix completion and phase retrieval.
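As a rough illustration of why matrix-vector products and $O(kd)$ memory can suffice, the sketch below evaluates a gFM-style prediction. It assumes the model form $y = x^\top w + x^\top M x$ with the rank-$k$ matrix stored as factors $M = UV^\top$; the function name `gfm_predict` and the column-list layout of `U` and `V` are illustrative choices, not taken from the abstract, and this is not the learning algorithm itself.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def gfm_predict(x, w, U, V):
    """Evaluate x^T w + x^T (U V^T) x without ever forming the d x d matrix.

    x, w : lists of length d
    U, V : lists of k columns, each a list of length d (assumed layout)
    """
    linear = dot(w, x)
    # x^T U V^T x = sum_j (u_j . x) * (v_j . x): only 2k inner products,
    # so the cost is O(kd) time and the model needs O(kd) memory.
    quadratic = sum(dot(u, x) * dot(v, x) for u, v in zip(U, V))
    return linear + quadratic


# Tiny example with d = 2 and k = 1.
x = [1.0, 2.0]
w = [0.5, -1.0]
U = [[1.0, 0.0]]
V = [[0.0, 1.0]]
print(gfm_predict(x, w, U, V))
```

Storing only the factors is one way to stay within the memory budget mentioned above; whether the actual method keeps exactly these factors is an assumption of this sketch.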
Convex Formulation for Kernel PCA and its Use in Semi-Supervised Learning
Alaíz, Carlos M., Fanuel, Michaël, Suykens, Johan A. K.
In this paper, Kernel PCA is reinterpreted as the solution to a convex optimization problem. Actually, there is a constrained convex problem for each principal component, so that the constraints guarantee that the principal component is indeed a solution, and not a mere saddle point. Although these insights do not imply any algorithmic improvement, they can be used to further understand the method, formulate possible extensions and properly address them. As an example, a new convex optimization problem for semi-supervised classification is proposed, which seems particularly well-suited whenever the number of known labels is small. Our formulation resembles a Least Squares SVM problem with a regularization parameter multiplied by a negative sign, combined with a variational principle for Kernel PCA. Our primal optimization principle for semi-supervised learning is solved in terms of the Lagrange multipliers. Numerical experiments in several classification tasks illustrate the performance of the proposed model in problems with only a few labeled data.
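For readers who want a concrete reference point, the sketch below computes the first kernel principal component in the standard way, as the leading eigenvector of the centered kernel matrix. It is not the convex formulation proposed in the paper; the RBF kernel, the `gamma` value, the power-iteration solver and all function names are assumptions made for this example.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as lists."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def centered_kernel_matrix(points, gamma=1.0):
    """Kernel matrix with feature-space centering applied (H K H)."""
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so column means equal row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def top_component(Kc, iters=500):
    """Leading eigenpair of a symmetric PSD matrix via power iteration."""
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Kv = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vi * kvi for vi, kvi in zip(v, Kv))
    return eigenvalue, v


# Two well-separated pairs of points: the first component splits the pairs.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
eigenvalue, scores = top_component(centered_kernel_matrix(points))
print(eigenvalue, scores)
```

Power iteration is used only to keep the example dependency-free; a library eigensolver would be the usual choice in practice.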
First Artificial Intelligence Director Hired At Apple
Apple has hired its first director of artificial intelligence (AI) research, Ruslan Salakhutdinov, a leading expert in the field. He is tasked with ensuring that Siri and other related products take advantage of the relevant breakthroughs coming out of academic AI research. He is scheduled to discuss his research at EmTech MIT 2016, an MIT Technology Review conference to be held this week. Salakhutdinov is an associate professor in the Machine Learning Department at Carnegie Mellon University, working in the field of statistical machine learning. His research revolves around deep learning, in which very large neural networks allow a computer to learn and carry out complex tasks by absorbing extensive amounts of patterns and training examples.
Apple's new director of AI research will speak at EmTech MIT 2016
Apple is hiring a rising star in the world of deep learning to serve as its first director of AI research. Ruslan Salakhutdinov, an associate professor at Carnegie Mellon University in Pittsburgh, will assume the new position, which is meant to help the company make sure that Siri and its other products make use of the fundamental breakthroughs coming out of academic AI research. Salakhutdinov will talk about his research at EmTech MIT 2016, an MIT Technology Review conference held this week. Salakhutdinov researches very large neural networks used in a technology called deep learning, which lets a computer learn to perform a difficult task by consuming copious training examples. He will continue to work part time at CMU and will hire a team of researchers to work with him at Apple.
The Peaking Phenomenon in Semi-supervised Learning
Krijthe, Jesse H., Loog, Marco
For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This possibly counterintuitive phenomenon is known as peaking. In this work, we observe that a similar but more pronounced version of this phenomenon also occurs in the semi-supervised setting, where unlabeled objects, rather than labeled ones, are added to the training set. We explain why the learning curve has a steeper incline and a more gradual decline in this setting through simulation studies and by applying an approximation of the learning curve based on the work by Raudys & Duin.
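For orientation, one common way to define the least squares classifier when there are fewer training objects than dimensions is through the pseudoinverse. The notation below ($X$, $y$, $\hat{w}$, $n$, $d$) is assumed for this note and does not come from the abstract:

$$\hat{w} = X^{+} y, \qquad X \in \mathbb{R}^{n \times d}, \qquad X^{+} = X^\top (X X^\top)^{-1} \ \text{ when } n < d \text{ and } X \text{ has full row rank.}$$

In that regime $\hat{w}$ is the minimum-norm solution that fits the $n$ training labels exactly. As $n$ approaches $d$, the matrix $X X^\top$ tends to become ill-conditioned, which is the usual intuition for why the supervised error rate is reported to peak near $n \approx d$ before it decreases again.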
Ched Evans rape case 'sets us back 30 years'
A former solicitor general has said she is concerned the Ched Evans rape case could discourage victims of sexual offences from coming forward. The 27-year-old footballer was cleared on Friday of raping a 19-year-old woman in a hotel room. Vera Baird told the BBC that details of the woman's sexual past should not have been heard in court. Mr Evans was found guilty of rape in 2012, but that conviction was quashed in April. The Chesterfield striker was accused of attacking the woman at a Premier Inn in Rhuddlan, Denbighshire, on 30 May 2011.
Filter based Taxonomy Modification for Improving Hierarchical Classification
Hierarchical Classification (HC) is a supervised learning problem where unlabeled instances are classified into a taxonomy of classes. Several methods that utilize the hierarchical structure have been developed to improve HC performance. However, in most cases the hierarchical structure defined a priori by domain experts is inconsistent; as a consequence, the performance improvement is not noticeable in comparison to flat classification methods. We propose a scalable, data-driven, filter-based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down HC with our modified hierarchy on a wide range of datasets show classification performance improvement over the baseline hierarchy (i.e., defined by experts), clustered hierarchy and flattening-based hierarchy modification approaches. In comparison to existing rewiring approaches, our developed method (rewHier) is computationally efficient, enabling it to scale to datasets with large numbers of classes, instances and features. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison to flat and state-of-the-art HC approaches.
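To make the terms concrete, here is a minimal sketch of top-down prediction over a taxonomy and of a single rewiring step. It does not reproduce rewHier or its filter criterion: the toy taxonomy, the keyword-overlap scorer `keyword_score` and the helper names are all hypothetical, chosen only to show how moving a class to a better-fitting parent can change the top-down outcome.

```python
def top_down_predict(taxonomy, score, instance, root="root"):
    """Descend from the root, picking the best-scoring child at each level."""
    node = root
    path = [node]
    while taxonomy.get(node):  # stop at a node with no children (a leaf)
        node = max(taxonomy[node], key=lambda child: score(child, instance))
        path.append(node)
    return path


def rewire(taxonomy, node, new_parent):
    """Return a copy of the taxonomy with `node` moved under `new_parent`."""
    rewired = {parent: [c for c in children if c != node]
               for parent, children in taxonomy.items()}
    rewired.setdefault(new_parent, []).append(node)
    return rewired


# Hypothetical expert-defined taxonomy in which "chess" is placed awkwardly.
taxonomy = {
    "root": ["science", "sports"],
    "science": ["physics", "chess"],
    "sports": ["soccer"],
}

# Hypothetical per-class keyword sets standing in for trained classifiers.
KEYWORDS = {
    "science": {"quantum", "lab"},
    "sports": {"match", "game", "player"},
    "physics": {"quantum"},
    "chess": {"game", "board", "player"},
    "soccer": {"match", "goal"},
}


def keyword_score(label, words):
    """Score a class by how many of its keywords appear in the instance."""
    return len(KEYWORDS[label] & words)


doc = {"game", "board", "player"}
print(top_down_predict(taxonomy, keyword_score, doc))
print(top_down_predict(rewire(taxonomy, "chess", "sports"), keyword_score, doc))
```

With the original taxonomy the instance is routed to "sports" at the first level and can then only reach "soccer"; after rewiring, "chess" is reachable on that branch. This is the kind of error propagation that hierarchy modification is meant to reduce.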
Key pretrial hearing in Cosby criminal case set for November
A key pretrial hearing to determine what evidence prosecutors can use in Bill Cosby's Pennsylvania sex assault case has been scheduled for early November. Prosecutors hope to call 13 other accusers to show the comedian had a pattern of drugging and molesting women. The criminal charges involve an encounter with Andrea Constand in 2004. Prosecutors also want to use Cosby's deposition from Constand's 2005 lawsuit. Cosby acknowledged under oath that he had sexual encounters with a series of women after giving them drugs or alcohol.
If the LAPD wants the public's trust, it needs to be more transparent
To the editor: I empathize with Los Angeles Police Department Chief Charlie Beck and his officers, who are reluctant to quickly release information and videos taken of police shootings. As imperfect human beings, none of us appreciates being exposed to intense public scrutiny. On the other hand, L.A.'s finest should learn from examples set by departments in cities like Las Vegas, where officers quickly post information about shootings online. First, bad things grow in the dark, and you can't set a behavioral standard without oversight. Opening up will create more support for genuine peace officers, who will then be reassured that the public has their back.
Optimistic Semi-supervised Least Squares Classification
Krijthe, Jesse H., Loog, Marco
The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples. In this work we study a simple self-learning approach to semi-supervised learning applied to the least squares classifier. We show that a soft-label and a hard-label variant of self-learning can be derived by applying block coordinate descent to two related but slightly different objective functions. The resulting soft-label approach is related to an idea about dealing with missing data that dates back to the 1930s. We show that the soft-label variant typically outperforms the hard-label variant on benchmark datasets and partially explain this behaviour by studying the relative difficulty of finding good local minima for the corresponding objective functions.
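The alternating scheme described above can be sketched on a one-dimensional toy problem. The code assumes 0/1 label encoding, a single feature with an intercept, and an objective that adds the squared error on the unlabeled objects with imputed labels `u`; the soft variant keeps `u` in $[0, 1]$ and the hard variant restricts it to $\{0, 1\}$. These choices, the data and the function names are illustrative assumptions rather than the paper's exact setup.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def self_learning(x_lab, y_lab, x_unl, soft=True, iters=50):
    """Block coordinate descent: alternate imputing labels and refitting."""
    slope, intercept = fit_line(x_lab, y_lab)  # supervised starting point
    u = []
    for _ in range(iters):
        preds = [slope * x + intercept for x in x_unl]
        if soft:
            # Best imputed labels in [0, 1]: clip the current predictions.
            u = [min(1.0, max(0.0, p)) for p in preds]
        else:
            # Best imputed labels in {0, 1}: threshold at 0.5.
            u = [1.0 if p > 0.5 else 0.0 for p in preds]
        # Best weights given the imputed labels: refit on all objects.
        slope, intercept = fit_line(x_lab + x_unl, y_lab + u)
    return slope, intercept, u


x_lab, y_lab = [0.0, 1.0], [0.0, 1.0]
x_unl = [-1.0, 0.2, 0.8, 2.0]
print(self_learning(x_lab, y_lab, x_unl, soft=True))
print(self_learning(x_lab, y_lab, x_unl, soft=False))
```

Each step can only lower (or keep) the assumed objective, so the loop settles at a fixed point; which fixed point it reaches depends on the starting fit, which is where the question of good local minima raised in the abstract comes in.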