Education
Incomplete Pivoted QR-based Dimensionality Reduction
Bermanis, Amit, Rotbart, Aviv, Salhov, Moshe, Averbuch, Amir
High-dimensional big data appears in many research fields such as image recognition, biology and collaborative filtering. Often, the exploration of such data by classic algorithms encounters difficulties due to the "curse of dimensionality" phenomenon. Therefore, dimensionality reduction methods are applied to the data prior to its analysis. Many of these methods are based on principal component analysis, which is statistically driven: they map the data into a low-dimensional subspace that preserves significant statistical properties of the high-dimensional data. As a consequence, such methods do not directly address the geometry of the data, as reflected by the mutual distances between multidimensional data points. Thus, operations such as classification, anomaly detection or other machine learning tasks may be affected. This work provides a dictionary-based framework for geometrically driven data analysis that includes dimensionality reduction, out-of-sample extension and anomaly detection. It embeds high-dimensional data in a low-dimensional subspace. This embedding preserves the original high-dimensional geometry of the data up to a user-defined distortion rate. In addition, it identifies a subset of landmark data points that constitute a dictionary for the analyzed dataset. The dictionary enables a natural extension of the low-dimensional embedding to out-of-sample data points, which gives rise to a distortion-based criterion for anomaly detection. The suggested method is demonstrated on synthetic and real-world datasets and achieves good results for classification, anomaly detection and out-of-sample tasks.
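The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "incomplete pivoted QR" can be approximated by a greedy pivoted Gram-Schmidt pass over the data points that stops once every residual norm falls below a tolerance. The names `incomplete_pivoted_qr`, `embed_new_point` and `tol` are hypothetical. Under this assumption, each point differs from its projection by at most `tol`, so pairwise distances change by at most `2 * tol`, and a new point whose residual exceeds `tol` can be flagged as anomalous.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def incomplete_pivoted_qr(points, tol):
    """Greedy pivoted Gram-Schmidt with early stopping (illustrative only).

    Returns the landmark indices (the "dictionary"), an orthonormal basis,
    and the low-dimensional coordinates of every input point.
    Assumes `points` is a non-empty list of equal-length vectors.
    """
    residuals = [list(p) for p in points]
    coords = [[] for _ in points]
    landmarks, basis = [], []
    # The rank cannot exceed min(number of points, ambient dimension).
    for _ in range(min(len(points), len(points[0]))):
        norms = [math.sqrt(dot(r, r)) for r in residuals]
        pivot = max(range(len(points)), key=norms.__getitem__)
        if norms[pivot] <= tol:
            break  # every point is already within tol of the subspace
        q = [x / norms[pivot] for x in residuals[pivot]]
        landmarks.append(pivot)
        basis.append(q)
        for i, r in enumerate(residuals):
            c = dot(r, q)
            coords[i].append(c)
            residuals[i] = [x - c * y for x, y in zip(r, q)]
    return landmarks, basis, coords


def embed_new_point(x, basis, tol):
    """Out-of-sample extension: project x onto the basis and flag it as an
    anomaly when the part left outside the subspace is longer than tol."""
    r, coords = list(x), []
    for q in basis:
        c = dot(r, q)
        coords.append(c)
        r = [a - c * b for a, b in zip(r, q)]
    return coords, math.sqrt(dot(r, r)) > tol


# Toy data: five 3-D points lying close to a 2-D plane.
points = [(1, 0, 0.01), (0, 1, -0.01), (1, 1, 0.0), (2, 1, 0.01), (1, 2, 0.0)]
tol = 0.1
landmarks, basis, coords = incomplete_pivoted_qr(points, tol)

# Largest change in any pairwise distance caused by the embedding.
max_distortion = max(
    abs(math.dist(points[i], points[j]) - math.dist(coords[i], coords[j]))
    for i in range(len(points))
    for j in range(i + 1, len(points))
)
_, near_plane_is_anomaly = embed_new_point((3, 3, 0.01), basis, tol)
_, far_point_is_anomaly = embed_new_point((1, 1, 5.0), basis, tol)
```

In this toy run the tolerance plays the role of the user-defined distortion rate: a smaller `tol` selects more landmarks and yields a higher-dimensional but more faithful embedding.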
Vivek Wadhwa Named to Carnegie Mellon University Silicon Valley Faculty
Vivek Wadhwa has been named to the Carnegie Mellon University College of Engineering faculty as a distinguished fellow on its Silicon Valley campus, the Pittsburgh, Pa.-based university recently announced. In his role, Wadhwa will teach classes in exponential technologies, technology convergence and industry disruption, and the new rules of innovation. He will also research technologies and help members of the Pittsburgh faculty connect with Silicon Valley. "CMU is doing some of the most advanced research in areas such as robotics, artificial intelligence, Internet of Things, autonomous cars and almost every field of engineering and bioengineering," Wadhwa said in an emailed statement. "This will provide me direct access to the amazing faculty and enable me to help them make a much greater impact on the world."
Top Machine Learning MOOCs and Online Lectures: A Comprehensive Survey
Everyone who gets going in Machine Learning (and Deep Learning) gets overwhelmed by the plethora of MOOCs available. Here, I try to give a comprehensive survey of such courses available freely on the internet. You can take this post as a complement to two previous posts. I will try to highlight some important pointers, such as the difficulty of the courses, the order in which they should be completed, and the right audience for each. You will get a feel for how these courses add a stack of skills to your arsenal and how you can use them to develop practical machine learning systems.
A First Course in Machine Learning, Second Edition
A First Course in Machine Learning by Simon Rogers and Mark Girolami is the best introductory book for ML currently available. It combines rigor and precision with accessibility, starts from a detailed explanation of the basic foundations of Bayesian analysis in the simplest of settings, and goes all the way to the frontiers of the subject such as infinite mixture models, GPs, and MCMC.
To supervise or not to supervise in AI?
To learn more about opportunities in applied AI, join us at the O'Reilly Artificial Intelligence Conference, September 26-27, 2016 in New York. One of the truisms of modern AI is that the next big step is to move from supervised to unsupervised learning. In the last few years, we've made tremendous progress in supervised learning: photo classification, speech recognition, even playing Go (which represents a partial, but only partial, transition to unsupervised learning). Unsupervised learning is still an unsolved problem. As Yann LeCun says, "We need to solve the unsupervised learning problem before we can even think of getting to true AI."
Probably Overthinking It: Learning to Love Bayesian Statistics
I did a webcast earlier today about Bayesian statistics. Some time in the next week, the video should be available from O'Reilly. In the meantime, here's a transcript of what I said: Thanks everyone for joining me for this webcast. At the bottom of this slide you can see the URL for my slides, so you can follow along at home. I'm Allen Downey and I'm a professor at Olin College, which is a new engineering college right outside Boston. Our mission is to fix engineering education, and one of the ways I'm working on that is by teaching Bayesian statistics. Bayesian methods have been the victim of a 200-year smear campaign. If you are interested in the history and the people involved, I recommend this book, The Theory That Would Not Die.
Collaborative Learning of Stochastic Bandits over a Social Network
Kolla, Ravi Kumar, Jagannathan, Krishna, Gopalan, Aditya
We consider a collaborative online learning paradigm, wherein a group of agents connected through a social network are engaged in playing a stochastic multi-armed bandit game. Each time an agent takes an action, the corresponding reward is instantaneously observed by the agent, as well as by its neighbours in the social network. We perform a regret analysis of various policies in this collaborative learning setting. A key finding of this paper is that natural extensions of widely studied single-agent learning policies to the network setting need not perform well in terms of regret. In particular, we identify a class of non-altruistic and individually consistent policies, and argue by deriving regret lower bounds that they are liable to suffer a large regret in the networked setting. We also show that the learning performance can be substantially improved if the agents exploit the structure of the network, and develop a simple learning algorithm based on dominating sets of the network. Specifically, we first consider a star network, which is a common motif in hierarchical social networks, and show analytically that the hub agent can be used as an information sink to expedite learning and improve the overall regret. We also derive network-wide regret bounds for the algorithm applied to general networks. We conduct numerical experiments on a variety of networks to corroborate our analytical results.
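The star-network setting described above can be sketched in a few lines. This is a hypothetical illustration, not the algorithm analysed in the paper: it assumes Bernoulli rewards, a hub that runs a standard UCB1 index over every reward it can observe (its own and those of all leaves), and leaves that simply repeat the hub's previous action. The function name `simulate_star` and all parameter names are made up for the example.

```python
import math
import random


def simulate_star(arm_means, num_leaves, horizon, seed=0):
    """Simulate a follow-the-hub policy on a star network (illustrative).

    The hub is adjacent to every leaf, so it observes all rewards and acts
    as the information sink. Returns the network-wide cumulative
    pseudo-regret and the number of observations collected per arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # observations available to the hub, per arm
    sums = [0.0] * k    # sum of observed rewards, per arm
    best = max(arm_means)
    regret = 0.0
    hub_prev = None
    for t in range(1, horizon + 1):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            hub_arm = untried[0]  # initialisation: observe each arm once
        else:
            # UCB1 index with the round number t inside the logarithm
            # (an assumption; other choices of exploration bonus exist).
            hub_arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Leaves copy what the hub did in the previous round.
        leaf_arm = hub_arm if hub_prev is None else hub_prev
        for arm in [hub_arm] + [leaf_arm] * num_leaves:
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            regret += best - arm_means[arm]
        hub_prev = hub_arm
    return regret, counts


arm_means = [0.2, 0.5, 0.8]
regret, counts = simulate_star(arm_means, num_leaves=4, horizon=2000)
```

Because only the hub explores, the leaves never pay an exploration cost of their own; the network as a whole pays roughly one agent's exploration, replicated across the leaves with a one-round delay.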
Organizing for the future
Platform-based talent markets help put the emphasis in human-capital management back where it belongs--on humans. The best way to organize corporations--it's a perennial debate. But the discussion is becoming more urgent as digital technology begins to penetrate the labor force. Although consumers have largely gone digital, the digitization of jobs, and of the tasks and activities within them, is still in the early stages, according to a recent study by McKinsey Global Institute (MGI). Even companies and industries at the forefront of digital spending and usage have yet to digitize the workforce fully (Exhibit 1). (See McKinsey Global Institute, "Digital America: A tale of the haves and have-mores," December 2015.) The stage is set for sweeping change as artificial intelligence, after years of hype and debate, brings workplace automation not just to physically intensive roles and repetitive routines but also to a wide range of other tasks. MGI estimates that roughly up to 45 percent of the activities employees perform can be automated by adapting currently demonstrated technologies.
The Mathematics of Machine Learning (R-bloggers)
This post was first published on my LinkedIn page and posted here as a contributed post. In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of easy-to-use machine and deep learning packages such as scikit-learn, Weka and TensorFlow. Machine Learning theory is a field that intersects statistics, probability, computer science and algorithms; it arises from learning iteratively from data and finding hidden insights that can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and for getting good results.