Education
Structural query-by-committee
Tosh, Christopher, Dasgupta, Sanjoy
We introduce interactive structure learning, an abstract problem that encompasses many interactive learning tasks that have traditionally been studied in isolation, including active learning of binary classifiers, interactive clustering, interactive embedding, and active learning of structured output predictors. These problems include variants of both supervised and unsupervised tasks, and allow many different types of feedback, from binary labels to must-link/cannot-link constraints to similarity assessments to structured outputs. Despite these surface differences, they conform to a common template that allows them to be fruitfully unified. In interactive structure learning, there is a space of items X (for instance, an input space on which a classifier is to be learned, or points to cluster, or points to embed in a metric space), and the goal is to learn a structure on X, chosen from a family G. This set G could consist, for example, of all linear classifiers on X, or all hierarchical clusterings of X, or all knowledge graphs on X.
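As a point of reference for the simplest case named above (active learning of binary classifiers), the following is a minimal sketch of classic query-by-committee, not the structural variant the paper proposes. It assumes a finite family of 1-D threshold classifiers, a uniform draw from the version space, and a noiseless labeling oracle; the function name `qbc_thresholds` and all parameters are hypothetical.

```python
import random

def qbc_thresholds(points, oracle, thresholds, rounds=200, seed=0):
    """Classic query-by-committee over threshold classifiers h_t(x) = (x >= t).

    Assumptions (illustrative only): finite hypothesis class, uniform
    sampling from the version space, and a noiseless oracle.
    """
    rng = random.Random(seed)
    version_space = list(thresholds)  # hypotheses consistent with all labels so far
    queries = 0
    for _ in range(rounds):
        if len(version_space) <= 1:
            break
        # Draw a committee of two hypotheses from the current version space.
        t1, t2 = rng.sample(version_space, 2)
        lo, hi = min(t1, t2), max(t1, t2)
        # The two committee members disagree exactly on points in [lo, hi).
        disputed = [x for x in points if lo <= x < hi]
        if not disputed:
            continue  # no available point separates this pair
        x = rng.choice(disputed)
        y = oracle(x)  # ask for a label only where the committee disagrees
        queries += 1
        # Discard every hypothesis that contradicts the new label.
        version_space = [t for t in version_space if (x >= t) == y]
    return version_space, queries

# Usage: items are the integers 0..99 and the hidden threshold is 37.
remaining, n_queries = qbc_thresholds(
    points=range(100),
    oracle=lambda x: x >= 37,
    thresholds=range(101),
)
```

Each query is spent on a point where two plausible hypotheses disagree, so every answer eliminates at least one of them. In the abstract's terms, the hypothesis family plays the role of G and the points play the role of X.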
Learning Mixtures of Product Distributions via Higher Multilinear Moments
Learning mixtures of $k$ binary product distributions is a central problem in computational learning theory, but one where there are wide gaps between the best known algorithms and lower bounds (even for restricted families of algorithms). We narrow many of these gaps by developing novel insights about how to reason about higher order multilinear moments. Our results include: 1) an $n^{O(k^2)}$ time algorithm for learning mixtures of binary product distributions, giving the first improvement on the $n^{O(k^3)}$ time algorithm of Feldman, O'Donnell and Servedio; 2) an $n^{\Omega(\sqrt{k})}$ statistical query lower bound, improving on the $n^{\Omega(\log k)}$ lower bound that is based on connections to sparse parity with noise; and 3) an $n^{O(\log k)}$ time algorithm for learning mixtures of $k$ subcubes. This special case can still simulate many other hard learning problems, but is much richer than any of them alone. As a corollary, we obtain more flexible algorithms for learning decision trees under the uniform distribution that work with stochastic transitions, when only positive examples are given, and with a polylogarithmic number of samples for any fixed $k$. Our algorithms are based on a win-win analysis where we either build a basis for the moments or locate a degeneracy that can be used to simplify the problem, which we believe will have applications to other learning problems over discrete domains.
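To make "multilinear moment" concrete: for a mixture of binary product distributions with mixing weights w_j and coordinate means mu_{j,i}, the moment of a set S of distinct coordinates is E[prod_{i in S} x_i] = sum_j w_j prod_{i in S} mu_{j,i}. The sketch below only checks that identity by brute force on a toy mixture; it is not the paper's algorithm, and the function names and the numbers in the usage lines are made up for illustration.

```python
from itertools import product

def multilinear_moment(weights, means, subset):
    """Closed form: sum_j w_j * prod_{i in subset} mu[j][i]."""
    total = 0.0
    for w, mu in zip(weights, means):
        term = w
        for i in subset:
            term *= mu[i]
        total += term
    return total

def brute_force_moment(weights, means, subset):
    """Same moment, computed by enumerating every x in {0,1}^n."""
    n = len(means[0])
    total = 0.0
    for x in product((0, 1), repeat=n):
        # Probability of x under the mixture of product distributions.
        p = 0.0
        for w, mu in zip(weights, means):
            q = w
            for i, xi in enumerate(x):
                q *= mu[i] if xi else 1.0 - mu[i]
            p += q
        # prod_{i in subset} x_i is 1 only when every chosen coordinate is 1.
        if all(x[i] for i in subset):
            total += p
    return total

# Toy mixture with k = 2 components over n = 3 bits (illustrative values).
weights = [0.3, 0.7]
means = [[0.1, 0.5, 0.9], [0.8, 0.2, 0.4]]
closed_form = multilinear_moment(weights, means, (0, 2))
enumerated = brute_force_moment(weights, means, (0, 2))
```

The closed form depends only on the coordinates in S, which is what makes these moments convenient to reason about.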
S-Isomap++: Multi Manifold Learning from Streaming Data
Mahapatra, Suchismit, Chandola, Varun
Manifold-learning-based methods have been widely used for non-linear dimensionality reduction (NLDR). However, in many practical settings, the need to process streaming data is a challenge for such methods, owing to the high computational complexity involved. Moreover, most methods operate under the assumption that the input data is sampled from a single manifold, embedded in a high dimensional space. We propose a method for streaming NLDR when the observed data is either sampled from multiple manifolds or irregularly sampled from a single manifold. We show that existing NLDR methods, such as Isomap, fail in such situations, primarily because they rely on smoothness and continuity of the underlying manifold, assumptions that are violated in the scenarios explored in this paper. However, the proposed algorithm is able to learn effectively in the presence of multiple, and potentially intersecting, manifolds, while allowing for the input data to arrive as a massive stream.
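One way the single-manifold assumption can break down is in Isomap's geodesic step, which approximates distances along the manifold by shortest paths in a k-nearest-neighbor graph. The sketch below covers only that step, on toy data, and is not the S-Isomap++ method. It assumes a symmetrized k-NN graph and uses Floyd-Warshall for shortest paths; the function names and points are hypothetical.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as a dense matrix of edge lengths."""
    n = len(points)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        by_distance = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in by_distance[:k]:
            d = math.dist(points[i], points[j])
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph

def geodesic_distances(graph):
    """All-pairs shortest paths (Floyd-Warshall) over the neighbor graph."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d

# Two well-separated groups of points standing in for two manifolds.
points = [(0, 0), (1, 0), (2, 0), (100, 0), (101, 0), (102, 0)]
geo = geodesic_distances(knn_graph(points, k=2))
```

Here `geo[0][2]` is finite but `geo[0][3]` is infinite: the neighbor graph splits into two components, so no single embedding can be computed from the full distance matrix without handling each component separately.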
A Beginner's Guide to Data Engineering -- Part I - Robert Chang - Medium
The more experienced I become as a data scientist, the more convinced I am that data engineering is one of the most critical and foundational skills in any data scientist's toolkit. I find this to be true for both evaluating project or job opportunities and scaling one's work on the job. In an earlier post, I pointed out that a data scientist's capability to convert data into value is largely correlated with the stage of her company's data infrastructure as well as how mature its data warehouse is. This means that a data scientist should know enough about data engineering to carefully evaluate how her skills are aligned with the stage and need of the company. Furthermore, many of the great data scientists I know are not only strong in data science but are also strategic in leveraging data engineering as an adjacent discipline to take on larger and more ambitious projects that are otherwise not reachable.
Google Offers Free Machine Learning Training Course
Artificial intelligence is gaining interest and ground, with major players investing heavily in areas as disparate as drone footage analysis, app development platforms and personal digital assistants. Google is investing in the educational side of AI, as well. The tech giant recently announced a free 15-hour machine learning training course, aimed at users of all experience levels (though knowledge of introductory algebra and some proficiency in programming basics and Python will come in handy). Machine Learning Crash Course (MLCC) is designed to help users develop skills in artificial intelligence and machine learning through free lessons, tutorials and hands-on exercises, the company announced. "We believe that the potential of machine learning is so vast that every technical person should learn machine learning fundamentals," wrote Barry Rosenberg, with the Google Engineering Education Team, on the Google Developers blog.
Artificial intelligence: ARC test focus goes beyond factoid questions
"Common sense" is a phrase everyone hears at one time or another, usually from an angry bystander who thinks you don't have any. "Humans use common sense to fill in the gaps of any question they are posed, delivering answers within an understood but non-explicit context," Swapna Krishna wrote in Engadget. Add a few years of developmental growth in the young child, and he or she acquires common sense, but AI still struggles with it. Calling out the challenge in AI research is Dr. Oren Etzioni, researcher and professor, who leads the Allen Institute for Artificial Intelligence, or AI2, in Seattle, Washington. To get at the fluidity that people have, their natural ability to move from one thing to the next, the programs need what every ten-year-old has in spades, he said, and that is called common sense: a set of facts, heuristics, observations, all the things that we can bring to the table, but the computer does not.
How ready is India for artificial intelligence?
Revolutions do not always follow a linear timeline. They can be sporadic and unpredictable. NITI Aayog's roadmap for a national Artificial Intelligence programme hopes to bring about one such revolution. The allocated Rs 3,073 crore will spearhead work on fifth-generation technology startups in areas like Artificial Intelligence, Machine Learning (ML), Internet of Things (IoT), 3D printing and Blockchain. The magnitude of the effort can be gauged by the near doubling of the fund for the programme. Atal Innovation Mission can give Rs 10 crore to start-ups that fit the criteria.
How to Rock Facebook Messenger & Chatbots for Social Success - Search Engine Journal
Alexa, why do I need a chatbot? By the year 2020, we will be having more conversations with bots than with our spouses, and 80 percent of brands will use chatbots for customer-centric interactions. If you have not added a chatbot strategy to your marketing plan, the time is now. By definition, a chatbot is a computer program designed to simulate a conversation with human users, especially over the Internet. Chatbots allow brands to create intimate connections and stronger relationships with their customers and audiences in general.
Tech Companies Try to Retrain the Workers They're Displacing
On January 16, a new course launched on the online learning platform Coursera with an unassuming name: The Google IT Support Professional Certificate. It promised to prepare beginners for entry-level jobs in IT in eight to 12 months. That day, it attracted the largest-ever group of first-time Coursera users, almost half of them people without college degrees. More than 18,000 people have enrolled in the $49-a-month program so far, 160 of whom have completed it. "Even as we were building it, even as it was about to launch, I never anticipated the success of it," says Natalie Van Kleef Conley, Google's product lead for the program.
Evaluating Conditional Cash Transfer Policies with Machine Learning Methods
This paper presents an out-of-sample prediction comparison between major machine learning models and the structural econometric model. Over the past decade, machine learning has established itself as a powerful tool in many prediction applications, but this approach is still not widely adopted in empirical economic studies. To evaluate the benefits of this approach, I use the most common machine learning algorithms, CART, C4.5, LASSO, random forest, and AdaBoost, to construct prediction models for a cash transfer experiment conducted by the Progresa program in Mexico, and I compare the prediction results with those of a previous structural econometric study. Two prediction tasks are performed in this paper: the out-of-sample forecast and the long-term within-sample simulation. For the out-of-sample forecast, both the mean absolute error and the root mean square error of the school attendance rates produced by all machine learning models are smaller than those produced by the structural model. Random forest and AdaBoost have the highest accuracy for the individual outcomes of all subgroups. For the long-term within-sample simulation, the structural model performs better than all of the machine learning models. The poor within-sample fit of the machine learning models results from the inaccuracy of the income and pregnancy prediction models. The results show that the machine learning models perform better than the structural model when there are ample data to learn from; however, when the data are limited, the structural model offers a more sensible prediction. The findings of this paper show promise for adopting machine learning in economic policy analyses in the era of big data.
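For readers unfamiliar with the two error measures used in the out-of-sample comparison, the following is a minimal sketch of how mean absolute error and root mean square error are computed. The attendance rates below are invented placeholders, not figures from the paper, and the function names are hypothetical.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_square_error(actual, predicted):
    """Square root of the average squared error; penalizes large misses more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Placeholder attendance rates for four subgroups (illustrative only).
observed = [0.90, 0.80, 0.70, 0.60]
model_a = [0.88, 0.83, 0.69, 0.64]  # stand-in for one model's predictions
model_b = [0.80, 0.90, 0.60, 0.70]  # stand-in for another model's predictions

mae_a = mean_absolute_error(observed, model_a)
mae_b = mean_absolute_error(observed, model_b)
rmse_a = root_mean_square_error(observed, model_a)
rmse_b = root_mean_square_error(observed, model_b)
```

A model is preferred on a given measure when its value is smaller. Reporting both, as the abstract does, guards against a model that looks good on average while making a few large errors.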