Re-educating Rita


IN JULY 2011 Sebastian Thrun, who among other things is a professor at Stanford, posted a short video on YouTube, announcing that he and a colleague, Peter Norvig, were making their "Introduction to Artificial Intelligence" course available free online. By the time the course began in October, 160,000 people in 190 countries had signed up for it. At the same time Andrew Ng, also a Stanford professor, made one of his courses, on machine learning, available free online, for which 100,000 people enrolled. Both courses ran for ten weeks. Such online courses, with short video lectures, discussion boards for students and systems to grade their coursework automatically, became known as Massive Open Online Courses (MOOCs).

Approximation Vector Machines for Large-scale Online Learning

Machine Learning

One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and guard against the risk of compromising performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximate and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions, including Hinge, smooth Hinge, and Logistic for the classification task, and $l_1$, $l_2$, and $\epsilon$-insensitive for the regression task. We conducted extensive experiments for the classification task in batch and online modes, and for the regression task in online mode, over several benchmark datasets. The results show that our proposed AVM achieves predictive performance comparable to current state-of-the-art methods while simultaneously achieving a significant computational speed-up, owing to its ability to maintain the model size.
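
The approximation step described in this abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a Gaussian kernel, a hinge-style update with a fixed learning rate, and a linear scan for the nearest stored point. The names `ApproxKernelLearner`, `delta`, `gamma`, and `lr` are hypothetical.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class ApproxKernelLearner:
    """Illustrative online kernel learner (hypothetical, not the paper's AVM).

    An incoming instance that lies within `delta` of a stored point is
    replaced by that point, so the model grows only when no neighbor is
    close enough.
    """

    def __init__(self, delta, gamma=1.0, lr=0.5):
        self.delta = delta    # distance threshold for approximation
        self.gamma = gamma    # kernel width (assumed Gaussian kernel)
        self.lr = lr          # fixed step size (an assumption)
        self.cores = []       # stored points
        self.alphas = []      # one coefficient per stored point

    def score(self, x):
        return sum(a * rbf(c, x, self.gamma)
                   for a, c in zip(self.alphas, self.cores))

    def _nearest_core(self, x):
        best_i, best_d = None, float("inf")
        for i, c in enumerate(self.cores):
            d = math.dist(c, x)
            if d < best_d:
                best_i, best_d = i, d
        return best_i, best_d

    def partial_fit(self, x, y):
        """One online step with label y in {-1, +1}."""
        if y * self.score(x) >= 1:
            return  # hinge loss is zero, nothing to update
        i, d = self._nearest_core(x)
        if i is not None and d < self.delta:
            # Approximate x by its neighbor: reuse the existing slot.
            self.alphas[i] += self.lr * y
        else:
            # No neighbor within the threshold: store x as a new point.
            self.cores.append(tuple(x))
            self.alphas.append(self.lr * y)
```

A larger `delta` gives a smaller model at the cost of a coarser approximation, which is the trade-off the abstract says the analysis characterizes.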

The Future of Jobs and Jobs Training


Machines are eating humans' jobs talents. And it's not just about jobs that are repetitive and low-skill. Automation, robotics, algorithms and artificial intelligence (AI) in recent times have shown they can do equal or sometimes even better work than humans who are dermatologists, insurance claims adjusters, lawyers, seismic testers in oil fields, sports journalists and financial reporters, crew members on guided-missile destroyers, hiring managers, psychological testers, retail salespeople, and border patrol agents. Moreover, there is growing anxiety that technology developments on the near horizon will crush the jobs of the millions who drive cars and trucks, analyze medical tests and data, perform middle management chores, dispense medicine, trade stocks and evaluate markets, fight on battlefields, perform government functions, and even replace those who program software – that is, the creators of algorithms. People will create the jobs of the future, not simply train for them, ...

Why We Need To Democratize Artificial Intelligence Education - TOPBOTS


When Sahil Singla joined the social impact startup Farmguide, he was shocked to discover that thousands of rural farmers in India commit suicide every year. When harvests go awry, desperate farmers are forced to borrow from microfinance loan sharks at crippling rates. Unable to pay back these predatory loans, victims kill themselves – often by grisly methods like swallowing pesticides – to escape the threats and violence of their ruthless debt collectors. Singla and his team are tackling this social injustice with one unexpected but powerful tool: deep learning. Recent growth of computational power and structured data sets has allowed deep learning algorithms to achieve extraordinary results.

Online Learning for Distribution-Free Prediction

Machine Learning

We develop an online learning method for prediction, which is important in problems with large and/or streaming data sets. We formulate the learning approach using a covariance-fitting methodology, and show that the resulting predictor has desirable computational and distribution-free properties: It is implemented online with a runtime that scales linearly in the number of samples; has a constant memory requirement; avoids local minima problems; and prunes away redundant feature dimensions without relying on restrictive assumptions on the data distribution. In conjunction with the split conformal approach, it also produces distribution-free prediction confidence intervals in a computationally efficient manner. The method is demonstrated on both real and synthetic datasets.
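
The split conformal approach mentioned in this abstract can be sketched as follows. The sketch swaps in an ordinary least-squares line for the paper's covariance-fitting predictor, purely for illustration. The names `fit_line` and `split_conformal` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line on 1-D inputs (a stand-in predictor, an assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def split_conformal(train, calib, alpha, fit=fit_line):
    """Return a function mapping x to a (lower, upper) prediction interval.

    `train` and `calib` are disjoint lists of (x, y) pairs. The predictor is
    fitted on `train` only; absolute residuals on `calib` set the width.
    """
    predict = fit([x for x, _ in train], [y for _, y in train])
    residuals = sorted(abs(y - predict(x)) for x, y in calib)
    n = len(residuals)
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    # Too few calibration points for this alpha: the interval is unbounded.
    q = residuals[k - 1] if k <= n else math.inf

    def interval(x):
        center = predict(x)
        return center - q, center + q

    return interval
```

Under exchangeability of the calibration and test points, an interval built this way covers the true response with probability at least `1 - alpha`, whatever predictor is plugged in.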

JAG: A Crowdsourcing Framework for Joint Assessment and Peer Grading

AAAI Conferences

Generation and evaluation of crowdsourced content is commonly treated as two separate processes, performed at different times and by two distinct groups of people: content creators and content assessors. As a result, most crowdsourcing tasks follow this template: one group of workers generates content and another group of workers evaluates it. In an educational setting, for example, content creators are traditionally students who submit open-response answers to assignments (e.g., a short answer, a circuit diagram, or a formula) and content assessors are instructors who grade these submissions. Despite the considerable success of peer-grading in massive open online courses (MOOCs), test-taking and grading are still treated as two distinct tasks which typically occur at different times and require the additional overhead of grader training and incentivization. Inspired by this problem in the context of education, we propose a general crowdsourcing framework that fuses open-response test-taking (content generation) and assessment into a single, streamlined process that appears to students in the form of an explicit test, but where everyone also acts as an implicit grader. The advantages offered by our framework include: a common incentive mechanism for both the creation and evaluation of content, and a probabilistic model that jointly models the processes of contribution and evaluation, facilitating efficient estimation of the quality of the contributions and the competency of the contributors. We demonstrate the effectiveness and limits of our framework via simulations and a real-world user study.

A Framework of Online Learning with Imbalanced Streaming Data

AAAI Conferences

A challenge for mining large-scale streaming data, overlooked by most existing studies on online learning, is the skewed distribution of examples over different classes. Many previous works have considered cost-sensitive approaches in an online setting for streaming data, where fixed costs are assigned to different classes, or ad-hoc costs are adapted based on the distribution of data received so far. However, these approaches do not necessarily achieve optimal performance in terms of the measures suited for imbalanced data, such as F-measure, area under the ROC curve (AUROC), and area under the precision-recall curve (AUPRC). This work proposes a general framework for online learning with imbalanced streaming data, where examples arrive sequentially and models are updated accordingly on-the-fly. By simultaneously learning multiple classifiers with different cost vectors, the proposed method can be adopted for different target measures for imbalanced data, including F-measure, AUROC and AUPRC. Moreover, we present a rigorous theoretical justification of the proposed framework for F-measure maximization. Our empirical studies demonstrate the competitive, if not better, performance of the proposed method compared to previous cost-sensitive and resampling-based online learning algorithms and those that are designed for optimizing certain measures.
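
The idea of learning several classifiers with different cost vectors at once can be sketched as below. The sketch assumes cost-weighted perceptrons and a greedy choice of the model with the best running F-measure. Both are illustrative simplifications rather than the paper's update or selection rule. The names `CostSensitivePerceptron` and `MultiCostLearner` are hypothetical.

```python
class CostSensitivePerceptron:
    """Perceptron whose mistake-driven step is scaled by a per-class cost."""

    def __init__(self, dim, pos_cost, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.pos_cost = pos_cost        # cost of missing a positive
        self.neg_cost = 1.0 - pos_cost  # cost of a false alarm
        self.lr = lr
        self.tp = self.fp = self.fn = 0  # running counts for the F-measure

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if s > 0 else -1

    def f_measure(self):
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0

    def update(self, x, y):
        """Score the current prediction, then train on (x, y), y in {-1, +1}."""
        pred = self.predict(x)
        if pred == 1 and y == 1:
            self.tp += 1
        elif pred == 1 and y == -1:
            self.fp += 1
        elif pred == -1 and y == 1:
            self.fn += 1
        if pred != y:
            step = self.lr * (self.pos_cost if y == 1 else self.neg_cost)
            self.w = [wi + step * y * xi for wi, xi in zip(self.w, x)]
            self.b += step * y


class MultiCostLearner:
    """Trains one model per cost setting; predicts with the best F so far."""

    def __init__(self, dim, pos_costs=(0.5, 0.7, 0.9)):
        self.models = [CostSensitivePerceptron(dim, c) for c in pos_costs]

    def predict(self, x):
        # Greedy selection by running F-measure (an assumption of this sketch).
        best = max(self.models, key=lambda m: m.f_measure())
        return best.predict(x)

    def update(self, x, y):
        for m in self.models:
            m.update(x, y)
```

Replacing `f_measure` with another running statistic would retarget the same pool of classifiers to a different measure, which is the flexibility the abstract points to.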

Why Virtual Classes Can Be Better Than Real Ones - Issue 29: Scaling - Nautilus

AITopics Original Links

I teach one of the world's most popular MOOCs (massive open online courses), "Learning How to Learn," with neuroscientist Terrence J. Sejnowski, the Francis Crick Professor at the Salk Institute for Biological Studies. The course draws on neuroscience, cognitive psychology, and education to explain how our brains absorb and process information, so we can all be better students. Since it launched on the website Coursera in August of 2014, nearly 1 million students from over 200 countries have enrolled in our class. We've had cardiologists, engineers, lawyers, linguists, 12-year-olds, and war refugees in Sudan take the course. We get emails like this one that recently arrived: "I'll keep it short. I've recently completed your MOOC and it has already changed my life in ways you cannot imagine.

Batch Policy Gradient Methods for Improving Neural Conversation Models

Machine Learning

We study reinforcement learning of chatbots with recurrent neural network architectures when the rewards are noisy and expensive to obtain. For instance, a chatbot used in automated customer service support can be scored by quality assurance agents, but this process can be expensive, time consuming and noisy. Previous reinforcement learning work for natural language processing uses on-policy updates and/or is designed for on-line learning settings. We demonstrate empirically that such strategies are not appropriate for this setting and develop an off-policy batch policy gradient method (BPG). We demonstrate the efficacy of our method via a series of synthetic experiments and an Amazon Mechanical Turk experiment on a restaurant recommendations dataset.

IZA World of Labor - Who owns the robots rules the world


The 2012 publication Race against the Machine makes the case that the digitalization of work activities is proceeding so rapidly as to cause dislocations in the job market beyond anything previously experienced [1]. Unlike past mechanization/automation, which affected lower-skill blue-collar and white-collar work, today's information technology affects workers high in the education and skill distribution. Machines can substitute for brains as well as brawn. On one estimate, about 47% of total US employment is at risk of computerization [2]. If you doubt whether a robot or some other machine equipped with digital intelligence connected to the internet could outdo you or me in our work in the foreseeable future, consider news reports about an IBM program to "create" new food dishes (chefs beware), the battle between anesthesiologists and computer programs/robots that do their job much cheaper, and the coming version of Watson ("twice as powerful as the original") based on computers connected over the internet via IBM's Cloud [3].