Goto

Collaborating Authors

 Education


Mathematical Language Processing: Automatic Grading and Feedback for Open Response Mathematical Questions

arXiv.org Machine Learning

While computer and communication technologies have provided effective means to scale up many aspects of education, the submission and grading of assessments such as homework assignments and tests remains a weak link. In this paper, we study the problem of automatically grading the kinds of open response mathematical questions that figure prominently in STEM (science, technology, engineering, and mathematics) courses. Our data-driven framework for mathematical language processing (MLP) leverages solution data from a large number of learners to evaluate the correctness of their solutions, assign partial-credit scores, and provide feedback to each learner on the likely locations of any errors. MLP takes inspiration from the success of natural language processing for text data and comprises three main steps. First, we convert each solution to an open response mathematical question into a series of numerical features. Second, we cluster the features from several solutions to uncover the structures of correct, partially correct, and incorrect solutions. We develop two different clustering approaches, one that leverages generic clustering algorithms and one based on Bayesian nonparametrics. Third, we automatically grade the remaining (potentially large number of) solutions based on their assigned cluster and one instructor-provided grade per cluster. As a bonus, we can track the cluster assignment of each step of a multistep solution and determine when it departs from a cluster of correct solutions, which enables us to indicate the likely locations of errors to learners. We test and validate MLP on real-world MOOC data to demonstrate how it can substantially reduce the human effort required in large-scale educational platforms.


Submodular relaxation for inference in Markov random fields

arXiv.org Machine Learning

The problem of inference in a Markov random field (MRF) arises in many applied domains, e.g. in machine learning, computer vision, natural language processing, etc. In this paper we focus on one important type of inference: maximum a posteriori (MAP) inference, often referred to as MRF energy minimization. Inference of this type is a combinatorial optimization problem, i.e. an optimization problem with the finite domain. The most studied case of MRF energy minimization is the situation when the energy can be represented as a sum of terms (potentials) that depend on only one or two variables each (unary and pairwise potentials). In this setting the energy is said to be defined by a graph where the nodes correspond to the variables and the edges to the pairwise potentials. Minimization of energies defined on graphs in known to be NPhard in general [8] but can be done exactly in polynomial time in a number of special cases, e.g. if the graph defining the energy is acyclic [36] or if the energy is submodular in standard [28] or multi-label sense [10]. One way to go beyond pairwise potentials is to add higher-order summands to the energy. For example, Kohli et al. [23] and Ladickรฝ et al. [32] use high-order potentials based on superpixels (image regions) for semantic image segmentation; Delong et al. [11] use label cost potentials for geometric model fitting tasks. To be tractable, high-order potentials need to have a compact representation.


SPRITE: A Response Model For Multiple Choice Testing

arXiv.org Machine Learning

Item response theory (IRT) models for categorical response data are widely used in the analysis of educational data, computerized adaptive testing, and psychological surveys. However, most IRT models rely on both the assumption that categories are strictly ordered and the assumption that this ordering is known a priori. These assumptions are impractical in many real-world scenarios, such as multiple-choice exams where the levels of incorrectness for the distractor categories are often unknown. While a number of results exist on IRT models for unordered categorical data, they tend to have restrictive modeling assumptions that lead to poor data fitting performance in practice. Furthermore, existing unordered categorical models have parameters that are difficult to interpret. In this work, we propose a novel methodology for unordered categorical IRT that we call SPRITE (short for stochastic polytomous response item model) that: (i) analyzes both ordered and unordered categories, (ii) offers interpretable outputs, and (iii) provides improved data fitting compared to existing models. We compare SPRITE to existing item response models and demonstrate its efficacy on both synthetic and real-world educational datasets.


$\ell_0$ Sparsifying Transform Learning with Efficient Optimal Updates and Convergence Guarantees

arXiv.org Machine Learning

Many applications in signal processing benefit from the sparsity of signals in a certain transform domain or dictionary. Synthesis sparsifying dictionaries that are directly adapted to data have been popular in applications such as image denoising, inpainting, and medical image reconstruction. In this work, we focus instead on the sparsifying transform model, and study the learning of well-conditioned square sparsifying transforms. The proposed algorithms alternate between a $\ell_0$ "norm"-based sparse coding step, and a non-convex transform update step. We derive the exact analytical solution for each of these steps. The proposed solution for the transform update step achieves the global minimum in that step, and also provides speedups over iterative solutions involving conjugate gradients. We establish that our alternating algorithms are globally convergent to the set of local minimizers of the non-convex transform learning problems. In practice, the algorithms are insensitive to initialization. We present results illustrating the promising performance and significant speed-ups of transform learning over synthesis K-SVD in image denoising.


Survey schemes for stochastic gradient descent with applications to M-estimation

arXiv.org Machine Learning

In certain situations that shall be undoubtedly more and more common in the Big Data era, the datasets available are so massive that computing statistics over the full sample is hardly feasible, if not unfeasible. A natural approach in this context consists in using survey schemes and substituting the "full data" statistics with their counterparts based on the resulting random samples, of manageable size. It is the main purpose of this paper to investigate the impact of survey sampling with unequal inclusion probabilities on stochastic gradient descent-based M-estimation methods in large-scale statistical and machine-learning problems. Precisely, we prove that, in presence of some a priori information, one may significantly increase asymptotic accuracy when choosing appropriate first order inclusion probabilities, without affecting complexity. These striking results are described here by limit theorems and are also illustrated by numerical experiments.


Inverse Renormalization Group Transformation in Bayesian Image Segmentations

arXiv.org Machine Learning

Graduate School of Informatics, Kyoto University, 36-1 Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501 Japan A new Bayesian image segmentation algorithm is proposed by combining a loopy belief propagation with an inverse real space renormalization group transformation to reduce the computational time. In results of our experiment, we observe that the proposed method can reduce the computational time to less than one-tenth of that taken by conventional Bayesian approaches. Bayesian segmentation modeling based on Markov random fields (MRF's) is one of the interesting research topics We consider an image as defined on a set of pixels arranged on a square grid graph (V,E). HereV { i i 1, 2,ยทยทยท, V } denotes the set of all the pixels andE is the set of all the nearest-neighbour pairs of pixels{ i,j} . The total numbers of elements in the setsV and E are denoted by V and E, respectively.


The Learnability of Unknown Quantum Measurements

arXiv.org Machine Learning

Quantum machine learning has received significant attention in recent years, and promising progress has been made in the development of quantum algorithms to speed up traditional machine learning tasks. In this work, however, we focus on investigating the information-theoretic upper bounds of sample complexity - how many training samples are sufficient to predict the future behaviour of an unknown target function. This kind of problem is, arguably, one of the most fundamental problems in statistical learning theory and the bounds for practical settings can be completely characterised by a simple measure of complexity. Our main result in the paper is that, for learning an unknown quantum measurement, the upper bound, given by the fat-shattering dimension, is linearly proportional to the dimension of the underlying Hilbert space. Learning an unknown quantum state becomes a dual problem to ours, and as a byproduct, we can recover Aaronson's famous result [Proc. R. Soc. A 463:3089-3144 (2007)] solely using a classical machine learning technique. In addition, other famous complexity measures like covering numbers and Rademacher complexities are derived explicitly. We are able to connect measures of sample complexity with various areas in quantum information science, e.g. quantum state/measurement tomography, quantum state discrimination and quantum random access codes, which may be of independent interest. Lastly, with the assistance of general Bloch-sphere representation, we show that learning quantum measurements/states can be mathematically formulated as a neural network. Consequently, classical ML algorithms can be applied to efficiently accomplish the two quantum learning tasks.


ACTIVE-ating Artificial Intelligence: Integrating Active Learning in an Introductory Course

AI Magazine

By restructuring the course into a format that was roughly half lecture and half small-group problem-solving, I was able to significantly increase student engagement, their understanding and retention of difficult concepts, and my own enjoyment in teaching the class.


Power to the People: The Role of Humans in Interactive Machine Learning

AI Magazine

Intelligent systems that learn interactively from their end-users are quickly becoming widespread. Until recently, this progress has been fueled mostly by advances in machine learning; however, more and more researchers are realizing the importance of studying users of these systems. In this article we promote this approach and demonstrate how it can result in better user experiences and more effective learning systems. We present a number of case studies that characterize the impact of interactivity, demonstrate ways in which some existing systems fail to account for the user, and explore new ways for learning systems to interact with their users. We argue that the design process for interactive machine learning systems should involve users at all stages: explorations that reveal human interaction patterns and inspire novel interaction methods, as well as refinement stages to tune details of the interface and choose among alternatives. After giving a glimpse of the progress that has been made so far, we discuss the challenges that we face in moving the field forward.


ACTIVE-ating Artificial Intelligence: Integrating Active Learning in an Introductory Course

AI Magazine

his column describes my experience with using a new classroom space (the ACTIVE Center), which was designed to facilitate group-based active learning and problem solving, to teach an introductory artificial intelligence course. By restructuring the course into a format that was roughly half lecture and half small-group problem-solving, I was able to significantly increase student engagement, their understanding and retention of difficult concepts, and my own enjoyment in teaching the class.