Education
Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of $O(1/\sqrt{n})$. We consider and analyze two algorithms that achieve a rate of $O(1/n)$ for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running time complexity as stochastic gradient descent. For these algorithms, we provide a non-asymptotic analysis of the generalization error (in expectation, and also in high probability for least-squares), and run extensive experiments showing that they often outperform existing approaches.
Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
Reliance on computationally expensive algorithms for inference has been limiting the use of Bayesian nonparametric models in large scale applications. To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models. Instead of following the conventional paradigm -- random initialization plus iterative update, we take an progressive approach. Starting with a given prior, our method recursively transforms it into an approximate posterior through sequential variational approximation. In this process, new components will be incorporated on the fly when needed. The algorithm can reliably estimate a DP mixture model in one pass, making it particularly suited for applications with massive data. Experiments on both synthetic data and real datasets demonstrate remarkable improvement on efficiency -- orders of magnitude speed-up compared to the state-of-the-art.
More data speeds up training time in learning halfspaces over sparse vectors
Daniely, Amit, Linial, Nati, Shalev-Shwartz, Shai
The increased availability of data in recent years led several authors to ask whether it is possible to use data as a {\em computational} resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a {\em natural supervised learning problem} --- we consider agnostic PAC learning of halfspaces over $3$-sparse vectors in $\{-1,1,0\}^n$. This class is inefficiently learnable using $O\left(n/\epsilon^2\right)$ examples. Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random $\mathrm{3CNF}$ formulas is hard, efficiently learning this class using $O\left(n/\epsilon^2\right)$ examples is impossible. We further show that under stronger hardness assumptions, even $O\left(n^{1.499}/\epsilon^2\right)$ examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using $\tilde{\Omega}\left(n^2/\epsilon^2\right)$ examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem.
Bayesian optimization explains human active search
Many real-world problems have complicated objective functions. To optimize such functions, humans utilize sophisticated sequential decision-making strategies. Many optimization algorithms have also been developed for this same purpose, but how do they compare to humans in terms of both performance and behavior? We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function. Subjects click on a blank screen and are shown the ordinate of the function at each clicked abscissa location. Their task is to find the functionโs maximum in as few clicks as possible. Subjects win if they get close enough to the maximum location. Analysis over 23 non-maths undergraduates, optimizing 25 functions from different families, shows that humans outperform 24 well-known optimization algorithms. Bayesian Optimization based on Gaussian Processes, which exploit all the x values tried and all the f(x) values obtained so far to pick the next x, predicts human performance and searched locations better. In 6 follow-up controlled experiments over 76 subjects, covering interpolation, extrapolation, and optimization tasks, we further confirm that Gaussian Processes provide a general and unified theoretical account to explain passive and active function learning and search in humans.
PSO-MISMO Modeling Strategy for Multi-Step-Ahead Time Series Prediction
Bao, Yukun, Xiong, Tao, Hu, Zhongyi
Multi-step-ahead time series prediction is one of the most challenging research topics in the field of time series modeling and prediction, and is continually under research. Recently, the multiple-input several multiple-outputs (MISMO) modeling strategy has been proposed as a promising alternative for multi-step-ahead time series prediction, exhibiting advantages compared with the two currently dominating strategies, the iterated and the direct strategies. Built on the established MISMO strategy, this study proposes a particle swarm optimization (PSO)-based MISMO modeling strategy, which is capable of determining the number of sub-models in a self-adaptive mode, with varying prediction horizons. Rather than deriving crisp divides with equal-size s prediction horizons from the established MISMO, the proposed PSO-MISMO strategy, implemented with neural networks, employs a heuristic to create flexible divides with varying sizes of prediction horizons and to generate corresponding sub-models, providing considerable flexibility in model construction, which has been validated with simulated and real datasets.
Analysis of Optimization Techniques to Improve User Response Time of Web Applications and Their Implementation for MOODLE
Analysis of seven optimization techniques grouped under three categories (hardware, back-end, and front-end) is done to study the reduction in average user response time for Modular Object Oriented Dynamic Learning Environment (Moodle), a Learning Management System which is scripted in PHP5, runs on Apache web server and utilizes MySQL database software. Before the implementation of these techniques, performance analysis of Moodle is performed for varying number of concurrent users. The results obtained for each optimization technique are then reported in a tabular format. The maximum reduction in end user response time was achieved for hardware optimization which requires Moodle server and database to be installed on solid state disk.
Bounded Recursive Self-Improvement
Nivel, E., Thรณrisson, K. R., Steunebrink, B. R., Dindo, H., Pezzulo, G., Rodriguez, M., Hernandez, C., Ognibene, D., Schmidhuber, J., Sanz, R., Helgason, H. P., Chella, A., Jonsson, G. K.
We have designed a machine that becomes increasingly better at behaving in underspecified circumstances, in a goal-directed way, on the job, by modeling itself and its environment as experience accumulates. Based on principles of autocatalysis, endogeny, and reflectivity, the work provides an architectural blueprint for constructing systems with high levels of operational autonomy in underspecified circumstances, starting from a small seed. Through value-driven dynamic priority scheduling controlling the parallel execution of a vast number of reasoning threads, the system achieves recursive self-improvement after it leaves the lab, within the boundaries imposed by its designers. A prototype system has been implemented and demonstrated to learn a complex real-world task, real-time multimodal dialogue with humans, by on-line observation. Our work presents solutions to several challenges that must be solved for achieving artificial general intelligence.
Time-varying Learning and Content Analytics via Sparse Factor Analysis
Lan, Andrew S., Studer, Christoph, Baraniuk, Richard G.
We propose SPARFA-Trace, a new machine learning-based framework for time-varying learning and content analytics for education applications. We develop a novel message passing-based, blind, approximate Kalman filter for sparse factor analysis (SPARFA), that jointly (i) traces learner concept knowledge over time, (ii) analyzes learner concept knowledge state transitions (induced by interacting with learning resources, such as textbook sections, lecture videos, etc, or the forgetting effect), and (iii) estimates the content organization and intrinsic difficulty of the assessment questions. These quantities are estimated solely from binary-valued (correct/incorrect) graded learner response data and a summary of the specific actions each learner performs (e.g., answering a question or studying a learning resource) at each time instance. Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learner's concept knowledge evolution over time, as well as analyzing the quality and content organization of learning resources, the question-concept associations, and the question intrinsic difficulties. Moreover, we show that SPARFA-Trace achieves comparable or better performance in predicting unobserved learner responses than existing collaborative filtering and knowledge tracing approaches for personalized education.
Cross-Domain Sparse Coding
Sparse coding has shown its power as an effective data representation method. However, up to now, all the sparse coding approaches are limited within the single domain learning problem. In this paper, we extend the sparse coding to cross domain learning problem, which tries to learn from a source domain to a target domain with significant different distribution. We impose the Maximum Mean Discrepancy (MMD) criterion to reduce the cross-domain distribution difference of sparse codes, and also regularize the sparse codes by the class labels of the samples from both domains to increase the discriminative ability. The encouraging experiment results of the proposed cross-domain sparse coding algorithm on two challenging tasks --- image classification of photograph and oil painting domains, and multiple user spam detection --- show the advantage of the proposed method over other cross-domain data representation methods.
Dealing with the Fuzziness of Human Reasoning
Voskoglou, Michael Gr., Subbotin, Igor Ya.
Reasoning, the most important human brain operation, is characterized by a degree of fuzziness. In the present paper we construct a fuzzy model for the reasoning process giving through the calculation of probabilities and possibilities of all possible individuals' profiles a quantitative/qualitative view of their behaviour during the above process. In this model the main stages of human reasoning (imagination, visualisation and generation of ideas) are represented as fuzzy subsets of a set of linguistic labels characterizing a person's performance in each stage. Further, using the coordinates of the centre of gravity of the graph of the corresponding membership function we develop a method of measuring the reasoning skills of a group of individuals. We also present a number of classroom experiments with student groups' of T. E. I. of Patras, Greece, illustrating our results in practice.