Education
Learning Step Size Controllers for Robust Neural Network Training
Daniel, Christian (TU Darmstadt) | Taylor, Jonathan (Microsoft Research) | Nowozin, Sebastian (Microsoft Research)
This paper investigates algorithms that automatically adapt the learning rate of neural networks (NNs). Starting with stochastic gradient descent, a large variety of learning methods has been proposed for the NN setting. However, these methods are usually sensitive to the initial learning rate, which has to be chosen by the experimenter. We investigate several features and show how an adaptive controller can adjust the learning rate without prior knowledge of the learning problem at hand.
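Here is a minimal sketch of a step-size controller. It uses a hand-written multiplicative rule (the classic "bold driver" heuristic), not the learned controller studied in the paper. After a trial step that lowers the loss, the rule keeps the step and grows the learning rate. When the loss rises, it discards the step and shrinks the rate. The function name `bold_driver_sgd`, the factors `up=1.1` and `down=0.5`, and the quadratic test problem are illustrative assumptions.

```python
def bold_driver_sgd(grad, loss, w0, lr0, steps=100, up=1.1, down=0.5):
    """Gradient descent with a simple multiplicative step-size controller.

    After each trial step: if the loss went down, keep the step and grow
    the learning rate by `up`; otherwise discard it and shrink the rate by `down`.
    """
    w, lr = w0, lr0
    current = loss(w)
    for _ in range(steps):
        candidate = w - lr * grad(w)
        candidate_loss = loss(candidate)
        if candidate_loss < current:
            w, current = candidate, candidate_loss
            lr *= up
        else:
            lr *= down
    return w, lr


def loss(w):
    return 5.0 * w * w  # f(w) = 0.5 * a * w**2 with curvature a = 10


def grad(w):
    return 10.0 * w


# lr0 = 1.0 is far too large: a fixed-rate step maps w to -9 * w and diverges.
w_final, lr_final = bold_driver_sgd(grad, loss, w0=1.0, lr0=1.0)
print(loss(w_final), lr_final)
```

The controller backs off from the bad initial rate within a few steps. After that, it hovers just below the largest stable rate for this problem (0.2), so the quality of the initial guess matters much less.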
Knowledge Transfer with Interactive Learning of Semantic Relationships
Choi, Jonghyun (University of Maryland, College Park and Comcast Labs) | Hwang, Sung Ju (Ulsan National Institute of Science and Technology) | Sigal, Leonid (Disney Research Pittsburgh) | Davis, Larry S. (University of Maryland, College Park)
We propose a novel learning framework for object categorization with interactive semantic feedback. In this framework, a discriminative categorization model improves through iterative, human-guided semantic feedback. Specifically, the model identifies the relational semantic queries that are most helpful for discriminatively refining it. The user's feedback on whether each relationship is semantically valid is incorporated back into the model in the form of regularization, and the process iterates. We validate the proposed model in a few-shot multi-class classification scenario. There, we measure classification performance on a set of ‘target’ classes with few training instances by leveraging and transferring knowledge from ‘anchor’ classes that contain a larger set of labeled instances.
Column Sampling Based Discrete Supervised Hashing
Kang, Wang-Cheng (Nanjing University) | Li, Wu-Jun (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
By leveraging semantic (label) information, supervised hashing has demonstrated better accuracy than unsupervised hashing in many real applications. Learning hashing codes is essentially a discrete optimization problem, which is hard to solve. For this reason, most existing supervised hashing methods solve a relaxed continuous optimization problem obtained by dropping the discrete constraints. However, these methods typically suffer from poor performance due to the errors caused by the relaxation. Other methods try to solve the discrete optimization problem directly, but they are typically time-consuming and do not scale. In this paper, we propose a novel method, called column sampling based discrete supervised hashing (COSDISH), to directly learn the discrete hashing code from semantic information. COSDISH is an iterative method. In each iteration, several columns are sampled from the semantic similarity matrix, and the hashing code is then decomposed into two parts that can be alternately optimized in a discrete way. Theoretical analysis shows that the learning (optimization) algorithm of COSDISH has a constant-approximation bound in each step of the alternating optimization procedure. Empirical results on datasets with semantic labels show that COSDISH can outperform state-of-the-art methods in real applications such as image retrieval.
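The abstract does not state the objective. One formulation used for discrete supervised hashing, and to our understanding the one COSDISH builds on, is $\min_{B\in\{-1,1\}^{n\times q}} \|qS - BB^\top\|_F^2$. Here $S_{ij}=1$ for semantically similar pairs and $-1$ otherwise, and $q$ is the code length. Treat that objective as an assumption. The sketch below only shows how sampling a subset $\Omega$ of the columns of $S$ restricts the loss to an $n \times |\Omega|$ block. It does not implement the alternating discrete optimization of the two parts of $B$. The function name `sampled_hashing_loss` and the toy labels are hypothetical.

```python
import numpy as np


def sampled_hashing_loss(S, B, cols):
    """Squared Frobenius loss ||q*S[:, cols] - B @ B[cols].T||^2 on sampled columns.

    S: (n, n) similarity matrix with entries +1 (similar) / -1 (dissimilar).
    B: (n, q) binary codes with entries in {-1, +1}.
    cols: indices of the sampled columns (the set Omega).
    """
    q = B.shape[1]
    residual = q * S[:, cols] - B @ B[cols].T
    return float(np.sum(residual ** 2))


rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 0, 1])
n, q = len(labels), 4

# Hypothetical label-derived similarity: +1 if two items share a label, else -1.
S = np.where(labels[:, None] == labels[None, :], 1, -1)

# Sample a few columns instead of using all n of them.
cols = rng.choice(n, size=3, replace=False)

# Codes that agree perfectly with S: one q-bit code per class.
B_perfect = np.where(labels[:, None] == 0, 1, -1) * np.ones((1, q), dtype=int)
# Random codes for comparison.
B_random = rng.choice([-1, 1], size=(n, q))

print(sampled_hashing_loss(S, B_perfect, cols))  # 0.0
print(sampled_hashing_loss(S, B_random, cols))
```

Working on $|\Omega| \ll n$ columns keeps each iteration's cost linear in $n$ rather than quadratic. This is the motivation the abstract gives for column sampling.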
MOOCs Meet Measurement Theory: A Topic-Modelling Approach
He, Jiazhen (The University of Melbourne) | Rubinstein, Benjamin I. P. (The University of Melbourne) | Bailey, James (The University of Melbourne) | Zhang, Rui (The University of Melbourne) | Milligan, Sandra (The University of Melbourne) | Chan, Jeffrey (RMIT University)
This paper adapts topic models to the psychometric testing of MOOC students based on their online forum postings. Measurement theory from education and psychology provides statistical models for quantifying a person's attainment of intangible attributes such as attitudes, abilities or intelligence. Such models infer latent skill levels by relating them to individuals' observed responses on a series of items such as quiz questions. The set of items can be used to measure a latent skill if individuals' responses on them conform to a Guttman scale. Such well-scaled items differentiate between individuals, and the inferred levels span the entire range from the most basic to the most advanced. In practice, education researchers manually devise items (quiz questions) while optimising for conformance to a Guttman scale. Because this process is costly and requires expertise, psychometric testing has found limited use in everyday teaching. We aim to develop usable measurement models for highly instrumented MOOC delivery platforms by using participation in automatically extracted online forum topics as items. The challenge is to formalise the Guttman scale educational constraint and incorporate it into topic models. To favour topics that automatically conform to a Guttman scale, we introduce a novel regularisation into non-negative matrix factorisation-based topic modelling. We demonstrate the suitability of our approach both with quantitative experiments on three Coursera MOOCs and with a qualitative survey of topic interpretability on two MOOCs, based on interviews with domain experts.
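Conformance to a Guttman scale can be checked mechanically. Order items from most to least often endorsed. In a perfect scale, a person with total score r responds positively to exactly the r easiest items. The sketch below counts every cell that deviates from that ideal pattern, which is one common error-counting convention among several. It reports the coefficient of reproducibility, 1 − errors / (people × items). Reading rows as students and columns as forum topics they did (1) or did not (0) post in is our illustration of the abstract's setup. It is not the paper's regulariser, and the function name is hypothetical.

```python
import numpy as np


def guttman_reproducibility(X):
    """Coefficient of reproducibility for a 0/1 people-by-items matrix X.

    Items are ordered from most to least endorsed; each person's ideal
    pattern is 1s on their `score` easiest items and 0s elsewhere.
    Every cell that differs from the ideal pattern counts as one error.
    """
    X = np.asarray(X, dtype=int)
    n_people, n_items = X.shape
    order = np.argsort(-X.sum(axis=0), kind="stable")  # easiest item first
    X_sorted = X[:, order]
    scores = X_sorted.sum(axis=1)
    ideal = (np.arange(n_items)[None, :] < scores[:, None]).astype(int)
    errors = int(np.sum(X_sorted != ideal))
    return 1.0 - errors / (n_people * n_items), errors


# Rows: students; columns: hypothetical forum topics (1 = posted in the topic).
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
noisy = [[1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 0, 0]]

print(guttman_reproducibility(perfect))  # (1.0, 0)
print(guttman_reproducibility(noisy))
```

In the noisy matrix, the second student skips the easiest topic but posts in a harder one. That student contributes two errors, which lowers the coefficient below 1.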
Towards Domain Adaptive Vehicle Detection in Satellite Image by Supervised Super-Resolution Transfer
Cao, Liujuan (Xiamen University) | Ji, Rongrong (Xiamen University) | Wang, Cheng (Xiamen University) | Li, Jonathan (Xiamen University)
Vehicle detection in satellite images has attracted extensive research attention, driven by various emerging applications. However, detector performance is significantly degraded by the low resolution of satellite images as well as by the limited training data. In this paper, a robust domain-adaptive vehicle detection framework is proposed to bypass both problems. Our innovation is to transfer detector learning to the high-resolution aerial image domain, where rich supervision exists and robust detectors can be trained. To this end, we first propose a super-resolution algorithm using coupled dictionary learning to "augment" the satellite image region being tested into the aerial domain. Notably, a linear detection loss is embedded into the dictionary learning, which makes the augmented region sensitive to the subsequent detector training. Second, to cope with domain changes, we propose instance-wise detection using Exemplar Support Vector Machines (E-SVMs). This approach handles intra-class and imaging variations such as scale, rotation, and occlusion. With comprehensive experiments on large-scale satellite image collections, we demonstrate that the proposed framework can significantly boost detection accuracy over several state-of-the-art methods.
Scalable Training of Markov Logic Networks Using Approximate Counting
Sarkhel, Somdeb (The University of Texas at Dallas) | Venugopal, Deepak (The University of Memphis) | Pham, Tuan Anh (The University of Texas at Dallas) | Singla, Parag (Indian Institute of Technology Delhi) | Gogate, Vibhav (The University of Texas at Dallas)
In this paper, we propose principled weight learning algorithms for Markov logic networks that can easily scale to much larger datasets and application domains than existing algorithms. The main idea in our approach is to use approximate counting techniques to substantially reduce the complexity of the most computationally intensive sub-step in weight learning. That sub-step computes the number of groundings of a first-order formula that evaluate to true, given a truth assignment to all the random variables. We derive theoretical bounds on the performance of our new algorithms. Experiments show that they are orders of magnitude faster than existing approaches while achieving the same or better accuracy.
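The counting sub-step is easy to state for a concrete formula. Take the hypothetical formula Smokes(x) ∧ Friends(x, y) ⇒ Smokes(y) over n people. Exact counting enumerates all n² groundings, and a formula with k logical variables needs n^k. The sketch below contrasts that with a plain uniform-sampling estimator that scales the sampled fraction of true groundings by n². This estimator is only a stand-in to show why approximate counting helps. It is not the approximate counting technique the paper uses, and the predicates, sizes, and probabilities are made up.

```python
import random


def grounding_true(smokes, friends, x, y):
    # Smokes(x) ∧ Friends(x, y) ⇒ Smokes(y), written as ¬(A ∧ B) ∨ C
    return (not (smokes[x] and friends[x][y])) or smokes[y]


def exact_true_count(smokes, friends):
    """Check every one of the n * n groundings."""
    n = len(smokes)
    return sum(grounding_true(smokes, friends, x, y) for x in range(n) for y in range(n))


def sampled_true_count(smokes, friends, num_samples, rng):
    """Estimate the count from uniformly sampled groundings."""
    n = len(smokes)
    hits = sum(
        grounding_true(smokes, friends, rng.randrange(n), rng.randrange(n))
        for _ in range(num_samples)
    )
    return hits * n * n / num_samples


rng = random.Random(42)
n = 400
smokes = [rng.random() < 0.5 for _ in range(n)]
friends = [[rng.random() < 0.1 for _ in range(n)] for _ in range(n)]

exact = exact_true_count(smokes, friends)  # 160,000 groundings checked
approx = sampled_true_count(smokes, friends, 10_000, rng)  # 10,000 checked
print(exact, round(approx))
```

The sampled estimate costs a fixed number of formula evaluations no matter how many groundings exist. That independence from n^k is what makes counting-based weight learning tractable on large domains.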
An Oral Exam for Measuring a Dialog System’s Capabilities
Cohen, David (Carnegie Mellon University) | Lane, Ian (Carnegie Mellon University)
This paper proposes a model and methodology for measuring the breadth and flexibility of a dialog system's capabilities. In this approach, human evaluators administer a targeted oral exam to a system and give their subjective assessment of the system's performance on each test problem. We present results from one instantiation of this test, performed on two publicly accessible dialog systems and a human. These results show that the proposed metrics provide useful insights into the relative strengths and weaknesses of these systems. They also suggest that the approach can be carried out with reasonable reliability and a reasonable amount of effort. We hope that authors will augment their reporting with this approach to improve clarity and make more direct progress toward broadly capable dialog systems.
Strategyproof Peer Selection: Mechanisms, Analyses, and Experiments
Aziz, Haris (Data61 and University of New South Wales) | Lev, Omer (University of Toronto) | Mattei, Nicholas (Data61 and University of New South Wales) | Rosenschein, Jeffrey S. (The Hebrew University of Jerusalem) | Walsh, Toby (Data61 and University of New South Wales)
We study an important crowdsourcing setting where agents evaluate one another and, based on these evaluations, a subset of agents is selected. This setting is ubiquitous when peer review is used for distributing awards in a team, allocating funding to scientists, and selecting publications for conferences. The fundamental challenge when applying crowdsourcing in these settings is that agents may misreport their reviews of others to increase their own chances of being selected. We propose a new strategyproof (impartial) mechanism called Dollar Partition that satisfies desirable axiomatic properties. We then run a detailed experiment with parameter values derived from real-world target domains. The results show that our mechanism performs better, both on average and in the worst case, than other strategyproof mechanisms in the literature.
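The abstract names Dollar Partition without describing it. The sketch below is a simplified version based on our reading of the mechanism:

- Agents are split into clusters.
- Each agent divides one "dollar" among agents outside its own cluster.
- A cluster's share of the k slots is the fraction of all dollars it receives.
- Each cluster fills its rounded-up quota with the members who received the most.

The rounding rule, the uniform fallback for an agent that reports no positive scores, and the function name are assumptions, and the paper's exact definition may differ. Exact `Fraction` arithmetic avoids floating-point surprises when rounding the quota. The sketch requires at least two clusters.

```python
import math
from fractions import Fraction


def dollar_partition(clusters, scores, k):
    """Simplified Dollar Partition sketch.

    clusters: list of lists of agent ids (at least two clusters).
    scores: scores[i][j] is the non-negative score agent i reports for agent j;
            scores for agents in i's own cluster are ignored.
    k: target number of agents to select.
    """
    cluster_of = {a: c for c, members in enumerate(clusters) for a in members}
    n = len(cluster_of)
    share = [Fraction(0)] * len(clusters)
    received = {a: Fraction(0) for a in cluster_of}

    for i in cluster_of:
        outside = [j for j in cluster_of if cluster_of[j] != cluster_of[i]]
        raw = {j: Fraction(max(0, scores.get(i, {}).get(j, 0))) for j in outside}
        total = sum(raw.values())
        for j in outside:
            # Normalise agent i's report into one dollar (uniform if it gave nothing).
            dollar = raw[j] / total if total > 0 else Fraction(1, len(outside))
            received[j] += dollar
            share[cluster_of[j]] += dollar / n

    selected = []
    for c, members in enumerate(clusters):
        quota = min(len(members), math.ceil(share[c] * k))  # assumed: round up
        ranked = sorted(members, key=lambda a: received[a], reverse=True)
        selected.extend(ranked[:quota])
    return selected


clusters = [[0, 1], [2, 3]]
scores = {0: {2: 3, 3: 1}, 1: {2: 1, 3: 1}, 2: {0: 1, 1: 0}, 3: {0: 2, 1: 2}}
print(dollar_partition(clusters, scores, k=2))  # [0, 2]

# Agent 0 changes its report; only the other cluster's ranking is affected.
manipulated = {**scores, 0: {2: 0, 3: 5}}
print(dollar_partition(clusters, manipulated, k=2))  # [0, 3]
```

Agent 0's dollar goes only outside its own cluster, and its cluster is ranked only by outsiders. Its report therefore cannot change its own cluster's quota or its own rank. This is the source of impartiality in this design.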
Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics
Clémençon, Stéphan | Bellet, Aurélien | Colin, Igor
In a wide range of statistical learning problems such as ranking, clustering or metric learning, the risk is accurately estimated by $U$-statistics of degree $d\geq 1$, i.e. low-variance functionals of the training data that take the form of averages over $d$-tuples. From a computational perspective, calculating such statistics is highly expensive even for a moderate sample size $n$, as it requires averaging $O(n^d)$ terms. This makes learning procedures that rely on optimizing such data functionals hardly feasible in practice. The major goal of this paper is to show that, strikingly, such empirical risks can be replaced by far cheaper Monte-Carlo estimates based on only $O(n)$ terms, usually referred to as incomplete $U$-statistics, without damaging the $O_{\mathbb{P}}(1/\sqrt{n})$ learning rate of Empirical Risk Minimization (ERM) procedures. For this purpose, we establish uniform deviation results describing the error made when approximating a $U$-process by its incomplete version under appropriate complexity assumptions. We also consider extensions to model selection, fast-rate situations and various sampling techniques, as well as an application to stochastic gradient descent for ERM. Finally, numerical examples provide strong empirical evidence that the approach we promote largely surpasses more naive subsampling techniques.
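For degree $d = 2$, the complete $U$-statistic averages a symmetric kernel $h$ over all $n(n-1)/2$ pairs. An incomplete $U$-statistic averages $h$ over only $B$ randomly drawn pairs, for example $B = O(n)$. The sketch below uses $h(x, x') = |x - x'|$ (the Gini mean difference) purely as an illustrative kernel. In ranking or metric learning, $h$ would instead be a pairwise loss. Pairs are drawn uniformly with replacement, which is one simple sampling scheme. The function names are ours, not the paper's.

```python
import itertools
import random


def complete_u_statistic(xs, h):
    """Average of h over all n*(n-1)/2 unordered pairs: O(n^2) kernel calls."""
    pairs = itertools.combinations(range(len(xs)), 2)
    values = [h(xs[i], xs[j]) for i, j in pairs]
    return sum(values) / len(values)


def incomplete_u_statistic(xs, h, num_pairs, rng):
    """Average of h over num_pairs pairs drawn uniformly with replacement."""
    n = len(xs)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices, uniformly at random
        total += h(xs[i], xs[j])
    return total / num_pairs


def abs_diff(a, b):
    return abs(a - b)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]

full = complete_u_statistic(xs, abs_diff)  # 499,500 kernel evaluations
cheap = incomplete_u_statistic(xs, abs_diff, 5 * len(xs), rng)  # 5,000 evaluations
print(full, cheap)
```

Each sampled pair is an unbiased draw from the pairs the complete statistic averages over. The incomplete estimate is therefore centred on the complete one while doing about 1% of the work here.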
This 24-year-old venture capitalist is using UC Berkeley as his own incubator
Universities have in recent years awakened to the fact that students can help them make money through more than just tuition and board. Stanford University started investing in students' start-ups in 2013. Harvard University does the same through its Xfund. Last year, the University of California launched a $250 million venture fund to invest in companies that grow out of the UC system. Now UC Berkeley is getting in on the game through a new fund led by 24-year-old Los Angeles native Jeremy Fiance.