Technology
Group-based Query Learning for rapid diagnosis in time-critical situations
Bellala, Gowtham, Bhavnani, Suresh, Scott, Clayton
In query learning, the goal is to identify an unknown object while minimizing the number of "yes or no" questions (queries) posed about that object. We consider three extensions of this fundamental problem that are motivated by practical considerations in real-world, time-critical identification tasks such as emergency response. First, we consider the problem where the objects are partitioned into groups, and the goal is to identify only the group to which the object belongs. Second, we address the situation where the queries are partitioned into groups, and an algorithm may suggest a group of queries to a human user, who then selects the actual query. Third, we consider the problem of query learning in the presence of persistent query noise, and relate it to group identification. To address these problems we show that a standard algorithm for query learning, known as the splitting algorithm or generalized binary search, may be viewed as a generalization of Shannon-Fano coding. We then extend this result to the group-based settings, leading to new algorithms. The performance of our algorithms is demonstrated on simulated data and on a database used by first responders for toxic chemical identification.
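For intuition, here is a minimal sketch of the splitting algorithm (generalized binary search) that the paper reinterprets as a generalization of Shannon-Fano coding. The data layout (`answers`, `prior`, `oracle`) is an illustrative assumption, not the paper's notation; for group identification one would simply stop as soon as all remaining candidates share a group.

```python
# Minimal sketch of the splitting algorithm / generalized binary search.
# Hypothetical data layout: answers[q][o] is object o's yes/no answer to
# query q, and prior[o] is its prior weight.

def yes_mass(candidates, prior, answers, q):
    """Fraction of the remaining prior mass answering 'yes' to query q."""
    yes = sum(prior[o] for o in candidates if answers[q][o])
    return yes / sum(prior[o] for o in candidates)

def generalized_binary_search(objects, queries, prior, answers, oracle):
    """Identify the unknown object; oracle(q) returns its true answer."""
    candidates = set(objects)
    while len(candidates) > 1:
        # Ask the query whose yes/no split is closest to 50/50.
        q = min(queries,
                key=lambda q: abs(yes_mass(candidates, prior, answers, q) - 0.5))
        r = oracle(q)
        candidates = {o for o in candidates if answers[q][o] == r}
    return candidates.pop()
```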
How slow is slow? SFA detects signals that are slower than the driving force
Konen, Wolfgang, Koch, Patrick
Slow feature analysis (SFA) is a method for extracting slowly varying driving forces from quickly varying nonstationary time series. We show here that it is possible for SFA to detect a component that is even slower than the driving force itself (e.g. the envelope of a modulated sine wave). Whether the driving force itself or a slower subcomponent is detected depends on circumstances such as the embedding dimension, the predictability of the time series, and the base frequency. We observe a phase transition from one regime to the other, and the purpose of this work is to quantify the influence of various parameters on this phase transition. We conclude that what is perceived as slow by SFA varies, and that a more or less abrupt switch from one regime to the other occurs, perhaps showing some similarity to human perception.
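As a concrete point of reference, here is a minimal linear-SFA sketch (numpy only). The paper's experiments use a nonlinear expansion of a time-delay embedding, so this linear version is a simplification: it whitens the embedded series and returns the projection whose temporal derivative has minimal variance.

```python
import numpy as np

def sfa_slowest(X):
    """Slowest unit-variance linear projection of the rows of X."""
    X = X - X.mean(axis=0)                        # center
    d, E = np.linalg.eigh(np.cov(X, rowvar=False))
    W = E / np.sqrt(d)                            # whitening: cov(X @ W) = I
    Z = X @ W
    _, U = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    return X @ (W @ U[:, 0])                      # minimal derivative variance

# Toy input: a fast carrier whose slow envelope acts as the driving force.
t = np.arange(2000)
x = np.sin(0.2 * t) * (1.0 + 0.5 * np.sin(0.005 * t))
emb = np.stack([x[i:len(x) - 8 + i] for i in range(8)], axis=1)  # delay embedding
slow = sfa_slowest(emb)
```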
Dimension reduction and variable selection in case control studies via regularized likelihood optimization
Bunea, Florentina, Barbu, Adrian
Dimension reduction and variable selection are performed routinely in case-control studies, but the literature on the theoretical aspects of the resulting estimates is scarce. We bring our contribution to this literature by studying estimators obtained via L1 penalized likelihood optimization. We show that the optimizers of the L1 penalized retrospective likelihood coincide with the optimizers of the L1 penalized prospective likelihood. This extends the results of Prentice and Pyke (1979), obtained for non-regularized likelihoods. We establish both the sup-norm consistency of the odds ratio, after model selection, and the consistency of subset selection of our estimators. The novelty of our theoretical results consists in the study of these properties under the case-control sampling scheme. Our results hold for selection performed over a large collection of candidate variables, with cardinality allowed to depend on, and exceed, the sample size. We complement our theoretical results with a novel approach to determining data-driven tuning parameters, based on the bisection method. The resulting procedure offers significant computational savings compared with grid-search-based methods. All our numerical experiments strongly support our theoretical findings.
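A rough illustration of the computational idea, assuming scikit-learn's L1-penalized logistic regression as the prospective-likelihood fitter; the bisection target used here (a desired number of selected variables) is a stand-in for the paper's actual data-driven criterion.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def n_selected(X, y, lam):
    """Number of nonzero coefficients at penalty level lam."""
    model = LogisticRegression(penalty="l1", C=1.0 / lam,
                               solver="liblinear", max_iter=1000)
    model.fit(X, y)
    return np.count_nonzero(model.coef_)

def bisect_penalty(X, y, target, lo=1e-3, hi=1e3, iters=30):
    """Bisection on the log scale: larger lam selects fewer variables."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)             # geometric midpoint
        if n_selected(X, y, mid) > target:
            lo = mid                       # too many variables: penalize more
        else:
            hi = mid
    return np.sqrt(lo * hi)
```

Each bisection step needs one model fit, whereas a grid search fits the model at every grid point, which is where the computational savings come from.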
Data spectroscopy: Eigenspaces of convolution operators and clustering
Shi, Tao, Belkin, Mikhail, Yu, Bin
This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a fixed number of top eigenvectors might at the same time contain redundant clustering information and miss relevant clustering information. We use this insight to design the data spectroscopic clustering (DaSpec) algorithm that utilizes properly selected eigenvectors to determine the number of clusters automatically and to group the data accordingly. Our findings extend the intuitions underlying existing spectral techniques such as spectral clustering and Kernel Principal Components Analysis, and provide new insight into their usability and modes of failure. Simulation studies and experiments on real-world data are conducted to show the potential of our algorithm. In particular, DaSpec is found to handle unbalanced groups and recover clusters of different shapes better than the competing methods.
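The sketch below conveys the flavor of a DaSpec-style procedure under illustrative thresholds: eigenvectors of a Gaussian kernel matrix with essentially no sign change each flag one cluster, the number of flagged eigenvectors estimates the number of clusters, and each point joins the flagged eigenvector with the largest magnitude there. The parameter choices are not the paper's.

```python
import numpy as np
from scipy.spatial.distance import cdist

def daspec_sketch(X, bandwidth=1.0, eps=1e-2, n_eig=20):
    """Cluster count and labels from 'no-sign-change' kernel eigenvectors."""
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * bandwidth**2))
    _, vecs = np.linalg.eigh(K)
    vecs = vecs[:, ::-1][:, :n_eig]              # top eigenvectors first
    flagged = []
    for v in vecs.T:
        big = v[np.abs(v) > eps * np.abs(v).max()]
        if np.all(big > 0) or np.all(big < 0):   # no significant sign change
            flagged.append(np.abs(v))
    labels = np.argmax(np.stack(flagged), axis=0)  # strongest flag per point
    return len(flagged), labels
```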
Manipulability of Single Transferable Vote
For many voting rules, it is NP-hard to compute a successful manipulation. However, NP-hardness only bounds the worst-case complexity. Recent theoretical results suggest that manipulation may often be easy in practice. We study empirically the cost of manipulating the single transferable vote (STV) rule. This was one of the first rules shown to be NP-hard to manipulate. It also appears to be one of the harder rules to manipulate since it involves multiple rounds and since, unlike many other rules, it is NP-hard for a single agent to manipulate without weights on the votes or uncertainty about how the other agents have voted. In almost every election in our experiments, it was easy to compute how a single agent could manipulate the election or to prove that manipulation by a single agent was impossible. It remains an interesting open question whether manipulation by a coalition of agents is hard to compute in practice.
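For readers unfamiliar with the rule, here is a minimal single-winner STV (instant-runoff) count, together with the brute-force single-agent manipulation check that the experiments show to be feasible in practice. Tie-breaking among elimination candidates is left arbitrary in this sketch.

```python
from collections import Counter
from itertools import permutations

def stv_winner(ballots):
    """Repeatedly eliminate the candidate with the fewest first preferences."""
    remaining = {c for b in ballots for c in b}
    while len(remaining) > 1:
        firsts = Counter(next(c for c in b if c in remaining) for b in ballots)
        loser = min(remaining, key=lambda c: firsts[c])
        remaining.discard(loser)
    return remaining.pop()

def single_agent_can_manipulate(others, favorite):
    """Brute force: try every possible ballot for one extra voter."""
    candidates = {c for b in others for c in b}
    return any(stv_winner(others + [p]) == favorite
               for p in permutations(candidates))
```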
Likelihood-based semi-supervised model selection with applications to speech processing
White, Christopher M., Khudanpur, Sanjeev P., Wolfe, Patrick J.
In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically costly and difficult to obtain. This article proposes an alternative semi-supervised framework for likelihood-based model selection that leverages unlabeled data by using trained classifiers representing each model to automatically generate putative labels. The errors that result from this automatic labeling are shown to be amenable to results from robust statistics, which in turn provide for minimax-optimal censored likelihood ratio tests that recover the nonparametric sign test as a limiting case. This approach is then validated experimentally using a state-of-the-art automatic speech recognition system to select between candidate word pronunciations using unlabeled speech data that only potentially contain instances of the words under test. Results provide supporting evidence for the utility of this approach, and suggest that it may also find use in other applications of machine learning.
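The sign-test limiting case admits a very small sketch: on automatically labeled data, count per sample which candidate model assigns the putative label the higher likelihood, then test the win rate against a fair coin. The function name and significance level are illustrative, not the paper's.

```python
from scipy.stats import binomtest

def sign_test_select(loglik_a, loglik_b, alpha=0.05):
    """loglik_*: per-sample log-likelihoods of the putative labels."""
    wins_a = sum(a > b for a, b in zip(loglik_a, loglik_b))
    trials = sum(a != b for a, b in zip(loglik_a, loglik_b))  # drop ties
    p = binomtest(wins_a, trials, 0.5).pvalue
    if p < alpha:
        return ("A" if 2 * wins_a > trials else "B"), p
    return None, p   # no significant preference either way
```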
A Geometric Approach to Sample Compression
Rubinstein, Benjamin I. P., Rubinstein, J. Hyam
The Sample Compression Conjecture of Littlestone & Warmuth has remained unsolved for over two decades. This paper presents a systematic geometric investigation of the compression of finite maximum concept classes. Simple arrangements of hyperplanes in hyperbolic space, and piecewise-linear (PL) hyperplane arrangements, are shown to represent maximum classes, generalizing the corresponding Euclidean result. A main result is that PL arrangements can be swept by a moving hyperplane to unlabeled d-compress any finite maximum class, forming a peeling scheme as conjectured by Kuzmin & Warmuth. A corollary is that some d-maximal classes cannot be embedded into any maximum class of VC dimension d+k, for any constant k. The construction of the PL sweeping involves Pachner moves on the one-inclusion graph, corresponding to moves of a hyperplane across the intersection of d other hyperplanes. This extends the well-known Pachner moves for triangulations to cubical complexes.
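The one-inclusion graph at the heart of the sweeping construction is straightforward to build for a finite class; the example class below is a hypothetical maximum class of VC dimension 1 (thresholds on three points).

```python
from itertools import combinations

def one_inclusion_graph(concepts):
    """Edges join concepts that differ in exactly one coordinate."""
    return [(c1, c2) for c1, c2 in combinations(concepts, 2)
            if sum(a != b for a, b in zip(c1, c2)) == 1]

# Threshold concepts on three points: a maximum class of VC dimension 1.
C = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(one_inclusion_graph(C))   # a path graph with three edges
```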
High-dimensional additive modeling
Meier, Lukas, van de Geer, Sara, Bühlmann, Peter
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as for performance on finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
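As a rough computational analogue (not the paper's estimator), one can expand each covariate in a small basis and apply a group penalty that zeroes out whole additive components. The sketch below uses a plain group lasso with squared loss via proximal gradient; the paper's penalty additionally carries an explicit smoothness term and covers non-Gaussian likelihoods.

```python
import numpy as np

def additive_group_lasso(X, y, lam, degree=3, iters=500):
    """Fit y ~ sum_j f_j(x_j), zeroing whole components via a group penalty."""
    n, p = X.shape
    # Polynomial features per covariate (a stand-in for spline bases).
    B = np.concatenate([X[:, [j]] ** np.arange(1, degree + 1)
                        for j in range(p)], axis=1)
    B = (B - B.mean(0)) / B.std(0)
    groups = [list(range(j * degree, (j + 1) * degree)) for j in range(p)]
    step = 1.0 / np.linalg.norm(B, 2) ** 2        # conservative step size
    beta = np.zeros(B.shape[1])
    for _ in range(iters):
        z = beta - step * B.T @ (B @ beta - y) / n   # gradient step
        for idx in groups:                           # block soft-thresholding
            norm = np.linalg.norm(z[idx])
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            z[idx] = shrink * z[idx]
        beta = z
    return beta, groups
```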
Towards applied theories based on computability logic
Computability logic (CL) (see http://www.cis.upenn.edu/~giorgi/cl.html) is a recently launched program for redeveloping logic as a formal theory of computability, as opposed to the formal theory of truth that logic has more traditionally been. Formulas in it represent computational problems, "truth" means existence of an algorithmic solution, and proofs encode such solutions. Within the line of research devoted to finding axiomatizations for ever more expressive fragments of CL, the present paper introduces a new deductive system CL12 and proves its soundness and completeness with respect to the semantics of CL. Conservatively extending classical predicate calculus and offering considerable additional expressive and deductive power, CL12 presents a reasonable, computationally meaningful, constructive alternative to classical logic as a basis for applied theories. To obtain a model example of such theories, this paper rebuilds the traditional, classical-logic-based Peano arithmetic into a computability-logic-based counterpart. Among the purposes of the present contribution is to provide a starting point for what, as the author hopes, might become a new line of research with the potential for interesting findings -- an exploration of the presumably quite unusual metatheory of CL-based arithmetic and other CL-based applied systems.
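To make the computational reading of formulas concrete, here is the standard introductory example from the CL literature, with the choice quantifiers ⊓ (environment's choice) and ⊔ (machine's choice); the squaring function is just an illustrative instance.

```latex
% Classical truth versus CL's computational reading:
% \forall x\,\exists y\,(y = x^2) merely asserts that squares exist, while
\sqcap x\, \sqcup y\, (y = x^2)
% is the problem of squaring: the environment chooses a value for x, and
% a winning strategy must actually compute and play back a correct y.
```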
A Hierarchical Bayesian Model for Frame Representation
Chaâri, L., Pesquet, J. -C., Tourneret, J. -Y., Ciuciu, Ph., Benazza-Benyahia, A.
In many signal processing problems, it may be fruitful to represent the signal under study in a frame. If a probabilistic approach is adopted, it then becomes necessary to estimate the hyperparameters characterizing the probability distribution of the frame coefficients. This problem is difficult since, in general, the frame synthesis operator is not bijective; consequently, the frame coefficients are not directly observable. This paper introduces a hierarchical Bayesian model for frame representation. The posterior distribution of the frame coefficients and model hyperparameters is derived. Hybrid Markov chain Monte Carlo algorithms are subsequently proposed to sample from this posterior distribution. The generated samples are then exploited to estimate the hyperparameters and the frame coefficients of the target signal. Validation experiments show that the proposed algorithms provide an accurate estimation of the frame coefficients and hyperparameters. Applications to practical image denoising problems show the impact of the resulting Bayesian estimation on the recovered signal quality.
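A toy Metropolis-within-Gibbs sketch in the spirit of the paper: frame coefficients x with a Laplace prior of unknown rate lam, and observation y = F @ x plus Gaussian noise. The frame F, the priors, and the step size are illustrative, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(v, y, F, sigma, lam):
    """Gaussian log-likelihood plus Laplace(lam) log-prior, up to a constant."""
    resid = y - F @ v
    return -resid @ resid / (2 * sigma**2) - lam * np.abs(v).sum()

def sampler(y, F, sigma=0.1, a=1.0, b=1.0, n_iter=5000, step=0.05):
    n, k = F.shape
    x, lam, samples = np.zeros(k), 1.0, []
    for _ in range(n_iter):
        # Metropolis step on the frame coefficients x.
        prop = x + step * rng.standard_normal(k)
        if np.log(rng.random()) < (log_posterior(prop, y, F, sigma, lam)
                                   - log_posterior(x, y, F, sigma, lam)):
            x = prop
        # Gibbs step: a Gamma(a, b) prior on lam is conjugate to the
        # Laplace prior, giving lam | x ~ Gamma(a + k, rate = b + ||x||_1).
        lam = rng.gamma(a + k, 1.0 / (b + np.abs(x).sum()))
        samples.append((x.copy(), lam))
    return samples
```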