An ideal observer model for identifying the reference frame of objects
Austerweil, Joseph L., Friesen, Abram L., Griffiths, Thomas L.
The object people perceive in an image can depend on its orientation relative to the scene it is in (its reference frame). For example, the images of the symbols $\times$ and $+$ differ only by a 45-degree rotation. Although real scenes contain multiple images and reference frames, psychologists have focused on scenes with only one reference frame. We propose an ideal observer model, based on nonparametric Bayesian statistics, for inferring the number of reference frames in a scene and their parameters. When an ambiguous image could be assigned to two conflicting reference frames, the model predicts that two factors should influence the reference frame inferred for the image: the image should be more likely to share the reference frame of the closer object ({\em proximity}) and more likely to share the reference frame containing the most objects ({\em alignment}). We confirm that people use both cues, using a novel methodology that allows for easy testing of human reference frame inference.
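A minimal sketch of how the two predicted cues could trade off, assuming (purely for illustration) that an ambiguous image joins a reference frame with probability proportional to the frame's object count times an exponentially decaying function of distance; the function name, the functional form, and the length scale are hypothetical, not the paper's likelihood:

```python
import numpy as np

# Hypothetical illustration of the two cues (not the paper's exact model):
# an ambiguous image is assigned to reference frame k with probability
# proportional to the number of objects already in that frame (alignment)
# times a decaying function of the image's distance to it (proximity).

def frame_posterior(frame_sizes, distances, length_scale=1.0):
    frame_sizes = np.asarray(frame_sizes, dtype=float)
    distances = np.asarray(distances, dtype=float)
    scores = frame_sizes * np.exp(-distances / length_scale)
    return scores / scores.sum()

# A large but distant frame versus a small but nearby one.
print(frame_posterior(frame_sizes=[4, 1], distances=[3.0, 1.0]))
```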
Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
Moulines, Eric, Bach, Francis R.
We consider the minimization of a convex objective function defined on a Hilbert space, which is only available through unbiased estimates of its gradients. This problem includes standard machine learning algorithms such as kernel logistic regression and least-squares regression, and is commonly referred to as a stochastic approximation problem in the operations research community. We provide a non-asymptotic analysis of the convergence of two well-known algorithms: stochastic gradient descent (a.k.a.~the Robbins-Monro algorithm) and a simple modification where the iterates are averaged (a.k.a.~Polyak-Ruppert averaging). Our analysis suggests that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate in the strongly convex case, is not robust to the lack of strong convexity or to the setting of the proportionality constant. This situation is remedied when using slower decays together with averaging, robustly leading to the optimal rate of convergence. We illustrate our theoretical results with simulations on synthetic and standard datasets.
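A minimal sketch of the two schemes the abstract contrasts, on a toy least-squares problem with unbiased single-sample gradients; the step-size constants and problem setup are arbitrary illustrations, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_iter = 5, 10_000
w_star = rng.normal(size=d)

def stream():
    # One (x, y) sample at a time gives an unbiased gradient estimate.
    x = rng.normal(size=d)
    y = x @ w_star + 0.1 * rng.normal()
    return x, y

def sgd(step_fn, average=False):
    w = np.zeros(d)
    w_bar = np.zeros(d)
    for n in range(1, n_iter + 1):
        x, y = stream()
        grad = (x @ w - y) * x          # gradient of 0.5 * (x'w - y)^2
        w -= step_fn(n) * grad
        w_bar += (w - w_bar) / n        # running average of the iterates
    return w_bar if average else w

w_fast = sgd(lambda n: 1.0 / n)                          # 1/n decay, no averaging
w_avg = sgd(lambda n: 0.1 / np.sqrt(n), average=True)    # slower decay + averaging
print("1/n decay      :", np.linalg.norm(w_fast - w_star))
print("n^-1/2 averaged:", np.linalg.norm(w_avg - w_star))
```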
A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Sun, Qian, Chattopadhyay, Rita, Panchanathan, Sethuraman, Ye, Jieping
Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have few or no labeled data from the test (target) distribution, but plenty of labeled data from multiple related sources with different distributions. The difference in distributions may lie in both the marginal and the conditional probabilities. Most existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources, based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among the multiple sources. We also provide a theoretical analysis of the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods on three real-world datasets demonstrate the effectiveness of the proposed approach.
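A hedged sketch of the first stage only, assuming (as an illustration, not the authors' estimator) that each source is scored by a kernel mean discrepancy between its inputs and the target's; the conditional-probability stage requires labels and joint optimization across sources and is not shown:

```python
import numpy as np

def rbf_mean_embedding_gap(Xs, Xt, gamma=1.0):
    # Squared MMD between source and target samples with an RBF kernel.
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()

def stage_one_weights(sources, X_target):
    gaps = np.array([rbf_mean_embedding_gap(Xs, X_target) for Xs in sources])
    w = 1.0 / (gaps + 1e-12)     # smaller marginal gap -> larger source weight
    return w / w.sum()

rng = np.random.default_rng(0)
X_t = rng.normal(0.0, 1.0, size=(50, 2))
srcs = [rng.normal(0.0, 1.0, size=(50, 2)),   # well-matched source
        rng.normal(2.0, 1.0, size=(50, 2))]   # marginally shifted source
print(stage_one_weights(srcs, X_t))
```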
From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models
Mensi, Skander, Naud, Richard, Gerstner, Wulfram
Variability in single-neuron models is typically implemented either by a stochastic Leaky Integrate-and-Fire model or by a model of the Generalized Linear Model (GLM) family. We use analytical and numerical methods to relate state-of-the-art models from both schools of thought. First, we find the analytical expressions relating the subthreshold voltage of the Adaptive Exponential Integrate-and-Fire model (AdEx) to the Spike-Response Model with escape noise (SRM, as an example of a GLM). Then we numerically calculate the link-function that gives the firing probability for a given deterministic membrane potential. We find a mathematical expression for this link-function and test the ability of the GLM to predict the firing probability of a neuron receiving complex stimulation. Comparing the prediction performance of various link-functions, we find that a GLM with an exponential link-function provides an excellent approximation to the Adaptive Exponential Integrate-and-Fire model with colored-noise input. These results help clarify the relationship between the different approaches to stochastic neuron models.
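A minimal sketch of an escape-noise spike model with the exponential link-function the abstract singles out; all parameter values are illustrative, not the fitted ones:

```python
import numpy as np

# GLM-style escape noise: given a deterministic membrane potential u, the
# instantaneous firing intensity is rho = rho0 * exp((u - theta) / du)
# (the exponential link), and the spike probability in a small time bin dt
# is 1 - exp(-rho * dt).

def spike_prob(u, theta=-50.0, du=2.0, rho0=1.0, dt=1e-3):
    rho = rho0 * np.exp((u - theta) / du)   # exponential link-function
    return 1.0 - np.exp(-rho * dt)

for u in (-56.0, -52.0, -50.0, -48.0):
    print(f"u = {u} mV -> P(spike in 1 ms) = {spike_prob(u):.4f}")
```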
Emergence of Multiplication in a Biophysical Model of a Wide-Field Visual Neuron for Computing Object Approaches: Dynamics, Peaks, & Fits
Keil, Matthias S.
Many species show avoidance reactions in response to looming object approaches. In locusts, the corresponding escape behavior correlates with the activity of the lobula giant movement detector (LGMD) neuron. During an object approach, its firing rate gradually increases until a peak is reached, after which it declines quickly. The $\eta$-function predicts that the LGMD activity is a product of an exponential function of angular size, $\exp(-\Theta)$, and angular velocity, $\dot{\Theta}$, and that peak activity is reached before time-to-contact (ttc). The $\eta$-function has become the prevailing LGMD model because it reproduces many experimental observations, and experimental evidence for the multiplicative operation has even been reported. Several inconsistencies remain unresolved, though. Here we address these issues with a new model (the $\psi$-model), which explicitly connects $\Theta$ and $\dot{\Theta}$ to biophysical quantities. The $\psi$-model avoids the biophysical problems associated with implementing $\exp(\cdot)$, implements the multiplicative operation of $\eta$ via divisive inhibition, and explains why activity peaks can occur after ttc. It consistently predicts response features of the LGMD and provides excellent fits to published experimental data, with goodness-of-fit measures comparable to corresponding fits with the $\eta$-function.
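A small numerical illustration of the $\eta$-function for a standard looming stimulus (angular size $\Theta(t) = 2\arctan(l/(v|t|))$ for an object of half-size $l$ approaching at speed $v$, with contact at $t = 0$), showing the peak occurring before ttc; all constants are arbitrary:

```python
import numpy as np

l, v, alpha, C = 0.04, 2.0, 3.0, 1.0        # metres, m/s, 1/rad, scale factor
t = np.linspace(-2.0, -0.01, 2000)          # seconds before contact (ttc at 0)
theta = 2.0 * np.arctan(l / (v * -t))       # angular size (rad), grows toward ttc
theta_dot = np.gradient(theta, t)           # angular velocity (rad/s)
eta = C * theta_dot * np.exp(-alpha * theta)

# The peak lands before time-to-contact, at roughly t = -alpha * l / v.
print(f"peak of eta at t = {t[np.argmax(eta)]:.3f} s (ttc is t = 0)")
```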
Portmanteau Vocabularies for Multi-Cue Image Representation
Khan, Fahad S., van de Weijer, Joost, Bagdanov, Andrew D., Vanrell, Maria
We describe a novel technique for feature combination in the bag-of-words model of image classification. Our approach builds discriminative compound words from primitive cues learned independently from training images. Our main observation is that modeling joint-cue distributions independently is more statistically robust for typical classification problems than attempting to empirically estimate the dependent, joint-cue distribution directly. We use information-theoretic vocabulary compression to find discriminative combinations of cues, and the resulting vocabulary of portmanteau words is compact, has the cue-binding property, and supports individual weighting of cues in the final image representation. State-of-the-art results on both the Oxford Flower-102 and Caltech-UCSD Bird-200 datasets demonstrate the effectiveness of our technique compared to other, significantly more complex approaches to multi-cue image representation.
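A toy sketch of the core observation, assuming two hypothetical cue vocabularies (colour and shape): the class-conditional distribution over compound words is formed as a product of independently estimated per-cue distributions rather than counted jointly; the subsequent information-theoretic compression of the compound vocabulary is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
n_color, n_shape, n_class = 8, 10, 3

# Independently estimated per-cue distributions p(word | class).
p_color = rng.dirichlet(np.ones(n_color), size=n_class)   # (class, colour word)
p_shape = rng.dirichlet(np.ones(n_shape), size=n_class)   # (class, shape word)

# Product approximation to the joint: p(c, s | class) ~ p(c | class) p(s | class),
# which is far more robust to estimate than counting (c, s) pairs directly.
p_joint = p_color[:, :, None] * p_shape[:, None, :]       # (class, colour, shape)
print(p_joint.shape, p_joint.sum(axis=(1, 2)))            # each class sums to 1
```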
Statistical Tests for Optimization Efficiency
Boyles, Levi, Korattikara, Anoop, Ramanan, Deva, Welling, Max
Learning problems such as logistic regression are typically formulated as pure optimization problems defined on some loss function. We argue that this view ignores the fact that the loss function depends on stochastically generated data, which in turn determines an intrinsic scale of precision for statistical estimation. By considering the statistical properties of the update variables used during the optimization (e.g. gradients), we can construct frequentist hypothesis tests to determine the reliability of these updates. We utilize subsets of the data for computing updates, and use the hypothesis tests to determine when the batch-size needs to be increased. This provides computational benefits and avoids overfitting by stopping when the batch-size has become equal to the size of the full dataset. Moreover, the proposed algorithms depend on a single interpretable parameter, the probability for an update to be in the wrong direction, which is set to a single value across all algorithms and datasets. In this paper we illustrate these ideas on three $L_1$-regularized coordinate descent algorithms: $L_1$-regularized $L_2$-loss SVMs, $L_1$-regularized logistic regression, and the Lasso, but we emphasize that the underlying methods are much more generally applicable.
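A hedged sketch of the central test for a single coordinate update, under a normal approximation to the averaged gradient: estimate the probability that the update points in the wrong direction and grow the batch when it exceeds a tolerance. The helper name and the threshold value are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def update_is_reliable(per_example_grads, eps=0.05):
    # From a batch of per-example gradients, estimate the probability that the
    # sign of the averaged gradient disagrees with the true gradient's sign.
    g = np.asarray(per_example_grads, dtype=float)
    mean = g.mean()
    se = g.std(ddof=1) / np.sqrt(len(g))
    p_wrong = norm.cdf(-abs(mean) / (se + 1e-12))
    return p_wrong < eps, p_wrong     # unreliable -> grow the batch

rng = np.random.default_rng(0)
noisy = rng.normal(0.01, 1.0, size=100)    # weak signal: update unreliable
strong = rng.normal(0.5, 1.0, size=100)    # clear signal: update reliable
print(update_is_reliable(noisy))
print(update_is_reliable(strong))
```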
Bayesian Partitioning of Large-Scale Distance Data
A Bayesian approach to partitioning distance matrices is presented. It is inspired by the Translation-Invariant Wishart-Dirichlet (TIWD) process of (Vogt et al., 2010) and shares a number of its advantageous properties, such as the fully probabilistic nature of the inference model, automatic selection of the number of clusters, and applicability in semi-supervised settings. In addition, our method (which we call 'fastTIWD') overcomes the main shortcoming of the original TIWD, namely its high computational cost: fastTIWD reduces the workload in each iteration of a Gibbs sampler from $O(n^3)$ in the TIWD to $O(n^2)$. Our experiments show that this cost reduction does not compromise the quality of the inferred partitions. With this new method it is now possible to 'mine' large relational datasets with a probabilistic model, thereby automatically detecting new and potentially interesting clusters.
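A structural sketch of the Gibbs sweep whose per-iteration cost is at stake; the assignment score used here (a CRP-style prior plus mean within-cluster similarity) is a generic placeholder, not the translation-invariant Wishart-Dirichlet likelihood:

```python
import numpy as np

def gibbs_sweep(labels, S, alpha=1.0, rng=None):
    # One sweep: reassign each point among existing clusters or a new one.
    rng = rng or np.random.default_rng()
    n = len(labels)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        existing = np.unique(labels[others])
        scores = [np.log((labels[others] == k).sum())
                  + S[i, others[labels[others] == k]].mean()
                  for k in existing]
        scores.append(np.log(alpha))               # score for a new cluster
        s = np.array(scores)
        p = np.exp(s - s.max())
        p /= p.sum()
        j = rng.choice(len(p), p=p)
        labels[i] = existing[j] if j < len(existing) else labels.max() + 1
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(6, 1, (10, 2))])
S = -np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # similarity = -distance
labels = np.zeros(20, dtype=int)
for _ in range(5):
    labels = gibbs_sweep(labels, S, rng=rng)
print(labels)
```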
Active Ranking using Pairwise Comparisons
Jamieson, Kevin G., Nowak, Robert
This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of $n$ objects can be identified by standard sorting methods using $n\log_2 n$ pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a $d$-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in $\mathbb{R}^d$. We show that under this assumption the number of possible rankings grows like $n^{2d}$ and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than $d\log n$ adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis.
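An illustrative toy, not the paper's algorithm: with a noiseless "which object is closer to the reference?" oracle, binary-insertion sorting ranks $n$ objects with roughly $n\log_2 n$ adaptively chosen comparisons; the paper's geometric analysis is what brings this down to about $d\log n$ for objects embedded in $\mathbb{R}^d$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 16
reference = rng.normal(size=d)
objects = rng.normal(size=(n, d))
count = {"comparisons": 0}

def closer(a, b):
    # Noiseless pairwise-comparison oracle over distances to the reference.
    count["comparisons"] += 1
    return (np.linalg.norm(objects[a] - reference)
            < np.linalg.norm(objects[b] - reference))

ranking = []                       # indices sorted by distance to the reference
for i in range(n):
    lo, hi = 0, len(ranking)
    while lo < hi:                 # binary search for the insertion point
        mid = (lo + hi) // 2
        if closer(ranking[mid], i):
            lo = mid + 1
        else:
            hi = mid
    ranking.insert(lo, i)

print("ranking:", ranking)
print(f"used {count['comparisons']} comparisons; n*log2(n) ~ {n * np.log2(n):.0f}")
```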
Let us first agree on what the term "semantics" means: An unorthodox approach to an age-old debate
Traditionally, semantics has been seen as a feature of human language. The advent of the information era has led to its widespread redefinition as a feature of information. Contrary to this practice, I define semantics as a special kind of information. Revitalizing the ideas of Bar-Hillel and Carnap, I have recreated and re-established the notion of semantics as the notion of Semantic Information. I have proposed a new definition of information (as a description, a linguistic text, a piece of a story or a tale) and a clear segregation between two different types of information: physical and semantic information. I hope I have clearly explained the (usually obscured and mysterious) interrelations between data and physical information, as well as the relation between physical information and semantic information. Consequently, the usually indefinable notions of "information", "knowledge", "memory", "learning" and "semantics" have also received suitable illumination and explanation.