Nystrom Approximation for Sparse Kernel Methods: Theoretical Analysis and Empirical Evaluation

AAAI Conferences

Kernel methods (Schölkopf and Smola 2002; Xu et al. 2009) have received a lot of attention in recent studies of machine learning. These methods project data into high-dimensional or even infinite-dimensional spaces via kernel mapping functions. Despite the strong generalization ability induced by kernel methods, they usually suffer from the high computation complexity of calculating the kernel matrix (also called the Gram matrix). Although low-rank decomposition techniques (e.g., Cholesky Decomposition (Fine and Scheinberg 2002; Bach and Jordan 2005)) and truncating methods (e.g., Kernel Tapering (Shen, Xu, and Allebach 2014; Furrer, Genton, and Nychka 2006)) can accelerate the calculation of the kernel matrix, they still need to compute the […], while, if kernels are not low rank, Nyström approximations can usually lead to suboptimal performance. To alleviate the strong assumption made in deriving the approximation bounds, we take a more general assumption that the design matrix K satisfies the restricted isometry property (Koltchinskii 2011). In particular, the new assumption obeys the restricted eigenvalue condition (Koltchinskii 2011; Bickel, Ritov, and Tsybakov 2009), which has been shown to be more general than several other similar assumptions used in the sparsity literature (Candes and Tao 2007; Donoho, Elad, and Temlyakov 2006; Zhang and Huang 2008). Based on the restricted eigenvalue condition, we provide error bounds for kernel approximation and the recovery rate in sparse kernel regression.
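
To make the low-rank idea concrete, here is a minimal Nyström sketch in NumPy. It illustrates the general technique under an assumed RBF kernel and uniform landmark sampling; it is not the authors' implementation or their theoretical setting.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # Pairwise squared distances, then the Gaussian (RBF) kernel.
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)

    def nystrom(X, m=100, gamma=1.0, seed=0):
        # Approximate K = k(X, X) by C @ pinv(W) @ C.T using m landmark points,
        # so downstream solvers avoid forming the full n x n kernel matrix.
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=m, replace=False)  # uniform landmark sampling
        C = rbf_kernel(X, X[idx], gamma)                 # n x m block of K
        W = C[idx, :]                                    # m x m block of K
        return C @ np.linalg.pinv(W) @ C.T               # rank-<=m approximation

    # Relative Frobenius error of the approximation on toy data.
    X = np.random.default_rng(1).normal(size=(500, 10))
    K, K_hat = rbf_kernel(X, X), nystrom(X, m=100)
    print(np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro"))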


Learning Word Representations from Relational Graphs

AAAI Conferences

If we already know a particular concept such as pets, we can describe a new concept such as dogs by stating the semantic relations that the new concept shares with the existing concepts such as dogs belongs-to pets. Alternatively, we could describe a novel concept by listing all the attributes it shares with existing concepts. In our example, we can describe the concept dog by listing attributes such as mammal, carnivorous, and domestic animal that it shares with another concept such as the cat. Therefore, both attributes and relations can be considered as alternative descriptors of the same knowledge. This close connection between attributes and relations can be seen in knowledge representation schemes such as predicate logic, where attributes […] representations by considering the semantic relations between words. Specifically, given as input a relational graph, a directed labelled weighted graph where vertices represent words and edges represent numerous semantic relations that exist between the corresponding words, we consider the problem of learning a vector representation for each vertex (word) in the graph and a matrix representation for each label type (pattern). The learnt word representations are evaluated for their accuracy by using them to solve semantic word analogy questions on a benchmark dataset. Our task of learning word attributes using relations between words is challenging because of several reasons. First, there can be multiple semantic relations between two words.
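
A minimal sketch of the representation scheme described above: one vector per word, one matrix per relation label, scored bilinearly on a directed labelled edge. The parameter names and random initialization are illustrative assumptions, not the paper's training objective.

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 50
    words = ["dog", "cat", "pet", "mammal"]
    patterns = ["belongs-to", "is-a"]

    # One vector per vertex (word) and one matrix per edge-label type (pattern).
    word_vec = {w: rng.normal(scale=0.1, size=dim) for w in words}
    pattern_mat = {p: rng.normal(scale=0.1, size=(dim, dim)) for p in patterns}

    def edge_score(u, pattern, v):
        # Bilinear compatibility of a directed labelled edge u --pattern--> v;
        # training would push these scores toward the observed edge weights.
        return word_vec[u] @ pattern_mat[pattern] @ word_vec[v]

    print(edge_score("dog", "belongs-to", "pet"))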


Inertial Hidden Markov Models: Modeling Change in Multivariate Time Series

AAAI Conferences

Faced with the problem of characterizing systematic changes in multivariate time series in an unsupervised manner, we derive and test two methods of regularizing hidden Markov models for this task. Regularization on state transitions provides smooth transitioning among states, such that the sequences are split into broad, contiguous segments. Our methods are compared with a recent hierarchical Dirichlet process hidden Markov model (HDP-HMM) and a baseline standard hidden Markov model; the former suffers from poor performance on moderate-dimensional data and sensitivity to parameter settings, while the latter suffers from rapid state transitioning and over-segmentation, with poor performance on a segmentation task involving human-activity accelerometer data from the UCI Repository. The regularized methods developed here are able to perfectly characterize change of behavior in the human-activity data for roughly half of the real-data test cases, with an accuracy of 94% and low variation of information. In contrast to the HDP-HMM, our methods provide simple, drop-in replacements for standard hidden Markov model update rules, allowing standard expectation-maximization (EM) algorithms to be used for learning.
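
As a sketch of what a drop-in regularized transition update could look like inside standard EM, the snippet below adds diagonal pseudo-counts to the M-step. The specific penalty is an assumption for illustration; the paper's actual regularizer may differ.

    import numpy as np

    def m_step_transitions(xi_counts, zeta=10.0):
        # M-step for a K-state HMM transition matrix. xi_counts[i, j] is the
        # expected number of i -> j transitions from the E-step. Adding zeta
        # pseudo-counts on the diagonal biases the model toward staying in the
        # same state, yielding broad, contiguous segments.
        counts = xi_counts + zeta * np.eye(xi_counts.shape[0])
        return counts / counts.sum(axis=1, keepdims=True)

    # Expected transition counts from an E-step over a 3-state model.
    xi = np.array([[40.0, 5.0, 5.0],
                   [6.0, 50.0, 4.0],
                   [3.0, 7.0, 45.0]])
    print(m_step_transitions(xi, zeta=20.0))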


Active Manifold Learning via Gershgorin Circle Guided Sample Selection

AAAI Conferences

In this paper, we propose an interpretation of active learning from a purely algebraic view and combine it with semi-supervised manifold learning. The proposed active manifold learning algorithm aims to learn the low-dimensional parameter space of the manifold with high accuracy from smartly labeled samples. We demonstrate that this problem is equivalent to a condition number minimization problem for the alignment matrix. Focusing on this problem, we first give a theoretical upper bound for the solution. Then we develop a heuristic but effective sample selection algorithm with the help of the Gershgorin circle theorem. We investigate the rationality, feasibility, universality, and complexity of the proposed method and demonstrate that it yields encouraging active learning results.
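
To illustrate how the Gershgorin circle theorem can guide selection: each row of the alignment matrix yields an interval that bounds an eigenvalue, so rows with the weakest lower bounds are natural candidates for labeling. The criterion below is an illustrative heuristic, not the paper's exact algorithm.

    import numpy as np

    def gershgorin_lower_bounds(M):
        # Row i of a square matrix M confines an eigenvalue to the disc centered
        # at M[i, i] with radius sum_{j != i} |M[i, j]|; return the lower endpoints.
        d = np.diag(M)
        r = np.sum(np.abs(M), axis=1) - np.abs(d)
        return d - r

    def select_samples(alignment, k):
        # Heuristic: pick the k rows whose Gershgorin lower bounds are smallest,
        # i.e. the rows most responsible for a small least eigenvalue (and hence
        # a large condition number) of the alignment matrix.
        return np.argsort(gershgorin_lower_bounds(alignment))[:k]

    rng = np.random.default_rng(0)
    A = rng.normal(size=(8, 8))
    A = A @ A.T                      # symmetric positive semi-definite toy matrix
    print(select_samples(A, k=3))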


Phrase Type Sensitive Tensor Indexing Model for Semantic Composition

AAAI Conferences

Compositional semantics aims at constructing the meaning of phrases or sentences according to the compositionality of word meanings. In this paper, we propose to synchronously learn the representations of individual words and extracted high-frequency phrases. Representations of extracted phrases are treated as the gold standard for constructing more general operations to compose the representations of unseen phrases. We propose a grammatical-type-specific model that improves composition flexibility by adopting vector-tensor-vector operations. Our model embodies the compositional characteristics of the traditional additive and multiplicative models. Empirical results show that our model outperforms state-of-the-art composition methods in the task of computing phrase similarities.
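
A sketch of one possible vector-tensor-vector composition that reduces to the additive and multiplicative special cases. The per-phrase-type tensor and identity-initialized additive weights are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 20
    T = rng.normal(scale=0.1, size=(d, d, d))   # one tensor per grammatical phrase type
    W1, W2 = np.eye(d), np.eye(d)               # additive part (identity init)

    def compose(u, v):
        # The bilinear term u^T T v generalizes element-wise (multiplicative)
        # interaction; W1 @ u + W2 @ v recovers the additive model. Parameters
        # would be fit so that composed vectors match the extracted
        # high-frequency phrase representations.
        return np.einsum("i,ijk,k->j", u, T, v) + W1 @ u + W2 @ v

    u, v = rng.normal(size=d), rng.normal(size=d)
    print(compose(u, v).shape)   # (20,)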


Surpassing Human-Level Face Verification Performance on LFW with GaussianFace

AAAI Conferences

Face verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problem is exacerbated when we rely unrealistically on a single training data source, which is often insufficient to cover the intrinsically complex face variations. This paper proposes a principled multi-task learning approach based on the Discriminative Gaussian Process Latent Variable Model (DGPLVM), named GaussianFace, for face verification. In contrast to relying unrealistically on a single training data source, our model exploits additional data from multiple source domains to improve the generalization performance of face verification in an unknown target domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can well capture complex face variations inherent in multiple sources. To enhance discriminative power, we introduce a more efficient equivalent form of Kernel Fisher Discriminant Analysis into the DGPLVM. To speed up inference and prediction, we exploit a low-rank approximation method. Extensive experiments demonstrate the effectiveness of the proposed model in learning from diverse data sources and generalizing to unseen domains. Specifically, our algorithm achieves an impressive accuracy of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark. For the first time, the human-level performance in face verification (97.53%) on LFW is surpassed.
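
The discriminative ingredient named above is Kernel Fisher Discriminant Analysis. As a rough illustration of the underlying Fisher criterion only, here is a plain linear version on toy pair features; it is not the kernelized form embedded in the DGPLVM.

    import numpy as np

    def fisher_direction(X_pos, X_neg, reg=1e-6):
        # Direction maximizing between-class over within-class scatter.
        mu_p, mu_n = X_pos.mean(0), X_neg.mean(0)
        Sw = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)
        Sw += reg * np.eye(Sw.shape[0])              # regularize for invertibility
        w = np.linalg.solve(Sw, mu_p - mu_n)         # closed-form Fisher solution
        return w / np.linalg.norm(w)

    # Toy "same person" vs "different person" pair-difference features.
    rng = np.random.default_rng(0)
    same = rng.normal(0.0, 0.5, size=(100, 16))
    diff = rng.normal(1.0, 0.5, size=(100, 16))
    w = fisher_direction(same, diff)
    print((same @ w).mean(), (diff @ w).mean())      # projections separate the classes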


Optimizing Bag Features for Multiple-Instance Retrieval

AAAI Conferences

Multiple-Instance (MI) learning is an important supervised learning technique that deals with collections of instances called bags. While existing research in MI learning has mainly focused on classification, in this paper we propose a new approach to MI retrieval that enables effective similarity retrieval of bags of instances, where training data is presented in the form of similar and dissimilar bag pairs. An embedding scheme is devised that encodes each bag into a single bag-level feature vector by exploiting a similarity-based transformation. In this way, the original MI problem is converted into a single-instance version. Furthermore, we develop a principled approach for optimizing bag features specifically for similarity retrieval by leveraging pairwise label information at the bag level. The experimental results demonstrate the effectiveness of the proposed approach in comparison with the alternatives for MI retrieval.
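
A minimal sketch of the embedding step: a variable-size bag is encoded as one fixed-length vector of similarities to a set of prototype instances. The RBF similarity, max-pooling, and prototype choice are assumptions for illustration; the paper further optimizes these features from similar/dissimilar bag-pair labels.

    import numpy as np

    def embed_bag(bag, prototypes, gamma=1.0):
        # bag: (n_instances, d). Each output coordinate is the bag's best match
        # to one prototype instance, so bags of different sizes map to vectors
        # of equal length that a standard retrieval pipeline can compare.
        d2 = ((bag[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
        sim = np.exp(-gamma * d2)        # (n_instances, n_prototypes)
        return sim.max(axis=0)

    rng = np.random.default_rng(0)
    prototypes = rng.normal(size=(32, 8))            # e.g. sampled training instances
    bag_a, bag_b = rng.normal(size=(5, 8)), rng.normal(size=(9, 8))
    fa, fb = embed_bag(bag_a, prototypes), embed_bag(bag_b, prototypes)
    print(fa.shape, fb.shape, float(fa @ fb))        # fixed-length, directly comparable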


Using Matched Samples to Estimate the Effects of Exercise on Mental Health via Twitter

AAAI Conferences

Recent work has demonstrated the value of social media monitoring for health surveillance (e.g., tracking influenza or depression rates). It is an open question whether such data can be used to make causal inferences (e.g., determining which activities lead to increased depression rates). Even in traditional, restricted domains, estimating causal effects from observational data is highly susceptible to confounding bias. In this work, we estimate the effect of exercise on mental health from Twitter, relying on statistical matching methods to reduce confounding bias. We train a text classifier to estimate the volume of a user's tweets expressing anxiety, depression, or anger, then compare two groups: those who exercise regularly (identified by their use of physical activity trackers like Nike+), and a matched control group. We find that those who exercise regularly have significantly fewer tweets expressing depression or anxiety; there is no significant difference in rates of tweets expressing anger. We additionally perform a sensitivity analysis to investigate how the many experimental design choices in such a study impact the final conclusions, including the quality of the classifier and the construction of the control group.
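
A toy sketch of the matching step: 1-nearest-neighbor matching on user covariates, followed by a matched difference in means. The covariates, outcomes, and matching rule here are illustrative stand-ins, not the study's actual design.

    import numpy as np

    def nearest_neighbor_match(treated, control):
        # Match each treated unit to its closest control on covariates
        # (1-NN with replacement) to reduce confounding before comparison.
        d2 = ((treated[:, None, :] - control[None, :, :]) ** 2).sum(-1)
        return d2.argmin(axis=1)

    rng = np.random.default_rng(0)
    cov_treated = rng.normal(0.3, 1.0, size=(200, 4))    # e.g. posting volume, followers
    cov_control = rng.normal(0.0, 1.0, size=(2000, 4))
    y_treated = rng.normal(0.8, 1.0, size=200)           # outcome, e.g. anxiety-tweet rate
    y_control = rng.normal(1.0, 1.0, size=2000)

    matched = nearest_neighbor_match(cov_treated, cov_control)
    print(y_treated.mean() - y_control[matched].mean())  # matched difference in means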


Learning to Manipulate Unknown Objects in Clutter by Reinforcement

AAAI Conferences

We present a fully autonomous robotic system for grasping objects in dense clutter. The objects are unknown and have arbitrary shapes. Therefore, we cannot rely on prior models. Instead, the robot learns online, from scratch, to manipulate the objects by trial and error. Grasping objects in clutter is significantly harder than grasping isolated objects, because the robot needs to push and move objects around in order to create sufficient space for the fingers. These pre-grasping actions do not have an immediate utility, and may result in unnecessary delays. The utility of a pre-grasping action can be measured only by looking at the complete chain of consecutive actions and effects. This is a sequential decision-making problem that can be cast in the reinforcement learning framework. We solve this problem by learning the stochastic transitions between the observed states, using nonparametric density estimation. The learned transition function is used only for re-calculating the values of the executed actions in the observed states, with different policies. Values of new state-actions are obtained by regressing the values of the executed actions. The state of the system at a given time is a depth (3D) image of the scene. We use spectral clustering for detecting the different objects in the image. The performance of our system is assessed on a robot with real-world objects.
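
A toy sketch of the value-recalculation idea: a kernel-smoothed transition model over the observed states, then solving for the values of those states. Real states in the paper are features of depth images; the Gaussian kernel and the direct linear solve are simplifying assumptions.

    import numpy as np

    def kernel_transitions(states, next_states, bandwidth=1.0):
        # Nonparametric transition estimate restricted to the observed states:
        # P[i, j] is proportional to the kernel similarity between the observed
        # successor of state i and state j.
        d2 = ((next_states[:, None, :] - states[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2.0 * bandwidth ** 2))
        return K / K.sum(axis=1, keepdims=True)

    def state_values(P, rewards, gamma=0.95):
        # Values of the observed states under the estimated transitions:
        # solve (I - gamma * P) V = r.
        return np.linalg.solve(np.eye(len(rewards)) - gamma * P, rewards)

    rng = np.random.default_rng(0)
    S = rng.normal(size=(50, 6))                         # observed state features
    S_next = S + rng.normal(scale=0.3, size=S.shape)     # observed successors
    r = rng.uniform(size=50)                             # rewards of executed actions
    print(state_values(kernel_transitions(S, S_next), r)[:5])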


Optimal Personalized Filtering Against Spear-Phishing Attacks

AAAI Conferences

To penetrate sensitive computer networks, attackers can use spear phishing to sidestep technical security mechanisms by exploiting the privileges of careless users. In order to maximize their success probability, attackers have to target the users that constitute the weakest links of the system. The optimal selection of these target users takes into account both the damage that can be caused by a user and the probability of a malicious e-mail being delivered to and opened by a user. Since attackers select their targets in a strategic way, the optimal mitigation of these attacks requires the defender to also personalize the e-mail filters by taking into account the users' properties. In this paper, we assume that a learned classifier is given and propose strategic per-user filtering thresholds for mitigating spear-phishing attacks. We formulate the problem of filtering targeted and non-targeted malicious e-mails as a Stackelberg security game. We characterize the optimal filtering strategies and show how to compute them in practice. Finally, we evaluate our results using two real-world datasets and demonstrate that the proposed thresholds lead to lower losses than non-strategic thresholds.
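
A much-simplified sketch of the Stackelberg reasoning: the defender spreads a fixed "strictness budget" over per-user delivery probabilities, anticipating that the attacker targets the user with the highest expected damage. It omits non-targeted mail and user-specific false-positive costs, so it illustrates the structure rather than the paper's model.

    import numpy as np

    def strategic_thresholds(damage, budget):
        # p[i] = probability a targeted malicious e-mail reaches user i; the
        # attacker then targets argmax damage[i] * p[i]. Subject to the total
        # strictness budget sum(1 - p) <= budget, the defender minimizes the
        # attacker's best payoff. A coarse grid search over that payoff works
        # because p[i] = min(1, v / damage[i]) is the loosest feasible filter.
        best_p, best_val = np.ones(len(damage)), float(np.max(damage))
        for v in np.linspace(0.0, np.max(damage), 2001):
            p = np.minimum(1.0, v / damage)
            if (1.0 - p).sum() <= budget and v < best_val:
                best_p, best_val = p, v
        return best_p, best_val

    damage = np.array([10.0, 4.0, 1.0])      # loss if each user is compromised
    p, attacker_payoff = strategic_thresholds(damage, budget=1.2)
    print(p, attacker_payoff)                # high-value users get stricter filters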