AAAI Conferences

In the U.S., individuals give more than 200 billion dollars to over 50 thousand charities each year, yet how people make these choices is not well understood. In this study, we use data from CharityNavigator.org and web browsing data from the Bing toolbar to understand charitable giving choices. Our main goal is to use data on charities' overhead expenses to better understand efficiency in the charity marketplace. A preliminary analysis indicates that the average donor is "wasting" more than 15% of their contribution by choosing poorly run organizations over higher-rated charities in the same Charity Navigator categorical group. However, charities within these groups may not be good substitutes for each other.
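To make the arithmetic behind a "wasted share" figure concrete, here is a minimal sketch under a hypothetical definition that is not necessarily the one used in the study. Each donation is a `(category, amount, program_ratio)` tuple, where `program_ratio` is an assumed fraction of spending that reaches programs rather than overhead. The "waste" on a donation is the amount times the gap between the best ratio in its category and the chosen charity's ratio. The function name, the schema, and the toy numbers in the test are illustrative only.

```python
from collections import defaultdict

def wasted_share(donations):
    """Hypothetical measure: share of total giving lost to lower program ratios.

    donations: iterable of (category, amount, program_ratio) tuples (assumed schema).
    The 'best' charity per category is taken from the donations themselves; a real
    analysis would compare against every rated charity in the category.
    """
    donations = list(donations)
    best = defaultdict(float)  # best program ratio seen per category
    for category, _, ratio in donations:
        best[category] = max(best[category], ratio)
    # Waste = amount * (best ratio in the same category - chosen charity's ratio).
    wasted = sum(amount * (best[cat] - ratio) for cat, amount, ratio in donations)
    total = sum(amount for _, amount, _ in donations)
    return wasted / total
```

As the abstract itself cautions, this kind of calculation assumes that charities within a category are interchangeable, which may not hold.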

Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive

arXiv.org Machine Learning

Similarity/distance measures play a key role in many machine learning, pattern recognition, and data mining algorithms, which has led to the emergence of the metric learning field. Many metric learning algorithms learn, from data, a global distance function that satisfies the constraints of the problem. However, in many real-world datasets, where the discriminative power of features varies across regions of the input space, a global metric is often unable to capture the complexity of the task. To address this challenge, local metric learning methods have been proposed that learn multiple metrics across different regions of the input space. These methods offer high flexibility and the ability to learn a nonlinear mapping, but typically at the expense of higher time requirements and a tendency to overfit. To overcome these challenges, this research presents an online multiple metric learning framework. Each metric in the proposed framework is composed of a global and a local component learned simultaneously. Adding a global component to a local metric efficiently reduces overfitting. The proposed framework is also scalable with both the sample size and the dimension of the input data. To the best of our knowledge, this is the first local online similarity/distance learning framework based on PA (Passive/Aggressive). In addition, for scalability with the dimension of the input data, DRP (Dual Random Projection) is extended to local online learning in the present work. It enables our methods to run efficiently on high-dimensional datasets while maintaining their predictive performance. The proposed framework provides a straightforward local extension to any global online similarity/distance learning algorithm based on PA.
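The idea of a metric made of a shared global part plus a per-region local part can be sketched with a bilinear similarity S_k(a, b) = a^T (G + L_k) b and a passive-aggressive (PA-I) triplet update. This is a hypothetical illustration, not the framework's actual algorithm: the class name `LocalGlobalPA`, the nearest-anchor region assignment via `centers`, and the choice to update G and L_k with equal steps are all assumptions. Because the loss depends on G and L_k only through their sum, the gradient with respect to the stacked parameters (G, L_k) is (V, V), so the PA step divides by 2 * ||V||_F^2.

```python
import numpy as np

class LocalGlobalPA:
    """Hypothetical sketch: region k uses M_k = G + L_k and S_k(a, b) = a^T M_k b."""

    def __init__(self, centers, C=1.0):
        self.centers = np.asarray(centers, dtype=float)  # assumed region anchors
        d = self.centers.shape[1]
        self.G = np.zeros((d, d))                        # shared global component
        self.L = np.zeros((len(self.centers), d, d))     # one local component per region
        self.C = C                                       # PA-I aggressiveness cap

    def region(self, x):
        # Assumption: hard assignment of x to its nearest anchor (Euclidean).
        return int(np.argmin(np.linalg.norm(self.centers - x, axis=1)))

    def similarity(self, a, b):
        k = self.region(a)
        return a @ (self.G + self.L[k]) @ b

    def update(self, x, x_pos, x_neg):
        """One PA-I step on the triplet (x, x_pos, x_neg); returns the step size tau."""
        k = self.region(x)
        M = self.G + self.L[k]
        loss = max(0.0, 1.0 - x @ M @ x_pos + x @ M @ x_neg)
        V = np.outer(x, x_pos - x_neg)
        # Gradient w.r.t. (G, L_k) is (V, V), so its squared norm is 2 * ||V||_F^2.
        sq = 2.0 * np.sum(V * V)
        if loss == 0.0 or sq == 0.0:
            return 0.0  # passive: margin already satisfied (or no usable direction)
        tau = min(self.C, loss / sq)
        self.G += tau * V      # global part: shared by every region
        self.L[k] += tau * V   # local part: only the region of x
        return tau
```

Because G is shared, an update made in one region also shifts the similarity used in every other region. This is one plausible reading of how a global component regularizes the local metrics.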

An Online Algorithm for Large Scale Image Similarity Learning

Neural Information Processing Systems

Learning a measure of similarity between pairs of objects is a fundamental problem in machine learning. It lies at the core of classification methods like kernel machines, and is particularly useful for applications like searching for images that are similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, current approaches for learning similarity may not scale to large datasets with high dimensionality, especially when imposing metric constraints on the learned similarity. We describe OASIS, a method for learning pairwise similarity that is fast and scales linearly with the number of objects and the number of non-zero features.
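As we understand OASIS, its core is a passive-aggressive update of a bilinear similarity S_W(p, q) = p^T W q on triplets (p, p+, p-), where p+ should score higher than p- against p. The sketch below uses dense NumPy vectors. The function names `triplet_loss` and `oasis_step` and the default `C` are illustrative, and a sparse implementation (which is where scaling with the number of non-zero features comes from) is not shown. Note that W is not constrained to be positive semi-definite, in line with the abstract's point about avoiding metric constraints.

```python
import numpy as np

def triplet_loss(W, p, p_pos, p_neg):
    """Hinge loss max(0, 1 - S_W(p, p_pos) + S_W(p, p_neg)) with S_W(a, b) = a^T W b."""
    return max(0.0, 1.0 - p @ W @ p_pos + p @ W @ p_neg)

def oasis_step(W, p, p_pos, p_neg, C=0.1):
    """One PA-I style update of W on a single triplet; returns a new matrix."""
    loss = triplet_loss(W, p, p_pos, p_neg)
    V = np.outer(p, p_pos - p_neg)   # gradient of S(p, p_pos) - S(p, p_neg) w.r.t. W
    sq_norm = np.sum(V * V)          # squared Frobenius norm of V
    if loss == 0.0 or sq_norm == 0.0:
        return W                     # passive: margin satisfied (or no usable direction)
    tau = min(C, loss / sq_norm)     # aggressive step, capped by C
    return W + tau * V
```

With a large C, a single step drives the loss on that triplet exactly to zero. A small C takes a more conservative step, which limits the effect of noisy triplets.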

Supervised Online Hashing via Similarity Distribution Learning

arXiv.org Artificial Intelligence

Online hashing has attracted extensive research attention for handling streaming data. Most online hashing methods, which learn binary codes based on pairwise similarities of training instances, fail to capture the semantic relationship and generalize poorly in large-scale applications due to large variations. In this paper, we propose to model the similarity distributions between the input data and the hashing codes, upon which a novel supervised online hashing method, dubbed Similarity Distribution based Online Hashing (SDOH), is proposed to keep the intrinsic semantic relationship in the produced Hamming space. Specifically, we first transform the discrete similarity matrix into a probability matrix via a Gaussian-based normalization to address the extremely imbalanced distribution issue. Then, we introduce a scaling Student …

Hashing based visual search has attracted extensive research attention in recent years due to the rapid growth of visual data on the Internet [7, 33, 8, 26, 12, 13, 30, 32, 25, 35, 27]. In various scenarios, online hashing has become a hot topic for handling streaming data: it aims to resolve an online retrieval task by updating the hash functions from sequentially arriving data instances. On one hand, online hashing inherits the advantages of traditional offline hashing methods, i.e., low storage cost and efficient pairwise distance computation in the Hamming space. On the other hand, it also offers training efficiency and scalability for large-scale applications, since the hash functions are updated instantly and solely based on the current streaming data, which is superior to traditional hashing methods whose hashing model is trained entirely from scratch.
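The storage and distance-computation advantages of binary codes can be illustrated with a minimal sketch. It assumes generic linear hash functions h(x) = sign(W^T x), with each code packed into an integer so that the Hamming distance is a single XOR followed by a bit count. This does not implement SDOH's learning procedure. The projection matrix `W` and the helper names are hypothetical stand-ins for whatever hash functions an online method would maintain and update.

```python
import numpy as np

def hash_codes(X, W):
    """Hypothetical linear hash functions h(x) = sign(W^T x), packed into Python ints."""
    bits = (X @ W) > 0            # shape (n, r): one bit per hash function
    codes = []
    for row in bits:
        code = 0
        for b in row:
            code = (code << 1) | int(b)  # append each bit, most significant first
        codes.append(code)
    return codes

def hamming(a, b):
    """Hamming distance between two packed codes: popcount of their XOR."""
    return bin(a ^ b).count("1")
```

An r-bit code takes r bits of storage per item, and comparing two codes costs one XOR and a population count. Both properties carry over unchanged when an online method replaces `W` with updated hash functions.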

Learning Relative Similarity by Stochastic Dual Coordinate Ascent

AAAI Conferences

Learning relative similarity from pairwise instances is an important problem in machine learning and has a wide range of applications. Despite years of study, existing methods that rely on Stochastic Gradient Descent (SGD) generally suffer from slow convergence. In this paper, we investigate the application of the Stochastic Dual Coordinate Ascent (SDCA) technique to the optimization task of relative similarity learning, extending it from vector to matrix parameters. Theoretically, we prove the optimal linear convergence rate for the proposed SDCA algorithm, improving on the sublinear convergence rate of the previous best metric learning algorithms. Empirically, we conduct extensive experiments on both standard and large-scale data sets to validate the effectiveness of the proposed algorithm for retrieval tasks.
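SDCA with a matrix parameter can be illustrated on a hypothetical formulation. Each triplet (x, x+, x-) becomes a matrix example X_i = x (x+ - x-)^T, and we minimize lam/2 * ||W||_F^2 + (1/n) * sum_i phi(<W, X_i>), where phi is the smoothed hinge loss with smoothing parameter gamma. The coordinate step is the standard closed-form SDCA update for the smoothed hinge, with vector inner products replaced by Frobenius inner products. Whether the paper uses this exact loss is an assumption. A smooth loss is what typically gives SDCA its linear rate. The function returns the duality gap, which should shrink quickly.

```python
import numpy as np

def smoothed_hinge(a, gamma):
    """Smoothed hinge loss phi(a), applied elementwise to margins a = <W, X_i>."""
    a = np.asarray(a, dtype=float)
    return np.where(a >= 1.0, 0.0,
                    np.where(a <= 1.0 - gamma, 1.0 - a - gamma / 2.0,
                             (1.0 - a) ** 2 / (2.0 * gamma)))

def sdca_relative_similarity(anchors, positives, negatives, lam=0.1, gamma=1.0,
                             epochs=300, seed=0):
    """Hypothetical SDCA sketch; returns (W, duality_gap)."""
    rng = np.random.default_rng(seed)
    n, d = anchors.shape
    # One matrix example per triplet: <W, X_i> = x_i^T W (x_i^+ - x_i^-).
    X = np.einsum("ni,nj->nij", anchors, positives - negatives)
    sq_norms = np.einsum("nij,nij->n", X, X)  # squared Frobenius norms ||X_i||^2
    alpha = np.zeros(n)                        # dual variables, each kept in [0, 1]
    W = np.zeros((d, d))                       # invariant: W = sum_i alpha_i X_i / (lam n)
    for _ in range(epochs * n):
        i = rng.integers(n)                    # random coordinate
        margin = np.sum(W * X[i])
        # Closed-form maximization of the dual along coordinate i, clipped to [0, 1].
        step = (1.0 - margin - gamma * alpha[i]) / (sq_norms[i] / (lam * n) + gamma)
        new_alpha = np.clip(alpha[i] + step, 0.0, 1.0)
        W += (new_alpha - alpha[i]) / (lam * n) * X[i]
        alpha[i] = new_alpha
    margins = np.einsum("ij,nij->n", W, X)
    primal = smoothed_hinge(margins, gamma).mean() + 0.5 * lam * np.sum(W * W)
    dual = np.mean(alpha - 0.5 * gamma * alpha ** 2) - 0.5 * lam * np.sum(W * W)
    return W, primal - dual
```

Each step touches one triplet and costs O(d^2), like an SGD step. Unlike SGD, it needs no learning-rate schedule, and the duality gap gives a built-in stopping criterion.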