Supervised Learning
Orometric Methods in Bounded Metric Data
Stubbemann, Maximilian, Hanika, Tom, Stumme, Gerd
A large amount of data accommodated in knowledge graphs (KG) is actually metric. For example, the Wikidata KG contains a plenitude of metric facts about geographic entities like cities, chemical compounds or celestial objects. In this paper, we propose a novel approach that transfers orometric (topographic) measures to bounded metric spaces. While these methods were originally designed to identify relevant mountain peaks on the surface of the earth, we demonstrate a notion to use them for metric data sets in general. Notably, metric sets of items inclosed in knowledge graphs. Based on this we present a method for identifying outstanding items using the transferred valuations functions 'isolation' and 'prominence'. Building up on this we imagine an item recommendation process. To demonstrate the relevance of the novel valuations for such processes we use item sets from the Wikidata knowledge graph. We then evaluate the usefulness of 'isolation' and 'prominence' empirically in a supervised machine learning setting. In particular, we find structurally relevant items in the geographic population distributions of Germany and France.
A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
This paper introduces a la carte embed-ding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transfor-mation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly in the future when a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in con-text to learn high-quality embeddings and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.
Encoding high-cardinality string categorical variables
Cerda, Patricio, Varoquaux, Gaël
Statistical models usually require vector representations of categorical variables, using for instance one-hot encoding. This strategy breaks down when the number of categories grows, as it creates high-dimensional feature vectors. Additionally, for string entries, one-hot encoding does not capture information in their representation.Here, we seek low-dimensional encoding of high-cardinality string categorical variables. Ideally, these should be: scalable to many categories; interpretable to end users; and facilitate statistical analysis. We introduce two encoding approaches for string categories: Gamma-Poisson matrix factorization on substring counts, and the min-hash encoder, for fast approximation of string similarities. We show that min-hash turns set inclusions into inequality relations that are easier to learn. Both approaches are scalable and streamable. Experiments on real and simulated data show that these methods improve supervised learning with high-cardinality categorical variables. We recommend the following: if scalability is central, the min-hash encoder is the best option as it does not require any data fit; if interpretability is important, the Gamma-Poisson factorization is the best alternative, as it can be interpreted as one-hot encoding on inferred categories with informative feature names. Both models enable autoML on the original string entries as they remove the need for feature engineering or data cleaning.
DeepTrax: Embedding Graphs of Financial Transactions
Bruss, C. Bayan, Khazane, Anish, Rider, Jonathan, Serpe, Richard, Gogoglou, Antonia, Hines, Keegan E.
Financial transactions can be considered edges in a heterogeneous graph between entities sending money and entities receiving money. For financial institutions, such a graph is likely large (with millions or billions of edges) while also sparsely connected. It becomes challenging to apply machine learning to such large and sparse graphs. Graph representation learning seeks to embed the nodes of a graph into a Euclidean vector space such that graph topological properties are preserved after the transformation. In this paper, we present a novel application of representation learning to bipartite graphs of credit card transactions in order to learn embeddings of account and merchant entities. Our framework is inspired by popular approaches in graph embeddings and is trained on two internal transaction datasets. This approach yields highly effective embeddings, as quantified by link prediction AUC and F1 score. Further, the resulting entity vectors retain intuitive semantic similarity that is explored through visualizations and other qualitative analyses. Finally, we show how these embeddings can be used as features in downstream machine learning business applications such as fraud detection.
Cloud TPU Pods break AI training records Google Cloud Blog
Google Cloud's AI-optimized infrastructure makes it possible for businesses to train state-of-the-art machine learning models faster, at greater scale, and at lower cost. These advantages enabled Google Cloud Platform (GCP) to set three new performance records in the latest round of the MLPerf benchmark competition, the industry-wide standard for measuring ML performance. All three record-setting results ran on Cloud TPU v3 Pods, the latest generation of supercomputers that Google has built specifically for machine learning. These results showcased the speed of Cloud TPU Pods-- with each of the winning runs using less than two minutes of compute time. With these latest MLPerf benchmark results, Google Cloud is the first public cloud provider to outperform on-premise systems when running large-scale, industry-standard ML training workloads of Transformer, Single Shot Detector (SSD), and ResNet-50.
Learning Fair Representations for Kernel Models
Tan, Zilong, Yeom, Samuel, Fredrikson, Matt, Talwalkar, Ameet
Fairness has emerged as a key issue in machine learning as it is increasingly used in areas like hiring [Dastin, 2018], healthcare[Gupta and Mohammad, 2017], and criminal justice [Equivant, 2019]. In particular, models' predictions should not lead to decisions that discriminate on the basis of a legally protected attribute, such as race or gender. Among the proposals to address this issue, a growing body of work focuses on learning et al., 2017, del Barrio et al., 2018, Feldmanfair representations of data for downstream modeling [Calmon 2015, Johndrow and Lum, 2019, Kamiran and Calders, 2012]. Most of these approaches are modelet al., agnostic, which provides flexibility when working with the learned representations, but comes at the cost of potentially suboptimal results in terms of both fairness and accuracy. In this work, we present a new approach for fair representation learning that takes into account the target hypothesis class of models that will be learned from the representation. Specifically, we show how to leverage information about the reproducing kernel Hilbert space (RKHS) to learn a fair representation for kernel-based models with provable fairness and accuracy guarantees. Our approach builds on the classic Sufficient Dimension Reduction (SDR) framework [Li, 1991, Cook 1991, Cook, 1998, Fukumizu et al., 2004, 2009, Wu et al., 2009, Cook and Forzani, 2009]and Weisberg, which is used to compute a low-dimensional projection of the feature vector X that captures all information related to the response Y. Our key insight is that we can instead perform SDR with respect to the protected attributes S, and then take the orthogonal complement of the resulting projection to obtain a fair subspace of the RKHS that captures information in X unrelated to S. We show that functions in the fair subspace 2.2), and we leverage this fact to prove that our approachwill be independent of S under mild conditions (§
Universal Bayes consistency in metric spaces
Hanneke, Steve, Kontorovich, Aryeh, Sabato, Sivan, Weiss, Roi
We show that a recently proposed 1-nearest-neighbor-based multiclass learning algorithm is universally strongly Bayes consistent in all metric spaces where such Bayes consistency is possible, making it an optimistically universal Bayes-consistent learner. This is the first learning algorithm known to enjoy this property; by comparison, $k$-NN and its variants are not generally universally Bayes consistent, except under additional structural assumptions, such as an inner product, a norm, finite doubling dimension, or a Besicovitch-type property. The metric spaces in which universal Bayes consistency is possible are the essentially separable ones --- a new notion that we define, which is more general than standard separability. The existence of metric spaces that are not essentially separable is independent of the ZFC axioms of set theory. We prove that essential separability exactly characterizes the existence of a universal Bayes-consistent learner for the given metric space. In particular, this yields the first impossibility result for universal Bayes consistency. Taken together, these positive and negative results resolve the open problems posed in Kontorovich, Sabato, Weiss (2017).
Awareness of Voter Passion Greatly Improves the Distortion of Metric Social Choice
Abramowitz, Ben, Anshelevich, Elliot, Zhu, Wennan
We develop new voting mechanisms for the case when voters and candidates are located in an arbitrary unknown metric space, and the goal is to choose a candidate minimizing social cost: the total distance from the voters to this candidate. Previous work has often assumed that only ordinal preferences of the voters are known (instead of their true costs), and focused on minimizing distortion: the quality of the chosen candidate as compared with the best possible candidate. In this paper, we instead assume that a (very small) amount of information is known about the voter preference strengths, not just about their ordinal preferences. We provide mechanisms with much better distortion when this extra information is known as compared to mechanisms which use only ordinal information. We quantify tradeoffs between the amount of information known about preference strengths and the achievable distortion. We further provide advice about which type of information about preference strengths seems to be the most useful. Finally, we conclude by quantifying the ideal candidate distortion, which compares the quality of the chosen outcome with the best possible candidate that could ever exist, instead of only the best candidate that is actually in the running.
Collaborative Metric Learning with Memory Network for Multi-Relational Recommender Systems
Zhou, Xiao, Liu, Danyang, Lian, Jianxun, Xie, Xing
The success of recommender systems in modern online platforms is inseparable from the accurate capture of users' personal tastes. In everyday life, large amounts of user feedback data are created along with user-item online interactions in a variety of ways, such as browsing, purchasing, and sharing. These multiple types of user feedback provide us with tremendous opportunities to detect individuals' fine-grained preferences. Different from most existing recommender systems that rely on a single type of feedback, we advocate incorporating multiple types of user-item interactions for better recommendations. Based on the observation that the underlying spectrum of user preferences is reflected in various types of interactions with items and can be uncovered by latent relational learning in metric space, we propose a unified neural learning framework, named Multi-Relational Memory Network (MRMN). It can not only model fine-grained user-item relations but also enable us to discriminate between feedback types in terms of the strength and diversity of user preferences. Extensive experiments show that the proposed MRMN model outperforms competitive state-of-the-art algorithms in a wide range of scenarios, including e-commerce, local services, and job recommendations.
Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction
Hou, Yufang, Jochim, Charles, Gleize, Martin, Bonin, Francesca, Ganguly, Debasis
While the fast-paced inception of novel tasks and new datasets helps foster active research in a community towards interesting directions, keeping track of the abundance of research activity in different areas on different datasets is likely to become increasingly difficult. The community could greatly benefit from an automatic system able to summarize scientific results, e.g., in the form of a leaderboard. In this paper we build two datasets and develop a framework (TDMS-IE) aimed at automatically extracting task, dataset, metric and score from NLP papers, towards the automatic construction of leaderboards. Experiments show that our model outperforms several baselines by a large margin. Our model is a first step towards automatic leaderboard construction, e.g., in the NLP domain.