Supervised Learning
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Italy (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.74)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Germany > Berlin (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.52)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.52)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Improved Deep Metric Learning with Multi-class N-pair Loss Objective
Deep metric learning has gained much popularity in recent years, following the success of deep learning. However, existing deep metric learning frameworks based on contrastive loss and triplet loss often suffer from slow convergence, partly because they employ only one negative example and do not interact with the other negative classes in each update. In this paper, we propose to address this problem with a new metric learning objective called multi-class N-pair loss. The proposed objective function first generalizes triplet loss by allowing joint comparison among more than one negative example (more specifically, N-1 negative examples), and second reduces the computational burden of evaluating deep embedding vectors via an efficient batch construction strategy that uses only N pairs of examples instead of (N+1)N. We demonstrate the superiority of our proposed loss over the triplet loss as well as other competing loss functions for a variety of tasks on several visual recognition benchmarks, including fine-grained object recognition and verification, image clustering and retrieval, and face verification and identification.
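The abstract describes the loss only in words. Below is a minimal NumPy sketch of one common way to write it, assuming dot-product similarity between embeddings: for N pairs (f_i, f_i^+) drawn from N distinct classes, each anchor f_i is scored against its own positive and against the other N-1 positives f_j^+, which act as its negatives, giving log(1 + Σ_{j≠i} exp(f_i·f_j^+ - f_i·f_i^+)) averaged over i. The function names `n_pair_loss` and `embeddings_per_batch` are hypothetical, and the embedding network and any regularization are omitted.

```python
import numpy as np


def n_pair_loss(anchors: np.ndarray, positives: np.ndarray) -> float:
    """Multi-class N-pair loss for one batch of N pairs (sketch).

    anchors[i] and positives[i] are embeddings of two examples of class i,
    and the N rows come from N distinct classes, so positives[j] (j != i)
    serve as the N-1 negatives for anchors[i].
    """
    # logits[i, j] = f_i . f_j^+  (dot-product similarity is an assumption here)
    logits = anchors @ positives.T
    # Numerically stable log-sum-exp over each row.
    row_max = logits.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    # log(1 + sum_{j != i} exp(l_ij - l_ii)) equals logsumexp_j(l_ij) - l_ii,
    # i.e. softmax cross-entropy with the matching positive as the target.
    return float(np.mean(lse - np.diag(logits)))


def embeddings_per_batch(n: int) -> tuple[int, int]:
    """Embeddings evaluated per batch: N-pair construction vs. naive (N+1)N."""
    return 2 * n, (n + 1) * n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f, f_pos = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    print(n_pair_loss(f, f_pos))
    print(embeddings_per_batch(32))  # (64, 1056)
```

Writing the loss as row-wise cross-entropy over the N×N similarity matrix is what makes the batch construction cheap: all N-1 negatives for every anchor are reused from the same 2N embeddings rather than computed separately for each anchor.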
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.78)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.78)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- North America > United States > New York (0.04)
- Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.51)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.42)
- North America > United States > Massachusetts (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.73)
AssayMatch: Learning to Select Data for Molecular Activity Models
Vincent Fan, Regina Barzilay
The performance of machine learning models in drug discovery is highly dependent on the quality and consistency of the underlying training data. Because individual datasets are limited in size, many models are trained by aggregating bioactivity data from diverse sources, including public databases such as ChEMBL. However, this approach often introduces significant noise due to variability in experimental protocols. We introduce AssayMatch, a framework for data selection that builds smaller, more homogeneous training sets attuned to the test set of interest. AssayMatch leverages data attribution methods to quantify the contribution of each training assay to model performance. These attribution scores are used to finetune language embeddings of text-based assay descriptions so that they capture not just semantic similarity but also the compatibility between assays. Unlike existing data attribution methods, our approach enables data selection for a test set with unknown labels, mirroring real-world drug discovery campaigns where the activities of candidate molecules are not known in advance. At test time, embeddings finetuned with AssayMatch are used to rank all available training data. We demonstrate that models trained on data selected by AssayMatch can surpass the performance of a model trained on the complete dataset, highlighting its ability to effectively filter out harmful or noisy experiments. We perform experiments on two common machine learning architectures and observe improved predictive performance over a strong language-only baseline for 9 of 12 model-target pairs. AssayMatch provides a data-driven mechanism to curate higher-quality datasets, reducing noise from incompatible experiments and improving the predictive power and data efficiency of models for drug discovery. AssayMatch is available at https://github.com/Ozymandias314/AssayMatch.
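The abstract states that, at test time, finetuned embeddings of assay descriptions are used to rank all available training data. The sketch below illustrates only that ranking-and-selection step. It assumes precomputed embedding vectors, cosine similarity as the ranking score, and a simple "take top-ranked assays until a data budget is met" rule; `rank_assays` and `select_assays` are hypothetical helpers, not the API of the AssayMatch repository, and the attribution-guided finetuning of the embeddings is not shown.

```python
import numpy as np


def rank_assays(test_emb: np.ndarray, train_embs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rank training assays by cosine similarity to the test assay (assumed score)."""
    t = test_emb / np.linalg.norm(test_emb)
    m = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    scores = m @ t
    order = np.argsort(-scores, kind="stable")  # most similar assay first
    return order, scores


def select_assays(test_emb, train_embs, assay_sizes, min_points):
    """Take top-ranked assays until at least `min_points` measurements are collected.

    `assay_sizes[i]` is the number of labeled molecules in training assay i
    (a hypothetical budget rule, used here only for illustration).
    """
    order, _ = rank_assays(test_emb, train_embs)
    chosen, total = [], 0
    for idx in order:
        chosen.append(int(idx))
        total += int(assay_sizes[idx])
        if total >= min_points:
            break
    return chosen, total


if __name__ == "__main__":
    # Toy 2-D "embeddings" for three training assays and one test assay.
    train = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
    test = np.array([1.0, 0.0])
    print(select_assays(test, train, assay_sizes=[10, 5, 7], min_points=15))
```

Because no test labels are used, the selection depends only on the test assay's description embedding, which matches the setting the abstract describes, where candidate activities are unknown in advance.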
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.34)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.52)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.51)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Model-Agnostic Private Learning
Raef Bassily, Abhradeep Guha Thakurta, Om Dipakbhai Thakkar
We design differentially private learning algorithms that are agnostic to the learning model, assuming access to a limited amount of unlabeled public data. First, we provide a new differentially private algorithm for answering a sequence of m online classification queries (given by a sequence of m unlabeled public feature vectors) based on a private training set. Our algorithm follows the subsample-and-aggregate paradigm, in which any generic non-private learner is trained on disjoint subsets of the private training set, and then, for each classification query, the votes of the resulting classifier ensemble are aggregated in a differentially private fashion. Our private aggregation is based on a novel combination of the distance-to-instability framework [26] and the sparse-vector technique [15, 18]. We show that our algorithm makes conservative use of the privacy budget. In particular, if the underlying non-private learner yields a classification error of at most α ∈ (0, 1), then our construction answers more queries, by at least a factor of 1/α in some cases, than what is implied by a straightforward application of the advanced composition theorem for differential privacy. Next, we apply the knowledge transfer technique to construct a private learner that outputs a classifier, which can be used to answer an unlimited number of queries. In the PAC model, we analyze our construction and prove upper bounds on the sample complexity for both the realizable and the non-realizable cases. Similar to non-private sample complexity, our bounds are completely characterized by the VC dimension of the concept class.
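A toy, standard-library-only sketch of the structure described above: subsample-and-aggregate, with each query's vote margin turned into a distance to instability and screened by a sparse-vector test, answering ⊥ (here `None`) when the vote is unstable. Everything concrete is an assumption: the distance measure (vote changes needed before the top label loses its strict lead), the noise scales (the textbook sparse-vector template for sensitivity-1 queries with at most `max_unstable` flagged outcomes), and all names (`private_answers`, `laplace`, `train`, `threshold`, `max_unstable`). The paper's exact calibration, including how the threshold is chosen for a target δ, is not reproduced, so this code carries no formal privacy guarantee.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence

Classifier = Callable[[object], Hashable]


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_answers(private_data: Sequence, queries: Sequence,
                    train: Callable[[list], Classifier], k: int,
                    threshold: float, max_unstable: int, epsilon: float,
                    seed: int = 0) -> List[Optional[Hashable]]:
    rng = random.Random(seed)
    data = list(private_data)
    rng.shuffle(data)
    # Subsample: k disjoint subsets, one non-private model trained per subset.
    models = [train(data[i::k]) for i in range(k)]

    # Sparse-vector template: threshold noise Lap(sigma), query noise
    # Lap(2 * sigma), with sigma = 2c / epsilon for c = max_unstable.
    sigma = 2.0 * max_unstable / epsilon
    noisy_threshold = threshold + laplace(sigma, rng)

    answers: List[Optional[Hashable]] = []
    unstable = 0
    for q in queries:
        counts = Counter(model(q) for model in models).most_common(2)
        top_label, top = counts[0]
        second = counts[1][1] if len(counts) > 1 else 0
        # Distance to instability: vote changes needed before top_label loses
        # its strict lead. One private record affects one model, so this
        # changes by at most 1 between neighbouring datasets.
        dist = (top - second + 1) // 2
        if dist + laplace(2.0 * sigma, rng) >= noisy_threshold:
            answers.append(top_label)  # stable: release the majority vote
        else:
            # Unstable outcomes are the counted sparse-vector events
            # (the template applied to -dist, using symmetric noise).
            answers.append(None)       # answer "⊥"
            unstable += 1
            if unstable >= max_unstable:
                break                  # budget for unstable answers exhausted
            noisy_threshold = threshold + laplace(sigma, rng)
    return answers


if __name__ == "__main__":
    # Toy 1-D task: label is 1 when x > 0.5; the non-private learner is 1-NN.
    gen = random.Random(1)
    pts = [(x, int(x > 0.5)) for x in (gen.random() for _ in range(2000))]

    def train(chunk):
        return lambda q: min(chunk, key=lambda p: abs(p[0] - q))[1]

    print(private_answers(pts, [0.1, 0.49, 0.9], train, k=50,
                          threshold=5, max_unstable=3, epsilon=1.0))
```

Only queries near the decision boundary tend to produce close votes and hence ⊥, so the privacy cost is driven by the number of unstable queries rather than by the total number answered; this mirrors the intuition behind the abstract's comparison with advanced composition. The labels released this way are what a knowledge-transfer step could use to train a final classifier on the public points.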
- North America > United States > Ohio (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- Education > Educational Setting > Online (0.69)
- Information Technology > Security & Privacy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)